# Lecture: Counterfactual Framework

2. Causal impact, d-separation and backdoors Slides►
4. Example of back door criterion Read►
5. Minimizing stratification through backdoor criterion Read► Slides►
6. Network analysis using Grow-Shrink, HITON, and Sequence R code► Slides► Soylu's Video►
7. Network analysis using Poisson regression Read► Dispersion►
9. Impact of sequence on accuracy of network learning algorithms Read►

## Assignment

For this assignment you can use any statistical package.  Work can be done in groups of two students, but you cannot work with a student with whom you have previously teamed up.

Question 1: An electronic health record contains data on the outcomes of a particular intervention.  Using the network drawn below, write the equations that would allow you to estimate what would happen if the intervention were not given.  First, write an equation for each node in the network based on the variables that precede it.  For example, the regression equation for predicting whether there is an adverse event is:

Outcome = a + b Treatment + c Severity

Second, set the variables that change across these equations to the relevant values.  For example, set Treatment to be zero.
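
As a minimal sketch of these two steps, for the example equation only, the Python code below fits Outcome = a + b Treatment + c Severity by least squares and then sets Treatment to zero.  The data are synthetic, the coefficient values (0.10, 0.30, 0.20) are made up for illustration, and the function names are hypothetical.  In the full network, any node that follows Treatment would need its own equation re-evaluated with Treatment set to zero before the Outcome equation is applied.

```python
import numpy as np


def fit_outcome_equation(treatment, severity, outcome):
    """Least-squares estimates (a, b, c) for Outcome = a + b Treatment + c Severity."""
    design = np.column_stack([np.ones(len(outcome)), treatment, severity])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


def predict_untreated(coef, severity):
    """Evaluate the fitted equation with Treatment set to zero for every patient."""
    a, b, c = coef
    return a + b * 0.0 + c * severity


# Synthetic, noise-free data (an assumption for illustration, not EHR data).
rng = np.random.default_rng(0)
severity = rng.integers(0, 2, size=200).astype(float)
treatment = rng.integers(0, 2, size=200).astype(float)
outcome = 0.10 + 0.30 * treatment + 0.20 * severity

coef = fit_outcome_equation(treatment, severity, outcome)
untreated = predict_untreated(coef, severity)

print("a, b, c =", np.round(coef, 3))
print("observed mean outcome:", round(float(outcome.mean()), 3))
print("mean outcome if nobody were treated:", round(float(untreated.mean()), 3))
```

Because an adverse event is a yes/no outcome, a logistic regression could replace the linear equation; the second step (setting Treatment to zero and recomputing) stays the same.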

Question 2: The following graph was used to simulate data on bundling payment for total hip fracture treatment:

• Recover the original network and calculate the causal impact of H on BP.  Data► R Code► Detail R Code►

• If you were using logistic or ordinary regression equations, write the set of equations represented by the above network.  For each equation, write all the variables that are in the regression and indicate which of them have a statistically significant relationship with the response variable.  For example, LTH is regressed on all variables that precede it, which are DME, CL, P, and H, but only P and H have a statistically significant relationship with LTH.  This regression can be shown as:

LTH = a + b DME + c CL + d P* + e H*

In the above equation, the statistically significant relationships are shown with a star (*).  A missing star indicates an insignificant relationship.  Using the data, estimate the parameters of each of the regressions.  Can this set of equations be used to create the network?  In how many ways do the regression equations differ from the network model in the graph?
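
One way to produce the starred notation is sketched below in Python.  It is an illustration under stated assumptions, not the course's R code: ordinary least squares stands in for whichever regression you choose, a coefficient is starred when |t| exceeds 1.96 (a normal approximation to the 0.05 level), and the data are synthetic, generated so that only P and H affect LTH.  The function names are hypothetical.

```python
import string

import numpy as np


def ols_with_stars(y, X, names, t_crit=1.96):
    """Fit y = a + sum(b_j x_j) and flag predictors whose |t| exceeds t_crit."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]                       # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = coef / np.sqrt(np.diag(cov))
    starred = {name: bool(abs(t) > t_crit) for name, t in zip(names, t_stats[1:])}
    return coef, starred


def as_equation(response, names, starred):
    """Render the regression in the assignment's notation, e.g. 'd P*'."""
    letters = string.ascii_lowercase[1:]            # b, c, d, ...
    terms = [
        f"{letters[i]} {name}{'*' if starred[name] else ''}"
        for i, name in enumerate(names)
    ]
    return f"{response} = a + " + " + ".join(terms)


# Synthetic data (assumption): LTH depends on P and H only.
rng = np.random.default_rng(1)
names = ["DME", "CL", "P", "H"]
X = rng.integers(0, 2, size=(500, 4)).astype(float)
lth = 1.0 + 2.0 * X[:, 2] + 2.0 * X[:, 3] + rng.normal(0.0, 1.0, size=500)

coef, starred = ols_with_stars(lth, X, names)
equation = as_equation("LTH", names, starred)
print(equation)
```

With real data, a variable with no true effect can still be starred by chance about 5% of the time at this cutoff, which is one reason the regressions and the network may not agree exactly.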

Question 3: Construct a decision aid for selection among antidepressants.

3. Repeat the following analysis for at least 5 antidepressant(s).  Separate analysis must be done for each antidepressant or antidepressant combination (shown in variable CONCAT).
• Create a data set for each antidepressant(s) combination.  These data sets will include a patient several times if the patient received different combinations of antidepressants over time.  Group by the CONCAT and ID variables to remove the weekly data.  If the patient received the antidepressant(s) combination, assign it a value of 1; otherwise, when the patient received another combination of antidepressant(s), assign it a value of 0.
• Identify the parents in the Markov Blanket of each antidepressant.  You can use logistic regression to do this.  For each antidepressant use all variables that precede it as independent variables in the regression.  Use the variables that are significant predictors of the antidepressant as the parents in the Markov Blanket of the antidepressant.
• Stratify treatment and exclude from the list of parents in the Markov Blanket any variable not related to remission (measured as referred to follow-up).  Calculate the unconfounded impact of antidepressant(s) on remission.  Stratify the remaining variables in the parents in the Markov Blanket of treatment and calculate the impact of antidepressant(s) on remission.
4. For a patient with PTSD and neurological disorders, evaluate which of the 5 antidepressant(s) combinations is most likely to lead to remission.
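
The stratification step can be sketched as follows.  This is a minimal Python illustration, not a prescribed method: the records, the field names (`treated`, `remission`, `ptsd`), and the helper `make` are hypothetical, and weighting each stratum by its total size is one assumed choice among several.  Within each stratum of the parents, the code compares remission rates for treated and untreated patients and then averages the differences.

```python
from collections import defaultdict


def stratified_impact(records, treatment, outcome, parents):
    """Size-weighted average, over strata of `parents`, of
    P(outcome | treated) - P(outcome | untreated)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in records:
        key = tuple(r[p] for p in parents)
        strata[key][r[treatment]].append(r[outcome])

    weighted, total = 0.0, 0
    for groups in strata.values():
        treated, untreated = groups[1], groups[0]
        if not treated or not untreated:
            continue  # no overlap: this stratum offers no comparison
        size = len(treated) + len(untreated)
        diff = sum(treated) / len(treated) - sum(untreated) / len(untreated)
        weighted += size * diff
        total += size
    if total == 0:
        raise ValueError("no stratum contains both treated and untreated patients")
    return weighted / total


def make(ptsd, treated, remission, count):
    """Build `count` identical hypothetical patient records."""
    return [{"ptsd": ptsd, "treated": treated, "remission": remission}
            for _ in range(count)]


# Hypothetical data: 16 patients, with PTSD as the only parent of treatment.
records = (
    make(1, 1, 1, 3) + make(1, 1, 0, 1) + make(1, 0, 1, 1) + make(1, 0, 0, 3)
    + make(0, 1, 1, 1) + make(0, 1, 0, 1) + make(0, 0, 1, 3) + make(0, 0, 0, 3)
)

crude = stratified_impact(records, "treated", "remission", parents=[])
adjusted = stratified_impact(records, "treated", "remission", parents=["ptsd"])
print("crude difference:", round(crude, 3))
print("stratified difference:", round(adjusted, 3))
```

Passing an empty list of parents gives the unstratified comparison, so the same function shows how much the estimate changes once the parents of treatment are held constant.  The analysis would be repeated on the data set for each antidepressant(s) combination.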