Lecture: Causal Networks  


Assigned Reading

  1. Session overview YouTube► See also YouTube►
  2. Introduction to Causal Networks Read►
  3. Learning network models
  4. More


Submit one file for all questions.  Include all charts, code, and output in the same file.  Start each question on a separate page or sheet. Make the first page a summary page.  In the summary page, write statements comparing your work to the answers given or to the videos.  For example, "I got the same answers as the Teach One video for question 1." 

For this assignment you can use any statistical package, including R, SAS, and SPSS.  Your instructor is familiar with Netica and BayesiaLab.  R packages are also used often.  OpenBUGS and Gibbs Sampler, Stan, OpenMarkov, and Direct Graphical Model are open source software.  Netica is free for networks with fewer than 15 nodes. A more complete list is available on Wikipedia under "Bayesian Networks."  OpenBUGS► Stan► Direct Graphical Models► OpenMarkov► Graphical Models Toolkit► PyMC► Genie Smile► SamIam► Bayes Server► AIspace► BayesiaLab► Hugin► AgenaRisk► dVelox► System Modeler► UnBBayes► Uninet► Tetrad► Dezide► Netica►

Question 1: Draw networks based on the following independence assumptions.  When directed networks are possible, give formulas for predicting the last variable in the network from marginal and pair-wise conditional probabilities.  Keep in mind that the absence of an independence assumption implies dependence. Review► Wang's Teach One►

Nodes in Network    Assumption
X, Y, Z             I(X,Y)
X, Y, Z             I(X,Y), Not I(X,Y|Z)
X, Y, Z             I(X,Y), I(X,Y|Z), Y measured last
X, Y, Z, W          I(X,Y), I(X,Y|Z), I({X,Y},W|Z), W measured last
X, Y, Z, W          I(X,Y), I(Z,W), and X measured before Z and Y measured before W
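
As a hedged illustration of the kind of formula requested, consider a chain X → Z → Y. This structure is an assumption chosen for illustration and is not one of the rows above: X and Y are dependent, but I(X,Y|Z) holds. The joint distribution factors along the arrows, and the last variable is predicted by summing over the middle one:

```latex
\begin{align}
P(X, Y, Z) &= P(X)\, P(Z \mid X)\, P(Y \mid Z) \\
P(Y \mid X) &= \sum_{z} P(Y \mid Z = z)\, P(Z = z \mid X)
\end{align}
```

In this illustration only the marginal P(X) and the pair-wise conditionals P(Z|X) and P(Y|Z) are needed.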

Question 2: Write SQL code to calculate the probability of a negative outcome when the patient is severely ill and has not signed a "Do Not Resuscitate" (DNR) order.  Note that probabilities for events that are mutually exclusive and exhaustive should add up to one. Data► Bushra's Teach One► Anto's Teach One► Slides► SQL►
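
The following is a minimal sketch of the kind of query involved, run through Python's built-in sqlite3 module. The table name `joint`, its columns, and every probability in it are hypothetical and are not the assignment data; only the 0.1 and 0.9 split for DNR mirrors the figures mentioned in this assignment. The query conditions on the evidence with WHERE and divides by the total probability of that evidence, so the resulting probabilities add up to one.

```python
import sqlite3

# Hypothetical joint probabilities (illustration only, not the assignment data).
# Columns: severe (1 = severely ill), dnr (1 = DNR signed),
#          negative (1 = negative outcome), p = joint probability.
rows = [
    (1, 1, 1, 0.04), (1, 1, 0, 0.01),
    (1, 0, 1, 0.12), (1, 0, 0, 0.18),
    (0, 1, 1, 0.02), (0, 1, 0, 0.03),
    (0, 0, 1, 0.06), (0, 0, 0, 0.54),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joint (severe INTEGER, dnr INTEGER, negative INTEGER, p REAL)"
)
conn.executemany("INSERT INTO joint VALUES (?, ?, ?, ?)", rows)

# Condition on the evidence, then normalize by the probability of the evidence.
query = """
SELECT negative,
       SUM(p) / (SELECT SUM(p) FROM joint WHERE severe = 1 AND dnr = 0) AS prob
FROM joint
WHERE severe = 1 AND dnr = 0
GROUP BY negative
ORDER BY negative
"""
result = dict(conn.execute(query).fetchall())
p_positive = result[0]
p_negative = result[1]
print(f"P(negative | severe, no DNR) = {p_negative:.2f}")
print(f"Sum over both outcomes       = {p_negative + p_positive:.2f}")
```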

Question 3: Redo problem 2 in Netica or other software and verify the accuracy of your answer.  To accomplish this, organize the 4-node network inside Netica and direct the links between the nodes, as in the graph structure in question 3.  Then for every node, enter the table of probabilities as per tables given in Question 3.  For example, for the DNR node enter the two probabilities of 0.1 and 0.9 into the Table within the node for DNR.  Once the entire network (the graph and the related probabilities) has been entered into Netica, evaluate the risks for a patient who is severely ill and has not signed a "Do Not Resuscitate" order. Netica► Shruti's Teach One► Usman's Teach One►

Question 4: If you were using logistic or ordinary regression equations, write the set of equations represented by the following network:

Bundled payments for total hip fracture

In each instance, write all the variables that are in the regression equation and mark with * the variables that have a statistically significant relationship with the response variable.  For example, LTH is regressed on all variables that precede it, which are DME, CL, P, and H.  But only P and H have a statistically significant relationship with LTH.  This regression can be shown as:

LTH = a + b DME + c CL + d P* + e H*

Can this set of equations be used to create the network?  If you had a list of regression equations, how would you display the network model that follows from them?
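
One possible way to go from a list of equations to a network is sketched below. It assumes each equation is written exactly in the format of the LTH example, with * marking significant predictors, and it draws one directed edge from each starred predictor to the response. The function names are hypothetical helpers, not part of any package.

```python
def parse_equation(text):
    """Split 'Y = a + b X1 + c X2*' into the response and its starred predictors."""
    response, rhs = (part.strip() for part in text.split("="))
    parents = []
    for term in rhs.split("+"):
        tokens = term.split()
        # Terms look like "b DME" or "d P*"; the intercept "a" has a single token.
        if len(tokens) == 2 and tokens[1].endswith("*"):
            parents.append(tokens[1].rstrip("*"))
    return response, parents


def edges_from_equations(equations):
    """Return directed edges (parent, child) implied by the starred terms."""
    edges = []
    for text in equations:
        response, parents = parse_equation(text)
        edges.extend((parent, response) for parent in parents)
    return edges


equations = ["LTH = a + b DME + c CL + d P* + e H*"]
edges = edges_from_equations(equations)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

Unstarred predictors such as DME and CL produce no edge, which is why the equations carry more information (the full list of preceding variables) than the drawn network does.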

Question 5: Construct a Bayesian probability network model that would predict success with antidepressants. A network model will include variables, and mediators of the effect of variables, on response to the antidepressant.   Include all baseline diagnoses and gender as covariates. Assume that gender occurs before baseline diagnoses. Baseline diagnoses occur before any treatment. Assume that antidepressant treatments occur before report of remission and in the following order:

Star*D Medications

Remission should be considered an end node.  Gender is a root node.  All other  variables, e.g. diagnoses, could be either root or intermediary nodes but all occur prior to use of antidepressant. The antidepressants that were given prior to an antidepressant should be used as a covariate.  The data has been modified to report per person data, without visit-based weekly data. Data►

  • (a) Identify the parents in the Markov blanket of citalopram using LASSO regression.  The response variable is citalopram (not CIT).  The independent variables are all variables that occur prior to citalopram: baseline diagnoses and gender. If the minimum Lambda parameter introduces too many variables, simplify your work by relying on Lambda value of 1 standard error, 1se.  Sankeerthi's Lasso Regression► Simple LASSO►
  • (b) Identify parents in Markov blanket of remission through LASSO regression.  The response variable is remission.  The independent variables are all variables that occur prior to remission: gender, baseline diseases, and citalopram.  Evaluate the model at lambda of 1se, that is lambda.1se and not lambda.min.
  • (c) Download Netica software. This software is free for use with networks with fewer than 15 nodes. Make a node for each variable in the two regressions you made in steps (a) and (b).  The node should have exactly the same name as the variable in the data. Capitalization matters. Spelling matters. To make the variable display better, you can add a description that corrects for the lack of capitalization or replaces dashes with spaces. Nodes should have the same levels as the variable in the data.  In most cases, these levels are 0 and 1. You can enter a descriptive level to accompany the numerical level. Using Netica software, draw a line from each independent variable in the two regressions to the response variable in the two regressions.  
  • (d) Using Netica software, estimate the parameters of the network you have created. Fit the data to the model you have created in Netica. Use Cases, Learn, and Incorporate Case File.  Once all cases have been incorporated, use the thunderbolt sign to compile the model. This will set the tables within all nodes. The software will estimate the parameters for the model you have created. The following image shows how you can incorporate the case file into Netica. Keep in mind that nodes that have a 50% chance of being present or absent are likely to indicate a variable that did not match the name in the data file.
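
To make the idea of learning node tables from a case file concrete, here is a minimal sketch that estimates a conditional probability table by plain counting. It is an illustration under stated assumptions, not Netica's algorithm, which may differ (for example in how it treats prior experience). The seven cases are hypothetical; only the variable names citalopram and remission come from the assignment.

```python
from collections import Counter, defaultdict


def learn_cpt(cases, child, parents, levels=(0, 1)):
    """Estimate P(child | parents) as relative frequencies within each parent setting."""
    counts = defaultdict(Counter)
    for case in cases:
        key = tuple(case[p] for p in parents)
        counts[key][case[child]] += 1
    cpt = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        cpt[key] = {level: counter[level] / total for level in levels}
    return cpt


# Hypothetical per-person cases (illustration only).
cases = [
    {"citalopram": 1, "remission": 1},
    {"citalopram": 1, "remission": 0},
    {"citalopram": 1, "remission": 1},
    {"citalopram": 0, "remission": 0},
    {"citalopram": 0, "remission": 1},
    {"citalopram": 0, "remission": 0},
    {"citalopram": 0, "remission": 0},
]

cpt = learn_cpt(cases, "remission", ["citalopram"])
for parent_values, dist in sorted(cpt.items()):
    print(parent_values, {level: round(prob, 3) for level, prob in dist.items()})
```

A node whose name does not match any column in the case file receives no counts at all, which is consistent with the warning above about nodes that stay at 50%.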

The following two images show two networks derived at different levels of Lambda, one at "lambda.min" and another at "lambda.1se". The network on the left shows that citalopram has an impact on remission.  The network on the right shows that it does not, i.e. there is no direct line connecting citalopram to remission.

Citalopram remission
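
The difference between the two Lambda choices can be sketched as follows. The sketch assumes the usual convention in which lambda.min is the value with the lowest cross-validated error and lambda.1se is the largest value whose error is within one standard error of that minimum. The function name and all numbers are hypothetical.

```python
def pick_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries."""
    # Index of the lambda with the smallest mean cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # Any lambda whose error is within one standard error of the minimum qualifies.
    threshold = cv_mean[i_min] + cv_se[i_min]
    within = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    # The largest qualifying lambda gives the strongest penalty, so the fewest variables.
    return lambdas[i_min], max(within)


# Hypothetical cross-validation results (illustration only).
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_mean = [0.55, 0.50, 0.51, 0.53, 0.60]
cv_se = [0.02, 0.02, 0.02, 0.02, 0.03]

lambda_min, lambda_1se = pick_lambdas(lambdas, cv_mean, cv_se)
print("lambda.min:", lambda_min)
print("lambda.1se:", lambda_1se)
```

Because the 1se value is never smaller than the minimum value, it penalizes coefficients at least as heavily, which is how a weak link such as the one from citalopram to remission can drop out of the network.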

  • (e) Predict the effect of citalopram on remission for patients who have neurological disease and PTSD. This means that in Netica you select patients who have these two conditions and then compare the probability of remission for patients treated and not treated with citalopram.  Above, two networks are presented. The network on the right also shows how to calculate the probability of remission when the patient has PTSD and neurological disease.  Notice how these two nodes are set to 100%. You would need to toggle these variables between 0% and 100% to see how the probability of remission changes.

Question 6: Using the data provided and Netica software, construct a network model of the symptoms of COVID-19, influenza, and other upper respiratory infections. Create a node at the center with three levels.  Web Calculator► Background► Data► Data (no missing values)► Ghaida Alsadah's Teach One►

  • First, LASSO regress the disease variable on all symptoms, age and gender. This regression will identify variables in the Markov blanket of the disease.
  • Second, identify parents and children in the Markov Blanket of the disease variable.  Symptoms, by definition, occur after the disease and demographic variables occur prior to the disease.  Using Netica software, draw an arrow from the disease variable to symptoms that were statistically significant in the LASSO regression.  Similarly, draw an arrow from demographic variables to the disease, if the demographic variable was a significant predictor in the LASSO regression. 
  • Third, use each symptom that is statistically significant in the first step as a response variable for a new LASSO regression.  The independent variables in these regressions are the disease variable, other symptoms, and gender. Using Netica, draw an arrow from the variables that are statistically significant to the symptom used as the response variable. Repeat with each of the other symptoms as the response variable. Here is a sample of the results of these regressions.  Note that only findings above 0.15 or below -0.15 are listed; the choice of 0.15 is arbitrary and serves to focus on large-magnitude associations. The relationships from age and gender to the symptoms are not listed because none were significant. Relationships that can create a cycle are crossed out.

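                 | Aches | Chest Pain | Chills | Red Eye | Cough | Diarrhea | Fatigue | Fever | Headache | Nausea | Runny Nose | Short Breath | Vomit | Wheeze |
    Aches        |       |            | 0.24   |         | 0.17  |          | 0.37    |       | 0.22     |        |            |              |       |        |
    Chest Pain   |       |            |        |         |       |          |         |       |          |        |            | 0.17         |       |        |
    Chills       | 0.18  |            |        | 0.41    |       |          |         |       | 0.16     |        |            |              |       |        |
    Red Eye      |       |            | 0.40   |         |       |          |         |       |          |        |            |              |       |        |
    Cough        |       |            |        |         |       |          |         | 0.27  |          |        | 0.17       |              |       |        |
    Fatigue      | 0.36  |            |        |         |       |          |         |       | 0.18     |        | 0.18       |              |       |        |
    Fever        |       |            |        |         | 0.21  |          |         |       |          |        |            |              |       |        |
    Headache     | 0.21  |            | 0.21   |         |       |          | 0.18    |       |          |        | 0.16       |              |       |        |
    Nausea       |       |            |        |         |       |          |         |       |          |        |            |              | 0.54  |        |
    Runny Nose   |       |            |        | -0.16   | 0.21  |          |         |       |          |        |            |              |       | 0.15   |
    Short Breath |       | 0.28       |        |         |       |          |         |       |          |        |            |              |       | 0.36   |
    Vomit        |       |            |        |         |       | 0.15     |         |       |          | 0.69   |            |              |       |        |
    Wheeze       |       |            |        |         |       |          |         |       |          |        | 0.21       | 0.38         |       |        |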

  • Fourth, enable Netica software to learn the parameters of the model.
  • Fifth, report the probability of COVID-19 in a patient with fever, cough and runny nose and unknown other symptoms.  Set these three nodes to symptom being present and read the network probability for COVID-19.
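
The third step can produce pairs of symptoms that each predict the other, and a directed network cannot contain cycles. The sketch below shows one way to screen candidate arrows: add them one at a time and drop any arrow whose child can already reach its parent. The order in which candidates are considered and the directions shown are assumptions made for illustration; the assignment does not prescribe a rule for choosing which arrow to cross out.

```python
def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (parent, child) would close a directed cycle."""
    parent, child = new_edge
    # A cycle appears exactly when the parent is already reachable from the child.
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(c for p, c in edges if p == node)
    return False


def add_acyclic(candidates):
    """Keep candidate arrows in order, dropping any that would create a cycle."""
    kept, dropped = [], []
    for edge in candidates:
        if creates_cycle(kept, edge):
            dropped.append(edge)
        else:
            kept.append(edge)
    return kept, dropped


# Hypothetical candidate arrows (parent, child), in a hypothetical order.
candidates = [
    ("Nausea", "Vomit"),
    ("Vomit", "Nausea"),
    ("Aches", "Fatigue"),
    ("Fatigue", "Aches"),
]
kept, dropped = add_acyclic(candidates)
print("kept:   ", kept)
print("dropped:", dropped)
```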

TAN model for COVID-19 Symptoms

Question 7: The attached data show the percentage of diabetes in 2,228 counties in the United States in 2010, 2011, and 2012. We want to understand whether access to food stores affects diabetes. Create the network model using data from repeated LASSO regressions.  The first regression will be diabetes in 2012 on all 2011 variables.  The other LASSO regressions will take as the response (dependent) variable each statistically significant variable from the previous regression, regressed on all 2010 variables. Draw the network model using Netica.  Stratify the parents in the Markov blanket of diabetes in 2012; calculate the impact of access to quality food stores in 2011 on diabetes using stratified covariate balancing.  Data► Instruction for SCB► New SCB Code► Netica► Answer► Sean's Teach One►

The following shows one possible model and not necessarily the model you will construct with your data.  This model was organized without race and education levels higher than 1.
Network Model of food access and diabetes

Question 8: Inside an electronic health record, there are data on outcomes of a particular intervention.  Using the network drawn below, write the equations that would allow you to estimate what would happen if the intervention was not given.  First, write an equation for each node in the network based on variables that precede it.  For example, the regression equation for predicting whether there is an adverse event is given by the equation:

Outcome = a + b Treatment + c Severity

Second, set the variables that change across these equations to the relevant values.  For example, set Treatment to be zero.  Velosky's Teach One►
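
A minimal sketch of the second step is shown below for the outcome equation only. The coefficients a, b, and c and the four patient records are hypothetical. The sketch also assumes that Severity is not affected by Treatment, so it keeps its recorded value when Treatment is set to zero; in a full answer, every equation downstream of Treatment would be re-evaluated in the same way.

```python
from statistics import mean

# Hypothetical coefficients for Outcome = a + b Treatment + c Severity.
A, B, C = 0.10, -0.05, 0.30


def predict_outcome(treatment, severity):
    """Evaluate the outcome equation for one patient."""
    return A + B * treatment + C * severity


# Hypothetical patient records (illustration only).
patients = [
    {"Treatment": 1, "Severity": 1},
    {"Treatment": 1, "Severity": 0},
    {"Treatment": 0, "Severity": 1},
    {"Treatment": 1, "Severity": 1},
]

# Predicted outcome with treatment as recorded.
observed = mean(predict_outcome(p["Treatment"], p["Severity"]) for p in patients)
# Predicted outcome had no one received the intervention (Treatment set to zero).
counterfactual = mean(predict_outcome(0, p["Severity"]) for p in patients)
effect = observed - counterfactual

print(f"as treated:        {observed:.4f}")
print(f"no intervention:   {counterfactual:.4f}")
print(f"difference:        {effect:.4f}")
```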

Question 9: The following graph was used to simulate data on bundling payment for total hip fracture treatment:
Bundled payments for total hip fracture

Recover the original network using LASSO regression and calculate the causal impact of H on BP using Netica.  Data► Joanne Min's Teach One► Code►

Recovered Simulated Network



For additional information (not part of the required reading), please see the following links:

  1. Introduction to causal inference Read 1► Read 2► Video► Slides►
  2. Meta analysis through Bayesian networks Read►
  3. Introduction to Bayesian networks Read►
  4. Learning Bayesian Networks Read►
  5. Selection of Judea Pearl's articles PubMed►
  6. Applications of Bayesian networks in healthcare PubMed►
  7. Use of graphs in removing confounding Read►
  8. Learning Bayesian networks from correlated data Read►
  9. Bayesian networks in neuroscience Read►
  10. Cost analysis using Bayesian networks Read►
  11. Comparison of Bayesian network and logistic models Read►
  12. Bayesian network classifiers Read►
  13. Introduction to Markov process Tim's Lecture►
  14. Explanation of predictions Aloudah's Lecture►
  15. Outcome based prescribing for citalopram Slides►

This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD Home► Email►