 # Lecture: Causal Networks

2. Introduction to Causal Networks Read►
3. Learning network models
4. More

## Assignment

For this assignment you can use any statistical package, including R, SAS, and SPSS. Your instructor is familiar with Netica and BayesiaLab. R packages are also often used. OpenBUGS (a Gibbs sampler), Stan, OpenMarkov, and Direct Graphical Models are open-source software. Netica is free for networks with fewer than 15 nodes. A more complete list is available in Wikipedia under "Bayesian Networks." OpenBUGS► Stan► Direct Graphical Models► OpenMarkov► Graphical Models Toolkit► PyMC► Genie Smile► SamIam► Bayes Server► AIspace► BayesiaLab► Hugin► AgenaRisk► dVelox► System Modeler► UnBBayes► Uninet► Tetrad► Dezide► Netica►

Work on this assignment can be done in groups of two students, but you cannot work with a student you have previously teamed up with.

1. Draw networks based on the following independence assumptions. When directed networks are possible, give formulas for predicting the last variable in the network from marginal and pair-wise conditional probabilities. Keep in mind that the absence of an independence assumption implies dependence. Review►

| Nodes in Network | Assumption |
|---|---|
| X, Y, Z | I(X,Y) |
| X, Y, Z | I(X,Y), Not I(X,Y\|Z) |
| X, Y, Z | I(X,Y), I(X,Y\|Z), Y measured last |
| X, Y, Z, W | I(X,Y), I(X,Y\|Z), I({X,Y},W\|Z), W measured last |
| X, Y, Z, W | I(X,Y), I(Z,W), and X measured before Z and Y measured before W |
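
As an illustration of how an independence statement can be checked numerically, the following minimal sketch builds the joint distribution of a collider X -> Z <- Y and tests I(X,Y) and I(X,Y|Z). The structure and every probability in it are hypothetical values chosen for the example; they are not taken from the assignment and are not a solution to any row of the table.

```python
from itertools import product

# Hypothetical numbers for a collider X -> Z <- Y (assumed for illustration).
p_x = {0: 0.7, 1: 0.3}                      # P(X)
p_y = {0: 0.6, 1: 0.4}                      # P(Y)
p_z1 = {(0, 0): 0.1, (0, 1): 0.5,           # P(Z=1 | X, Y)
        (1, 0): 0.5, (1, 1): 0.9}

# Joint distribution from the factorization P(X) P(Y) P(Z | X, Y).
joint = {}
for x, y, z in product((0, 1), repeat=3):
    pz = p_z1[(x, y)] if z == 1 else 1 - p_z1[(x, y)]
    joint[(x, y, z)] = p_x[x] * p_y[y] * pz

NAMES = ("x", "y", "z")


def prob(**fixed):
    """Sum the joint over all cells that match the fixed variable values."""
    return sum(
        p for cell, p in joint.items()
        if all(cell[NAMES.index(name)] == value for name, value in fixed.items())
    )


# I(X,Y): does P(X=1, Y=1) equal P(X=1) P(Y=1)?
marginally_independent = abs(prob(x=1, y=1) - prob(x=1) * prob(y=1)) < 1e-12

# I(X,Y|Z): does P(X=1, Y=1 | Z=1) equal P(X=1 | Z=1) P(Y=1 | Z=1)?
pz1 = prob(z=1)
lhs = prob(x=1, y=1, z=1) / pz1
rhs = (prob(x=1, z=1) / pz1) * (prob(y=1, z=1) / pz1)
conditionally_independent = abs(lhs - rhs) < 1e-12

print("I(X,Y) holds:", marginally_independent)
print("I(X,Y|Z) holds:", conditionally_independent)
```

With these assumed numbers, X and Y are independent marginally but become dependent once Z is observed, which is the pattern a collider produces.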

Wang's Teach One Video►

2. Construct a Bayesian probability network model that predicts success with antidepressants. The network model should include variables, and mediators of the effect of those variables, on response to the antidepressant. Include all baseline diagnoses and gender as covariates. Include previous antidepressants as covariates. To understand the sequence of trying antidepressants, examine the following figure. Calculate remission if the patient was on the antidepressant. Remission or relapse should be considered an end node. Gender is a root node. All other variables, e.g. diagnoses, could be either root or intermediary nodes, but all occur prior to use of the antidepressant. The antidepressants that were given prior to an antidepressant should be used as covariates. The data have been modified to report per-person data, without visit-based information. Data►
(a) Identify the parents in the Markov blanket of citalopram using the bnlearn package in R and a constraint-based algorithm such as Grow-Shrink. All baseline diseases occur prior to treatment with citalopram.
(b) Identify the parents in the Markov blanket of citalopram using regression analysis. Amr's Regression Code►
(c) Identify the impact of citalopram on response using stratified covariate balancing.
(d) Using the network model, predict the response to citalopram for a patient with a neurological disorder and PTSD.
(e) Using SQL, predict the response to citalopram for a patient with PTSD and a neurological disorder, based on the nearest strata. SQL for Similar Strata►
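
The idea behind part (c) can be sketched as a simplified stratified comparison. This is not necessarily the course's stratified covariate balancing code: the records, the choice of PTSD and neurological disorder as the only covariates, and the weighting by number of treated patients are all assumptions made for this example. Within each stratum that contains both treated and untreated patients, the sketch compares remission rates and then averages the differences.

```python
from collections import defaultdict

# Hypothetical per-person records: (ptsd, neuro, citalopram, remission).
records = [
    (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 0, 0),
    (0, 1, 1, 1), (0, 1, 0, 1), (0, 1, 0, 0),
    (0, 0, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1), (0, 0, 0, 0),
    (1, 0, 1, 0),  # this stratum has no untreated patient, so it is skipped
]


def stratified_effect(rows):
    """Weighted average of within-stratum differences in remission rate.

    Strata are combinations of the covariates. Each stratum is weighted by
    its number of treated patients (an assumed weighting for this sketch).
    """
    strata = defaultdict(lambda: {1: [], 0: []})
    for ptsd, neuro, treated, remission in rows:
        strata[(ptsd, neuro)][treated].append(remission)

    weighted_sum = 0.0
    total_weight = 0
    for arms in strata.values():
        if not arms[1] or not arms[0]:
            continue  # no comparison is possible without both arms
        rate_treated = sum(arms[1]) / len(arms[1])
        rate_untreated = sum(arms[0]) / len(arms[0])
        weight = len(arms[1])
        weighted_sum += weight * (rate_treated - rate_untreated)
        total_weight += weight

    if total_weight == 0:
        raise ValueError("no stratum contains both treated and untreated patients")
    return weighted_sum / total_weight


effect = stratified_effect(records)
print(f"Estimated difference in remission rate: {effect:.2f}")
```

Comparing patients only within the same stratum removes the influence of the stratifying covariates on the estimated effect, at the cost of discarding strata that lack one of the two arms.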

3. Write SQL code to calculate the probability of a negative outcome when the patient is severely ill and has not signed a "Do Not Resuscitate" (DNR) order. Note that probabilities for events that are mutually exclusive and exhaustive should add up to one. Data► Bushra's Teach One► Anto's Teach One► Slides► SQL►

4. Redo problem 3 in Netica or other software and verify the accuracy of your answer. To accomplish this, organize the 4-node network inside Netica and direct the links between the nodes as in the graph structure. Then, for every node, enter the table of probabilities given in Question 3. For example, for the DNR node, enter the two probabilities of 0.1 and 0.9 into the table within the node for DNR. Once the entire network (the graph and the related probabilities) has been entered into Netica, evaluate the risks for a patient who is severely ill and has not signed a "Do Not Resuscitate" order. Netica► Shruti's Teach One► Usman's Teach One►
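
The normalization step behind the SQL calculation, where probabilities of mutually exclusive and exhaustive outcomes must add up to one, can be sketched with SQLite run from Python. The table name `cases`, its columns, and all counts are hypothetical and do not come from the assignment data. The sketch also works from counts rather than from the probability tables of Question 3, so it shows only the conditioning and normalizing pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical counts of patients by severity, DNR status, and outcome.
conn.executescript("""
CREATE TABLE cases (severe INTEGER, dnr INTEGER, outcome TEXT, n INTEGER);
INSERT INTO cases VALUES
  (1, 0, 'negative', 30), (1, 0, 'positive', 70),
  (1, 1, 'negative', 45), (1, 1, 'positive', 5),
  (0, 0, 'negative', 10), (0, 0, 'positive', 190),
  (0, 1, 'negative', 2),  (0, 1, 'positive', 8);
""")

# Condition on severely ill patients without a DNR order, then divide each
# outcome count by the total in that group so the probabilities sum to one.
query = """
SELECT outcome,
       1.0 * SUM(n) / (SELECT SUM(n) FROM cases
                       WHERE severe = 1 AND dnr = 0) AS prob
FROM cases
WHERE severe = 1 AND dnr = 0
GROUP BY outcome
ORDER BY outcome;
"""
probs = dict(conn.execute(query).fetchall())
conn.close()

print(probs)
```

Dividing by the total of the conditioned group, rather than by the total of the whole table, is what makes the result a conditional probability.
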
5. The attached data show the percentage of diabetes in 2,228 counties in the United States in 2010, 2011, and 2012. We want to understand whether access to food stores affects diabetes. Create the network model using data from 7 LASSO regressions. The first regression will be diabetes in 2012 on all 2011 variables. Each of the six other LASSO regressions will regress one of the statistically significant variables from the first regression on all 2010 variables. Draw the network model using Netica. Stratify the parents in the Markov blanket of diabetes in 2012; calculate the impact of access to quality food stores in 2011 on diabetes using stratified covariate balancing. Data► Instruction for SCB► New SCB Code► Netica► Answer►

6. Inside an electronic health record, there are data on outcomes of a particular intervention. Using the network drawn below, write the equations that would allow you to estimate what would happen if the intervention was not given. First, write an equation for each node in the network based on the variables that precede it. For example, the regression equation for predicting whether there is an adverse event is:

Outcome = a + b Treatment + c Severity

Second, set the variables that change across these equations to the relevant values. For example, set Treatment to zero.

7. The following graph was used to simulate data on bundled payment for total hip fracture treatment:

• Recover the original network and calculate the causal impact of H on BP. Data► R Code► Detail R Code►
• If you were using logistic or ordinary regression equations, write the set of equations represented by the above network. In each instance, write all the variables that are in the regression equation and mark the variables that have a statistically significant relationship with the response variable. For example, LTH is regressed on all variables that precede it, which are DME, CL, P, and H. But only P and H have a statistically significant relationship with LTH. This regression can be shown as:

LTH = a + b DME + c CL + d P* + e H*

In the above equation, the statistically significant relationships are shown with a star (*). A missing star indicates an insignificant relation. Using the data, estimate the parameters of each regression. Can this set of equations be used to create the network? In how many ways do the regression equations differ from the network model in the graph?
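
The link between starred terms and network edges can be sketched as follows. Only the LTH regression comes from the example above. The dictionary layout and the helper names `format_equation` and `edges_from_regressions` are hypothetical, and the sketch assumes that each starred predictor becomes a parent of the response.

```python
from string import ascii_lowercase

# Each response maps to its predictors in temporal order, flagged by whether
# the relationship is statistically significant. The LTH row follows the text.
regressions = {
    "LTH": [("DME", False), ("CL", False), ("P", True), ("H", True)],
}


def format_equation(response, terms):
    """Write a regression in the starred notation used above."""
    parts = ["a"]  # intercept
    for letter, (predictor, significant) in zip(ascii_lowercase[1:], terms):
        star = "*" if significant else ""
        parts.append(f"{letter} {predictor}{star}")
    return f"{response} = " + " + ".join(parts)


def edges_from_regressions(regs):
    """Return (parent, child) pairs, one for each starred predictor."""
    return [
        (predictor, response)
        for response, terms in regs.items()
        for predictor, significant in terms
        if significant
    ]


for response, terms in regressions.items():
    print(format_equation(response, terms))
print(edges_from_regressions(regressions))
```

Adding the remaining regressions to the dictionary would give a full candidate edge list that can be compared with the graph.
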
8. Does poor access to quality food stores affect diabetes? The following data show the rate of diabetes in several geographic areas with different levels of access. The unit of analysis is the geographic area. For each area, the data report the percentage of the population in which each variable is present. The data were collected over 5 years and include the following variables: county_fips, state, id, age, race, gender, income, year, marital_status, cellphone, insured, edu, cty_year, adults_in_hh, smoke, diabetes, bmi, unemployment, active_commuting, food_stores, restaurants, complete_case, z_ue, z_ac, z_food, z_rest, z_ue_mean, z_ac_mean, z_food_mean, z_rest_mean, z_ue_diff, z_ac_diff, z_food_diff, z_rest_diff, and region. Fit a network model to the data, where the rate of diabetes in the 5th year is the dependent variable of interest. All other variables are measured in the 4th year. Data►

A number of county-level analyses have been published, including: More on BMI► More on depression► More on mortality► More on food outlets► More on pollution►

## More

2. Meta analysis through Bayesian networks Read►
3. Introduction to Bayesian networks Read►