Causal Networks with Regression

# Lecture: Regression Networks

1. Using LASSO regression: "Constructing causal networks through regression: A tutorial" (PubMed)
2. Using Poisson regression in networks: Read Chapter 13 in Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020
3. Set the sequence of variables
• Python and R code for LASSO regressions, temporal analysis, R-squared calculations, and other topics (zip)
4. Learn the network structure through LASSO regression
5. Estimate parameters of the network model
• Learning the joint distribution of variables from regressions (slides)
6. Graphical LASSO
7. Vang's Python code for constructing a Bayesian network using LASSO regressions
8. Learning networks through regressions

## Assignment

Include a summary on the first page. In the summary, write statements comparing your work to the answers given or to the videos. For example: "I got the same answers as the Teach One video for question 1."

Question 1: This lecture shows how regressions can be used to identify a network structure. The basic idea is to regress a response variable (one of the nodes in the network) on all variables that precede it. The independent variables are therefore all of the preceding variables. The statistically significant variables in each regression are the parents of the response variable in its Markov blanket; for LASSO regression, these are the variables with non-zero coefficients. The collection of parents across all regressions identifies the entire network structure. Write out the set of equations that can identify the following network:

In each instance, write all the variables in the regression equation: the response (dependent) variable and the independent variables. Mark with * the independent variables that have a statistically significant (non-zero) relationship with the response variable. For example, LTH is regressed on all variables that precede it, which are DME, CL, P, and H. Only P and H have a statistically significant relationship with LTH. This regression can be written as: LTH = a + b DME + c CL + d P* + e H*
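The procedure in Question 1 can be sketched in a few lines. Walk through the variables in their temporal order, regress each one on everything before it with LASSO, and keep the variables with non-zero coefficients as its parents. This is a minimal, self-contained sketch, not the course's Python or R code. It uses a hand-written coordinate-descent LASSO with a fixed, hypothetical penalty `lam = 0.1` instead of a cross-validated lambda. The eight-row data set is made up so that, as in the example above, only P and H feed into LTH.

```python
# Toy, hypothetical data: DME, CL and P are mutually uncorrelated +/-1 patterns, and
# H is their element-wise product, which is still uncorrelated with each of them.
# LTH is built from P and H only, so the expected parents are easy to check by hand.
DME = [1, 1, 1, 1, -1, -1, -1, -1]
CL = [1, 1, -1, -1, 1, 1, -1, -1]
P = [1, -1, 1, -1, 1, -1, 1, -1]
H = [a * b * c for a, b, c in zip(DME, CL, P)]
LTH = [1.5 * p + h for p, h in zip(P, H)]
data = {"DME": DME, "CL": CL, "P": P, "H": H, "LTH": LTH}
order = ["DME", "CL", "P", "H", "LTH"]  # temporal sequence of the variables


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(columns, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 on centered data."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual while every coefficient is still 0
    xs = []
    for col in columns:
        m = sum(col) / n
        xs.append([v - m for v in col])
    beta = [0.0] * len(xs)
    for _ in range(n_sweeps):
        for j, x in enumerate(xs):
            scale = sum(v * v for v in x) / n
            if scale == 0:
                continue  # a constant column carries no information
            # Correlation of x_j with the partial residual (x_j's own term added back).
            rho = sum(xi * (ri + xi * beta[j]) for xi, ri in zip(x, resid)) / n
            new = soft_threshold(rho, lam) / scale
            delta = new - beta[j]
            if delta:
                resid = [ri - xi * delta for xi, ri in zip(x, resid)]
                beta[j] = new
    return beta


def learn_parents(data, order, lam=0.1):
    """Regress each node on all preceding nodes; non-zero coefficients mark parents."""
    parents = {}
    for k, node in enumerate(order):
        preds = order[:k]
        if not preds:
            parents[node] = []  # the first variable has no possible parents
            continue
        beta = lasso([data[p] for p in preds], data[node], lam)
        parents[node] = [p for p, b in zip(preds, beta) if b != 0.0]
    return parents


print(learn_parents(data, order))
```

In real data the penalty would be chosen by cross-validation, and the non-zero pattern would not be this clean. The loop structure (one regression per node, predictors restricted to predecessors) is the part that carries over.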

Resources for Question 1:

Question 2: Construct a Bayesian probability network model that predicts success with antidepressants. The network model should include the variables that affect response to the antidepressant, as well as the mediators of their effects. Include all baseline diagnoses and gender as covariates. Assume that gender occurs before baseline diagnoses and that baseline diagnoses occur before any treatment. Assume that antidepressant treatments occur before the report of remission and in the following order:

Remission should be considered an end node. Gender is a root node. All other variables, e.g. diagnoses, could be either root or intermediary nodes, but all occur prior to the use of antidepressants. Antidepressants given before a particular antidepressant should be used as covariates for it. The data have been modified to report per-person data rather than weekly, visit-based data.
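These ordering assumptions determine which variables enter each regression. The following is a minimal sketch. It assumes the ordering can be written as tiers, where variables in the same tier (such as the baseline diagnoses) are not ordered relative to one another. The names `dx_neuro` and `dx_ptsd` are hypothetical placeholders, not the actual column names in the data, and the antidepressant order given in the assignment is not reproduced here.

```python
def predecessors(tiers, target):
    """Return every variable in the tiers strictly before the tier that holds target."""
    preceding = []
    for tier in tiers:
        if target in tier:
            return preceding
        preceding.extend(tier)
    raise KeyError(f"{target!r} is not in any tier")


# Hypothetical tiers: gender, then baseline diagnoses (unordered among themselves),
# then citalopram, then the end node remission.
tiers = [
    ["gender"],
    ["dx_neuro", "dx_ptsd"],
    ["citalopram"],
    ["remission"],
]

for node in ["citalopram", "remission"]:
    print(node, "<-", predecessors(tiers, node))
```

Each returned list is the set of independent variables for that node's LASSO regression. For citalopram it is gender plus the baseline diagnoses. For remission, citalopram is added.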

1. Identify the parents in the Markov blanket of citalopram using LASSO regression. The response variable is citalopram (not CIT). The independent variables are all variables that occur prior to citalopram: baseline diagnoses and gender. Evaluate the model at lambda.1se, the largest lambda whose cross-validated error is within one standard error of the minimum.
2. Identify the parents in the Markov blanket of remission using LASSO regression. The response variable is remission. The independent variables are all variables that occur prior to remission: gender, baseline diagnoses, and citalopram. Evaluate the model at lambda.1se, not lambda.min.
3. Download the Netica software. It is free for use with networks of fewer than 15 nodes. Make a node for each variable in the two regressions from steps 1 and 2. Each node should have exactly the same name as the variable in the data; capitalization and spelling matter. To make a variable display better, you can add a description that restores capitalization or replaces dashes with spaces. Nodes should have the same levels as the variable in the data; in most cases, these levels are 0 and 1. You can enter a descriptive label to accompany each numerical level. Using Netica, draw a link from each independent variable in the two regressions to the corresponding response variable.
4. Using Netica, estimate the parameters of the network you have created by fitting the data to the model. Use Cases, Learn, and Incorporate Case File. Once all cases have been incorporated, click the thunderbolt icon to compile the model. This sets the probability tables within all nodes, so the software has estimated the parameters of your model. The following image shows how to incorporate the case file into Netica. Keep in mind that a node with a 50% chance of being present or absent likely indicates a variable whose name did not match the name in the data file.
5. Predict the effect of citalopram on remission for patients who have neurological disease and PTSD. In Netica, select patients who have these two conditions, then compare the probability of remission for patients treated and not treated with citalopram.
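Steps 1 and 2 rely on the one-standard-error rule. In R's glmnet, `cv.glmnet` reports `lambda.min`, the lambda with the lowest mean cross-validated error. It also reports `lambda.1se`, the largest lambda whose error is within one standard error of that minimum, and `coef(fit, s = "lambda.1se")` returns the coefficients at that value. The rule itself is small enough to sketch directly. The function name `select_lambdas` is hypothetical, and the cross-validation numbers below are made up.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambda_min: the lambda with the smallest mean CV error.
    lambda_1se: the largest lambda whose mean CV error is at most
                (minimum mean error + standard error at that minimum).
    """
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    i_min = min(range(len(cv_mean)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Made-up cross-validation output for five penalty values.
lambdas = [1.0, 0.5, 0.25, 0.1, 0.05]
cv_mean = [0.30, 0.22, 0.20, 0.19, 0.195]
cv_se = [0.02] * 5
print(select_lambdas(lambdas, cv_mean, cv_se))
```

A larger lambda shrinks more coefficients to exactly zero. As a result, lambda.1se usually gives a sparser, more conservative set of parents than lambda.min.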

Resources for Question 2:

Question 3: An electronic health record contains data on the outcomes of a particular intervention. Using the network drawn below, write the equations that would allow you to estimate what would have happened if the intervention had not been given. First, write an equation for each node in the network based on the variables that precede it, and mark the significant relations with an asterisk.

Resources for Question 3:

• The regression equation for predicting whether there is an adverse event is obtained by regressing Adverse Outcomes on all prior variables: Severity, DNR, Treatment, and Provider's decision. The resulting equation has 2 variables with a statistically significant (non-zero) relation to the outcome: Outcome = a + b Treatment* + c Severity* + d DNR + e Provider.
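Once every node has an equation, the counterfactual question can be answered by fixing the intervention and recomputing every node downstream of it. The following is a minimal sketch with several assumptions. The equations are linear (a linear probability model for the adverse outcome), and the coefficients are made up. Only the two significant terms of the outcome equation from the bullet above are kept, so the DNR and Provider terms are treated as zero. The helpers `simulate` and `average_effect` are hypothetical names.

```python
# Hypothetical structural equations: node -> (intercept, {parent: coefficient}).
equations = {
    "Outcome": (0.10, {"Treatment": 0.30, "Severity": 0.20}),
}
order = ["Severity", "DNR", "Treatment", "Outcome"]  # assumed temporal order


def simulate(row, equations, order, intervention):
    """Fix the intervened nodes, then recompute every node that depends on them."""
    values = dict(row)
    values.update(intervention)
    changed = set(intervention)
    for node in order:
        if node in intervention or node not in equations:
            continue
        intercept, coefs = equations[node]
        if any(parent in changed for parent in coefs):
            values[node] = intercept + sum(c * values[p] for p, c in coefs.items())
            changed.add(node)
    return values


def average_effect(rows, equations, order, treatment, outcome):
    """Mean of (predicted outcome with treatment = 1) minus (with treatment = 0)."""
    diffs = []
    for row in rows:
        treated = simulate(row, equations, order, {treatment: 1})[outcome]
        untreated = simulate(row, equations, order, {treatment: 0})[outcome]
        diffs.append(treated - untreated)
    return sum(diffs) / len(diffs)


# Made-up patient records.
patients = [
    {"Severity": 1, "DNR": 0, "Treatment": 1, "Outcome": 1},
    {"Severity": 0, "DNR": 1, "Treatment": 1, "Outcome": 0},
    {"Severity": 1, "DNR": 1, "Treatment": 0, "Outcome": 0},
]
print(average_effect(patients, equations, order, "Treatment", "Outcome"))
```

Treatment enters the outcome equation linearly with coefficient 0.30. Withholding the intervention therefore lowers each patient's predicted adverse-event probability by 0.30, up to floating-point rounding. If a mediator sat between Treatment and Outcome, its equation would be recomputed along the way.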

Question 4: The following graph was used to simulate data on bundled payments for total hip fracture treatment. Recover the original network from the data using LASSO regression, and calculate the causal impact of H on BP using Netica.
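The causal impact of H on BP differs from the plain conditional probability P(BP | H) whenever H has parents that also affect BP. A standard way to handle this is the back-door adjustment: P(BP = 1 | do(H = h)) = Σ_z P(BP = 1 | H = h, Z = z) P(Z = z), where Z ranges over the parents of H. The following is a minimal sketch with several assumptions:

- The parents of H are a single observed binary variable, the hypothetical `Z`.
- The case file is made up, and the Question 4 graph is not reproduced here.
- Probabilities are estimated by plain counting, so Netica's learned tables may differ slightly.

If H has no parents, the adjustment reduces to P(BP | H).

```python
from collections import Counter

# Hypothetical case file; Z stands in for whatever the parents of H turn out to be.
cases = (
    [{"Z": 1, "H": 1, "BP": 1}] * 3 + [{"Z": 1, "H": 1, "BP": 0}] * 1
    + [{"Z": 1, "H": 0, "BP": 1}] * 1 + [{"Z": 1, "H": 0, "BP": 0}] * 1
    + [{"Z": 0, "H": 1, "BP": 1}] * 1 + [{"Z": 0, "H": 1, "BP": 0}] * 1
    + [{"Z": 0, "H": 0, "BP": 0}] * 4
)


def cond_prob(cases, target, value, given):
    """P(target = value | given), estimated by counting matching cases."""
    matching = [c for c in cases if all(c[k] == v for k, v in given.items())]
    if not matching:
        raise ValueError(f"no cases match {given}")
    return sum(c[target] == value for c in matching) / len(matching)


def p_do(cases, treatment, level, outcome, parents):
    """Back-door adjustment: sum over parent configurations z of
    P(outcome = 1 | treatment = level, z) * P(z)."""
    n = len(cases)
    configs = Counter(tuple(c[p] for p in parents) for c in cases)
    total = 0.0
    for config, count in configs.items():
        given = dict(zip(parents, config))
        given[treatment] = level
        total += cond_prob(cases, outcome, 1, given) * count / n
    return total


adjusted = p_do(cases, "H", 1, "BP", ["Z"]) - p_do(cases, "H", 0, "BP", ["Z"])
naive = cond_prob(cases, "BP", 1, {"H": 1}) - cond_prob(cases, "BP", 1, {"H": 0})
print("adjusted:", adjusted, "naive:", naive)
```

With these made-up cases, the adjusted effect is 0.625 minus 0.25, or 0.375. The naive comparison gives about 0.5, because Z raises both the chance of H and the chance of BP.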

Resources for Question 4:

Question 5: The following data show, for each pair of diseases, the number of patients in whom one disease occurs before the other. For example, D occurs before N in 21 patients, and the reverse order occurs in 24 patients. No patient in our sample has both U and M. Considering the relationships between these two diseases and the rest of the diseases, which of the following statements are correct?

1. U occurs before M, when and if they both occur in the same patient
2. M occurs before U, when and if they both occur in the same patient
3. There is not sufficient information to know the temporal order of U and M
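| Row occurs before column | D | M | N | U | V |
|---|---|---|---|---|---|
| D | | 7 | 21 | 7 | 0 |
| M | 52 | | 34 | | 7 |
| N | 24 | 16 | | 7 | 30 |
| U | 38 | | 5 | | 14 |
| V | 45 | 27 | 3 | 52 | |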
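The counts read row-first: the entry in row D, column N (21) counts patients in whom D occurred before N, which matches the example in the question. One simple way to judge each pairwise order is a one-sided exact binomial (sign) test of whether one order is more common than chance would suggest. This test is an assumption, not necessarily the criterion used in the lecture. The sketch checks only direct evidence for a pair. It therefore returns `None` for U and M, whose order has to be judged from their relationships with D, N, and V.

```python
from math import comb

# Counts transcribed from the table: before[a][b] = patients in whom a occurred before b.
before = {
    "D": {"M": 7, "N": 21, "U": 7, "V": 0},
    "M": {"D": 52, "N": 34, "V": 7},
    "N": {"D": 24, "M": 16, "U": 7, "V": 30},
    "U": {"D": 38, "N": 5, "V": 14},
    "V": {"D": 45, "M": 27, "N": 3, "U": 52},
}


def order_p_value(a, b):
    """One-sided exact binomial p-value that a tends to precede b.

    Returns None when no patient has both diseases.
    """
    k = before[a].get(b, 0)          # patients with a before b
    n = k + before[b].get(a, 0)      # patients with both diseases, in either order
    if n == 0:
        return None
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


for a, b in [("M", "D"), ("D", "N"), ("U", "M")]:
    print(a, "before", b, order_p_value(a, b))
```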

Resources for Question 5:

Question 6: Draw a network that explains variation in Y from its direct and indirect causes. Draw the network based on the following 5 regressions. Statistically significant variables are indicated by a circle, and Y occurs after all other variables. No information is available on the temporal order of the Xs relative to one another. Make sure that you remove cycles. When an interaction of two variables is significant, treat it as if both variables are significant. Regressions with an R-squared below 5%, and coefficients of 0.05 or less, can be ignored.
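Turning the regressions into a graph is mechanical once the cut-offs are fixed. The following is a minimal sketch with several assumptions:

- Each regression is summarized as its response, its R-squared, and a coefficient plus significance flag per term.
- Interactions are written as `X1:X3`.
- The regression results below are made up.
- Comparing the absolute value of each coefficient with 0.05 is an assumed reading of the cut-off.

A depth-first search then reports whether the resulting edges contain a cycle that has to be removed.

```python
def edges_from_regressions(regressions, r2_min=0.05, coef_min=0.05):
    """Build parent -> response edges from significant terms, applying the cut-offs."""
    edges = set()
    for reg in regressions:
        if reg["r2"] < r2_min:
            continue  # ignore regressions with R-squared below 5%
        for term, (coef, significant) in reg["terms"].items():
            if not significant or abs(coef) <= coef_min:
                continue  # ignore non-significant or negligible coefficients
            for variable in term.split(":"):  # an interaction counts for both variables
                edges.add((variable, reg["response"]))
    return edges


def has_cycle(edges):
    """Depth-first search; reaching a node still on the current path means a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Made-up regression summaries: term -> (coefficient, statistically significant?).
regressions = [
    {"response": "Y", "r2": 0.30,
     "terms": {"X1": (0.40, True), "X2": (0.03, True), "X1:X3": (0.20, True), "X4": (0.50, False)}},
    {"response": "X1", "r2": 0.10, "terms": {"X2": (0.25, True)}},
    {"response": "X2", "r2": 0.12, "terms": {"X1": (0.30, True)}},
    {"response": "X3", "r2": 0.02, "terms": {"X4": (0.60, True)}},
]
edges = edges_from_regressions(regressions)
print(sorted(edges), has_cycle(edges))
```

In this made-up example, X1 and X2 each predict the other, which creates a cycle. Because the Xs have no known temporal order, one of the two links has to be dropped by judgment before the network is drawn.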

Resources for Question 6: