## Lecture: Regression Networks
## Assigned Reading

- Using LASSO regression: Constructing causal networks through regression: A tutorial PubMed►
- Using Poisson regression in networks: Read Chapter 13 in Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020
- Set the sequence of variables
- Python and R code for LASSO regressions, temporal analysis, R-squared calculations, and other topics Zip►
- Learn the network structure through LASSO regression
- Comparison of multiple regression and network models Slides► Video► YouTube►
- Guide to LASSO Read► Slides► YouTube►
- Combining several LASSO regressions into a single network
- Chain of regressions, removing cycles, network structure, estimating joint distributions Slides►
- Ghaida's Teach One YouTube► Slide Show►
- Python code for LASSO Logistic Regression with interaction terms ChatGPT►
- Learning network structure from regressions Slides►
- Estimate parameters of the network model
- Learning joint distribution of variables from regressions Slides►
- Graphical LASSO
- Vang's Python code for constructing Bayesian Network using LASSO regressions
- Code Python►
- Learning networks through regressions
## Assignment

Include a summary page as the first page. On the summary page, write statements comparing your work to the answers given or to the videos. For example, "I got the same answers as the Teach One video for question 1."
In each instance, write all the variables that are in the regression equation. These include the response (dependent) variable and the independent variables. Mark with * the independent variables that have a statistically significant (non-zero) relationship with the response variable. For example, LTH is regressed on all variables that precede it, which are DME, CL, P and H, but only P and H have a statistically significant relationship with LTH. This regression can be shown as: LTH = a + b DME + c CL + d P* + e H*

Resources for Question 1:
Remission should be considered an end node. Gender is a root node. All other variables, e.g. diagnoses, could be either root or intermediary nodes, but all occur prior to use of the antidepressant. Antidepressants that were given prior to a given antidepressant should be used as covariates. The data have been modified to report per-person data, without visit-based weekly data.
- Identify the parents in the Markov blanket of citalopram using LASSO regression. The response variable is citalopram (not CIT). The independent variables are all variables that occur prior to citalopram: baseline diagnoses and gender. Rely on the lambda value at 1 standard error (1se).
- Identify the parents in the Markov blanket of remission using LASSO regression. The response variable is remission. The independent variables are all variables that occur prior to remission: gender, baseline diseases, and citalopram. Evaluate the model at the 1se lambda, that is, lambda.1se and not lambda.min.
- Download Netica software. This software is free for use with networks of fewer than 15 nodes. Make a node for each variable in the two regressions you made in steps (a) and (b). Each node should have exactly the same name as the variable in the data. Capitalization matters. Spelling matters. To make the variable display better, you can add a description that corrects for the lack of capitalization or replaces dashes with spaces. Nodes should have the same levels as the variable in the data. In most cases, these levels are 0 and 1. You can enter a descriptive level to accompany the numerical level. Using Netica, draw a line from each independent variable in the two regressions to the response variable of that regression.
- Using Netica software, estimate the parameters of the network you have created by fitting the data to the model. Use Cases, Learn, and Incorporate Case File. Once all cases have been incorporated, use the thunderbolt sign to compile the model. This will set the tables within all nodes, so the software estimates the parameters for the model you have created. The following image shows how you can incorporate the case file into Netica. Keep in mind that a node showing a 50% chance of being present or absent likely indicates a variable whose name did not match the name in the data file.
- Predict the effect of citalopram on remission for patients who have neurological disease and PTSD. This means that in Netica you select patients who have these two conditions and then you compare the probability of remission for patients treated and not treated with citalopram.
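
As a cross-check on the last step, the same comparison can be sketched by plain counting over the case file. This is only a minimal illustration and not Netica's inference: the column names (`neurological`, `ptsd`, `citalopram`, `remission`) and the rows are made-up assumptions, not the course data, and the counts will match the network's answer only if the network conditions remission on these same variables.

```python
# Toy per-person cases. Column names and values are hypothetical,
# invented for illustration; they are NOT the course data set.
columns = ("neurological", "ptsd", "citalopram", "remission")
rows = [
    (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 0),
    (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 0, 0), (1, 1, 0, 0),
    (0, 1, 1, 1), (0, 0, 0, 0), (1, 0, 1, 0),
]
cases = [dict(zip(columns, row)) for row in rows]


def remission_rate(records, evidence):
    """Estimate P(remission = 1 | evidence) by counting matching cases."""
    matching = [
        r for r in records
        if all(r.get(name) == value for name, value in evidence.items())
    ]
    if not matching:
        return None  # no case has this combination of findings
    return sum(r["remission"] for r in matching) / len(matching)


# Select patients with both conditions, then compare treated vs. untreated.
findings = {"neurological": 1, "ptsd": 1}
treated = remission_rate(cases, {**findings, "citalopram": 1})
untreated = remission_rate(cases, {**findings, "citalopram": 0})

print(f"P(remission | citalopram)    = {treated:.2f}")
print(f"P(remission | no citalopram) = {untreated:.2f}")
print(f"Difference                   = {treated - untreated:.2f}")
```

A `None` result means no patients in the file match the selected findings, which is worth checking before trusting any probability the network reports for that combination.
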
Resources for Question 2:
- Data Download►
- Impact of Lambda on network structure Read►
- How to fit data to Netica Menu Options►
- Teach One for 2a Sankeerthi's Lasso Regression► Simple LASSO►
- Chris Miller's Teach One R Code►
- Solutions to Netica components Read►
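
The choice between lambda.1se and lambda.min in Question 2 can be sketched as follows. The names mirror the output of R's `cv.glmnet`; the function `pick_lambdas` and the cross-validation numbers are hypothetical, made up only to show the rule: lambda.min minimizes the mean cross-validated error, while lambda.1se is the largest lambda whose error is still within one standard error of that minimum.

```python
def pick_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambdas  -- candidate penalty values
    cv_mean  -- mean cross-validated error for each lambda
    cv_se    -- standard error of that mean for each lambda
    """
    # Index of the lambda with the smallest mean cross-validated error.
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # Errors at or below this threshold are within 1 SE of the minimum.
    threshold = cv_mean[best] + cv_se[best]
    within = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    # The largest such lambda gives the sparsest acceptable model.
    return lambdas[best], max(within)


# Hypothetical cross-validation results (illustrative numbers only).
lambdas = [0.001, 0.005, 0.01, 0.05, 0.1]
cv_mean = [0.520, 0.500, 0.505, 0.515, 0.560]
cv_se = [0.020, 0.018, 0.018, 0.019, 0.021]

lam_min, lam_1se = pick_lambdas(lambdas, cv_mean, cv_se)
print("lambda.min:", lam_min)
print("lambda.1se:", lam_1se)
```

Because lambda.1se is never smaller than lambda.min, it shrinks more coefficients to zero, which tends to leave fewer parents for each node in the network.
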
Resources for Question 3:
- The regression equation for predicting whether there is an adverse event is given by regressing Adverse Outcomes on all prior variables, which are Severity, DNR, Treatment and Provider's decision. The resulting equation will have 2 variables that have a statistically significant, non-zero relation to the outcome: Outcome = a + b Treatment* + c Severity* + d DNR + e Provider.
- Velosky's Teach One YouTube►
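
The starred-equation notation used on this page maps directly onto network arcs: each starred independent variable becomes a parent of the response. The sketch below is one possible way to read that notation; the helper `parents_from_equation` is hypothetical, and the two equations are simply the examples quoted on this page, treated as separate problems.

```python
def parents_from_equation(equation):
    """Split 'Y = a + b X1 + c X2*' into (response, starred predictors)."""
    response, rhs = (part.strip() for part in equation.split("=", 1))
    parents = []
    for term in rhs.split("+"):
        tokens = term.split()
        if not tokens:
            continue
        name = tokens[-1]  # last token is the variable (or the intercept letter)
        if name.endswith("*"):  # * marks a statistically significant predictor
            parents.append(name.rstrip("*"))
    return response, parents


equations = [
    "LTH = a + b DME + c CL + d P* + e H*",
    "Outcome = a + b Treatment* + c Severity* + d DNR + e Provider",
]

# One directed arc (parent, child) per starred variable.
edges = [
    (parent, child)
    for child, parents in map(parents_from_equation, equations)
    for parent in parents
]
for parent, child in edges:
    print(f"{parent} -> {child}")
```

Unstarred variables such as DME, CL, DNR and Provider stay in the regression equation but contribute no arc to the response node.
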
Resources for Question 4:
- Data Download►
- Joanne Min's Teach One YouTube► Code►
- Answer Network Image►
- Chandana Dasarraju Teach One Slides►
- U occurs before M, when and if they both occur in the same patient
- M occurs before U, when and if they both occur in the same patient
- There is not sufficient information to know the temporal order of U and M
Resources for Question 5:
Resources for Question 6:
- Chain of regressions, removing cycles, network structure, estimating joint distributions Slides►
- Ghaida's Teach One YouTube► Slide Show►
## More

For additional information (not part of the required reading), please see the following links:
- Learning Bayesian networks from correlated data Read►
- Comparison of Bayesian network and logistic models Read►
This page is part of the course on Causal Analysis by Farrokh Alemi, Ph.D.