## Lecture: Diabetes
## Assigned Reading
- Impact of neighborhood variables on Type 2 diabetes
## Assignment
Resources for Question 1:

Before you do this regression, delete the entire row of data for any case where the diabetes variable is missing. Missing independent variables should be set to 1 or imputed from the data. Prepare a logistic LASSO regression. If using R, set the hyperparameter to 1se. If using Python, set the hyperparameter so that about 10 variables remain in the equation. List the variables that are parents in the Markov blanket of diabetes.

In indirect regressions, a missing independent variable should be replaced with 1 or imputed. If the response variable is missing, then the entire row of data should be deleted. For each indirect regression (i.e., a regression where the response variable is not diabetes), start from the original data and drop the rows where the response variable is missing. For example, when predicting "nervous system" or "circulatory system," adjust for missing values by starting from the original data each time, so that rows missing one of these responses are not dropped as if they were missing both. For indirect regressions, use the temporal analysis to select independent variables that precede the response variable. Thus, if we are predicting "external causes of injury," use only those independent variables that precede it.

Use the regressions to create the structure of the data. Remove cycles. Use the regressions to generate the joint distribution of the data. Create a visual model of the data using Netica (if you have more than 15 variables and do not have a license for Netica, you can take an image of the structure before saving it). Provide the image as the report of the structure. Generate the joint distribution of the direct predictors of each node using the regression equation and report these in Excel tables. You can also report these as part of the Netica table structures. Describe whether social determinants of illness are direct or indirect causes of diabetes.

Resources for Question 2:
- Simulated de-identified data for model building Download►
- Simulated de-identified data for validation Download►
- Network model of factors that affect diabetes Web►
- Data on order of occurrence of pairs of systems Download►
- Jeanne Peck's Teach One Python►
- Python code for LASSO logistic regression with interaction terms and Pseudo R-squared calculations ChatGPT►
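The data preparation described in the assignment for indirect regressions can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's own code: the three rows, the temporal order in `order`, and the helper names `prepare_for_response` and `remove_cycles` are all hypothetical, and the real order should come from the order-of-occurrence data listed above. Dropping edges that point backward in time is one simple way to remove cycles, assumed here for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical temporal order: earlier variables precede later ones.
order = ["nervous system", "circulatory system", "diabetes"]

# Hypothetical original data; None marks a missing value.
original_rows = [
    {"nervous system": 1, "circulatory system": None, "diabetes": 1},
    {"nervous system": None, "circulatory system": 0, "diabetes": 0},
    {"nervous system": 0, "circulatory system": 1, "diabetes": None},
]


def prepare_for_response(
    rows: List[Dict[str, Optional[int]]], response: str, order: List[str]
) -> Tuple[List[str], List[Dict[str, int]]]:
    """Build the data set for one regression, always starting from the original rows."""
    # Only variables that precede the response are candidate predictors.
    predictors = order[: order.index(response)]
    prepared = []
    for row in rows:
        # A missing response removes the row for this regression only.
        if row.get(response) is None:
            continue
        clean = {response: row[response]}
        for name in predictors:
            value = row.get(name)
            # A missing independent variable is replaced with 1, as the assignment states.
            clean[name] = 1 if value is None else value
        prepared.append(clean)
    return predictors, prepared


def remove_cycles(
    edges: List[Tuple[str, str]], order: List[str]
) -> List[Tuple[str, str]]:
    """Keep only edges that point forward in the temporal order, so no cycle remains."""
    rank = {name: position for position, name in enumerate(order)}
    return [(cause, effect) for cause, effect in edges if rank[cause] < rank[effect]]


# Each regression gets its own copy of the data, built from original_rows.
circ_predictors, circ_rows = prepare_for_response(original_rows, "circulatory system", order)
diab_predictors, diab_rows = prepare_for_response(original_rows, "diabetes", order)

# Hypothetical edges (predictor, response) as they might come out of the regressions.
edges = [
    ("nervous system", "diabetes"),
    ("diabetes", "nervous system"),
    ("circulatory system", "diabetes"),
]
acyclic_edges = remove_cycles(edges, order)
```

Because each call starts from `original_rows`, the row dropped for a missing "circulatory system" value is still available when diabetes is the response. The prepared rows would then be passed to whichever LASSO logistic regression routine you use.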
## More
For additional information (not part of the required reading), please see the following links:
- Pearl's direct and indirect effects Read► Web Appendix►
- Saeed's lecture Video►
- Mediation analysis allowing for exposure-mediator interactions Read►
- Mediation analysis through stable weights Read►
- Practical guide to mediation analysis through inverse odds ratio Read► Slides►
- Mediation analysis revisited Read►
This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD