Question 1: The attached data show the percentage of people with diabetes in 2,228
counties within the United States in 2010, 2011, and 2012. We want to
understand whether access to food stores affects diabetes. Create a network model
using data from repeated LASSO regressions. The
first regression will be diabetes in 2012 on all 2011 variables.
Each subsequent LASSO regression will take as its response (dependent) variable
one of the statistically significant variables from the previous regression
and regress it on all 2010 variables. Draw the network model using Netica.
Identify the parents in the Markov blanket of diabetes in 2012; calculate the impact of access
to quality food stores in 2011 on diabetes using stratified covariate balancing (SCB).
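The repeated-regression procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's SCB or Netica workflow. It makes several assumptions: the variable names and the synthetic data are hypothetical, a nonzero LASSO coefficient stands in for "statistically significant", the penalty `lam` is fixed by hand rather than tuned, and the small coordinate-descent solver replaces whatever LASSO software you actually use.

```python
import random


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5 or 1.0
    return [(v - mean) / sd for v in values]


def lasso_select(columns, y, lam, sweeps=100):
    """Coordinate-descent LASSO on standardized data.

    Minimizes (1/2n)*||y - Xb||^2 + lam*||b||_1 and returns the
    predictors whose coefficients are nonzero.
    """
    n = len(y)
    X = {name: standardize(col) for name, col in columns.items()}
    resid = standardize(y)  # residual while every coefficient is still 0
    beta = dict.fromkeys(X, 0.0)
    for _ in range(sweeps):
        for name, x in X.items():
            # Covariance of x with the residual, adding back x's own term.
            rho = sum(xi * ri for xi, ri in zip(x, resid)) / n + beta[name]
            # Soft-thresholding: shrink toward zero by lam.
            new = max(abs(rho) - lam, 0.0) * (1 if rho > 0 else -1)
            step = new - beta[name]
            if step:
                resid = [ri - step * xi for ri, xi in zip(resid, x)]
                beta[name] = new
    return {name: b for name, b in beta.items() if b != 0.0}


def build_network(data, response, layers, lam):
    """Return (parent, child) edges from repeated LASSO regressions.

    layers lists predictor names per year, most recent year first.
    Variables selected in one regression become responses in the next.
    """
    edges, targets = [], [response]
    for layer in layers:
        next_targets = []
        for target in targets:
            predictors = {name: data[name] for name in layer}
            for parent in lasso_select(predictors, data[target], lam):
                edges.append((parent, target))
                if parent not in next_targets:
                    next_targets.append(parent)
        targets = next_targets
    return edges


def make_demo_data(n=500, seed=0):
    """Synthetic counties; every variable name here is hypothetical."""
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n)]

    d = {"income_2010": noise(), "stores_2010": noise(),
         "noise_2010": noise(), "noise_2011": noise()}
    e1, e2, e3 = noise(), noise(), noise()
    d["stores_2011"] = [0.9 * s + 0.3 * e for s, e in zip(d["stores_2010"], e1)]
    d["obesity_2011"] = [-0.8 * i + 0.4 * e for i, e in zip(d["income_2010"], e2)]
    d["diabetes_2012"] = [0.7 * o - 0.5 * s + 0.3 * e
                          for o, s, e in zip(d["obesity_2011"], d["stores_2011"], e3)]
    return d


data = make_demo_data()
layers = [
    ["stores_2011", "obesity_2011", "noise_2011"],  # 2011 predictors
    ["stores_2010", "income_2010", "noise_2010"],   # 2010 predictors
]
edges = build_network(data, "diabetes_2012", layers, lam=0.2)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

Each printed edge corresponds to one arrow you would draw in Netica. With real data, the penalty would normally be chosen by cross-validation rather than fixed at 0.2.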
The following shows one possible model and not necessarily the model
you will construct with your data. This model was organized without
race and education levels higher than 1.
Question 2: What are the causes of diabetes? Using LASSO
regression, construct a causal network that explains variation in the
incidence of diabetes. In the attached data, the dependent variable
is incidence of diabetes. This variable is calculated after all
other variables. There are 21 independent variables that may cause
diabetes, as listed below:
These variables were constructed so that each one lists, for each patient, the likelihood ratio associated with that patient's worst diagnosis within the variable. Construct a causal network and describe whether social determinants of illness are direct or indirect causes of diabetes. Construct your models using the training data set and cross-validate using the validation data set.
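Once the network's edges are known, separating direct from indirect causes is a graph traversal. The sketch below assumes the network is stored as a list of (parent, child) pairs. The function name `classify_causes` and all variable names in the example are hypothetical and are not taken from the attached data. A variable with an arrow into the outcome is counted as direct, even if it also reaches the outcome through other paths.

```python
def classify_causes(edges, outcome):
    """Split the ancestors of `outcome` into direct and indirect causes.

    edges is a list of (parent, child) pairs from the causal network.
    """
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)

    direct = set(parents.get(outcome, ()))

    # Walk backwards from the direct causes to collect every ancestor.
    ancestors, stack = set(), list(direct)
    while stack:
        node = stack.pop()
        if node in ancestors:
            continue
        ancestors.add(node)
        stack.extend(parents.get(node, ()))

    indirect = ancestors - direct
    return direct, indirect


# Hypothetical edges, for illustration only.
example_edges = [
    ("obesity", "diabetes"),
    ("hypertension", "diabetes"),
    ("homelessness", "obesity"),
    ("unemployment", "homelessness"),
]
direct, indirect = classify_causes(example_edges, "diabetes")
print("direct:", sorted(direct))
print("indirect:", sorted(indirect))
```

In this made-up example the two social determinants reach diabetes only through obesity, so they would be reported as indirect causes.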
Resources for Question 2
For additional information (not part of the required reading), please see the following links: