Social Determinants of Diabetes

# Take Home Exam 2024

## Procedures

• Provide Jupyter output and code in PDF format
• Provide all answers (include images of the network) in a single Word document

## Assignment

Question 1: What are your name, email, and phone number?

Question 2: In the following data, examine the impact of several variables on BP. The variables occur in the following temporal order: DME, CL, P, H, LTH, PBD, RF, SNF, HHA, HOS, HO, OT, BP.

1. What are the parents in the Markov blanket of RF?
2. Regress each variable on its preceding variables. Use logistic LASSO regression if the response variable is binary, or ordinary LASSO regression if it is continuous. Display a summary table of these regressions showing R-squared, intercept, and regression coefficients.
3. Construct a network model using the regressions. Show each variable as a node. Draw arcs to each response variable from the preceding variables that have non-zero coefficients in its LASSO regression.
4. Estimate the probability of each node having a value above the average for that node. First set values above average to 1 and all other values to 0. Then calculate the probability of each node. Display the table for predicting OT from the joint distribution of its significant direct predictors. Display the probability of BP when we have no information about the occurrence of other variables.
5. Create a counterfactual regression. Report the R-squared, intercept, and coefficients for LTH regressed on all preceding variables except H.
6. Construct a counterfactual model where the impact of H is not mediated by LTH. Display the network and the predicted probability of BP.
7. What is the causal impact of H on BP? This analysis reports the causal impact as observed.
8. What is the causal impact of H on BP, using the counterfactual model, where the mediated impact of H through LTH has been removed?
9. What percent of the impact of H on BP is mediated through LTH?
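
The bookkeeping around these steps can be sketched in plain Python. This is a minimal illustration, not the required solution: the helper names (`preceding`, `binarize_above_mean`, `probability`, `arcs_from_coefficients`, `percent_mediated`) and all numeric values are hypothetical, and the percent-mediated formula, (observed impact minus counterfactual impact) divided by observed impact, is one common definition that the exam itself does not spell out.

```python
# Temporal order of the variables, as given in Question 2.
ORDER = ["DME", "CL", "P", "H", "LTH", "PBD", "RF",
         "SNF", "HHA", "HOS", "HO", "OT", "BP"]


def preceding(var, order=ORDER):
    """Candidate predictors of `var`: everything earlier in the order."""
    return order[: order.index(var)]


def binarize_above_mean(values):
    """Set values above the average to 1 and all other values to 0."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


def probability(binary_values):
    """Marginal probability that a binarized node equals 1."""
    return sum(binary_values) / len(binary_values)


def arcs_from_coefficients(coefs, tol=1e-8):
    """Turn {response: {predictor: coefficient}} into (predictor, response) arcs,
    keeping only predictors whose coefficient is non-zero."""
    return [(pred, resp)
            for resp, betas in coefs.items()
            for pred, beta in betas.items()
            if abs(beta) > tol]


def percent_mediated(observed_impact, counterfactual_impact):
    """Assumed definition: share of the observed impact that disappears
    when the mediated path is removed."""
    return 100.0 * (observed_impact - counterfactual_impact) / observed_impact


# Predictors for the ordinary and the counterfactual regression of LTH.
lth_predictors = preceding("LTH")
lth_counterfactual_predictors = [v for v in lth_predictors if v != "H"]

# Hypothetical LASSO coefficients, used only to show how arcs are derived.
hypothetical_coefs = {
    "LTH": {"DME": 0.0, "H": 0.8},
    "BP": {"H": 0.3, "LTH": 0.5, "OT": 0.0},
}
arcs = arcs_from_coefficients(hypothetical_coefs)

# Hypothetical raw values for one node.
binary = binarize_above_mean([1, 2, 3, 10])

print(lth_predictors)
print(lth_counterfactual_predictors)
print(arcs)
print(binary, probability(binary))
print(percent_mediated(0.5, 0.25))
```

The regression coefficients themselves would come from whichever LASSO implementation you use; the sketch only shows how the temporal order, the binarization rule, and the non-zero coefficients fit together.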


## Questions and Answers on This Exam

1. My question concerns the part of Question 2 that says to regress on preceding variables and specifies which type of regression to use. Why are we using logistic LASSO regression for binary variables when our previous mediation assignment also had binary variables but used ordinary regression? If I have binary variables again in this dataset, can I still use ordinary regression, as the review mentioned? If I need logistic LASSO regression, are we expected to show everything it involves, such as training and testing sets? Answer: If the response variable is binary, you must use logistic regression; ordinary regression is not appropriate. If the independent variables are binary and the response variable is continuous, ordinary regression is appropriate. For each regression, it is sufficient to show the percent of variation explained (for logistic regression, McFadden's R-squared), the coefficients, and the intercept.
2. My question is about part a of Question 2, which asks about parents. Determining the parents requires a regression. Am I expected to choose the regression based on whether the variable is binary, or may I use a regression of my choice? Answer: Yes, a regression on prior variables can identify the parents in the Markov blanket of a node. You must use an appropriate regression: logistic for a binary response variable and ordinary for a continuous response variable.
3. My question is about part d of Question 2, which asks for the probability of each node having a value above the average for the node. Could you explain what this means? I understand how to find the average, but I would like more clarification. Answer: Network models are built on the joint probability distribution of the variables. To obtain these joint distributions, it helps to recode the variables as binary or discrete variables. One way to do so is to set all values above the average to 1 and all other values to 0. Once you do this, the remaining analytical steps for network modeling using regressions can be carried out.
4. My question is about part f of Question 2, which asks for a counterfactual model. Previously I revised my real model to produce the counterfactual. Are you expecting two Netica models to be submitted, one real and one counterfactual? Answer: Yes, you need to create two network models: one based on the regressions, and one based on artificially manipulated regressions in which the mediating arcs are removed. The manipulated network is referred to as the counterfactual network.
5. The exam mentions doing the work in a Jupyter Notebook. If logistic regressions are required, I know how to run them in R. Are we allowed to use R for these questions? Answer: You are not required to use Jupyter, but if you do, please provide the output as a PDF.
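
For the logistic regressions, McFadden's R-squared mentioned in the first answer can be computed from the fitted probabilities as 1 minus the ratio of the model log-likelihood to the log-likelihood of an intercept-only model. The following is a minimal sketch using only the standard library; the function names and the example outcomes and probabilities are hypothetical, and in practice the probabilities would come from your fitted logistic LASSO model.

```python
import math


def log_likelihood(y, p, eps=1e-12):
    """Bernoulli log-likelihood of binary outcomes y given probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # guard against log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only)."""
    base_rate = sum(y) / len(y)  # the intercept-only model predicts the base rate
    ll_model = log_likelihood(y, p_model)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - ll_model / ll_null


# Hypothetical outcomes and fitted probabilities.
y = [0, 0, 1, 1]
p_fitted = [0.1, 0.2, 0.8, 0.9]

r2 = mcfadden_r2(y, p_fitted)
print(round(r2, 3))
```

A model that predicts no better than the base rate gives a value of 0, and values closer to 1 indicate a better fit.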

This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD.