Take Home Exam 2024
Procedures
- Submit your answers in Blackboard
- Provide Jupyter output and code in PDF format
- Provide all answers (include images of the network) in a single Word document
Assignment
Question 1: What are your name, email, and phone number?
Question 2: In the following data, examine the impact of
a number of variables on BP. The variables occur in
the following temporal order: DME, CL, P, H, LTH, PBD, RF, SNF, HHA,
HOS, HO, OT, BP.
- What are the parents in the Markov Blanket of RF?
- Regress all variables on preceding variables. Use logistic
LASSO regression if the variable is binary or use ordinary LASSO
regression if the variable is continuous. Display a
summary table of these regressions showing R-squared,
intercept, and regression coefficients.
- Construct a network model using the
regressions. Show each variable as a node. Draw arcs from preceding
variables with non-zero coefficients to the response variables in the LASSO regressions.
- Estimate the probability of each
node having a value above the average for the node. First set values
above average to 1 and all other values to 0. Then calculate the
probability of each node. Display the table for predicting OT
from the joint distribution of its significant direct predictors. Display the
probability of BP when we have no information about the occurrence of
other variables.
- Create a counterfactual regression. Report the R-squared, intercept, and coefficients for LTH regressed on all preceding variables except H.
- Construct a counterfactual model where the impact of H is not mediated by LTH. Display the
network and the predicted probability of BP.
- What is the causal impact of H on BP? This
analysis reports the causal impact as observed.
- What is the causal impact of H on BP, using the
counterfactual model, where the mediated impact of H
through LTH has been removed?
- What percent of the impact of H on BP is mediated
through LTH?
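The network-building and mediation steps above can be sketched in a few lines of Python. This is a minimal sketch, not the required method: it assumes the LASSO regressions have already been fit elsewhere and that their coefficients have been copied into a dictionary. The function names and every coefficient value below are hypothetical and for illustration only. It also assumes that percent mediated is the drop in impact between the observed and counterfactual models, taken relative to the observed impact.

```python
# Temporal order given in Question 2 (earliest first).
ORDER = ["DME", "CL", "P", "H", "LTH", "PBD", "RF",
         "SNF", "HHA", "HOS", "HO", "OT", "BP"]


def preceding(order, response):
    """Variables that occur before `response` in the temporal order."""
    return order[:order.index(response)]


def arcs_from_coefficients(order, coefs, tol=1e-8):
    """Return (predictor, response) arcs for non-zero coefficients.

    `coefs` maps each response to {predictor: coefficient}, as read off
    a regression summary table. Predictors that do not precede the
    response are ignored, so arcs only point forward in time.
    """
    arcs = []
    for response in order:
        allowed = set(preceding(order, response))
        for predictor, beta in coefs.get(response, {}).items():
            if predictor in allowed and abs(beta) > tol:
                arcs.append((predictor, response))
    return arcs


def parents(arcs, node):
    """Direct predictors (parents) of `node` in the arc list."""
    return sorted(p for p, child in arcs if child == node)


def percent_mediated(observed_impact, counterfactual_impact):
    """Assumed definition: share of the observed impact that disappears
    when the mediated path is removed, expressed as a percent."""
    return 100.0 * (observed_impact - counterfactual_impact) / observed_impact


# HYPOTHETICAL coefficients, for illustration only (not exam results).
example_coefs = {
    "LTH": {"H": 0.8, "P": 0.0},
    "RF": {"LTH": 0.5, "PBD": -0.3, "H": 0.0},
    "BP": {"H": 0.4, "LTH": 0.6, "RF": 0.0},
}
arcs = arcs_from_coefficients(ORDER, example_coefs)
print(arcs)
print(parents(arcs, "RF"))
```

With these made-up coefficients, only predictors whose coefficients survive the LASSO shrinkage produce arcs, and the parents of a node are the sources of the arcs that point to it.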
The following resources are available:
Questions and Answers on This Exam
- My question concerns the part of Question 2 that says to regress on preceding variables and specifies the type of regression
to use. Why are we doing logistic LASSO regression on binary variables when our previous
mediation assignment also had binary variables, but we used ordinary regression then? If I have binary variables again in
this dataset, can I still use ordinary regression, as the review mentioned? If I need to do logistic LASSO regression, are we expected
to show everything it involves, such as training and testing sets? Answer: If you have a binary response
variable, then you must use logistic regression; ordinary regression is not appropriate. If you have binary independent variables
and a continuous response variable, then ordinary regression is appropriate. When you do regressions, it is sufficient to show the
percent of variation explained (for logistic regression, show McFadden R-squared), the coefficients, and the intercept.
- My question is about part a of Question 2, which asks about parents. To determine this, a regression needs to be done. Am I
expected to choose the regression based on whether the variable is binary, or can I run the regression of my choice? Answer:
Yes, a regression on prior variables can identify the parents in the Markov Blanket of a node. You need to use an appropriate regression:
logistic for a binary response variable and ordinary for a continuous response variable.
- My question is about part d of Question 2, which asks for the probability of each node having a value above the average for the node.
Could you explain a little more what this means? I understand how to find the average, but I would like more clarification on this
question. Answer: Network models are built on the joint probability distribution of variables. To obtain these
joint distributions, it helps to recode the variables as binary or discrete variables. One way to do so is to set all values
above the average to 1 and all other values to 0. Once you do this, the remaining analytical steps for network modeling using regressions can be done.
- My question is about part f of Question 2, which asks us to create a counterfactual model. Previously I revised my real model to build
the counterfactual, but are you expecting two Netica models to be submitted, one being the real model and the other the
counterfactual? Answer: Yes, you need to create two network models, one based on the regressions and the other based on
artificially manipulated regressions where the mediating arcs are removed. The manipulated network is referred to as the counterfactual network.
- The exam mentions that we need to work in Jupyter Notebook, but if logistic regressions are required, I understand how to do them in R.
Are we allowed to use that programming tool for the questions? Answer: You are not required to use Jupyter,
but if you do, please provide the output as a PDF.
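Two of the answers above, recoding values as above-average indicators and reporting McFadden R-squared for logistic regressions, can be illustrated with a short Python sketch. It uses only the standard library and assumes the predicted probabilities come from a logistic model fit elsewhere. All function names, the choice of HOS and HO as predictors of OT, and the small data rows are hypothetical and are not taken from the exam data.

```python
import math
from collections import defaultdict
from statistics import mean


def binarize_above_average(values):
    """Set values above the average to 1 and all other values to 0."""
    avg = mean(values)
    return [1 if v > avg else 0 for v in values]


def marginal_probability(binary_values):
    """Probability that a binary node equals 1, ignoring other nodes."""
    return sum(binary_values) / len(binary_values)


def conditional_table(rows, response, predictors):
    """P(response = 1) for each observed combination of predictor values.

    `rows` is a list of dicts holding already-binarized variables.
    """
    counts = defaultdict(lambda: [0, 0])  # [cases, cases with response = 1]
    for row in rows:
        key = tuple(row[p] for p in predictors)
        counts[key][0] += 1
        counts[key][1] += row[response]
    return {key: ones / n for key, (n, ones) in counts.items()}


def mcfadden_r2(y, p_hat):
    """McFadden R-squared: 1 - logL(model) / logL(intercept-only model)."""
    ll_model = sum(math.log(p) if yi == 1 else math.log(1 - p)
                   for yi, p in zip(y, p_hat))
    p0 = sum(y) / len(y)  # the intercept-only model predicts the base rate
    ll_null = sum(math.log(p0) if yi == 1 else math.log(1 - p0) for yi in y)
    return 1 - ll_model / ll_null


# HYPOTHETICAL rows, for illustration only (not the exam data).
rows = [
    {"HOS": 1, "HO": 0, "OT": 1},
    {"HOS": 1, "HO": 0, "OT": 0},
    {"HOS": 0, "HO": 0, "OT": 0},
    {"HOS": 1, "HO": 1, "OT": 1},
]
print(binarize_above_average([1, 2, 3, 10]))
print(conditional_table(rows, "OT", ["HOS", "HO"]))
```

The conditional table only lists predictor combinations that actually occur in the data, so combinations with no cases are absent rather than given a probability of zero.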
This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD