Lecture: Mediation Analysis  

 

Overview

Constructing probability models that if we act on the advice of the model we see the expected model results Video►

Objectives

  • Create a counterfactual model of the data
  • Estimate mediation effect in real and counterfactual models of the universe

Assigned Reading

  • Tutorial on using regression for network construction Read► (Use instructor's last name as password)
  • Read about "Graphical Representation of Counterfactuals" in "Causal Inference in Statistics" pages 101-107
  • Calculating mediated effects through counterfactual models Slides► Narrated Slides► Video► YouTube►
  • Mediation Analysis:
    1. Assume the following temporal order of variables: Covariate C1 first, Covariate C2 second, Exposure X third, Mediator M fourth, and Outcome Y last.
    2. Learn network structure through chain of regressions Slides► Video► YouTube►
    3. Learn network parameters and joint distribution of variables through regressions Slides► Video► YouTube►
    4. Estimate un-confounded and average treatment effects of exposure Slides► Video► YouTube►
    5. Re-estimate the impact of variables under counterfactual assumptions.  Estimate mediation impact.  Slides► Narrated Slides► Video► YouTube►

Assignment

Question 1: Use LASSO regressions to create the network of symptoms and COVID-19 diagnosis.  Remove equations that explain less than 10% of variation in the response variables. Remove coefficients where the absolute value of coefficients are equal or less than 0.05.  Remove cycles, none should exists if you always regressed response variables on independent variables that occur prior to it. From the network calculate the following

  1. What is the order of occurrences of the symptoms, age, gender, and results of COVID-19 laboratory tests? 
  2. What are the direct predictors of COVID-19 Laboratory test results?  Assume the following order for the variables: D1: Age, D2: Female, X1: Shivering, X2: Fatigue, X3: Loss of taste, X4: Fever, X5: Headaches, X6: Loss of smell, X7: Chills, X8: Muscle aches X9: Diarrhea, X10: Cough, X11: Shortness of breath, X12: Runny nose, X13: Sore throat, X14: Loss of balance, X15: Vomiting, X16: Joint pain, X17: Loss of appetite, X18: Wheezing, X19: Difficulty breathing, X20: Excessive sweating, Y: COVID-19 Test Results.
  3. What is the best network that fits the data? Establish the structure of the network ignoring regressions that explain less than 10% of the variation in test results and ignoring variables where absolute value of coefficients are less than or equal to 0.05.
  4. Estimate the parameters of the network from repeated LASSO regressions.  Report the joint probability of COVID-19 positive test results, if we do not know which symptoms were present.  
  5. What are parents in the Markov blanket of Fever?
    • Use regressions to identify these parents in Markov Blanket of Fever
    • Use the network to read parents in Markov Blanket of Fever
  6. What is the un-confounded effect of fever on probability of positive COVID-19 diagnosis?
    • Use inverse propensity weights to removing confounding
    • Switch the distribution of direct predictors of Fever so that patients with and without Fever have the same distribution of direct predictors
  7. What is the parents in Markov blanket of Chills?
    • Use Network to identify the parents in Markov blanket of Chills
    • Use regressions to identify parents in Markov blanket of Chills
  8. LASSO regress Chills on its direct predictors, not including Fever.  Report intercept, coefficients, and McFadden R-square. 
  9. Revise the network to create a counterfactual network in which Fever is not mediated by Chills (no arc from Fever to Chills)
  10. What is the mediated impact of fever on COVID-19 through Chills?

Resources for Question 1:

  1. Data Download►
  2. Team Teach One Slides► Narrated Slides► Code► Intelligent Tutor►
  3. Rachael King's Teach One YouTube►
  4. Yatisha Rajanala's Teach One Answers► Real Network► Counterfactual Network► Code► Netica Tables►
  5. Answers and overview of calculations Read►
  6. Analysis in the observed, "real," network, which includes a link from Fever to Chills

    Impact of Symptoms on COVID-19
    1. This network was drawn from repeated LASSO regressions of the data, ignoring R-square < 0.10 and coefficients < 0.05. Read►
    2. Netica network with associated tables Zip►
  7. Analysis in the Counterfactual network, which excludes the link between Fever and Chills:

    Counterfactual Network without Fever link to Chills
    1. Regression of Chills on prior predictors, excluding Fever as a predictor Read►
    2. Revised table for predicting Chills CSV file►
    3. Netica network with associated tables Zip►
  8. Percent of effect of Fever mediated through Chills Excel►

Optional Question 2:  What percent of the average effect of dyslexia on passing the exam is mediated through having a tutor?  The following 5 variables were response variables in 5 logistic regressions on all preceding variables: (1) passing the exam, (2) having dyslexia, (3) having dyscalculia, (4) having tutor and (5) completed all assigned problems. The estimated regression coefficients and McFadden R2 is provided in the following table:

  Passed Exam Dyslexic Dyscalculia Had Tutor Completed assignments
Intercept 0.30 0.6 0.85 0.15 0.49
Dyslexic 0.20     0.85  
Dyscalculia 0.42 0.10   0.35  
Had tutor 0.25       0.82
Completed assignments 0.70        
McFadden R2 0.18 0.15 0.13 0.23 0.34

Construct a revised counterfactual model using the following regressions.

  Had tutor (counterfactual)
Intercept 0.13
Dyslexic Not allowed in the regression
Dyscalculia 0.42 
McFadden R2 0.27

To answer this question, create twin models with one difference.  Create the structure of the model from the regressions.  Estimate the joint distribution of variables from the regression coefficients. Call this the real model.  Remove the link from Dyslexia to having a tutor.  And redo the model structure and parameters.  Call this the counterfactual model. Compare the change in passing the exam associated with change in dyslexia status of the students.  Report the percent of change that can be attributed to having a tutor for dyslexic students.   

More

For additional information (not part of the required reading), please see the following links:

  1. Path analysis of mediation coefficient Excel►
  2. Pearl's direct and indirect effects Read► Web Appendix►
  3. Saeed's lecture Video►
  4. Mediation analysis allowing for exposure-mediator interactions Read►
  5. Mediation analysis through stable weights Read►
  6. Practical guide to mediation analysis through inverse odds ratio Read► Slides►
  7. Mediation analysis revisited Read►

This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD Home► Email►