HAP 823: Causal Analysis

Lecture: All of Us Project on Bipolar Depression Treatment  

 

Assigned Reading

  • Effectiveness of antidepressants PubMed►
  • Proxy measure for remission of depression symptoms PubMed►
  • SAFE procedures for excluding some independent variables from LASSO regressions Read►

Assignment

The semester-long project in this course is to assess the effectiveness of an existing guide to depression medications in minority populations.

(A) Register for All of Us.  This step was assigned before the start of the course. If you have not completed every registration step, including training, you need to do so quickly. Registration may take several weeks for students who do not have a state ID; otherwise, it should take about 90 minutes. Several accounts are set up in this process, so write down the password for each sign-in separately on paper to avoid confusing which password is needed when.

  1. Register for an account on @researchallofus.org
  2. Change from temporary password to a new password and record your password on paper somewhere.
  3. Turn on Google 2-Step Verification
  4. Verify your identity with Login.gov.  This step requires a state ID or driver's license, and a phone that can receive text messages.
  5. Keep track of four separate passwords: your GMU password, your All of Us research workbench password, your computer password, and your Google password.  Keep these accounts separate and read each message carefully to see which password is needed.
  6. Complete All of Us Registered Tier Training
  7. You do not need data access beyond the Registered Tier. George Mason University does not allow access to the Controlled Tier.
  8. Sign the Data User Code of Conduct

When you have registered completely, you should see something like this page:

Create your Workspace in All of Us Database

 

(B) Create Cohort and Related Data Sets.  Note that a cohort and data sets are different concepts. 

  1. Create your cohort in All of Us. 
  2. Limit the cohort to patients who have bipolar depression.
  3. Create the concept for bipolar depression. Review in PubMed how investigators have defined major depression in EHRs. Alternatively, use conditions defined within All of Us to select the right definition of major depression.
  4. The unit of analysis is medications and not individuals.  An individual can have multiple medications.  Define the database so that there is one entry for each bipolar depression medication.   
  5. Create your data sets for your cohort.  Rely on EHR data only; do not include non-EHR data or surveys. Note that creating the medication data set requires creating concepts that capture the medication in the data. In your cohort, select demographics (age, gender) and all conditions as independent variables of interest.  You also need the date of first use (purchase) of the medication. The date of occurrence of the response variable is the first time the variable/condition occurred.  Here are the data points that you need to include in your data sets:
    1. ID of antidepressant
    2. ID of person
    3. Age at first intake of antidepressant
    4. Sex at birth
    5. 50,000 conditions
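The one-row-per-medication layout described above can be sketched in pandas. This is only an illustration under stated assumptions: the three small tables and every column name (`start_date`, `condition_date`, `antidepressant_id`, and so on) are hypothetical stand-ins, not All of Us field names, and the age calculation is approximate. Conditions are counted only if they were recorded before the first use of the medication, so that independent variables precede the response.

```python
import pandas as pd

# Hypothetical extracts: every table and column name here is an assumption.
persons = pd.DataFrame({
    "person_id": [1, 2],
    "date_of_birth": pd.to_datetime(["1980-05-01", "1975-11-20"]),
    "sex_at_birth": ["Female", "Male"],
})
drug_exposures = pd.DataFrame({
    "person_id": [1, 1, 1, 2],
    "antidepressant_id": ["A", "A", "B", "A"],
    "start_date": pd.to_datetime(
        ["2019-03-10", "2018-01-15", "2020-07-01", "2021-02-02"]),
})
conditions = pd.DataFrame({
    "person_id": [1, 1, 2],
    "condition_id": ["diabetes", "hypertension", "diabetes"],
    "condition_date": pd.to_datetime(
        ["2017-06-01", "2019-01-01", "2022-01-01"]),
})


def build_medication_rows(persons, drug_exposures):
    """One row per (person, antidepressant) with the date of first use."""
    first_use = (
        drug_exposures
        .groupby(["person_id", "antidepressant_id"], as_index=False)["start_date"]
        .min()
        .rename(columns={"start_date": "first_use_date"})
    )
    rows = first_use.merge(persons, on="person_id", how="left")
    # Approximate age in whole years at first intake.
    age_days = (rows["first_use_date"] - rows["date_of_birth"]).dt.days
    rows["age_at_first_intake"] = (age_days // 365.25).astype(int)
    return rows.drop(columns="date_of_birth")


def add_baseline_conditions(rows, conditions):
    """Add 0/1 indicators for conditions recorded before first use."""
    keys = ["person_id", "antidepressant_id"]
    merged = rows[keys + ["first_use_date"]].merge(conditions, on="person_id")
    prior = merged[merged["condition_date"] < merged["first_use_date"]]
    flags = prior.assign(present=1).pivot_table(
        index=keys, columns="condition_id", values="present",
        aggfunc="max", fill_value=0)
    flags.columns = [f"cond_{name}" for name in flags.columns]
    cond_cols = list(flags.columns)
    out = rows.merge(flags.reset_index(), on=keys, how="left")
    # Medications with no prior conditions get 0 in every indicator.
    out[cond_cols] = out[cond_cols].fillna(0).astype(int)
    return out


rows = add_baseline_conditions(
    build_medication_rows(persons, drug_exposures), conditions)
print(rows)
```

With tens of thousands of conditions, a dense indicator table like this one may not fit in memory, so a sparse representation would likely be needed in practice.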

The following resources may be of use in this task:

  • Organizing antidepressants CSV►
  • Creating survival variable Read►
  • Vlad Cardenas Teach One Part 1  YouTube► Slides►
  • Vlad Cardenas Teach One Part 2  YouTube► Slides► Code►
  • Rasil Alamri’s Teach One Part 1 YouTube►
  • Mona Mohamed’s Teach One Part 2 YouTube►
  • Source Code Part 2: Adding AI Predictors, Generating Prediction, and Executing Regressions PDF►
  • Reference Data Mapping File for AI Predictors CSV►
  • Divya Bhavanam's Teach One on predicting from the AI system and All of Us data YouTube►

(C) Describe the Population.  In this step, create Table 1 of your eventual report, which describes the study population.  For examples of Table 1, see PubMed.  Provide a summary of your data that includes the number of antidepressants examined, number of individuals involved, number of antidepressants discontinued, number of days individuals were followed, number of days antidepressants were continued, number of medical conditions at baseline of use of antidepressants, number of antidepressants used prior to baseline, and experience with previous antidepressants.
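As a rough illustration of how some of these Table 1 entries could be computed, the sketch below summarizes a medication-level table with pandas. The table `med` and all of its column names are hypothetical, the numbers are made-up toy values, and only a subset of the requested statistics is shown.

```python
import pandas as pd

# Toy medication-level table; names and values are illustrative assumptions.
med = pd.DataFrame({
    "person_id": [1, 1, 2, 3],
    "antidepressant_id": ["A", "B", "A", "C"],
    "days_continued": [120, 30, 400, 15],
    "discontinued": [True, True, False, True],
    "n_baseline_conditions": [3, 5, 2, 0],
    "n_prior_antidepressants": [0, 1, 0, 0],
})


def table_one(med):
    """Summarize a one-row-per-medication table for a population table."""
    return pd.Series({
        "Medication episodes examined": len(med),
        "Distinct antidepressants": med["antidepressant_id"].nunique(),
        "Individuals": med["person_id"].nunique(),
        "Antidepressants discontinued": int(med["discontinued"].sum()),
        "Mean days continued": med["days_continued"].mean(),
        "Mean conditions at baseline": med["n_baseline_conditions"].mean(),
        "Mean prior antidepressants": med["n_prior_antidepressants"].mean(),
    })


summary = table_one(med)
print(summary)
```

Because the unit of analysis is the medication, counts of individuals use distinct person IDs rather than row counts.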

(D) Fit a Network Model to the Data:  Use a chain of LASSO regressions to create a network model of direct and indirect predictors of remission after taking your antidepressant.  Include pairwise interactions of conditions.  This may result in too many independent variables.  Analyze the two most common medications for bipolar depression separately.  To reduce the number of independent variables, use the SAFE procedure or likelihood ratios, where strong rules are used to exclude some variables.
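To make the variable-reduction idea concrete, here is a minimal sketch of two pieces: generating pairwise interaction columns and screening columns before a LASSO fit. The screening function uses one commonly cited form of the basic SAFE rule for the squared-error LASSO, min 0.5*||y - Xb||^2 + lam*||b||_1 with no intercept; this is an assumption, and the assigned reading may define the procedure differently (for example, for logistic regression). The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def add_pairwise_interactions(X):
    """Append the product of every pair of columns to X."""
    pairs = list(combinations(range(X.shape[1]), 2))
    if not pairs:
        return X, pairs
    inter = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, inter]), pairs


def safe_screen(X, y, lam):
    """Boolean mask of columns kept by the basic SAFE rule (assumed form).

    A column j is discarded when
        |x_j' y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max,
    where lam_max = max_j |x_j' y|. Discarded columns are guaranteed to
    have a zero coefficient in the squared-error LASSO solution at lam.
    """
    corr = np.abs(X.T @ y)
    lam_max = corr.max()
    if lam >= lam_max:
        # At or above lam_max every LASSO coefficient is zero.
        return np.zeros(X.shape[1], dtype=bool)
    col_norms = np.linalg.norm(X, axis=0)
    threshold = lam - col_norms * np.linalg.norm(y) * (lam_max - lam) / lam_max
    return corr >= threshold


# Toy example with orthonormal columns, where the LASSO solution is known
# to be soft-thresholding of X'y.
X = np.eye(4)[:, :3]
y = np.array([10.0, 1.0, 0.1, 0.0])
print(safe_screen(X, y, lam=9.0))   # [ True False False]
print(safe_screen(X, y, lam=0.5))   # [ True  True  True]

X_bin = np.array([[1, 1, 0], [1, 0, 1]])
X_with_inter, pairs = add_pairwise_interactions(X_bin)
```

The rule is conservative: it removes few columns when the penalty is small, which is why screening is usually most useful near the large-penalty end of the regularization path.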

(E) Report Your Findings: This report should include the following sections, provided at the approximate times indicated by email to the instructor:

  1. Abstract.  Include a structured abstract using objective of the study, method, results, and main conclusion.  The abstract should be written after you complete other sections.  The abstract must not exceed 500 words and should report the number of words used in the abstract.
  2. Background literature review should not exceed 1 page. Your one-page literature review should assume a reader familiar with the literature and not exceed three paragraphs.  The first paragraph should address the significance of the area you are addressing, including prevalence of depression and importance of selection of the medications. The second paragraph should describe failure of clinicians in selecting the right medication, as reported in the literature. This paragraph should not exceed two or three sentences but can have numerous references.  The last paragraph should discuss how your analysis can help selection of medication based on the patient's medical history.  The Background section should be a brief synthesis of existing research findings related to the problem being addressed in the study. Every sentence should have a reference.  We are not interested in unsupported claims.
  3. Method section should be a complete description of the methods; and there is no page limit but brevity is appreciated. It should include a paragraph or a sentence on source of data. It should describe the inclusion and exclusion criteria for the creation of the cohort and compare these criteria to what has been done in the literature. It should have a sentence or a paragraph, with citations, on definition of remission.  It should have a sentence or a paragraph on number of, and definition of, independent variables. These statements should clarify how missing values were treated and explain what steps were taken to ensure that independent variables occur prior to response/dependent variable. There should be a paragraph on analytical methods used.  
  4. Results section should describe the findings and there is no page limit.  Table 1 should describe the population studied.  Figures and additional tables should summarize the statistical findings. These should include the parameters of your model and the fit between the guide and the experience of African Americans. There should not be any discussion of findings in the Results section.  The Results section should include a Netica model. Here is an example of a table to report McFadden R-square for AI predictions:

    Table 2: Cross-Validated McFadden R2 in Predicting African Americans Antidepressant Response
                            Med 1    Med 2
    Cases                   1,984    2,658
    Cases with Remission    780      1,064
    Model accuracy          7%       12%


    Table 3: Top 5 Factors with Largest Absolute Value Added to the AI
                 Med 1                  Med 2
    Intercept    xx                     vv
    Top          factor, coefficient    factor, coefficient
    1st          factor, coefficient    factor, coefficient
    2nd
    3rd
    4th
    5th


  5. Discussion section should include four distinct sections and there is no page limit.  The first section should be a summary of the key findings.  The second section should be a review of support for the findings in the literature. The third section should summarize study limitations.  The last section should conclude with policy implications.
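For the McFadden R-square requested in the Results section, the standard definition is one minus the ratio of the model log-likelihood to the log-likelihood of an intercept-only (null) model. The sketch below is a minimal NumPy version under stated assumptions: the function names are hypothetical, the null model predicts the observed remission rate for every case, and for a cross-validated figure the predicted probabilities would come from held-out folds.

```python
import numpy as np


def log_likelihood(y, p, eps=1e-12):
    """Bernoulli log-likelihood of outcomes y given probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def mcfadden_r2(y, p_model):
    """McFadden pseudo R-square: 1 - LL(model) / LL(null)."""
    y = np.asarray(y, dtype=float)
    # Null model: every case gets the overall remission rate.
    p_null = np.full_like(y, y.mean())
    return 1.0 - log_likelihood(y, p_model) / log_likelihood(y, p_null)


# Toy outcomes (1 = remission) and two sets of predicted probabilities.
y = [1, 0, 1, 0]
r2_uninformative = mcfadden_r2(y, [0.5, 0.5, 0.5, 0.5])
r2_informative = mcfadden_r2(y, [0.9, 0.1, 0.9, 0.1])
print(r2_uninformative, r2_informative)
```

A model that only reproduces the base rate scores zero, and the value can be negative when held-out predictions fit worse than the null model.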

 


This page is part of the HAP 823 course organized by Farrokh Alemi, Ph.D. Home► Email►