Lecture: Propensity Scoring  

Overview

 

Readings and Lecture Resources

  • Propensity Scoring
    • Read chapter 13 Statistical Analysis of Electronic Health Records in Big Data in Healthcare, pages 327 to 344   
    • Example propensity scoring Excel► Slides► YouTube►
  • Propensity score quintile matching
    • Read chapter 13 Statistical Analysis of Electronic Health Records in Big Data in Healthcare, pages 332 to 337
  • Propensity Score with Inverse Probability Matching
  • Measuring treatment effects Read►
  • Matching on propensity scores Read►
  • Propensity scores and time to events Read►
  • Propensity scoring of cost data Data► MatchIT R Code►

Assignment

Assignments should be submitted in Blackboard.  Make the first page a summary page.  In the summary, write statements comparing your work to the answers given or to the videos.  For example, "I got the same answers as the Teach One video for question 1."  You are welcome to use any software.

Question 1: The following data were collected for residents in the Medical Foster Home and in Nursing Homes.  The data are organized by quintiles of severity of illness.  Each successive quintile shows greater participation in the Medical Foster Home program, indicating that sicker patients are more likely to participate in the program.  We want to remove the effect of this unequal likelihood of participation from our estimate of cost differences.  Report whether the residents in the Medical Foster Home have lower cost than residents in the Nursing Homes with similar likelihood of participation in the Medical Foster Home program.

                      Number of Residents          Cost of care/day
Severity of Illness   Medical        Nursing       Medical        Nursing
Quintile              Foster Home    Home          Foster Home    Home
1                     45             3,201         $2.71          $87.98
2                     89             3,156         $4.44          $77.80
3                     168            3,077         $11.31         $78.58
4                     514            2,731         $31.82         $72.77
5                     1,775          1,470         $109.62        $40.54

  • Answer in chapter 13 Statistical Analysis of Electronic Health Records in Big Data in Healthcare, page 338.  Don't do this in R; it is a lot more work than needed.  Note that the data here and the data in the book on page 335 differ in a significant way.  The data here are in quintiles of severity of illness, while the data in the book are in quintiles of propensity score.  The correct way to analyze these data is to estimate propensity weights and multiply costs by these weights to calculate the average treatment effect.
  • Answer in Excel Image►
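
As a rough illustration of the weighting approach described above, the following Python sketch computes, for each severity group in the table, the propensity of participating in the Medical Foster Home, the inverse propensity weights, and the weighted average daily costs.  It assumes the cost columns are average daily costs per resident within each group, which the table does not state.  The variable and function names are illustrative, and the result is not a substitute for the answer in the book.

```python
# Rows copied from the table above:
# (severity group, residents in MFH, residents in NH, MFH cost/day, NH cost/day)
rows = [
    (1, 45, 3201, 2.71, 87.98),
    (2, 89, 3156, 4.44, 77.80),
    (3, 168, 3077, 11.31, 78.58),
    (4, 514, 2731, 31.82, 72.77),
    (5, 1775, 1470, 109.62, 40.54),
]


def ipw_summary(rows):
    """Return per-group propensities and weights, plus weighted mean costs.

    ASSUMPTION: each cost is the mean daily cost per resident in that group.
    """
    groups = []
    sum_w_mfh = sum_wc_mfh = 0.0
    sum_w_nh = sum_wc_nh = 0.0
    for group, n_mfh, n_nh, cost_mfh, cost_nh in rows:
        # Propensity: probability of being in the Medical Foster Home
        # given the severity group.
        propensity = n_mfh / (n_mfh + n_nh)
        w_mfh = 1.0 / propensity          # weight for each MFH resident
        w_nh = 1.0 / (1.0 - propensity)   # weight for each NH resident
        groups.append(
            {"group": group, "propensity": propensity,
             "w_mfh": w_mfh, "w_nh": w_nh}
        )
        # Accumulate weighted costs over all residents in the group.
        sum_w_mfh += n_mfh * w_mfh
        sum_wc_mfh += n_mfh * w_mfh * cost_mfh
        sum_w_nh += n_nh * w_nh
        sum_wc_nh += n_nh * w_nh * cost_nh
    mean_mfh = sum_wc_mfh / sum_w_mfh
    mean_nh = sum_wc_nh / sum_w_nh
    return groups, mean_mfh, mean_nh


groups, mean_mfh, mean_nh = ipw_summary(rows)
for g in groups:
    print(g["group"], round(g["propensity"], 4),
          round(g["w_mfh"], 2), round(g["w_nh"], 2))
print("Weighted mean cost, Medical Foster Home:", round(mean_mfh, 2))
print("Weighted mean cost, Nursing Home:", round(mean_nh, 2))
print("Difference (MFH - NH):", round(mean_mfh - mean_nh, 2))
```

Because each resident's weight is the inverse of the group propensity, the weights in each group sum to the group's total number of residents, so both weighted means are averaged over the same severity mix.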

Question 2: Using the data, what is the inverse propensity weight (i.e., one over the conditional probability of participating in the Medical Foster Home) for the 45 patients who fall in quintile 1?

                      Number of Residents          Cost of care/day
Severity of Illness   Medical        Nursing       Medical        Nursing
Quintile              Foster Home    Home          Foster Home    Home
1                     45             3,201         $2.71          $87.98
2                     89             3,156         $4.44          $77.80
3                     168            3,077         $11.31         $78.58
4                     514            2,731         $31.82         $72.77
5                     1,775          1,470         $109.62        $40.54
  1. 0.014 
  2. 0.986
  3. 71.43
  4.  Cannot be determined
  5.  1.01
  6.  None of these  

Answer in chapter 13 Statistical Analysis of Electronic Health Records in Big Data in Healthcare, pages 337 to 338

Question 3:  The objective of this analysis is to estimate patients' response to the antidepressant bupropion, after controlling for their background differences.  The data are reported bi-weekly.  There are 22,254 records for about 4,000 patients.  Organize the data so there is one row for each unit of analysis.  The unit of analysis is the antidepressant trial and not the unique person, so the ID that should be used is the combination of patient ID and Concat_Levels.  Patients who did not receive bupropion are assumed to have received an alternative antidepressant.  If the patient has taken bupropion at any time during the study period, mark the treatment as 1, otherwise 0.  Notice that some patients have taken the medication and others have not.  For the covariates, include gender, risk of suicide, heart, vascular, hematopoietic, eyes ears nose throat larynx, gastrointestinal, renal, genitourinary, musculoskeletal/integument, neurological, psychiatric illness, respiratory, liver, endocrine, alcohol, amphetamine, cannabis use, opioid use, panic, specific phobia, social phobia, OCD, PTSD, anxiety, borderline personality, dependent personality, antisocial personality, paranoid personality, personality disorder, anorexia, bulimia, and cocaine use.  If a covariate is ever present, assume that it is present.  Exclude covariates that are not present for any of the patients.  The outcome of interest is remission of depression symptoms.  The medication is considered to have caused the remission if, while on the medication, the patient is discharged to the follow-up portion of the study; in that case "Treatment_plan_equal_3" is set to 1.  Use "Treatment_Plan_Equal_3" and not the "Remission" variable as the indication of effectiveness of the antidepressant.  Balance the data to remove the effects of covariates.  Show visually that you have successfully balanced the data.  Use the following steps to accomplish this:

  1. Calculate the propensity of taking the antidepressant.  Regress taking of the antidepressant on the covariates. 
  2. Calculate inverse propensity weights.
  3. Verify that weighted regression removes the effects of all covariates. Regress taking of the antidepressant on the weighted covariates and verify that none has a statistically significant effect on selection of the antidepressant.  Visually show that the data have been balanced.
  4. Estimate Impact on Response: Regress response to the antidepressant on the covariates and treatment (taking the bupropion antidepressant). Describe how well the model was balanced and how well the impact of antidepressant was estimated.
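
Steps 1 to 3 above can be sketched in Python as follows.  This is a minimal illustration on synthetic data, not the course data set: the three binary covariates, the sample size, and the helper names (fit_logistic, predict_proba, std_mean_diff) are assumptions made for the example.  Balance is summarized here with standardized mean differences before and after weighting, a common diagnostic swapped in for the weighted regression described in step 3; the printed values could be drawn as a bar chart for the visual check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Synthetic stand-in data (NOT the course data): three binary covariates
# that influence who receives the treatment.
X = rng.binomial(1, [0.5, 0.3, 0.2], size=(n, 3)).astype(float)
true_logit = -0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1] + 0.6 * X[:, 2]
treated = rng.binomial(1, 1 / (1 + np.exp(-true_logit))).astype(float)


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton's method (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-A @ beta))
        grad = A.T @ (y - p)
        hess = (A * (p * (1 - p))[:, None]).T @ A
        beta += np.linalg.solve(hess, grad)
    return beta


def predict_proba(beta, X):
    """Predicted probability of treatment for each row of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return 1 / (1 + np.exp(-A @ beta))


def std_mean_diff(x, t, w):
    """Weighted difference in covariate means, in pooled-SD units."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return (m1 - m0) / pooled_sd


# Step 1: propensity of taking the treatment, given the covariates.
beta = fit_logistic(X, treated)
propensity = predict_proba(beta, X)

# Step 2: inverse propensity weights (1/p for treated, 1/(1-p) otherwise).
weights = np.where(treated == 1, 1 / propensity, 1 / (1 - propensity))

# Step 3: compare covariate balance before and after weighting.
unweighted = np.array(
    [std_mean_diff(X[:, j], treated, np.ones(n)) for j in range(X.shape[1])]
)
weighted = np.array(
    [std_mean_diff(X[:, j], treated, weights) for j in range(X.shape[1])]
)
for j in range(X.shape[1]):
    print(f"covariate {j}: before {unweighted[j]:+.3f}, after {weighted[j]:+.3f}")
```

Standardized differences close to zero after weighting suggest that the treated and comparison groups have similar covariate distributions; a threshold such as 0.1 is a frequently used rule of thumb rather than a requirement of this assignment.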

Question 4: The following data have been taken from nurses rounding in a facility.  The time they spent with patients has been recorded.  In addition, several characteristics of the patients have also been recorded and standardized.  Do any of the nurses have a significant impact on overall satisfaction in the unit?  Use propensity scoring to control for alternative explanations of factors that affect satisfaction.

Question 5:  In a nursing home, data were collected on residents' survival and disabilities.  The data are listed in the following order: ID, age, gender (M for male, F for female), number of assessments completed on the person, number of days followed, days since first assessment, days to last assessment, unable to eat, unable to transfer, unable to groom, unable to toilet, unable to bathe, unable to walk, unable to dress, unable to bowel, unable to urine, dead (1) or alive (0), and assessment number.  Note that death in 6 months is a variable to be constructed from dead (1) or alive (0) and days since first assessment.  Also note that individuals have multiple assessments; each assessment should have an indicator of whether the patient died within 6 months.

  1. Calculate, for each assessment, whether the person died within the next 6 months. Exclude single assessments, for which it is not possible to determine if the person dies in the next 6 months.
  2. Examine if unable to eat increases 6-month mortality risk, after controlling for other variables. In this analysis, unable to eat is an exposure variable and should be regressed on other covariates.  Then inverse propensity weights should be used to remove the effects of other covariates before regressing 6-month mortality on unable to eat and other covariates. 
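
One possible way to construct the 6-month death indicator in step 1 is sketched below in Python.  The records, the field order, and the function name are hypothetical.  The sketch also makes assumptions that the question does not state: that "number of days followed" runs from the first assessment to death for residents who died, that 6 months means 182 days, and that assessments of living residents followed for less than 6 months are marked as unknown rather than as survivors.

```python
from collections import defaultdict

SIX_MONTHS = 182  # days; ASSUMED definition of "6 months"

# Hypothetical records:
# (resident_id, days_since_first_assessment, days_followed, dead)
assessments = [
    ("A", 0, 400, 1),
    ("A", 150, 400, 1),
    ("A", 300, 400, 1),
    ("B", 0, 90, 0),      # single assessment: excluded
    ("C", 0, 500, 0),
    ("C", 400, 500, 0),
]


def label_six_month_death(assessments, horizon=SIX_MONTHS):
    """Return (resident_id, assessment_day, died_within_horizon) tuples.

    died_within_horizon is 1, 0, or None (follow-up too short to tell).
    ASSUMPTION: days_followed marks the day of death when dead == 1.
    """
    by_resident = defaultdict(list)
    for record in assessments:
        by_resident[record[0]].append(record)

    labelled = []
    for resident_id, records in by_resident.items():
        if len(records) < 2:
            continue  # exclude residents with a single assessment
        for _, day, followed, dead in records:
            remaining = followed - day  # follow-up left after this assessment
            if dead == 1 and remaining <= horizon:
                died = 1
            elif remaining >= horizon:
                died = 0      # known to be alive 6 months after assessment
            else:
                died = None   # alive at end of a follow-up shorter than 6 months
            labelled.append((resident_id, day, died))
    return labelled


for row in label_six_month_death(assessments):
    print(row)
```

Each labelled assessment can then serve as one row in the propensity analysis described in step 2, with rows marked None dropped or handled separately.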

More

For additional information (not part of the required reading), please see the following links:

  1. A practical guide to propensity scoring using R Read►
  2. Guide to propensity scoring Read►

This page is part of the HAP 819 course on Advanced Statistics organized by Farrokh Alemi PhD Home► Email►