- Session overview
- Propensity Scoring
- Read chapter 13 in Statistical Analysis of Electronic Health
Records by Farrokh Alemi, 2020
- Tutorial on propensity score
- Measuring treatment effects
- Matching on propensity scores
- Propensity scores and time to events
- Propensity scoring of cost data
Assignments should be submitted in Blackboard. Include a summary page as the first page. In the summary page,
write statements comparing your work to the answers given or to the videos. For
example, "I got the same answers as the Teach One video for question 1."
Question 1: The following data provide the length of
stay of patients seen by Dr. Smith (variable Dr. Smith = 1) and his peer group
(variable Dr. Smith = 0). Answer the following questions:
Balance the data by propensity to seek care from Dr. Smith.
This involves first predicting the probability that each type of patient
uses the services of Dr. Smith, then weighting the data inversely
proportional to the probability of using Dr. Smith. Note that
patients cared for by Dr. Smith and by his peer group will have
different sets of weights. The net result of the weights is that
patients cared for by Dr. Smith and by his peers will have the same rate
of various diseases.
(a) Graphically show that the weighting procedure you followed results in the same set
of patients treated by either Dr. Smith or his peers.
(b) Report the unconfounded impact of Dr. Smith on length of stay.
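The weighting steps in Question 1 can be sketched in Python as follows. This is a minimal illustration, not the course solution: the records, the patient types, and the names `records`, `propensity`, `weight`, and `weighted_summary` are all invented for the example, and the propensity is estimated simply as the share of each patient type seen by Dr. Smith.

```python
from collections import defaultdict

# Hypothetical records: (patient_type, dr_smith, length_of_stay).
# Patient types and values are invented for illustration only.
records = [
    ("diabetes", 1, 5.0), ("diabetes", 1, 6.0), ("diabetes", 1, 7.0),
    ("diabetes", 0, 4.0),
    ("heart failure", 1, 9.0),
    ("heart failure", 0, 8.0), ("heart failure", 0, 7.0), ("heart failure", 0, 9.0),
]

# Step 1: propensity = share of each patient type seen by Dr. Smith.
counts = defaultdict(lambda: [0, 0])  # type -> [n_peer, n_smith]
for ptype, smith, _ in records:
    counts[ptype][smith] += 1
propensity = {t: n[1] / (n[0] + n[1]) for t, n in counts.items()}


# Step 2: inverse-probability weights. The two groups get different weights.
# Assumes every patient type is seen by both groups (0 < propensity < 1).
def weight(ptype, smith):
    p = propensity[ptype]
    return 1.0 / p if smith == 1 else 1.0 / (1.0 - p)


# Step 3: weighted case mix and weighted mean length of stay for one group.
def weighted_summary(group):
    total = sum(weight(t, s) for t, s, _ in records if s == group)
    mix = defaultdict(float)
    los = 0.0
    for t, s, y in records:
        if s == group:
            w = weight(t, s)
            mix[t] += w / total
            los += w * y / total
    return dict(mix), los


mix_smith, los_smith = weighted_summary(1)
mix_peer, los_peer = weighted_summary(0)
effect = los_smith - los_peer  # weighted difference in length of stay

print("Weighted case mix, Dr. Smith:", mix_smith)
print("Weighted case mix, peers:    ", mix_peer)
print("Weighted difference in length of stay:", effect)
```

For part (a), the two weighted case-mix dictionaries could be drawn side by side as a bar chart; after weighting, the bars for each patient type should be of equal height in the two groups.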
Question 2: The following data provide the survival
among cancer patients. The data provide 35 common comorbidities
for patients who have or don't have stomach cancer. Use both logistic
and ordinary regression to analyze these data and report the differences
in the findings, in particular:
- Using logistic regression, calculate the propensity to have
- Group the diagnoses using SQL. Within the naturally occurring groups of diagnoses,
calculate probability of cancer. Calculate the logit of
the probability. Regress the logit function on the diagnoses
using ordinary regression.
- Report the coefficients for the comorbidities of stomach cancer.
How do these coefficients change across the two methods?
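The grouped-logit approach described in Question 2 (group, compute the probability, take the logit, then apply ordinary regression) could look roughly like the sketch below. It assumes only two made-up binary diagnoses instead of the 35 in the data, does the grouping in Python rather than SQL, and fits an unweighted least-squares line through the group logits. The names `make_rows`, `groups`, and `solve` are hypothetical helpers for this illustration.

```python
import math
from collections import defaultdict


# Hypothetical patient rows: (diagnosis_a, diagnosis_b, has_cancer).
def make_rows(a, b, n_cancer, n_total):
    return [(a, b, 1)] * n_cancer + [(a, b, 0)] * (n_total - n_cancer)


rows = (make_rows(0, 0, 2, 20) + make_rows(1, 0, 5, 20)
        + make_rows(0, 1, 8, 20) + make_rows(1, 1, 14, 20))

# Step 1: group by the combination of diagnoses (the role of SQL GROUP BY).
groups = defaultdict(lambda: [0, 0])  # (a, b) -> [n_cancer, n_total]
for a, b, y in rows:
    groups[(a, b)][0] += y
    groups[(a, b)][1] += 1

# Step 2: probability of cancer in each group and its logit.
X, z = [], []
for (a, b), (n_cancer, n_total) in sorted(groups.items()):
    p = n_cancer / n_total
    if 0 < p < 1:  # the logit is undefined at 0 or 1; such groups are skipped
        X.append([1.0, float(a), float(b)])
        z.append(math.log(p / (1 - p)))


# Step 3: ordinary least squares through the normal equations (X'X) beta = X'z,
# solved with Gauss-Jordan elimination and partial pivoting.
def solve(A, b):
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


n_cols = len(X[0])
XtX = [[sum(r[i] * r[j] for r in X) for j in range(n_cols)] for i in range(n_cols)]
Xtz = [sum(r[i] * zi for r, zi in zip(X, z)) for i in range(n_cols)]
intercept, coef_a, coef_b = solve(XtX, Xtz)

print("intercept:", intercept, "diagnosis_a:", coef_a, "diagnosis_b:", coef_b)
```

These coefficients are on the log-odds scale, so they can be compared directly with logistic regression coefficients fitted to the patient-level rows. One simplification to keep in mind: the sketch gives every group equal weight regardless of its size, which is one reason the two methods can disagree.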
Question 3: The objective of this
analysis is to find
response to antidepressants. You can select one of the
These data come from the STAR*D experiment conducted by the National
Institute of Mental Health. Read about the study protocol.
Download data. Use instructor's last name as password.
- The data are reported biweekly or monthly. There are 22,254
records for about 4,000 patients. Organize the data so there is one
row for each patient.
The enclosed data report on citalopram, bupropion, mirtazapine, buspirone,
lithium, nortriptyline, sertraline, thyroid, tranylcypromine,
and venlafaxine. Please focus the analysis on only one of
the antidepressants or a combination of two antidepressants
taken simultaneously. For the time being
ignore the dose of the medication.
- Exclusions: Patients who did not receive
bupropion are assumed to have received the alternative
antidepressant. The unit of analysis is the antidepressant
trial and not necessarily the unique person. So the ID that
should be used is the combination of patient ID and
- Treatment: If the patient has taken the
antidepressant at any time during the study period, then mark it
as 1, otherwise 0. Notice that some patients have taken the
medication and others have not. Within the combination of
ID and Concat_levels look for any occasion of use of bupropion.
- Covariates: For the covariates, include
gender, risk of suicide, heart, vascular, haematopoietic, eyes
ears nose throat larynx, gastrointestinal, renal, genitourinary,
musculoskeletal integument, neurological, psychiatric illness,
respiratory, liver, endocrine, alcohol, amphetamine, cannabis
use, opioid use, panic, specific phobia, social phobia, OCD,
PTSD, anxiety, borderline personality, dependent personality,
antisocial personality, paranoid personality, personality
disorder, anorexia, bulimia, and cocaine use. If the covariate
is ever present, assume that it is present. Exclude covariates
that are not present for any of the patients. Combine
covariates that occur occasionally.
- Outcome: The medication is considered to
have caused remission if, while on the medication, the
patient is discharged to the follow-up portion of the study; in that case
"Treatment_plan_equal_3" is set to 1. Use
"Treatment_Plan_Equal_3" and not the "Remission" variable as an
indication of the effectiveness of the antidepressant, since the
remission variable does not indicate that the clinician
agreed that the patient's symptoms were well managed.
- Balance the data to remove the effects of covariates. Show visually that
you have successfully balanced the data. Use the following
steps to accomplish this:
- Calculate Propensity Score: Calculate the
propensity of taking the antidepressant. Regress taking of
the antidepressant on the covariates.
- Weights: Calculate inverse propensity
- Verify Balance: Verify that weighted
regression removes the effects of all covariates. Regress
the antidepressants on the covariates and verify that none have
a statistically significant effect on selection of the
antidepressant. Visually show that the data have been
Describe how well the model was balanced and how well the impact
of antidepressant was estimated.
Answer by Sankeerthi
- Estimate Impact on Response: Regress
response to the antidepressant on the covariates and taking the
Solutions can be obtained using different software.
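As one illustration of the data preparation for Question 3, the sketch below collapses visit-level records into one row per antidepressant trial, keyed by the combination of ID and Concat_levels, and marks a flag as 1 if it is ever 1 within the trial. The field names "Concat_levels" and "Treatment_Plan_Equal_3" come from the assignment; the other field names, the level labels, and the sample rows are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical visit-level rows. Only "Concat_levels" and
# "Treatment_Plan_Equal_3" are named in the assignment; the remaining
# field names and all values are assumptions for illustration.
visits = [
    {"ID": 1, "Concat_levels": "L1", "bupropion": 0, "panic": 0, "Treatment_Plan_Equal_3": 0},
    {"ID": 1, "Concat_levels": "L1", "bupropion": 0, "panic": 1, "Treatment_Plan_Equal_3": 0},
    {"ID": 1, "Concat_levels": "L2", "bupropion": 1, "panic": 0, "Treatment_Plan_Equal_3": 0},
    {"ID": 1, "Concat_levels": "L2", "bupropion": 1, "panic": 0, "Treatment_Plan_Equal_3": 1},
    {"ID": 2, "Concat_levels": "L1", "bupropion": 0, "panic": 0, "Treatment_Plan_Equal_3": 1},
]
flags = ["bupropion", "panic", "Treatment_Plan_Equal_3"]

# One row per trial: a flag is 1 if it was ever 1 within (ID, Concat_levels).
trials = defaultdict(lambda: dict.fromkeys(flags, 0))
for v in visits:
    key = (v["ID"], v["Concat_levels"])
    for f in flags:
        trials[key][f] = max(trials[key][f], v[f])

for key, row in sorted(trials.items()):
    print(key, row)
```

Taking the maximum within a trial implements the "ever present" rule for treatment and covariates. For the outcome it is only an approximation of "while on the medication", so the timing of the discharge relative to medication use may still need to be checked.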
Question 4: The following problem was first created by
Morgan and Harding, and we have adjusted it to fit within health care. In this example,
the outcome is length of stay in the hospital, the treatment is the
clinician versus his peer group, and the strata are a mix of medical history and demographic variables
that account for the pattern of self-selection into treatment.
This mix has been divided into 3 strata: low, medium, and high risk. What is the impact of the
clinician on length of stay, after removing confounding associated with the
severity of the patients' illness?
Length of Stay
Solution by Morgan and Harding Read►
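A stratified comparison of the kind Question 4 asks for can be sketched as below. The stratum sizes and mean lengths of stay are invented numbers, not the assignment data, and the sketch assumes the overall effect is the average of the within-stratum differences weighted by each stratum's share of all patients.

```python
# Hypothetical stratum summaries:
# (number of patients, mean length of stay for the clinician, mean for peers).
strata = {
    "low":    (50, 3.0, 2.5),
    "medium": (30, 5.0, 4.0),
    "high":   (20, 9.0, 7.0),
}

total = sum(n for n, _, _ in strata.values())

# Weight each within-stratum difference by the stratum's share of all patients.
effect = sum(n / total * (y1 - y0) for n, y1, y0 in strata.values())

print("Stratified difference in length of stay:", effect)
```

Weighting by the share of all patients estimates the effect for the whole population. Weighting instead by each stratum's share of the clinician's own patients would estimate the effect among the patients he actually treated.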
Question 5: The following data have been taken from nurses
rounding in a facility. The time they spent with patients has been
recorded. In addition,
several characteristics of the patients have also been recorded and
standardized. Do any of the nurses have a
significant impact on overall satisfaction in the unit?
Data► Yamani's answer►
Carlos's Teach One ►
Carlos's SQL Code►
Question 6: In a nursing home, data were
collected on residents' survival and disabilities. The data are
listed in the following order: ID, age, gender (M for male, F for
female), number of assessments completed on the person, number of days
followed, days since first assessment, days to last assessment, unable
to eat, unable to transfer, unable to groom, unable to toilet, unable to
bathe, unable to walk, unable to dress, unable to bowel, unable to
urine, dead (1) or alive (0), and assessment number.
Predict from the patient's assessments (i.e. their age and disabilities
at time of assessment) if the patient is likely to die and should be
admitted to the hospice program.
Sherline's Teach One►
For additional information (not part of the required reading), please see the following links:
- A practical guide to propensity scoring using R
- Guide to propensity scoring
This page is part of the course on Comparative Effectiveness by Farrokh Alemi PhD Home►