You have 4 days to answer the following 2 questions; all questions
must be answered. Copy the questions into a Microsoft Word document
and provide your answers within that document.
Submit one document containing your answers to all questions.
Enter your name, email, and phone number. Enter the time and date you
started working on the exam.
Question 1: The following data describe survival
among cancer patients. The data provide 35 common comorbidities
for patients who have or do not have stomach cancer. Use both logistic
and ordinary regression to analyze these data and report the differences
in the findings. In particular:
- Using logistic regression, calculate the propensity to have
cancer. Regress the variable cancer on the various comorbidities.
- Use the "Group By" command in SQL to examine combinations of the
diagnoses. Within each naturally occurring combination of diagnoses,
calculate the probability of cancer. Calculate the logit of
this probability. Regress the logit on the diagnoses
using ordinary regression.
- Using the propensity score calculated in step 1, weight the data to remove
confounding and show that the weights have removed confounding in
the data. Cancer and non-cancer patients should not differ
in the rate of comorbidities. Visually show this in a plot.
- Report the unconfounded impact of cancer on mortality using
propensity scoring. In particular, carry out a weighted
regression using the weights you calculated in step 3.
Estimate the impact of cancer on mortality after balancing the data
so that cancer and non-cancer patients do not differ in the rate of comorbidities.
- Report the unconfounded impact of cancer on mortality using the
intercept from regressing cancer patients (cases) on non-cancer
patients (controls). In particular, regress the mortality rate of
cancer patients in each stratum on the mortality rate of non-cancer
patients in the same stratum. Report the
intercept of this regression. Explain why this intercept does
not show the effects of comorbidities.
- Describe why the answers in steps 4 and 5 (the weighted regression
and the case-control intercept) are different.
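
The propensity-weighting steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are simulated, the two comorbidity columns and all coefficients are invented, the logistic regression is fitted with a simple gradient-ascent loop rather than a statistics package, and the weights are the common inverse-probability form (1/p for cancer patients, 1/(1-p) for the others), which the assignment does not prescribe. With cancer as the only predictor, a weighted regression of mortality on cancer reduces to the weighted difference in mortality rates computed at the end.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, epochs=500):
    """Gradient-ascent logistic regression; returns [intercept, b1, b2, ...]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Simulated stand-in for the course data (all numbers are invented).
# Each row: (comorbidity_1, comorbidity_2, cancer, dead)
random.seed(0)
rows = []
for _ in range(1000):
    c1 = int(random.random() < 0.4)
    c2 = int(random.random() < 0.3)
    cancer = int(random.random() < sigmoid(-1.0 + 1.2 * c1 + 0.8 * c2))
    dead = int(random.random() < 0.10 + 0.15 * cancer + 0.20 * c1 + 0.10 * c2)
    rows.append((c1, c2, cancer, dead))

# Step 1: propensity of cancer given the comorbidities.
X = [(r[0], r[1]) for r in rows]
beta = fit_logistic(X, [r[2] for r in rows])
ps = [sigmoid(beta[0] + beta[1] * c1 + beta[2] * c2) for c1, c2 in X]

# Step 3: inverse-propensity weights (assumed form, see lead-in).
w = [1.0 / p if r[2] == 1 else 1.0 / (1.0 - p) for p, r in zip(ps, rows)]
ones = [1.0] * len(rows)

def group_diff(col, weights):
    """Weighted rate of column `col` among cancer minus non-cancer patients."""
    def rate(flag):
        pairs = [(r[col], wt) for r, wt in zip(rows, weights) if r[2] == flag]
        return sum(v * wt for v, wt in pairs) / sum(wt for _, wt in pairs)
    return rate(1) - rate(0)

# Balance check: difference in comorbidity rates before and after weighting.
for name, col in (("comorbidity_1", 0), ("comorbidity_2", 1)):
    print(name, "unweighted:", round(group_diff(col, ones), 3),
          "weighted:", round(group_diff(col, w), 3))

# Step 4: weighted difference in mortality between cancer and non-cancer.
effect = group_diff(3, w)
print("weighted mortality difference:", round(effect, 3))
```

The pairs of unweighted and weighted differences printed by the balance check are the quantities one would typically place side by side in the requested plot.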
Question 2: The objective of this analysis is to find the
unconfounded impact of bupropion on remission of depression symptoms.
Use both propensity scoring and SQL to solve the problem.
- These data come from the STAR*D experiment sponsored by the National
Institute of Mental Health. Read about the study protocol.
- Download data. Use instructor's last name as password.
- The data are reported bi-weekly or monthly. There are 22,254
records for about 4,000 patients observed over several levels of
experiments. Organize the data so there is one row for each
- Focus: The data report on citalopram, bupropion, mirtazapine, buspirone,
lithium, nortriptyline, sertraline, thyroid, tranylcypromine,
and venlafaxine. Please focus the analysis on bupropion. For the time being,
ignore the dose of the medication and focus on whether the
patient received the antidepressant.
- Exclusions: Patients who did not receive
bupropion are assumed to have received the alternative
antidepressant. The unit of analysis is the antidepressant
trial and not necessarily a unique person, so the ID that
should be used is the combination of patient ID and
- Treatment: If the patient has taken the
antidepressant at any time during the study period, then mark it
as 1, otherwise 0. Notice that some patients have taken the
medication and others have not. Within the combination of
ID and Concat_levels look for any occasion of use of
- Covariates: For the covariates, include
gender, risk of suicide, heart, vascular, haematopoietic, eyes
ears nose throat larynx, gastrointestinal, renal, genitourinary,
musculoskeletal integument, neurological, psychiatric illness,
respiratory, liver, endocrine, alcohol, amphetamine, cannabis
use, opioid use, panic, specific phobia, social phobia, OCD,
PTSD, anxiety, borderline personality, dependent personality,
antisocial personality, paranoid personality, personality
disorder, anorexia, bulimia, and cocaine use. If the variable
is ever present, assume that it is present. Exclude any variable
that is not present for any of the patients, and combine
covariates that occur only occasionally.
- Outcome: The medication is considered to
have caused remission if, while on the medication, the
patient is discharged to the follow-up portion of the study, i.e.,
"Treatment_plan_equal_3" is set to 1. This variable is 1
when the patient's symptoms have subsided and the patient is
referred to follow-up and maintenance of the medication.
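
As a rough illustration of the rules above (and not the course's own cleaning code), the repeated records can be collapsed to one row per antidepressant trial with a SQL "Group By", here run through Python's built-in sqlite3 module. The table name `visits`, the `bupropion` and `panic` columns, the example values of `Concat_levels`, and the six rows of data are all hypothetical; only `ID`, `Concat_levels`, and `Treatment_plan_equal_3` are names taken from the text. Taking `MAX` of a 0/1 column implements "ever present means present", and applying it to the outcome is a simplification that assumes every record within a trial is a record on that trial's medication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE visits (
        ID INTEGER,
        Concat_levels TEXT,
        bupropion INTEGER,
        panic INTEGER,
        Treatment_plan_equal_3 INTEGER
    )
""")

# Invented visit-level records: several rows per patient and level.
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", [
    (1, "1",   0, 0, 0),
    (1, "1",   0, 1, 0),
    (1, "1-2", 1, 0, 0),
    (1, "1-2", 1, 0, 1),
    (2, "1",   0, 0, 0),
    (2, "1",   0, 0, 1),
])

# One row per trial (ID + Concat_levels); MAX marks "ever observed" as 1.
trials = conn.execute("""
    SELECT ID,
           Concat_levels,
           MAX(bupropion)              AS treated,
           MAX(panic)                  AS panic,
           MAX(Treatment_plan_equal_3) AS remission
    FROM visits
    GROUP BY ID, Concat_levels
    ORDER BY ID, Concat_levels
""").fetchall()

for row in trials:
    print(row)
```

The same pattern extends to the full covariate list by adding one `MAX(...)` expression per covariate.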
The following code cleans the data and prepares it for the analysis.
- Balance the data to remove the effects of covariates. Show visually that
you have successfully balanced the data. Use the following
steps to accomplish propensity scoring:
- Calculate Propensity Score: Calculate the
propensity of taking the antidepressant. Regress taking of
the antidepressant on the covariates.
- Weights: Calculate inverse propensity
- Verify Balance: Verify that weighted
regression removes the effects of all covariates. Regress
the antidepressants on the covariates and verify that none have
a statistically significant effect on selection of the
- Estimate Impact on Response: Regress
response to the antidepressant on the covariates and taking the
- Balance the data using SQL. Use the following steps:
- Stratify the data for patients who received the
antidepressant. Call these cases.
- Stratify the data for patients who did not receive the
antidepressant. Call these controls.
- Match cases and controls on the strata
- Calculate the intercept for the regression of the cases'
probability of remission on the controls' probability of remission.
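
The SQL balancing steps above can be sketched as follows, again using Python's sqlite3 module. Everything specific here is an assumption made for illustration: the table `trials`, the two covariates `cov_a` and `cov_b` that define the strata, and the invented counts. The outcome is taken to be remission, since that is the outcome defined for this question, and the final regression is an unweighted simple least-squares fit of the case rate on the control rate across matched strata; a fit weighted by the number of cases per stratum would be an equally reasonable choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trials (
        treated INTEGER, cov_a INTEGER, cov_b INTEGER, remission INTEGER
    )
""")

# Invented counts: (cov_a, cov_b, treated, number of trials, number remitted)
design = [
    (0, 0, 1, 10, 5), (0, 0, 0, 10, 4),
    (0, 1, 1, 10, 4), (0, 1, 0, 10, 3),
    (1, 0, 1, 10, 3), (1, 0, 0, 10, 2),
    (1, 1, 1, 10, 2),  # no controls in this stratum, so it cannot be matched
]
for a, b, t, n, k in design:
    conn.executemany(
        "INSERT INTO trials VALUES (?, ?, ?, ?)",
        [(t, a, b, int(i < k)) for i in range(n)],
    )

# Stratify cases and controls separately, then match them on the strata.
matched = conn.execute("""
    WITH strata AS (
        SELECT treated, cov_a, cov_b,
               COUNT(*) AS n, AVG(remission) AS rate
        FROM trials
        GROUP BY treated, cov_a, cov_b
    )
    SELECT ca.cov_a, ca.cov_b, ca.rate AS case_rate, co.rate AS control_rate
    FROM strata AS ca
    JOIN strata AS co
      ON ca.cov_a = co.cov_a AND ca.cov_b = co.cov_b
    WHERE ca.treated = 1 AND co.treated = 0
    ORDER BY ca.cov_a, ca.cov_b
""").fetchall()

# Simple least squares of case rate (y) on control rate (x).
xs = [row[3] for row in matched]
ys = [row[2] for row in matched]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print("matched strata:", len(matched))
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

Because cases and controls are compared within the same stratum, the covariates are held constant in each matched pair, which is why the intercept is read as the shift in outcome attributable to the treatment.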
This page is part of the course on Comparative Effectiveness by Farrokh Alemi PhD.