Midterm Exam 2018  


You have 4 days to answer the following 2 questions. All questions must be answered. Copy the questions into a Microsoft Word document and provide your answers within that document. Submit one document containing your answers to all questions.

Enter your name, email, and phone number. Enter the time and date you started working on the exam.

Question 1: The following Data► report survival among cancer patients. The data include 35 common comorbidities for patients who have or do not have stomach cancer. Use both logistic and ordinary regression to analyze these data and report the differences in the findings. In particular:

  1. Using logistic regression, calculate the propensity to have cancer.  Regress the variable cancer on various comorbidities.
  2. Use the "Group By" command in SQL to examine combinations of the diagnoses. Within each naturally occurring combination of diagnoses, calculate the probability of cancer. Calculate the logit of the probability. Regress the logit function on the diagnoses using ordinary regression.
  3. Using the propensity score calculated in step 1, weight the data to remove confounding and show that the weights have removed confounding in the data.  Cancer and non-cancer patients should not differ in the rate of comorbidities.  Visually show this in a plot.
  4. Report the unconfounded impact of cancer on mortality using propensity scoring.  In particular, carry out a weighted regression using the weights you calculated in step 3.  Estimate the impact of cancer on mortality after balancing the data so that cancer and non-cancer patients do not differ in comorbidities.  
  5. Report the unconfounded impact of cancer on mortality using the intercept of regressing cancer patients (cases) on non-cancer patients (controls). In particular, regress the mortality rate of cancer patients within each stratum on the mortality rate of non-cancer patients within the same stratum. Report the intercept of this regression. Explain why this intercept does not show the effects of comorbidities.
  6. Describe why the answers in 4 and 5 are different. 
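
The weighting idea in step 3 can be illustrated with a minimal sketch on synthetic counts (not the exam data). It assumes a single hypothetical comorbidity and, as a stand-in for the logistic-regression score from step 1, takes the propensity to be the proportion of cancer patients within each comorbidity stratum. All names (`records`, `ip_weight`, `comorbidity_rate`) are illustrative.

```python
from collections import defaultdict

# Synthetic records (hypothetical, not the exam data): (cancer, comorbidity)
records = ([(1, 1)] * 30 + [(0, 1)] * 10 +
           [(1, 0)] * 20 + [(0, 0)] * 40)

# Propensity of cancer within each comorbidity stratum.
# Assumption: the stratum proportion stands in for a logistic-regression score.
counts = defaultdict(lambda: [0, 0])  # stratum -> [n_cancer, n_total]
for cancer, comorbidity in records:
    counts[comorbidity][0] += cancer
    counts[comorbidity][1] += 1
propensity = {s: n_cancer / n for s, (n_cancer, n) in counts.items()}


def ip_weight(cancer, comorbidity):
    """Inverse propensity weight: 1/p for cases, 1/(1-p) for controls."""
    p = propensity[comorbidity]
    return 1 / p if cancer == 1 else 1 / (1 - p)


def comorbidity_rate(group, weighted):
    """Rate of the comorbidity among cancer (1) or non-cancer (0) patients."""
    num = den = 0.0
    for cancer, comorbidity in records:
        if cancer != group:
            continue
        w = ip_weight(cancer, comorbidity) if weighted else 1.0
        num += w * comorbidity
        den += w
    return num / den


for label, weighted in (("unweighted", False), ("weighted", True)):
    print(label,
          round(comorbidity_rate(1, weighted), 3),
          round(comorbidity_rate(0, weighted), 3))
```

On these synthetic counts the unweighted comorbidity rates differ between the two groups (0.6 versus 0.2), while the weighted rates coincide (0.4 in both). With 35 comorbidities and a fitted logistic model, balance would be approximate rather than exact, which is why the exam asks for a plot.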

Question 2:  The objective of this analysis is to find the unconfounded impact of bupropion on remission of depression symptoms.   Use both propensity scoring and SQL to solve the problem.

  • These data come from the STAR*D experiment, sponsored by the National Institute of Mental Health. Read about the study protocol. Protocol►
  • Download data.  Use instructor's last name as password.  Data►
  • The data are reported bi-weekly or monthly. There are 22,254 records for about 4,000 patients observed over several levels of the experiment. Organize the data so there is one row for each patient.
    • Focus: The data report on citalopram, bupropion, mirtazapine, buspirone, lithium, nortriptyline, sertraline, thyroid, tranylcypromine, and venlafaxine. Please focus the analysis on bupropion. For the time being, ignore the dose of the medication and focus on whether the patient received the antidepressant.
    • Exclusions: Patients who did not receive bupropion are assumed to have received an alternative antidepressant. The unit of analysis is the antidepressant trial, not necessarily a unique person, so the ID to use is the combination of patient ID and Concat_Levels.
    • Treatment: If the patient has taken the antidepressant at any time during the study period, mark the treatment as 1; otherwise mark it as 0. Notice that some patients have taken the medication and others have not. Within each combination of ID and Concat_levels, look for any occurrence of bupropion use.
    • Covariates: For the covariates, include gender, risk of suicide, heart, vascular, haematopoietic, eyes/ears/nose/throat/larynx, gastrointestinal, renal, genitourinary, musculoskeletal/integument, neurological, psychiatric illness, respiratory, liver, endocrine, alcohol, amphetamine, cannabis use, opioid use, panic, specific phobia, social phobia, OCD, PTSD, anxiety, borderline personality, dependent personality, antisocial personality, paranoid personality, personality disorder, anorexia, bulimia, and cocaine use. If a variable is ever present, assume that it is present. Exclude any variable that is not present for any patient, or combine covariates that occur only occasionally.
    • Outcome: The medication is considered to have caused remission if, while on the medication, the patient is discharged to the follow-up portion of the study, i.e., "Treatment_plan_equal_3" is set to 1. This variable is 1 when the patient's symptoms have subsided and the patient is referred to follow-up and maintenance of the medication.
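
The "ever present" rule and the one-row-per-trial layout described above can be sketched with a SQL GROUP BY, run here through Python's built-in sqlite3 module on a few made-up rows. Only Concat_Levels and Treatment_plan_equal_3 are names taken from the text; the table name `visits` and the columns `id`, `bupropion`, and `panic` are hypothetical. Taking the maximum of the outcome within a trial is a simplification that does not check whether the discharge happened while the patient was on the medication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE visits (
        id INTEGER,                     -- patient ID (hypothetical name)
        Concat_Levels TEXT,             -- study level(s), named in the text
        bupropion INTEGER,              -- 1 if taken at this visit (hypothetical)
        panic INTEGER,                  -- example covariate (hypothetical)
        Treatment_plan_equal_3 INTEGER  -- 1 if discharged to follow-up
    )
""")

# Made-up visit records: several rows per patient and level.
rows = [
    (1, "1", 0, 0, 0),
    (1, "1", 0, 1, 0),
    (1, "2", 1, 0, 0),
    (1, "2", 1, 0, 1),
    (2, "1", 0, 0, 0),
    (2, "1", 0, 0, 1),
]
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", rows)

# One row per trial (patient ID + Concat_Levels).
# MAX() of a 0/1 column is 1 if the flag was ever set within the trial.
trials = conn.execute("""
    SELECT id,
           Concat_Levels,
           MAX(bupropion)              AS treated,
           MAX(panic)                  AS panic,
           MAX(Treatment_plan_equal_3) AS remission
    FROM visits
    GROUP BY id, Concat_Levels
    ORDER BY id, Concat_Levels
""").fetchall()

for trial in trials:
    print(trial)
```

The six visit rows collapse to three trials, one for each combination of patient ID and Concat_Levels.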

The following code cleans the data and prepares it for the analysis.  SQL►

  1. Balance the data to remove the effects of covariates.  Show visually that you have successfully balanced the data.  Use the following steps to accomplish propensity scoring:
    • Calculate Propensity Score: Calculate the propensity of taking the antidepressant.  Regress taking of the antidepressant on the covariates. 
    • Weights: Calculate inverse propensity weights.
    • Verify Balance: Verify that the weighting removes the effects of all covariates. Using the weighted data, regress taking the antidepressant on the covariates and verify that none has a statistically significant effect on selection of the antidepressant.
    • Estimate Impact on Response: Regress response to the antidepressant on the covariates and taking the antidepressant.  
  2. Balance the data using SQL.  Use the following steps:
    • Stratify the data for patients who received the antidepressant.  Call these cases.
    • Stratify the data for patients who did not receive the antidepressant.  Call these controls.
    • Match cases and controls on the strata.
    • Calculate the intercept for the regression of cases' probability of remission on controls' probability of remission.
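
The stratify, match, and regress steps can be sketched as follows on synthetic trials (not the STAR*D data). The sketch assumes strata are already labeled, uses remission as the outcome, drops strata that lack either cases or controls, and fits an unweighted least-squares line; whether strata should be weighted by size is left open. The helper `make` and all other names are hypothetical.

```python
from collections import defaultdict


def make(stratum, treated, n_remit, n_total):
    """Build synthetic trials: (stratum, treated, remitted)."""
    return ([(stratum, treated, 1)] * n_remit +
            [(stratum, treated, 0)] * (n_total - n_remit))


# Synthetic data: stratum "D" has cases only and cannot be matched.
trials = (make("A", 1, 3, 10) + make("A", 0, 2, 10) +
          make("B", 1, 5, 10) + make("B", 0, 4, 10) +
          make("C", 1, 6, 10) + make("C", 0, 5, 10) +
          make("D", 1, 7, 10))

# Tally remissions and trials by (stratum, treated).
tally = defaultdict(lambda: [0, 0])
for stratum, treated, remitted in trials:
    tally[(stratum, treated)][0] += remitted
    tally[(stratum, treated)][1] += 1

# Keep only strata observed among both cases and controls.
strata = {s for s, _ in tally}
matched = sorted(s for s in strata if (s, 1) in tally and (s, 0) in tally)


def rate(stratum, treated):
    """Remission rate within one stratum for cases (1) or controls (0)."""
    n_remit, n_total = tally[(stratum, treated)]
    return n_remit / n_total


x = [rate(s, 0) for s in matched]  # control remission rates
y = [rate(s, 1) for s in matched]  # case remission rates

# Ordinary least squares for y = intercept + slope * x.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) /
         sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

print(matched, round(intercept, 3), round(slope, 3))
```

In this constructed example the case rate is 0.1 above the control rate in every matched stratum, so the fitted intercept is 0.1 and the slope is 1.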

This page is part of the course on Comparative Effectiveness by Farrokh Alemi PhD Home► Email►