HAP 819: Advanced Statistics II

Lecture: Stratified Covariate Balancing  


Assigned Reading


Submit assignments in Blackboard. Make the first page a summary page. On the summary page, write statements comparing your work to the answers given or to the videos. For example, "I got the same answers as the Teach One video for question 1."

Question 1, Benchmarking Clinicians: The following data provide the length of stay of patients seen by Dr. Smith (variable Dr. Smith = 1) and by his peer group (variable Dr. Smith = 0). Let "cared for by Dr. Smith" be the treatment variable and length of stay be the outcome variable.

  1. Use LASSO logistic regression to describe differences in the comorbidities of patients seen by Dr. Smith and by his peer group.
  2. Balance the data through stratified covariate balancing so that Dr. Smith and his peer group see the same types of patients.
  3. Graphically show that the weighting procedure of stratified covariate balancing results in similar patients being treated by Dr. Smith and by his peer group.
  4. Report the un-confounded impact of Dr. Smith on length of stay using the common odds ratio of having above average length of stay. Report the impact of Dr. Smith on length of stay using the weighted length of stay. 
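The weighting in steps 2 and 4 can be illustrated with a minimal Python sketch. It assumes one common form of stratified covariate balancing: each stratum is a combination of comorbidities, and each peer-group patient in a matched stratum is weighted by the ratio of Dr. Smith's patients to peer patients in that stratum. The rows, the two-comorbidity strata, and the function name `balance` are made up for illustration and are not the assignment data or the course package.

```python
from collections import defaultdict

# Hypothetical rows: (dr_smith, stratum, length_of_stay).
# A stratum is a tuple of 0/1 comorbidity indicators; all values are made up.
rows = [
    (1, (1, 0), 5.0),
    (1, (1, 0), 7.0),
    (1, (0, 0), 3.0),
    (0, (1, 0), 4.0),
    (0, (0, 0), 2.0),
    (0, (0, 0), 4.0),
    (0, (0, 1), 9.0),  # no Dr. Smith patient in this stratum, so it is unmatched
]

def balance(rows):
    """Weight peer patients so their strata mix matches Dr. Smith's patients."""
    smith, peers = defaultdict(list), defaultdict(list)
    for treated, stratum, los in rows:
        (smith if treated == 1 else peers)[stratum].append(los)

    # Only strata seen by both Dr. Smith and his peers can be compared.
    matched = [s for s in smith if s in peers]
    n_smith = sum(len(smith[s]) for s in matched)

    # Assumed weight: Dr. Smith's count divided by the peer count in the stratum.
    weights = {s: len(smith[s]) / len(peers[s]) for s in matched}

    smith_mean = sum(sum(smith[s]) for s in matched) / n_smith
    peer_mean = sum(weights[s] * sum(peers[s]) for s in matched) / n_smith
    return weights, smith_mean, peer_mean

weights, smith_mean, peer_mean = balance(rows)
print(weights)                 # weight per matched stratum
print(smith_mean - peer_mean)  # weighted difference in length of stay
```

After weighting, the peer group has the same number of patients in each matched stratum as Dr. Smith, which is what the graphical check in step 3 should show.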

Resources for Question 1:

Question 2, Survival from Stomach Cancer: The following data provide the survival among stomach cancer patients. The data provide 35 common comorbidities for patients who have or do not have stomach cancer. Patients who have cancer are considered cases, and patients who do not have cancer are considered controls. In the analysis, one wants the overlap between cases and controls to be at least about 80%, so that findings can be generalized to most of the original cases. The percent of cases that are matched is called overlap. It is defined as:

Overlap = (number of matched cases) / (total number of cases)
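As a small illustration of this definition, the sketch below computes overlap in Python. It assumes that a case counts as matched when at least one control falls in the same stratum. The stratum labels and the function name `overlap` are hypothetical.

```python
def overlap(case_strata, control_strata):
    """Fraction of cases whose stratum also contains at least one control."""
    control_set = set(control_strata)
    matched = sum(1 for s in case_strata if s in control_set)
    return matched / len(case_strata)

# Made-up stratum labels, one entry per patient.
cases = ["A", "A", "B", "C", "D"]
controls = ["A", "B", "B", "E"]

# Cases in strata A and B are matched; cases in C and D are not.
print(overlap(cases, controls))
```

In this made-up example, 3 of 5 cases are matched, which is below the 80% target.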

  1. Using LASSO regression identify covariates that are most likely to affect survival of patients with stomach cancer.
  2. Using SQL, group the data into commonly occurring strata. Within each stratum, calculate the odds of mortality for stomach cancer.
  3. Calculate the common odds ratio across strata. Report how the un-confounded and confounded odds of mortality from stomach cancer are different from each other.
  4. Conduct sensitivity analysis for the calculated common odds ratio. Sensitivity analysis is the process of changing one variable and re-examining the conclusions. Drop one of the comorbidities from the analysis and repeat the entire analysis.
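Steps 3 and 4 can be sketched in Python as follows. The sketch assumes that the common odds ratio is the Mantel-Haenszel estimate, and it uses two made-up comorbidities (`hf`, `dm`) with made-up counts rather than the 35 comorbidities in the assignment data. The helper names `expand` and `mantel_haenszel` are hypothetical.

```python
from collections import defaultdict

def mantel_haenszel(rows, covariates):
    """Mantel-Haenszel common odds ratio of death for cancer vs. no cancer."""
    # Per stratum: a = cancer & died, b = cancer & survived,
    #              c = no cancer & died, d = no cancer & survived.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for r in rows:
        key = tuple(r[c] for c in covariates)
        idx = (0 if r["cancer"] else 2) + (0 if r["died"] else 1)
        tables[key][idx] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # strata without both cases and controls add 0
        den += b * c / n
    return num / den if den else float("nan")

def expand(cancer, died, hf, dm, count):
    """Repeat one made-up patient profile `count` times."""
    return [{"cancer": cancer, "died": died, "hf": hf, "dm": dm}] * count

rows = (
    expand(1, 1, 1, 0, 4) + expand(1, 0, 1, 0, 2) + expand(0, 1, 1, 0, 2) + expand(0, 0, 1, 0, 4)
    + expand(1, 1, 0, 0, 2) + expand(1, 0, 0, 0, 6) + expand(0, 1, 0, 0, 1) + expand(0, 0, 0, 0, 9)
    + expand(1, 1, 0, 1, 1) + expand(1, 0, 0, 1, 1) + expand(0, 1, 0, 1, 1) + expand(0, 0, 0, 1, 1)
)

covariates = ["hf", "dm"]
crude = mantel_haenszel(rows, [])             # confounded: one table, no strata
adjusted = mantel_haenszel(rows, covariates)  # un-confounded: stratified

# Sensitivity analysis: drop one comorbidity at a time and repeat.
drop_one = {
    c: mantel_haenszel(rows, [x for x in covariates if x != c])
    for c in covariates
}
print(crude, adjusted, drop_one)
```

If the drop-one estimates stay close to the adjusted estimate, the conclusion does not depend heavily on any single comorbidity.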

Resources for Question 2:

Question 3, Comparative Effectiveness of Antidepressants: These data come from the STAR*D experiment conducted by the National Institute of Mental Health. The data report the experience of approximately 4,000 patients with various antidepressants: citalopram, bupropion, mirtazapine, buspirone, lithium, nortriptyline, sertraline, thyroid, tranylcypromine, and venlafaxine. The data are reported for a total of 22,254 visits. Visits may be 2 weeks or more apart. Not every patient shows up for every scheduled visit. Organize the data so there is one row for each patient and each antidepressant trial (known in the data as Concat). Note that this field treats a combination of antidepressants as a new antidepressant. Ignore the dose of the medication. Patients received multiple antidepressants during these trials until something worked for them. Include each time a new antidepressant was tried as a separate trial. If the patient has taken the antidepressant at any time during the trial, then mark it as 1, otherwise 0. Notice that some patients have taken the medication and others have not. Patients who have not taken a particular medication have taken other medications, so at any time we are comparing one medication to alternative treatments. The medication is considered to have caused the remission if the patient is referred to the follow-up portion of the study at any point while taking the medication; i.e., the variable "Treatment_plan_equal_3" is set to 1 while taking the medication.
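The reorganization described above can be sketched in Python. The field names `Concat` and `Treatment_plan_equal_3` come from the data description; the `patient_id` field, the per-visit 0/1 medication columns, and the sample visits are assumptions made for illustration. The sketch also assumes that a patient's visits with the same Concat value belong to one trial.

```python
# Made-up visit-level records; the real file has 22,254 visits.
visits = [
    {"patient_id": 1, "Concat": "citalopram", "citalopram": 1, "bupropion": 0, "Treatment_plan_equal_3": 0},
    {"patient_id": 1, "Concat": "citalopram", "citalopram": 1, "bupropion": 0, "Treatment_plan_equal_3": 0},
    {"patient_id": 1, "Concat": "citalopram+bupropion", "citalopram": 1, "bupropion": 1, "Treatment_plan_equal_3": 0},
    {"patient_id": 1, "Concat": "citalopram+bupropion", "citalopram": 1, "bupropion": 1, "Treatment_plan_equal_3": 1},
    {"patient_id": 2, "Concat": "bupropion", "citalopram": 0, "bupropion": 1, "Treatment_plan_equal_3": 1},
]

def one_row_per_trial(visits, drugs):
    """Collapse visits into one row per patient and antidepressant trial."""
    trials = {}
    for v in visits:
        key = (v["patient_id"], v["Concat"])
        row = trials.setdefault(
            key,
            {"patient_id": v["patient_id"], "Concat": v["Concat"],
             "remission": 0, **{d: 0 for d in drugs}},
        )
        for d in drugs:
            # 1 if the medication was taken at any visit during the trial.
            row[d] = max(row[d], v[d])
        # 1 if the patient was referred to follow-up at any visit in the trial.
        row["remission"] = max(row["remission"], v["Treatment_plan_equal_3"])
    return list(trials.values())

trials = one_row_per_trial(visits, ["citalopram", "bupropion"])
for t in trials:
    print(t)
```

The same grouping can be written in SQL with GROUP BY on the patient and Concat fields and MAX over the indicator columns.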

  1. For 3 antidepressants, balance the data using SQL and stratified covariate balancing. 
  2. If necessary, use the parents in the Markov blanket of the medication to improve overlap beyond 80%.
  3. Describe which of the 3 medications a patient who has PTSD and neurological disorders should take.

Resources for Question 3:

Question 4, Individual Employee's Contribution to Patients' Dissatisfaction: When patients rate their satisfaction with care, they rate all employees within the unit serving them. The following data have been taken from nurses rounding in a facility. The time they spent with patients has been recorded. In addition, several characteristics of the patients have also been recorded and standardized. Using stratified covariate balancing, indicate whether any of the individual nurses are responsible for patients' overall satisfaction rating. Please note that the listed Teach One assignment uses the wrong command for the stratified covariate balancing package.

Resources for Question 4:


For additional information (not part of the required reading), please see the following links:

  1. Collapsing strata

This page is part of the course on Advanced Statistics II by Farrokh Alemi, Ph.D.