Lecture: Stratified Regression 


Assigned Reading

  • Stratified Regression (use the instructor's last name as the password)
    • Read Chapter 18 in Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020 
    • Slides►
  • Cursor and do-while SQL commands 


Submit one file for all questions. Include all charts, code, and output in the same file. Start each question on a separate page or sheet. Make the first page a summary page. On the summary page, write statements comparing your work to the answers given or to the videos. For example, "I got the same answers as the Teach One video for question 1."

Question 1: Estimate the 6-month mortality rate for lung cancer patients with various common comorbidities.  Data► SQL► Jehanzeb's Solution► Video► SQL & R Code Combined►

  • Identify the parents in the Markov blanket of lung cancer.
  • Verify that all comorbidities make the prognosis of lung cancer worse.
  • Use SQL code and the parents in the Markov blanket of lung cancer to estimate survival from lung cancer.
  • Use SQL to construct case/control comparisons for each comorbidity of lung cancer.
  • Use SQL to estimate the intercept and the parameters of the multiplicative functional form. Estimate the overall k parameter for the multiplicative model.
  • Report the mortality rate for patients who have only lung cancer and no other comorbidities.
  • Provide the equation that calculates the risk for the combination of lung cancer and its comorbidities.
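
The case/control comparisons in the list above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration on a small hypothetical table, not the course data or the posted SQL: the table name `lung` and the columns `chf`, `copd`, `diabetes`, and `dead6` are assumptions. For each comorbidity it compares 6-month mortality among lung cancer patients who have only that comorbidity with mortality among patients who have lung cancer alone.

```python
import sqlite3

# Hypothetical toy data: one row per lung cancer patient.
# chf, copd, diabetes are 0/1 comorbidity flags; dead6 = 1 if died within 6 months.
ROWS = [
    (1, 0, 0, 0, 0), (2, 0, 0, 0, 1), (3, 0, 0, 0, 0), (4, 0, 0, 0, 0),
    (5, 1, 0, 0, 1), (6, 1, 0, 0, 1), (7, 1, 0, 0, 0),
    (8, 0, 1, 0, 1), (9, 0, 1, 0, 0),
    (10, 0, 0, 1, 1), (11, 0, 0, 1, 0), (12, 0, 0, 1, 0),
    (13, 1, 1, 0, 1),
]
COMORBIDITIES = ["chf", "copd", "diabetes"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lung (id INTEGER, chf INTEGER, copd INTEGER, "
            "diabetes INTEGER, dead6 INTEGER)")
con.executemany("INSERT INTO lung VALUES (?, ?, ?, ?, ?)", ROWS)


def compare(comorbidity):
    """Return {0: (rate, n), 1: (rate, n)}: lung cancer alone vs. with only `comorbidity`."""
    others = [c for c in COMORBIDITIES if c != comorbidity]
    # Column names come from the fixed list above, so string formatting is safe here.
    where = " AND ".join(f"{c} = 0" for c in others)
    sql = (f"SELECT {comorbidity}, AVG(dead6), COUNT(*) FROM lung "
           f"WHERE {where} GROUP BY {comorbidity}")
    return {exposed: (rate, n) for exposed, rate, n in con.execute(sql)}


results = {c: compare(c) for c in COMORBIDITIES}
for c, r in results.items():
    alone, with_c = r[0][0], r[1][0]
    print(f"{c}: alone {alone:.2f} (n={r[0][1]}), with {c} {with_c:.2f} "
          f"(n={r[1][1]}), worse: {with_c > alone}")
```

The row for lung cancer alone (exposed = 0) doubles as the baseline mortality rate requested in the list. Patients with two or more comorbidities (patient 13 here) fall out of every single-comorbidity comparison.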

Question 2: Many patients experience disabilities at the end of life. In fact, disabilities are often used to anticipate the end of life. The attached data show the disabilities that residents of Veterans Administration nursing homes have experienced. Estimate how various disabilities predict mortality within 6 months. The data do not have headers. The variables are listed in the following order: ID, age, gender (M for male, F for female), number of assessments completed on the person, number of days followed, days since first assessment, days to last assessment, unable to eat, unable to transfer, unable to groom, unable to toilet, unable to bathe, unable to walk, unable to dress, unable to bowel, unable to urine, dead (1) or alive (0), and assessment number. The following table should assist in organizing the data.

ID Age Sex tAssess Followed DaysFirst DaysLast uEat uSit uGroom uToilet uBathe uWalk uDress uBowel uUrine Alive AssessID
1 66 M 9 915 0 915 0 0 0 0 0 1 0 0 0 0 1
1 66 M 9 915 7 908 0 0 0 0 0 1 0 0 0 0 2
1 66 M 9 915 18 897 0 0 0 0 0 1 0 0 0 0 3
1 66 M 9 915 238 677 0 0 0 0 0 1 0 0 0 0 4
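
A minimal sketch of loading the headerless file with the column names from the table above. It assumes the fields are whitespace-separated, as in the sample rows (the real file's delimiter may differ), and uses an in-memory SQLite table named `nh`, a hypothetical name. Per the variable list, the column headed Alive holds 1 for dead and 0 for alive.

```python
import sqlite3

# Column names from the table above; the file itself has no header row.
# Per the variable list, "Alive" holds 1 = dead, 0 = alive.
COLUMNS = ["ID", "Age", "Sex", "tAssess", "Followed", "DaysFirst", "DaysLast",
           "uEat", "uSit", "uGroom", "uToilet", "uBathe", "uWalk", "uDress",
           "uBowel", "uUrine", "Alive", "AssessID"]

# The four sample rows shown above (assumed whitespace-separated).
SAMPLE = """\
1 66 M 9 915 0 915 0 0 0 0 0 1 0 0 0 0 1
1 66 M 9 915 7 908 0 0 0 0 0 1 0 0 0 0 2
1 66 M 9 915 18 897 0 0 0 0 0 1 0 0 0 0 3
1 66 M 9 915 238 677 0 0 0 0 0 1 0 0 0 0 4
"""


def parse(text):
    """Split each line into 18 fields; Sex stays text, everything else becomes int."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"line {line_no}: expected {len(COLUMNS)} fields, got {len(fields)}")
        rows.append(tuple(f if name == "Sex" else int(f) for name, f in zip(COLUMNS, fields)))
    return rows


rows = parse(SAMPLE)
con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE nh ({', '.join(COLUMNS)})")
con.executemany(f"INSERT INTO nh VALUES ({', '.join('?' * len(COLUMNS))})", rows)
print(con.execute("SELECT COUNT(*), MAX(DaysFirst) FROM nh").fetchone())
```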

  • Clean the data using the following steps:
    • The age at death is given as a separate row of data. For each assessment, calculate whether the patient dies within 6 months of the assessment. If the patient never dies, assume the patient is not dead in 6 months.
    • At death, assume that the patient has all disabilities, as the data indicate no disabilities at death.
    • Drop the last assessment, as no outcome can be calculated from it.
    • Assume that age at assessment is the age at first assessment (given as the second variable) plus days to assessment/365.
    • Drop residents with negative age, because these reflect date-of-birth errors.
    • Drop residents aged 100 or more because of the small sample.
    • Note that the analysis is done at the assessment level, not the patient level.  Data► Clean►
  • Predict from the patient's assessments (i.e., their age, gender, and disabilities at the time of assessment) whether the patient is likely to die in the next 6 months and may be a candidate for hospice care. Do not use regression in this analysis; estimate the parameters using SQL.  SQL► Answer►
  • Calculate the k constant for the multiplicative model using SQL.  SQL►
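    Generate possible k values and see which of them satisfies the equation:
     $1 + k = \prod_{i=1}^{n} \left(1 + k\,k_i\right)$
    where $k_i$ is the parameter for the $i$-th variable and $n$ is the number of variables in the model.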
  • Use the model you have developed to predict the probability of mortality for a 75-year-old resident with urine, bowel, and toilet disabilities. Enter the case description into a table called RecentCases using CREATE TABLE and INSERT INTO ... VALUES commands. Then use this table to predict the probability of mortality for this resident.  SQL►
    Make sure that the probability of mortality is adjusted to range between the minimum and maximum probabilities of the strata. Stratified regression provides a transformed probability that should be adjusted to estimate the actual probability using this formula:
    $P = \text{Min} + (\text{Max} - \text{Min}) \times P_{\text{transformed}}$
    where Max and Min are the maximum and minimum probabilities across the strata.
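
The cleaning steps for Question 2 can be sketched in plain Python. This is an illustration under stated assumptions, not the course's Clean file: 6 months is taken as 182 days, the death row is identified by a 1 in the dead/alive column (headed Alive in the table), residents with negative age are dropped entirely, and assessments at age 100 or more are dropped one by one. The helper `make` and the example records are hypothetical.

```python
from collections import defaultdict

DISABILITIES = ["uEat", "uSit", "uGroom", "uToilet", "uBathe",
                "uWalk", "uDress", "uBowel", "uUrine"]
SIX_MONTHS = 182  # assumption: 6 months = 182 days


def clean(records):
    """records: dicts keyed by the column names in the table above.
    The column headed 'Alive' is read as 1 = dead, per the variable list."""
    by_patient = defaultdict(list)
    for r in records:
        by_patient[r["ID"]].append(dict(r))  # copy so the input is not modified
    out = []
    for rows in by_patient.values():
        if rows[0]["Age"] < 0:  # date-of-birth error: drop the resident
            continue
        rows.sort(key=lambda r: r["AssessID"])
        death_days = [r["DaysFirst"] for r in rows if r["Alive"] == 1]
        death_day = min(death_days) if death_days else None  # None: never dies
        for r in rows:
            if r["Alive"] == 1:  # at death, assume every disability is present
                r.update({d: 1 for d in DISABILITIES})
            r["AgeAtAssess"] = r["Age"] + r["DaysFirst"] / 365
            r["Dead6"] = int(death_day is not None
                             and 0 <= death_day - r["DaysFirst"] <= SIX_MONTHS)
        # Drop the last assessment (no outcome can be computed from it), then
        # drop assessments at age 100 or more (assumption: applied per assessment).
        out.extend(r for r in rows[:-1] if r["AgeAtAssess"] < 100)
    return out


def make(pid, age, days_first, dead, assess_id):
    """Hypothetical helper that builds a minimal record for the example."""
    r = {"ID": pid, "Age": age, "DaysFirst": days_first,
         "Alive": dead, "AssessID": assess_id}
    r.update({d: 0 for d in DISABILITIES})
    return r


records = [make(1, 66, d, 0, i + 1) for i, d in enumerate([0, 7, 18, 238])]
records.append(make(1, 66, 300, 1, 5))                    # hypothetical death on day 300
records += [make(2, 80, 0, 0, 1), make(2, 80, 30, 0, 2)]  # never dies
records.append(make(3, -3, 0, 0, 1))                      # negative age: dropped
cleaned = clean(records)
print([(r["ID"], r["AssessID"], r["Dead6"]) for r in cleaned])
```

Only the assessment on day 238 falls within 182 days of the hypothetical death on day 300, so it is the only one labeled Dead6 = 1.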
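
The k constant and the prediction for the 75-year-old resident can be sketched as follows. The single-attribute parameters `K_I`, the stratum extremes `P_MIN` and `P_MAX`, and the dichotomization of age at 75 (`Age75Plus`) are hypothetical placeholders, not estimates from the VA data. The sketch generates candidate k values on a grid, as the assignment describes, refines the bracketing pair by bisection, scores the case with the standard multiplicative form U = (prod(1 + k k_i x_i) - 1) / k, and rescales U linearly to the range of the strata.

```python
import math
import sqlite3

# Hypothetical single-attribute parameters k_i and stratum extremes;
# replace them with the values estimated from the nursing home strata.
K_I = {"Age75Plus": 0.10, "uEat": 0.20, "uUrine": 0.10, "uBowel": 0.10, "uToilet": 0.10}
P_MIN, P_MAX = 0.05, 0.60


def residual(k, ks):
    """prod(1 + k*k_i) - (1 + k); zero at the multiplicative k constant."""
    return math.prod(1 + k * v for v in ks) - (1 + k)


def solve_k(ks, n=10_000, upper=100.0):
    """Scan a grid of candidate k values, then bisect the pair that brackets the root."""
    total = sum(ks)
    if math.isclose(total, 1.0):
        return 0.0  # weights sum to 1: the model is additive
    lo, hi = (-1.0, 0.0) if total > 1 else (0.0, upper)
    grid = [lo + (hi - lo) * j / n for j in range(1, n)]  # excludes the trivial root k = 0
    for a, b in zip(grid, grid[1:]):
        if residual(a, ks) * residual(b, ks) <= 0:
            for _ in range(60):
                m = (a + b) / 2
                if residual(a, ks) * residual(m, ks) <= 0:
                    b = m
                else:
                    a = m
            return (a + b) / 2
    raise ValueError("no non-zero root on the grid; try a larger upper bound")


def score(x, k):
    """Multiplicative form U = (prod(1 + k*k_i*x_i) - 1) / k, between 0 and 1."""
    if k == 0:
        return sum(K_I[a] * x[a] for a in K_I)
    return (math.prod(1 + k * K_I[a] * x[a] for a in K_I) - 1) / k


def rescale(u):
    """Transformed probability -> probability between P_MIN and P_MAX."""
    return P_MIN + (P_MAX - P_MIN) * u


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE RecentCases (ID INTEGER, Age REAL, uEat INTEGER, "
            "uUrine INTEGER, uBowel INTEGER, uToilet INTEGER)")
con.execute("INSERT INTO RecentCases VALUES (1, 75, 0, 1, 1, 1)")

k = solve_k(list(K_I.values()))
for case_id, age, *flags in con.execute(
        "SELECT ID, Age, uEat, uUrine, uBowel, uToilet FROM RecentCases"):
    x = dict(zip(["uEat", "uUrine", "uBowel", "uToilet"], flags))
    x["Age75Plus"] = int(age >= 75)  # hypothetical dichotomization of age
    u = score(x, k)
    p = rescale(u)
    print(f"case {case_id}: k = {k:.3f}, transformed = {u:.3f}, mortality = {p:.3f}")
```

Because the k_i here sum to less than 1, the non-zero root is positive; when they sum to more than 1, it lies between -1 and 0, which is why the grid range depends on the sum.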

Question 3: The following data show the variation in diabetes prevalence in select counties across the United States. Using stratified covariate balancing, report the impact of access to supermarkets on diabetes after controlling for other variables.  Data►

  • Check that all variables are positively and monotonically related to the prevalence of diabetes in the county.  Monotone?►
  • Assign a binary variable to each variable in such a manner that when the variable is 1, diabetes is more likely.
  • Create a multiplicative model for predicting diabetes.
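
One way to carry out the first two steps, sketched in Python under assumptions: monotonicity is judged from the mean prevalence within equal-count bins of each variable, and the binary flag splits the variable at its median, set to 1 on the side where diabetes is more common. The bin count, the median cut point, and the toy county numbers are illustrative choices, not part of the assignment's data.

```python
from statistics import mean, median


def bin_means(xs, ys, n_bins=4):
    """Mean outcome within equal-count bins of x, from lowest to highest x."""
    pairs = sorted(zip(xs, ys))
    cuts = [i * len(pairs) // n_bins for i in range(n_bins + 1)]
    return [mean(y for _, y in pairs[a:b]) for a, b in zip(cuts, cuts[1:])]


def direction(means):
    """+1 if bin means never decrease, -1 if they never increase, 0 otherwise."""
    steps = list(zip(means, means[1:]))
    if all(a <= b for a, b in steps):
        return 1
    if all(a >= b for a, b in steps):
        return -1
    return 0


def dichotomize(xs, sign):
    """0/1 flag that is 1 on the side of the median where diabetes is higher."""
    if sign == 0:
        raise ValueError("variable is not monotone; revisit it before dichotomizing")
    cut = median(xs)
    return [int(x > cut) if sign > 0 else int(x < cut) for x in xs]


# Hypothetical county data: percent of residents with low supermarket access
# and percent with diabetes.
low_access = [2, 5, 7, 9, 12, 15, 18, 22]
diabetes = [7.0, 8.1, 8.0, 9.2, 10.1, 10.4, 11.0, 12.3]
means = bin_means(low_access, diabetes)
sign = direction(means)
flag = dichotomize(low_access, sign)
print(means, sign, flag)
```

The 0/1 flags produced this way are the inputs to the multiplicative model in the last step.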

Supercomputer help

Question 4: These data come from the STAR*D trial funded by the National Institute of Mental Health. Use the instructor's last name as the password. The data report the experience of approximately 4,000 patients with various antidepressants: citalopram, bupropion, mirtazapine, buspirone, lithium, nortriptyline, sertraline, thyroid, tranylcypromine, and venlafaxine.  Data► Protocol►

The data are reported for a total of 22,254 visits. Visits may be 2 weeks or more apart, and not every patient shows up for every scheduled visit. Organize the data so that there is one row for each patient and each antidepressant trial (known in the data as Concat). Note that this field treats a combination of antidepressants as a new antidepressant. Ignore the dose of the medication. Patients received multiple antidepressants during these trials until something worked for them; include each time a new antidepressant was tried as a separate trial. If the patient took the antidepressant at any time during the trial, mark it as 1; otherwise, mark it as 0. Some patients have taken the medication and others have not. Patients who have not taken a particular medication have taken other medications, so at any time we are comparing one medication to alternative treatments.

The medication is considered to have caused remission if the patient is referred to the follow-up portion of the study at any point while taking the medication, i.e., the variable "Treatment_plan_equal_3" is set to 1 while taking the medication.
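
A sketch of the one-row-per-trial layout using SQLite. The visit-level table `visits` and its columns `patient_id`, `visit_day`, and `bupropion` are hypothetical, as are the Concat values; only `Concat` and `Treatment_plan_equal_3` come from the description above. It also assumes a patient does not return to the same Concat after switching, so grouping by patient and Concat yields one row per trial.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical visit-level layout: one row per visit.
con.execute("""CREATE TABLE visits (
    patient_id INTEGER, visit_day INTEGER, Concat TEXT,
    bupropion INTEGER, Treatment_plan_equal_3 INTEGER)""")
con.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", [
    (1, 0, "citalopram", 0, 0),
    (1, 14, "citalopram", 0, 0),
    (1, 42, "citalopram+bupropion", 1, 0),
    (1, 70, "citalopram+bupropion", 1, 1),
    (2, 0, "citalopram", 0, 0),
    (2, 28, "sertraline", 0, 1),
])

trials = con.execute("""
    SELECT patient_id, Concat,
           MAX(bupropion) AS bupropion,               -- 1 if taken at any visit of the trial
           MAX(Treatment_plan_equal_3) AS remission,  -- 1 if referred to follow-up during the trial
           COUNT(*) AS n_visits
    FROM visits
    GROUP BY patient_id, Concat
    ORDER BY patient_id, MIN(visit_day)
""").fetchall()
for row in trials:
    print(row)
```

Each output row is one trial, so bupropion trials can then be compared with all other trials on the remission column.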

  1. Clean and organize the data for analysis of bupropion
  2. Identify the parents in the Markov blanket of bupropion
  3. Create a multiplicative model of the impact of the parents in the Markov blanket of bupropion, and of bupropion itself, on remission.
  4. Predict the remission rate for bupropion using the two nearest strata.  SQL►
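
Step 4 can be read in more than one way; one possible reading is sketched below. It assumes strata are tuples of 0/1 values (bupropion first, then the Markov-blanket parents), that the distance between strata is the number of mismatched values, and that the two nearest bupropion strata are pooled by adding their remissions and trials. The stratum counts are made up for illustration.

```python
def two_nearest_strata(strata, target):
    """Pool the two strata closest to `target`.

    strata: dict mapping a tuple of 0/1 values -> (remissions, trials).
    Distance is the number of positions where the tuples differ; ties are
    broken by the tuple itself so the result is deterministic.
    """
    def distance(s):
        return sum(a != b for a, b in zip(s, target))

    nearest = sorted(strata, key=lambda s: (distance(s), s))[:2]
    remissions = sum(strata[s][0] for s in nearest)
    trials = sum(strata[s][1] for s in nearest)
    return remissions / trials, nearest


# Hypothetical counts; key = (bupropion, parent_1, parent_2).
STRATA = {
    (1, 0, 0): (6, 20),
    (1, 1, 0): (3, 15),
    (1, 0, 1): (4, 10),
    (0, 1, 1): (2, 12),
}
target = (1, 1, 1)  # no bupropion trials observed with this profile
on_bupropion = {s: c for s, c in STRATA.items() if s[0] == 1}
rate, used = two_nearest_strata(on_bupropion, target)
print(f"predicted remission {rate:.2f} from strata {used}")
```

Pooling counts rather than averaging the two rates gives larger strata more weight; averaging the rates is an equally simple alternative.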


For additional information (not part of the required reading), please see the following links:

  1. Multi-attribute preference functions. Health Utilities Index.  PubMed►
  2. Utility functions for health profiles PubMed►
  3. How decisions reveal our preferences PubMed►

This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD.