HAP 819: Advanced Statistics

Lecture: Multiplicative Regression 

 

Assigned Reading

  • Read Chapter 18 in Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020  Slides►
  • Cursor and do-while SQL commands Slides► YouTube►

Assignment

Submit one file for all questions. Include all charts, code, and output in the same file. Start each question on a separate page or sheet. Make the first page a summary page in which you compare your work to the answers given or to the videos. For example, "I got the same answers as the Teach One video for question 1."

Question 1: Estimate the 6-month mortality rate for lung cancer patients with various common comorbidities.

  1. Use LASSO Logistic Regression to identify comorbidities that are predictive of survival in 6 months.
  2. For comorbidities that worsen the prognosis of lung cancer (i.e., comorbidities with non-zero coefficients in the regression), identify the impact of each comorbidity by itself, without any other comorbidities. Report the number of such cases and the probability of mortality within 6 months.
  3. Fit a multiplicative model to the data and estimate the parameters of the model.
  4. Estimate the overall k parameter for the multiplicative model. Estimate k by trying -1, 0, and +1. Use the following relationship among the parameters to estimate k:
    [Formula image: k hyperparameter for multiplicative regression]
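
Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the course's own formula: it assumes the standard multiplicative multi-attribute form 1 + k·P = Π(1 + k·w_i·x_i), where x_i is the 0/1 indicator for comorbidity i, w_i is the probability of mortality with comorbidity i alone (step 2), and P is the overall probability. The function names `multiplicative_risk` and `best_k`, and the choice of mean squared error as the fit measure, are hypothetical.

```python
from math import prod

def multiplicative_risk(x, w, k):
    """Overall risk P that solves 1 + k*P = prod(1 + k*w_i*x_i).

    x: 0/1 indicators, one per comorbidity
    w: single-comorbidity risks (assumed to come from step 2)
    k: interaction constant; k = 0 is treated as the additive limit
    """
    if k == 0:
        return sum(wi * xi for wi, xi in zip(w, x))
    return (prod(1 + k * wi * xi for wi, xi in zip(w, x)) - 1) / k

def best_k(rows, outcomes, w, candidates=(-1, 0, 1)):
    """Return the candidate k with the lowest mean squared error."""
    def mse(k):
        errors = [(multiplicative_risk(x, w, k) - y) ** 2
                  for x, y in zip(rows, outcomes)]
        return sum(errors) / len(errors)
    return min(candidates, key=mse)
```

Under this assumed form, k = -1 gives P = 1 - Π(1 - w_i·x_i), k = 0 gives a simple sum, and k = +1 gives P = Π(1 + w_i·x_i) - 1, so the three candidate values correspond to three different ways the comorbidities can combine.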

Resources for Question 1:

Question 2: Many patients experience disabilities at the end of life. In fact, disabilities are often used to anticipate the end of life. The attached data show the disabilities experienced by residents of Veterans Administration nursing homes. Estimate how various disabilities predict mortality in 6 months. The data do not have headers. The variables are listed in the following order: ID, age, gender (M for male, F for female), number of assessments completed on the person, number of days followed, days since first assessment, days to last assessment, unable to eat, unable to transfer, unable to groom, unable to toilet, unable to bathe, unable to walk, unable to dress, unable to bowel, unable to urine, dead (1) or alive (0), and assessment number. The following table should assist in organizing the data.

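| ID | Age | Sex | Total Assess | Days Followed | Days from First | Days from Last | Unable to Eat | Unable to Sit | Unable to Groom | Unable to Toilet | Unable to Bathe | Unable to Walk | Unable to Dress | Unable to Bowel | Unable to Urine | Alive | ID for Assess |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 66 | M | 9 | 915 | 0 | 915 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 66 | M | 9 | 915 | 7 | 908 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 |
| 1 | 66 | M | 9 | 915 | 18 | 897 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 |
| 1 | 66 | M | 9 | 915 | 238 | 677 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 |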

  1. Clean the data using the following steps:
    • For each assessment, calculate whether the patient dies within 6 months of the assessment. If the patient never dies, assume not dead in 6 months.
    • At death, assume that the patient has all disabilities.
    • Drop the last assessment, as no outcomes can be calculated from it.
    • Assume age at assessment is age at first assessment plus (days to assessment)/365.
    • Residents with a negative age should be dropped because of date of birth errors.
    • Residents aged 100 years or older should be dropped because of the small sample.
    • Recode all variables as binary (0 or 1), assigning 1 to the level that increases the odds of mortality.
  2. Predict from the patient's assessments (i.e. their age, gender, and disabilities at time of assessment) if the patient is likely to die in the next 6 months and may be a candidate for hospice care.   
  3. Calculate the fit to the data for possible k values of 1.0, 0.0, and -1.0. Calculate the k constant for the multiplicative model that fits the following formula:
      [Formula image: k hyperparameter for multiplicative regression]
  4. Use the model you have developed to predict the probability of mortality for a 75-year-old resident with only urine, bowel, and toilet disabilities.
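
Several of the cleaning steps in item 1 can be sketched with pandas. This is only a sketch under stated assumptions: the column names are hypothetical (the file has no headers), death is assumed to coincide with the patient's last assessment, a patient is treated as deceased if any of their rows has dead = 1, and 6 months is approximated as 182 days. The binary recoding and the "all disabilities at death" step are left out.

```python
import pandas as pd

# Hypothetical names for the 18 headerless columns, in the order listed above.
# A headerless file could be read with, e.g.:
#   pd.read_csv(path, header=None, names=COLS)
COLS = ["id", "age", "sex", "n_assess", "days_followed", "days_from_first",
        "days_to_last", "eat", "transfer", "groom", "toilet", "bathe",
        "walk", "dress", "bowel", "urine", "dead", "assess_no"]

def clean(df: pd.DataFrame, window_days: int = 182) -> pd.DataFrame:
    df = df.copy()
    # Assumption: the patient died if any of their rows is flagged dead,
    # and death falls on the date of the last assessment.
    died = df.groupby("id")["dead"].transform("max") == 1
    df["dead_6mo"] = (died & (df["days_to_last"] <= window_days)).astype(int)
    # Drop each patient's last assessment: no outcome can follow it.
    last_no = df.groupby("id")["assess_no"].transform("max")
    df = df[df["assess_no"] < last_no].copy()
    # Age at assessment = age at first assessment + elapsed years.
    df["age_at_assess"] = df["age"] + df["days_from_first"] / 365
    # Drop negative ages (date of birth errors) and ages of 100 or more.
    keep = (df["age_at_assess"] >= 0) & (df["age_at_assess"] < 100)
    return df[keep].reset_index(drop=True)
```

Patients who never die get dead_6mo = 0 on every row, which matches the first cleaning step.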

The following resources are available for Question 2:

Question 3: These data come from the STAR*D experiment, funded by the National Institute of Mental Health. Use the instructor's last name as the password. The data report the experience of approximately 4,000 patients receiving various antidepressants: citalopram, bupropion, mirtazapine, buspirone, lithium, nortriptyline, sertraline, thyroid, tranylcypromine, and venlafaxine. The data are reported for a total of 22,254 visits. Visits may be 2 weeks or more apart. Not every patient shows up for every scheduled visit.

Organize the data so there is one row for each patient and each antidepressant trial (known in the data as Concat). Note that this field treats a combination of antidepressants as a new antidepressant. Ignore the dose of the medication. Patients received multiple antidepressants during these trials until something worked for them. Include each time a new antidepressant was tried as a separate trial. If the patient has taken the antidepressant at any time during the trial, mark it as 1, otherwise 0. Notice that some patients have taken the medication and others have not. Patients who have not taken a particular medication have taken other medications, so at any time we are comparing one medication to alternative treatments. The medication is considered to have caused the remission if the patient is referred to the follow-up portion of the study, i.e., the variable "Treatment_plan_equal_3" is set to 1.
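
The reorganization into one row per patient and trial can be sketched with pandas. In this sketch, "Concat" and "Treatment_plan_equal_3" are the names given above; the patient identifier column `patient_id`, the helper name `one_row_per_trial`, and the assumption that each medication column holds a dose (0 when not taken) are hypothetical.

```python
import pandas as pd

def one_row_per_trial(visits, med_cols, id_col="patient_id",
                      trial_col="Concat",
                      outcome_col="Treatment_plan_equal_3"):
    """Collapse visit-level rows to one row per patient and trial."""
    work = visits[[id_col, trial_col, outcome_col]].copy()
    for med in med_cols:
        # Ignore dose: any positive dose during a visit counts as taken.
        work[med] = (visits[med] > 0).astype(int)
    # max() over a trial gives 1 if the medication was taken (or remission
    # was recorded) at any visit in that trial, otherwise 0.
    return work.groupby([id_col, trial_col], as_index=False).max()
```

For example, `one_row_per_trial(visits, ["bupropion", "citalopram"])` would return one row per patient and Concat value, with 0/1 flags for the two medications and for remission.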

  1. Clean and organize the data for analysis of bupropion
  2. Identify the baseline characteristics that are non-zero LASSO predictors of receiving bupropion
  3. Create a multiplicative model of the impact of baseline variables and bupropion on remission.
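
Step 2 can be sketched with scikit-learn's L1-penalized logistic regression. The helper name `lasso_selected`, the predictor names, and the penalty strength `C=0.1` are illustrative assumptions; in practice the penalty would normally be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lasso_selected(X, y, names, C=0.1):
    """Return {name: coefficient} for predictors with non-zero coefficients.

    X: 2-D array of baseline characteristics (one row per trial)
    y: 0/1 indicator, e.g. whether bupropion was received in the trial
    C: inverse penalty strength; smaller values zero out more predictors
    """
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X, y)
    coefs = model.coef_.ravel()
    return {n: float(c) for n, c in zip(names, coefs) if c != 0.0}
```

The predictors returned here are the candidates to carry forward, together with bupropion, into the multiplicative model of remission in step 3.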

Resources for Question 3:

More

For additional information (not part of the required reading), please see the following links:

  1. Multi-attribute preference functions. Health Utilities Index.  PubMed►
  2. Utility functions for health profiles PubMed►
  3. How decisions reveal our preferences PubMed►

This page is part of the course on Comparative Effectiveness by Farrokh Alemi PhD