Lecture: Logistic Regression  


Assigned Reading

  • Session overview YouTube►
  • Logistic regression
    • Read Chapter 12 in Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020
  • Replacing logistic regression with ordinary regression


Assignments should be submitted in Blackboard.  Include a summary page as the first page.  On the summary page, write statements comparing your work to the answers given or to the videos.  For example, "I got the same answers as the Teach One video for question 1." Or you can write: "There was no answer sheet available for question 2."  We prefer that assignments be done in Python.

Question 1: What is the logit of an event that has a probability of 0.75? What if the probability is 0.80? Answer► Teach One (Python)►
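The logit of a probability p is the natural log of the odds, ln(p / (1 - p)). A minimal sketch of that calculation follows; the function name `logit` is chosen here for illustration and is not taken from the course materials.

```python
import math

def logit(p: float) -> float:
    """Return the log odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Print the odds and the logit for the two probabilities in the question.
for p in (0.75, 0.80):
    print(f"p = {p:.2f}, odds = {p / (1 - p):.2f}, logit = {logit(p):.4f}")
```

Compare the printed values with the answer sheet before reporting them on your summary page.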

Question 2: Regress survival in the next 6 months on the patients' disabilities, age, gender, and whether they participated in the Medical Foster Home (MFH) program. MFH is an intervention for nursing home patients.  In this program, nursing home patients are diverted to a community home and health care services are delivered within the community home.  The resident eats with the family and relies on the family members for socialization, food and comfort.  It is called a "foster" home because the family previously living in the community home is supposed to act like the resident's family. Enrollment in MFH is indicated by the variable MFH=1. 

Survival is reported in two variables.  One variable indicates survival at 6 months.  The other reports the number of days the patient was known to survive if the patient has died, and is null otherwise.  Thus a null value in this latter variable indicates that the patient did not die.  

The functional disabilities are reported as probabilities that the patient has the disability.  These probabilities are generated from the CCS diagnoses and demographics of the person. Use long-term disabilities, which are the disabilities with the suffix 365.  If the probability of a disability is higher than 0.5, assume the person is disabled. 

  1. Clean the data using SQL.  Convert the disabilities to binary variables.  Convert age to decades using the Floor function.
  2. Group the data by MFH,  disabilities, binary race variables, gender, and decades.  Call each combination a stratum.
  3. Calculate the probability of survival in each stratum.  
  4. Create a regression model to explain the relationship among the variables and survival. 
  5. Explain the probability of survival for the stratum with the highest probability of survival.  
  6. List the top 4 predictors of survival (list these predictors using English language and not coded data). 
  7. Describe, in English, if the MFH program contributes to survival.  Provide the evidence for your claim.
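
Steps 1 through 3 can be prototyped in Python before writing the SQL. The sketch below is only an illustration under stated assumptions: the records and the column names (`mfh`, `age`, `gender`, `walk365`, `eat365`, `survived6m`) are hypothetical and do not come from the course data, only two disabilities are shown, and the binary race variables are left out for brevity.

```python
import math
from collections import defaultdict

# Hypothetical records; the real data has more columns and different names.
patients = [
    {"mfh": 1, "age": 83, "gender": "M", "walk365": 0.72, "eat365": 0.10, "survived6m": 1},
    {"mfh": 1, "age": 88, "gender": "M", "walk365": 0.64, "eat365": 0.31, "survived6m": 0},
    {"mfh": 0, "age": 79, "gender": "F", "walk365": 0.20, "eat365": 0.55, "survived6m": 1},
    {"mfh": 0, "age": 71, "gender": "F", "walk365": 0.45, "eat365": 0.90, "survived6m": 1},
    {"mfh": 0, "age": 65, "gender": "M", "walk365": 0.50, "eat365": 0.05, "survived6m": 0},
]

DISABILITIES = ("walk365", "eat365")  # hypothetical long-term disability columns

def to_stratum(row):
    """Build the stratum key: MFH, binary disabilities, gender, age decade."""
    # A disability counts as present only if its probability is above 0.5.
    binary = tuple(int(row[d] > 0.5) for d in DISABILITIES)
    decade = math.floor(row["age"] / 10)
    return (row["mfh"], binary, row["gender"], decade)

# stratum -> [number of survivors, number of patients]
counts = defaultdict(lambda: [0, 0])
for row in patients:
    tally = counts[to_stratum(row)]
    tally[0] += row["survived6m"]
    tally[1] += 1

# Probability of survival in each stratum.
survival = {stratum: s / n for stratum, (s, n) in counts.items()}
for stratum, prob in survival.items():
    print(stratum, counts[stratum][1], prob)
```

In SQL the same logic would typically be a `GROUP BY` over the binary and decade columns; keeping the patient count per stratum alongside the probability helps when deciding how much weight a small stratum deserves.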

Use the instructor's last name as the password for the data.  Data► Teach One (Python)►

Question 3: The following data provide the length of stay of patients seen by Dr. Smith (variable Dr. Smith = 1) and by his peer group (variable Dr. Smith = 0).  Does Dr. Smith see a different set of patients than his peer group?  In particular, what is the probability of a patient being seen by Dr. Smith?  Regress the choice of provider on the 9 diagnoses provided.  Data►  Kavalloor's Teach One► Teach One (Python)►

Question 4:  In a nursing home, data were collected on residents' survival and disabilities.  The data are listed in the following order: ID, age, gender (M for male, F for Female), number of assessments completed on the person, number of days followed, days since first assessment, days to last assessment, unable to eat, unable to transfer, unable to groom, unable to toilet, unable to bathe, unable to walk, unable to dress, unable to bowel, unable to urine, dead (1) or alive (0), and assessment number.  Predict from the patient's assessments (i.e. their age and current disabilities at time of assessment) if the patient is likely to die. Here are the steps in this analysis:  Data► Joo Li's Teach One► Joo Li's SQL Code► Teach One (Python)►

  1. Read the data, making sure all entries are numbers.  Calculate age at each assessment, not just at the first assessment. 
  2. Clean the data, removing impossible situations (remove cases with date of assessment after death). 
  3. Remove irrelevant cases (all cases that have only one assessment)
  4. For each assessment, remove all assessments that are more than 6 months older. 
  5. Convert age at the current assessment into a binary variable indicating whether it is above or below the average age at assessment.
  6. Calculate a new variable for each assessment that checks whether the person will have an eating disability in the next 6 months.  This requires you to join the data for each person with itself (excluding the current assessment and all prior assessments). 
  7. Group the data based on current disabilities, gender, and age.  Count the number of residents who died within 6 months of assessment for each combination of disabilities, gender, and age.  To do this, first calculate the number of days from the first assessment to death.  Then check whether the assessment occurred within 180 days before the day of death. 
  8. Use ordinary regression to regress the logit of the probability of dying (the log odds) on current disabilities, age, gender, and pairwise interactions of these variables. 
  9. Identify the Markov blanket of feeding disability in 6 months. 
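
The self-join in step 6 is easier to reason about on a toy example. The sketch below pairs each assessment with the later assessments of the same resident and flags whether an eating disability appears within the window. The tuples and variable names are hypothetical, and 6 months is taken as 180 days, as in step 7.

```python
from collections import defaultdict

# Hypothetical assessments: (resident id, day of assessment, unable to eat).
assessments = [
    (1, 0, 0), (1, 90, 0), (1, 150, 1), (1, 400, 1),
    (2, 0, 0), (2, 200, 1),
]
WINDOW_DAYS = 180  # assumption: 6 months approximated as 180 days

# Index assessments by resident so each row can be matched to its own history.
by_resident = defaultdict(list)
for rid, day, unable_to_eat in assessments:
    by_resident[rid].append((day, unable_to_eat))

labeled = []
for rid, day, unable_to_eat in assessments:
    # Self-join: keep only strictly later assessments of the same resident
    # that fall within the window, then check for an eating disability.
    eats_disabled_later = any(
        later_unable == 1
        for later_day, later_unable in by_resident[rid]
        if day < later_day <= day + WINDOW_DAYS
    )
    labeled.append((rid, day, unable_to_eat, int(eats_disabled_later)))

for row in labeled:
    print(row)
```

The condition `day < later_day` is what excludes the current and all prior assessments; in SQL it would become part of the join condition between the two copies of the table.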

Question 5: Repeat question 4, but now predict the 6-month likelihood of the first occurrence of walking disorders instead of death.  In this analysis, exclude all assessments that occur after walking disability has occurred. Data► Chelsea Zabowski's Teach One► R-code► Teach One (Python)►
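
One way to apply the exclusion rule is to find each resident's first assessment with a walking disability and keep only the assessments before it. The sketch below uses hypothetical tuples and assumes that the assessment at which the disability is first recorded is itself dropped, since the resident is no longer at risk of a first occurrence; adjust this if the answer sheet treats that assessment differently.

```python
# Hypothetical assessments: (resident id, day of assessment, unable to walk).
assessments = [
    (1, 0, 0), (1, 90, 0), (1, 150, 1), (1, 400, 0),
    (2, 0, 1), (2, 60, 0),
    (3, 0, 0), (3, 120, 0),
]

# Day of the first recorded walking disability for each resident, if any.
first_onset = {}
for rid, day, unable_to_walk in sorted(assessments):
    if unable_to_walk == 1 and rid not in first_onset:
        first_onset[rid] = day

# Keep assessments made while the resident had not yet had the disability.
at_risk = [
    (rid, day, unable_to_walk)
    for rid, day, unable_to_walk in assessments
    if rid not in first_onset or day < first_onset[rid]
]
print(at_risk)
```

Resident 2 drops out entirely in this toy example because the disability is already present at the first assessment.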


For additional information (not part of the required reading), please see the following links:

  1. Regression using R Read►
  2. Statistical learning with R Read►
  3. Open introduction to statistics Read►

This page is part of the course on Comparative Effectiveness by Farrokh Alemi PhD Home►  Email►