Statistical Process Improvement
Georgetown University

Risk Assessment


This lecture we focus on  patients’ prognosis / risk / severity of illness.  Not all patients are the same. They may have the same disease but radically different severity of illness.  All analysis of healthcare outcomes starts with understanding the patient and their comorbidities better. This week we answer age old patient questions such as “What is going to happen to me?” A few years ago, I designed the  multimorbidity indices. These indices are quite accurate and a good way to capture differences among the patients. We will discuss how CMS currently measures patients’ prognosis. The information you learn this week will be used in subsequent weeks to do risk-adjusted control charts. This week we focus on the use of these indices in measuring patients’ prognoses. .

Assigned Reading

  • Chapter 5 in Big Data in Health Care: Statistical Analysis of Electronic Health Record Read►


  1. Construction of multimorbidity indices
  2. Predicting from multimorbidity indices
  3. Checking accuracy of multimorbidity models


Instruction for Submission of Assignments: Assignments should be submitted directly on Blackboard.  In rare situations assignments can be sent directly by email to the instructor. Submission should follow these rules:

  1. Make sure that your control charts follow the visual rules below:  (1) Control limits must be in red and without markers, (2) Observed lines must have markers, (3) X and Y axis must be labeled, and (4) Charts must be linked to the data. 
  2. Submit a summary page, as separate file or as first part of your response file.  In the summary page you should list how your answers to the question differs from answers provided within the assignment (inside Teach One or other answers).  You must indicate for each question if your control chart is exactly the same as seen in Teach One or other formats.  For each question, you must indicate if the answers you have provided is the same as the answers supplied on the web.  If there are no answers provided, you must indicate that there were no answers available on the web to compare your answers to.

Question 1: Construct a simple Multi-Morbidity Index.  Assess the average severity of CHF, MI, Diabetes, Hypertension, Alcohol Use, and ACL surgery (assume that sicker patients have longer stays). To calculate the average severity associated with a disease, compare all cases with the disease to all control patients without the disease. Make sure that in each comparison, patients with and without disease have the same set of comorbidities. For example, to find the average length of stay for patients with MI, select all MI patients with the following comorbidities: CHF,DM,AA.  Then compare these cases to controls who do not have MI but have the same comorbidities. To help you understand this assignment, consider the following table.  In this table, we see different strata of mutually exclusive and exhaustive comorbidities.  Then, within each strata we can observe the impact of MI.  The impact of MI is the average impact of MI within each strata.  Your objective is to create this table before you calculate the impact of MI.  To do so, first estimate n1 through n5 through a file where the data is restricted to cases with MI (WHERE MI=1).  Then, estimate the values n6 through n10 through a file where the data are restricted to non-MI patients (WHERE MI=0).  Merge these data, making sure that you match on the strata.  Then you can calculate the impact of MI. 

Strata Cases of MI Controls with No MI Difference
AA n1 n6 n1-n6
AA, CHF n2 n7 n2-n7
AA,DM n3 n8 n3-n8
DM, CHF n4 n9 n4-n9
AA,DM,CHF n5 n10 n5-n10

Question 2:  Calculate the likelihood ratio associated with each diagnosis in predicting above or below average length of stay.  First calculate the weighted average length of stay for each diagnosis.  Assign individuals who have above weighted average length of stay 1 and those below 0.  Next calculate the likelihood ratio (remembering to weight each strata by the count of patients that fall into each strata).  To calculate the likelihood ratio, select all individuals who have above weighted average length of stay.  Examine the prevalence of the diagnosis among them.  Select all individuals who have below weighted average length of stay and select the prevalence of the diagnosis among them.  The ratio of these two calculated numbers constitutes the likelihood ratio. Enter the likelihood ratios calculated for each diagnosis into a Table called #LR.

ICD9 Likelihood Ratio
MI 3.47

Question 3:  Using the Table #LR (see question 2), calculate the probability of long stay (not length of stay but the probability of long stays) for the 3 patients described below.  Note that diagnoses codes that do not have an associated likelihood ratio should be assigned a likelihood ratio of 1 or ignored. Please use SQL/Python code to calculate multiplication of the likelihood ratios. Since the data are presented in one column, the attached slides and video can help you calculate the probability of a set of likelihood ratios in one column.  You should not enter the values by hand.  You will not be able to do so when there are large number of patients.

1 AA
1 DM
2 MI
3 DM
3 AA
3 MI

Question 4: Calculate the likelihood ratio associated with each diagnosis. You would need to use SQL to do this assignment.  This is a massive database of 17 million records. Microsoft SQL server can analyze the entire data in one run. Submit your SQL code and the 10 diagnosis with highest and lowest likelihood ratios.  By opening this file you agree not to share the file with anyone else.

Question 5: Calculate the Receiver Operating Characteristic curve associated with predicting from age whether the patient will live or die.  The data do not provide a predicted values, but you can construct a prediction based on age exceeding the cutoff value.  If the patient's age exceeds the cutoff value, then we would predict that the patient will die.  For example, the cutoff values for the ages provided can be 40, 50, 60, 70, 80, 90.  If we take the cutoff value 60, then we predict that all patients above 60 will die and all patients below 60 will live.  Calculate sensitivity and specificity for all cutoffs.  Then, draw the Receiver Operating Characteristic curve.  At what age cutoff the sum of the sensitivity and specificity is at its maximum?

   Mid-Point of Age Range Total 
45 55 65 75 85
True Condition  Alive  33 30 35 25 20 143
Dead  3 5 7 15 33 63
Total  36 35 42 40 53 206

  • Download above table Data►
  • Belaineh's Teach One Slides► YouTube►
  • Guide to Receiver Operating Curve and SQL Code Textbook pages 114-117►
  • Answer to question Answer►

    Question 6: Calculate the Receiver Operating Characteristic curve for male and female ages in the attached data set. The field "Alive" contains whether the patient was alive in 6 months post assessment  date.  Alive is shown with 0 and dead with 1.  Age at start of data collection is provided in the field "Age."  Patients' age at assessment must be calculated from age at start and additional days till assessment. The field "DaysFirst" indicates number of days since recording of the age.  The field "Sex" indicates whether male (M) or female (F). 

    Question 7:  Using the likelihood ratios provided, calculate the probability of mortality for the cases described in the case file. In the attached likelihood ratio file, ICD9 variable refers to the diagnosis code, RDX refers to the first, second, third, fourth or fifth time the diagnosis code occurs for the same person. LR is the calculated likelihood ratio for a specific diagnosis and repeat code.  A likelihood ratio above 1 indicates the person is more likely to die. The variable nDead refers to number of patients with specific diagnosis and repeat code dying in 6 months, nAlive refers to number of patients with specific diagnosis and repeat code being alive in 6 months. Using the case file, predict the probability of death in 6 months for cases with different medical history. List the 4 patients with the highest probability of mortality. 












































    • Introduction to Standard Query Language  More►
    • A comparative study of various severity measures. PubMed►
    • Use of severity of illness to classify and monitor medication errors More►
    • Use of pharmacy data to measure severity of illness More  
    • Nursing severity indices More►
    • Measures of outpatient severity of illness More►
    • Severity of episodes of illness More►
    • Disease Staging More►
    • Patient Management Categories More►
    • The Acute Physiological and Chronic Health Evaluation (APACHE) index More►
    • Medisgroup More►
    • Computerized Severity Index More►
    • Diagnosis based severity indices Indices Charlson Index► Elixhauser List►
    • Comorbidity-Poly-pharmacy Score More►


Copyright 1996 Farrokh Alemi, Ph.D. Created on Sunday, October 06, 1996 4:20:30 PM Most recent revision 09/21/2022.  This page is part of the course on Quality, the lecture on Risk Assessment.