Risk Assessment

This lecture focuses on patients’ prognosis, risk, and severity of illness. Not all patients are the same: they may have the same disease but radically different severity of illness. All analysis of healthcare outcomes starts with a better understanding of the patient and their comorbidities. This week we answer age-old patient questions such as “What is going to happen to me?” A few years ago, I designed the multimorbidity indices. These indices are quite accurate and a good way to capture differences among patients. We will discuss how CMS currently measures patients’ prognosis. The information you learn this week will be used in subsequent weeks to construct risk-adjusted control charts. This week we focus on the use of these indices in measuring patients’ prognoses.

Assigned Reading

Chapter 5 in Big Data in Health Care: Statistical Analysis of the Electronic Health Record Read►

Presentations

Construction of multimorbidity indices
- Selecting the right predictors Slides► YouTube►
- SQL for calculating Likelihood ratios Slides► YouTube►
Predicting from multimorbidity indices
- Product of values in one column of data Slides► YouTube►
- SQL code for predicting outcomes Slides► YouTube►
- Basic ideas for predicting YouTube►
- Scoring multimorbidity indices Slides► YouTube►
Checking accuracy of multimorbidity models
- Calculation of sensitivity, specificity, and Area under the Receiver Operating Characteristic Curve Excel► Slides► YouTube►

Assignments

Instructions for Submission of Assignments: Assignments should be submitted directly on Blackboard. In rare situations, assignments can be sent directly by email to the instructor. Submissions should follow these rules:

Make sure that your control charts follow these visual rules: (1) control limits must be in red and without markers, (2) observed lines must have markers, (3) X and Y axes must be labeled, and (4) charts must be linked to the data.
Submit a summary page, as a separate file or as the first part of your response file. In the summary page, list how your answer to each question differs from the answers provided within the assignment (inside Teach One or other answers). For each question, indicate whether your control chart is exactly the same as the one seen in Teach One or other formats, and whether your answer is the same as the answer supplied on the web. If no answer is provided, indicate that there was no answer available on the web for comparison.

Question 1: Construct a simple Multi-Morbidity Index. Assess the average severity of CHF, MI, Diabetes, Hypertension, Alcohol Use, and ACL surgery (assume that sicker patients have longer stays). To calculate the average severity associated with a disease, compare all cases with the disease to all control patients without the disease. Make sure that in each comparison, patients with and without the disease have the same set of comorbidities. For example, to find the average length of stay for patients with MI, select all MI patients with the following comorbidities: CHF, DM, AA. Then compare these cases to controls who do not have MI but have the same comorbidities. To help you understand this assignment, consider the table below. It lists different strata of mutually exclusive and exhaustive comorbidities. Within each stratum, we can observe the impact of MI. The overall impact of MI is the average of its impact across the strata. Your objective is to create this table before you calculate the impact of MI. To do so, first estimate n1 through n5 from a file where the data are restricted to cases with MI (WHERE MI=1). Then, estimate n6 through n10 from a file where the data are restricted to non-MI patients (WHERE MI=0). Merge these data, making sure that you match on the strata. Then you can calculate the impact of MI.

Download Data►
Guide Video► Slides►
Answer for LOS►
Aminalla's Python Teach One Slides► YouTube►

Strata	Cases of MI	Controls with No MI	Difference
AA	n1	n6	n1-n6
AA, CHF	n2	n7	n2-n7
AA, DM	n3	n8	n3-n8
DM, CHF	n4	n9	n4-n9
AA, DM, CHF	n5	n10	n5-n10
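
The stratified comparison in Question 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the posted answer: the record layout (flags named MI, AA, DM, CHF plus LOS), the toy rows, and the function names are made up for the example. It also assumes that n1 through n10 are average lengths of stay within each stratum, and it weights each stratum by its number of MI cases, which is only one of several reasonable weighting choices.

```python
from collections import defaultdict

# Toy records (invented for illustration): 1 = diagnosis present, LOS in days.
records = [
    {"MI": 1, "AA": 1, "DM": 0, "CHF": 0, "LOS": 6},
    {"MI": 1, "AA": 1, "DM": 0, "CHF": 0, "LOS": 8},
    {"MI": 0, "AA": 1, "DM": 0, "CHF": 0, "LOS": 4},
    {"MI": 1, "AA": 0, "DM": 1, "CHF": 1, "LOS": 10},
    {"MI": 0, "AA": 0, "DM": 1, "CHF": 1, "LOS": 7},
    {"MI": 0, "AA": 0, "DM": 1, "CHF": 1, "LOS": 5},
    {"MI": 1, "AA": 1, "DM": 1, "CHF": 0, "LOS": 9},  # no matching control
]

COMORBIDITIES = ("AA", "DM", "CHF")


def stratum_of(rec):
    """Return the comorbidity combination that defines the record's stratum."""
    return tuple(name for name in COMORBIDITIES if rec[name] == 1)


def impact_of(disease, rows):
    """Average LOS difference (cases minus controls) over matched strata."""
    los = defaultdict(lambda: {0: [], 1: []})
    for rec in rows:
        los[stratum_of(rec)][rec[disease]].append(rec["LOS"])

    table = []
    for stratum in sorted(los):
        cases, controls = los[stratum][1], los[stratum][0]
        if not cases or not controls:
            continue  # keep only strata seen in both groups (the merge step)
        diff = sum(cases) / len(cases) - sum(controls) / len(controls)
        table.append((stratum, len(cases), diff))

    # Weight each stratum's difference by its number of cases (an assumption).
    total_cases = sum(n for _, n, _ in table)
    impact = sum(n * diff for _, n, diff in table) / total_cases
    return table, impact


table, impact = impact_of("MI", records)
for stratum, n_cases, diff in table:
    print(",".join(stratum), n_cases, round(diff, 2))
print("Impact of MI on LOS:", round(impact, 2))
```

Strata that contain only cases or only controls drop out, which mirrors merging the MI=1 and MI=0 files on the strata.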

Question 2: Calculate the likelihood ratio associated with each diagnosis in predicting above or below average length of stay. First calculate the weighted average length of stay for each diagnosis. Assign 1 to individuals whose length of stay is above the weighted average and 0 to those below it. Next calculate the likelihood ratio (remembering to weight each stratum by the count of patients that fall into it). To calculate the likelihood ratio, select all individuals who have above weighted average length of stay and examine the prevalence of the diagnosis among them. Then select all individuals who have below weighted average length of stay and examine the prevalence of the diagnosis among them. The ratio of these two prevalences is the likelihood ratio. Enter the likelihood ratios calculated for each diagnosis into a table called #LR.

Download Data►
SQL code for calculating likelihood ratios SQL►
Kanfer's Teach One YouTube►
Lakkakula's Python Teach One Slides► YouTube►
Ramidi's Python Teach One Slides►

ICD9	Likelihood Ratio
AA
CHF
DM
MI	3.47
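
The ratio of prevalences described in Question 2 can be sketched as below. This is a simplified illustration rather than the assignment's solution: the patient list, the field names (LOS, dx), and the function name are hypothetical, and the long-stay threshold is passed in as a single number instead of being derived from the stratum-weighted averages that the question asks for.

```python
def likelihood_ratios(patients, threshold):
    """Map each diagnosis to P(dx | long stay) / P(dx | short stay)."""
    long_stay = [p for p in patients if p["LOS"] > threshold]
    short_stay = [p for p in patients if p["LOS"] <= threshold]
    diagnoses = sorted({dx for p in patients for dx in p["dx"]})

    ratios = {}
    for dx in diagnoses:
        # Prevalence of the diagnosis within each outcome group.
        prev_long = sum(dx in p["dx"] for p in long_stay) / len(long_stay)
        prev_short = sum(dx in p["dx"] for p in short_stay) / len(short_stay)
        # The ratio is unbounded when the diagnosis never occurs in short stays.
        ratios[dx] = prev_long / prev_short if prev_short > 0 else float("inf")
    return ratios


# Toy data invented for illustration.
patients = [
    {"LOS": 9, "dx": {"MI", "CHF"}},
    {"LOS": 8, "dx": {"MI"}},
    {"LOS": 7, "dx": {"DM"}},
    {"LOS": 3, "dx": {"DM"}},
    {"LOS": 2, "dx": {"MI", "DM"}},
    {"LOS": 4, "dx": {"AA"}},
    {"LOS": 1, "dx": {"AA", "DM"}},
]

mean_los = sum(p["LOS"] for p in patients) / len(patients)
lr = likelihood_ratios(patients, mean_los)
for dx, value in lr.items():
    print(dx, round(value, 2))
```

The sketch assumes both outcome groups are non-empty. A diagnosis seen only among long stays gets an infinite ratio here; with real data you may prefer to exclude such rare diagnoses or smooth the counts.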

Question 3: Using the table #LR (see Question 2), calculate the probability of a long stay (not the length of stay, but the probability of a long stay) for the 3 patients described below. Note that diagnosis codes that do not have an associated likelihood ratio should be assigned a likelihood ratio of 1 or ignored. Please use SQL or Python code to multiply the likelihood ratios. Since the data are presented in one column, the attached slides and video can help you calculate the product of a set of likelihood ratios in one column. You should not enter the values by hand; you will not be able to do so when there is a large number of patients.

SQL Slides► Video►
Vittorio's Python Teach One Slides► YouTube►
Dr. Uriyo's Python solution Slides► YouTube►

ID	ICD9
1	AA
1	DM
1	HIV
2	MI
2	CHF
3	DM
3	CHF
3	AA
3	MI
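
One way to multiply likelihood ratios stored in a single column is to add their logarithms per patient and exponentiate the sum, which is the same idea as EXP(SUM(LOG(LR))) in SQL. The sketch below applies it to the three patients above. Several things in it are assumptions: only the MI value (3.47) comes from the Question 2 table, the other likelihood ratios are placeholders to be replaced by your own #LR results, and the prior odds of 1 are chosen purely for illustration.

```python
import math

# MI is taken from the Question 2 table; AA, CHF and DM are placeholders
# invented for this sketch and must be replaced by your own #LR results.
LR = {"MI": 3.47, "AA": 1.5, "CHF": 2.0, "DM": 1.2}

# One row per (patient ID, diagnosis), as in the table above.
cases = [
    (1, "AA"), (1, "DM"), (1, "HIV"),
    (2, "MI"), (2, "CHF"),
    (3, "DM"), (3, "CHF"), (3, "AA"), (3, "MI"),
]


def probability_of_long_stay(rows, lr_table, prior_odds=1.0):
    """Posterior probability per patient; prior_odds=1.0 is an assumption."""
    log_sum = {}
    for patient_id, dx in rows:
        lr = lr_table.get(dx, 1.0)  # codes without a likelihood ratio count as 1
        # Adding logs is equivalent to multiplying the ratios (requires lr > 0).
        log_sum[patient_id] = log_sum.get(patient_id, 0.0) + math.log(lr)

    result = {}
    for patient_id, total in log_sum.items():
        odds = prior_odds * math.exp(total)
        result[patient_id] = odds / (1 + odds)  # convert odds to probability
    return result


probs = probability_of_long_stay(cases, LR)
for patient_id, p in sorted(probs.items()):
    print(patient_id, round(p, 3))
```

HIV has no entry in the table, so it contributes a likelihood ratio of 1 and leaves patient 1's odds unchanged.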

Question 4: Calculate the likelihood ratio associated with each diagnosis. You will need to use SQL to do this assignment. This is a massive database of 17 million records. Microsoft SQL Server can analyze the entire data set in one run. Submit your SQL code and the 10 diagnoses with the highest and the 10 with the lowest likelihood ratios. By opening this file you agree not to share the file with anyone else.

Massive data (for password contact your instructor) Download►
SQL code Access► Microsoft SQL Server►
Marla's guide Slides►
Answer►

Question 5: Calculate the Receiver Operating Characteristic curve associated with predicting from age whether the patient will live or die. The data do not provide a predicted value, but you can construct a prediction based on age exceeding a cutoff value. If the patient's age exceeds the cutoff value, then we predict that the patient will die. For example, the cutoff values for the ages provided can be 40, 50, 60, 70, 80, and 90. If we take the cutoff value 60, then we predict that all patients above 60 will die and all patients below 60 will live. Calculate sensitivity and specificity for all cutoffs. Then, draw the Receiver Operating Characteristic curve. At what age cutoff is the sum of sensitivity and specificity at its maximum?

		Mid-Point of Age Range					Total
		45	55	65	75	85	Total
True Condition	Alive	33	30	35	25	20	143
	Dead	3	5	7	15	33	63
	Total	36	35	42	40	53	206
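
The sensitivity and specificity calculation for this table can be sketched as follows. The counts are copied from the table above; the variable and function names are made up for the example, and the area under the curve uses the trapezoid rule over the six cutoffs, which is an approximation and not necessarily the method used in the posted answer.

```python
# Counts copied from the table above, one entry per age-range mid-point.
midpoints = [45, 55, 65, 75, 85]
alive = [33, 30, 35, 25, 20]
dead = [3, 5, 7, 15, 33]
cutoffs = [40, 50, 60, 70, 80, 90]


def sens_spec(cutoff):
    """Predict death when age exceeds the cutoff; return (sensitivity, specificity)."""
    true_pos = sum(d for m, d in zip(midpoints, dead) if m > cutoff)
    true_neg = sum(a for m, a in zip(midpoints, alive) if m <= cutoff)
    return true_pos / sum(dead), true_neg / sum(alive)


rows = [(c, *sens_spec(c)) for c in cutoffs]
for cutoff, sens, spec in rows:
    print(cutoff, round(sens, 3), round(spec, 3), round(sens + spec, 3))

# Cutoff with the largest sum of sensitivity and specificity.
best_cutoff = max(rows, key=lambda r: r[1] + r[2])[0]

# ROC points are (1 - specificity, sensitivity); sort by the x value and
# apply the trapezoid rule to approximate the area under the curve.
points = sorted((1 - spec, sens) for _, sens, spec in rows)
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print("Best cutoff:", best_cutoff, "AUC:", round(auc, 3))
```

Plotting the sorted points with sensitivity on the Y axis and 1 - specificity on the X axis gives the curve.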

Download above table Data►
Belaineh's Teach One Slides► YouTube►
Guide to Receiver Operating Curve and SQL Code Textbook pages 114-117►

Answer to question Answer►

Question 6: Calculate the Receiver Operating Characteristic curve separately for males and females, using age as the predictor, in the attached data set. The field "Alive" records whether the patient was alive 6 months after the assessment date: 0 indicates alive and 1 indicates dead. Age at the start of data collection is provided in the field "Age." Patients' age at assessment must be calculated from the age at start plus the additional days until assessment. The field "DaysFirst" gives the number of days since the age was recorded. The field "Sex" indicates whether the patient is male (M) or female (F).
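
As a starting point for Question 6, the sketch below derives age at assessment and summarizes the curve for each sex by its area. The toy rows are invented, the conversion of DaysFirst to years by dividing by 365.25 is an assumption, and the area is computed by comparing every dead patient with every alive patient, which is equivalent to the area under the curve but is not the only way to obtain it.

```python
def age_at_assessment(age_at_start, days_first):
    """Age in years at assessment; the 365.25 divisor is an assumption."""
    return age_at_start + days_first / 365.25


def auc_by_pairs(rows):
    """Share of (dead, alive) pairs where the dead patient is older.

    rows holds (age, outcome) tuples with outcome 1 = dead, 0 = alive.
    Ties count as half. The loop is quadratic, so it suits small samples.
    """
    dead = [age for age, outcome in rows if outcome == 1]
    alive = [age for age, outcome in rows if outcome == 0]
    wins = sum((d > a) + 0.5 * (d == a) for d in dead for a in alive)
    return wins / (len(dead) * len(alive))


# Toy data invented for illustration; field names follow the question.
patients = [
    {"Sex": "M", "Age": 70, "DaysFirst": 365, "Alive": 1},
    {"Sex": "M", "Age": 60, "DaysFirst": 0, "Alive": 0},
    {"Sex": "M", "Age": 72, "DaysFirst": 0, "Alive": 0},
    {"Sex": "M", "Age": 80, "DaysFirst": 100, "Alive": 1},
    {"Sex": "F", "Age": 65, "DaysFirst": 730, "Alive": 1},
    {"Sex": "F", "Age": 66, "DaysFirst": 0, "Alive": 0},
    {"Sex": "F", "Age": 50, "DaysFirst": 200, "Alive": 0},
]

auc = {}
for sex in ("M", "F"):
    rows = [(age_at_assessment(p["Age"], p["DaysFirst"]), p["Alive"])
            for p in patients if p["Sex"] == sex]
    auc[sex] = auc_by_pairs(rows)
    print(sex, round(auc[sex], 3))
```

To draw the curves themselves, apply a set of age cutoffs to each sex separately and plot sensitivity against 1 - specificity, as in Question 5.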

Question 7: Using the likelihood ratios provided, calculate the probability of mortality for the cases described in the case file. In the attached likelihood ratio file, the variable ICD9 is the diagnosis code, and RDX indicates whether this is the first, second, third, fourth, or fifth time the diagnosis code occurs for the same person. LR is the calculated likelihood ratio for a specific diagnosis and repeat code. A likelihood ratio above 1 indicates that the person is more likely to die. The variable nDead is the number of patients with the specific diagnosis and repeat code who died within 6 months; nAlive is the number of patients with the specific diagnosis and repeat code who were alive at 6 months. Using the case file, predict the probability of death in 6 months for cases with different medical histories. List the 4 patients with the highest probability of mortality.

Likelihood Ratios►
Case files►
Python solution by Dr. Uriyo Slides► YouTube►

icd9	RDX	LR	nDead	nAlive	TotalDead	TotalNotDead
I272.4	1	0.209880372	10316	247365	121567	611807
I272.4	2	0.332834587	3788	57277	121567	611807
I272.4	3	0.430493078	2407	28139	121567	611807
I272.4	4	0.481841512	1509	15761	121567	611807
I272.4	5	0.927568154	1769	9598	121567	611807
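
The LR column in the sample rows above is consistent with the ratio (nDead / TotalDead) / (nAlive / TotalNotDead), which the sketch below checks before turning likelihood ratios into a probability. The function names are hypothetical, and using TotalDead / TotalNotDead as the prior odds of death is an assumption made for illustration; the sketch also does not decide how several RDX rows for the same diagnosis should be combined in a case.

```python
# Rows copied from the sample above:
# (icd9, RDX, LR, nDead, nAlive, TotalDead, TotalNotDead)
rows = [
    ("I272.4", 1, 0.209880372, 10316, 247365, 121567, 611807),
    ("I272.4", 2, 0.332834587, 3788, 57277, 121567, 611807),
    ("I272.4", 3, 0.430493078, 2407, 28139, 121567, 611807),
    ("I272.4", 4, 0.481841512, 1509, 15761, 121567, 611807),
    ("I272.4", 5, 0.927568154, 1769, 9598, 121567, 611807),
]


def likelihood_ratio(n_dead, n_alive, total_dead, total_not_dead):
    """Prevalence of the code among the dead divided by that among the alive."""
    return (n_dead / total_dead) / (n_alive / total_not_dead)


def probability_of_death(lrs, prior_odds):
    """Multiply the prior odds by each likelihood ratio, then convert to probability."""
    odds = prior_odds
    for lr in lrs:
        odds *= lr
    return odds / (1 + odds)


# Recompute LR from the counts and compare with the published column.
for icd9, rdx, lr, n_dead, n_alive, total_dead, total_not_dead in rows:
    recomputed = likelihood_ratio(n_dead, n_alive, total_dead, total_not_dead)
    print(icd9, rdx, lr, round(recomputed, 6))

# Assumed prior odds: overall dead versus not dead in the training data.
prior_odds = 121567 / 611807

# Hypothetical case with a single first occurrence of I272.4.
lookup = {(icd9, rdx): lr for icd9, rdx, lr, *_ in rows}
p = probability_of_death([lookup[("I272.4", 1)]], prior_odds)
print("Probability of death:", round(p, 4))
```

Because every likelihood ratio for this code is below 1, the hypothetical case ends up with a lower probability of death than the prior.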

More

Introduction to Structured Query Language More►
A comparative study of various severity measures PubMed►
Use of severity of illness to classify and monitor medication errors More►
Use of pharmacy data to measure severity of illness More►
Nursing severity indices More►
Measures of outpatient severity of illness More►
Severity of episodes of illness More►
Disease Staging More►
Patient Management Categories More►
The Acute Physiological and Chronic Health Evaluation (APACHE) index More►
Medisgroup More►
Computerized Severity Index More►
Diagnosis-based severity indices Charlson Index► Elixhauser List►
Comorbidity-Poly-pharmacy Score More►
