Process Improvement

Georgetown University


Benchmarking Clinicians

This week we discuss benchmarking and how you can compare apples to apples and oranges to oranges. Clinicians and their peer groups differ in the types of patients they see. We will show you how to use data balancing to compare them on the same types of patients. We motivate the concepts using decision trees but then move quickly to using Structured Query Language (SQL) for analyzing large data sets.

This week is unusual for a course on process improvement. Statistical process control classes rarely talk about benchmarking, and when they do, they rarely talk about data balancing. This material is the real frontier of the field. Stay with me as we go through its theoretical basis. Concepts like distribution switches and synthetic cases may seem new and somewhat esoteric, but the end result is practical and useful.

Assigned Reading

  • Chapter 17 in Big Data in Health Care: Statistical Analysis of Electronic Health Record



Instructions for Submission of Assignments: Submit assignments directly on Blackboard. In rare situations, assignments can be sent by email directly to the instructor. Submissions should follow these rules:

  1. Make sure that any control charts follow these visual rules: (1) control limits must be in red and without markers, (2) observed lines must have markers, (3) the X and Y axes must be labeled, and (4) charts must be linked to the data.
  2. Submit a summary file or put a summary at the start of your response file. On the summary page, list how your answers to each question differ from the answers provided within the assignment (inside Teach One or other answers). For each question, indicate whether your control chart is exactly the same as the one seen in Teach One or other formats, and whether the answers you have provided are the same as the answers supplied on the web. If no answers are provided, indicate that there were no answers available on the web to compare your answers to.

Question 1: In the following question, use SQL, Python, or Excel to analyze the data. Assume that we have followed two clinicians, Drs. Smith and Jones, whose practice patterns are described by the two decision trees in Figure 1.

Figure 1: Practice Patterns of Drs. Jones and Smith

  • What is the expected length of stay for each of the clinicians?
  • What is the expected length of stay for Dr. Smith if he were to take care of patients of Dr. Jones?
  • What is the expected length of stay for Dr. Jones if he were to take care of patients of Dr. Smith?
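Figure 1 is not reproduced here, so the sketch below uses made-up two-branch trees: the `SMITH` and `JONES` probabilities and lengths of stay are hypothetical, not the figure's values. It shows the arithmetic behind all three questions. A clinician's expected length of stay is the sum, over patient types, of the probability of that type times the length of stay for that type. To answer "Dr. Smith caring for Dr. Jones's patients," keep Dr. Smith's lengths of stay but switch in Dr. Jones's probabilities.

```python
# Hypothetical decision trees (NOT the values in Figure 1):
# patient type -> (probability the clinician sees this type, expected LOS in days)
SMITH = {"mild": (0.7, 3.0), "severe": (0.3, 6.0)}
JONES = {"mild": (0.4, 3.5), "severe": (0.6, 5.5)}


def expected_los(case_mix, practice):
    """Weight `practice`'s LOS for each patient type by `case_mix`'s type probabilities."""
    return sum(prob * practice[ptype][1] for ptype, (prob, _) in case_mix.items())


print(f"Smith, own patients:     {expected_los(SMITH, SMITH):.2f}")
print(f"Jones, own patients:     {expected_los(JONES, JONES):.2f}")
print(f"Smith, Jones's patients: {expected_los(JONES, SMITH):.2f}")
print(f"Jones, Smith's patients: {expected_los(SMITH, JONES):.2f}")
```

If the trees in Figure 1 have more levels, the probability of each leaf is the product of the branch probabilities along its path, and the same weighted sum is then taken over the leaves.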

Question 2: The following data report patients in 10 Diagnosis-Related Groups (DRGs) and 3 HCC indices who were cared for by a clinician and his peer group. Use SQL to determine whether the clinician is more efficient than his peer group.
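The data file for this question is not shown here, so the sketch below assumes a hypothetical table `visits` with one row per patient: an `is_clinician` flag (1 for the clinician, 0 for the peer group), a `drg` label, a single `hcc` value, and `los`. It runs the SQL through Python's built-in `sqlite3` module. The query computes the peer group's average length of stay in each DRG and HCC stratum, then averages those peer values over the clinician's own patients. The result is the length of stay the peer group would have had on the clinician's case mix. A clinician whose observed average is below that expected value is, on these patients, the more efficient.

```python
import sqlite3

# Hypothetical rows: (is_clinician, drg, hcc, los). 1 = the clinician, 0 = a peer.
ROWS = [
    (1, "DRG-A", 1, 3.0), (1, "DRG-A", 1, 4.0), (1, "DRG-B", 2, 6.0),
    (0, "DRG-A", 1, 4.0), (0, "DRG-A", 1, 5.0), (0, "DRG-B", 2, 7.0), (0, "DRG-B", 2, 9.0),
]

# Peer average LOS per stratum, then averaged over the clinician's own patients.
QUERY = """
WITH peer AS (
    SELECT drg, hcc, AVG(los) AS peer_los
    FROM visits
    WHERE is_clinician = 0
    GROUP BY drg, hcc
)
SELECT AVG(v.los)      AS observed,
       AVG(p.peer_los) AS expected,
       COUNT(*)        AS n
FROM visits AS v
JOIN peer AS p ON v.drg = p.drg AND v.hcc = p.hcc
WHERE v.is_clinician = 1
"""


def compare_to_peers(rows):
    """Return (clinician's observed mean LOS, peer-expected mean LOS, matched n)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE visits (is_clinician INTEGER, drg TEXT, hcc INTEGER, los REAL)")
        con.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchone()
    finally:
        con.close()


observed, expected, n = compare_to_peers(ROWS)
print(f"clinician {observed:.2f} vs peers on the same case mix {expected:.2f} (n={n})")
```

The inner join silently drops clinician patients whose stratum has no peer cases, so count those before trusting the comparison. If the file carries the three HCC indices as separate columns, add each one to the `GROUP BY` and to the join condition.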

Question 3: The following table shows the observed and expected length of stay for 30 patients. Use a paired comparison of means to test whether the expected and observed lengths of stay are the same. Assuming that length of stay is normally distributed, use a risk-adjusted control chart to plot the data. Make sure that the control limits are derived from the expected values and that the observations are contrasted with these limits. This analysis can be done with either a Tukey or an XmR chart; select the chart that produces tighter control limits. The conclusions you reach from (a) the paired comparison of expected and observed length of stay and (b) the risk-adjusted control chart should be the same if, in both situations, the control limits were calculated from the same number of cases. Are they?
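The 30-patient table is not reproduced here, so the sketch below uses six hypothetical patients. It computes the paired t statistic on the differences between observed and expected length of stay, and two sets of limits derived from the expected values only: XmR limits (mean ± 2.66 times the average moving range) and Tukey fences (first quartile minus 1.5 interquartile ranges, third quartile plus 1.5 interquartile ranges). Using one fixed center line for all patients is a simplifying assumption. The course text may construct the risk-adjusted chart differently, for example with limits around each patient's own expected value.

```python
import statistics as st
from math import sqrt

# Hypothetical data (NOT the 30 patients in the assignment table).
EXPECTED = [4.0, 5.0, 3.0, 6.0, 4.0, 5.0]
OBSERVED = [5.0, 5.0, 4.0, 6.0, 5.0, 7.0]


def paired_t(observed, expected):
    """Paired t statistic and degrees of freedom for observed - expected."""
    d = [o - e for o, e in zip(observed, expected)]
    n = len(d)
    return st.mean(d) / (st.stdev(d) / sqrt(n)), n - 1


def xmr_limits(values):
    """XmR limits: mean +/- 2.66 * average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center, spread = st.mean(values), 2.66 * st.mean(moving_ranges)
    return center - spread, center + spread


def tukey_limits(values):
    """Tukey fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = st.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outside(observed, limits):
    """Indices of observations falling outside (lower, upper)."""
    lower, upper = limits
    return [i for i, x in enumerate(observed) if not lower <= x <= upper]


t, df = paired_t(OBSERVED, EXPECTED)
print(f"paired t = {t:.3f} with {df} df")
for name, limits in [("XmR", xmr_limits(EXPECTED)), ("Tukey", tukey_limits(EXPECTED))]:
    print(name, [round(x, 3) for x in limits], "width", round(limits[1] - limits[0], 3),
          "points outside:", outside(OBSERVED, limits))
```

Python's standard library has no t distribution, so compare `t` against a t table with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(observed, expected)` returns the statistic and p-value together. Quartile conventions differ between tools, so Tukey limits from Excel may not match `statistics.quantiles` exactly.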

Question 4: Use the procedure described for estimating outcomes of synthetic cases to estimate the mortality rate for 80-year-old residents with walking and toileting disabilities but no other disabilities. Note that we want to rely on at least 30 cases in making this estimate. The database does not contain 30 cases of 80-year-olds with these two disabilities. Therefore, we would like you to estimate the survival days using synthetic case outcomes. You can create a synthetic case from 80-year-olds who are unable to walk and residents who are unable to toilet. Alternatively, you can select a different set of residents, such as 80-year-olds who are unable to toilet and residents who are unable to walk.

Also note that the data do not have headers. Use the following dictionary of variables to create a header for the data.

Order Variable Description
1 ID Resident's ID
2 Age Age at first assessment
3 Sex Gender of resident
4 tAssess Number of assessments
5 Followed Days resident followed
6 DaysFirst Days from first assessment
7 DaysLast Days to last assessment
8 uEat Unable to eat
9 uSit Unable to sit
10 uGroom Unable to groom
11 uToilet Unable to toilet
12 uBathe Unable to bathe
13 uWalk Unable to walk
14 uDress Unable to dress
15 uBowel Bowel incontinent 
16 uUrine Urine incontinent 
17 EverDead Patient dead at one point in time
18 AssessID Assessment ID
19 Dead6Months Dead within 6 months of assessment
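Because the file has no header row, the column names have to be supplied when it is read. The sketch below uses Python's `csv.DictReader`, whose `fieldnames` argument takes the names from the dictionary above in file order, and adds small helpers for selecting residents. Three points are assumptions: disabilities and `Dead6Months` are coded 0/1, an "80-year-old" is a resident whose `Age` truncates to 80, and the hypothetical `pool` helper combines groups such as the two the question suggests, counting each assessment once by `ID` and `AssessID`. `pool` is only one literal reading of the question's suggestion, not necessarily the chapter's synthetic-case procedure, so check it against that procedure.

```python
import csv
import io

# Column names in file order, taken from the data dictionary.
COLUMNS = ["ID", "Age", "Sex", "tAssess", "Followed", "DaysFirst", "DaysLast",
           "uEat", "uSit", "uGroom", "uToilet", "uBathe", "uWalk", "uDress",
           "uBowel", "uUrine", "EverDead", "AssessID", "Dead6Months"]
DISABILITIES = ["uEat", "uSit", "uGroom", "uToilet", "uBathe", "uWalk", "uDress",
                "uBowel", "uUrine"]


def load(text):
    """Parse header-less CSV text, attaching the dictionary's column names."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=COLUMNS))


def flag(row, name):
    return int(float(row[name])) == 1  # assumption: 0/1 coding


def is_80(row):
    return int(float(row["Age"])) == 80  # assumption: ages 80.0 to 80.99 count as 80


def has_only(row, wanted):
    """True if the resident has exactly the disabilities in `wanted` and no others."""
    return all(flag(row, d) == (d in wanted) for d in DISABILITIES)


def mortality(rows):
    """Share of rows with Dead6Months = 1, or None for an empty group."""
    return sum(flag(r, "Dead6Months") for r in rows) / len(rows) if rows else None


def pool(*groups):
    """Hypothetical helper: union of groups, each (ID, AssessID) counted once."""
    seen = {}
    for group in groups:
        for r in group:
            seen.setdefault((r["ID"], r["AssessID"]), r)
    return list(seen.values())
```

For example, with `rows = load(text)`, the exact matches are `[r for r in rows if is_80(r) and has_only(r, {"uWalk", "uToilet"})]`; if there are fewer than 30, build the groups the question suggests and pass them to `pool`. In pandas, the equivalent of `load` is `pd.read_csv(path, header=None, names=COLUMNS)`.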

Question 5: The following data show recovery from various disabilities in two nursing homes. Two sets of data are presented. The first set shows the disabilities of the patients at admission to the nursing home, using variables that start with "u", for "unable". Recovery from these disabilities is shown in variables that start with "r". Compare the performance of the two nursing homes using the distribution switch method. In particular, switch the distributions of age, gender, and 9 disabilities at admission. The outcome of interest is the number of disabilities from which the resident recovered (variable nRecovery). Use the synthetic case method to estimate outcomes for cases not present in both nursing homes. Which nursing home has better outcomes for its own residents? If the residents of nursing home A were cared for at nursing home B, which nursing home would have better outcomes? What happens in the reverse case?
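A minimal sketch of the distribution switch follows. It assumes each resident is a dictionary with fields `Age`, `Sex`, the nine admission disabilities under the `u` names used in Question 4's dictionary, and `nRecovery`; every field name other than `nRecovery` is an assumption about this file. The expected outcome of home A's residents at home B is the average, over A's residents, of home B's mean `nRecovery` in each resident's stratum. Strata that home B never saw are filled by a `synthetic` function the caller supplies, so whichever synthetic-case estimate the course prescribes (for example, a regression prediction as in Question 7) can be plugged in.

```python
from collections import defaultdict
from statistics import mean

# Assumed names for the nine admission disabilities (from Question 4's dictionary).
U_VARS = ["uEat", "uSit", "uGroom", "uToilet", "uBathe", "uWalk", "uDress",
          "uBowel", "uUrine"]


def case_mix_key(r):
    """Stratum = age, gender, and the nine admission disabilities."""
    return (r["Age"], r["Sex"], *(r[u] for u in U_VARS))


def stratum_means(home, key):
    """Mean nRecovery for each stratum present in `home`."""
    groups = defaultdict(list)
    for r in home:
        groups[key(r)].append(r["nRecovery"])
    return {k: mean(v) for k, v in groups.items()}


def switch(residents, other_home, synthetic, key=case_mix_key):
    """Expected nRecovery if `residents` were cared for in `other_home`.

    Strata absent from `other_home` are filled by `synthetic(resident)`.
    Returns (expected mean, number of residents that needed a synthetic value).
    """
    means = stratum_means(other_home, key)
    estimates, n_synthetic = [], 0
    for r in residents:
        k = key(r)
        if k in means:
            estimates.append(means[k])
        else:
            estimates.append(synthetic(r))
            n_synthetic += 1
    return mean(estimates), n_synthetic
```

`switch(home_a, home_b, synthetic=...)` answers the question about A's residents cared for at B, and swapping the first two arguments answers the reverse; each home's own result is simply `mean(r["nRecovery"] for r in home)`. With a continuous age, almost every stratum is unique, so binning age (for example into decades) before building the key is a design choice worth considering.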

Question 6: The following data report length of stay (LOS) for 10 patients of Dr. Jones and 10 patients of Dr. Smith. What is the expected outcome (average outcome) for Dr. Smith? What is the expected outcome for Dr. Jones if he were seeing Dr. Smith's patients? To answer this question, replace the outcome of each of Dr. Smith's patients with the average outcome of the same type of patient seen by Dr. Jones. Is Dr. Smith more efficient than Dr. Jones? Make sure that you submit an Excel sheet with formulas for all calculated values.

Dr. Smith
Patient Previous MI CHF Shock LOS
1 1 1 0 4
2 1 1 0 5
3 1 0 0 4
4 1 0 1 5
5 1 0 1 4
6 1 0 1 4
7 1 0 1 5
8 0 0 0 2
9 0 0 0 2
10 0 0 0 1
Dr. Jones
Patient Previous MI CHF Shock LOS
1 1 1 0 5
2 1 1 0 5
3 1 1 0 5
4 1 1 1 5
5 1 0 1 5
6 1 0 1 5
7 1 0 1 5
8 1 0 0 4
9 0 0 0 2
10 0 0 0 2
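The Excel sheet remains the required deliverable. The short Python check below reproduces the same calculation from the two tables above, so the spreadsheet formulas can be verified. Each patient type is the combination of previous MI, CHF, and shock. Dr. Smith's expected outcome is his average length of stay. Dr. Jones's expected outcome on Dr. Smith's patients replaces each of Dr. Smith's patients with Dr. Jones's average for that patient type.

```python
from collections import defaultdict
from statistics import mean

# (previous MI, CHF, shock) -> LOS, transcribed from the tables above.
SMITH = [((1, 1, 0), 4), ((1, 1, 0), 5), ((1, 0, 0), 4), ((1, 0, 1), 5), ((1, 0, 1), 4),
         ((1, 0, 1), 4), ((1, 0, 1), 5), ((0, 0, 0), 2), ((0, 0, 0), 2), ((0, 0, 0), 1)]
JONES = [((1, 1, 0), 5), ((1, 1, 0), 5), ((1, 1, 0), 5), ((1, 1, 1), 5), ((1, 0, 1), 5),
         ((1, 0, 1), 5), ((1, 0, 1), 5), ((1, 0, 0), 4), ((0, 0, 0), 2), ((0, 0, 0), 2)]


def type_means(patients):
    """Average LOS for each patient type."""
    groups = defaultdict(list)
    for ptype, los in patients:
        groups[ptype].append(los)
    return {ptype: mean(values) for ptype, values in groups.items()}


def switched(patients, clinician):
    """Average LOS if `clinician` treated `patients`: each patient's LOS is replaced
    by the clinician's average for that type (KeyError if the type was never seen)."""
    means = type_means(clinician)
    return mean(means[ptype] for ptype, _ in patients)


print("Dr. Smith, own patients:    ", mean(los for _, los in SMITH))
print("Dr. Jones, Smith's patients:", switched(SMITH, JONES))
```

Calling `switched(JONES, SMITH)` raises a `KeyError`, because Dr. Smith has no patient with previous MI, CHF, and shock together; that gap is what a synthetic patient has to fill.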

Question 7: Using the data presented in Question 6, what is the expected outcome if Dr. Smith sees the patients of Dr. Jones? Note that Dr. Smith does not see any patient like patient 4 of Dr. Jones, so we need to estimate a synthetic control for this patient. To do so, filter the data for patients of Dr. Smith (this is already done, since Dr. Smith's data are presented separately). Regress length of stay on previous MI, CHF, and shock. You learned about regression in the first part of this course. Evaluate the regression equation at the values corresponding to the condition of patient 4 of Dr. Jones. Use the regression prediction of length of stay to create a synthetic patient for Dr. Smith, and calculate the expected outcome for Dr. Smith seeing the patients of Dr. Jones. Make sure that you submit an Excel sheet with formulas for all calculated values.
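As a cross-check on the Excel sheet, the sketch below fits the regression on Dr. Smith's ten patients with a small hypothetical `ols` helper (Gauss-Jordan elimination on the normal equations, chosen to avoid third-party dependencies). It predicts length of stay for the one patient type Dr. Smith never saw (Dr. Jones's patient 4: previous MI, CHF, and shock) and uses that prediction as the synthetic patient when averaging over Dr. Jones's patients. For the types Dr. Smith did see, it uses his average for that type, following the same replacement logic as Question 6.

```python
from collections import defaultdict
from statistics import mean

# (previous MI, CHF, shock) -> LOS, transcribed from the Question 6 tables.
SMITH = [((1, 1, 0), 4), ((1, 1, 0), 5), ((1, 0, 0), 4), ((1, 0, 1), 5), ((1, 0, 1), 4),
         ((1, 0, 1), 4), ((1, 0, 1), 5), ((0, 0, 0), 2), ((0, 0, 0), 2), ((0, 0, 0), 1)]
JONES = [((1, 1, 0), 5), ((1, 1, 0), 5), ((1, 1, 0), 5), ((1, 1, 1), 5), ((1, 0, 1), 5),
         ((1, 0, 1), 5), ((1, 0, 1), 5), ((1, 0, 0), 4), ((0, 0, 0), 2), ((0, 0, 0), 2)]


def ols(X, y):
    """Least-squares coefficients via Gauss-Jordan elimination on X'X b = X'y."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def type_means(patients):
    """Average LOS for each patient type."""
    groups = defaultdict(list)
    for ptype, los in patients:
        groups[ptype].append(los)
    return {ptype: mean(values) for ptype, values in groups.items()}


def smith_on_jones():
    """Expected LOS if Dr. Smith treated Dr. Jones's patients, with a
    regression-based synthetic value for types Dr. Smith never saw."""
    coef = ols([[1, *ptype] for ptype, _ in SMITH], [los for _, los in SMITH])
    means = type_means(SMITH)

    def predict(ptype):
        return coef[0] + sum(b * x for b, x in zip(coef[1:], ptype))

    return coef, mean(means[p] if p in means else predict(p) for p, _ in JONES)


coef, expected = smith_on_jones()
print("Coefficients (intercept, MI, CHF, shock):", [round(b, 3) for b in coef])
print("Synthetic LOS for MI + CHF + shock:", round(coef[0] + sum(coef[1:]), 3))
print("Dr. Smith, Jones's patients:", round(expected, 3))
print("Dr. Jones, own patients:    ", mean(los for _, los in JONES))
```

Dr. Smith's patients fall into only four distinct types and the model has four coefficients, so the regression reproduces his type averages exactly; the synthetic value is therefore the intercept plus all three slopes. In Excel, `LINEST` or the Analysis ToolPak's Regression tool should give the same coefficients.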



Prepared by Farrokh Alemi, Ph.D. This page is part of the course on Statistical Process Improvement