Statistical Process Improvement
Georgetown University

Review of Probability and Distributions


This week we talk through how we can measure and describe uncertainty.  When a person says to a partner, “I am not sure where this relationship is going,” what does that mean precisely?  Can we assign numbers to our uncertainty about the future, even uncertainty about love?  We are all familiar with probability as the frequency of an event, but less so with probability as the strength of our belief that the event will occur. This week we look at probability and its calculus as a method of organizing opinions.  The neat part of this week is the procedure for estimating the probability of rare events.  In healthcare, adverse events such as wrong-side surgery are rare. This week, we learn how to accurately measure the rate of occurrence of rare events.  When it comes to acting under uncertainty, one needs to estimate conditional probabilities, where the probability of an outcome is calculated under different actions. Conditional probabilities are calculated by shrinking the universe of possibilities. Once conditional probabilities are calculated, expectation is used to recommend action under uncertain situations. Enough about what is coming up; let us proceed.
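As a rough illustration of "shrinking the universe" and of using expectation to compare actions, here is a minimal Python sketch. The counts, the action names ("checklist", "no checklist"), and the cost figures are all invented for illustration; they are not data from this course.

```python
from fractions import Fraction

# Hypothetical counts of surgeries, invented purely for illustration.
# Each key is (action, outcome); each value is the number of cases.
counts = {
    ("checklist", "adverse"): 1,
    ("checklist", "no adverse"): 499,
    ("no checklist", "adverse"): 4,
    ("no checklist", "no adverse"): 496,
}

def conditional_probability(counts, action, outcome):
    """P(outcome | action): shrink the universe to cases where the action was taken."""
    universe = {key: n for key, n in counts.items() if key[0] == action}
    total = sum(universe.values())
    return Fraction(universe.get((action, outcome), 0), total)

def expected_cost(counts, action, cost_by_outcome):
    """Expectation: sum over outcomes of P(outcome | action) times the cost of that outcome."""
    return sum(
        conditional_probability(counts, action, outcome) * cost
        for outcome, cost in cost_by_outcome.items()
    )

# Hypothetical cost attached to each outcome (assumed values).
costs = {"adverse": 100000, "no adverse": 0}

for action in ("checklist", "no checklist"):
    print(action, expected_cost(counts, action, costs))
```

Under these made-up numbers, the action with the lower expected cost would be the one recommended.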

Assigned Reading

This section of the course is a review of material you have had in an introductory course on statistics.

  • Chapter 3 in Big Data in Health Care: Statistical Analysis of Electronic Health Record Read►


  1. Defining probability  Slides► YouTube►
  2. Probability calculus  Slides► YouTube►
  3. Probability distributions & expectations  Slides► Video► YouTube►
  4. Mathematical concept of expectation Slides► YouTube►
  5. Expected variability in random variables  Slides►  Video►  YouTube►
  6. Descriptive statistics  Slides► Video►
  7. Measures of variability Video►  Slides►  YouTube►
  8. SPSS Tutorial 1►  Tutorial 2►
  9. Population and sampling YouTube►
  10. Observational studies   Slides► Video► YouTube►
  11. Numerical data  YouTube► Excel►
  12. Data basics Slides► Video► YouTube►
  13. Data matrices and types of variables Slides► Video►  YouTube►
  14. Variance and standard deviation Slides►  Video►  YouTube►
  15. Box plots, quartiles, and the median How to►
  16. Histograms and shape  How to►  How to►  YouTube►


Instructions for Submission of Assignments: Assignments should be submitted directly on Blackboard.  In rare situations, assignments can be sent directly by email to the instructor. Submissions should follow these rules:

  1. Submit a Jupyter Notebook file or an Excel, Word, or Python file.  Try to answer all questions in one document, in separate sheets or sections.
  2. Make sure that any control charts follow these visual rules:  (1) control limits must be in red and without markers, (2) observed lines must have markers, (3) the X and Y axes must be labeled, and (4) charts must be linked to the data.
  3. The first sheet of the Excel file, or the first section of any other file, should be a summary page.  In the summary page, list how your answers to each question differ from the answers provided within the assignment (inside Teach One or other answers).  For each question, indicate whether your control chart is exactly the same as the one seen in Teach One or other formats, and whether the answers you have provided are the same as the answers supplied on the web.  If no answers are provided, indicate that there were no answers available on the web to compare your answers to.

Help is Available:  A tutor is available to walk you through these assignments and answer your questions.  Regular synchronous sessions are available to help you and your peers discuss these assignments.  Dr. Alemi and Dr. Uriyo are available by text between 9 am and 9 pm and welcome your texts.

Question 1: In this problem we ask you to calculate a case mix index for a hospital from the classification of its patients into Diagnosis Related Groups (DRGs).  In Health Administration programs, case mix issues arise in multiple courses where the severity of patients receiving care in different hospitals is discussed.  The case mix index allows the comparison of two hospitals. It is generally calculated as a weighted length of stay across all DRGs seen in the hospital. The concept of weighted average was discussed in this section.  In a case mix index, the weights are the probabilities of observing patients in each DRG category.  The DRG categories are assumed to be mutually exclusive and exhaustive.  The number of patients admitted for each DRG is indicated in the attached data file.  From these numbers you calculate the probability of each DRG.  By multiplying the probability of a DRG by its length of stay you get the contribution of that DRG.  The case mix index is the sum, across DRGs, of the product of the probability of each DRG and the length of stay within that DRG.  The higher the case mix index, the larger the expected length of stay at the hospital. Which hospital has a higher case mix index?
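The calculation described in Question 1 can be sketched in Python as follows. The DRG labels, admission counts, and lengths of stay are hypothetical placeholders, not the values in the attached data file, and the sketch assumes the same length of stay per DRG applies to both hospitals.

```python
def case_mix_index(admissions, length_of_stay):
    """Weighted average length of stay, with DRG probabilities as the weights.

    admissions: dict mapping DRG label -> number of patients admitted
    length_of_stay: dict mapping DRG label -> length of stay for that DRG
    """
    total = sum(admissions.values())
    if total == 0:
        raise ValueError("no admissions recorded")
    index = 0.0
    for drg, n in admissions.items():
        probability = n / total                     # P(DRG) from the counts
        index += probability * length_of_stay[drg]  # contribution of this DRG
    return index

# Hypothetical data for two hospitals (assumed values, for illustration only).
hospital_a = {"DRG 1": 50, "DRG 2": 30, "DRG 3": 20}
hospital_b = {"DRG 1": 20, "DRG 2": 30, "DRG 3": 50}
los = {"DRG 1": 2.0, "DRG 2": 4.0, "DRG 3": 8.0}

print("Hospital A:", case_mix_index(hospital_a, los))
print("Hospital B:", case_mix_index(hospital_b, los))
```

Because the DRG categories are mutually exclusive and exhaustive, the probabilities sum to 1, so the index is a proper weighted average of the lengths of stay.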

Question 2: Download Hospital Compare Data using the link below.  Select the flat file "Complications - Hospital.CSV".  Read the data into Excel.  For all hospitals, select "Rate of complications for hip/knee replacement patients".  You can do this by using Excel's filter. Calculate the average rate across all hospitals. Calculate the standard deviation of the rate across all hospitals. Excel has commands for calculating standard deviations and averages; please use these commands. Report the average rate and the standard deviation of the rate to your instructor (do not include the data in your submission).  Data are also available through the Medicare Compare site:
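For those working in a Jupyter Notebook rather than Excel, the filtering and summary steps of Question 2 might look like the sketch below. The column names "Facility ID", "Measure Name", and "Score", the "Not Available" marker, and the sample rows are assumptions made for illustration; check them against the header of the file you actually download.

```python
import csv
import io
import statistics

def read_rates(csv_file, measure_name, measure_col="Measure Name", score_col="Score"):
    """Return the numeric scores for one measure, skipping non-numeric entries.

    The column names are assumed; adjust them to match the downloaded file.
    """
    rates = []
    for row in csv.DictReader(csv_file):
        if row[measure_col] != measure_name:
            continue  # same effect as Excel's filter on the measure name
        try:
            rates.append(float(row[score_col]))
        except ValueError:
            pass  # e.g. an assumed "Not Available" entry
    return rates

# Tiny made-up sample standing in for the real file.
SAMPLE = """\
Facility ID,Measure Name,Score
A,Rate of complications for hip/knee replacement patients,2.0
B,Rate of complications for hip/knee replacement patients,3.0
C,Rate of complications for hip/knee replacement patients,Not Available
D,Rate of complications for hip/knee replacement patients,4.0
A,Some other measure,9.9
"""

measure = "Rate of complications for hip/knee replacement patients"
rates = read_rates(io.StringIO(SAMPLE), measure)
mean_rate = statistics.mean(rates)
sd_rate = statistics.stdev(rates)  # sample standard deviation, like Excel's STDEV.S
print(mean_rate, sd_rate)
```

With the real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`, where `path` is wherever you saved the download. Use `statistics.pstdev` instead if you want the population standard deviation (Excel's STDEV.P).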

Question 3: For this question use the file "Complications - Hospital.CSV" in Hospital Compare.  The same file was also downloaded for Question 2.  Make a histogram of the rate of complications for hip/knee replacement patients at different hospitals using the data you downloaded in the previous step.
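A histogram is a count of how many hospitals fall into each equal-width interval of the rate. The sketch below shows that binning step in plain Python; the rates and the bin width of 0.5 are made-up values, not the Hospital Compare data.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values in equal-width bins [k*w, (k+1)*w).

    Returns a list of (bin_start, count) pairs covering every bin from the
    lowest to the highest observed one, including empty bins in between.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    first, last = min(counts), max(counts)
    return [(k * bin_width, counts.get(k, 0)) for k in range(first, last + 1)]

# Hypothetical complication rates (assumed values, for illustration only).
sample_rates = [2.1, 2.4, 2.6, 3.0, 3.2, 4.1]
bins = histogram_counts(sample_rates, 0.5)

# Text rendering: one row per bin, one '#' per hospital in the bin.
for start, count in bins:
    print(f"{start:4.1f} | {'#' * count}")
```

The shape of the bars (symmetric, skewed, one peak or several) is what the histogram is meant to reveal; a charting tool draws the same counts as vertical bars.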

Question 4: For this question use the file "Complications - Hospital.CSV" in Hospital Compare.  The same file was also downloaded for Question 2.  Plot the relationship between the rate of complications for hip/knee replacements and pressure score.  Use a scatter plot in Excel.  Put the rate of complications on the X axis and pressure scores on the Y axis.
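A scatter plot needs one (X, Y) pair per hospital, so the two measures must first be matched by hospital. The sketch below shows that pairing step; the facility IDs, scores, and the measure label "Pressure score measure" are hypothetical placeholders, and the exact measure names should be taken from the downloaded file.

```python
def paired_points(rows, x_measure, y_measure):
    """Pair two measures by hospital.

    rows: iterable of (facility_id, measure_name, score) tuples
    Returns a list of (x, y) pairs, ordered by facility ID, for hospitals
    that report both measures.
    """
    x_by_hospital, y_by_hospital = {}, {}
    for facility, measure, score in rows:
        if measure == x_measure:
            x_by_hospital[facility] = score
        elif measure == y_measure:
            y_by_hospital[facility] = score
    shared = sorted(x_by_hospital.keys() & y_by_hospital.keys())
    return [(x_by_hospital[f], y_by_hospital[f]) for f in shared]

X = "Rate of complications for hip/knee replacement patients"
Y = "Pressure score measure"  # placeholder label, not the real measure name

# Made-up rows for illustration only.
rows = [
    ("A", X, 2.0), ("A", Y, 0.3),
    ("B", X, 3.0),                 # no Y value, so B is dropped
    ("C", X, 4.0), ("C", Y, 0.5),
    ("D", Y, 0.9),                 # no X value, so D is dropped
]

points = paired_points(rows, X, Y)
print(points)
```

The resulting pairs can be handed to any charting tool, with the first element of each pair on the X axis and the second on the Y axis.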



Copyright 1996 Farrokh Alemi, Ph.D. Most recent revision 01/29/2024.  This page is part of the course on Statistical Process Control; this is the lecture on Introduction to Probability.