Statistical Process Improvement
Georgetown University

Bivariate Analysis


Assigned Reading

  • Chapter 4 in Big Data in Health Care: Statistical Analysis of Electronic Health Record Read►



Instructions for Submission of Assignments: Assignments should be submitted directly on Blackboard. In rare situations, assignments can be sent directly by email to the instructor. Submissions should follow these rules:

  • The first sheet in the file should be a summary page. On the summary page, list how your answers to the questions differ from the answers provided within the assignment (inside Teach One or other answers). For each question, you must indicate whether your control chart is exactly the same as the one seen in Teach One or other formats, and whether the answer you have provided is the same as the answer supplied on the web. If no answers are provided, you must indicate that there were no answers available on the web to compare your answers to.

A. This problem is of a type typically discussed in Marketing classes, where managers are trained to understand market participation and market share. We have reduced the number of variables and cases to make the problem easier to analyze. A typical realistic problem may have hundreds of variables and thousands of cases. Data►

  1. What is the probability of hospitalization given that you are male? Select all males and count the number of them who were hospitalized. Calculate the probability as the ratio of the number of males hospitalized to the number of males.  Video► SWF►
  2. Create a contingency table for the interaction between age and gender. Is age independent of gender? Answer►
  3. Is insurance independent of age? Check whether the probability of each combination of insurance and age can be estimated from the product of the probability of insurance and the probability of age, or use the contingency table of age and insurance.
  4. What is the probability of being more than 65 years old among hospitalized patients? Start by selecting all hospitalized patients, then count the number among them who are more than 65 years old. The likelihood, or probability, of being over 65 among hospitalized patients is the number of patients hospitalized and above 65 divided by the number of hospitalized patients:
    p(Age>65 | Hospitalized) = n(Hospitalized and Age>65) / n(Hospitalized)
  5. What is the probability of being hospitalized given that you are more than 65 years old? This time we are switching the condition: we are now asking for the probability among patients who are more than 65 years old. Select all patients who are more than 65 years old and then count the number who are hospitalized. In contrast to the previous question, the ratio is calculated by dividing the number of patients above 65 who were hospitalized by the number of patients above 65. Elina's SQL►
  6. In predicting hospitalization, what is the likelihood ratio, LR, associated with being more than 65 years old?  This is not the same as the likelihood of being above 65 given that you are hospitalized.  It should be calculated as follows:
    LR(Age>65) = p(Age>65 | Hospitalized) / p(Age>65 | Not hospitalized)
  7. What are the prior odds for hospitalization before any other information is available? The probability of hospitalization is calculated as the number hospitalized divided by the number in the sample. Prior odds are calculated as the probability of hospitalization divided by one minus that probability. More simply, the prior odds are the number hospitalized divided by the number not hospitalized:

    Prior odds = n(Hospitalized) / n(Not hospitalized)

  8. Analyze the data in the Table and report whether any two variables are conditionally independent of each other in predicting the probability of hospitalization. Consider the pairs Gender & Age, Age & Insured, and Gender & Insured. If two events are independent, then the likelihood ratio associated with the combined event should be the product of the likelihood ratios of each event. If a likelihood ratio cannot be calculated because of division by zero, then skip that check. In using likelihood ratios to test the independence of two variables, note that you have to test it for all levels of each variable. For example, if we are examining the independence of age and gender, then you would test the independence of four combinations of levels from their components: 
    • Likelihood ratio Age>65 and Male = Likelihood ratio of Age>65 * Likelihood ratio of Male
    • Likelihood ratio Age>65 and Female = Likelihood ratio of Age>65 * Likelihood ratio of Female
    • Likelihood ratio Age<=65 and Male = Likelihood ratio of Age<=65 * Likelihood ratio of Male
    • Likelihood ratio Age<=65 and Female = Likelihood ratio of Age<=65 * Likelihood ratio of Female
     Keep in mind that because the number of cases is small, many ratios cannot be calculated.
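
The counting steps in questions 1, 4, 5, 6 and 7 can be sketched in Python. This is a minimal illustration, not the course's own solution. The ten patients are invented for the example and are NOT the assignment data, and the names (ROWS, cond_prob, likelihood_ratio) are hypothetical. Exact fractions are used so that the ratios stay readable.

```python
from fractions import Fraction

# Hypothetical patients invented for illustration; NOT the assignment data.
# Each row is (male, over65, hospitalized).
ROWS = [
    (True, True, True), (True, True, False), (True, False, True),
    (True, False, False), (True, False, False), (False, True, True),
    (False, True, True), (False, True, False), (False, False, False),
    (False, False, False),
]
PATIENTS = [dict(zip(("male", "over65", "hospitalized"), row)) for row in ROWS]


def hospitalized(r):
    return r["hospitalized"]


def not_hospitalized(r):
    return not r["hospitalized"]


def over65(r):
    return r["over65"]


def male(r):
    return r["male"]


def cond_prob(rows, event, given):
    """p(event | given): select the rows where `given` holds, then count `event`."""
    selected = [r for r in rows if given(r)]
    if not selected:
        return None  # undefined: no case satisfies the condition
    hits = sum(1 for r in selected if event(r))
    return Fraction(hits, len(selected))


def likelihood_ratio(rows, finding):
    """p(finding | hospitalized) / p(finding | not hospitalized), or None if undefined."""
    num = cond_prob(rows, finding, hospitalized)
    den = cond_prob(rows, finding, not_hospitalized)
    if num is None or not den:  # den is None or zero: ratio cannot be calculated
        return None
    return num / den


p_hosp_given_male = cond_prob(PATIENTS, hospitalized, male)      # question 1
p_over65_given_hosp = cond_prob(PATIENTS, over65, hospitalized)  # question 4
p_hosp_given_over65 = cond_prob(PATIENTS, hospitalized, over65)  # question 5
lr_over65 = likelihood_ratio(PATIENTS, over65)                   # question 6

# Question 7: prior odds = number hospitalized / number not hospitalized.
n_hosp = sum(1 for r in PATIENTS if hospitalized(r))
n_not_hosp = len(PATIENTS) - n_hosp
prior_odds = Fraction(n_hosp, n_not_hosp) if n_not_hosp else None

print("p(hospitalized | male)   =", p_hosp_given_male)
print("p(over 65 | hospitalized) =", p_over65_given_hosp)
print("p(hospitalized | over 65) =", p_hosp_given_over65)
print("LR(over 65)              =", lr_over65)
print("prior odds               =", prior_odds)
```

With these made-up rows, switching the condition changes the answer: the probability of being over 65 among the hospitalized is 3/4, while the probability of hospitalization among those over 65 is 3/5.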

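The independence checks in questions 2, 3 and 8 can be sketched the same way. This is a sketch under stated assumptions: the rows are invented and are NOT the assignment data, and the helper names (prob, likelihood_ratio, marginal_check, lr_check) are hypothetical. For every combination of levels, it pairs the joint probability with the product of the marginal probabilities, and the combined likelihood ratio with the product of the single likelihood ratios, skipping any combination where a ratio would require division by zero.

```python
from fractions import Fraction
from itertools import product

# Hypothetical rows invented for illustration; NOT the assignment data.
# Each row is (male, over65, hospitalized).
ROWS = [
    (True, True, True), (True, True, False), (True, False, True),
    (True, False, False), (True, False, False), (False, True, True),
    (False, True, True), (False, True, False), (False, False, False),
    (False, False, False),
]
MALE, OVER65, HOSP = 0, 1, 2  # column positions within each row


def prob(rows, conds):
    """Share of rows where every (column, value) pair in conds holds; None if no rows."""
    if not rows:
        return None
    hits = sum(1 for r in rows if all(r[c] == v for c, v in conds))
    return Fraction(hits, len(rows))


def likelihood_ratio(rows, conds):
    """p(finding | hospitalized) / p(finding | not hospitalized), or None if undefined."""
    num = prob([r for r in rows if r[HOSP]], conds)
    den = prob([r for r in rows if not r[HOSP]], conds)
    if num is None or not den:  # den is None or zero: ratio cannot be calculated
        return None
    return num / den


def marginal_check(rows, col_a, col_b):
    """Map each cell of the 2x2 table to (joint probability, product of marginals)."""
    cells = {}
    for a, b in product((True, False), repeat=2):
        joint = prob(rows, [(col_a, a), (col_b, b)])
        expected = prob(rows, [(col_a, a)]) * prob(rows, [(col_b, b)])
        cells[(a, b)] = (joint, expected)
    return cells


def lr_check(rows, col_a, col_b):
    """Map each level combination to (combined LR, product of single LRs)."""
    cells = {}
    for a, b in product((True, False), repeat=2):
        combined = likelihood_ratio(rows, [(col_a, a), (col_b, b)])
        lr_a = likelihood_ratio(rows, [(col_a, a)])
        lr_b = likelihood_ratio(rows, [(col_b, b)])
        if None in (combined, lr_a, lr_b):
            continue  # skip checks that would divide by zero
        cells[(a, b)] = (combined, lr_a * lr_b)
    return cells


age_gender_table = marginal_check(ROWS, OVER65, MALE)
age_gender_lr = lr_check(ROWS, OVER65, MALE)

for levels, (joint, expected) in age_gender_table.items():
    print("over65, male =", levels, "joint:", joint, "product:", expected)
for levels, (combined, lr_product) in age_gender_lr.items():
    print("over65, male =", levels, "LR:", combined, "product of LRs:", lr_product)
```

The sketch compares for exact equality, which small samples will rarely satisfy. How close the two numbers must be before the variables are treated as independent is a judgment that the sketch leaves to the analyst.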


Copyright 1996 Farrokh Alemi, Ph.D. Most recent revision 05/12/2023.  This page is part of the course on Statistical Process Control, this is the lecture on Introduction.