Book on Statistics for Health Care Managers:  Analysis of Data in Electronic Health Records

 

by Farrokh Alemi, PhD

 

Chapter 1: Introduction►

Chapter 2: Preparing Data►

Chapter 3: Probability and Distributions►

Chapter 4: Bivariate Analysis►

Chapter 5: Risk Assessment►

Chapter 6: Comparison of Means►

Chapter 7: Comparison of Rates►

Chapter 8: Analysis of Time to Adverse Events►

Chapter 9: Tukey Chart►

Chapter 10: Causal Control Charts►

Chapter 11: Multivariate Regression►

Chapter 12: Logistic Regression►

Chapter 13: Poisson Regression►

Chapter 14: Matched Case Control►

Chapter 15: Stratified Covariate Balancing►

Chapter 16: Stratified Regression►

 

 

 

1. Introduction

Question 1: Select  your teach one assignments.  We subscribe to the principle of "Learn one, do one, teach one."  The best way to learn is to teach the topic.  At start of the course, you select a topic in the course to teach.  A week prior to the due date, you submit all assignments in the topic to the instructor. On due date, you email to all students in the class a You Tube video showing how to solve assigned homework.  Missing any of the deadlines on this assignment will lead to 10% reduction in your grade for each week of delay.  If you and others have the same topic, you can focus on different parts of the lecture or assignments. Use Blackboard discussion to indicate the topic you want to focus on. If a student has already taken the topic, it is your responsibility to negotiate how your focus will differ.

Question 2: Plot the following data, where the X axis is the time periods, the Y-axis shows the observed value, the upper control limit and the lower control limits.  Distinguish between the first seven data periods as these were collected pre-intervention.  Remaining data points were collected post intervention.  Title the chart. Create a legend that defines the name for various lines.  Make sure that the observation line has markers and the control limits have no markers.  Make portion of Upper and Lower limit lines that are post intervention dashed.  Make portion of the line that is pre-intervention straight line.  Make all Upper and Lower limit lines red.  Data► Slides►  Listen►

Question 3: This problem will test your familiarity with Excel. Consider the following population data, P: {0, 4, 10, 15, 47, 100}. Assume that you are sampling WITHOUT replacement i.e. once a population element is selected in a sample, that element cannot be selected again for that sample.

  •  List all possible samples of size 2 which can be drawn from set P. For example (0, 4) is a sample of size 2. Another such sample is (15, 47). Verify that you have listed all 15 possible samples.
  • Calculate the arithmetic mean for each individual sample that you listed earlier. Since there are 15 samples you should end up with 15 means. List all means.
  •  Calculate the arithmetic mean of the 15 means that you calculated earlier. This is the average value of all sample means. Compare this value to average of the initial 6 numbers, the population mean. Are they close? Same?

Question 4: In the following Table, fill in all cell values so that all indicated relationships hold. X and Y are random variables.  X with a bar above it indicates the average of X values in the column X. This problem will test your understanding of several principles related to elementary statistics and algebraic operations. Submitted your completed work as an Excel file.

Question 5: The following data are arranged in order of increasing magnitude. Figure out the two missing values (represented by question marks) such that the mean for this data equals the median (for this problem, assume that missing values are positive integers).  Data: ?, 2, 4, 5, 6, 8,?

2. Preparing Data

To complete these assignments, you need to download either Microsoft SQL Server or Microsoft Access.  Video►  

  • MAC users should first download and install virtualization software (VirtualBox, VMWare Player/Fusion, Parallels). Get a copy of an operating system from MSDN (i.e. Windows 10), or Linux (i.e., Ubuntu, or CentOS). Configure a virtual machine and install the guest operating system. Note: In this course we will be mostly using VirtualBox, but you can use any virtualization solution. 
  • Install MS SQL.  DreamSpark (with Microsoft software) library is available free of charge for GMU students. Click on start shopping.  Select SQL Server 2014 Express With Advanced Services With Service Pack 1 32/64-bit (English). Download►  
  • Use your GMU email to register (without Masonlive, i.e. userid@gmu.edu). Do not use your GMU password. Only Microsoft software is available on this site. Software is FREE. Contact instructor if your password does not work.

Question 1: Import data from the following four files into four tables.  Ptid file► Claims file► ICD file► CPT file► Video►

  • Identify patients that have diabetes in the above database.  Video►
  • Calculate the average cost of each diagnosis sorted from most expensive to least expensive.  Exclude all bills with negative or 0 values.  Video►
  • Show if men are more likely to have diabetes than women.  Video►
  • Calculate which month is most likely to have a diagnosis reported.  Video►

Question 2: Download the attached file of ICD9 codes and descriptions and find the seven errors in the data, where the same ICD9 code has been assigned different descriptions.  Data► Video►

Question 3: Download data in four zipped files and link to these data (do not import the data, link to it). For password contact your instructor.  By opening this file you agree not to share the file with anyone else.  Unzip the files twice, first to get to the directory and then to get to the actual file.  Link to the data into Access or read the data into Microsoft SQL server.  We recommend you use SQL server.  To create the database, open SQL server and right click on the database and start a new database.  Right click on the database, select "Tasks," select "Import Data," select "Flat File Source" as the source of the data, change file type to "CSV Files," browse to where you unzipped the files, indicate that field names are in the first row, select as destination "SQL Server Native Client" file type. Massive Data► Access Code► SQL Code to Merge Files► Create Database► Visual Guide to Read SQL►

  • Remove blanks from numeric data, such as DxAtAge.  Convert text data in the AgeAtDeath to float.  This is typically done through if statements such as these:
    •  IIF(DxAtAge >0, DxAtAge, Null)
    • IIF(AgeAtDeath="Null", Null, cast(AgeAtDeath AS Float))
  • Calculate the average age of and the standard deviation of the diagnosis.  Which 10 diagnoses occur first, meaning which diagnosis occurs at a younger age.   Access Code► SQL Code & Answer►
  • List the top 20 most frequent diagnoses that co-occur.  To complete this task you would need to join the table to itself.  Then, use ICD9 code in one table as the first and the ICD9 code in the second table as the second of the pair.  Count the number that match any pair of diagnoses.  Access Code► SQL Code►
  • Use STUFF function to concatenate list of unique diagnoses for the same person.  Count numbers of times these lists occur more than 29 times STUFF SQL►
  • Identify individuals whose date of death might be in error and have visits post date of death. Exclude them. Report the top 10 IDs that remain in order of IDs.  Access Code► SQL Code►
  • Rank order diagnosis in order of their reoccurrence for the same person.  Rank or Row Number functions are described in Google.  Look up the format of the function and implement it in your SQL code. Identify the 1st, 2nd and 3rd re-occurrences of every diagnoses.  For example, the following tables shows how the rank order should work for person with ID 1:

ID ICD9 Rank
1 410 1
1 250 1
1 410 2
1 250 2
1 250 3
1 100 1
1 250 4

3. Probability & Distributions

Question 1: In this problem we ask you to calculate a case mix index for a hospital from classification of its patients into Diagnostic Related Groupings (DRGs).  In Health Administration programs case mix issues arises in multiple courses where severity of patients receiving care in different hospitals are discussed.  The case mix index allows the comparison of two hospitals. It is generally calculated as a weighted length of stay across all DRGs see in the hospital. The concept of weighted average was discussed in this section.  In a case mix index, the weights are the probability of observing patients in a particular DRG category.  Each DRG category is assumed to be mutually exclusive and exhaustive.  The number of patients who are admitted for different DRGs are indicated in the attached data file.  From these numbers you calculate the probability of each DRG.  By multiplying the probability of the DRG by length of stay you get the contribution of each DRG.  The case mix index is the sum of the product of probability of each DRG and length of stay within each DRG.  The higher the case mix index, the larger the expected length of stay at the hospital. Which hospital has a higher case mix index?  Data► Answer►

Question 2: Download Hospital Compare Data using the link below.  Select flat file "Complications - Hospital.CSV"  Read the data into Excel.  For all hospitals select "Rate of complications for hip/knee replacement patients".  You can do this by using Excel's filter. Calculate the average rate across all hospitals. Calculate the standard deviation for the rate across all hospitals. Excel has commands for calculation of standard deviation and averages, please use these commands. Report the average rate and the standard deviation of the rate to your instructor (do not include the data in your submission).  Data is also available through Medicare Compare site: Hospital Compare► Answer►

Question 3: For this question use the file "Complications - Hospital.CSV" in Hospital Compare.  Same file was also downloaded for question 2.  Make a histogram of the rate of complications for hip/knee replacement patients at different hospitals using the data you downloaded in the previous step.  Data►

Question 4: For this question use the file "Complications - Hospital.CSV" in Hospital Compare.  Same file was also downloaded for question 2.  Plot the relationship between rate of complications for hip/knee replacements and pressure sores.  Use scatter plot in Excel.  Have the rate of complications as X-axis and pressure scores as Y axis.   Data► Excel Scatter Plots►

4. Bivariate Analysis

A. This problem describes a type of problem typically discussed in Marketing classes, where managers are trained to understand market participation and market share.  We have simplified the number of variables and cases in the problem to make it easier to analyze. A typical realistic problem may have hundreds of variables and thousands of cases. Data►

  1. What is the probability of hospitalization given that you are male? Select all males and count the number of patients who were hospitalized. Calculate the probability as the ratio of males hospitalized to number of males.  Video► SWF►
  2. Create a contingency table for interaction between age and gender.  Is age independent from gender. Answer►
  3. Is insurance independent of age?  Check that the probability of combination of insurance and age can be estimated from the product of probability of insurance and age or use the contingency table of age and insruance.
  4. What is the probability associated of being more than 65 years old among hospitalized patients? Start by selecting all hospitalized patients, then count the number among hospitalized patients who are more than 65 years old.  The likelihood or the probability of being over 65 among hospitalized patient is the number of patients hospitalized and above 65 divided by the number of hospitalized patients:
    Likelihood Formula
  5. What is the probability of being hospitalized given you are more than 65 years old?  This time we are switching the condition. Now we are asking for the probability among patients who are more than 65 years old.  So select all patients who are more than 65 years old and then count the number who are hospitalized.  In contrast to the previous question the ratio is calculated by dividing the number of patients above 65 who were hospitalized divided by number above 65 years.
  6. In predicting hospitalization, what is the likelihood ratio, LR, associated with being more than 65 years old?  This is not the same as the likelihood of being above 65 given that you are hospitalized.  It should be calculated as follows:
    Likelihood Ratio Formula
  7. What is the prior odds for hospitalization before any other information is available?  The probability of hospitalization is calculated as the number hospitalized by the number in the sample. Prior odds is calculated as the probability of hospitalization by one minus the probability.  A simpler way to do so, the prior odds is the ratio of number hospitalized divided by the number not hospitalized or as:

    Prior Odds Formula

  8. Analyze the data in the Table and report if any two variables are conditionally independent of each other in predicting probability of hospitalization? Consider the pairs  Gender & Age, Age & Insured, and Gender & Insured.  If two events are independent, then the likelihood ratio associated with the combined event should be the product of the likelihood ratios of each event.  If the likelihood ratio cannot be calculated because of division by zero, then skip that check. In using likelihood ratios to test the independence of two variables, note that you have to test it for all levels in the variable.  So for example, if we are examining the independence of age and gender, then you would test the independence of four set of combination of variables from their components: 
    • Likelihood ratio Age>65 and Male = Likelihood ratio of Age>65 * Likelihood ratio of Male
    • Likelihood ratio Age>65 and Female = Likelihood ratio of Age>65 * Likelihood ratio of Female
    • Likelihood ratio Age<=65 and Male = Likelihood ratio of Age<=65 * Likelihood ratio of Male
    • Likelihood ratio Age<=65 and Female = Likelihood ratio of Age<=65 * Likelihood ratio of Female
     Keep in mind that because the number of cases are too few, many ratios cannot be calculated.   Video► SWF►

5. Risk & Severity of Illness Assessment

Question 1: Construct a simple Multi-Morbidity Index.  Assess the average severity of CHF, MI, Diabetes, Hypertension, Alcohol Use, and ACL surgery (assume that sicker patients have longer stays). To calculate the average severity associated with a disease, compare all cases with the disease to all control patients without the disease. Make sure that in each comparison, patients with and without disease have the same set of comorbidities. For example, to find the average length of stay for patients with MI, select all MI patients with the following comorbidities: CHF,DM,AA.  Then compare these cases to controls who do not have MI but have the same comorbidities. To help you understand this assignment, consider the following table.  In this table, we see different strata of mutually exclusive and exhaustive comorbidities.  Then, within each strata we can observe the impact of MI.  The impact of MI is the average impact of MI within each strata.  Your objective is to create this table before you calculate the impact of MI.  To do so, first estimate n1 through n5 through a file where the data is restricted to cases with MI (WHERE MI=1).  Then, estimate the values n6 through n10 through a file where the data are restricted to non-MI patients (WHERE MI=0).  Merge these data, making sure that you match on the strata.  Then you can calculate the impact of MI.  Data►  Video► Slides►  SQL Code► Answer►

Strata Cases of MI Controls with No MI Difference
AA n1 n6 n1-n6
AA, CHF n2 n7 n2-n7
AA,DM n3 n8 n3-n8
DM, CHF n4 n9 n4-n9
AA,DM,CHF n5 n10 n5-n10

Question 2:  In question 1, calculate the likelihood ratio associated with each diagnosis in predicting above or below average length of stay.  First calculate the average length of stay for each diagnosis.  Assign individuals who have above average length of stay 1 and those below average length of stay 0.  Next calculate the likelihood ratio.  To calculate the likelihood ratio, select all individuals who have above average length of stay.  Examine the prevalence of the diagnosis among them.  Select all individuals who have below average length of stay and select the prevalence of the diagnosis among them.  The ratio of these two calculated numbers constitutes the likelihood ratio. We have calculated this number to be 1.67.  See if you get the same answer. Show intermediary calculations. Data► Access►

Question 3: Calculate the average age and the likelihood ratio associated with diagnoses. You would need to use SQL to do this assignment.  You can use any SQL software, including Access. Since the data is massive (17 million rows), keep in mind that Access requires you to analyze the data in partitions. Microsoft SQL server can analyze the entire data in one run. Submit your SQL code and the 10 diagnosis with highest and lowest likelihood ratios. For password to access data contact your instructor.  By opening this file you agree not to share the file with anyone else. Massive Data► Access► SQL Code► Marla's Guide► Answer►

 

6. Comparison of Means

Question 1.  In Health Administration Programs conducting satisfaction surveys are usually covered in courses on quality improvement.  This exercise shows how data from satisfaction surveys can be analyzed over time. Assume that, in different time periods, 4 randomly selected patients rated their satisfaction with our services.  Are we improving? Data► Answer►

Question 2.  In Health Administration programs accounting courses typically cover cost data.  Following data were obtained regarding the cost of taking care of patients over several time periods.  Are our costs within expectation? Data► Slides► Listen► Answer►

Question 3: In this assignment, examine if payments for AMI to hospitals that received higher payment at start of 2015 has changed over time.  You can use Access database for opening the data in this assignment. Downloading these data files and linking them is a time consuming activity, allocate sufficient time to this activity. Keep the databases you download, as in this book we will repeatedly rely on the Hospital Compare data. In a health administration program, these data may be used in a variety of courses, including courses on quality improvement, informatics, and cost accounting. Download dated data files from the archives. These files look like HosArchive_YYYYMMDD.zip, where YYYY is the year, and MM is the month. Unzip the files. Extract Hospital.mdb files for each time period, make sure you record the year and the month into the database name. Keep in mind that start and end periods of data are also given inside the files.  Start from 1/1/2015 till the most recent available database. Since the data on AMI payments does not change in the files, you can save time and read only the following 3 files into a new database:

File Database Start End
HQI_HOSP_AMI_Payment   HOSArchive_20150122 2010 2013
HQI_HOSP_Payment HOSArchive_20150716 2011 2014
HQI_HOSP_PaymentAndValueOfCare Hospital_20161110 or HOSArchive_20161110 2012 2015

In these files the denominator indicates the number of patients.  Payment indicates average payment per patient.  Select data that meet the following conditions:

  1. Select the "PAYM_30_AMI" measure
  2. Select hospitals who at start of 2015 had greater than national average payment.  This information is provided in the field [Compared to National] in the file [HQI_HOSP_AMI_Payment ]. 
  3. Select hospitals that in the denominator do not have "Not Available".

Examine if the payments to the hospitals changed over time. Submit an Excel file containing the control chart for the data. Download►2016-11-10.zip► 2016-08-10.zip► 2016-05-04.zip► 2015-12-10.zip► 2015-10-08.zip► 2015-07-16.zip► 2015-05-06.zip► 2015-04-16.zip► 2015-01-22.zip►  Guide► Answer►

Question 4: Calculate the Probability of observing Z values in the following ranges in a normal distribution.  Answers are provided, show your intermediary table look-up values and calculations. z_calculator► Answer►

Probability for Z value

Question 5: Using the databases from Hospital Compare, download quarterly data on average number of hours of restraints from the file "HOSPITAL_QUARTERLY_QUALITYMEASURE_IPFQR_HOSPITAL".  Examine the data for "UNIVERSITY OF ALABAMA HOSPITAL".  This is hospital ID code 010023.  Not all time periods include new data.  I was able to find the following set of data (you can f ind more):

Numerator (Overall Hours of Restraint Use)  Denominator (Overall Number of Patients) Start Date for Data Collection End Date for Data Collection
3.15 4,449 10/1/2012 3/31/2013
18.18 9,226 4/1/2013 12/31/2013
59.06 12,878 1/1/2014 12/31/2014

 Note that the field HBIPS-2_Overall_Num indicates the numerator for the measure "Hours of physical-restraint use."  The denominator for the same overall measure is in the field HBIPS-2_Overall_Den.  The dictionary provides the interpretation of these two fields as hours of restraint and number of patients examined. Using the procedure for XmR control chart examine if the total number of hours of restraints has changed over time. Download►2016-11-10.zip► 2016-08-10.zip► 2016-05-04.zip► 2015-12-10.zip► 2015-10-08.zip► 2015-07-16.zip► 2015-05-06.zip► 2015-04-16.zip► 2015-01-22.zip►Dictionary►

Question 6: Find the value of standard normal variable z such that area under the curve below z is .3300 or .1003.  z Calculator►

Question 7: Variable X has a mean of 10 and standard deviation of 2 in the population. Calculate the z score that corresponds to X = 20. z Calculator►

Question 8: Variable X has a mean of 4 and standard deviation of 2 in the population. Find the value of X such that the corresponding z score for this value is -3. z Calculator►

Question 9: At a board meeting,  you hear different statements about who the clinic is serving.  According to the statements, 68% of the patients are under 25 years old and 10% are over 50 years old. Assuming that the patients are normally distributed, find the mean and standard deviation of the distribution.  We estimated the mean to be 10.62 and the standard deviation to be 30.74.  Show how these mean and standard deviations were calculated.  z Calculator► Answer►

Question 10: You are health administrator who is interested in examining organizational motivation and commitment of hospital employees. As part of a research project you have collected data from 16 employees serving in various wards of a hospital in a large metropolitan area. The following table provides information on Employee type (1 = trainee, 2 = supervisor, 3 = manager), Gender (1 = female, 2 = male), Training (0-10 scale), Motivation (0-10 scale), and Commitment (0-10 scale) of the 16 employees in your sample. Please make sure that you treat employees and gender as categorical variables and training, motivation and commitment as quantitative variables.

Employee
Type

Gender

Training

Motivation

Commitment

1

1

4

5

4

1

1

9

8

5

1

1

4

8

5

1

1

7

6

6

1

1

1

1

1

2

1

0

1

1

2

1

5

5

6

2

1

10

0

1

2

2

1

7

2

2

2

3

9

4

2

2

4

6

5

3

2

5

6

6

3

2

0

1

1

3

2

2

3

0

3

2

3

2

1

3

2

0

1

0

(a) Compute descriptive statistics for each variable.  Check that you have not made an error in data entry and that your descriptive statistics match the following Excel output.

(b) Use Excel to perform a one sample test to evaluate whether or not the mean motivation level of all employees in the population is different from 5. The null hypothesis is that 1 = 5; i.e. the population mean motivation level is equal to 5. The alternative hypothesis is that 1 ≠ 5; i.e. the population mean motivation level is significantly different from 5. Calculate the mean (4.31) and the standard deviation (3.00) using functions in Excel. Calculate the t-statistic and its degrees of freedom. Calculate the critical value and test if the critical value is less than alpha of 0.05. Copy/paste relevant Excel output. Provide interpretation of "t" test results.  

(c) Use Excel to perform a paired samples t-test to evaluate whether or not the mean Motivation level is significantly different from mean Commitment level in the population. The null hypothesis is that 1 = 2, i.e. the sample mean motivation level is equal to the sample mean commitment level. The alternative hypothesis is that  1 ≠  2, i.e. the sample mean motivation level is significantly different from the sample mean commitment level. Test at alpha levels less than 0.05. Copy/paste relevant Excel output. Provide interpretation of t-test results. Answer►

(d) Use Excel to perform an independent samples t-test (assuming equal variances) to evaluate whether or not the mean Motivation level differs significantly between male and female employees in the population. The null hypothesis is that 1 = 2; i.e. the sample mean motivation level for females is equal to the sample mean motivational level for males. The alternative hypothesis is that 1 ≠  2; i.e. the sample mean motivation level for females is significantly different from the sample mean motivational level for males. Copy/paste relevant Excel output. Provide interpretation of t-test results. For this problem you can assume that a pooled variance test is appropriate and alpha level is 0.05. Answer►

 

7. Comparison of Rates

Question 1: This exercise allows you to examine how quality of hospital care is evaluated through data obtained from electronic health records.  This exercise is discussed in several health administration courses, including courses on quality improvement, informatics, or statistics. In practice, the need to analyze these types of data arises in improvement teams, where managers, clinicians and sometimes patients work together to systematically improve healthcare processes. In this assignment, you are asked to examine if prophylactic antibiotics are discontinued within 24 hours after end of surgery.  Instead of providing you with the data, we ask that you download the data on the web so that you learn more about the source of the data and peculiar organization of the data.  Hospital Compare provides information on many quality indicators.  In this assignment, you are asked to focus on failure to stop use of antibiotics post surgery. The Hospital Compare web site provides the following rationale for why this measure of quality is helpful:  "A goal of prophylaxis with antibiotics is to provide benefit to the patient with as little risk as possible. It is important to maintain therapeutic serum and tissue levels throughout the operation. Intra-operative re-dosing may be needed for long operations. However, administration of antibiotics for more than a few hours after the incision is closed offers no additional benefit to the surgical patient. Prolonged administration does increase the risk of clostridium difficile infection and the development of antimicrobial resistant pathogens."  To complete this assignment complete the following tasks:

  1. Download Hospital Compare data for the year 2015 and 2016.  Merge the file " HQI_HOSP_TimelyEffectiveCare" across all databases for different years. 
  2. Create a query for each year.  Select measure ID "SCIP_INF_3"; this measure refers to prophylactic antibiotic use. Select to work with data from "Southeast Alabama Medical Center"; this is provider ID "010001". 
  3. Calculate rate of overuse of antibiotics.  In these data, the field "Score" refers to the number of patients with overuse of antibiotics.  The field "Sample" refers to all selected surgical patients with no evidence of prior infection. The ratio of Score to Sample provides the rate of overuse of antibiotics.  

Construct a control chart showing rate of overuse of prophylactic antibiotic over time in Southeast Alabama Medical Center.  Download the needed data using the following databases:  2016-11-10.zip► 2016-08-10.zip► 2016-05-04.zip► 2015-12-10.zip► 2015-10-08.zip► 2015-07-16.zip► 2015-05-06.zip► 2015-04-16.zip► 2015-01-22.zip►   Guide► Answer►

Question 2:  Nursing home and hospital administrators implement many initiatives to reduce falls.  This problem shows how these initiatives can be measured and evaluated.  Prepare a risk-adjusted control chart for data on falls in a nursing home. Data►  Video► SWF► Answer►

Question 3: Following data were obtained on post surgical infection rates.  Are we having more infections than expected from the patients' conditions? Answer►

Week Risk of Infection for each Patient Number Infected
1 0.9 0.8 0.7 0.8 0.9 0.85 6
2 0.7 0.8 0.7 0.6 0.8 0.9 5
3 0.8 0.95 0.92 0.87     4
4 0.5 0.6 0.66 0.67     3
5 0.3 0.4 0.5 0.4 0.5 0.34 2
6 0.3 0.4 0.5       1

Question 4: Download Hospital Compare data for the year 2015 and 2016.  Start from 1/1/2015 till the most recent available database. Since the data on AMI payments does not change in the files, you can save time and read only the following 3 files into a new database:

File Database Field Start End
HQI_HOSP_AMI_Payment   HOSArchive_20150122 Compared to National 2010 2013
HQI_HOSP_Payment HOSArchive_20150716 Category 2011 2014
HQI_HOSP_PaymentAndValueOfCare Hospital_20161110 or HOSArchive_20161110 Payment Category 2012 2015

In these files the denominator indicates the number of patients.  Whether the hospital had above average payments are reported in different fields: [Compared to National], [Category], and [Payment Category]  Select data that meet the following conditions:

  1. Select the "PAYM_30_AMI" measure
  2. Select hospitals that in the denominator do not have "Not Available".

Construct a control chart showing probability of change over time.  Download using the following files:  2016-11-10.zip► 2016-08-10.zip► 2016-05-04.zip► 2015-12-10.zip► 2015-10-08.zip► 2015-07-16.zip► 2015-05-06.zip► 2015-04-16.zip► 2015-01-22.zip► Guide► Answer►

Question 5: On average 1 in 10 patients have some sort of medication error in our facility; what is the probability that out of 10 patients none will have medication errors?

Question 6: The probability of a patient recovering from a heart operation is 0.9. Assuming that heart operations are not related to each other, what is the probability that exactly 4 out of 8 next patients will survive the operation?

Question 7: An insurance salesperson is about to sell life insurance policies to 7 unrelated women, all of whom are of the same age, belong to the same race, and are in good health. According to independent actuarial estimates, the probability of a woman of this age, race, and health status being alive 20 years from now is 80%. What is the probability that in 20 years all 3 women are alive?

 

8. Time to Adverse Events

Question 1: We have received a set of comments over time on the web. Check if we have improved?   Please note that there are no pre- and post intervention periods and therefore control limits must be calculated from the entire period. Data► Answer►

Question 2: The following data show the speed with which Alabama Medical Center's Emergency Room provides a patient who has fractured long bone with pain medication.  Examine if the organization has been able to reduce the response time to less than 73 minutes? Answer► You Tube from Aras►

Measure Start Date Measure End Date Minutes to Pain Medication Sample
4/1/2013 3/31/2014 78 132
7/1/2013 6/30/2014 68 124
7/1/2013 6/30/2014 68 124
10/1/2013 9/30/2014 72 122
1/1/2014 12/31/2014 76 138
4/1/2014 3/31/2015 85 144
7/1/2014 6/30/2015 86 137
10/1/2014 9/30/2015 95 121
1/1/2015 12/31/2015 99 111

Question 3: A control chart requires independent repeated observations over time.  In addition, based on other factors, different charts are appropriate. In the following conditions identify one or more control charts that are appropriate.  If more than one control chart is appropriate, the usual approach is to try both charts and select the chart with tighter controls (i.e. control limits with smaller absolute difference):

 
Is there one observation per time period? Is the observation measured on a continuous (interval) scale? Are data outliers (values more than 2 standard deviations away from mean)? Recommended Chart
Yes Yes Yes  
Yes Yes No  
Yes No -  
No Yes  -  
No No  -  

 

9. Tukey Chart

Question 1: In health administration programs, data on waiting time are examined in courses on quality and operations research.  Using the attached data, determine if the waiting time in our urgent care center has changed? Data► Answer►

Question 2: In Hospital Administration Programs, time to adverse events is typically taught in courses on quality.  It also may be referred to in courses on strategy, if the hospital is focused on competing based on quality.

Hospital Compare reports the measure OP_21. The Score field provides time (in minutes) from emergency department arrival to initial oral, intranasal or parenteral pain medication administration for the patients with a diagnosis of a long bone fracture.  The field Sample provides the number of patients used to calculate the time to pain medication. In this assignment, we ask you to track the performance of "Inova Fairfax Medical Center" over two years. Focus on the Score variable.  Download Hospital Compare data; these years include data from 2013 to 2015.  Merge the file " HQI_HOSP_TimelyEffectiveCare" across all the databases that you have downloaded.  Select measure ID: "OP_21" .  Construct a control chart showing time between pain medications. The data are reported for a range of time; assume that the data are reported for the mid-point of the range. Download data using the following 9 files:  Rationale 2016-11-10.zip► 2016-08-10.zip► 2016-05-04.zip► 2015-12-10.zip► 2015-10-08.zip► 2015-07-16.zip► 2015-05-06.zip► 2015-04-16.zip► 2015-01-22.zip►

Construct both a Tukey and a time-between control chart for the data.   For the time-between control chart assume that data points above 52 minutes exceed and observations below 52 minutes are less than the national average. Answer►

Question 3: Analyze the following data using Tukey, XmR and Time-In-Between (more than 30 minutes of exercise considered a successful day) charts. Produce 3 charts and discuss if the findings from the 3 charts are similar.  To decide if the exercise time has changed, rely on the control chart with the smallest difference between upper and control limit. Data► Answer►

 

Causal Control Chart

Question 1: There are many reasons for excessive boarding times in emergency rooms.  The following data shows the simulated experience of a hospital.  For each time period, we show the number of excessive boarding and total patients examined.  These numbers are divided into 4 strata each corresponding to presence of alternative reasons for backup; in particular whether there was image or hospital-bed back up present.  The pre-intervention period corresponds to before hiring additional personnel for the emergency room.  The post intervention refers to the period after hiring,  Create a control chart to examine if the addition of new personnel in the emergency room has led to decline of excessive boarding times. Data► Excel Example►

  Month Pre-Intervention Controls Post-Intervention Cases
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Total Examined None 3 6 19 27 27 3 1 27 19 17 19 4 27 19 3 28 24 21 13 4 17 18 28 18 18
Image Backup 21 22 4 22 0 5 16 9 23 27 6 11 3 21 27 21 2 25 8 24 0 29 19 16 28
Bed Availability 1 1 2 1 2 0 5 8 1 2 9 11 17 1 10 2 4 11 12 14 12 2 17 13 2
Image & Bed Backup 4 18 28 28 20 18 7 18 19 13 24 26 8 26 19 4 24 17 0 26 12 29 20 16 0
Excessive Boarding None 0 0 2 1 3 0 0 3 1 2 2 0 3 1 0 4 2 2 0 0 1 1 3 1 2
Image Backup 10 10 1 6 0 2 6 2 9 7 3 6 1 22 60 35 2 57 17 41 0 42 45 23 56
Bed Availability 0 0 0 1 2 0 1 3 0 1 6 7 8 0 7 1 1 4 11 7 10 1 12 12 0
Image & Bed Backup 5 18 18 28 20 18 7 10 23 13 11 20 7 26 19 2 15 17 0 26 14 16 20 9 0

Using weighting procedures, the following control chart is obtained:

Causal Control Chart

Question 2:  Estimate the effect of election of president Trump on healthcare stock of Humana Inc.  Election of president Trump was a surprise to many.  After his election, stock prices may reflect both his election as well as the stock's historical patterns. President Trump had promised to repeal and perhaps replace Obama Care. Humana Inc. was one of the insurers who was active within Obama Care. It eventually dropped out of the insurance markets months later; but at the time it should have been affected by the uncertainty about future of Affordable Care Act (Obama care). Examine the price of Humana's stock prices 2 months before and 2 months after the election; did the election make a difference in percent of days the stock went up? Complete the analysis for every 6 days the market was open. Repeat the analysis and this time control for the Nasdaq index fund (as a surrogate of general economy) and Standard & Poor Healthcare Sector index (a surrogate for changes in the healthcare industry). After controlling for these two alternative explanations, did the election still affect percent of days stock went up?  Data► Chihiro's Slides► Chihiro's write up► Excel

 

11. Multivariate Regression

Question 1: Predict total cost of long term care for residents in an alternative to nursing homes. Do the residents' medical history predict total cost of care? Examine if residents in the alternative nursing home have on average a lower cost of care than residents in nursing homes.  Access to the data requires a password. For password use the instructor's last name. Data►

 

12. Logistic Regression

Question 1: Assess the propensity that patients have for joining the alternative to nursing homes. Access to the data requires a password. For password use the instructor's last name. Data►

 

13. Poisson Regression

14. Matched Case Control

15. Stratified Covariate Balancing

Question 1: Find response to citalopram for patients with different types of depression. For each diagnosis use the remaining diagnoses as covariates and identify for which diagnoses citalopram is best.  For the analysis use data from STAR*D experiment conducted by NIMH. For this assignment you can use any statistical software or use R software prepared by the instructor for stratified covariate balancing  Download R►

  1. Read about the study protocol. Protocol►
  2. Download data.  Use instructor's last name as password.  Must enter password twice. Data 2010► Data 2003► Complete Data►
  3. Create a stratified covariate balancing model of impact of citalopram on depression.
  4. Use at least 10 variables to remove confounding in the model.  Balance the data to remove the effects of other types of co-occurring mental health diagnoses.  Show visually that the stratified covariate balancing has been able to remove the effects of other variables from response to citalopram
  5. Is there a smaller set of variables that you could stratify. Identify the Markov blanket of citalopram in predicting response to treatment.  Vang's Slides►
  6. Describe what predicts success of citalopram.
  7. Describe how well the model predicts response to citalopram.

See work done by Mazloum► Elashka & Aiyar►

 

16. Stratified Regression