
Book on Statistics for Health Care Managers: Analysis of Data in Electronic Health Records
Chapter 3: Probability and Distributions► Chapter 4: Bivariate Analysis► Chapter 6: Comparison of Means► Chapter 7: Comparison of Rates► Chapter 8: Analysis of Time to Adverse Events► Chapter 10: Causal Control Charts► Chapter 11: Multivariate Regression► Chapter 12: Logistic Regression► Chapter 13: Poisson Regression► Chapter 14: Matched Case Control► Chapter 15: Stratified Covariate Balancing► Chapter 16: Stratified Regression►


1. Introduction
Question 1: Select your teach one assignments. We subscribe to the principle of "Learn one, do one, teach one." The best way to learn is to teach the topic. At start of the course, you select a topic in the course to teach. A week prior to the due date, you submit all assignments in the topic to the instructor. On due date, you email to all students in the class a You Tube video showing how to solve assigned homework. Missing any of the deadlines on this assignment will lead to 10% reduction in your grade for each week of delay. If you and others have the same topic, you can focus on different parts of the lecture or assignments. Use Blackboard discussion to indicate the topic you want to focus on. If a student has already taken the topic, it is your responsibility to negotiate how your focus will differ. Question 2: Plot the following data, where the X axis is the time periods, the Yaxis shows the observed value, the upper control limit and the lower control limits. Distinguish between the first seven data periods as these were collected preintervention. Remaining data points were collected post intervention. Title the chart. Create a legend that defines the name for various lines. Make sure that the observation line has markers and the control limits have no markers. Make portion of Upper and Lower limit lines that are post intervention dashed. Make portion of the line that is preintervention straight line. Make all Upper and Lower limit lines red. Data► Slides► Listen► Question 3: This problem will test your familiarity with Excel. Consider the following population data, P: {0, 4, 10, 15, 47, 100}. Assume that you are sampling WITHOUT replacement i.e. once a population element is selected in a sample, that element cannot be selected again for that sample.
Question 4: In the following Table, fill in all cell values so that all indicated relationships hold. X and Y are random variables. X with a bar above it indicates the average of X values in the column X. This problem will test your understanding of several principles related to elementary statistics and algebraic operations. Submitted your completed work as an Excel file. Question 5: The following data are arranged in order of increasing magnitude. Figure out the two missing values (represented by question marks) such that the mean for this data equals the median (for this problem, assume that missing values are positive integers). Data: ?, 2, 4, 5, 6, 8,? 

2. Preparing Data
To complete these assignments, you need to download either Microsoft SQL Server or Microsoft Access. Video►
Question 1: Import data from the following four files into four tables. Ptid file► Claims file► ICD file► CPT file► Video►
Question 2: Download the attached file of ICD9 codes and descriptions and find the seven errors in the data, where the same ICD9 code has been assigned different descriptions. Data► Video► Question 3: Download data in four zipped files and link to these data (do not import the data, link to it). For password contact your instructor. By opening this file you agree not to share the file with anyone else. Unzip the files twice, first to get to the directory and then to get to the actual file. Link to the data into Access or read the data into Microsoft SQL server. We recommend you use SQL server. To create the database, open SQL server and right click on the database and start a new database. Right click on the database, select "Tasks," select "Import Data," select "Flat File Source" as the source of the data, change file type to "CSV Files," browse to where you unzipped the files, indicate that field names are in the first row, select as destination "SQL Server Native Client" file type. Massive Data► Access Code► SQL Code to Merge Files► Create Database► Visual Guide to Read SQL►


3. Probability & Distributions
Question 1: In this problem we ask you to calculate a case mix index for a hospital from classification of its patients into Diagnostic Related Groupings (DRGs). In Health Administration programs case mix issues arises in multiple courses where severity of patients receiving care in different hospitals are discussed. The case mix index allows the comparison of two hospitals. It is generally calculated as a weighted length of stay across all DRGs see in the hospital. The concept of weighted average was discussed in this section. In a case mix index, the weights are the probability of observing patients in a particular DRG category. Each DRG category is assumed to be mutually exclusive and exhaustive. The number of patients who are admitted for different DRGs are indicated in the attached data file. From these numbers you calculate the probability of each DRG. By multiplying the probability of the DRG by length of stay you get the contribution of each DRG. The case mix index is the sum of the product of probability of each DRG and length of stay within each DRG. The higher the case mix index, the larger the expected length of stay at the hospital. Which hospital has a higher case mix index? Data► Answer► Question 2: Download Hospital Compare Data using the link
below. Select flat file "Complications  Hospital.CSV" Read the data
into Excel. For all hospitals select "Rate of complications for hip/knee
replacement patients". You can do this by using Excel's filter. Calculate
the average rate across all hospitals. Calculate the standard deviation for the
rate across all hospitals. Excel has commands for calculation of standard
deviation and averages, please use these commands. Report the average rate and
the standard deviation of the rate to your instructor (do not include the data
in your submission). Data is also available through Medicare Compare site:
Hospital Compare►
Answer► Question 3: For this question use the file "Complications  Hospital.CSV" in Hospital Compare. Same file was also downloaded for question 2. Make a histogram of the rate of complications for hip/knee replacement patients at different hospitals using the data you downloaded in the previous step. Data► Question 4: For this question use the file "Complications  Hospital.CSV" in Hospital Compare. Same file was also downloaded for question 2. Plot the relationship between rate of complications for hip/knee replacements and pressure sores. Use scatter plot in Excel. Have the rate of complications as Xaxis and pressure scores as Y axis. Data► Excel Scatter Plots► 

4. Bivariate Analysis
A. This problem describes a type of problem typically discussed in Marketing classes, where managers are trained to understand market participation and market share. We have simplified the number of variables and cases in the problem to make it easier to analyze. A typical realistic problem may have hundreds of variables and thousands of cases. Data►


5. Risk & Severity of Illness Assessment
Question 1: Construct a simple MultiMorbidity Index. Assess the average severity of CHF, MI, Diabetes, Hypertension, Alcohol Use, and ACL surgery (assume that sicker patients have longer stays). To calculate the average severity associated with a disease, compare all cases with the disease to all control patients without the disease. Make sure that in each comparison, patients with and without disease have the same set of comorbidities. For example, to find the average length of stay for patients with MI, select all MI patients with the following comorbidities: CHF,DM,AA. Then compare these cases to controls who do not have MI but have the same comorbidities. To help you understand this assignment, consider the following table. In this table, we see different strata of mutually exclusive and exhaustive comorbidities. Then, within each strata we can observe the impact of MI. The impact of MI is the average impact of MI within each strata. Your objective is to create this table before you calculate the impact of MI. To do so, first estimate n1 through n5 through a file where the data is restricted to cases with MI (WHERE MI=1). Then, estimate the values n6 through n10 through a file where the data are restricted to nonMI patients (WHERE MI=0). Merge these data, making sure that you match on the strata. Then you can calculate the impact of MI. Data► Video► Slides► SQL Code► Answer►
Question 2: In question 1, calculate the likelihood ratio associated with each diagnosis in predicting above or below average length of stay. First calculate the average length of stay for each diagnosis. Assign individuals who have above average length of stay 1 and those below average length of stay 0. Next calculate the likelihood ratio. To calculate the likelihood ratio, select all individuals who have above average length of stay. Examine the prevalence of the diagnosis among them. Select all individuals who have below average length of stay and select the prevalence of the diagnosis among them. The ratio of these two calculated numbers constitutes the likelihood ratio. We have calculated this number to be 1.67. See if you get the same answer. Show intermediary calculations. Data► Access► Question 3: Calculate the average age and the likelihood ratio associated with diagnoses. You would need to use SQL to do this assignment. You can use any SQL software, including Access. Since the data is massive (17 million rows), keep in mind that Access requires you to analyze the data in partitions. Microsoft SQL server can analyze the entire data in one run. Submit your SQL code and the 10 diagnosis with highest and lowest likelihood ratios. For password to access data contact your instructor. By opening this file you agree not to share the file with anyone else. Massive Data► Access► SQL Code► Marla's Guide► Answer►


6. Comparison of Means
Question 1. In Health Administration Programs conducting satisfaction surveys are usually covered in courses on quality improvement. This exercise shows how data from satisfaction surveys can be analyzed over time. Assume that, in different time periods, 4 randomly selected patients rated their satisfaction with our services. Are we improving? Data► Answer► Question 2. In Health Administration programs accounting courses typically cover cost data. Following data were obtained regarding the cost of taking care of patients over several time periods. Are our costs within expectation? Data► Slides► Listen► Answer► Question 3: In this assignment, examine if payments for AMI to hospitals that received higher payment at start of 2015 has changed over time. You can use Access database for opening the data in this assignment. Downloading these data files and linking them is a time consuming activity, allocate sufficient time to this activity. Keep the databases you download, as in this book we will repeatedly rely on the Hospital Compare data. In a health administration program, these data may be used in a variety of courses, including courses on quality improvement, informatics, and cost accounting. Download dated data files from the archives. These files look like HosArchive_YYYYMMDD.zip, where YYYY is the year, and MM is the month. Unzip the files. Extract Hospital.mdb files for each time period, make sure you record the year and the month into the database name. Keep in mind that start and end periods of data are also given inside the files. Start from 1/1/2015 till the most recent available database. Since the data on AMI payments does not change in the files, you can save time and read only the following 3 files into a new database:
In these files the denominator indicates the number of patients. Payment indicates average payment per patient. Select data that meet the following conditions:
Examine if the payments to the hospitals changed over time. Submit an Excel file containing the control chart for the data. Download►20161110.zip► 20160810.zip► 20160504.zip► 20151210.zip► 20151008.zip► 20150716.zip► 20150506.zip► 20150416.zip► 20150122.zip► Guide► Answer► Question 4: Calculate the Probability of observing Z values in the following ranges in a normal distribution. Answers are provided, show your intermediary table lookup values and calculations. z_calculator► Answer►
Question 5: Using the databases from Hospital Compare, download quarterly data on average number of hours of restraints from the file "HOSPITAL_QUARTERLY_QUALITYMEASURE_IPFQR_HOSPITAL". Examine the data for "UNIVERSITY OF ALABAMA HOSPITAL". This is hospital ID code 010023. Not all time periods include new data. I was able to find the following set of data (you can f ind more):
Note that the field HBIPS2_Overall_Num indicates the numerator for the measure "Hours of physicalrestraint use." The denominator for the same overall measure is in the field HBIPS2_Overall_Den. The dictionary provides the interpretation of these two fields as hours of restraint and number of patients examined. Using the procedure for XmR control chart examine if the total number of hours of restraints has changed over time. Download►20161110.zip► 20160810.zip► 20160504.zip► 20151210.zip► 20151008.zip► 20150716.zip► 20150506.zip► 20150416.zip► 20150122.zip►Dictionary► Question 6: Find the value of standard normal variable z such that area under the curve below z is .3300 or .1003. z Calculator► Question 7: Variable X has a mean of 10 and standard deviation of 2 in the population. Calculate the z score that corresponds to X = 20. z Calculator► Question 8: Variable X has a mean of 4 and standard deviation of 2 in the population. Find the value of X such that the corresponding z score for this value is 3. z Calculator► Question 9: At a board meeting, you hear different statements about who the clinic is serving. According to the statements, 68% of the patients are under 25 years old and 10% are over 50 years old. Assuming that the patients are normally distributed, find the mean and standard deviation of the distribution. We estimated the mean to be 10.62 and the standard deviation to be 30.74. Show how these mean and standard deviations were calculated. z Calculator► Answer► Question 10: You are health administrator who is interested in examining organizational motivation and commitment of hospital employees. As part of a research project you have collected data from 16 employees serving in various wards of a hospital in a large metropolitan area. The following table provides information on Employee type (1 = trainee, 2 = supervisor, 3 = manager), Gender (1 = female, 2 = male), Training (010 scale), Motivation (010 scale), and Commitment (010 scale) of the 16 employees in your sample. Please make sure that you treat employees and gender as categorical variables and training, motivation and commitment as quantitative variables.
(a) Compute descriptive statistics for each variable. Check that you have not made an error in data entry and that your descriptive statistics match the following Excel output.
(b) Use Excel to perform a one sample test to evaluate whether or not the mean motivation level of all employees in the population is different from 5.
The null hypothesis is that µ1 = 5; i.e. the population mean motivation level
is equal to 5. The alternative hypothesis is that µ1 ≠ 5; i.e. the
population mean motivation level is significantly different from 5.
Calculate the mean (4.31) and the standard deviation (3.00) using functions
in Excel. Calculate the tstatistic and its degrees of freedom. Calculate
the critical value and test if the critical value is less than alpha of
0.05. Copy/paste relevant Excel output. Provide interpretation of "t" test results.


7. Comparison of Rates
Question 1: This exercise allows you to examine how quality of hospital care is evaluated through data obtained from electronic health records. This exercise is discussed in several health administration courses, including courses on quality improvement, informatics, or statistics. In practice, the need to analyze these types of data arises in improvement teams, where managers, clinicians and sometimes patients work together to systematically improve healthcare processes. In this assignment, you are asked to examine if prophylactic antibiotics are discontinued within 24 hours after end of surgery. Instead of providing you with the data, we ask that you download the data on the web so that you learn more about the source of the data and peculiar organization of the data. Hospital Compare provides information on many quality indicators. In this assignment, you are asked to focus on failure to stop use of antibiotics post surgery. The Hospital Compare web site provides the following rationale for why this measure of quality is helpful: "A goal of prophylaxis with antibiotics is to provide benefit to the patient with as little risk as possible. It is important to maintain therapeutic serum and tissue levels throughout the operation. Intraoperative redosing may be needed for long operations. However, administration of antibiotics for more than a few hours after the incision is closed offers no additional benefit to the surgical patient. Prolonged administration does increase the risk of clostridium difficile infection and the development of antimicrobial resistant pathogens." To complete this assignment complete the following tasks:
Construct a control chart showing rate of overuse of prophylactic antibiotic over time in Southeast Alabama Medical Center. Download the needed data using the following databases: 20161110.zip► 20160810.zip► 20160504.zip► 20151210.zip► 20151008.zip► 20150716.zip► 20150506.zip► 20150416.zip► 20150122.zip► Guide► Answer► Question 2: Nursing home and hospital administrators implement many initiatives to reduce falls. This problem shows how these initiatives can be measured and evaluated. Prepare a riskadjusted control chart for data on falls in a nursing home. Data► Video► SWF► Answer► Question 3: Following data were obtained on post surgical infection rates. Are we having more infections than expected from the patients' conditions? Answer►
Question 4: Download Hospital Compare data for the year 2015 and 2016. Start from 1/1/2015 till the most recent available database. Since the data on AMI payments does not change in the files, you can save time and read only the following 3 files into a new database:
In these files the denominator indicates the number of patients. Whether the hospital had above average payments are reported in different fields: [Compared to National], [Category], and [Payment Category] Select data that meet the following conditions:
Construct a control chart showing probability of change over time. Download using the following files: 20161110.zip► 20160810.zip► 20160504.zip► 20151210.zip► 20151008.zip► 20150716.zip► 20150506.zip► 20150416.zip► 20150122.zip► Guide► Answer► Question 5: On average 1 in 10 patients have some sort of medication error in our facility; what is the probability that out of 10 patients none will have medication errors? Question 6: The probability of a patient recovering from a heart operation is 0.9. Assuming that heart operations are not related to each other, what is the probability that exactly 4 out of 8 next patients will survive the operation? Question 7: An insurance salesperson is about to sell life insurance policies to 7 unrelated women, all of whom are of the same age, belong to the same race, and are in good health. According to independent actuarial estimates, the probability of a woman of this age, race, and health status being alive 20 years from now is 80%. What is the probability that in 20 years all 3 women are alive?


8. Time to Adverse Events
Question 1: We have received a set of comments over time on the web. Check if we have improved? Please note that there are no pre and post intervention periods and therefore control limits must be calculated from the entire period. Data► Answer► Question 2: The following data show the speed with which Alabama Medical Center's Emergency Room provides a patient who has fractured long bone with pain medication. Examine if the organization has been able to reduce the response time to less than 73 minutes? Answer► You Tube from Aras►
Question 3: A control chart requires independent repeated observations over time. In addition, based on other factors, different charts are appropriate. In the following conditions identify one or more control charts that are appropriate. If more than one control chart is appropriate, the usual approach is to try both charts and select the chart with tighter controls (i.e. control limits with smaller absolute difference):


9. Tukey Chart
Question 1: In health administration programs, data on waiting time are examined in courses on quality and operations research. Using the attached data, determine if the waiting time in our urgent care center has changed? Data► Answer► Question 2: In Hospital Administration Programs, time to adverse events is typically taught in courses on quality. It also may be referred to in courses on strategy, if the hospital is focused on competing based on quality. Hospital Compare reports the measure OP_21. The Score field provides time (in minutes) from emergency department arrival to initial oral, intranasal or parenteral pain medication administration for the patients with a diagnosis of a long bone fracture. The field Sample provides the number of patients used to calculate the time to pain medication. In this assignment, we ask you to track the performance of "Inova Fairfax Medical Center" over two years. Focus on the Score variable. Download Hospital Compare data; these years include data from 2013 to 2015. Merge the file " HQI_HOSP_TimelyEffectiveCare" across all the databases that you have downloaded. Select measure ID: "OP_21" . Construct a control chart showing time between pain medications. The data are reported for a range of time; assume that the data are reported for the midpoint of the range. Download data using the following 9 files: Rationale 20161110.zip► 20160810.zip► 20160504.zip► 20151210.zip► 20151008.zip► 20150716.zip► 20150506.zip► 20150416.zip► 20150122.zip► Construct both a Tukey and a timebetween control chart for the data. For the timebetween control chart assume that data points above 52 minutes exceed and observations below 52 minutes are less than the national average. Answer► Question 3: Analyze the following data using Tukey, XmR and TimeInBetween (more than 30 minutes of exercise considered a successful day) charts. Produce 3 charts and discuss if the findings from the 3 charts are similar. To decide if the exercise time has changed, rely on the control chart with the smallest difference between upper and control limit. Data► Answer►


Causal Control Chart
Question 1: There are many reasons for excessive boarding times in emergency rooms. The following data shows the simulated experience of a hospital. For each time period, we show the number of excessive boarding and total patients examined. These numbers are divided into 4 strata each corresponding to presence of alternative reasons for backup; in particular whether there was image or hospitalbed back up present. The preintervention period corresponds to before hiring additional personnel for the emergency room. The post intervention refers to the period after hiring, Create a control chart to examine if the addition of new personnel in the emergency room has led to decline of excessive boarding times. Data► Excel Example►
Using weighting procedures, the following control chart is obtained:
Question 2: Estimate the effect of election of president Trump on healthcare stock of Humana Inc. Election of president Trump was a surprise to many. After his election, stock prices may reflect both his election as well as the stock's historical patterns. President Trump had promised to repeal and perhaps replace Obama Care. Humana Inc. was one of the insurers who was active within Obama Care. It eventually dropped out of the insurance markets months later; but at the time it should have been affected by the uncertainty about future of Affordable Care Act (Obama care). Examine the price of Humana's stock prices 2 months before and 2 months after the election; did the election make a difference in percent of days the stock went up? Complete the analysis for every 6 days the market was open. Repeat the analysis and this time control for the Nasdaq index fund (as a surrogate of general economy) and Standard & Poor Healthcare Sector index (a surrogate for changes in the healthcare industry). After controlling for these two alternative explanations, did the election still affect percent of days stock went up? Data► Chihiro's Slides► Chihiro's write up► Excel


11. Multivariate Regression
Question 1: Predict total cost of long term care for residents in an alternative to nursing homes. Do the residents' medical history predict total cost of care? Examine if residents in the alternative nursing home have on average a lower cost of care than residents in nursing homes. Access to the data requires a password. For password use the instructor's last name. Data►


12. Logistic Regression
Question 1: Assess the propensity that patients have for joining the alternative to nursing homes. Access to the data requires a password. For password use the instructor's last name. Data►


13. Poisson Regression


14. Matched Case Control


15. Stratified Covariate Balancing
Question 1: Find response to citalopram for patients with different types of depression. For each diagnosis use the remaining diagnoses as covariates and identify for which diagnoses citalopram is best. For the analysis use data from STAR*D experiment conducted by NIMH. For this assignment you can use any statistical software or use R software prepared by the instructor for stratified covariate balancing Download R►
See work done by Mazloum► Elashka & Aiyar►


16. Stratified Regression
