- Read chapter 2 in Big Data in Health Care: Statistical Analysis of
Electronic Health Record
Instruction for Submission of Assignments: Assignments
should be submitted directly on Blackboard. In rare situations
assignments can be sent directly by email to the instructor. Submission
should follow these rules:
- We prefer if you submit your answers in a
- All Excel cells, except the cells containing the data, must have
formulas. Do not paste the value into the cell, it must be
calculated using a formula. Even simple steps, such as adding two
numbers, should be done using formulas.
- Make sure that any control charts follow the visual rules below:
(1) Control limits must be in red and without markers, (2) Observed
lines must have markers, (3) X and Y axis must be labeled, and (4)
Charts must be linked to the data.
- Copy and paste SQL or R code into Excel sheet. Plot data in
- The first sheet in the file should be a summary page. In the
summary page you should list how your answers to the question differs
from answers provided within the assignment (inside Teach One or other
answers). You must indicate for each question if your control
chart is exactly the same as seen in Teach One or other formats.
For each question, you must indicate if the answers you have provided is
the same as the answers supplied on the web. If there are no
answers provided, you must indicate that there were no answers available
on the web to compare your answers to.
To complete these assignments, you need to download Microsoft SQL Server.
Patruni's Teach One►
- MAC users should first download and install virtualization software (VirtualBox, VMWare Player/Fusion,
Parallels). Get a copy of an operating system from MSDN (i.e. Windows 10),
or Linux (i.e., Ubuntu, or CentOS). Configure a virtual machine and install the guest operating system.
Note: In this course we will be mostly using VirtualBox, but you can use any virtualization solution.
- Install MS SQL. DreamSpark (with Microsoft software) library is available
free of charge for GMU students. Click on start shopping. Select SQL Server 2014 Express
With Advanced Services With Service Pack 1 32/64-bit (English).
- Use your GMU email to register (without Masonlive, i.e. email@example.com). Do not use your GMU password. Only Microsoft software is available on this site. Software is FREE.
Contact instructor if your password does not work.
Question 1: In this problem we ask you to calculate a case mix index for a hospital
from classification of its patients into Diagnostic Related Groupings
(DRGs). In Health Administration programs case mix issues arises in
multiple courses where severity of patients receiving care in different
hospitals are discussed. The case mix index allows the comparison of two
hospitals. It is generally calculated as a weighted length of stay
across all DRGs see in the hospital. The concept of weighted average was
discussed in this section. In a case mix index, the weights are the
probability of observing patients in a particular DRG category. Each DRG
category is assumed to be mutually exclusive and exhaustive. The number
of patients who are admitted for different DRGs are indicated in the
attached data file. From these numbers you calculate the probability of
each DRG. By multiplying the probability of the DRG by length of stay
you get the contribution of each DRG. The case mix index is the sum of
the product of probability of each DRG and length of stay within each
DRG. The higher the case mix index, the larger the expected length of
stay at the hospital. Which hospital has a higher case mix index?
Question 2: Download the attached
file of ICD9 codes and descriptions and find the seven errors in the data, where the same ICD9 code has been assigned different descriptions.
Kavallor's Teach One►
Question 3: Download data in three/four zipped files. You can
focus on the entire data or focus on the data for patients who have at least 365
days of encounters. For password contact your
instructor. By opening this file you agree not to share the file with
anyone else. Unzip the files twice, first to get to the directory and then
to get to the actual file. Link to the data into Access or read the data
into Microsoft SQL server. We recommend you use SQL server. To
create the database, open SQL server and right click on the database and start a
new database. Right click on the database, select "Tasks," select "Import
Data," select "Flat File Source" as the source of the data, change file type to
"CSV Files," browse to where you unzipped the files, indicate that field names
are in the first row, select as destination "SQL Server Native Client" file type.
>365 Days Data,
SQL Code to
Visual Guide to Read SQL►
- Remove blanks from numeric data, such as DxAtAge. Convert text data
in the AgeAtDeath to
float. This is typically done through if statements such as these:
- IIF(DxAtAge >0, DxAtAge, Null)
- IIF(AgeAtDeath="Null", Null, cast(AgeAtDeath AS Float))
- Calculate the average age of and the standard deviation of the
diagnosis. Which 10 diagnoses occur first, meaning which diagnosis
occurs at a younger age.
SQL Code & Answer►
- List the top 20 most frequent diagnoses that co-occur. To complete
this task you would need to join the table to itself. Then, use ICD9
code in one table as the first and the ICD9 code in the second table as the
second of the pair. Count the number that match any pair of diagnoses.
- Use STUFF function to concatenate list of unique diagnoses
for the same person. Count numbers of times these lists occur more
than 29 times STUFF
- Identify individuals whose date of death might be in error and have
visits post date of death. Exclude them. Report the top 10 IDs that remain
in order of IDs.
- Rank order diagnosis in order of their reoccurrence for the same person. Rank or Row
Number functions are described on the web. Search for these commands
and look at some examples online. For how many unique persons does the ICD
diagnoses 250.00 repeat twice in the data? Rank
For example, the following tables shows how the rank
order should work for person with ID 1:
||First occurrence of diagnosis 410
||First occurrence of diagnosis 250
||Second occurrence of diagnosis 410
||Second occurrence of diagnosis 250
- Install Microsoft Server Virtual Box►
- Modeling relationships
- Relationships and primary keys in Access More►
- Complex SQL queries
- Tables and records using Access
- How to find unmatched records?
- Identifying diabetic patients
- Calculating average cost per diagnosis
- Crosstab query
- How duplicate cases are identified and eliminated
- Querying a database
- See how to import data