Supplement to Chapter on Preparing Data
- Types of SQL commands Slides►
Video►
- Creating tables in Microsoft Server SQL
Read►
Slides►
Video►
- Select command
Slides►
Video►
- GROUP BY command
Slides►
Video►
-
WHERE and HAVING commands Slides►
Read►
Video►
-
Introduction to joins
Slides►
Video►
Read►
-
RANK and row numbers
Slides►
Video►
-
Date functions
Slides►
Video►
-
Text manipulation
Slides►
Video►
-
SQL code for cleaning the data
Slides►
Video►
Individual Assignments
No individual assignment should be completed in teams. Submit your
work in Blackboard. Do not discuss the work with other students.
Question 1: Calculate various queries using the data
provided
-
Import data from the following four files into four tables.
Ptid► Claims► ICD► CPT►
-
Identify patients that have diabetes in the above database.
-
Calculate the average cost of each diagnosis sorted from most
expensive to least expensive. Exclude all bills with negative or 0
values.
-
Show if men are more likely to have diabetes than women.
-
Calculate which month is most likely to have a diagnosis reported.
Question 2: Download the attached file of ICD9
codes and descriptions and find the seven errors in the data, where the
same ICD9 code has been assigned different descriptions.
Data► Video► Download► SQL► Slides► Screen
Shots►
Question 3: Count how many patients continue
to visit despite having been reported to be dead. Data►
Question 4: Use GROUP BY and COUNT to report how
many patients have same diagnosis occurring at same age, essentially
having a duplicate entry in the database. Data► Video► Download► SQL► Slides► Screen
Shots►
Team Assignment
Each week you will be complete a team assignment. Each week you are to
work with a classmate that you have not previously worked with, so that
by the end of this course, you will have completed team, assignments
with 8 different partners.
General requirements:
-
Work in teams of 2 persons.
-
Upon submission, indicate the name of your team member.
-
Each member of the team should submit a separate assignment.
-
No copying of code from each other but feel free to learn from each
other. The data reported must be the same. Come to an agreement on
the findings and help each other to arrive to the same findings.
-
If team assignments are completed with individual effort, then the
student loses 10% of the grade.
Team tasks:
-
Download data in zipped files.
-
Merge the files.
-
For password contact your instructor.
-
Do not share the data file. By opening this file you agree not to
share the file with anyone.
-
Unzip the files, first to get to the directory and then to get to
the actual file.
-
Import data into Microsoft SQL server.
-
To create the database, open SQL server and right click on the
database and start a new database. Select "Tasks," select "Import
Data," select "Flat File Source" as the source of the data, change
file type to "CSV Files," browse to where you unzipped the files,
indicate that field names are in the first row, select as
destination "SQL Server Native Client" file type. Video► Download► SQL► Slides► Screen
Shots►
-
Remove blanks from numeric data, such as DxAtAge.
-
Convert text data in the AgeAtDeath to float. This is typically
done through if statements such as these:
-
IIF(DxAtAge >0, DxAtAge, Null)
-
IIF(AgeAtDeath="Null", Null, cast(AgeAtDeath AS Float))
-
Identify individuals whose date of death or birth might be in error
and have visits post date of death or prior to birth. Exclude them. Access► SQL
Code►
-
Remove diagnoses occurring more than once for the same patient at
same age
-
Remove patient ids that are null or start with Z.
-
Remove diagnoses for patients whose age of diagnosis is null
-
Remove diagnoses that occur before patient is 21 years old or when
patient is more than 99 years old
-
Rank order diagnosis in order of their reoccurrence for the same
person. Assign ranks for diagnoses that occur for the same person
once, twice, three times, four times, or more than 5 times. Remove
diagnoses that re-occur more than 5 times.
-
Order the data by patient ID and diagnosis and rank of diagnosis
To complete this team assignment, upload the first 4 lines of the
merged clean data into a word file and submit to your instructor. The 4 lines of data
results for both team members should be the same, work together until
you achieve the same results. The SQL code could be different.
-
Definition, content and use of electronic health records among
United States providers PubMed►
-
Microsoft Access
-
Creating tables and records Read► Video►
-
Creating forms Video► Slides►
-
Create hierarchical relationships Video► SWF►
-
Introduction to queries SWF►
|