Supplement to Chapter on Probability
- Defining probability Slides► Video► YouTube►
- Probability calculus Slides► Video► YouTube►
- Probability distributions & expectations Slides►
- Conditional probability Slides► Video►
- Likelihood ratio Slides► Video►
the right predictors Slides► Video►
- SQL for calculating Likelihood ratios Slides► Video►
Question 1: Calculate the likelihood ratio
associated with repeated (first, second, third, fourth, and fifth)
Infection of unspecified site and diabetes. Calculate the
likelihood ratio as:
Take your definition for unspecified bacterial infection and
diabetes from the Agency for Healthcare Research and Quality's
Clinical Classifications Software (CCS codes). Since the CCS codes have
removed some elements of the diagnosis code, add an "I" prior to the
diagnosis code and a period prior to the last 2 digits. The
CCS diagnosis code should then correspond to the format of ICD9 codes in
the data. Rank order the repeated bacterial diagnoses of each
person and calculate a separate likelihood ratio for each occasion
of repetition of the unspecified bacterial diagnosis. Make
sure that you count distinct individuals in calculation of the
likelihood ratios. Plot the relationship between likelihood
ratio for diabetes and the number of bacterial infections. CCS
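The steps in Question 1 can be sketched as follows. This is a minimal illustration, not the expected solution. It assumes a hypothetical table dx(id, icd9, age), two toy codes standing in for the real CCS code lists, and the usual definition of the likelihood ratio: the share of diabetic patients with a k-th infection divided by the share of non-diabetic patients with a k-th infection. It ignores the timing of the diabetes diagnosis and runs the SQL through Python's sqlite3 module, so the syntax may need adjusting for the course database.

```python
import sqlite3

def ccs_to_data_format(code):
    """Prefix "I" and put a period before the last 2 digits."""
    return "I" + code[:-2] + "." + code[-2:]

# Toy codes for illustration only; the real lists come from the CCS definitions.
INFECTION = ccs_to_data_format("04110")   # "I041.10"
DIABETES = ccs_to_data_format("25000")    # "I250.00"

# Hypothetical rows of dx(id, icd9, age): one row per recorded diagnosis.
rows = [
    (1, DIABETES, 49.0), (1, INFECTION, 50.0), (1, INFECTION, 51.0),
    (2, DIABETES, 58.0), (2, INFECTION, 60.0),
    (3, DIABETES, 45.0),
    (4, INFECTION, 40.0), (4, "I401.90", 41.0),
    (5, "I401.90", 70.0), (6, "I401.90", 66.0), (7, "I401.90", 52.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dx (id INTEGER, icd9 TEXT, age REAL)")
con.executemany("INSERT INTO dx VALUES (?, ?, ?)", rows)

SQL = """
WITH persons AS (          -- one row per person, dm = 1 if ever diabetic
    SELECT id, MAX(icd9 = :dm) AS dm FROM dx GROUP BY id
),
ranked AS (                -- k = 1 for first infection, 2 for second, ...
    SELECT id, ROW_NUMBER() OVER (PARTITION BY id ORDER BY age) AS k
    FROM dx WHERE icd9 = :inf
),
totals AS (
    SELECT SUM(dm) AS n_dm, SUM(1 - dm) AS n_no FROM persons
)
SELECT r.k,
       COUNT(DISTINCT CASE WHEN p.dm = 1 THEN r.id END) AS a,
       COUNT(DISTINCT CASE WHEN p.dm = 0 THEN r.id END) AS b,
       MAX(t.n_dm), MAX(t.n_no)
FROM ranked r
JOIN persons p ON p.id = r.id
CROSS JOIN totals t
WHERE r.k <= 5
GROUP BY r.k
ORDER BY r.k
"""

lr = {}
for k, a, b, n_dm, n_no in con.execute(SQL, {"dm": DIABETES, "inf": INFECTION}):
    # Undefined when no non-diabetic patient has a k-th infection.
    lr[k] = (a / n_dm) / (b / n_no) if b else None

print(lr)
```

COUNT(DISTINCT ...) keeps each person from being counted more than once per occurrence number. The resulting dictionary maps the occurrence number to its likelihood ratio and can be passed to any plotting tool.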
Question 2: Redo question 1 but this time exclude
patients who died within 6 months of the nth unspecified bacterial
- Work in a team of 2 persons. Do not work with a person with whom
you have previously worked on a team project.
- Upon submission, indicate the name of your team member. Both
team members must submit the team's work separately.
- No copying of code from each other but feel free to learn from
each other. The data reported by team members must be the same,
the SQL code can be different. Come to an agreement on the
findings and help each other to arrive at the same findings.
- If team assignments are completed with individual effort, then
the student loses 10% of the grade.
- Download data Video► Download► SQL► Slides► Screen
- Clean the data as you or your teammate had done so in the
- Verify that both team members are working with same set of
- Randomly set aside 80% of data for training and 20% for
validation. Use the training data set in the following
- Estimate the likelihood ratios associated with each diagnosis
and its repetitions.
- Calculate separate likelihood ratios for first, second,
third, fourth, and fifth occurrences of the same diagnosis
for the same person.
- Adjust for situations where the outcome never or always
- Identify the total number of unique diagnoses for which you have
an estimated likelihood ratio.
- Rank order the estimated likelihood ratios in order of the
- Using the web, identify the name of 4 diagnoses with the largest
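The split, estimation, and ranking steps in the list above might look like the following sketch. Several choices here are assumptions rather than the course's prescribed method: the split is done by person rather than by record, the outcome is a generic 0/1 flag per person, and the adjustment for outcomes that never or always occur is a simple add-0.5 correction, which may differ from the adjustment taught in the slides. The function names and toy records are hypothetical.

```python
import random

def split_ids(ids, train_frac=0.8, seed=0):
    """Randomly assign distinct person ids to training and validation sets."""
    ids = sorted(set(ids))
    random.Random(seed).shuffle(ids)
    cut = int(round(train_frac * len(ids)))
    return set(ids[:cut]), set(ids[cut:])

def likelihood_ratios(records, outcome):
    """records: (person_id, diagnosis, occurrence number) tuples.
    outcome: dict mapping person_id to 1 (outcome occurred) or 0."""
    n_pos = sum(outcome.values())
    n_neg = len(outcome) - n_pos
    pos, neg = {}, {}
    for pid, dx, k in records:
        group = pos if outcome[pid] else neg
        group.setdefault((dx, k), set()).add(pid)   # sets count distinct persons
    lrs = {}
    for key in set(pos) | set(neg):
        # Assumed add-0.5 correction so the ratio stays finite and non-zero.
        a = len(pos.get(key, ())) + 0.5
        b = len(neg.get(key, ())) + 0.5
        lrs[key] = (a / (n_pos + 1)) / (b / (n_neg + 1))
    return lrs

# 80/20 split of ten hypothetical person ids.
train, valid = split_ids(range(10))

# Toy training data: diagnosis "A" repeats for person 1.
outcome = {1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
records = [(1, "A", 1), (2, "A", 1), (3, "A", 1),
           (1, "A", 2), (4, "B", 1), (1, "A", 1)]

lrs = likelihood_ratios(records, outcome)
ranked = sorted(lrs.items(), key=lambda kv: kv[1], reverse=True)
n_unique_dx = len({dx for dx, _ in lrs})

for (dx, k), value in ranked:
    print(dx, k, round(value, 3))
print("unique diagnoses:", n_unique_dx)
```

Fixing the seed makes the split reproducible, which helps when two team members need to confirm they are working with the same training set.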
To complete this team assignment, upload your SQL code, the
first 4 diagnoses with the largest likelihood ratios (report the
names and the likelihood ratios), and the total number of unique
diagnoses into a Word document. Then, upload the document into
Blackboard. Each student will upload their document by Sunday,
11:55 PM, EST.