Supplement to Chapter on Probability

  1. Presentations

    1. Defining probability  Slides►  Video►  YouTube►
    2. Probability calculus  Slides►  Video►  YouTube►
    3. Probability distributions & expectations  Slides►  Video►  YouTube►
    4. Conditional probability  Slides►  Video►
    5. Likelihood ratio  Slides►  Video►
    6. Selecting the right predictors  Slides►  Video►
    7. SQL for calculating likelihood ratios  Slides►  Video►
  2. Assignments

    Question 1: Calculate the likelihood ratio for diabetes associated with each repeated (first, second, third, fourth, and fifth) diagnosis of unspecified bacterial infection.  Calculate the likelihood ratio as:
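
    One standard way to write this ratio is sketched below. The notation is an assumption introduced here, not taken from the course materials: I_n means "the patient has an nth unspecified bacterial infection", D means "the patient has diabetes", and N(...) counts distinct patients.

```latex
\mathrm{LR}_n
  = \frac{P(I_n \mid D)}{P(I_n \mid \lnot D)}
  = \frac{N(I_n, D) \,/\, N(D)}{N(I_n, \lnot D) \,/\, N(\lnot D)}
```

    Under this reading, a ratio above 1 means the nth infection is more common among patients with diabetes than among patients without it.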

    Take your definitions of unspecified bacterial infection and diabetes from the Clinical Classifications Software (CCS) codes of the Agency for Healthcare Research and Quality.  The CCS listing omits some elements of the diagnosis code, so add an "I" before each code and a period before its last 2 digits; the resulting code should match the format of the ICD9 codes in the data.  For each person, rank order the repeated unspecified bacterial infection diagnoses and calculate a separate likelihood ratio for each occurrence (first through fifth).  Make sure that you count distinct individuals when calculating the likelihood ratios.  Plot the likelihood ratio for diabetes against the number of bacterial infections.  CCS Codes►  Data►
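
    The steps in Question 1 can be sketched as follows, using Python's built-in sqlite3 module so the SQL can be run without a database server. Everything about the data layout is an assumption made for illustration: the table names dx and code_set, the columns id, code, and age_at_dx, the two example codes, and the toy rows. The helper applies the formatting rule literally as stated, so check its output against the codes actually present in the data. The sketch ignores whether diabetes was diagnosed before or after the infection.

```python
import sqlite3

def ccs_to_data_format(code: str) -> str:
    """Apply the stated rule: "I" prefix, period before the last 2 digits."""
    return "I" + code[:-2] + "." + code[-2:]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dx (id INTEGER, code TEXT, age_at_dx REAL)")
con.execute("CREATE TABLE code_set (code TEXT, kind TEXT)")

# Hypothetical code lists: one infection code and one diabetes code.
con.executemany("INSERT INTO code_set VALUES (?, ?)", [
    (ccs_to_data_format("04110"), "infection"),
    (ccs_to_data_format("25000"), "diabetes"),
])

# Hypothetical toy data: (patient id, diagnosis code, age at diagnosis).
con.executemany("INSERT INTO dx VALUES (?, ?, ?)", [
    (1, "I041.10", 50.0), (1, "I041.10", 51.0), (1, "I250.00", 52.0),
    (2, "I041.10", 60.0), (2, "I041.10", 60.5), (2, "I250.00", 61.0),
    (3, "I041.10", 40.0), (3, "I041.10", 41.0),
    (4, "I041.10", 70.0),
    (5, "I250.00", 55.0),
    (6, "I401.90", 45.0),
])

query = """
WITH infection AS (
    -- Rank each person's infection diagnoses in time order.
    SELECT dx.id,
           ROW_NUMBER() OVER (PARTITION BY dx.id ORDER BY dx.age_at_dx) AS occurrence
    FROM dx JOIN code_set c ON c.code = dx.code AND c.kind = 'infection'
),
diabetic AS (
    SELECT DISTINCT dx.id
    FROM dx JOIN code_set c ON c.code = dx.code AND c.kind = 'diabetes'
)
-- Count distinct persons, not diagnosis rows, at each occurrence.
SELECT i.occurrence,
       COUNT(DISTINCT CASE WHEN d.id IS NOT NULL THEN i.id END) AS with_dm,
       COUNT(DISTINCT CASE WHEN d.id IS NULL THEN i.id END) AS without_dm
FROM infection i LEFT JOIN diabetic d ON d.id = i.id
WHERE i.occurrence <= 5
GROUP BY i.occurrence
ORDER BY i.occurrence
"""

n_all = con.execute("SELECT COUNT(DISTINCT id) FROM dx").fetchone()[0]
n_with = con.execute(
    "SELECT COUNT(DISTINCT dx.id) FROM dx "
    "JOIN code_set c ON c.code = dx.code AND c.kind = 'diabetes'"
).fetchone()[0]
n_without = n_all - n_with

# LR = (with_dm / n_with) / (without_dm / n_without).
# Occurrences with a zero denominator are skipped in this sketch.
lr = {occ: (a * n_without) / (b * n_with)
      for occ, a, b in con.execute(query) if b}
print(lr)
```

    The window function requires SQLite 3.25 or later. The resulting dictionary maps the occurrence number to its likelihood ratio, which is the pair of values the plot needs.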

    Question 2: Redo Question 1, but this time exclude patients who died within 6 months of the nth unspecified bacterial infection.  Data►
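
    A minimal sketch of the exclusion in Question 2 follows. It assumes that infection and death are recorded as calendar dates and that "6 months" can be approximated as 183 days; both are assumptions, and the record layout and toy values are hypothetical. If the data store ages instead of dates, the same comparison applies to the age difference.

```python
from datetime import date, timedelta

SIX_MONTHS = timedelta(days=183)  # assumption: "6 months" taken as 183 days

def survived_six_months(infection_date, death_date):
    """True if the patient did not die within SIX_MONTHS after the infection."""
    if death_date is None:  # no recorded death
        return True
    # A death recorded before the infection also fails this check.
    return death_date - infection_date > SIX_MONTHS

# Hypothetical records: (patient id, occurrence number, infection date).
infections = [
    (1, 1, date(2010, 1, 10)),
    (1, 2, date(2010, 9, 1)),
    (2, 1, date(2011, 3, 5)),
]
deaths = {1: date(2010, 11, 15)}  # patient 2 has no recorded death

# The check is applied per occurrence, so a patient can count toward the
# first infection and still be excluded from the second.
kept = [(pid, n) for pid, n, when in infections
        if survived_six_months(when, deaths.get(pid))]
print(kept)
```

    Here patient 1 stays in the count for the first infection but is dropped from the second, because death followed the second infection by less than 183 days.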

    Team Assignment

    General requirements:

    • Work in a team of 2 persons.  Do not work with a person with whom you have previously worked on a team project.
    • Upon submission, indicate the name of your teammate.  Both team members must submit the team's work separately.
    • Do not copy code from each other, but feel free to learn from each other.  The data reported by team members must be the same; the SQL code can be different.  Come to an agreement on the findings and help each other arrive at the same findings.
    • If a team assignment is completed through individual effort, the student loses 10% of the grade.

    Team tasks:

    1. Download data  Video► Download► SQL► Slides► Screen Shots►
    2. Clean the data as you or your teammate did in the previous week.
    3. Verify that both team members are working with the same set of cleaned data.
    4. Randomly set aside 80% of data for training and 20% for validation. Use the training data set in the following calculations. 
    5. Estimate the likelihood ratios associated with each diagnosis and its repetitions. 
      • Calculate separate likelihood ratios for first, second, third, fourth, and fifth occurrences of the same diagnosis for the same person. 
      • Adjust for situations where the outcome never or always occurs.
    6. Identify the total number of unique diagnoses for which you have an estimated likelihood ratio.
    7. Rank order the diagnoses by their estimated likelihood ratios.
    8. Using the web, identify the names of the 4 diagnoses with the largest likelihood ratios.
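
    Tasks 4 through 7 can be sketched in plain Python to show the logic before it is written in SQL. Several choices below are assumptions rather than course requirements: the split is done by patient rather than by record, the outcome is a generic true/false flag per patient, and the adjustment for outcomes that never or always occur is a 0.5 continuity correction, which is one possible choice since the task does not name a method. The function names and toy data are hypothetical.

```python
import random
from collections import defaultdict

def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Randomly assign whole patients to training or validation."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return set(ids[:cut]), set(ids[cut:])

def likelihood_ratios(records, outcome, max_occurrence=5, k=0.5):
    """records: (patient_id, diagnosis, time); outcome: {patient_id: bool}.
    Returns {(diagnosis, occurrence): LR}, counting distinct patients."""
    n_pos = sum(1 for v in outcome.values() if v)
    n_neg = len(outcome) - n_pos
    seen = defaultdict(int)                         # (patient, dx) -> count so far
    groups = defaultdict(lambda: (set(), set()))    # key -> (pos ids, neg ids)
    for pid, dx, _ in sorted(records, key=lambda r: (r[0], r[2])):
        seen[(pid, dx)] += 1
        n = seen[(pid, dx)]
        if n <= max_occurrence:
            groups[(dx, n)][0 if outcome[pid] else 1].add(pid)
    lrs = {}
    for key, (pos, neg) in groups.items():
        a, b = len(pos), len(neg)
        if a == 0 or b == 0:
            # Outcome never or always occurs: apply the assumed correction.
            lrs[key] = ((a + k) / (n_pos + 2 * k)) / ((b + k) / (n_neg + 2 * k))
        else:
            lrs[key] = (a / n_pos) / (b / n_neg)
    return lrs

def top_diagnoses(lrs, n=4):
    """Largest likelihood ratios first."""
    return sorted(lrs.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical toy data.
outcome = {1: True, 2: True, 3: False, 4: False}
records = [(1, "A", 1), (1, "A", 2), (2, "A", 1), (3, "A", 1),
           (3, "B", 1), (4, "B", 1), (1, "C", 3)]

train_ids, valid_ids = split_patients(outcome)
train_records = [r for r in records if r[0] in train_ids]
train_outcome = {pid: v for pid, v in outcome.items() if pid in train_ids}

lrs = likelihood_ratios(train_records, train_outcome)
n_unique = len({dx for dx, _ in lrs})   # task 6: unique diagnoses with an LR
print(n_unique, top_diagnoses(lrs))
```

    Fixing the seed keeps the split reproducible, which helps both team members work from the same training set.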

    To complete this team assignment, put your SQL code, the 4 diagnoses with the largest likelihood ratios (report the names and the likelihood ratios), and the total number of unique diagnoses into a Word document.  Then upload the document to Blackboard.  Each student must upload their document by Sunday, 11:55 PM, EST.