Complex Queries

This session shows how database manipulations can provide you with measures of episodes of illness.  Measures of episodes of illness are important in evaluating performance of clinicians and healthcare  programs.  This page is based on a recent publication.  For details see:  Alemi F, Walters SR.  A Mathematical Theory for Identifying and Measuring Severity of Episodes of Care. l Manag Health Care. 2006 April/June;15(2):72-82. PubMed►

This page is based on United States patent application 10/054,706 filed on 1/24/2002 by George Mason University on behalf of Farrokh Alemi, Ph.D. and Valentin Prudius.  We grant  permission to individual scientists within university, Federal and State  governments settings to use these procedures free of licensing fees for  the purpose of evaluating its effectiveness.  Permission is also  granted to all students using this procedure as part of an educational class.

Episodes of Illness

This paper provides a mathematical model for identifying episodes of  care and measuring the severity of an episode.  Measures of episodes of  care, in general, and the approach proposed here, in particular, can be  used to set capitation rates or to profile clinicians performance.  Numerous approaches to measuring episodes of care exist.  Examples  include Prospective Risk Adjustment, Ambulatory Visit Groups, Disease  Staging, Products of Ambulatory Care, Ambulatory Diagnosis Groups and  Ambulatory Care Groups. In addition to broad approaches to measurement  of episodes of illness, many have developed disease specific episodes of  care. There have been a number of review articles about episodes of  care. Given the wide range of approaches to measurement of episodes of  care, the natural question to ask is why we need yet another approach? 

Three reasons have motivated us to seek a new approach to measuring  episodes of care. First, we provide a mathematical model for measuring  episodes of care. No other approach does so. Most existing approaches to  measuring episodes of care do not describe the internal procedures used  for measuring severity or identifying episodes of care. Some commercial  approaches consider such information as business secrets that should not  be disclosed. Even when they do describe the internal mechanism of the  approach, all rely on heuristics that make clinical sense but do not  provide a mathematical theory for the relation between the variables  used in constructing episodes of care. Thus, researchers face a black  box -- the content of which they know little about or is based on  heuristics that they cannot easily modify and reapply. In the absence of  a theory, it is difficult to learn from one study how better measures  can be constructed. Each study and each approach exists on its own  merits and fails to contribute to the other. Then researchers tend to  compete on claims of accuracy rather than to build on each other's work.  As a result, while many approaches exist, cumulative progress in the  field where in one investigator builds on another person’s approach,  has been limited.

The mathematical theory proposed in this paper allows  us to accumulate information and improve our understanding of how  severity of episodes of care should be measured. Future researchers can  change the theory to arrive at more accurate predictions. As new  insights are found, the theory is modified and knowledge is accumulated.  Second, our proposed approach does not classify diagnoses into clusters  of diseases before identifying episodes of care. All existing approaches  are built on the concept of classifying possible diagnoses into a few  clusters and then findings rules for creating episodes for these  clusters. Schneeweiss and colleagues reported that 92 diagnosis clusters  make up 86 percent of all ambulatory visits. Others have expanded this  set to 125, with varying levels of severity and different time periods,  during which the diagnoses in the cluster will belong to the same  episode.  We propose an approach that does not attempt to reduce  the large set of possible diagnoses into a smaller set of clusters.   Reduction into smaller set of diseases, by definition, gives up  important nuances in order to have a manageable set of diagnoses.

For  example, infections often follow wounds and therefore may be considered  part of the same episode. But an otitis media, even though an infection  of the ear, could not possibly be part of an episode of trauma to the  leg. Defining all infections as one cluster of diagnoses forces  investigators to ignore important differences that might exist between  types of infections. In our approach, all operations are defined on  individual diagnoses without need to pre-set these diagnoses into broad  clusters. Sometimes classification of diseases into clusters is based on  the etiology of the disease, leading in our view to counter intuitive  classifications. An episode of trauma may include fracture to the knee  as well as fracture of the leg even though the knee and leg are  different problems. Congestive heart failure may be part of episode of  myocardial infarction even though one involves the heart the other the  lung. Two very dissimilar diagnoses may be part of the same episode,  even though these diagnoses do not describe the same illness. Third, the  objective of our proposed approach, contrary to some existing approaches  is not to create homogenous resource use episodes. Thus, not all  follow-up visits are part of the same episode even though they may all  be short visits and therefore have similar resource use. 

In our  approach, the nature of the diagnosis, not the intensity of visits, is  the basis of classifying visits into episodes. For example, follow-up  visit for myocardial infarction is part of an MI episode and a follow-up  visit for trauma is part of trauma episode. Intensity-based measures  cannot be used for evaluating whether the numbers of visits are  appropriate. In essence, they are fee schedules, except that these fee  schedules are based on groups of visits or diagnoses and not single  visit diagnosis. We propose a relation-based episode classification  system that remedies this important shortcoming. It can be used to judge  appropriateness of number of visits.

Proposed Method of Measuring Severity of Episodes

We start our description of the methods with a few definitions. An  episode of care is a group of diagnoses on the same patient that  describes the course of a given illness. Note that this definition does  not depend on the nature of services delivered, the doctor delivering  services, or the site of services. Nor, contrary to others, does this  definition assume that services are temporally contiguous. Thus, it  allows for episodes to be overlapping; for example, a patient may have  an acute exacerbation of their chronic diabetes and experience an  episode of upper respiratory infection. Episodes are characterized by  what we call "an anchor diagnosis." This is the diagnosis that gives its  name to the episode. Episodes have starting (sometimes referred to as  trigger diagnosis) and stopping points that may be different from the  anchor diagnosis. Episodes are characterized by a rate of progression, a  peak severity during the course of episode, and morbidity and mortality  outcomes. One episode, for example, may have a rapid onset, progress to  a very serious condition, and then lead to death. Another episode may  have a slow onset and never become serious.

Selecting diagnoses that are part of same episode

Defining an episode begins with selecting diagnoses that are part of  the same episode. Imagine that a patient has had a series of diagnosis D1,   D2, ... Dm at times T1, T2,   through Tm. Whether two diagnoses are part of the same episode depends   on the nature of the two diagnoses and the time between them. Two   diagnoses that are similar or related in nature should be part of the   same episode unless they occur at significantly different times. If we   define Pia as the probability that the diagnosis “i and   diagnosis “a belong to the same episode, then the theory suggests that:

Pia= function {Tia,   Sia}

Where the similarity between the diagnosis “i” and diagnosis “a” is Sia and number of days between diagnosis “i” and diagnosis “a” is Tia  and calculated as:

Tia= Ta Ti  and T ia > 0

Note that the probability of being part of the same episode, Pia,   should be directly related to similarity of two diagnoses, Sia,   and inversely related to, Tia, the time between the two   diagnoses. A specific mathematical function that preserves these two   relationships is:

Pia= S ia / (1+βT ia)

In the above equation,  α and β are constants. When a patient presents with several diagnoses, then the probability that any two of the diagnoses   may belong to an episode is calculated using the above formula. Later, these pair-wise probabilities of belonging to the same episode are used   to classify diagnosis into groups -- using one of many widely available  classification methods. For a specific example of a classification   algorithm see Appendix at end of this paper.

Severity of an Episode

Diagnoses differ in terms of their severity. We show severity of  diagnosis i as Sevi and calculate the overall severity of   an episode by the following mathematical formula:

Overall severity of episode = 1-   п i  (1 - Sevi )

There are many different mathematical formulas for aggregating  severity of individual diagnosis to severity of an episode. The most  common approach is to add or average the severity scores for each  diagnosis. Adding scores is not appropriate, as episodes with few severe  diagnoses would be scored lower than episodes with many non-severe  diagnoses. Averaging is also not appropriate, as patients who have two  diagnoses, one severe and the other not, will be rated lower than  patients with just the severe diagnosis. Instead of adding or averaging  the scores, we prefer using the above Multiplicative model. For example,  if a patient has two diagnoses, one with severity score 0.9 and another  with severity score 0.5, then the overall severity of the episode is  calculated as: Overall severity for the patient = 1 -(1-0.9)*(1-0.5) =  0.95 Compared to the adding or the averaging formula, the multiplicative  formula has several advantages: The influence of severe diagnoses on the  overall score are not diluted by non-severe diagnoses and merely  increasing the number of diagnoses will not necessarily result in high  overall severity scores.  

Evaluation of the Algorithm

We created a measure for severity of episodes of illness for  developmentally delayed children who were enrolled in the Medicaid  program of one Southeastern State. Developmentally delayed children use  health services extensively. To reduce computational difficulties and  without loss of generality, we randomly sampled 565 patients among the  3250 patients in the database. The data included both in-patient and  outpatient Medicaid payments for the patient. The in-patient portion  included both the health professionals billing and the institution's  bills. On average, the State paid $9,296 per patient per year. The  standard error of the cost was $2,238, reflecting large variation in  cost of care across patients. Cost ranged from low of $29 (reflecting  patients enrolled for portion of the year) to $884,967 per year.  Estimating the time between two diagnoses, Tia, is easy and   can be read directly from the database by taking the absolute value of   the difference in dates of the two diagnoses. Estimating the similarity   of the two diagnoses, Sia, was more difficult. A surrogate   measure of similarity of two diagnoses is the number of times the two   diagnoses co-occur within a specific time frame. The implicit assumption   is that complications and related problems tend to occur in clusters. We   calculated a score proportional to the likelihood that two diagnoses   belong to the same episode by the formula provided earlier. We used this   score to classify diagnoses into episodes in such a manner that   diagnoses within one episode were more similar than diagnoses in   different episodes. Appendix A gives a detailed example of how diagnoses   were classified. The mean number of episodes was 147 (standard error =   320). Patients differed considerably in the number of episodes they had.   We calculated the severity of each diagnosis as the average amount paid   for the diagnosis. Severity and costs are not always related, especially   when patients die before expensive services can be delivered. But in our   database no patient passed away. Therefore, cost may have been a   reasonable surrogate measure of severity.  


To test the accuracy of our measures of episodes of care, we  regressed cost of care on severity of the episode, number of episodes  and interaction between number of episodes and severity of episodes. We  measured cost of care by the amount the State paid for each encounter.  Since patients' eligibility for Medicaid changes frequently, the amount  paid by the State is only an approximate measure of total cost of care  of the patient. To have one estimate of severity for a patient, we  averaged the severity scores for each patient across all their episodes  during the year. The averaged severity score ranged from 0.01 to 0.27.  The mean was 0.03 (standard error = 0.001). Table 1 summarizes  regression results.

        Coefficients P-value
 Intercept   -7297  0.003
 Average severity of episodes   -33.58  0.000
 Number  of episodes   444971 0
Interaction between number of episodes & severity of  episodes      756 0
 Table 1: Regression of "Amount paid by the State" on  severity and number of episodes
Number  of observations   = 565,     Adjusted R Squared   = 53.11% 

The dependent variable was "the amount paid by the State". All three  independent variables -- "the average severity of the episodes", "the  number of episodes of the patient" and “the interaction between the  severity and the number of episodes” -- were statistically significant  predictors of the dependent variable at alpha levels lower than 0.001.  The R-Squared adjusted by degrees of freedom was 53%.


Data showed that episodes of care can be constructed from encounter  databases. Furthermore, the proposed measure of episode of care  explained a large percentage of variance in cost of care. The magnitude  of the percent of variance explained by the measures reported here is of  special interest. Many measures of severity and case mix report R2   values less than 10%.   Because our approach explains a large   percent of the variance, our confidence in the validity of our measure   of severity of episodes is increased. One may expect the performance of   the approach developed in this paper to deteriorate when parameters of   the model are estimated from one database and applied to another   unrelated database. Nevertheless the magnitude of percent of variations   in objective data explained by our approach is so high that we are   hopeful that even with drops in performance, our approach will remain   relatively more accurate than many existing approaches. The approach   proposed here can be used to construct episodes of care for specific   diseases. Thus, if one investigator is interested in episodes for   diabetes and another is interested in episodes of cancer, both can use   the algorithm proposed here by pre-selecting patients with a particular   disease. The most appealing part of the proposed approach is the ease   with which the approach can be integrated with existing databases. The   proposed mathematical model works on any administrative database, which   has information on date of visit and diagnosis. Any person familiar with   database operations can implement it. In addition, electronic medical   record companies can use the algorithm proposed here to embed methods of   analyzing performance of clinicians within their electronic record   systems.

Appendix A: An Algorithm for Classifying Severity of Episodes of Illness

1) Start with a Table of patient unique identification number,  diagnosis and time of diagnosis. Here is a small example database:
Time (dd/mm/yy) Patient ID Diagnosis
    01/01/01 1001 A
    12/01/01 1001 B
    22/01/01 1002 A
    12/01/01 1002 B
    22/01/01 1003 C
    02/02/01 1001 D
    02/02/01 1002 B
    12/02/01 1003 D
    13/02/01 1003 B
    01/05/01 1002 C
Table 2:  Starting Data

2) Create a query identifying for any pair of diagnoses the number of  unique patients for whom the two diagnoses co-occur within 30 days. Note  that the co-occurrence of diagnosis "a" and "b" does not depend on the  order of which one comes first. Here is how the query will look like for  the above example data: 

First diagnosis Second diagnosis Co-occurrences     First diagnosis Second diagnosis Co-occurrences
    A     A     2           C     A     0
    A     B     2           C     B     1
    A     C     0           C     C     2
    A     D     1           C     D     1
    B     A     2           D     A     1
    B     B     2           D     B     2
    B     C     1           D     C     1
    B     D     2           D     D     2

3) For each patient conduct the following analysis: a) For the  patient, when the same diagnosis occurs at two different time periods,  rename the diagnoses into unique names -- usually a combination of the  name and date of diagnosis. For example patient 1002 has the following  data when renamed:

   Time (dd/mm/yy)     Patient ID     Diagnoses
    12/01/01     1002     B1201
    22/01/01     1002     A
    13/02/01     1002     B1302
    01/05/01     1002     C
Table 4:  Renaming same diagnosis at  two times

4) For the patient, measure the absolute value of the length of time  between any pair of diagnoses for the patient, refer to this as time  between any two diagnoses. For example for patient 1002 the time between  two different diagnoses will be:

First diagnosis Second diagnosis Time     First diagnosis Second diagnosis Time
     A      B1201      10         B1302     A       21
     A      B1302      21             B1302      B1201       31
     A      C      38            B1302      C      17
    B1201      A     10        C      A       38
     B1201      B1302     31            C      B1201       48
     B1201      C     48            C     B1302      17
 Table 5:  Time between diagnoses

5) For the patient, look up the similarity of any pair of different  diagnoses they have from step "2" and divide this by absolute value of  the time between the two diagnoses, from step "b". Refer to this as the  score. For example for the patient 1002 the results will be:

   First diagnosis     Second diagnosis       Time     First diagnosis  Second diagnosis  Time
  A     B1201 .20            B1302      A .10
     A      B1302  .10             B1302      B1201 .06
     A      C .0            B1302     C   .06
    B1201      A  .20         C      A 0
     B1201      B1302    .06        C     B1201  .02
     B1201      C     .02              C      B1302 .06
 Table 6:  Calculation of the Score

6) For the patient, standardized the score so that it ranges between  1 and zero by subtracting the minimum value from each score and dividing  the results by the difference of maximum and minimum score. Refer to  this as standardized score. For the patient 1002 the standardized score  is as follows:

 First diagnosis Second diagnosis  Time       First diagnosis  Second diagnosis Time
 A  B1201 1      B1302  A  .50
  A  B1302  .48    B1302  B1201  .30
  A  C   .00        B1302  C   .30
     B1201  A  1.0    C  A  .00
 B1201  B1302  .32    C  B1201  .10
 B1201  C 0.1    C  B1302  .30
 Table 7:  Standardized scores

7) Classify different diagnoses into episodes by using the  standardized score. The following is one classification procedure that  could be used:

  1. Combine the two diagnoses with maximum standardized score into   one episode if the value of the standardized score is higher than a   pre-set cutoff -- usually 0.5.
  2. Create a new diagnosis to represent the two diagnoses that were  combined into an episode. Calculate the standardized score for this  new diagnosis by averaging the standardized score of its two  components.
  3. Exclude the diagnoses that have already been combined into new  diagnoses from further analysis and repeat steps starting from step  1. For example, the data for case 1002 will follow these steps:
    4       A      B1201      B1302      C
     A            1.0      .48      .00
     B1201      1.0        .32      .10
     B1302      .50      .32             .30
     C      .00      .10      .30       
 Table 8:  Classify diagnoses into  episodes
  1. A new diagnosis is created named AB1201 and standardized scores   for the new diagnosis are calculated as the average of its   component:
    4       A      B1201 AB1201
     B1302      .50      .32 (.5+.32)/2
     C      .00      .10 (.00+.10)/2
 Table 9:  Calculation for a new  diagnosis
  1. The diagnosis already combined into an episode are excluded from   further analysis and the steps are repeated and a new maximum of   0.41 is found.
        B1302      C      AB1201
     B1302             .30      .41
     C      .30             .05
 Table 10:  Finding a new maximum
  1. The new maximum is not higher than the cutoff of 0.5. Therefore,   no other diagnoses are combined into new episodes.

The result of the calculation for patient 1002 was three episodes:

  • The combination of diagnosis A and diagnosis B on 12/01/01.
  • Diagnosis B on 13/02/01 by itself.
  • Diagnosis C by itself. Note that diagnosis B on 13/02/01 was not  combined with diagnosis B on 12/01/01 even though both are the same  diagnosis.

Analyze Data 

Part A:  Import data from the following files into four tables.  From ptid.xls to Patients, from claims.xls to Claims, from icd.xls to ICD9, from cpt.xls to CPT table.  Video► SWF►

  1. For each person, find the description of the diagnoses with maximum cost.  Video►  SWF►
  2. Calculate the number of times two pair of diagnoses co-occur in the same person.  Video►   SWF►
  3. Calculate for each person, the average time between two consecutive diagnoses.  Video►  SWF►

Part B:  Measure severity of illness using H-CUP data.  Contact the instructor for access to the data.   More►

  1. Organize your data You Tube►
  2. Measure severity You Tube►

Please bring your work to class or email screen captures to the instructor.  Email►


There are several presentations for this lecture:

  1. Lecture on episodes of illness. Listen►  SWF►  Slides►
  2. Calculate similarity of two diagnoses Video►  SWF►
  3. Walk through algorithm for measuring episodes of illness Silent video►  SWF►

Narrated slides and video require use of  Flash.  Download►


  1. Episode Treatment Groups  PubMed►
  2. Translation into Arabic.   Read► 

An algorithm for classifying episodes

-- Episodes of Care Classification

-- Table and View Creation Script

-- Author: Farrokh Alemi, Samuel Walters

-- Last Update: 2/14/06

-- Format: ANSI SQL-92

-- See a database implementation Download►

-- ------------------------------------------- 

-- This script creates the four tables necessary to execute the Episodes of care algorithm.

-- This script only needs to be executed once to establish the tables. 

CREATE TABLE tblSimilarity(

FirstDiagnosis             varchar(50)          NOT NULL,

SecondDiagnosis      varchar(50)          NOT NULL,

Similarity           int                  NOT NULL); 

CREATE TABLE tblPatientDiagnosesGrouped(

RecordID             int           NOT NULL,

              PatientID            varchar(50)   NOT NULL,

              DiagnosisDateTime    datetime      NOT NULL,

              DiagnosisCode        varchar(250)  NOT NULL); 

CREATE TABLE tblPatientDiagnoses(

              RecordID             int IDENTITY(1,1)    NOT NULL,

              PatientID            varchar(50)          NOT NULL,

              DiagnosisDateTime    datetime             NOT NULL,

              DiagnosisCode        varchar(250)         NOT NULL); 

CREATE TABLE tblEpisodeIndex(

              EpisodeRecordID      int IDENTITY(1,1)    NOT NULL,

              EpisodeID            varchar(2500),

              PatientID            varchar(50),

              DiagnosisCode        varchar(50),

              DiagnosisDateTime    datetime);  

-- ----------------------------------------------------

-- Classify Episodes of Care                         --

-- Author: Farrokh Alemi, Samuel Walters             --

-- Last Modified: 2/14/06                            --

-- ----------------------------------------------------

-- ----------------------------------------------------

-- SECTION I - Diagnosis Record Grouping

-- Groups duplicate patient diagnoses by patient into single records to avoid processing

-- duplicates and improve transaction performance.

-- ---------------------------------------------------- 

-- Delete any grouped records from previous analyses 

DELETE FROM tblPatientDiagnosesGrouped; 

-- Insert new patient diagnoses records into grouped table,

-- grouping by patient, diagnosis code,

-- diagnosis date/time, and taking the last matching

-- RecordID for all diagnosis records in the grouped row. 

INSERT INTO tblPatientDiagnosesGrouped(





SELECT        tblPatientDiagnoses.PatientID,




FROM          tblPatientDiagnoses 

GROUP BY      tblPatientDiagnoses.PatientID,



ORDER BY      tblPatientDiagnoses.DiagnosisDateTime;

-- ----------------------------------------------------

-- SECTION II - Construct Similarity Index

-- Constructs a table containing the number of times

-- any diagnosis pair co-occurs for each patient. This

-- table is used to calculate the similarity score

-- for each diagnosis pair for each patient.

-- ----------------------------------------------------

-- Delete all Similarity table entries from previous

-- analyses.  

DELETE FROM tblSimilarity; 

-- Determine the number of times a pair of diagnoses

-- co-occur for any given patient, and insert those

-- counts into the similarity table with the

-- diagnosis pairs. 

INSERT INTO   tblSimilarity( FirstDiagnosis, SecondDiagnosis, Similarity) 

SELECT        FirstDiagnosis, SecondDiagnosis, Count(PatientID) 

FROM          (SELECT              tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis,

                           tblPatientDiagnosesGrouped2.DiagnosisCode AS SecondDiagnosis,


              FROM          tblPatientDiagnosesGrouped as tblPatientDiagnosesGrouped1,

                           tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2

              WHERE         tblPatientDiagnosesGrouped2.PatientID = tblPatientDiagnosesGrouped1.PatientID AND  

-- To adjust the sensitivity of this algorithm, modify the value "30" on the next line or the maximum number of days between diagnosis pairs)

-- to allow more or less maximum days between diagnosis pairs.  If this number is edited, the same number must be modified in

-- Section III to account for the change.

CONVERT(int, tblPatientDiagnosesGrouped2.DiagnosisDateTime

              tblPatientDiagnosesGrouped1.DiagnosisDateTime) <= 30 

              GROUP BY      tblPatientDiagnosesGrouped1.DiagnosisCode,



              HAVING        tblPatientDiagnosesGrouped2.DiagnosisCode >


              ) As GroupedPerPatient 

GROUP BY      FirstDiagnosis,


-- ----------------------------------------------------

-- SECTION III - Identify Episodes of Care

-- This query uses the previously constructed similarity table to

-- identify episodes of illness for all patients in grouped table. Paired diagnoses are

-- identified as those for whom the similarity of the diagnosis pair divided by the

-- elapsed time between the diagnoses is greater than or equal to the cutoff

-- value, defined as the maximum similarity value in tblSimilarity divided by the maximum timeframe within

-- which two diagnoses co-occur in the Similarity table (30 days unless altered above) plus one to avoid

-- division by zero in cases where two diagnoses  occur on the same date/time value. The resulting episode

-- records are inserted into the table tblEpisodeIndex.

-- ----------------------------------------------------- 

-- Delete any records remaining from previous analyses from the episode index table.

DELETE FROM tblEpisodeIndex; 

-- Insert an episode record for each diagnosis belonging to an episode for a patient. Episodes are uniquely

-- identified using EpisodeID. 

INSERT INTO tblEpisodeIndex (PatientID, DiagnosisCode, DiagnosisDateTime, EpisodeID) 

SELECT        PatientID, DiagnosisCode, DiagnosisDateTime, Min(Episode) AS EpisodeID 

FROM   (SELECT              TOP 100 PERCENT









                            WHEN   SecondDiagnosis Is Null

                            THEN   tblPatientDiagnosesGrouped.PatientID +

                                  tblPatientDiagnosesGrouped.DiagnosisCode +


                           ELSE   tblPatientDiagnosesGrouped.PatientID +FirstDiagnosis +


                     END) AS Episode 

       FROM          tblPatientDiagnosesGrouped                          

                     LEFT JOIN     (SELECT       TOP 100 PERCENT


                                  tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis,

                                  tblPatientDiagnosesGrouped1.DiagnosisDateTime AS FirstDiagnosisDate,                                  

                                  tblPatientDiagnosesGrouped2.DiagnosisCode as SecondDiagnosis,

                                  tblPatientDiagnosesGrouped2.DiagnosisDateTime as SecondDiagnosisDate,                            


                                                                CONVERT       (Float,tblSimilarity.Similarity)/(1 +

CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime -              tblPatientDiagnosesGrouped1.DiagnosisDateTime)) AS Score       

FROM   (SELECT       CONVERT(float,Max(Similarity))/31 as CutoffValue

                                          FROM  tblSimilarity) as Cutoff,                                         



tblPatientDiagnosesGrouped As tblPatientDiagnosesGrouped1

ON tblSimilarity.FirstDiagnosis = tblPatientDiagnosesGrouped1.DiagnosisCode)           


tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2 ON tblSimilarity.SecondDiagnosis = tblPatientDiagnosesGrouped2.DiagnosisCode AND tblPatientDiagnosesGrouped1.PatientID = tblPatientDiagnosesGrouped2.PatientID                                        

WHERE  tblPatientDiagnosesGrouped1.DiagnosisDateTime < tblPatientDiagnosesGrouped2.DiagnosisDateTime AND

tblPatientDiagnosesGrouped1.RecordID < tblPatientDiagnosesGrouped2.RecordID AND

CONVERT(float, tblSimilarity.Similarity)/(1 + CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime -   tblPatientDiagnosesGrouped1.DiagnosisDateTime)) > CutoffValue 

                                         ORDER BY      tblPatientDiagnosesGrouped1.DiagnosisDateTime DESC,                                                                    

                                                                    tblPatientDiagnosesGrouped2.DiagnosisCode DESC 

) As LinkedDiagnoses


ON tblPatientDiagnosesGrouped.DiagnosisDateTime = LinkedDiagnoses.SecondDiagnosisDate AND

       tblPatientDiagnosesGrouped.DiagnosisCode = LinkedDiagnoses.SecondDiagnosis AND

       tblPatientDiagnosesGrouped.PatientID = LinkedDiagnoses.PatientID                    

ORDER BY      tblPatientDiagnosesGrouped.PatientID,


) as EpisodesUngrouped

GROUP BY      PatientID, DiagnosisCode, DiagnosisDateTime

ORDER BY      PatientID, Min(Episode);

This page is part of the course on Healthcare Databases, lecture on Complex Queries  This page was first prepared on January 2005 and last revised  on 01/27/2017.