Calculations and Complex Queries in Healthcare Databasess

Complex Queries

This session shows how database manipulations can provide you with measures of episodes of illness. Measures of episodes of illness are important in evaluating performance of clinicians and healthcare programs. This page is based on a recent publication. For details see: Alemi F, Walters SR. A Mathematical Theory for Identifying and Measuring Severity of Episodes of Care. l Manag Health Care. 2006 April/June;15(2):72-82. PubMed►

This page is based on United States patent application 10/054,706 filed on 1/24/2002 by George Mason University on behalf of Farrokh Alemi, Ph.D. and Valentin Prudius. We grant permission to individual scientists within university, Federal and State governments settings to use these procedures free of licensing fees for the purpose of evaluating its effectiveness. Permission is also granted to all students using this procedure as part of an educational class.

Episodes of Illness

This paper provides a mathematical model for identifying episodes of care and measuring the severity of an episode. Measures of episodes of care, in general, and the approach proposed here, in particular, can be used to set capitation rates or to profile clinicians performance. Numerous approaches to measuring episodes of care exist. Examples include Prospective Risk Adjustment, Ambulatory Visit Groups, Disease Staging, Products of Ambulatory Care, Ambulatory Diagnosis Groups and Ambulatory Care Groups. In addition to broad approaches to measurement of episodes of illness, many have developed disease specific episodes of care. There have been a number of review articles about episodes of care. Given the wide range of approaches to measurement of episodes of care, the natural question to ask is why we need yet another approach?

Three reasons have motivated us to seek a new approach to measuring episodes of care. First, we provide a mathematical model for measuring episodes of care. No other approach does so. Most existing approaches to measuring episodes of care do not describe the internal procedures used for measuring severity or identifying episodes of care. Some commercial approaches consider such information as business secrets that should not be disclosed. Even when they do describe the internal mechanism of the approach, all rely on heuristics that make clinical sense but do not provide a mathematical theory for the relation between the variables used in constructing episodes of care. Thus, researchers face a black box -- the content of which they know little about or is based on heuristics that they cannot easily modify and reapply. In the absence of a theory, it is difficult to learn from one study how better measures can be constructed. Each study and each approach exists on its own merits and fails to contribute to the other. Then researchers tend to compete on claims of accuracy rather than to build on each other's work. As a result, while many approaches exist, cumulative progress in the field where in one investigator builds on another person’s approach, has been limited.

The mathematical theory proposed in this paper allows us to accumulate information and improve our understanding of how severity of episodes of care should be measured. Future researchers can change the theory to arrive at more accurate predictions. As new insights are found, the theory is modified and knowledge is accumulated. Second, our proposed approach does not classify diagnoses into clusters of diseases before identifying episodes of care. All existing approaches are built on the concept of classifying possible diagnoses into a few clusters and then findings rules for creating episodes for these clusters. Schneeweiss and colleagues reported that 92 diagnosis clusters make up 86 percent of all ambulatory visits. Others have expanded this set to 125, with varying levels of severity and different time periods, during which the diagnoses in the cluster will belong to the same episode. We propose an approach that does not attempt to reduce the large set of possible diagnoses into a smaller set of clusters. Reduction into smaller set of diseases, by definition, gives up important nuances in order to have a manageable set of diagnoses.

For example, infections often follow wounds and therefore may be considered part of the same episode. But an otitis media, even though an infection of the ear, could not possibly be part of an episode of trauma to the leg. Defining all infections as one cluster of diagnoses forces investigators to ignore important differences that might exist between types of infections. In our approach, all operations are defined on individual diagnoses without need to pre-set these diagnoses into broad clusters. Sometimes classification of diseases into clusters is based on the etiology of the disease, leading in our view to counter intuitive classifications. An episode of trauma may include fracture to the knee as well as fracture of the leg even though the knee and leg are different problems. Congestive heart failure may be part of episode of myocardial infarction even though one involves the heart the other the lung. Two very dissimilar diagnoses may be part of the same episode, even though these diagnoses do not describe the same illness. Third, the objective of our proposed approach, contrary to some existing approaches is not to create homogenous resource use episodes. Thus, not all follow-up visits are part of the same episode even though they may all be short visits and therefore have similar resource use.

In our approach, the nature of the diagnosis, not the intensity of visits, is the basis of classifying visits into episodes. For example, follow-up visit for myocardial infarction is part of an MI episode and a follow-up visit for trauma is part of trauma episode. Intensity-based measures cannot be used for evaluating whether the numbers of visits are appropriate. In essence, they are fee schedules, except that these fee schedules are based on groups of visits or diagnoses and not single visit diagnosis. We propose a relation-based episode classification system that remedies this important shortcoming. It can be used to judge appropriateness of number of visits.

Proposed Method of Measuring Severity of Episodes

We start our description of the methods with a few definitions. An episode of care is a group of diagnoses on the same patient that describes the course of a given illness. Note that this definition does not depend on the nature of services delivered, the doctor delivering services, or the site of services. Nor, contrary to others, does this definition assume that services are temporally contiguous. Thus, it allows for episodes to be overlapping; for example, a patient may have an acute exacerbation of their chronic diabetes and experience an episode of upper respiratory infection. Episodes are characterized by what we call "an anchor diagnosis." This is the diagnosis that gives its name to the episode. Episodes have starting (sometimes referred to as trigger diagnosis) and stopping points that may be different from the anchor diagnosis. Episodes are characterized by a rate of progression, a peak severity during the course of episode, and morbidity and mortality outcomes. One episode, for example, may have a rapid onset, progress to a very serious condition, and then lead to death. Another episode may have a slow onset and never become serious.

Selecting diagnoses that are part of same episode

Defining an episode begins with selecting diagnoses that are part of the same episode. Imagine that a patient has had a series of diagnosis D₁, D₂, ... D_m at times T₁, T₂, through Tm. Whether two diagnoses are part of the same episode depends on the nature of the two diagnoses and the time between them. Two diagnoses that are similar or related in nature should be part of the same episode unless they occur at significantly different times. If we define P_ia as the probability that the diagnosis “i and diagnosis “a belong to the same episode, then the theory suggests that:

P_ia= function {T_ia, S_ia}

Where the similarity between the diagnosis “i” and diagnosis “a” is S_ia and number of days between diagnosis “i” and diagnosis “a” is T_ia and calculated as:

T_ia= T_a T_i and T _ia > 0

Note that the probability of being part of the same episode, P_ia, should be directly related to similarity of two diagnoses, S_ia, and inversely related to, T_ia, the time between the two diagnoses. A specific mathematical function that preserves these two relationships is:

P_ia= S _ia / (1+βT _ia)

In the above equation, α and β are constants. When a patient presents with several diagnoses, then the probability that any two of the diagnoses may belong to an episode is calculated using the above formula. Later, these pair-wise probabilities of belonging to the same episode are used to classify diagnosis into groups -- using one of many widely available classification methods. For a specific example of a classification algorithm see Appendix at end of this paper.

Severity of an Episode

Diagnoses differ in terms of their severity. We show severity of diagnosis i as Sev_iand calculate the overall severity of an episode by the following mathematical formula:

Overall severity of episode = 1- п _i (1 - Sev_i)

There are many different mathematical formulas for aggregating severity of individual diagnosis to severity of an episode. The most common approach is to add or average the severity scores for each diagnosis. Adding scores is not appropriate, as episodes with few severe diagnoses would be scored lower than episodes with many non-severe diagnoses. Averaging is also not appropriate, as patients who have two diagnoses, one severe and the other not, will be rated lower than patients with just the severe diagnosis. Instead of adding or averaging the scores, we prefer using the above Multiplicative model. For example, if a patient has two diagnoses, one with severity score 0.9 and another with severity score 0.5, then the overall severity of the episode is calculated as: Overall severity for the patient = 1 -(1-0.9)*(1-0.5) = 0.95 Compared to the adding or the averaging formula, the multiplicative formula has several advantages: The influence of severe diagnoses on the overall score are not diluted by non-severe diagnoses and merely increasing the number of diagnoses will not necessarily result in high overall severity scores.

Evaluation of the Algorithm

We created a measure for severity of episodes of illness for developmentally delayed children who were enrolled in the Medicaid program of one Southeastern State. Developmentally delayed children use health services extensively. To reduce computational difficulties and without loss of generality, we randomly sampled 565 patients among the 3250 patients in the database. The data included both in-patient and outpatient Medicaid payments for the patient. The in-patient portion included both the health professionals billing and the institution's bills. On average, the State paid $9,296 per patient per year. The standard error of the cost was $2,238, reflecting large variation in cost of care across patients. Cost ranged from low of $29 (reflecting patients enrolled for portion of the year) to $884,967 per year. Estimating the time between two diagnoses, T_ia, is easy and can be read directly from the database by taking the absolute value of the difference in dates of the two diagnoses. Estimating the similarity of the two diagnoses, S_ia, was more difficult. A surrogate measure of similarity of two diagnoses is the number of times the two diagnoses co-occur within a specific time frame. The implicit assumption is that complications and related problems tend to occur in clusters. We calculated a score proportional to the likelihood that two diagnoses belong to the same episode by the formula provided earlier. We used this score to classify diagnoses into episodes in such a manner that diagnoses within one episode were more similar than diagnoses in different episodes. Appendix A gives a detailed example of how diagnoses were classified. The mean number of episodes was 147 (standard error = 320). Patients differed considerably in the number of episodes they had. We calculated the severity of each diagnosis as the average amount paid for the diagnosis. Severity and costs are not always related, especially when patients die before expensive services can be delivered. But in our database no patient passed away. Therefore, cost may have been a reasonable surrogate measure of severity.

Results

To test the accuracy of our measures of episodes of care, we regressed cost of care on severity of the episode, number of episodes and interaction between number of episodes and severity of episodes. We measured cost of care by the amount the State paid for each encounter. Since patients' eligibility for Medicaid changes frequently, the amount paid by the State is only an approximate measure of total cost of care of the patient. To have one estimate of severity for a patient, we averaged the severity scores for each patient across all their episodes during the year. The averaged severity score ranged from 0.01 to 0.27. The mean was 0.03 (standard error = 0.001). Table 1 summarizes regression results.

	Coefficients	P-value
Intercept	-7297	0.003
Average severity of episodes	-33.58	0.000
Number of episodes	444971	0
Interaction between number of episodes & severity of episodes	756	0
Table 1: Regression of "Amount paid by the State" on severity and number of episodes
Number of observations = 565, Adjusted R Squared = 53.11%

The dependent variable was "the amount paid by the State". All three independent variables -- "the average severity of the episodes", "the number of episodes of the patient" and “the interaction between the severity and the number of episodes” -- were statistically significant predictors of the dependent variable at alpha levels lower than 0.001. The R-Squared adjusted by degrees of freedom was 53%.

Discussion

Data showed that episodes of care can be constructed from encounter databases. Furthermore, the proposed measure of episode of care explained a large percentage of variance in cost of care. The magnitude of the percent of variance explained by the measures reported here is of special interest. Many measures of severity and case mix report R² values less than 10%. Because our approach explains a large percent of the variance, our confidence in the validity of our measure of severity of episodes is increased. One may expect the performance of the approach developed in this paper to deteriorate when parameters of the model are estimated from one database and applied to another unrelated database. Nevertheless the magnitude of percent of variations in objective data explained by our approach is so high that we are hopeful that even with drops in performance, our approach will remain relatively more accurate than many existing approaches. The approach proposed here can be used to construct episodes of care for specific diseases. Thus, if one investigator is interested in episodes for diabetes and another is interested in episodes of cancer, both can use the algorithm proposed here by pre-selecting patients with a particular disease. The most appealing part of the proposed approach is the ease with which the approach can be integrated with existing databases. The proposed mathematical model works on any administrative database, which has information on date of visit and diagnosis. Any person familiar with database operations can implement it. In addition, electronic medical record companies can use the algorithm proposed here to embed methods of analyzing performance of clinicians within their electronic record systems.

Appendix A: An Algorithm for Classifying Severity of Episodes of Illness

1) Start with a Table of patient unique identification number, diagnosis and time of diagnosis. Here is a small example database:

Time (dd/mm/yy)	Patient ID	Diagnosis
01/01/01	1001	A
12/01/01	1001	B
22/01/01	1002	A
12/01/01	1002	B
22/01/01	1003	C
02/02/01	1001	D
02/02/01	1002	B
12/02/01	1003	D
13/02/01	1003	B
01/05/01	1002	C
Table 2: Starting Data

2) Create a query identifying for any pair of diagnoses the number of unique patients for whom the two diagnoses co-occur within 30 days. Note that the co-occurrence of diagnosis "a" and "b" does not depend on the order of which one comes first. Here is how the query will look like for the above example data:

First diagnosis	Second diagnosis	Co-occurrences	First diagnosis	Second diagnosis	Co-occurrences
A	A	2	C	A	0
A	B	2	C	B	1
A	C	0	C	C	2
A	D	1	C	D	1
B	A	2	D	A	1
B	B	2	D	B	2
B	C	1	D	C	1
B	D	2	D	D	2

3) For each patient conduct the following analysis: a) For the patient, when the same diagnosis occurs at two different time periods, rename the diagnoses into unique names -- usually a combination of the name and date of diagnosis. For example patient 1002 has the following data when renamed:

Time (dd/mm/yy)	Patient ID	Diagnoses
12/01/01	1002	B1201
22/01/01	1002	A
13/02/01	1002	B1302
01/05/01	1002	C
Table 4: Renaming same diagnosis at two times

4) For the patient, measure the absolute value of the length of time between any pair of diagnoses for the patient, refer to this as time between any two diagnoses. For example for patient 1002 the time between two different diagnoses will be:

First diagnosis	Second diagnosis	Time	First diagnosis	Second diagnosis	Time
A	B1201	10	B1302	A	21
A	B1302	21	B1302	B1201	31
A	C	38	B1302	C	17
B1201	A	10	C	A	38
B1201	B1302	31	C	B1201	48
B1201	C	48	C	B1302	17
Table 5: Time between diagnoses

5) For the patient, look up the similarity of any pair of different diagnoses they have from step "2" and divide this by absolute value of the time between the two diagnoses, from step "b". Refer to this as the score. For example for the patient 1002 the results will be:

First diagnosis	Second diagnosis	Time	First diagnosis	Second diagnosis	Time
A	B1201	.20	B1302	A	.10
A	B1302	.10	B1302	B1201	.06
A	C	.0	B1302	C	.06
B1201	A	.20	C	A	0
B1201	B1302	.06	C	B1201	.02
B1201	C	.02	C	B1302	.06
Table 6: Calculation of the Score

6) For the patient, standardized the score so that it ranges between 1 and zero by subtracting the minimum value from each score and dividing the results by the difference of maximum and minimum score. Refer to this as standardized score. For the patient 1002 the standardized score is as follows:

First diagnosis	Second diagnosis	Time	First diagnosis	Second diagnosis	Time
A	B1201	1	B1302	A	.50
A	B1302	.48	B1302	B1201	.30
A	C	.00	B1302	C	.30
B1201	A	1.0	C	A	.00
B1201	B1302	.32	C	B1201	.10
B1201	C	0.1	C	B1302	.30
Table 7: Standardized scores

7) Classify different diagnoses into episodes by using the standardized score. The following is one classification procedure that could be used:

Combine the two diagnoses with maximum standardized score into one episode if the value of the standardized score is higher than a pre-set cutoff -- usually 0.5.
Create a new diagnosis to represent the two diagnoses that were combined into an episode. Calculate the standardized score for this new diagnosis by averaging the standardized score of its two components.
Exclude the diagnoses that have already been combined into new diagnoses from further analysis and repeat steps starting from step 1. For example, the data for case 1002 will follow these steps:

4	A	B1201	B1302	C
A		1.0	.48	.00
B1201	1.0		.32	.10
B1302	.50	.32		.30
C	.00	.10	.30
Table 8: Classify diagnoses into episodes

A new diagnosis is created named AB1201 and standardized scores for the new diagnosis are calculated as the average of its component:

4	A	B1201	AB1201
B1302	.50	.32	(.5+.32)/2
C	.00	.10	(.00+.10)/2
Table 9: Calculation for a new diagnosis

The diagnosis already combined into an episode are excluded from further analysis and the steps are repeated and a new maximum of 0.41 is found.

	B1302	C	AB1201
B1302		.30	.41
C	.30		.05
Table 10: Finding a new maximum

The new maximum is not higher than the cutoff of 0.5. Therefore, no other diagnoses are combined into new episodes.

The result of the calculation for patient 1002 was three episodes:

The combination of diagnosis A and diagnosis B on 12/01/01.
Diagnosis B on 13/02/01 by itself.
Diagnosis C by itself. Note that diagnosis B on 13/02/01 was not combined with diagnosis B on 12/01/01 even though both are the same diagnosis.

Analyze Data

Part A: Import data from the following files into four tables. From ptid.xls to Patients, from claims.xls to Claims, from icd.xls to ICD9, from cpt.xls to CPT table. Video► SWF►

For each person, find the description of the diagnoses with maximum cost. Video► SWF►
Calculate the number of times two pair of diagnoses co-occur in the same person. Video► SWF►
Calculate for each person, the average time between two consecutive diagnoses. Video► SWF►

Part B: Measure severity of illness using H-CUP data. Contact the instructor for access to the data. More►

Organize your data You Tube►
Measure severity You Tube►

Please bring your work to class or email screen captures to the instructor. Email►

Presentations

There are several presentations for this lecture:

Lecture on episodes of illness. Listen► SWF► Slides►
Calculate similarity of two diagnoses Video► SWF►
Walk through algorithm for measuring episodes of illness Silent video► SWF►

Narrated slides and video require use of Flash. Download►.

Episode Treatment Groups PubMed►
Translation into Arabic. Read►

An algorithm for classifying episodes

-- Episodes of Care Classification

-- Table and View Creation Script

-- Author: Farrokh Alemi, Samuel Walters

-- Last Update: 2/14/06

-- Format: ANSI SQL-92

-- See a database implementation Download►

-- -------------------------------------------

-- This script creates the four tables necessary to execute the Episodes of care algorithm.

-- This script only needs to be executed once to establish the tables.

CREATE TABLE tblSimilarity(

FirstDiagnosis varchar(50) NOT NULL,

SecondDiagnosis varchar(50) NOT NULL,

Similarity int NOT NULL);

CREATE TABLE tblPatientDiagnosesGrouped(

RecordID int NOT NULL,

PatientID varchar(50) NOT NULL,

DiagnosisDateTime datetime NOT NULL,

DiagnosisCode varchar(250) NOT NULL);

CREATE TABLE tblPatientDiagnoses(

RecordID int IDENTITY(1,1) NOT NULL,

PatientID varchar(50) NOT NULL,

DiagnosisDateTime datetime NOT NULL,

DiagnosisCode varchar(250) NOT NULL);

CREATE TABLE tblEpisodeIndex(

EpisodeRecordID int IDENTITY(1,1) NOT NULL,

EpisodeID varchar(2500),

PatientID varchar(50),

DiagnosisCode varchar(50),

DiagnosisDateTime datetime);

-- ----------------------------------------------------

-- Classify Episodes of Care --

-- Author: Farrokh Alemi, Samuel Walters --

-- Last Modified: 2/14/06 --

-- ----------------------------------------------------

-- SECTION I - Diagnosis Record Grouping

-- Groups duplicate patient diagnoses by patient into single records to avoid processing

-- duplicates and improve transaction performance.

-- ----------------------------------------------------

-- Delete any grouped records from previous analyses

DELETE FROM tblPatientDiagnosesGrouped;

-- Insert new patient diagnoses records into grouped table,

-- grouping by patient, diagnosis code,

-- diagnosis date/time, and taking the last matching

-- RecordID for all diagnosis records in the grouped row.

INSERT INTO tblPatientDiagnosesGrouped(

PatientID,

DiagnosisCode,

DiagnosisDateTime,

RecordID)

SELECT tblPatientDiagnoses.PatientID,

tblPatientDiagnoses.DiagnosisCode,

tblPatientDiagnoses.DiagnosisDateTime,

Max(tblPatientDiagnoses.RecordID)

FROM tblPatientDiagnoses

GROUP BY tblPatientDiagnoses.PatientID,

tblPatientDiagnoses.DiagnosisCode,

tblPatientDiagnoses.DiagnosisDateTime

ORDER BY tblPatientDiagnoses.DiagnosisDateTime;

-- ----------------------------------------------------

-- SECTION II - Construct Similarity Index

-- Constructs a table containing the number of times

-- any diagnosis pair co-occurs for each patient. This

-- table is used to calculate the similarity score

-- for each diagnosis pair for each patient.

-- ----------------------------------------------------

-- Delete all Similarity table entries from previous

-- analyses.

DELETE FROM tblSimilarity;

-- Determine the number of times a pair of diagnoses

-- co-occur for any given patient, and insert those

-- counts into the similarity table with the

-- diagnosis pairs.

INSERT INTO tblSimilarity( FirstDiagnosis, SecondDiagnosis, Similarity)

SELECT FirstDiagnosis, SecondDiagnosis, Count(PatientID)

FROM (SELECT tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis,

tblPatientDiagnosesGrouped2.DiagnosisCode AS SecondDiagnosis,

tblPatientDiagnosesGrouped1.PatientID

FROM tblPatientDiagnosesGrouped as tblPatientDiagnosesGrouped1,

tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2

WHERE tblPatientDiagnosesGrouped2.PatientID = tblPatientDiagnosesGrouped1.PatientID AND

-- To adjust the sensitivity of this algorithm, modify the value "30" on the next line or the maximum number of days between diagnosis pairs)

-- to allow more or less maximum days between diagnosis pairs. If this number is edited, the same number must be modified in

-- Section III to account for the change.

CONVERT(int, tblPatientDiagnosesGrouped2.DiagnosisDateTime

tblPatientDiagnosesGrouped1.DiagnosisDateTime) <= 30

GROUP BY tblPatientDiagnosesGrouped1.DiagnosisCode,

tblPatientDiagnosesGrouped2.DiagnosisCode,

tblPatientDiagnosesGrouped1.PatientID

HAVING tblPatientDiagnosesGrouped2.DiagnosisCode >

tblPatientDiagnosesGrouped1.DiagnosisCode

) As GroupedPerPatient

GROUP BY FirstDiagnosis,

SecondDiagnosis;

-- ----------------------------------------------------

-- SECTION III - Identify Episodes of Care

-- This query uses the previously constructed similarity table to

-- identify episodes of illness for all patients in grouped table. Paired diagnoses are

-- identified as those for whom the similarity of the diagnosis pair divided by the

-- elapsed time between the diagnoses is greater than or equal to the cutoff

-- value, defined as the maximum similarity value in tblSimilarity divided by the maximum timeframe within

-- which two diagnoses co-occur in the Similarity table (30 days unless altered above) plus one to avoid

-- division by zero in cases where two diagnoses occur on the same date/time value. The resulting episode

-- records are inserted into the table tblEpisodeIndex.

-- -----------------------------------------------------

-- Delete any records remaining from previous analyses from the episode index table.

DELETE FROM tblEpisodeIndex;

-- Insert an episode record for each diagnosis belonging to an episode for a patient. Episodes are uniquely

-- identified using EpisodeID.

INSERT INTO tblEpisodeIndex (PatientID, DiagnosisCode, DiagnosisDateTime, EpisodeID)

SELECT PatientID, DiagnosisCode, DiagnosisDateTime, Min(Episode) AS EpisodeID

FROM (SELECT TOP 100 PERCENT

tblPatientDiagnosesGrouped.PatientID,

tblPatientDiagnosesGrouped.DiagnosisCode,

tblPatientDiagnosesGrouped.DiagnosisDateTime,

FirstDiagnosis,

FirstDiagnosisDate,

SecondDiagnosis,

SecondDiagnosisDate,

(CASE

WHEN SecondDiagnosis Is Null

THEN tblPatientDiagnosesGrouped.PatientID +

tblPatientDiagnosesGrouped.DiagnosisCode +

CONVERT(varchar(50),
CONVERT(int,tblPatientDiagnosesGrouped.DiagnosisDateTime))

ELSE tblPatientDiagnosesGrouped.PatientID +FirstDiagnosis +

CONVERT(varchar(50),CONVERT(int,FirstDiagnosisDate))

END) AS Episode

FROM tblPatientDiagnosesGrouped

LEFT JOIN (SELECT TOP 100 PERCENT

tblPatientDiagnosesGrouped1.PatientID,

tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis,

tblPatientDiagnosesGrouped1.DiagnosisDateTime AS FirstDiagnosisDate,

tblPatientDiagnosesGrouped2.DiagnosisCode as SecondDiagnosis,

tblPatientDiagnosesGrouped2.DiagnosisDateTime as SecondDiagnosisDate,

tblPatientDiagnosesGrouped1.RecordID,

CONVERT (Float,tblSimilarity.Similarity)/(1 +

CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime - tblPatientDiagnosesGrouped1.DiagnosisDateTime)) AS Score

FROM (SELECT CONVERT(float,Max(Similarity))/31 as CutoffValue

FROM tblSimilarity) as Cutoff,

(tblSimilarity

INNER JOIN

tblPatientDiagnosesGrouped As tblPatientDiagnosesGrouped1

ON tblSimilarity.FirstDiagnosis = tblPatientDiagnosesGrouped1.DiagnosisCode)

INNER JOIN

tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2 ON tblSimilarity.SecondDiagnosis = tblPatientDiagnosesGrouped2.DiagnosisCode AND tblPatientDiagnosesGrouped1.PatientID = tblPatientDiagnosesGrouped2.PatientID

WHERE tblPatientDiagnosesGrouped1.DiagnosisDateTime < tblPatientDiagnosesGrouped2.DiagnosisDateTime AND

tblPatientDiagnosesGrouped1.RecordID < tblPatientDiagnosesGrouped2.RecordID AND

CONVERT(float, tblSimilarity.Similarity)/(1 + CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime - tblPatientDiagnosesGrouped1.DiagnosisDateTime)) > CutoffValue

ORDER BY tblPatientDiagnosesGrouped1.DiagnosisDateTime DESC,

tblPatientDiagnosesGrouped2.DiagnosisCode DESC

) As LinkedDiagnoses

ON tblPatientDiagnosesGrouped.DiagnosisDateTime = LinkedDiagnoses.SecondDiagnosisDate AND

tblPatientDiagnosesGrouped.DiagnosisCode = LinkedDiagnoses.SecondDiagnosis AND

tblPatientDiagnosesGrouped.PatientID = LinkedDiagnoses.PatientID

ORDER BY tblPatientDiagnosesGrouped.PatientID,

tblPatientDiagnosesGrouped.DiagnosisDateTime

) as EpisodesUngrouped

GROUP BY PatientID, DiagnosisCode, DiagnosisDateTime

ORDER BY PatientID, Min(Episode);

This page is part of the course on Healthcare Databases, lecture on Complex Queries This page was first prepared on January 2005 and last revised on 01/27/2017.