Complex QueriesThis session shows how database manipulations can provide you with measures of episodes of illness. Measures of episodes of illness are important in evaluating performance of clinicians and healthcare programs. This page is based on a recent publication. For details see: Alemi F, Walters SR. A Mathematical Theory for Identifying and Measuring Severity of Episodes of Care. l Manag Health Care. 2006 April/June;15(2):72-82. PubMed► This page is based on United States patent application 10/054,706 filed on 1/24/2002 by George Mason University on behalf of Farrokh Alemi, Ph.D. and Valentin Prudius. We grant permission to individual scientists within university, Federal and State governments settings to use these procedures free of licensing fees for the purpose of evaluating its effectiveness. Permission is also granted to all students using this procedure as part of an educational class. Episodes of IllnessThis paper provides a mathematical model for identifying episodes of care and measuring the severity of an episode. Measures of episodes of care, in general, and the approach proposed here, in particular, can be used to set capitation rates or to profile clinicians performance. Numerous approaches to measuring episodes of care exist. Examples include Prospective Risk Adjustment, Ambulatory Visit Groups, Disease Staging, Products of Ambulatory Care, Ambulatory Diagnosis Groups and Ambulatory Care Groups. In addition to broad approaches to measurement of episodes of illness, many have developed disease specific episodes of care. There have been a number of review articles about episodes of care. Given the wide range of approaches to measurement of episodes of care, the natural question to ask is why we need yet another approach? Three reasons have motivated us to seek a new approach to measuring episodes of care. First, we provide a mathematical model for measuring episodes of care. No other approach does so. Most existing approaches to measuring episodes of care do not describe the internal procedures used for measuring severity or identifying episodes of care. Some commercial approaches consider such information as business secrets that should not be disclosed. Even when they do describe the internal mechanism of the approach, all rely on heuristics that make clinical sense but do not provide a mathematical theory for the relation between the variables used in constructing episodes of care. Thus, researchers face a black box -- the content of which they know little about or is based on heuristics that they cannot easily modify and reapply. In the absence of a theory, it is difficult to learn from one study how better measures can be constructed. Each study and each approach exists on its own merits and fails to contribute to the other. Then researchers tend to compete on claims of accuracy rather than to build on each other's work. As a result, while many approaches exist, cumulative progress in the field where in one investigator builds on another person’s approach, has been limited. The mathematical theory proposed in this paper allows us to accumulate information and improve our understanding of how severity of episodes of care should be measured. Future researchers can change the theory to arrive at more accurate predictions. As new insights are found, the theory is modified and knowledge is accumulated. Second, our proposed approach does not classify diagnoses into clusters of diseases before identifying episodes of care. All existing approaches are built on the concept of classifying possible diagnoses into a few clusters and then findings rules for creating episodes for these clusters. Schneeweiss and colleagues reported that 92 diagnosis clusters make up 86 percent of all ambulatory visits. Others have expanded this set to 125, with varying levels of severity and different time periods, during which the diagnoses in the cluster will belong to the same episode. We propose an approach that does not attempt to reduce the large set of possible diagnoses into a smaller set of clusters. Reduction into smaller set of diseases, by definition, gives up important nuances in order to have a manageable set of diagnoses. For example, infections often follow wounds and therefore may be considered part of the same episode. But an otitis media, even though an infection of the ear, could not possibly be part of an episode of trauma to the leg. Defining all infections as one cluster of diagnoses forces investigators to ignore important differences that might exist between types of infections. In our approach, all operations are defined on individual diagnoses without need to pre-set these diagnoses into broad clusters. Sometimes classification of diseases into clusters is based on the etiology of the disease, leading in our view to counter intuitive classifications. An episode of trauma may include fracture to the knee as well as fracture of the leg even though the knee and leg are different problems. Congestive heart failure may be part of episode of myocardial infarction even though one involves the heart the other the lung. Two very dissimilar diagnoses may be part of the same episode, even though these diagnoses do not describe the same illness. Third, the objective of our proposed approach, contrary to some existing approaches is not to create homogenous resource use episodes. Thus, not all follow-up visits are part of the same episode even though they may all be short visits and therefore have similar resource use. In our approach, the nature of the diagnosis, not the intensity of visits, is the basis of classifying visits into episodes. For example, follow-up visit for myocardial infarction is part of an MI episode and a follow-up visit for trauma is part of trauma episode. Intensity-based measures cannot be used for evaluating whether the numbers of visits are appropriate. In essence, they are fee schedules, except that these fee schedules are based on groups of visits or diagnoses and not single visit diagnosis. We propose a relation-based episode classification system that remedies this important shortcoming. It can be used to judge appropriateness of number of visits. Proposed Method of Measuring Severity of EpisodesWe start our description of the methods with a few definitions. An episode of care is a group of diagnoses on the same patient that describes the course of a given illness. Note that this definition does not depend on the nature of services delivered, the doctor delivering services, or the site of services. Nor, contrary to others, does this definition assume that services are temporally contiguous. Thus, it allows for episodes to be overlapping; for example, a patient may have an acute exacerbation of their chronic diabetes and experience an episode of upper respiratory infection. Episodes are characterized by what we call "an anchor diagnosis." This is the diagnosis that gives its name to the episode. Episodes have starting (sometimes referred to as trigger diagnosis) and stopping points that may be different from the anchor diagnosis. Episodes are characterized by a rate of progression, a peak severity during the course of episode, and morbidity and mortality outcomes. One episode, for example, may have a rapid onset, progress to a very serious condition, and then lead to death. Another episode may have a slow onset and never become serious. Selecting diagnoses that are part of same episodeDefining an episode begins with selecting diagnoses that are part of the same episode. Imagine that a patient has had a series of diagnosis D1, D2, ... Dm at times T1, T2, through Tm. Whether two diagnoses are part of the same episode depends on the nature of the two diagnoses and the time between them. Two diagnoses that are similar or related in nature should be part of the same episode unless they occur at significantly different times. If we define Pia as the probability that the diagnosis “i and diagnosis “a belong to the same episode, then the theory suggests that: Pia= function {Tia, Sia} Where the similarity between the diagnosis “i” and diagnosis “a” is Sia and number of days between diagnosis “i” and diagnosis “a” is Tia and calculated as: Tia= Ta Ti and T ia > 0 Note that the probability of being part of the same episode, Pia, should be directly related to similarity of two diagnoses, Sia, and inversely related to, Tia, the time between the two diagnoses. A specific mathematical function that preserves these two relationships is: Pia= S ia / (1+βT ia) In the above equation, α and β are constants. When a patient presents with several diagnoses, then the probability that any two of the diagnoses may belong to an episode is calculated using the above formula. Later, these pair-wise probabilities of belonging to the same episode are used to classify diagnosis into groups -- using one of many widely available classification methods. For a specific example of a classification algorithm see Appendix at end of this paper. Severity of an EpisodeDiagnoses differ in terms of their severity. We show severity of diagnosis i as Sevi and calculate the overall severity of an episode by the following mathematical formula: Overall severity of episode = 1- п i (1 - Sevi ) There are many different mathematical formulas for aggregating severity of individual diagnosis to severity of an episode. The most common approach is to add or average the severity scores for each diagnosis. Adding scores is not appropriate, as episodes with few severe diagnoses would be scored lower than episodes with many non-severe diagnoses. Averaging is also not appropriate, as patients who have two diagnoses, one severe and the other not, will be rated lower than patients with just the severe diagnosis. Instead of adding or averaging the scores, we prefer using the above Multiplicative model. For example, if a patient has two diagnoses, one with severity score 0.9 and another with severity score 0.5, then the overall severity of the episode is calculated as: Overall severity for the patient = 1 -(1-0.9)*(1-0.5) = 0.95 Compared to the adding or the averaging formula, the multiplicative formula has several advantages: The influence of severe diagnoses on the overall score are not diluted by non-severe diagnoses and merely increasing the number of diagnoses will not necessarily result in high overall severity scores. Evaluation of the AlgorithmWe created a measure for severity of episodes of illness for developmentally delayed children who were enrolled in the Medicaid program of one Southeastern State. Developmentally delayed children use health services extensively. To reduce computational difficulties and without loss of generality, we randomly sampled 565 patients among the 3250 patients in the database. The data included both in-patient and outpatient Medicaid payments for the patient. The in-patient portion included both the health professionals billing and the institution's bills. On average, the State paid $9,296 per patient per year. The standard error of the cost was $2,238, reflecting large variation in cost of care across patients. Cost ranged from low of $29 (reflecting patients enrolled for portion of the year) to $884,967 per year. Estimating the time between two diagnoses, Tia, is easy and can be read directly from the database by taking the absolute value of the difference in dates of the two diagnoses. Estimating the similarity of the two diagnoses, Sia, was more difficult. A surrogate measure of similarity of two diagnoses is the number of times the two diagnoses co-occur within a specific time frame. The implicit assumption is that complications and related problems tend to occur in clusters. We calculated a score proportional to the likelihood that two diagnoses belong to the same episode by the formula provided earlier. We used this score to classify diagnoses into episodes in such a manner that diagnoses within one episode were more similar than diagnoses in different episodes. Appendix A gives a detailed example of how diagnoses were classified. The mean number of episodes was 147 (standard error = 320). Patients differed considerably in the number of episodes they had. We calculated the severity of each diagnosis as the average amount paid for the diagnosis. Severity and costs are not always related, especially when patients die before expensive services can be delivered. But in our database no patient passed away. Therefore, cost may have been a reasonable surrogate measure of severity. ResultsTo test the accuracy of our measures of episodes of care, we regressed cost of care on severity of the episode, number of episodes and interaction between number of episodes and severity of episodes. We measured cost of care by the amount the State paid for each encounter. Since patients' eligibility for Medicaid changes frequently, the amount paid by the State is only an approximate measure of total cost of care of the patient. To have one estimate of severity for a patient, we averaged the severity scores for each patient across all their episodes during the year. The averaged severity score ranged from 0.01 to 0.27. The mean was 0.03 (standard error = 0.001). Table 1 summarizes regression results.
The dependent variable was "the amount paid by the State". All three independent variables -- "the average severity of the episodes", "the number of episodes of the patient" and “the interaction between the severity and the number of episodes” -- were statistically significant predictors of the dependent variable at alpha levels lower than 0.001. The R-Squared adjusted by degrees of freedom was 53%. DiscussionData showed that episodes of care can be constructed from encounter databases. Furthermore, the proposed measure of episode of care explained a large percentage of variance in cost of care. The magnitude of the percent of variance explained by the measures reported here is of special interest. Many measures of severity and case mix report R2 values less than 10%. Because our approach explains a large percent of the variance, our confidence in the validity of our measure of severity of episodes is increased. One may expect the performance of the approach developed in this paper to deteriorate when parameters of the model are estimated from one database and applied to another unrelated database. Nevertheless the magnitude of percent of variations in objective data explained by our approach is so high that we are hopeful that even with drops in performance, our approach will remain relatively more accurate than many existing approaches. The approach proposed here can be used to construct episodes of care for specific diseases. Thus, if one investigator is interested in episodes for diabetes and another is interested in episodes of cancer, both can use the algorithm proposed here by pre-selecting patients with a particular disease. The most appealing part of the proposed approach is the ease with which the approach can be integrated with existing databases. The proposed mathematical model works on any administrative database, which has information on date of visit and diagnosis. Any person familiar with database operations can implement it. In addition, electronic medical record companies can use the algorithm proposed here to embed methods of analyzing performance of clinicians within their electronic record systems. Appendix A: An Algorithm for Classifying Severity of Episodes of Illness1) Start with a Table of patient unique identification number, diagnosis and time of diagnosis. Here is a small example database:
2) Create a query identifying for any pair of diagnoses the number of unique patients for whom the two diagnoses co-occur within 30 days. Note that the co-occurrence of diagnosis "a" and "b" does not depend on the order of which one comes first. Here is how the query will look like for the above example data:
3) For each patient conduct the following analysis: a) For the patient, when the same diagnosis occurs at two different time periods, rename the diagnoses into unique names -- usually a combination of the name and date of diagnosis. For example patient 1002 has the following data when renamed:
4) For the patient, measure the absolute value of the length of time between any pair of diagnoses for the patient, refer to this as time between any two diagnoses. For example for patient 1002 the time between two different diagnoses will be:
5) For the patient, look up the similarity of any pair of different diagnoses they have from step "2" and divide this by absolute value of the time between the two diagnoses, from step "b". Refer to this as the score. For example for the patient 1002 the results will be:
6) For the patient, standardized the score so that it ranges between 1 and zero by subtracting the minimum value from each score and dividing the results by the difference of maximum and minimum score. Refer to this as standardized score. For the patient 1002 the standardized score is as follows:
7) Classify different diagnoses into episodes by using the standardized score. The following is one classification procedure that could be used:
The result of the calculation for patient 1002 was three episodes:
Analyze DataPart A: Import data from the following files into four tables. From ptid.xls to Patients, from claims.xls to Claims, from icd.xls to ICD9, from cpt.xls to CPT table. Video► SWF►
Part B: Measure severity of illness using H-CUP data. Contact the instructor for access to the data. More► Please bring your work to class or email screen captures to the instructor. Email► PresentationsThere are several presentations for this lecture:
Narrated slides and video require use of Flash. Download►. MoreAn algorithm for classifying episodes-- Episodes of Care Classification -- Table and View Creation Script -- Author: Farrokh Alemi, Samuel Walters -- Last Update: 2/14/06 -- Format: ANSI SQL-92 -- See a database implementation Download► -- ------------------------------------------- -- This script creates the four tables necessary to execute the Episodes of care algorithm. -- This script only needs to be executed once to establish the tables. CREATE TABLE tblSimilarity( FirstDiagnosis varchar(50) NOT NULL, SecondDiagnosis varchar(50) NOT NULL, Similarity int NOT NULL); CREATE TABLE tblPatientDiagnosesGrouped( RecordID int NOT NULL, PatientID varchar(50) NOT NULL, DiagnosisDateTime datetime NOT NULL, DiagnosisCode varchar(250) NOT NULL); CREATE TABLE tblPatientDiagnoses( RecordID int IDENTITY(1,1) NOT NULL, PatientID varchar(50) NOT NULL, DiagnosisDateTime datetime NOT NULL, DiagnosisCode varchar(250) NOT NULL); CREATE TABLE tblEpisodeIndex( EpisodeRecordID int IDENTITY(1,1) NOT NULL, EpisodeID varchar(2500), PatientID varchar(50), DiagnosisCode varchar(50), DiagnosisDateTime datetime); -- ---------------------------------------------------- -- Classify Episodes of Care -- -- Author: Farrokh Alemi, Samuel Walters -- -- Last Modified: 2/14/06 -- -- ---------------------------------------------------- -- ---------------------------------------------------- -- SECTION I - Diagnosis Record Grouping -- Groups duplicate patient diagnoses by patient into single records to avoid processing -- duplicates and improve transaction performance. -- ---------------------------------------------------- -- Delete any grouped records from previous analyses DELETE FROM tblPatientDiagnosesGrouped; -- Insert new patient diagnoses records into grouped table, -- grouping by patient, diagnosis code, -- diagnosis date/time, and taking the last matching -- RecordID for all diagnosis records in the grouped row. INSERT INTO tblPatientDiagnosesGrouped( PatientID, DiagnosisCode, DiagnosisDateTime, RecordID) SELECT tblPatientDiagnoses.PatientID, tblPatientDiagnoses.DiagnosisCode, tblPatientDiagnoses.DiagnosisDateTime, Max(tblPatientDiagnoses.RecordID) FROM tblPatientDiagnoses GROUP BY tblPatientDiagnoses.PatientID, tblPatientDiagnoses.DiagnosisCode, tblPatientDiagnoses.DiagnosisDateTime ORDER BY tblPatientDiagnoses.DiagnosisDateTime; -- ---------------------------------------------------- -- SECTION II - Construct Similarity Index -- Constructs a table containing the number of times -- any diagnosis pair co-occurs for each patient. This -- table is used to calculate the similarity score -- for each diagnosis pair for each patient. -- ---------------------------------------------------- -- Delete all Similarity table entries from previous -- analyses. DELETE FROM tblSimilarity; -- Determine the number of times a pair of diagnoses -- co-occur for any given patient, and insert those -- counts into the similarity table with the -- diagnosis pairs. INSERT INTO tblSimilarity( FirstDiagnosis, SecondDiagnosis, Similarity) SELECT FirstDiagnosis, SecondDiagnosis, Count(PatientID) FROM (SELECT tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis, tblPatientDiagnosesGrouped2.DiagnosisCode AS SecondDiagnosis, tblPatientDiagnosesGrouped1.PatientID FROM tblPatientDiagnosesGrouped as tblPatientDiagnosesGrouped1, tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2 WHERE tblPatientDiagnosesGrouped2.PatientID = tblPatientDiagnosesGrouped1.PatientID AND -- To adjust the sensitivity of this algorithm, modify the value "30" on the next line or the maximum number of days between diagnosis pairs) -- to allow more or less maximum days between diagnosis pairs. If this number is edited, the same number must be modified in -- Section III to account for the change. CONVERT(int, tblPatientDiagnosesGrouped2.DiagnosisDateTime tblPatientDiagnosesGrouped1.DiagnosisDateTime) <= 30 GROUP BY tblPatientDiagnosesGrouped1.DiagnosisCode, tblPatientDiagnosesGrouped2.DiagnosisCode, tblPatientDiagnosesGrouped1.PatientID HAVING tblPatientDiagnosesGrouped2.DiagnosisCode > tblPatientDiagnosesGrouped1.DiagnosisCode ) As GroupedPerPatient GROUP BY FirstDiagnosis, SecondDiagnosis; -- ---------------------------------------------------- -- SECTION III - Identify Episodes of Care -- This query uses the previously constructed similarity table to -- identify episodes of illness for all patients in grouped table. Paired diagnoses are -- identified as those for whom the similarity of the diagnosis pair divided by the -- elapsed time between the diagnoses is greater than or equal to the cutoff -- value, defined as the maximum similarity value in tblSimilarity divided by the maximum timeframe within -- which two diagnoses co-occur in the Similarity table (30 days unless altered above) plus one to avoid -- division by zero in cases where two diagnoses occur on the same date/time value. The resulting episode -- records are inserted into the table tblEpisodeIndex. -- ----------------------------------------------------- -- Delete any records remaining from previous analyses from the episode index table. DELETE FROM tblEpisodeIndex; -- Insert an episode record for each diagnosis belonging to an episode for a patient. Episodes are uniquely -- identified using EpisodeID. INSERT INTO tblEpisodeIndex (PatientID, DiagnosisCode, DiagnosisDateTime, EpisodeID) SELECT PatientID, DiagnosisCode, DiagnosisDateTime, Min(Episode) AS EpisodeID FROM (SELECT TOP 100 PERCENT tblPatientDiagnosesGrouped.PatientID, tblPatientDiagnosesGrouped.DiagnosisCode, tblPatientDiagnosesGrouped.DiagnosisDateTime, FirstDiagnosis, FirstDiagnosisDate, SecondDiagnosis, SecondDiagnosisDate, (CASE WHEN SecondDiagnosis Is Null THEN tblPatientDiagnosesGrouped.PatientID + tblPatientDiagnosesGrouped.DiagnosisCode +
CONVERT(varchar(50), ELSE tblPatientDiagnosesGrouped.PatientID +FirstDiagnosis + CONVERT(varchar(50),CONVERT(int,FirstDiagnosisDate)) END) AS Episode FROM tblPatientDiagnosesGrouped LEFT JOIN (SELECT TOP 100 PERCENT tblPatientDiagnosesGrouped1.PatientID, tblPatientDiagnosesGrouped1.DiagnosisCode AS FirstDiagnosis, tblPatientDiagnosesGrouped1.DiagnosisDateTime AS FirstDiagnosisDate, tblPatientDiagnosesGrouped2.DiagnosisCode as SecondDiagnosis, tblPatientDiagnosesGrouped2.DiagnosisDateTime as SecondDiagnosisDate, tblPatientDiagnosesGrouped1.RecordID, CONVERT (Float,tblSimilarity.Similarity)/(1 + CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime - tblPatientDiagnosesGrouped1.DiagnosisDateTime)) AS Score FROM (SELECT CONVERT(float,Max(Similarity))/31 as CutoffValue FROM tblSimilarity) as Cutoff, (tblSimilarity INNER JOIN tblPatientDiagnosesGrouped As tblPatientDiagnosesGrouped1 ON tblSimilarity.FirstDiagnosis = tblPatientDiagnosesGrouped1.DiagnosisCode) INNER JOIN tblPatientDiagnosesGrouped AS tblPatientDiagnosesGrouped2 ON tblSimilarity.SecondDiagnosis = tblPatientDiagnosesGrouped2.DiagnosisCode AND tblPatientDiagnosesGrouped1.PatientID = tblPatientDiagnosesGrouped2.PatientID WHERE tblPatientDiagnosesGrouped1.DiagnosisDateTime < tblPatientDiagnosesGrouped2.DiagnosisDateTime AND tblPatientDiagnosesGrouped1.RecordID < tblPatientDiagnosesGrouped2.RecordID AND CONVERT(float, tblSimilarity.Similarity)/(1 + CONVERT(float, tblPatientDiagnosesGrouped2.DiagnosisDateTime - tblPatientDiagnosesGrouped1.DiagnosisDateTime)) > CutoffValue ORDER BY tblPatientDiagnosesGrouped1.DiagnosisDateTime DESC, tblPatientDiagnosesGrouped2.DiagnosisCode DESC ) As LinkedDiagnoses
ON tblPatientDiagnosesGrouped.DiagnosisDateTime = LinkedDiagnoses.SecondDiagnosisDate AND tblPatientDiagnosesGrouped.DiagnosisCode = LinkedDiagnoses.SecondDiagnosis AND tblPatientDiagnosesGrouped.PatientID = LinkedDiagnoses.PatientID ORDER BY tblPatientDiagnosesGrouped.PatientID, tblPatientDiagnosesGrouped.DiagnosisDateTime ) as EpisodesUngrouped GROUP BY PatientID, DiagnosisCode, DiagnosisDateTime ORDER BY PatientID, Min(Episode); This page is part of the course on Healthcare Databases, lecture on Complex Queries This page was first prepared on January 2005 and last revised on 01/27/2017. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||