This is the lecture on conditional independence, part of a series of lectures intended to prepare you to do probabilistic risk analysis. This lecture introduces several methods of checking for conditional independence -- an important element of causal models.
Joint Distribution

A joint distribution shows the probability of two events co-occurring. In a complex risk analysis containing several hundred events, the entire analysis is built on the joint distributions of pairs of related events. Typically one event is the cause and the other is the effect, and the purpose of specifying a joint distribution is to clarify the strength of the causal relationship.
                      Second Event
First Event           Absent      Present     Total
Absent                a           b           a+b
Present               c           d           c+d
Total                 a+c         b+d         a+b+c+d=1

Table 1: Joint Distribution of Two Events
Table 1 shows the joint distribution of two events. The constants a, b, c, and d show how often each combination of the two events was observed. For example, a shows how many times the first and second events were both absent; the constant d shows how many times both events were present. The data in Table 1 have been standardized so that a+b+c+d=1. If the data are not in this standard form, dividing each cell by the sum a+b+c+d accomplishes the same goal.
For example, suppose we observed the following frequencies with which understaffing and medication errors co-occur.

                      Medication Error
                      No error    Error       Total
Adequate staffing     50          8           58
Under staffed         7           15          22
Total                 57          23          80

Table 2: Number of Co-occurrences of Medication Error and Understaffing
In Table 2, there were 50 visits in which the clinic was adequately staffed and there were no medication errors. In contrast, there were 15 visits in which the clinic was understaffed and there were medication errors. The first step is to put Table 2 in standard form by dividing all cell values by the total number of visits. Dividing by the total number of visits fits our notion of how probability should be measured: the frequency of observing an event divided by the total number of possibilities. Here the total number of possibilities is the total number of visits, and dividing by this number guarantees that we have a probability function that follows the four axioms mentioned in the first lecture on probabilities.
                      Medication Error
                      No error    Error       Total
Adequate staffing     0.63        0.10        0.73
Under staffed         0.09        0.19        0.28
Total                 0.71        0.29        1.00

Table 3: Joint Distribution of Staffing & Medication Errors
Table 3 shows the joint distribution of staffing and medication errors in the four cells in the center of the table. For example, the joint probability of a medication error and a visit in which the clinic was understaffed was 19%. On the right side and in the bottom row, the table shows the marginal distribution of each of the events. A marginal distribution shows the probability distribution of an event by itself. For example, Table 3 shows that medication error has a Bernoulli distribution, with the probability of an error being 29% and the probability of no error being 71%. The frequency of understaffing is another marginal distribution and is given in the right-hand column. Notice that the marginal probabilities are row and column sums of the joint probabilities.
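As an aside, the standardization step and the marginals can be computed in a few lines of Python. This fragment is our addition, not part of the original lecture; it simply divides the Table 2 counts by the total number of visits:

# A minimal sketch of putting Table 2 in standard form: divide each cell
# count by the total number of visits to obtain the joint probabilities.
counts = {("Adequate staffing", "No error"): 50,
          ("Adequate staffing", "Error"): 8,
          ("Under staffed", "No error"): 7,
          ("Under staffed", "Error"): 15}

total = sum(counts.values())  # 80 visits in all
joint = {cell: n / total for cell, n in counts.items()}

for cell, p in joint.items():
    print(cell, p)  # Table 3 shows these values rounded to two decimals

# marginal probability of understaffing = sum across the understaffed row
p_understaffed = sum(p for (staffing, _), p in joint.items()
                     if staffing == "Under staffed")
print(p_understaffed)  # 0.275, shown as 0.28 in Table 3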
We showed earlier how a conditional probability is calculated by reducing the universe of possibilities to the situations in which the condition has already happened. We can see this reduction in the universe of possibilities by calculating conditional probabilities for the events in Table 3. If the analyst wishes to calculate a conditional probability, the total visits should be reduced to the visits in which the condition has been met. Suppose the analyst wants to calculate the probability of a medication error given that the clinic is understaffed, shown as P(Medication error | Understaffed clinic). We need to reduce the visits to only those in which the clinic was understaffed. This is done by dividing the understaffed row by the marginal probability of being understaffed.
                      Medication Error
                      No error    Error       Total
Under staffed         0.32        0.68        1.00

Table 4: Probability of Medication Error Given Understaffed Clinic
We say the universe of possibilities has been reduced because now only visits in which the clinic was understaffed are reported. Note that in Table 4, the probability of a medication error in this reduced universe is 68%. The point of this example is that conditional probabilities can be calculated easily by reducing the universe of possibilities to the condition.
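The same row division can be scripted. The following Python sketch, our addition using the Table 3 values, shrinks the universe to understaffed visits:

# Divide the understaffed row of Table 3 by its marginal probability
# to obtain the conditional distribution shown in Table 4.
joint_understaffed = {"No error": 0.09, "Error": 0.19}  # row of Table 3
p_understaffed = sum(joint_understaffed.values())       # marginal, 0.28

conditional = {k: v / p_understaffed for k, v in joint_understaffed.items()}
print(conditional)  # {'No error': ~0.32, 'Error': ~0.68}, as in Table 4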
Definition of Independence
In probability theory, the concept of independence has a very specific meaning. If two events are independent of each other, then the occurrence of one event tells us nothing about the occurrence of the other event. Mathematically, this condition is written as:
P(A | B) = P(A)
Independence means that the presence of one clue does not change the probability of another clue. An example might be the prevalence of diabetes and car accidents: knowing the probability of car accidents in a population tells us nothing about the probability of diabetes.
When two events are independent, we can calculate the probability of both co-occurring from the marginal probabilities of each event occurring:
P(A&B) = P(A) * P(B)
This is the same as saying that the joint distribution is the product of the marginal distributions.
Assumption of Independence

                      Medication Error
                      No error    Error       Total
Adequate staffing     0.52        0.21        0.73
Under staffed         0.20        0.08        0.28
Total                 0.71        0.29        1.00

Table 5: Joint Distribution Derived from Marginal Distributions
Under the assumption that staffing and medication errors are independent, the probability of each cell in Table 5 can be calculated as the product of the corresponding row and column marginal values. A chi-square test checks whether the number of visits observed in each cell of Table 2 departs from the number of visits predicted under the assumption of independence in Table 5 (note that Table 5 provides probabilities, so you need to multiply these probabilities by the total number of visits to obtain the expected count for each cell). In Excel, if you want to compare observed to expected occurrences of an event, the following function can be used:

=CHITEST(actual_range, expected_range)

This function gives the probability that the difference between the observed and expected counts arose by random chance. If this probability is less than 0.05, we reject the hypothesis that the two sets of values are unrelated. In our example the chi-squared statistic is 23, the probability of finding such a high chi-squared statistic is less than 0.0001, and therefore we reject the hypothesis that the two variables are independent of each other.
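Outside Excel, the same comparison can be run in Python. Here is a sketch using scipy, our addition, with the Table 2 counts:

# Chi-square test of independence on the observed counts in Table 2.
# correction=False reproduces the plain chi-squared statistic of about 23.
from scipy.stats import chi2_contingency

observed = [[50, 8],   # adequate staffing: no error, error
            [7, 15]]   # under staffed:     no error, error

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 1), p)  # ~23.0 and a p-value far below 0.05
print(expected)           # expected counts under independence (Table 5 times 80)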
We could, of course, also have compared the probability of medication error with and without conditioning on understaffing. If the two variables are independent, conditioning on understaffing should not change the probability. This is not the case:
P(Medication error) ≠ P(Medication error | Understaffing)
0.29 ≠ 0.68
Definition of Conditional Independence

Conditional independence means that, within a specific population, the presence of one clue does not change the probability of another. Mathematically, this is shown as:
P(A | B, C) = P(A | C)
The above formula reads: if we know that C has occurred, learning that B has occurred adds no new information to the estimate of the probability of event A. Another way of saying this is that in population C, knowing B tells us nothing new about the chance of A. As before, conditional independence allows us to calculate joint probabilities from marginal probabilities:
P(A&B | C) = P(A | C) * P(B | C)
The above formula says that in population C, the probability of A and B both occurring is equal to the product of the probabilities of each event occurring. It is possible for two events to be dependent, and yet become independent of each other when conditioned on the occurrence of a third event. For example, suppose long shifts and medication errors are dependent (≠ means not equal to):
P(Medication error) ≠ P(Medication error | Long shift)
At the same time, we may find that in the population of employees who are not fatigued (even though they work long shifts), the two events are independent of each other, i.e.
P(Medication error | Long shift, Not fatigued) = P(Medication error | Not fatigued)
This example shows that related events may become independent under certain conditions.
Use of Conditional Independence

Conditional probabilities allow us to think through a sequence of uncertain events. If each event can be conditioned on its predecessor, a chain of events can be examined. Then, if one component of the chain changes, we can calculate the impact of the change throughout the chain. In this sense, conditional probabilities show how a series of related causes affect each other and subsequently affect a sentinel event.
Independence and conditional independence are often invoked to simplify the calculation of complex likelihoods involving multiple events. We have already shown how independence facilitates the calculation of joint probabilities. The advantage of verifying independence becomes even more pronounced when examining more than two events. When calculating the likelihood associated with a series of clues, one has to progressively condition the probabilities on the sequence of clues that are available:
P(C1, C2, C3, ..., Cn | H) = P(C1 | H) * P(C2 | H, C1) * P(C3 | H, C1, C2) * P(C4 | H, C1, C2, C3) * ... * P(Cn | H, C1, C2, ..., Cn-1)
Note that each term in the above formula is conditioned on the previous events, and that all terms are conditioned on the event we want to predict (shown as H). The first term is conditioned on no additional event; the second term is conditioned on the first clue/event; the third term is conditioned on the first and second clues/events; and so on, until the last term, which is conditioned on all preceding n-1 clues/events. If we stay with our analogy that conditioning is reducing the sample to the portion that has the condition, then the above formula suggests a sequence of reductions in sample size. Because there are many events, the data have to be partitioned into increasingly smaller portions. Obviously, for data to be partitioned so many times, one needs a large database.
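To make the shrinking concrete, here is a small Python sketch, our addition with made-up records, showing how each added condition cuts the usable sample:

# Hypothetical records; each added condition leaves fewer cases from
# which to estimate the next conditional probability.
records = [
    {"H": True,  "C1": True,  "C2": True},
    {"H": True,  "C1": True,  "C2": False},
    {"H": True,  "C1": False, "C2": True},
    {"H": False, "C1": True,  "C2": True},
]

subset = [r for r in records if r["H"]]   # condition on H: 3 cases left
subset = [r for r in subset if r["C1"]]   # condition on C1 as well: 2 cases left
p_c2 = sum(r["C2"] for r in subset) / len(subset)
print(len(subset), p_c2)                  # P(C2 | H, C1) is estimated from 2 cases only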
Conditional independence allows us to calculate likelihood ratios associated with a series of events without needing large databases. Instead of conditioning each event on the hypothesis and all prior events, we can now ignore the prior events:
P(C1, C2, C3, ..., Cn | H) = P(C1 | H) * P(C2 | H) * P(C3 | H) * P(C4 | H) * ... * P(Cn | H)
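Under conditional independence the computation collapses to a simple product, as in this sketch (the three probabilities are hypothetical):

# With conditional independence, P(C1,...,Cn | H) is the product of the
# separately estimated terms P(Ci | H); no joint partitioning is needed.
from math import prod

p_clues_given_h = [0.68, 0.50, 0.33]   # hypothetical P(Ci | H) values
likelihood = prod(p_clues_given_h)
print(likelihood)                      # P(C1, C2, C3 | H) ~ 0.11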
Verifying Independence

There are several ways to verify conditional independence. These include (1) reducing the sample size, (2) correlations, (3) direct queries to experts, and (4) separation in causal maps.
(1) Reducing Sample Size

If data exist, conditional independence can be verified by selecting the population that has the condition and verifying that the product of the marginal probabilities is equal to the joint probability of the two events. For example, Table 6 presents eighteen cases from a special unit prone to medication errors. The question is whether the rate of medication errors is independent of the length of the work shift.

Case   Medication error   Long shift   Fatigue
1      No                 Yes          Yes
2      No                 Yes          Yes
3      No                 No           Yes
4      No                 No           Yes
5      Yes                Yes          Yes
6      Yes                No           Yes
7      Yes                No           Yes
8      Yes                Yes          Yes
9      No                 No           No
10     No                 No           No
11     No                 Yes          No
12     No                 No           No
13     No                 No           No
14     No                 No           No
15     No                 No           No
16     No                 No           No
17     Yes                No           No
18     Yes                No           No

Table 6: Medication Errors in 18 Consecutive Cases
Using the data in Table 6, the probabilities are calculated as:
P(Error) = Number of cases with errors / Number of cases = 6/18 = 0.33
P(Long shift) = Number of cases seen by a provider on a long shift / Number of cases = 5/18 = 0.28
P(Error & Long shift) = Number of cases with errors & long shift / Number of cases = 2/18 = 0.11
P(Error & Long shift) = 0.11 ≠ 0.09 = 0.33 * 0.28 = P(Error) * P(Long shift)
The above calculations show that medication error and length of shift are not independent of each other: knowing the length of the shift tells us something about the probability of an error in that shift. But consider the situation in which we examine these two events only among cases where the provider was fatigued. The population of cases is now reduced to cases 1 through 8. With this population, calculation of the probabilities yields:
P(Error | Fatigued) = 0.50
P(Long shift | Fatigued) = 0.50
P(Error & Long shift | Fatigued) = 0.25
P(Error & Long shift | Fatigued) = 0.25 = 0.50 * 0.50 = P(Error | Fatigued) * P(Long shift | Fatigued)
Among fatigued providers, medication error is independent of the length of the work shift. The procedure used in this example, namely calculating the joint probability and examining whether it is approximately equal to the product of the marginals, is one way of verifying independence.
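For readers who prefer to verify the counts, a short Python sketch, our addition encoding the eighteen cases of Table 6 as 0/1 triples, reproduces the check:

# Each case is (error, long shift, fatigue); 1 = Yes, 0 = No, per Table 6.
cases = [
    (0,1,1), (0,1,1), (0,0,1), (0,0,1), (1,1,1), (1,0,1), (1,0,1), (1,1,1),
    (0,0,0), (0,0,0), (0,1,0), (0,0,0), (0,0,0), (0,0,0), (0,0,0), (0,0,0),
    (1,0,0), (1,0,0),
]

fatigued = [c for c in cases if c[2] == 1]  # shrink the universe to cases 1-8
p_error = sum(c[0] for c in fatigued) / len(fatigued)        # 0.50
p_long = sum(c[1] for c in fatigued) / len(fatigued)         # 0.50
p_both = sum(c[0] * c[1] for c in fatigued) / len(fatigued)  # 0.25
print(p_both == p_error * p_long)  # True: independent given fatigue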
Independence can also be examined by calculating conditional probabilities. As before, conditional probabilities are calculated by restricting the population. For example, in the population of fatigued providers (cases 1 through 8) there are several cases of working a long shift (cases 1, 2, 5, and 8). We can use this information to calculate conditional probabilities:
P(Error | Fatigue) = 0.50
P(Error | Fatigue & Long shift) = 2/4 = 0.50
Again we observe that among fatigued workers, knowing that the work shift was long adds no information to the probability of a medication error. The above procedures show how independence can be verified by counting cases in reduced populations. When a considerable amount of data is available inside a database, the approach can easily be implemented using Structured Query Language (SQL). To calculate the conditional probability of an event, all we need to do is run a select query that restricts the rows to the condition and counts the events of interest.
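For instance, a sketch along these lines (the database, table, and column names here are hypothetical) would estimate P(Error | Fatigue & Long shift) with two counting queries:

# Estimate a conditional probability with SQL counting queries.
import sqlite3

con = sqlite3.connect("cases.db")  # hypothetical database of cases
cur = con.cursor()

cur.execute("SELECT COUNT(*) FROM cases WHERE fatigue = 1 AND long_shift = 1")
n_condition = cur.fetchone()[0]    # size of the reduced universe

cur.execute("SELECT COUNT(*) FROM cases "
            "WHERE fatigue = 1 AND long_shift = 1 AND med_error = 1")
n_event = cur.fetchone()[0]        # events of interest within it

print(n_event / n_condition)       # P(Error | Fatigue & Long shift)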
(2) Correlation Analysis

Another way of verifying independence is to examine correlations. Two events that are correlated are dependent. For example, in Table 7 we can examine the relationship between age and blood pressure by calculating the correlation between these two variables.

Case   Age   BP    Weight
1      35    140   200
2      30    130   185
3      19    120   180
4      20    111   175
5      17    105   170
6      16    103   165
7      20    102   155

Table 7: Relationship between Age and Blood Pressure in 7 Patients
The correlation between age and blood pressure in the sample of data in Table 7 is 0.91. This correlation is relatively high and suggests that knowing the age of a person tells us a great deal about that person's blood pressure. Therefore, age and blood pressure are dependent in our sample.
Correlations can also be used to verify conditional independence. To examine the independence of events A and B in a population where event C has occurred, we need three pairwise correlations. Assume:
Rab is the correlation between events A and B,
Rac is the correlation between events A and C, and
Rcb is the correlation between events C and B.
Events A and B are conditionally independent of each other if the "Vanishing Partial Correlation" condition holds. This condition states:
Rab = Rac * Rcb
Using the data in Table 7, we calculate the following correlations:
Rage, blood pressure = 0.91
Rage, weight = 0.82
Rweight, blood pressure = 0.95
Examination of the data shows that the vanishing partial correlation holds (~ means approximate equality):
Rage, blood pressure = 0.91 ~ 0.78 = 0.82 * 0.95 = Rage, weight * Rweight, blood pressure
Therefore, we can conclude that, given the patients' weight, age and blood pressure are approximately independent of each other because their partial correlation is near zero.
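The arithmetic above is easy to reproduce. The following numpy sketch, our addition using the Table 7 data, computes the three correlations and the product; small differences from the quoted values are rounding:

# Check the vanishing partial correlation condition on the Table 7 data.
import numpy as np

age    = [35, 30, 19, 20, 17, 16, 20]
bp     = [140, 130, 120, 111, 105, 103, 102]
weight = [200, 185, 180, 175, 170, 165, 155]

r_age_bp     = np.corrcoef(age, bp)[0, 1]      # ~0.92
r_age_weight = np.corrcoef(age, weight)[0, 1]  # ~0.82
r_weight_bp  = np.corrcoef(weight, bp)[0, 1]   # ~0.95

print(round(r_age_bp, 2), round(r_age_weight * r_weight_bp, 2))
# the closer these two numbers, the closer the partial correlation is to zero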
(3) Direct Expert Queries

It is not always possible to gather data. Sometimes independence must be verified subjectively by asking a knowledgeable expert about the relationships among the variables. Unconditional independence can be verified by asking the expert whether knowledge of one event tells us a lot about the likelihood of another. Conditional independence can be verified by repeating the same task within specific populations. Gustafson and others (1973a) described a procedure for assessing independence by directly querying experts:
Write each event on a 3 x 5 card.
Ask each expert to assume a specific population in which a condition has been met (i.e. an event has occurred).
Ask the expert to pair the cards if knowing the value of one event will make it considerably easier to estimate the value of the other.
Repeat these steps for other populations.
If several experts are involved, ask them to present their clustering of cards to each other.
Have experts discuss any areas of disagreement, and remind them that only major dependencies should be clustered.
Use majority rule to choose the final clusters. (To be accepted, a cluster must be approved by the majority of experts.)
Experts will have in mind different, sometimes wrong, notions of dependence, so the words "conditional dependence" should be avoided. Instead, we focus on whether one clue tells us a lot about the influence of another clue in specific populations. We find that experts are more likely to understand this line of questioning than a direct request to verify conditional independence.
(4) Separation in Causal Maps

One can also assess dependencies by analyzing maps of causal relationships. In a causal network, each node describes an event. The directed arcs between the nodes depict how one event causes another. Causal networks work for situations where there is no cyclical relationship among the variables; it is not possible to start from a node, follow the arcs, and return to the same node. An expert is asked to draw a causal network of the events. If the expert can do so, then conditional dependence can be verified from the positions of the nodes and the arcs. Several rules can be used to identify conditional dependencies in a causal network (Pearl, 1998, p. 117). These rules include the following:

Any two nodes connected by an arrow are dependent; cause and immediate consequence are dependent.
Multiple causes of the same effect are dependent once the effect is known, as knowing the effect and one of the causes tells us more about the probability of the other causes.
If a cause leads to an intermediary event that subsequently affects a consequence, then the consequence is independent of the cause for a given level of the intermediary event.
If one cause leads to multiple consequences, the consequences are conditionally independent of each other given the cause.
In the above rules, we assume that removing the condition actually removes the path between the independent events: "If removal of node C renders nodes A and B disconnected from each other, then A and B are proclaimed independent from each other given C." Another way to say this is that event C is between events A and B, and there is no way of following the arcs from A to B without passing through C. In this situation, P(A | B, C) = P(A | C); A is independent of B given C.
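The removal rule lends itself to a simple connectivity test. The sketch below is our addition; it checks only the quoted removal rule, not the full d-separation criterion. It drops node C and asks whether A and B remain connected when arcs are followed in either direction:

# Remove the conditioning node and test whether A and B stay connected.
from collections import deque

edges = [("A", "C"), ("C", "B")]  # C lies between A and B

def connected(a, b, edges, removed=None):
    # build undirected neighbor lists, skipping the removed node
    neighbors = {}
    for x, y in edges:
        if removed not in (x, y):
            neighbors.setdefault(x, []).append(y)
            neighbors.setdefault(y, []).append(x)
    seen, queue = {a}, deque([a])
    while queue:  # breadth-first search from a toward b
        node = queue.popleft()
        if node == b:
            return True
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(connected("A", "B", edges))                # True: a path through C exists
print(connected("A", "B", edges, removed="C"))   # False: A, B independent given C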
For example, an expert may provide the map in Figure 5 for the relationships between age, weight and blood pressure.
Figure 5: A Causal Map for Relationship of Age and Blood Pressure
In this figure, age and weight are shown to depend on each other. Age and blood pressure are shown to be conditionally independent of each other because there is no way of going from one to the other without passing through the weight node. Note that if there were an arc between age and blood pressure, i.e., if the expert believed that there was a direct relationship between these two variables, then conditional independence would be violated. Analysis of causal maps can help identify a large number of independencies among the events being considered. We will present more details and examples for using causal models to verify independence when we discuss root cause analysis and modeling uncertainty in subsequent chapters.
What Do You Know?

Advanced learners like you often need different ways of understanding a topic. Reading is just one way of understanding. Another way is through writing. When you write, you not only recall what you have read but may also need to make inferences about it. Please complete the following assessment:
Using the following table:

Case   Hospitalized?   Gender   Age    Insured
1      Yes             Male     >65    Yes
2      Yes             Male     <65    Yes
3      Yes             Female   >65    Yes
4      Yes             Female   <65    No
5      No              Male     >65    No
6      No              Male     <65    No
7      No              Female   >65    No
8      No              Female   <65    No

What is the probability of hospitalization given that you are male?
Is insurance independent of age?
What is the likelihood of being more than 65 years old among hospitalized patients? Please note that this is not the same as the probability of being hospitalized given that you are more than 65 years old.
In predicting hospitalization, what is the likelihood ratio associated with being more than 65 years old?
What are the prior odds for hospitalization before any other information is available?
Analyze the data in the table and report whether any two variables are conditionally independent of each other in predicting the probability of hospitalization. To accomplish this, you need to calculate the likelihood ratio associated with the following clues: Male, >65, Insured, Male & >65, Male & Insured, >65 & Insured, and Male & >65 & Insured. Then you can see whether adding a piece of information changes the likelihood ratio. Keep in mind that because the number of cases is small, many ratios cannot be calculated. See it done (SWF file) in a different but similar data set.
Draw what causes medication errors on a piece of paper, with each cause in a separate node and arrows showing the direction of causality. List all root causes and their immediate effects (showing a chain of causes and effects) until they lead to a medication error. Analyze the graph you have produced and list all conditional dependencies inherent in the graph.
Calculating conditional probabilities by shrinking the universe (SWF file)
Presentations

To assist you in reviewing the material in this lecture, please see the following resources:
See the slides for assessing conditional independence.
Listen to the lecture on conditional probabilities. The same lecture is broken into four parts and presented below.
Part 1: Introduction to Conditional Independence
Part 2: Definition of Conditional Independence
Part 3: Verifying Conditional Independence
Part 4: An example of verifying conditional independence
See a video on how to use Excel to calculate conditional probabilities by shrinking the universe (SWF file)
Listen to lecture on subjective probabilities (SWF file)
Listen to the lecture on independence and Bayes odds form (SWF file)
Narrated lectures require use of Flash.

More

For an example of causal analysis in medicine, see http://eric.univ-lyon2.fr/~pkdd2000/Download/DC7.pdf
Dr. Korb's lecture on Bayesian networks is at http://www.csse.monash.edu.au/~korb/subjects/2-3309/Lectures/L18/L18-4.pdf
To test whether your causal network is d-separated, go to http://www.andrew.cmu.edu/user/wimberly/dsep/dSep.html
Recent advances in assessing subjective probabilities from expert opinions can be found at http://scholar.google.com/scholar?q=assessing+probabilities+experts&ie=UTF-8&oe=UTF-8&hl=en
For a listing of personal web pages of Bayesian analysts, see http://www.bayesian.org/bayespeople.html