 # Conditional Independence

This lecture, part of a series intended to prepare you to do probabilistic risk analysis, introduces several methods of checking for conditional independence, an important element of causal models.

# Joint Distribution

A joint distribution shows the probability of two events co-occurring. In a complex risk analysis containing several hundred events, the entire analysis is built on the joint distributions of pairs of related events. Typically one event is the cause and the other is the effect, and the purpose of specifying a joint distribution is to clarify the strength of the causal relationship.

| First Event | Second Event: Absent | Second Event: Present | Total |
|---|---|---|---|
| Absent | a | b | a+b |
| Present | c | d | c+d |
| Total | a+c | b+d | a+b+c+d = 1 |

Table 1: Joint Distribution of Two Events

Table 1 shows the joint distribution of two events. The constants a, b, c, and d show how many combinations of the two events were observed. For example, a shows how many times the first and second events were both absent; the constant d shows how many times both events were present. The data in Table 1 have been standardized so that a+b+c+d=1. If the data are not in this standard form, dividing each cell by the sum a+b+c+d accomplishes the same goal.

For example, suppose we observed the following frequencies with which understaffing and medication errors co-occur.

|  | No error | Error | Total |
|---|---|---|---|
| Adequate staffing | 50 | 8 | 58 |
| Under staffed | 7 | 15 | 22 |
| Total | 57 | 23 | 80 |

Table 2: Number of Co-occurrences of Medication Error and Under Staffing

In Table 2, there were 50 visits in which the clinic was adequately staffed and there were no medication errors. In contrast, there were 15 visits in which the clinic was understaffed and there were medication errors. The first step is to put Table 2 in standard form by dividing all cell values by the total number of visits. Dividing by the total number of visits fits our notion of how probability should be measured: the frequency of observing an event divided by the total number of possibilities. Here the total number of possibilities is the total number of visits, and dividing by this number guarantees that we have a probability function that follows the four axioms mentioned in the first lecture on probabilities.
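As a minimal sketch of this standardization step (in Python rather than a spreadsheet, using the visit counts from the staffing example discussed above):

```python
# Standardize a 2x2 table of counts into a joint distribution by
# dividing each cell by the grand total of observed visits.
counts = {
    ("adequate", "no_error"): 50,
    ("adequate", "error"): 8,
    ("understaffed", "no_error"): 7,
    ("understaffed", "error"): 15,
}
total = sum(counts.values())
joint = {cell: n / total for cell, n in counts.items()}

# The four joint probabilities now sum to one.
print(f"{joint[('understaffed', 'error')]:.4f}")  # 0.1875, about 0.19
```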

|  | No error | Error | Total |
|---|---|---|---|
| Adequate staffing | 0.63 | 0.10 | 0.73 |
| Under staffed | 0.09 | 0.19 | 0.28 |
| Total | 0.71 | 0.29 | 1 |

Table 3: Joint Distribution of Staffing & Medication Errors

Table 3 shows the joint distribution of staffing and medication errors in the four cells in the center of the table. For example, the joint probability of a medication error and a visit in which the clinic was understaffed was 19%. In the right-hand column and the bottom row, the table shows the marginal distribution of each event. A marginal distribution shows the probability density function of an event by itself. For example, Table 3 shows that medication errors have a Bernoulli distribution, with the probability of an error being 29% and the probability of no error being 71%. The frequency of understaffing is another marginal distribution and is given in the right-hand column. Notice that the marginal probabilities are the row and column sums of the joint probabilities.

We have earlier shown how conditional probability is calculated by reducing the universe of possibilities to the situations in which the condition has already occurred. We can see this reduction in the universe of possibilities by calculating conditional probabilities for events in Table 3. If the analyst wishes to calculate a conditional probability, the total visits should be reduced to visits in which the condition has been met. Suppose the analyst wants to calculate the probability of a medication error given that the clinic is understaffed, shown as p(Medication error | Understaffed clinic). We need to reduce the visits to only those in which the clinic was understaffed. This is done by dividing the row by the marginal probability of being understaffed.
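A small sketch of this shrinking-universe calculation (Python, with the visit counts from the staffing example):

```python
# Conditional probability by reducing the universe of possibilities:
# keep only understaffed visits, then divide by that reduced total.
counts = {
    ("adequate", "no_error"): 50,
    ("adequate", "error"): 8,
    ("understaffed", "no_error"): 7,
    ("understaffed", "error"): 15,
}
understaffed_total = sum(
    n for (staffing, _), n in counts.items() if staffing == "understaffed"
)
p_error_given_understaffed = counts[("understaffed", "error")] / understaffed_total

print(f"{p_error_given_understaffed:.2f}")  # 0.68
```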

|  | No error | Error | Total |
|---|---|---|---|
| Under staffed | 0.32 | 0.68 | 1.00 |

Table 4: Probability of Medication Error Given Understaffed Clinic

We say the universe of possibilities has been reduced because only visits in which the clinic was understaffed are reported. Note that in Table 4 the probability of medication error in this reduced universe is 68%. The point of this example is that conditional probabilities can be calculated easily by reducing the universe of possibilities to the condition.

# Definition of Independence

In probability, the concept of independence has a very specific meaning. If two events are independent of each other, then the occurrence of one event does not change the probability of the occurrence of the other. Mathematically, this condition is presented as:

P(A | B) = P(A)

Independence means that the presence of one clue does not change the value of another clue. An example might be the prevalence of diabetes and car accidents: knowing the probability of car accidents in a population tells us nothing about the probability of diabetes.

When two events are independent, we can calculate the  probability of both co-occurring from the marginal probabilities of each event  occurring:

P(A&B) = P(A) * P(B)

This is the same as saying that the joint distribution is the product of the marginal distributions.

|  | No error | Error | Total |
|---|---|---|---|
| Adequate staffing | 0.52 | 0.21 | 0.73 |
| Under staffed | 0.20 | 0.08 | 0.28 |
| Total | 0.71 | 0.29 | 1 |

Table 5: Joint Distribution Derived from Marginal Distributions (Assumption of Independence)

Under the assumption of independence of staffing and medication errors, the probability of each cell in Table 5 can be calculated as the product of the row's and column's marginal values. A chi-square test checks whether the number of visits observed in each cell of Table 2 is consistent with the number of visits predicted under the assumption of independence in Table 5 (note that Table 5 provides probabilities; you need to multiply these probabilities by the total number of visits to obtain the expected count for each cell). In Excel, observed and expected occurrences of an event can be compared with the following function:

=CHITEST(actual_range, expected_range)

This function gives the probability of observing a discrepancy this large between the two distributions by random chance. If this probability is less than 0.05, we reject the hypothesis that the two sets of values are independent. The chi-squared statistic is 23. In our example, the probability of finding such a high chi-squared statistic by chance is less than 0.0001, and therefore we reject the hypothesis that the two variables are independent of each other.
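The same test can be reproduced outside Excel. The sketch below (plain Python, no statistics library, using the visit counts from the staffing example) builds the expected counts from the marginals and accumulates the chi-squared statistic; for a 2x2 table there is one degree of freedom, so the p-value can be obtained from the complementary error function:

```python
import math

# Observed visit counts for the staffing and medication-error example.
observed = [[50, 8], [7, 15]]
n = sum(sum(row) for row in observed)
row_tot = [sum(row) for row in observed]        # staffing marginals
col_tot = [sum(col) for col in zip(*observed)]  # error marginals

# Chi-squared: sum of (observed - expected)^2 / expected over the cells,
# where expected = row total * column total / grand total.
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_tot[i] * col_tot[j] / n
        chi2 += (observed[i][j] - expected) ** 2 / expected

p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, df = 1
print(round(chi2), p_value < 0.0001)  # 23 True
```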

We could, of course, also have compared the probability of medication error with and without conditioning on understaffing. If the two variables were independent, conditioning on understaffing would not change the probability. This is not the case:

| P(Medication error) | P(Medication error \| Understaffing) |
|---|---|
| 0.29 | 0.68 |

# Definition of Conditional Independence

Conditional independence means that for a specific population, presence of one  clue does not change the probability of another.  Mathematically, this is  shown as:

P(A | B, C) = P(A | C)

The above formula reads: if we know that C has occurred, learning that B has also occurred adds no new information to the estimate of the probability of event A. Another way of saying this is that in population C, knowing B does not tell us much about the chance of A. As before, conditional independence allows us to calculate joint probabilities from marginal probabilities:

P(A&B | C) = P(A | C) * P(B | C)

The above formula says that among the population C, the probability of A & B co-occurring equals the product of the probabilities of each event occurring. It is possible for two events to be dependent, yet become independent of each other when conditioned on the occurrence of a third event. For example, we can compare these statistics for long shifts and medication errors (≠ means not equal to):

P( Medication error ) ≠ P( Medication error | Long shift)

At the same time, we may consider that in the  population of employees that are not fatigued (even though they have long  shifts), the two events are independent of each other, i.e.

P( Medication error | Long shift, Not fatigued) = P( Medication error | Not fatigued)

This example shows that related events may become independent under certain conditions.

# Use of Conditional Independence

Conditional probabilities allow us to think through a sequence of uncertain events. If each event can be conditioned on its predecessor, a chain of events can be examined. Then, if one component of the chain changes, we can calculate the impact of the change throughout the chain. In this sense, conditional probabilities show how a series of related causes affect each other and subsequently affect a sentinel event.

Independence and conditional independence are invoked often to simplify the calculation of complex likelihoods involving multiple events. We have already shown how independence facilitates the calculation of joint probabilities. The advantage of verifying independence becomes even more pronounced when examining more than two events. When calculating the likelihood associated with a series of clues, one has to progressively condition the probabilities on the sequence of clues that are available:

P(C1, C2, C3, ..., Cn | H) = P(C1 | H) * P(C2 | H, C1) * P(C3 | H, C1, C2) * P(C4 | H, C1, C2, C3) * ... * P(Cn | H, C1, C2, C3, ..., Cn-1)

Note that each term in the above formula is conditioned on the previous events, and all terms are conditioned on the event we want to predict (shown as H). The first term is conditioned on no additional event; the second term is conditioned on the first clue/event; the third term is conditioned on the first and second clues/events; and so on until the last term, which is conditioned on all previous n-1 clues/events. If we stay with our analogy that conditioning reduces the sample to the portion that has the condition, then the above formula implies a sequence of reductions in sample size. Because there are many events, the data have to be partitioned into increasingly smaller subsets. Obviously, for data to be partitioned so many times, one needs a large database.

Conditional independence allows us to calculate likelihoods associated with a series of events without needing large databases. Instead of conditioning each event on the hypothesis and all prior events, we can now ignore all prior events:

P(C1,C2,C3,  ...,Cn | H) = P(C1 | H) * P(C2 | H) *  P(C3 | H) * P(C4 | H) * ... * P(Cn | H)
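As a quick sketch of this shortcut, the likelihood of a series of clues given hypothesis H is just the product of the per-clue conditionals P(Ci | H). The three values below are hypothetical, illustrative numbers, not estimates from this lecture's data:

```python
import math

# Under conditional independence, the likelihood of observing all clues
# given H is the product of the individual conditionals P(Ci | H).
clue_probs_given_h = [0.9, 0.6, 0.75]  # hypothetical P(C1|H), P(C2|H), P(C3|H)

likelihood = math.prod(clue_probs_given_h)
print(f"{likelihood:.3f}")  # 0.405
```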

# Verifying Independence

There are several ways to verify conditional independence. These include (1) reducing the sample size, (2) correlation analysis, (3) direct queries of experts, and (4) separation in causal maps.

## (1) Reducing sample size

If data exist, conditional independence can be verified by selecting the population that has the condition and verifying that the product of the marginal probabilities equals the joint probability of the two events. For example, in Table 6, eighteen cases from a special unit prone to medication errors are presented. The question is whether the rate of medication errors is independent of the length of the work shift.

| Case | Medication error | Long shift | Fatigue |
|---|---|---|---|
| 1 | No | Yes | Yes |
| 2 | No | Yes | Yes |
| 3 | No | No | Yes |
| 4 | No | No | Yes |
| 5 | Yes | Yes | Yes |
| 6 | Yes | No | Yes |
| 7 | Yes | No | Yes |
| 8 | Yes | Yes | Yes |
| 9 | No | No | No |
| 10 | No | No | No |
| 11 | No | Yes | No |
| 12 | No | No | No |
| 13 | No | No | No |
| 14 | No | No | No |
| 15 | No | No | No |
| 16 | No | No | No |
| 17 | Yes | No | No |
| 18 | Yes | No | No |

Table 6: Medication Errors in 18 Consecutive Cases

Using the data in Table 6, the probability of medication error is calculated as:

P( Error ) = Number of cases with errors / Number of cases = 6/18 = 0.33
P( Long shift ) = Number of cases seen by a provider in a long shift / Number of cases = 5/18 = 0.28
P( Error & Long shift ) = Number of cases with errors & long shift / Number of cases = 2/18 = 0.11
P( Error & Long shift ) = 0.11 ≠ 0.09 = 0.33 * 0.28 = P( Error ) * P( Long shift )

The above calculations show that medication error and length of shift are not independent of each other. Knowing the length of the shift tells us something about the probability of error in that shift. But consider the situation where we examine these two events among cases where the provider was fatigued. Now the population of cases we are examining is reduced to cases 1 through 8. With this population, calculating the probabilities yields:

P( Error | Fatigued ) = 0.50
P( Long shift | Fatigued ) = 0.50
P( Error & Long shift | Fatigued ) = 0.25
P( Error & Long shift | Fatigued ) = 0.25 = 0.50 * 0.50 = P( Error | Fatigued ) * P( Long shift | Fatigued )

Among fatigued providers, medication error is independent of the length of the work shift. The procedure used in this example, namely calculating the joint probability and checking whether it is approximately equal to the product of the marginals, is one way of verifying independence.
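The counting procedure just described can be sketched in Python; the 18 consecutive cases above are encoded as (medication error, long shift, fatigue) flags:

```python
# The 18 consecutive cases as (medication_error, long_shift, fatigue) flags.
cases = [
    (False, True,  True),  (False, True,  True),  (False, False, True),
    (False, False, True),  (True,  True,  True),  (True,  False, True),
    (True,  False, True),  (True,  True,  True),  (False, False, False),
    (False, False, False), (False, True,  False), (False, False, False),
    (False, False, False), (False, False, False), (False, False, False),
    (False, False, False), (True,  False, False), (True,  False, False),
]

# Shrink the universe to fatigued providers (cases 1 through 8).
fatigued = [c for c in cases if c[2]]
p_error = sum(c[0] for c in fatigued) / len(fatigued)          # 0.50
p_long = sum(c[1] for c in fatigued) / len(fatigued)           # 0.50
p_both = sum(c[0] and c[1] for c in fatigued) / len(fatigued)  # 0.25

# Joint equals the product of the marginals: independent given fatigue.
print(p_both == p_error * p_long)  # True
```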

Independence can also be examined by calculating conditional probabilities. As before, conditional probabilities are calculated by restricting the population. For example, in the population of fatigued providers (cases 1 through 8) there are several cases of working a long shift (cases 1, 2, 5, and 8). We can use this information to calculate conditional probabilities:

P( Error | Fatigue) = 0.50
P( Error | Fatigue & Long shift) = 2/4 = 0.50

Again we observe that among fatigued workers, knowing that the work shift was long adds no information to the probability of medication error. The above procedure shows how independence can be verified by counting cases in reduced populations. When a considerable amount of data is available in a database, the approach can easily be implemented with Structured Query Language (SQL): to calculate the conditional probability of an event, all we need to do is run a SELECT query that restricts rows to the condition and counts the number of events of interest.
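As a sketch of that idea, the same counts can be produced with SQL through Python's built-in sqlite3 module (the table and column names here are illustrative, not from an actual system):

```python
import sqlite3

# Load the 18 cases into an in-memory table (1 = Yes, 0 = No).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (error INTEGER, long_shift INTEGER, fatigue INTEGER)")
rows = [(0, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 1), (1, 1, 1), (1, 0, 1),
        (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0),
        (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 0, 0), (1, 0, 0)]
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# P(Error | Fatigue): the WHERE clause shrinks the universe to the
# condition; counting the event inside it gives the conditional probability.
denom = conn.execute("SELECT COUNT(*) FROM visits WHERE fatigue = 1").fetchone()[0]
numer = conn.execute(
    "SELECT COUNT(*) FROM visits WHERE fatigue = 1 AND error = 1"
).fetchone()[0]
print(numer / denom)  # 0.5
```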

## (2) Correlation Analysis

One way of verifying independence is to examine correlations. Two events that are correlated are dependent. For example, in Table 7 we can examine the relationship between age and blood pressure by calculating the correlation between these two variables.

| Case | Age | BP | Weight |
|---|---|---|---|
| 1 | 35 | 140 | 200 |
| 2 | 30 | 130 | 185 |
| 3 | 19 | 120 | 180 |
| 4 | 20 | 111 | 175 |
| 5 | 17 | 105 | 170 |
| 6 | 16 | 103 | 165 |
| 7 | 20 | 102 | 155 |

Table 7: Relationship between Age and Blood Pressure in 7 Patients

The correlation between age and blood pressure in the sample in Table 7 is 0.91. This correlation is relatively high and suggests that knowing the age of a person tells us a great deal about their blood pressure. Therefore, age and blood pressure are dependent in our sample.

Correlations can also be used to verify conditional independence. To examine the independence of events A and B in the population where event C has occurred, we need three pairwise correlations. Assume:

• Rab is the correlation between events A and B,
• Rac is the correlation between events A and C, and
• Rcb is the correlation between events C and B.

Events A and B are conditionally independent of each other if the "Vanishing Partial  Correlation" condition holds. This condition states:

Rab = Rac * Rcb

Using the data in Table 7, we calculate the following correlations:

• Rage, blood pressure = 0.91
• Rage, weight = 0.82
• Rweight, blood pressure = 0.95

Examination of the data shows that the vanishing partial correlation condition approximately holds (~ means approximate equality):

Rage, blood pressure = 0.91 ~ 0.82 * 0.95 = Rage, weight * Rweight, blood pressure

Therefore, we can conclude that, given the patients' weight, the variables age and blood pressure are approximately independent of each other, because their partial correlation is near zero.
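The correlations above can be recomputed from the raw table; a plain-Python sketch is below. (Recomputed values can differ in the second decimal from the rounded figures quoted in the text.)

```python
import math

# The 7 patients from the age / blood pressure / weight table.
age    = [35, 30, 19, 20, 17, 16, 20]
bp     = [140, 130, 120, 111, 105, 103, 102]
weight = [200, 185, 180, 175, 170, 165, 155]

def pearson(xs, ys):
    """Pearson correlation: covariance over the product of spreads."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r_ab = pearson(age, bp)      # correlation of age and blood pressure
r_ac = pearson(age, weight)  # correlation of age and weight
r_cb = pearson(weight, bp)   # correlation of weight and blood pressure

# Vanishing partial correlation: compare r_ab against r_ac * r_cb.
print(f"{r_ab:.2f} vs {r_ac * r_cb:.2f}")
```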

## (3) Direct Expert Queries

It is not always possible to gather data. Sometimes independence must be verified subjectively by asking a knowledgeable expert about the relationships among the variables. Unconditional independence can be verified by asking the expert whether knowledge of one event tells us a lot about the likelihood of another. Conditional independence can be verified by repeating the same task within specific populations. Gustafson and others (1973a) described a procedure for assessing independence by directly querying experts:

• Write each event on a 3 x 5 card.

• Ask each expert to assume a specific population in which a condition has been met (i.e. an event has occurred).

• Ask the expert to pair the cards if knowing the value of one event will make it considerably easier to estimate the value of the other.

• Repeat these steps for other populations.

• If several experts are involved, ask them to present their clustering of cards to each other.

• Have experts discuss any areas of disagreement, and remind them that only major dependencies should be clustered.

• Use majority rule to choose the final clusters. (To be accepted, a cluster must be approved by the majority of experts.)

Experts will have in mind different, sometimes wrong, notions of dependence, so the words "conditional dependence" should be avoided. Instead, we focus on whether one clue tells us a lot about the influence of another clue in specific populations. We find that experts are more likely to understand this line of questioning than a direct request to verify conditional independence.

## (4) Separation in Causal Maps

One can assess dependencies by analyzing maps of causal relationships. In a causal network, each node describes an event, and the directed arcs between the nodes depict how one event causes another. Causal networks apply to situations where there is no cyclical relationship among the variables: it is not possible to start from a node, follow the arcs, and return to the same node. An expert is asked to draw a causal network of the events. If the expert can do so, then conditional dependence can be verified from the positions of the nodes and the arcs. Several rules can be used to identify conditional dependencies in a causal network (Pearl, 1998, p. 117). These rules include the following:

1. Any two nodes connected by an arrow are dependent. Cause and immediate consequence are dependent.
2. Multiple causes of the same effect are dependent, as knowing the effect and one of the causes will tell us more about the probability of the other causes.
3. If a cause leads to an intermediary event that subsequently affects a consequence, then the consequence is independent of the cause for a given level of the intermediary event.
4. If one cause leads to multiple consequences, the consequences are conditionally independent of each other given the cause.

In the above rules, we assume that removing the condition will actually remove the path between the independent events. If removal of node C renders nodes A and B disconnected from each other, then A and B are proclaimed independent of each other given C. Another way to say this is that event C is between events A and B, and there is no way of following the arcs from A to B without passing through C. In this situation, P(A | B, C) = P(A | C): A is independent of B given C.

For example, an expert may provide the map in Figure 5 for the relationships between age, weight, and blood pressure.

Figure 5: A Causal Map for the Relationship of Age and Blood Pressure

In this figure, age and weight are shown to depend on each other. Age and blood pressure are shown to be conditionally independent of each other because there is no way of going from one to the other without passing through the weight node. Note that if there were an arc between age and blood pressure, i.e. if the expert believed there was a direct relationship between these two variables, then conditional independence would be violated. Analysis of causal maps can help identify a large number of independencies among the events being considered. We will present more details and examples of using causal models to verify independence when we discuss root cause analysis and modeling uncertainty in subsequent chapters.
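The removal rule described above can be sketched as a small graph check: drop the conditioning node and test whether the remaining arcs still connect the two events. (This simple reachability test matches the chain case discussed here; full d-separation needs extra care for multiple causes of a common effect.)

```python
from collections import deque

# Causal arcs of the age -> weight -> blood pressure map (direction ignored
# for the reachability test).
edges = [("age", "weight"), ("weight", "bp")]

def connected(a, b, removed, edges):
    """True if a path joins a and b after dropping the removed node."""
    adj = {}
    for u, v in edges:
        if removed in (u, v):
            continue  # drop every arc touching the conditioning node
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Conditioning on weight disconnects age from blood pressure.
print(connected("age", "bp", removed="weight", edges=edges))  # False
```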

# What Do You Know?

Advanced learners like you often need different ways of understanding a topic. Reading is just one way of understanding; another is writing. When you write, you not only recall what you have read but may also need to make inferences about it. Please complete the following assessment:

Using the following table:

| Case | Hospitalized? | Gender | Age | Insured |
|---|---|---|---|---|
| 1 | Yes | Male | >65 | Yes |
| 2 | Yes | Male | <65 | Yes |
| 3 | Yes | Female | >65 | Yes |
| 4 | Yes | Female | <65 | No |
| 5 | No | Male | >65 | No |
| 6 | No | Male | <65 | No |
| 7 | No | Female | >65 | No |
| 8 | No | Female | <65 | No |
1. What is the probability of hospitalization given that you are male?
2. Is insurance independent of age?
3. What is the likelihood of being more than 65 years old among hospitalized patients? Please note that this is not the same as the probability of being hospitalized given that you are more than 65 years old.
4. In predicting hospitalization, what is the likelihood ratio associated with being more than 65 years old?
5. What are the prior odds of hospitalization before any other information is available?
6. Analyze the data in the table and report whether any two variables are conditionally independent of each other in predicting the probability of hospitalization. To accomplish this, calculate the likelihood ratios associated with the following clues: Male, >65, Insured, Male & >65, Male & Insured, >65 & Insured, Male & >65 & Insured. Then you can see whether adding a piece of information changes the likelihood ratio. Keep in mind that because the number of cases is small, many ratios cannot be calculated. See it done (SWF file) in a different but similar data set.
7. Draw what causes medication errors on a piece of paper, with each cause in a separate node and arrows showing the direction of causality. List all root causes, their immediate effects (showing a chain of cause and effects) until it leads to a medication error.
8. Analyze the graph you have produced and list all conditional independencies inherent in the graph.

# Presentations

To assist you in reviewing the  material in this lecture, please see the following resources:

1. See the slides for assessing conditional independence. Listen to the lecture on conditional probabilities. The same lecture is broken into four parts and presented below.

Part 1: Introduction to Conditional Independence

Part 2: Definition of Conditional Independence

Part 3: Verifying Conditional Independence

Part 4: An example of verifying conditional independence:
2. See a video on how to use Excel to calculate conditional probabilities by shrinking the universe (SWF file)

3. Listen to the lecture on subjective probabilities (SWF file)

4. Listen to the lecture on independence in Bayes odds form (SWF file)

Narrated lectures require use of Flash.