Assigned Reading

Session overview
YouTube►
See also YouTube►

Introduction to Causal Networks Read►
 Learning network models
 More
Assignment
For this assignment you can use any statistical package, including R,
SAS, and SPSS. Your instructor is familiar with Netica and
BayesiaLab. R packages are also often used. OpenBUGS (a Gibbs
sampler), Stan, OpenMarkov, and Direct Graphical Models are open-source
software. Netica is free for networks with fewer than 15 nodes. A more complete list is available on Wikipedia
under "Bayesian Networks."
OpenBUGS►
Stan►
Direct Graphical Models►
OpenMarkov►
Graphical Models Toolkit►
PyMC►
Genie Smile►
SamIam►
Bayes Server►
AIspace►
BayesiaLab►
Hugin►
AgenaRisk►
dVelox►
System Modeler►
UnBBayes►
Uninet►
Tetrad►
Dezide►
Netica►
Work on this assignment can be done in groups of two students, but you cannot work with a student with whom you have previously teamed up.
 Draw networks based on the following independence assumptions.
When directed networks are possible, give formulas for predicting
the last variable in the network from marginal and pairwise
conditional probabilities. Keep in mind that the absence of an
independence assumption implies dependence.
Review►
Nodes in Network | Assumption
X, Y, Z | I(X,Y)
X, Y, Z | I(X,Y), Not I(X,YZ)
X, Y, Z | I(X,Y), I(X,YZ), Y measured last
X, Y, Z, W | I(X,Y), I(X,YZ), I({X,Y},WZ), W measured last
X, Y, Z, W | I(X,Y), I(Z,W), and X measured before Z and Y measured before W
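As a minimal sketch of how such a formula can be checked numerically, consider one possible directed network for three variables in which X and Y are marginally independent and both point into Z (X -> Z <- Y). The structure and every probability below are hypothetical illustrations, not the answer key. Under this assumed structure, P(Z) is the sum over x and y of P(Z | x, y) P(x) P(y).

```python
from itertools import product

# Hypothetical marginal probabilities for the two root nodes.
p_x = {1: 0.3, 0: 0.7}
p_y = {1: 0.6, 0: 0.4}

# Hypothetical conditional table P(Z=1 | X=x, Y=y).
p_z1_given_xy = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.4, (0, 0): 0.1}


def joint(x, y, z):
    """P(X=x, Y=y, Z=z) for the assumed network X -> Z <- Y."""
    pz1 = p_z1_given_xy[(x, y)]
    return p_x[x] * p_y[y] * (pz1 if z == 1 else 1.0 - pz1)


# Predict the last variable by summing the joint over its parents.
p_z1 = sum(joint(x, y, 1) for x, y in product((0, 1), repeat=2))

# Check that the joint reproduces the assumed independence I(X,Y).
p_x1_y1 = sum(joint(1, 1, z) for z in (0, 1))

print(f"P(Z=1) = {p_z1:.4f}")
print(f"P(X=1,Y=1) = {p_x1_y1:.4f}; P(X=1)P(Y=1) = {p_x[1] * p_y[1]:.4f}")
```

The same enumeration pattern extends to the four-variable rows: write the joint as a product of one term per node given its parents, then sum out everything except the variable measured last.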
Wang's Teach One
Video►
 Construct a Bayesian probability network model that would
predict success with antidepressants. The network model should include
the variables that affect response to the antidepressant, as well as
the mediators of those effects. Include all baseline diagnoses and gender as
covariates. Include previous antidepressants as covariates.
To understand the sequence of trying antidepressants, examine the
following figure:
Calculate the probability of remission if the patient was on the
antidepressant. Remission or relapse should be considered an end
node. Gender is a root node. All other variables, e.g. diagnoses,
could be either root or intermediary nodes, but all occur prior to
use of the antidepressant. Antidepressants that were given
prior to the antidepressant of interest should be used as covariates. The
data have been modified to report per-person data, without
visit-based information.
Data►
(a) Identify the parents in the Markov blanket of citalopram using the bnlearn package in R and a constraint-based algorithm such as Grow-Shrink. All baseline diseases occur prior to treatment with citalopram.
(b) Identify the parents in the Markov blanket of citalopram using regression analysis. Amr's Regression Code►
(c) Identify the impact of citalopram on response using stratified covariate balancing.
(d) Using the network model, predict the response to citalopram for a patient with a neurological disorder and PTSD.
(e) Using SQL, predict the response to citalopram for a patient with PTSD and neurological disorders, based on the nearest strata. SQL for Similar Strata►
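The nearest-strata idea in part (e) can be sketched with SQL run through Python's built-in sqlite3 module. This is a simplified illustration under stated assumptions, not the course's SQL: the table name `patients`, its columns, and all rows are hypothetical, and "nearest" is taken here to mean the strata that match the target patient on the largest number of covariates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (ptsd INT, neuro INT, citalopram INT, remission INT)"
)
# Hypothetical per-person rows: (ptsd, neuro, citalopram, remission).
rows = [
    (1, 0, 1, 1), (1, 0, 1, 0), (1, 0, 1, 1), (1, 0, 1, 0),
    (0, 1, 1, 0), (0, 1, 1, 0), (0, 1, 1, 1), (0, 1, 1, 0),
    (0, 0, 1, 1), (0, 0, 1, 1),
    (1, 1, 0, 0),  # not on citalopram, so excluded by the WHERE clause
]
conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)

# Target patient: PTSD = 1 and neurological disorder = 1.
# No citalopram patient has both, so the query falls back to the
# strata that share the most covariate values with the target.
query = """
WITH strata AS (
    SELECT ptsd, neuro,
           COUNT(*) AS n,
           AVG(remission) AS remission_rate,
           (ptsd = 1) + (neuro = 1) AS matches
    FROM patients
    WHERE citalopram = 1
    GROUP BY ptsd, neuro
)
SELECT SUM(n) AS n, SUM(n * remission_rate) / SUM(n) AS predicted
FROM strata
WHERE matches = (SELECT MAX(matches) FROM strata)
"""
n, predicted = conn.execute(query).fetchone()
print(f"nearest strata hold {n} patients; predicted remission = {predicted:.3f}")
```

Weighting each stratum by its size keeps small strata from dominating the prediction; other definitions of "nearest" (for example, weighting covariates by importance) would change the `matches` expression.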
 Write SQL code to calculate the probability of a negative outcome in the situation
where the patient is severely ill and has not signed a "Do Not
Resuscitate" (DNR) order. Note that the probabilities of
events that are mutually exclusive and exhaustive should add up to
one.
Data►
Bushra's Teach One►
Anto's Teach One►
Slides►
SQL►
 Redo problem 3 in Netica or other software and verify the
accuracy of your answer.
To accomplish this project, build the 4-node network inside Netica
and direct the links between the nodes as in the graph structure.
Then for every node, enter the table of probabilities as per tables
given in Question 3. For example, for the DNR node enter the
two probabilities of 0.1 and 0.9 into the Table within the node for
DNR. Once the entire network (the graph and the related
probabilities) has been entered into Netica, evaluate the risks
for a patient who is severely ill and has not signed a "Do Not
Resuscitate" order.
Netica►
Shruti's Teach One►
Usman's Teach One►
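One way to verify a Netica answer is to recompute it by hand-coded enumeration. The sketch below assumes a hypothetical 4-node structure (Severe and DNR as root nodes, Care depending on both, Outcome depending on Severe and Care); the actual graph and tables are those given in Question 3. Only the DNR values 0.1 and 0.9 come from the text, and which value belongs to "signed" is an assumption.

```python
# Hypothetical 4-node network: Severe and DNR are root nodes,
# Care depends on both, Outcome depends on Severe and Care.
p_dnr = {"signed": 0.1, "not_signed": 0.9}  # values from the text; labels assumed

# Hypothetical P(Care = "comfort" | Severe, DNR).
p_comfort = {
    ("yes", "signed"): 0.8, ("yes", "not_signed"): 0.2,
    ("no", "signed"): 0.3, ("no", "not_signed"): 0.05,
}
# Hypothetical P(Outcome = "negative" | Severe, Care).
p_negative = {
    ("yes", "comfort"): 0.7, ("yes", "aggressive"): 0.4,
    ("no", "comfort"): 0.2, ("no", "aggressive"): 0.1,
}


def outcome_given(severe, dnr):
    """Return P(Outcome | Severe, DNR) by summing over the unobserved Care node."""
    p_neg = 0.0
    for care in ("comfort", "aggressive"):
        p_care = p_comfort[(severe, dnr)]
        if care == "aggressive":
            p_care = 1.0 - p_care
        p_neg += p_care * p_negative[(severe, care)]
    return {"negative": p_neg, "positive": 1.0 - p_neg}


# Mutually exclusive and exhaustive states must sum to one.
assert abs(sum(p_dnr.values()) - 1.0) < 1e-12

result = outcome_given("yes", "not_signed")
assert abs(sum(result.values()) - 1.0) < 1e-12
print(result)
```

If the software and the enumeration disagree, the usual culprits are a reversed link or a probability table entered with its rows in a different state order.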

The attached data show the percentage of diabetes in 2,228 counties in the
United States in 2010, 2011, and 2012. We want to
understand whether access to food stores affects diabetes. Create the network model
using data from 7 LASSO regressions. The
first regression will regress diabetes in 2012 on all 2011 variables.
The six other LASSO regressions will each have as the dependent variable one of the
statistically significant variables from the first regression,
regressed on all 2010 variables. Draw the network model using Netica.
Stratify
the parents in the Markov blanket of diabetes in 2012; calculate the impact of access
to quality food stores in 2011 on diabetes using stratified
covariate balancing.
Data►
Instruction for SCB►
New SCB Code►
Netica►
Answer►
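The stratification step can be illustrated with a simplified sketch. This is not the course's SCB code: it assumes a binary exposure and a binary outcome, uses hypothetical records and stratum labels, drops strata that lack either exposed or unexposed cases, and weights each stratum's difference in outcome rates by its number of exposed cases.

```python
from collections import defaultdict

# Hypothetical records: (covariate stratum, exposed flag, outcome flag).
records = [
    (("low_income", "rural"), 1, 1), (("low_income", "rural"), 1, 0),
    (("low_income", "rural"), 0, 1), (("low_income", "rural"), 0, 1),
    (("high_income", "urban"), 1, 0), (("high_income", "urban"), 1, 0),
    (("high_income", "urban"), 1, 1), (("high_income", "urban"), 0, 1),
    (("high_income", "rural"), 1, 0),  # no unexposed match; dropped
]


def stratified_effect(rows):
    """Exposed-weighted difference in outcome rates across matched strata."""
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})  # [n, events]
    for stratum, exposed, outcome in rows:
        counts[stratum][exposed][0] += 1
        counts[stratum][exposed][1] += outcome

    weighted_diff, total_weight = 0.0, 0
    for groups in counts.values():
        (n0, e0), (n1, e1) = groups[0], groups[1]
        if n0 == 0 or n1 == 0:
            continue  # stratum lacks a comparison group
        weighted_diff += n1 * (e1 / n1 - e0 / n0)
        total_weight += n1
    return weighted_diff / total_weight


effect = stratified_effect(records)
print(f"estimated effect on outcome rate: {effect:+.3f}")
```

In the assignment, the strata would be the combinations of the parents in the Markov blanket of diabetes in 2012, which is why the network model is built before the stratification.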
 Inside an electronic health record, there are data on outcomes of a
particular intervention. Using the network drawn below, write
the equations that would allow you to estimate what would happen if
the intervention was not given. First, write an equation for
each node in the network based on variables that precede it.
For example, the regression equation for predicting whether there is
an adverse event is:
Outcome = a + b Treatment + c Severity
Second, set the variables that change across these equations to the
relevant values. For example, set Treatment to be zero.
See Velosky's Teach One►
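The two steps can be sketched in a few lines, using the example equation Outcome = a + b Treatment + c Severity. The coefficient values and patient severities below are hypothetical; in practice the coefficients would come from regressions fitted to the health record data, with one equation per node in the network.

```python
# Hypothetical coefficients for Outcome = a + b*Treatment + c*Severity.
a, b, c = 0.05, -0.10, 0.30


def outcome(treatment, severity):
    """Predicted probability of an adverse event from the outcome equation."""
    return a + b * treatment + c * severity


# Hypothetical treated patients, described by severity on a 0-1 scale.
severities = [0.2, 0.5, 0.9]

observed = [outcome(1, s) for s in severities]        # as treated
counterfactual = [outcome(0, s) for s in severities]  # Treatment set to zero

avg_change = sum(o - cf for o, cf in zip(observed, counterfactual)) / len(severities)
print(f"average change in outcome attributable to treatment: {avg_change:+.3f}")
```

With a single linear equation the average change simply equals b; the approach becomes more informative when Treatment also feeds intermediate nodes, since each downstream equation must then be re-evaluated with Treatment set to zero.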
 The following graph was used to simulate data on bundling payment for total hip fracture treatment:
 Does poor access to quality food stores affect diabetes?
The following data show the rate of diabetes in several geographic areas
with different levels of access. The unit of analysis is the
geographic area. For each area, the data report the percentage of the
population in which each variable is present. The data were
collected over 5 years and include the following variables:
county_fips, state, id, age, race, gender, income, year,
marital_status, cellphone, insured, edu, cty_year, adults_in_hh,
smoke, diabetes, bmi, unemployment, active_commuting, food_stores,
restaurants, complete_case, z_ue, z_ac, z_food, z_rest, z_ue_mean,
z_ac_mean, z_food_mean, z_rest_mean, z_ue_diff, z_ac_diff,
z_food_diff, z_rest_diff, and region. Fit a network
model to the data, where the rate of diabetes in the 5th year is the
dependent variable of interest. All other variables are measured
in the 4th year. Data►
A number of county-level analyses have been published,
including:
More on BMI►
More on depression►
More on mortality►
More on food outlets►
More on pollution►
More
For additional information (not part of the required reading), please see the following links:
 Introduction to causal inference
Read 1►
Read 2►
Video►
Slides►
 Meta analysis through Bayesian networks
Read►
 Introduction to Bayesian networks
Read►
 Learning Bayesian Networks
Read►
 Selection of Judea Pearl's articles
PubMed►
 Applications of Bayesian networks in healthcare PubMed►
 Use of graphs in removing confounding Read►
 Learning Bayesian networks from correlated data Read►
 Bayesian networks in neuroscience Read►
 Cost analysis using Bayesian networks Read►
 Comparison of Bayesian network and logistic models Read►
 Bayesian network classifiers
Read►
 Introduction to Markov process
Tim's Lecture►
 Explanation of predictions
Aloudah's Lecture►
 Outcome based prescribing for citalopram
Slides►
This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD Home►
Email►
