

Assigned Reading
- Diagnosis of COVID-19 is complex and beyond the means of the human mind
Assignment
Question 1: Create an AI system to diagnose COVID-19 based on information available at home prior to a visit to a clinic or emergency room.
- Create the knowledgebase of the AI system
Describe the order of occurrence of the variables:
(a) Tier assignment (determines ordering between variables)
- Tier 0 (Birth): Age, gender identity, race, and ethnicity variables. Assigned because these are fixed at birth and precede all other observations.
- Tier 1 (Vaccination): All COVID and flu vaccine status and vaccine attitude variables. Assigned because vaccination was collected as a prior history variable, before symptom onset.
- Tier 2 (Symptoms): All symptom, exposure, behavior, and condition variables from the Symptom Screening and About You sections. These represent the illness period.
- Tier 3 (At-home test): At-home rapid test result variables (pink line, blue line, confirmations). Assigned because home testing occurs after symptoms are already present.
- Tier 4 (PCR lab test): The single PCR Test Positive variable. Assigned because lab confirmation occurs last in the clinical pathway, after all other variables.
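The tier assignment above can be sketched as a lookup table. This is only an illustration: the tier numbers and labels come from the list above, but the variable names are hypothetical placeholders and should be replaced with the names in the survey dictionary.

```python
# Tier labels follow the assignment text.
TIERS = {
    0: "Birth",
    1: "Vaccination",
    2: "Symptoms",
    3: "At-home test",
    4: "PCR lab test",
}

# Hypothetical variable names (not the survey's actual column names).
VARIABLE_TIER = {
    "age": 0,
    "race": 0,
    "covid_vaccine_status": 1,
    "fever": 2,
    "cough": 2,
    "home_test_pink_line": 3,
    "pcr_test_positive": 4,
}


def precedes(a: str, b: str) -> bool:
    """Return True when variable a sits in a strictly earlier tier than b."""
    return VARIABLE_TIER[a] < VARIABLE_TIER[b]


def candidate_predictors(target: str) -> list[str]:
    """List variables in strictly earlier tiers than the target, in tier order."""
    earlier = [v for v in VARIABLE_TIER if precedes(v, target)]
    return sorted(earlier, key=lambda v: (VARIABLE_TIER[v], v))


print(candidate_predictors("fever"))
```

The strict comparison covers cross-tier ordering only. Ordering between two variables in the same tier (for example, two symptoms) is not decided by the tier and has to come from the temporal value rules.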
(b) Temporal value rules (applied in this priority order)
- Cross-tier pairs (tier A < tier B): n_code_before_target = n_code_target (all co-occurrences; A always precedes B by design). n_target_before_code = 0.
- Within-symptom pairs with Ill data (tier A == tier B, both are 30141-covid_tst_symptoms variables): n_code_before_target and n_target_before_code are filled from the Symptoms_Ill_N first-symptom flags, that is, the participant-reported ordering of which symptom appeared first among those who had both.
- Same-tier pairs with co-occurrence but no Ill data: Both n_code_before_target and n_target_before_code = n_code_target (the count of patients with both present), applied symmetrically since ordering cannot be determined.
- Zero co-occurrence pairs (n_code_target = 0), all types: n_code_before_target = n_code_no_target (patients where A=1 and B=0, interpreted as A occurring without B ever being observed). n_target_before_code = n_target_no_code (patients where B=1 and A=0).
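A minimal sketch of the temporal value rules as one function may help when checking the knowledgebase. It rests on two assumptions that the rules do not state outright: (1) because the zero co-occurrence rule is said to cover all types, it is checked first and overrides the other rules whenever n_code_target is 0; if the listed order is meant as strict first-match, move that check to the end; (2) when the code variable sits in a later tier than the target, the cross-tier rule is mirrored. The function name and the `ill_first_counts` argument are hypothetical.

```python
from typing import Optional, Tuple


def temporal_counts(
    tier_code: int,
    tier_target: int,
    n_code_target: int,
    n_code_no_target: int,
    n_target_no_code: int,
    ill_first_counts: Optional[Tuple[int, int]] = None,
) -> Tuple[int, int]:
    """Return (n_code_before_target, n_target_before_code) for one pair.

    ill_first_counts holds the two counts derived from the Symptoms_Ill_N
    first-symptom flags, or None when no Ill data exist for the pair.
    """
    # Assumption: zero co-occurrence applies to all pair types.
    if n_code_target == 0:
        return n_code_no_target, n_target_no_code
    # Cross-tier pair: the earlier tier always precedes the later tier.
    if tier_code < tier_target:
        return n_code_target, 0
    # Assumption: mirrored cross-tier case (code is in the later tier).
    if tier_code > tier_target:
        return 0, n_code_target
    # Same tier with participant-reported ordering.
    if ill_first_counts is not None:
        return ill_first_counts
    # Same tier, co-occurrence, ordering unknown: applied symmetrically.
    return n_code_target, n_code_target


print(temporal_counts(1, 2, 40, 10, 5))
print(temporal_counts(2, 2, 40, 10, 5, ill_first_counts=(25, 15)))
```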
(c) Exclusion rule
- Self-comparisons: Any row where concept_code == target_concept_code is excluded. Verified to be zero rows in the current file.
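The exclusion rule can be checked with a few lines of code. The sketch below uses the column names concept_code and target_concept_code from the rule; the three sample rows are made up for illustration and are not taken from the course file.

```python
import csv
import io

# Made-up sample rows; the real knowledgebase file has many more columns.
SAMPLE = """concept_code,target_concept_code,n_code_target
fever,cough,40
cough,cough,55
cough,fever,40
"""


def drop_self_comparisons(rows):
    """Keep only rows whose code differs from the target code."""
    return [r for r in rows if r["concept_code"] != r["target_concept_code"]]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = drop_self_comparisons(rows)
n_removed = len(rows) - len(kept)
print(f"removed {n_removed} self-comparison row(s), kept {len(kept)}")
```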
- Create pairwise associations among the variables.
- Calculate the frequency with which each variable co-occurs with every other variable.
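One way to compute the pairwise frequencies is to count, for every ordered pair of variables, the patients who have both and the patients who have only one. The sketch below assumes every variable is coded 0/1 with no missing values; the function name, the variable names, and the four records are hypothetical. The count names match those used in the temporal value rules.

```python
from itertools import permutations


def pair_counts(records, variables):
    """Count co-occurrence for each ordered pair (code, target)."""
    table = {}
    for code, target in permutations(variables, 2):
        both = sum(1 for r in records if r[code] == 1 and r[target] == 1)
        code_only = sum(1 for r in records if r[code] == 1 and r[target] == 0)
        target_only = sum(1 for r in records if r[code] == 0 and r[target] == 1)
        table[(code, target)] = {
            "n_code_target": both,
            "n_code_no_target": code_only,
            "n_target_no_code": target_only,
        }
    return table


# Made-up patient records for illustration only.
records = [
    {"fever": 1, "cough": 1, "pcr_test_positive": 1},
    {"fever": 1, "cough": 0, "pcr_test_positive": 0},
    {"fever": 0, "cough": 1, "pcr_test_positive": 0},
    {"fever": 1, "cough": 1, "pcr_test_positive": 0},
]
counts = pair_counts(records, ["fever", "cough", "pcr_test_positive"])
print(counts[("fever", "cough")])
```

Using ordered pairs produces one row per (code, target) direction and never pairs a variable with itself, so self-comparisons do not arise.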
- Create the structure of the network using regression.
- Regress the PCR test results on all variables, and on pairs or triples of variables, that precede them.
- List the variables that are direct predictors of PCR test results. This list should include the non-zero logistic regression coefficients, including coefficients for pairs or triples of variables.
- Regress each symptom on all symptoms and other variables that occur prior to it.
- Report the percent of variation explained by the LASSO regression of PCR tests on independent variables. Calculate and report the McFadden Pseudo R-Square.
- Regress each variable that is a direct predictor of PCR test results on all preceding variables. In this regression, the statistically significant variables are parents in the Markov blanket of the regression response variable.
- For each regression, report the independent variables that are significant (non-zero) predictors of the response variable (the response variables are the direct predictors of PCR tests).
- For each regression, report the percent of variation explained by the regression.
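The McFadden Pseudo R-Square requested above is 1 minus the ratio of the fitted model's log-likelihood to the log-likelihood of an intercept-only model. The sketch below computes it from observed 0/1 outcomes and predicted probabilities, so it works with the output of any logistic regression package. The function names and the sample numbers are hypothetical, and the log-likelihood is computed without the LASSO penalty term.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p."""
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total


def mcfadden_r2(y, p):
    """Return 1 - LL(model) / LL(null), where null predicts the base rate."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p)
    return 1.0 - ll_model / ll_null


# Made-up outcomes and predicted probabilities for illustration.
y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.8, 0.2]
print(round(mcfadden_r2(y, p), 3))
```

The measure is not meaningful when every outcome is the same, because the intercept-only log-likelihood is then essentially zero.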
- Draw the network using Netica.
- Provide an image of the structure of the network, organized so that nodes that occur later are placed to the right of nodes that occur earlier. If you do not have a Netica license, you can build the network and take a screenshot before saving it, since saving the network requires a license.
- Estimate the parameters of the network.
- Using the LASSO regression, calculate the predicted value for all combinations of the parents in the Markov blanket of each regression's response variable. Enter this information into Netica Tables.
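The predicted values for all parent combinations can be generated from the regression coefficients. The sketch below assumes binary (0/1) parents and a logistic model in which each term is a single variable, a pair, or a triple; the function name, the parent names, and the coefficient values are hypothetical and should be replaced with the non-zero coefficients from your own regression.

```python
import math
from itertools import product


def conditional_table(intercept, coefs, parents):
    """Predicted P(child = 1) for every 0/1 combination of the parents.

    coefs maps a tuple of parent names (single, pair, or triple)
    to the coefficient of that term.
    """
    rows = []
    for values in product([0, 1], repeat=len(parents)):
        state = dict(zip(parents, values))
        z = intercept
        for term, beta in coefs.items():
            # A term contributes only when all of its members equal 1.
            if all(state[name] == 1 for name in term):
                z += beta
        rows.append((state, 1.0 / (1.0 + math.exp(-z))))
    return rows


# Hypothetical coefficients for illustration only.
coefs = {("fever",): 1.2, ("cough",): 0.8, ("fever", "cough"): 0.5}
table = conditional_table(-2.0, coefs, ["fever", "cough"])
for state, prob in table:
    print(state, round(prob, 3))
```

A node with k binary parents yields 2^k rows. Check the row order against the order of the parent states in the Netica table before entering the values.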
Use the DEMI algorithm
- Using the DEMI algorithm, the knowledgebase you have organized, and a large language model, predict the probability of COVID-19 from the symptoms of COVID.
The following resources may be helpful:
- Survey data results: Download► Dictionary►
- Sample knowledgebase of frequency of variables co-occurring and precedence among variables: CSV►
- DEMI algorithm: Python►
This page is part of the course on Comparative Effectiveness by Farrokh Alemi, Ph.D.