Comparative Effectiveness HAP 823

Lecture: AI Diagnosis of COVID-19 at Home  

 

Assigned Reading

Assignment

Question 1: Create an AI system to diagnose COVID-19 based on information available at home, before a visit to a clinic or emergency room.

  • Create the knowledgebase of the AI system

Describe the order of occurrence of the variables:

(a) Tier assignment (determines ordering between variables)

  1. Tier 0 — Birth: Age, gender identity, race, ethnicity variables. Assigned because these are fixed at birth and precede all other observations.
  2. Tier 1 — Vaccination: All COVID and flu vaccine status and vaccine attitude variables. Assigned because vaccination was collected as a prior history variable, before symptom onset.
  3. Tier 2 — Symptoms: All symptom, exposure, behavior, and condition variables from the Symptom Screening and About You sections. These represent the illness period.
  4. Tier 3 — At-home test: At-home rapid test result variables (pink line, blue line, confirmations). Assigned because home testing occurs after symptoms are already present.
  5. Tier 4 — PCR lab test: The single PCR Test Positive variable. Assigned because lab confirmation occurs last in the clinical pathway, after all other variables.
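The tier assignment can be stored as a lookup from each variable to its tier. The sketch below is a minimal illustration only. The variable names are hypothetical placeholders, not the actual codes in the course dataset, and `can_precede` and `candidate_predictors` are illustrative helpers. It covers only cross-tier ordering. Ordering within a tier is handled by the temporal value rules.

```python
# Hypothetical variable names; substitute the actual variable codes from the dataset.
TIERS = {
    "age": 0, "gender_identity": 0, "race": 0, "ethnicity": 0,   # Tier 0: birth
    "covid_vaccine": 1, "flu_vaccine": 1,                        # Tier 1: vaccination
    "fever": 2, "cough": 2, "loss_of_smell": 2, "exposure": 2,   # Tier 2: symptoms
    "home_test_positive": 3,                                     # Tier 3: at-home test
    "pcr_positive": 4,                                           # Tier 4: PCR lab test
}

def can_precede(a, b, tiers=TIERS):
    """True if a sits in a strictly earlier tier than b (cross-tier ordering only)."""
    return tiers[a] < tiers[b]

def candidate_predictors(target, tiers=TIERS):
    """Variables in earlier tiers than target, listed by tier and then by name."""
    return sorted((v for v in tiers if tiers[v] < tiers[target]),
                  key=lambda v: (tiers[v], v))

print(candidate_predictors("pcr_positive"))
```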

(b) Temporal value rules (applied in this priority order; A denotes the concept_code variable and B the target_concept_code variable)

  1. Cross-tier pairs (tier A < tier B): n_code_before_target = n_code_target (all co-occurrences; A always precedes B by design). n_target_before_code = 0.
  2. Within-symptom pairs with Ill data (tier A == tier B, both variables from 30141-covid_tst_symptoms): n_code_before_target and n_target_before_code are filled from the Symptoms_Ill_N first-symptom flags, that is, the participant-reported order in which the two symptoms appeared, counted among participants who had both.
  3. Same-tier pairs with co-occurrence but no Ill data: both n_code_before_target and n_target_before_code are set to n_code_target (the count of patients with both present). The rule is applied symmetrically because the ordering cannot be determined.
  4. Zero co-occurrence pairs (n_code_target = 0), all types: n_code_before_target = n_code_no_target (patients where A=1 and B=0, interpreted as A occurring without B ever being observed). n_target_before_code = n_target_no_code (patients where B=1 and A=0).
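A minimal sketch of rules 1 through 4, assuming the counts for each pair (A = concept_code, B = target_concept_code) have already been tabulated. Two choices here are assumptions of the sketch, not statements from the rules: the zero co-occurrence rule is checked first, reading "all types" as overriding the other rules whenever n_code_target = 0 (reorder the checks if rule 1 is meant to take precedence), and a pair with tier A > tier B is treated as the mirror image of rule 1. `temporal_counts` is a hypothetical helper name, and `ill_order` should be passed only for within-symptom pairs that have Symptoms_Ill_N data.

```python
from typing import Optional, Tuple

def temporal_counts(
    tier_a: int,
    tier_b: int,
    n_code_target: int,      # participants with A=1 and B=1
    n_code_no_target: int,   # participants with A=1 and B=0
    n_target_no_code: int,   # participants with B=1 and A=0
    ill_order: Optional[Tuple[int, int]] = None,
) -> Tuple[int, int]:
    """Return (n_code_before_target, n_target_before_code) for the pair A, B.

    ill_order, if given, is (A reported first, B reported first) among participants
    with both symptoms, taken from the Symptoms_Ill_N first-symptom flags.
    """
    # Rule 4: zero co-occurrence, any pair type (checked first in this sketch).
    if n_code_target == 0:
        return n_code_no_target, n_target_no_code
    # Rule 1: cross-tier pair; the earlier tier always precedes the later one.
    if tier_a < tier_b:
        return n_code_target, 0
    if tier_a > tier_b:  # assumption: mirror image of rule 1
        return 0, n_code_target
    # Rule 2: same tier with participant-reported symptom order.
    if ill_order is not None:
        return ill_order
    # Rule 3: same tier, order unknown; count the co-occurrences in both directions.
    return n_code_target, n_code_target

# Hypothetical counts: age (tier 0) vs. PCR result (tier 4)
print(temporal_counts(0, 4, n_code_target=120, n_code_no_target=30, n_target_no_code=10))  # (120, 0)
```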

(c) Exclusion rule

  1. Self-comparisons: any row where concept_code == target_concept_code is excluded. The current file was verified to contain zero such rows.
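The temporal rules and the exclusion rule both operate on a table of ordered variable pairs with co-occurrence counts. The sketch below assumes participant-level data stored as one dict of 0/1 values per participant; `pair_table` and `exclude_self_comparisons` are hypothetical helpers, and the column names mirror those used in the rules. The same table also supplies the pairwise frequencies asked for in the next step, since n_code_target is the number of participants in whom the two variables occur together.

```python
from itertools import permutations

def pair_table(records, variables):
    """Co-occurrence counts for every ordered pair (concept_code, target_concept_code).

    records: one dict per participant, mapping variable name -> 0 or 1.
    """
    rows = []
    for a, b in permutations(variables, 2):
        rows.append({
            "concept_code": a,
            "target_concept_code": b,
            "n_code_target": sum(r[a] == 1 and r[b] == 1 for r in records),     # A=1, B=1
            "n_code_no_target": sum(r[a] == 1 and r[b] == 0 for r in records),  # A=1, B=0
            "n_target_no_code": sum(r[a] == 0 and r[b] == 1 for r in records),  # A=0, B=1
        })
    return rows

def exclude_self_comparisons(rows):
    """Exclusion rule: drop rows whose two codes name the same variable."""
    return [r for r in rows if r["concept_code"] != r["target_concept_code"]]

# Hypothetical example: three participants, three variables.
records = [
    {"fever": 1, "cough": 1, "pcr_positive": 1},
    {"fever": 1, "cough": 0, "pcr_positive": 0},
    {"fever": 0, "cough": 1, "pcr_positive": 1},
]
rows = exclude_self_comparisons(pair_table(records, ["fever", "cough", "pcr_positive"]))
# Check that zero self-comparison rows remain.
assert not any(r["concept_code"] == r["target_concept_code"] for r in rows)
for r in rows:
    print(r)
```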
  • Create pairwise associations among the variables.
    • Calculate how often each variable co-occurs with every other variable.
  • Create the structure of the network using regression.
    1. Regress the PCR test results on all preceding variables and on pairwise or triple clusters of those variables.
      • List the variables that are direct predictors of PCR test results. Include the coefficients of all variables with non-zero coefficients in the logistic regression, including the coefficients for pairs or triples of variables.
      • Regress each symptom on all symptoms and other variables that occur before it.
      • Report the percentage of variation explained by the LASSO regression of PCR test results on the independent variables. Calculate and report the McFadden pseudo R-squared.
    2. Regress each variable that is a direct predictor of PCR test results on all preceding variables. In each of these regressions, the statistically significant variables are the parents, within the Markov blanket, of that regression's response variable.
      • For each regression, report the independent variables that are significant (non-zero) predictors of the response variable. Here, each response variable is one of the direct predictors of PCR test results.
      • For each regression, report the percentage of variation explained by the regression.
    3. Draw the network using Netica.
      • Provide an image of the network structure, organized so that nodes that occur later are placed to the right of nodes that occur earlier. If you do not have a Netica license, you can build the network and take a screenshot before saving it, because saving the network requires a license.
    4. Estimate the parameters of the network
      • Using the LASSO regression, calculate the predicted probability of each response variable for every combination of values of its parents in the Markov blanket. Enter these values into the Netica probability tables.
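The regression steps above can be sketched as L1-penalized (LASSO) logistic regression on the main effects plus all pairwise and triple products of binary predictors, followed by McFadden's pseudo R-squared, 1 - LL(model) / LL(intercept-only model). This is a from-scratch illustration using NumPy and proximal gradient descent (ISTA), written for transparency rather than as the course's prescribed tool. In practice a library such as scikit-learn (`LogisticRegression(penalty="l1", solver="liblinear")`) or R's glmnet would normally be used, with the penalty `lam` chosen by cross-validation. The data are simulated, the symptom names are hypothetical, and predictors are left unstandardized for simplicity.

```python
import itertools
import numpy as np

def add_interactions(X, names, max_order=3):
    """Append all pairwise and triple products of the 0/1 columns of X (up to max_order)."""
    cols, labels = [X[:, j] for j in range(X.shape[1])], list(names)
    for k in range(2, max_order + 1):
        for idx in itertools.combinations(range(X.shape[1]), k):
            cols.append(X[:, list(idx)].prod(axis=1))
            labels.append("*".join(names[i] for i in idx))
    return np.column_stack(cols), labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lasso_logistic(X, y, lam=0.01, n_iter=5000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean negative log-likelihood + lam * sum(|beta|); the intercept is not penalized.
    """
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])                          # prepend intercept column
    step = 1.0 / (0.25 * np.linalg.eigvalsh(Z.T @ Z / n).max())   # 1 / Lipschitz constant
    theta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (sigmoid(Z @ theta) - y) / n
        theta = theta - step * grad
        # Soft-thresholding: the proximal step for the L1 penalty (intercept excluded)
        theta[1:] = np.sign(theta[1:]) * np.maximum(np.abs(theta[1:]) - step * lam, 0.0)
    return theta[0], theta[1:]

def log_likelihood(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def mcfadden_r2(X, y, b0, beta):
    """McFadden pseudo R^2 = 1 - LL(model) / LL(intercept-only model)."""
    ll_model = log_likelihood(y, sigmoid(b0 + X @ beta))
    ll_null = log_likelihood(y, np.full(len(y), y.mean()))
    return 1.0 - ll_model / ll_null

# Simulated example with hypothetical symptoms (not course data)
rng = np.random.default_rng(0)
names = ["fever", "cough", "loss_of_smell"]
X_base = rng.integers(0, 2, size=(2000, 3)).astype(float)
true_logit = -2.0 + 2.5 * X_base[:, 0] + 1.5 * X_base[:, 0] * X_base[:, 1]
y = (rng.random(2000) < sigmoid(true_logit)).astype(float)

X, labels = add_interactions(X_base, names)
b0, beta = fit_lasso_logistic(X, y, lam=0.01)
nonzero = {lab: round(float(c), 3) for lab, c in zip(labels, beta) if c != 0.0}
r2 = mcfadden_r2(X, y, b0, beta)
print("intercept:", round(float(b0), 3))
print("non-zero coefficients:", nonzero)
print("McFadden pseudo R^2:", round(r2, 3))
```

The same functions can be reused for step 2 and for the symptom regressions by changing the response column and restricting the predictors to variables that precede it.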
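To fill a node's probability table, the fitted logistic model is evaluated at every combination of its parents' states. A minimal sketch, assuming binary (0/1) parents and coefficients stored by term name, with interaction terms written as "*"-joined parent names. The coefficient values below are hypothetical placeholders, not estimates from the course data, and `conditional_probability_table` is an illustrative helper. Check that the row and state order matches the node's table in Netica before copying the numbers in.

```python
from itertools import product
from math import exp

def predicted_probability(assignment, intercept, coefs):
    """Logistic prediction; coefficient keys are single parents or '*'-joined interactions."""
    z = intercept
    for term, b in coefs.items():
        # For 0/1 variables, a product term equals 1 only when every variable in it is 1.
        if all(assignment[v] == 1 for v in term.split("*")):
            z += b
    return 1.0 / (1.0 + exp(-z))

def conditional_probability_table(parents, intercept, coefs):
    """One row per combination of parent states: (states..., P(response=1), P(response=0))."""
    rows = []
    for states in product([0, 1], repeat=len(parents)):
        p = predicted_probability(dict(zip(parents, states)), intercept, coefs)
        rows.append((*states, round(p, 4), round(1 - p, 4)))
    return rows

# Hypothetical coefficients for illustration only.
parents = ["fever", "cough"]
intercept = -2.0
coefs = {"fever": 2.5, "fever*cough": 1.5}
for row in conditional_probability_table(parents, intercept, coefs):
    print(row)
```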

Use the DEMI algorithm

  • Using the DEMI algorithm, the knowledgebase you have organized, and a large language model, predict the probability of COVID-19 from the patient's COVID symptoms.

The following resources may be helpful:


This page is part of the course on Comparative Effectiveness by Farrokh Alemi, Ph.D.