HAP 786: Workshop in Health Informatics

Lecture: Create Database in All of Us  

Pull data together
Generated by ChatGPT

Overview

Objectives

  1. Select a population of patients to focus on.
  2. Select a high-dimensional set of variables to focus on.
  3. Define baseline measures and independent variables.
  4. Define exposure measures.
  5. Define outcomes and dependent variables.

Assigned Reading & Learning Materials

Different student teams have different assignments, so you should not follow the code of others verbatim. The following videos are organized to serve a specific type of regression, and the variables you need in your database could be different. For the missing value project, or the project on mismatch between language and machine learning models, you need to work with the entire cohort. You need to think through which variables are dependent and which are independent; these differ from project to project. For the missing value or mismatch project, the dependent variable is a condition in All of Us and the independent variables are other conditions, medications, or factors. For other projects, your dependent variable could be response to an antidepressant or a predictor of response to an antidepressant. In creating the database, make sure that all independent variables occur prior to the dependent variable.
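
The timing requirement above can be checked in code. The following is a minimal sketch in Python with pandas, not the course's own code: the table layout, the column names (`person_id`, `concept`, `event_date`, `outcome_date`), and the helper `keep_prior_events` are hypothetical and would need to be adapted to the data sets you export from All of Us. It assumes every person has an outcome date; people without one are dropped by the inner join, which may not suit projects that use the entire cohort.

```python
import pandas as pd


def keep_prior_events(events: pd.DataFrame, outcomes: pd.DataFrame) -> pd.DataFrame:
    """Keep only events dated strictly before each person's outcome date."""
    # Attach each person's outcome date to every one of their events.
    merged = events.merge(
        outcomes[["person_id", "outcome_date"]], on="person_id", how="inner"
    )
    # Events on or after the outcome date cannot serve as independent variables.
    prior = merged[merged["event_date"] < merged["outcome_date"]]
    return prior.drop(columns="outcome_date").reset_index(drop=True)


# Toy data with hypothetical column names.
events = pd.DataFrame({
    "person_id": [1, 1, 2, 2],
    "concept": ["diabetes", "sertraline", "hypertension", "insomnia"],
    "event_date": pd.to_datetime(
        ["2019-01-10", "2021-06-01", "2018-03-05", "2020-02-20"]
    ),
})
outcomes = pd.DataFrame({
    "person_id": [1, 2],
    "outcome_date": pd.to_datetime(["2020-05-01", "2020-02-20"]),
})

prior_events = keep_prior_events(events, outcomes)
print(prior_events)
```

In this toy example, the sertraline record for person 1 and the insomnia record for person 2 are removed because they do not precede the outcome date.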

Assignment

In this session, you are asked to organize the database for your analysis. Analysis of observational data requires attention to the timing of variables, so make sure your database records when each variable occurs. You need to (a) create a cohort, (b) create a database, and (c) add specific variables missing in the database.
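
One possible way to keep timing in the database is to store, for each person and each variable, the date the variable was first recorded. The sketch below assumes a long table of events with hypothetical columns `person_id`, `concept`, and `event_date`; it illustrates the idea and is not a prescribed layout.

```python
import pandas as pd

# Toy long-format events table; column names are assumptions.
events = pd.DataFrame({
    "person_id": [1, 1, 1, 2],
    "concept": ["diabetes", "diabetes", "sertraline", "diabetes"],
    "event_date": pd.to_datetime(
        ["2019-01-10", "2018-07-02", "2021-06-01", "2020-02-20"]
    ),
})

# Earliest date each concept was recorded for each person.
first_dates = (
    events.groupby(["person_id", "concept"], as_index=False)["event_date"]
    .min()
    .rename(columns={"event_date": "first_date"})
)

# One row per person, one column per concept, holding the first date.
# A missing date (NaT) means the concept was never recorded for that person.
wide = first_dates.pivot(index="person_id", columns="concept", values="first_date")
print(wide)
```

Keeping dates rather than 0/1 indicators lets you decide later which variables precede the outcome.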

Create Cohort and Related Data Sets for the Missing Value Project

You are tasked with organizing a database for analysis of missing values in predicting response to antidepressants. Observational data analysis requires careful attention to timing. In structuring the database, include timing variables and complete the following:  

  1. Create your cohort in All of Us. For the missing values project, include the entire population.
    Cohort in All of Us Step 2
  2. In the missing values project, the unit of analysis is the individual. The independent variables are all medications and all conditions. The dependent variables are the conditions, medications, procedures, and antidepressant history that are predictive of response to depression treatment.
  3. Create the data sets for your cohort. Do not include non-EHR data or surveys.
  4. Add in demographic data.
    Joining demographic data
  5. Add in the variables used to predict response to antidepressants. These are the dependent variables in the planned regressions and are typically called target variables:

    Add target variables
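
Steps 4 and 5 amount to joining person-level tables. The following is a minimal sketch, assuming each table has one row per person keyed by `person_id`; the data frames `predictors`, `demographics`, and `targets` and all of their columns other than `person_id` are hypothetical stand-ins for the data sets you create in All of Us. It also assumes that a person with no target record does not have the target, which you should confirm for your own project.

```python
import pandas as pd

# Toy person-level tables; names and columns are assumptions.
predictors = pd.DataFrame({
    "person_id": [1, 2, 3],
    "diabetes": [1, 0, 1],
    "sertraline": [0, 1, 0],
})
demographics = pd.DataFrame({
    "person_id": [1, 2, 3],
    "year_of_birth": [1970, 1985, 1962],
    "sex_at_birth": ["Female", "Male", "Female"],
})
targets = pd.DataFrame({"person_id": [1, 3], "target": [1, 1]})

# Left joins keep every person in the cohort; validate raises an error
# if a table unexpectedly has more than one row per person.
analytic = (
    predictors
    .merge(demographics, on="person_id", how="left", validate="one_to_one")
    .merge(targets, on="person_id", how="left", validate="one_to_one")
)

# Assumption: no target record means the target is absent (coded 0).
analytic["target"] = analytic["target"].fillna(0).astype(int)
print(analytic)
```

Using left joins from the cohort table means that people with no demographic or target record stay in the database, with missing values where the record is absent.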


    This page is part of the HAP 786 course organized by Farrokh Alemi, Ph.D.