Pull data together
Generated by ChatGPT

Overview

Organize your data Slides► Narrated Slides► Video► YouTube►

Objectives

Select a population of patients to focus on
Select a high-dimensional set of variables to focus on
Define baseline measures and independent variables
Define exposure measures
Define outcomes and dependent variables

Assigned Reading & Learning Materials

Student teams have different assignments, so do not follow the code of others verbatim. The following videos are organized around a specific type of regression, and the variables you need in your database could be different. For the missing values project, or the project on the mismatch between language and machine learning models, you need to work with the entire cohort. Think through which variables are dependent and which are independent; these differ across projects. In the missing values or mismatch project, the dependent variable is a condition in All of Us, and the independent variables are other conditions, medications, or factors. In other projects, your dependent variable could be response to antidepressants or a predictor of response to antidepressants. In creating the database, make sure that all independent variables occur prior to the dependent variable.
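The requirement that independent variables precede the dependent variable can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the variable names, and the helper predictors_before_outcome are hypothetical and are not taken from the course code or from the All of Us workbench. It also assumes that "prior to" means strictly earlier, and that persons who never have the outcome keep all of their events.

```python
from datetime import date

# Hypothetical event records: (person_id, variable, date of occurrence).
events = [
    (1, "hypertension", date(2015, 3, 1)),
    (1, "insomnia", date(2019, 6, 1)),
    (2, "hypertension", date(2018, 1, 15)),
]

# Hypothetical date on which the dependent variable was first observed.
outcome_dates = {1: date(2017, 5, 20), 2: date(2018, 1, 15)}


def predictors_before_outcome(events, outcome_dates):
    """Keep only events that occur strictly before the person's outcome date.

    Assumption: persons without an outcome date keep all of their events.
    """
    kept = []
    for person_id, variable, event_date in events:
        outcome_date = outcome_dates.get(person_id)
        if outcome_date is None or event_date < outcome_date:
            kept.append((person_id, variable, event_date))
    return kept


valid_predictors = predictors_before_outcome(events, outcome_dates)
```

In this toy data, the insomnia record for person 1 and the same-day hypertension record for person 2 are dropped because neither occurs before the outcome. Whether same-day events should count as prior is a project decision, not something this sketch settles.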

Keerti Reddy Resapu's missing values database Slides► YouTube► Video► Code►
Keerti Reddy Resapu's adding antidepressant history to the database Slides► Code► CSV► Video►
Keerti Reddy Resapu's creating database in background Slides► Video►
Merging data and dropping duplicate fields Slides► Video► Code►
Vlad Cardenas Teach One Part 1 YouTube 1► 2 ► Slides► Code►
Organizing response to antidepressants CSV►
Organizing conditions (independent variables) within body systems CSV►
Creating survival variable More►
Creating a database in All of Us YouTube►
Lana Hashem's Teach One - Data Preparation Part 1: YouTube►
Chathrini Sirisena's Teach One - Data Preparation Part 2: YouTube►
Wafaa Abdelmalak's Teach One - Data Preparation Part 3 YouTube►
Jenny Rivera-Rivas' Teach One - Data Preparation Part 4 YouTube►

Assignment

In this session, you are asked to organize the database for your analysis. Analysis of observational data requires attention to the timing of variables, so make sure your database records when each variable occurs. You need to (a) create a cohort, (b) create a database, and (c) add specific variables missing in the database.
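One way to keep timing in the database is to store, for each person and variable, the date of first occurrence. The sketch below assumes that first occurrence is the timing you need; the records and the helper first_occurrence are hypothetical and only illustrate the idea.

```python
from datetime import date

# Hypothetical raw records: one row per recorded occurrence.
records = [
    (1, "depression", date(2016, 4, 2)),
    (1, "depression", date(2014, 9, 30)),
    (1, "citalopram", date(2016, 5, 1)),
    (2, "depression", date(2020, 1, 10)),
]


def first_occurrence(records):
    """Return {(person_id, variable): earliest date} so timing is retained."""
    earliest = {}
    for person_id, variable, event_date in records:
        key = (person_id, variable)
        if key not in earliest or event_date < earliest[key]:
            earliest[key] = event_date
    return earliest


timing = first_occurrence(records)
```

Keeping the date alongside each variable, rather than only a yes/no flag, is what later allows you to check the order of independent and dependent variables.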

Create Cohort and Related Data Sets for the Missing Value Project.

You are tasked with organizing a database for analysis of missing values in predicting response to antidepressants. Observational data analysis requires careful attention to timing. In structuring the database, include timing variables and complete the following:

Create your cohort in All of Us. For the missing values project, the entire population should be included.
In the missing values project, the unit of analysis is the individual. The independent variables are all medications and all conditions. The dependent variables are conditions, medications, procedures, and antidepressant history that are predictive of response to depression treatment.
Create your data sets for your cohort. Do not include non-EHR data or surveys.
Add in demographic data.
Add in variables used to predict response to antidepressants. These variables are dependent variables in the planned regressions and are typically called target variables:
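As a rough illustration of the data set and demographic steps in this list, the sketch below builds one row per person (the unit of analysis), keeps only EHR records, and adds demographic fields. The field names, the "source" tag, and the helper build_rows are assumptions made for this example; they are not the All of Us schema or the course's code.

```python
# Hypothetical demographic rows, one per person in the cohort.
demographics = {
    1: {"age": 54, "sex": "F"},
    2: {"age": 37, "sex": "M"},
}

# Hypothetical records tagged with their source; only EHR rows are kept.
records = [
    {"person_id": 1, "variable": "hypertension", "source": "EHR"},
    {"person_id": 1, "variable": "sleep_survey_item", "source": "survey"},
    {"person_id": 2, "variable": "citalopram", "source": "EHR"},
]


def build_rows(demographics, records):
    """Build one row per person: demographics plus 0/1 indicators."""
    # Exclude surveys and other non-EHR sources.
    ehr = [r for r in records if r["source"] == "EHR"]
    variables = sorted({r["variable"] for r in ehr})
    present = {(r["person_id"], r["variable"]) for r in ehr}
    rows = []
    for person_id in sorted(demographics):
        row = {"person_id": person_id, **demographics[person_id]}
        for variable in variables:
            row[variable] = int((person_id, variable) in present)
        rows.append(row)
    return rows


rows = build_rows(demographics, records)
```

Every person in the cohort gets a row even when no condition or medication is recorded, which matches the instruction to include the entire population.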

This page is part of the HAP 786 course organized by Farrokh Alemi, Ph.D. Home► Email►

HAP 786: Workshop in Health Informatics

Lecture: Create Database in All of Us
