Overview
Objectives
- Select a population of patients to focus on
- Select a high dimensional set of variables to focus on
- Define baseline measures and independent variables
- Define exposure measures
- Define outcomes and dependent variables
Assigned Reading & Learning Materials
Different student teams have different assignments, so
you should not follow the code of others verbatim. Please note that
the following videos are organized around a specific type of regression,
and the variables you need in your database could be different. You
need to think through which variables are dependent and which are
independent. We would like you to use conditions reported in All of Us
as your independent variables. Your dependent variable could be
response to an antidepressant or a predictor of response to an antidepressant.
In creating the database, make sure that all independent
variables occur prior to the dependent variable.
- Vlad Cardenas Teach One Part 1
YouTube►
- Vlad Cardenas Teach One Part 2
YouTube►
Slides►
Code►
- Rasil Alamri’s Teach One Part 1
YouTube►
- Mona Mohamed’s Teach One Part 2
YouTube►
- Organizing response to antidepressants
CSV►
- Organizing conditions
(independent variables) within body systems
CSV►
- Creating survival variable
More►
- Creating a database in All of Us
YouTube►
- Lana Hashem's Teach One - Data Preparation Part 1:
YouTube►
- Chathrini Sirisena's Teach One - Data Preparation Part 2:
YouTube►
- HAP 464 Lecture 03 - Data Preparation (Part 2):
YouTube►
- Wafaa Abdelmalak's Teach One - Data Preparation Part 3
YouTube►
- Jenny Rivera-Rivas' Teach One - Data Preparation Part 4
YouTube►
Assignment
The semester-long project in this course is to assess the
effectiveness of an existing guide to depression medications in minority
populations. In this session, you are asked to organize the database for
your analysis. Analysis of observational data requires that you
pay attention to the timing of variables, so make sure that your database
records when each variable occurred.
(B) Create Cohort and Related Data Sets. Note that a
cohort and data sets are different concepts.
- Create your cohort in All of Us.
- Limit the cohort by African American race.
- Create the concept for major depression. Review in PubMed how
investigators have defined major depression in EHRs. Alternatively, use
conditions defined within All of Us to select
the right definition of major depression.
- Create the concept of patient's survival.
- The unit of analysis is the medication, not the individual. An
individual can have multiple medications. Define the database so
that there is one entry for each antidepressant.
- Create your data sets for your cohort. Do not include non-EHR data or surveys.
Note that creating the antidepressant data set requires creating concepts that capture the
antidepressants in the data. In your cohort, select demographics (age, gender)
and all conditions as independent variables of interest. No survey responses are needed for
independent variables; rely on EHR data only. Include the date of occurrence of every event.
You also need the date of first use (purchase) of the antidepressant. The date of occurrence of the
response variable is the first time the variable/condition occurred. Here are the
data points that you need to include in your data sets:
- ID of antidepressant
- ID of person
- Age at first intake of antidepressant
- Sex at birth
- Gender
- Survival
- 590 Diseases among the Conditions.
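As a rough illustration of the layout described above, the following pandas sketch builds one row per person and antidepressant and derives age at first intake. The column names (`exposure_date`, `date_of_birth`, `first_use`, `last_use`) and the toy values are assumptions made for illustration and are not the actual All of Us schema. Treating days of use as the span between the first and last recorded exposure is also an assumption.

```python
import pandas as pd

# Hypothetical drug-exposure records: several rows per person and drug.
exposures = pd.DataFrame({
    "person_id": [1, 1, 1, 2],
    "antidepressant_id": ["A", "A", "B", "A"],
    "exposure_date": pd.to_datetime(
        ["2019-01-10", "2019-03-01", "2020-06-15", "2018-07-04"]),
})

# Hypothetical demographics: one row per person.
demographics = pd.DataFrame({
    "person_id": [1, 2],
    "date_of_birth": pd.to_datetime(["1980-01-10", "1990-07-05"]),
    "sex_at_birth": ["Female", "Male"],
})

# Collapse to one row per (person, antidepressant) with first and last use.
per_drug = (
    exposures
    .groupby(["person_id", "antidepressant_id"], as_index=False)
    .agg(first_use=("exposure_date", "min"),
         last_use=("exposure_date", "max"))
)

# Attach demographics; a left join keeps every antidepressant row.
per_drug = per_drug.merge(demographics, on="person_id", how="left")

# Age in completed years at first use: subtract one year when the
# birthday has not yet been reached in the year of first use.
first = per_drug["first_use"]
dob = per_drug["date_of_birth"]
before_birthday = (
    (first.dt.month * 100 + first.dt.day) < (dob.dt.month * 100 + dob.dt.day)
)
per_drug["age_at_first_use"] = (
    first.dt.year - dob.dt.year - before_birthday.astype(int)
)

# Assumed definition: days between first and last recorded exposure.
per_drug["days_of_use"] = (per_drug["last_use"] - per_drug["first_use"]).dt.days

print(per_drug)
```

Grouping on both `person_id` and `antidepressant_id` is what makes the medication, rather than the person, the unit of analysis: a person who used two antidepressants contributes two rows.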
Here are more detailed steps in getting ready for analysis:
- Get the dataset for patient demographics to include date of birth,
race, and ethnicity.
- Select African Americans.
- Create the base of df_analysis from this.
- Get date_of_death from the dataset containing deceased persons,
then left join it to df_analysis.
- Get date_of_first_antidepressant from the dataset containing all of
your antidepressants, then left join it to df_analysis.
- Get the date of every antidepressant purchase
- Process disease dataset
- Get the list of all of your antidepressant codes. The data set should
not be limited to the antidepressant you selected and should include
all antidepressants.
- Create a new column for the start date of antidepressant you
selected.
- Create a new column for the end date of antidepressant you
selected.
- Create a new column for duration of any antidepressant used prior
to the antidepressant you selected.
- Create a new column disease_group
- Use the df_disease_grouped.csv to fill the disease_group
column based on standard_concept_code
- Change missing values of disease_group to 0 (the catch-all
disease grouping). This assumes that unreported diseases are
absent.
- Select all diseases that occur prior to the date of the
antidepressant
- Calculate the number of days of use of each antidepressant and flag whether the
antidepressant was prematurely abandoned.
- Binarize the disease_group column. There is no need to drop any
column: disease groups are not mutually exclusive (a person can have
many of them), so the dummy variable trap does not arise.
- Aggregate on antidepressant_id so that the dataset has only 1 row per
antidepressant per person_id, and the binarized disease group columns
indicate all the disease groups that the person has.
- Drop all other columns except antidepressant_id, person_id, days
of antidepressant use, and the binarized columns.
- Left join the binarized columns to df_analysis
- You are now ready to start description of the data
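The disease-processing steps above can be sketched in pandas roughly as follows. This is a minimal sketch under stated assumptions, not the course's reference code: the small inline tables stand in for the real condition data and for df_disease_grouped.csv, and names such as `condition_date`, `first_use`, `df_groups`, and the `dg_` column prefix are hypothetical.

```python
import pandas as pd

# Hypothetical analysis base: one row per person and antidepressant.
df_analysis = pd.DataFrame({
    "person_id": [1, 1, 2],
    "antidepressant_id": ["A", "B", "A"],
    "first_use": pd.to_datetime(["2019-01-10", "2020-06-15", "2018-07-04"]),
})

# Hypothetical condition records with their dates of occurrence.
df_disease = pd.DataFrame({
    "person_id": [1, 1, 1, 2],
    "standard_concept_code": ["C1", "C2", "C9", "C1"],
    "condition_date": pd.to_datetime(
        ["2018-05-01", "2019-12-01", "2017-02-02", "2019-01-01"]),
})

# Stand-in for df_disease_grouped.csv (code -> disease group).
df_groups = pd.DataFrame({
    "standard_concept_code": ["C1", "C2"],
    "disease_group": [3, 7],
})

# Fill disease_group from the lookup; unmapped codes go to group 0.
df_disease = df_disease.merge(df_groups, on="standard_concept_code", how="left")
df_disease["disease_group"] = df_disease["disease_group"].fillna(0).astype(int)

# Pair each condition with each antidepressant of the same person and
# keep only conditions recorded before that antidepressant's first use.
pairs = df_analysis.merge(df_disease, on="person_id", how="inner")
pairs = pairs[pairs["condition_date"] < pairs["first_use"]]

# Binarize: one 0/1 column per disease group, no column dropped.
dummies = pd.get_dummies(pairs["disease_group"], prefix="dg").astype(int)

# Aggregate to one row per person and antidepressant; max() marks a
# group as present if any prior condition fell in it.
binarized = (
    pd.concat([pairs[["person_id", "antidepressant_id"]], dummies], axis=1)
    .groupby(["person_id", "antidepressant_id"], as_index=False)
    .max()
)

# Left join back so antidepressants with no prior conditions are kept,
# then treat their missing indicators as 0 (disease absent).
df_final = df_analysis.merge(
    binarized, on=["person_id", "antidepressant_id"], how="left")
dg_cols = [c for c in df_final.columns if c.startswith("dg_")]
df_final[dg_cols] = df_final[dg_cols].fillna(0).astype(int)

print(df_final)
```

The date filter is applied before binarizing so that a condition recorded after one antidepressant but before a later one counts only toward the later one.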
This page is part of the HAP 819 course organized by Farrokh Alemi, Ph.D.