Overview
In this module, you will learn to handle missing values in both the
dependent and independent variables, a skill that is crucial to the integrity
of your analyses. You will determine whether data are missing at random, check
the accuracy of mean imputation, and verify whether missing values in
electronic health records (EHRs) indicate the absence of disease. Using a
series of regressions (structural equation models), you will predict the
values of missing variables.
Learning Objectives
After completing the activities in this module, you should be able to:
- Adjust for missing values in the dependent and independent data
- Determine if data is missing at random
- Check for accuracy of mean imputation
- Check the accuracy of assuming that a missing value in EHRs indicates
absence of disease
- Use a series of regressions (structural equation models) to
predict the value of missing variables
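The last objective, predicting a missing variable from other variables, can be illustrated with a single regression step. The sketch below is only an illustration under stated assumptions: it uses Python rather than the R requested for the assignments, the names `fit_line`, `regression_impute`, `helper`, and `target` are hypothetical, and the numbers are made-up toy values, not course data. A full chained approach would repeat this step for each variable with missing values.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation; cannot fit a line")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def regression_impute(target, helper):
    """Fill missing entries (None) of target with predictions from helper.

    The line is fitted only on cases where both variables are observed.
    """
    pairs = [(h, t) for h, t in zip(helper, target)
             if h is not None and t is not None]
    intercept, slope = fit_line([h for h, _ in pairs], [t for _, t in pairs])
    return [intercept + slope * h if t is None and h is not None else t
            for t, h in zip(target, helper)]


# Hypothetical toy data: target is missing in two cases.
helper = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [2.0, None, 6.0, 8.0, None]

imputed = regression_impute(target, helper)
print(imputed)
```

Observed values are left unchanged; only the `None` entries are replaced by fitted values.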
Lecture
Assignments
Assignments should be submitted in Blackboard. The submission must have a summary statement, with one statement per question. All
assignments should be done in R if possible.
Question 1: Regress progression in Infectious and Parasite body system on all
other variables (except diabetes). In the attached data, the variables indicate incidence of diabetes (a
binary variable) and progression of diseases in body systems. You can run the analysis on a 10% sample first, before running it on the entire data set,
which may take several hours.
- Remove all instances where progression in Infectious and Parasite body system is missing. Report the number of cases that remain.
- Plot the progression in Infectious and Parasite body system. Is
the data bimodal? Is the data symmetric around the mean? Transform
the data to improve the Q-Q plot.
- Include in your analysis all pairwise and three-way interactions of the independent variables. Exclude any variable or interaction term that is always missing.
What is the total number of independent variables included in the regression?
- Impute missing independent variables from other variables that are present. Regress progression in Infectious and Parasite body
system on independent variables and report the percent of variation explained.
- Assume that missing independent variables indicate the patient does not have
any disease in the body system (i.e., a score of 0). Regress progression in
Infectious and Parasite body system on independent variables and report the percent of variation explained.
- Assume the value of the missing independent variables can be replaced by the average value of the independent variable. Regress progression
in Infectious and Parasite body system on independent variables and report the percent of variation explained.
- Indicate which method of replacing missing values fits the data best.
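The comparison in the last three steps can be sketched as follows. This is a minimal illustration, not the assignment solution: it is written in Python rather than R, uses a single predictor, covers only the zero and mean replacement rules, and relies on made-up toy values. The names `r_squared`, `fill_missing`, `predictor`, and `outcome` are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Share of variation in y explained by the least-squares line on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so nothing can be explained
    return sxy ** 2 / (sxx * syy)


def fill_missing(values, replacement):
    """Replace every missing entry (None) with the given value."""
    return [replacement if v is None else v for v in values]


# Hypothetical toy data: one predictor with two missing values.
predictor = [0.9, None, 0.2, 0.7, None, 0.4, 0.8, 0.1]
outcome = [0.8, 0.1, 0.3, 0.6, 0.2, 0.5, 0.9, 0.2]

observed = [v for v in predictor if v is not None]
zero_filled = fill_missing(predictor, 0.0)             # missing means no disease
mean_filled = fill_missing(predictor, mean(observed))  # missing means average

r2 = {
    "zero": r_squared(zero_filled, outcome),
    "mean": r_squared(mean_filled, outcome),
}
best = max(r2, key=r2.get)
print(r2, "best fit:", best)
```

With many predictors, the same comparison would use the R-squared (or adjusted R-squared) of the full multiple regression under each replacement rule.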
Question 2: Consider the regression of progression in Infectious and Parasite body system on all
other variables (except diabetes). In the attached data, the variables indicate incidence of diabetes (a
binary variable) and progression of diseases in body systems. You can run the analysis on a 10% sample first, before running it on the entire data set,
which may take several hours.
- For each independent variable, create a binary indicator that is 1 when the variable is missing and 0 otherwise. Predict progression in Infectious and
Parasite body system from these binary indicators and report whether any of them is statistically significant. List the variables that are not missing at random.
For this assignment, a variable is not missing at random if its missingness indicator has a statistically
significant relationship to the response (outcome) variable.
- Replace missing values using MICE (multivariate imputation by chained equations). Report the percent of variation
explained before and after the MICE adjustments.
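The missingness check in the first step can be sketched for a single variable. The assignment asks for a regression on all indicators. The sketch below swaps in a simpler stand-in, a permutation test on the difference in mean outcome between cases where the variable is missing and cases where it is observed. For one binary indicator, that difference equals the least-squares slope. The code is Python rather than R, the data are made-up toy values, and the names `missing_indicator`, `mean_difference`, and `permutation_p_value` are hypothetical.

```python
import random
from statistics import mean


def missing_indicator(values):
    """Return 1 where the value is missing (None) and 0 otherwise."""
    return [1 if v is None else 0 for v in values]


def mean_difference(indicator, outcome):
    """Mean outcome when missing minus mean outcome when observed."""
    missing = [y for m, y in zip(indicator, outcome) if m == 1]
    present = [y for m, y in zip(indicator, outcome) if m == 0]
    return mean(missing) - mean(present)


def permutation_p_value(indicator, outcome, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    if sum(indicator) in (0, len(indicator)):
        raise ValueError("need both missing and observed cases")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean_difference(indicator, outcome))
    shuffled = list(indicator)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between missingness and outcome
        if abs(mean_difference(shuffled, outcome)) >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical toy data: the predictor is missing for two patients.
predictor = [0.9, None, 0.2, 0.7, None, 0.4, 0.8, 0.1]
outcome = [0.8, 0.1, 0.3, 0.6, 0.2, 0.5, 0.9, 0.2]

indicator = missing_indicator(predictor)
difference = mean_difference(indicator, outcome)
p_value = permutation_p_value(indicator, outcome)
print(indicator, difference, p_value)
```

A small p-value would suggest that missingness is related to the outcome, which is the sense of "not missing at random" used in this question. With several variables, a regression on all indicators at once adjusts each indicator for the others.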
More
For additional information (not part of the required reading), please see the following links:
- Introduction to regression by others: YouTube►, Slides►
- Regression using R: Read►
- Statistical learning with R: Read►
- Open introduction to statistics: Read►
This page is part of the HAP 819 course on Advanced Statistics and was
organized by Farrokh Alemi, PhD.