
Overview
In this module, you will learn to handle missing values in both
dependent and independent variables, a crucial skill for ensuring the integrity
of your analyses. You will determine whether data are missing at random, check
the accuracy of mean imputation, and verify whether missing values in EHRs
indicate the absence of disease. Using a series of regressions
(structural equation models), you will predict the values of missing
variables.
Learning Objectives
After completing the activities in this module, you should be able to:
- Adjust for missing values in the dependent and independent variables
- Determine if data is missing at random
- Check for accuracy of mean imputation
- Check whether missing values in EHRs accurately indicate absence of
disease
- Use a series of regressions (structural equation models) to
predict the value of missing variables
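One way to check the accuracy of mean imputation is to hide values that are actually known, impute them, and compare the imputations with the truth. The sketch below does this on simulated data, so the variables `x` and `y`, the 20% masking rate, and the use of RMSE as the accuracy measure are all illustrative assumptions rather than part of the course data or method.

```r
# Simulated data: all names and values here are illustrative assumptions.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 2 * x + rnorm(n)              # y is related to x

# Hide 20% of the known y values so the imputations can be scored.
hidden <- sample(n, size = n / 5)
y_obs <- y
y_obs[hidden] <- NA

# Mean imputation: every missing value gets the observed mean.
mean_fill <- mean(y_obs, na.rm = TRUE)
rmse_mean <- sqrt(mean((y[hidden] - mean_fill)^2))

# Regression imputation: predict the missing values from x.
fit <- lm(y_obs ~ x)               # lm() drops rows with NA by default
pred <- predict(fit, newdata = data.frame(x = x[hidden]))
rmse_reg <- sqrt(mean((y[hidden] - pred)^2))

# Lower RMSE means the imputed values are closer to the hidden truth.
c(mean = rmse_mean, regression = rmse_reg)
```

When the variable with missing values is related to other reported variables, as it is in this simulation, the regression-based imputation should show the smaller error.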
Lecture
Assignments
Assignments should be submitted in Blackboard. The submission must have a summary statement, with one statement per question. All
assignments should be done in R if possible.
Question 1: Regress progression in the Infectious and Parasite body system on all
other variables (except diabetes). In the attached data, the variables indicate incidence of diabetes (a
binary variable) and progression of diseases in body systems. You can run the analysis on a 10% sample first, before running it on the entire data set, which
may take several hours.
- Remove from the independent variables any body system that is always missing. Report the number of cases
and variables that remain.
- Assume that missing independent variables indicate that the
patient does not have any disease in the body system (i.e., assign a
score of 0 when the data is missing). Print a summary of the data
showing that there are no missing values in the data.
- Regress progression in the Infectious and Parasite body system on
the independent variables. Report the total number of cases and
variables in the analysis. Report the R-squared. Report the
coefficients of variables that are statistically significant.
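The three steps of Question 1 could be sketched in base R roughly as follows. This is a minimal illustration on a simulated data frame, not the attached data: every column name (`infectious`, `circulatory`, `respiratory`, `perinatal`, `diabetes`) is hypothetical, and the 0.05 significance cutoff is an assumption.

```r
# Toy stand-in for the assignment data; every column name is hypothetical.
set.seed(2)
n <- 200
dat <- data.frame(
  diabetes    = rbinom(n, 1, 0.3),
  infectious  = rnorm(n),
  circulatory = rnorm(n),
  respiratory = rnorm(n),
  perinatal   = NA_real_           # a body system that is always missing
)
dat$circulatory[sample(n, 40)] <- NA
dat$respiratory[sample(n, 60)] <- NA

# Optional: develop on a 10% sample first, then rerun on the full data.
# dat <- dat[sample(nrow(dat), size = round(0.1 * nrow(dat))), ]

# Step 1: drop variables that are missing for every case.
always_missing <- names(dat)[colSums(!is.na(dat)) == 0]
dat <- dat[, !(names(dat) %in% always_missing), drop = FALSE]
dim(dat)                           # cases and variables that remain

# Step 2: treat a missing independent variable as "no disease" (score 0).
predictors <- setdiff(names(dat), c("infectious", "diabetes"))
for (v in predictors) dat[[v]][is.na(dat[[v]])] <- 0
summary(dat)                       # no NA counts should be listed

# Step 3: regress the outcome on the remaining predictors.
fit <- lm(reformulate(predictors, response = "infectious"), data = dat)
s <- summary(fit)
nobs(fit)                          # cases used in the regression
s$r.squared
coefs <- s$coefficients
coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]   # significant terms
```

With the real data, the same pattern applies after replacing the simulated data frame with the file you read in and the hypothetical names with the actual body-system columns.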
Question 2: Consider the regression of progression in the Infectious and Parasite body system on all
other variables (except diabetes). In the attached data, the variables indicate incidence of diabetes (a
binary variable) and progression of diseases in body systems. You can run the analysis on a 10% sample first, before running it on the entire data set, which
may take several hours.
- Remove variables where a body system is always missing. Report the number of cases
and variables that remain.
- Regress the indicators for missing independent variables on the other
reported independent variables, using MICE software or doing the regressions
one at a time yourself. Report the coefficients of these
regressions.
- Regress progression in the Infectious and Parasite body system on the
independent variables and the indicator variables for missing variables.
Report the total number of cases and variables in the data. Report the
R-squared for the regression. Report the coefficients of the
regression equation and list the variables that are missing not at
random.
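For the "one at a time" route in Question 2, a base R sketch might look like the following. It runs on simulated data with hypothetical column names, and it makes two assumptions that the assignment does not state: the indicator regressions are logistic, and missing predictor values are set to 0 so that every case stays in the regressions. It does not use the MICE software.

```r
# Toy stand-in for the assignment data; every column name is hypothetical.
set.seed(3)
n <- 300
dat <- data.frame(
  infectious  = rnorm(n),
  circulatory = rnorm(n),
  respiratory = rnorm(n),
  digestive   = rnorm(n)
)
# Make missingness in circulatory depend on respiratory (illustration only).
dat$circulatory[runif(n) < plogis(dat$respiratory - 1)] <- NA
dat$digestive[sample(n, 50)] <- NA

predictors <- c("circulatory", "respiratory", "digestive")
with_na <- predictors[colSums(is.na(dat[predictors])) > 0]

# Step 1: one 0/1 indicator per predictor that has missing values.
for (v in with_na) dat[[paste0(v, "_missing")]] <- as.integer(is.na(dat[[v]]))
indicators <- paste0(with_na, "_missing")

# Assumption: missing predictor values are set to 0 so every case stays in.
filled <- dat
for (v in predictors) filled[[v]][is.na(filled[[v]])] <- 0

# Step 2: regress each indicator on the other predictors, one at a time.
indicator_fits <- lapply(with_na, function(v) {
  others <- setdiff(predictors, v)
  glm(reformulate(others, response = paste0(v, "_missing")),
      family = binomial, data = filled)
})
names(indicator_fits) <- indicators
lapply(indicator_fits, coef)

# Step 3: regress the outcome on the predictors plus the indicators.
fit <- lm(reformulate(c(predictors, indicators), response = "infectious"),
          data = filled)
nobs(fit)
summary(fit)$r.squared
summary(fit)$coefficients[indicators, , drop = FALSE]
```

In the second step, a coefficient clearly different from zero suggests that missingness depends on other reported variables. In the third step, an indicator with a statistically significant coefficient is one signal that the variable may be missing not at random.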
More
For additional information (not part of the required reading), please see the following links:
- Introduction to regression by others (YouTube, Slides)
- Regression using R (Read)
- Statistical learning with R (Read)
- Open introduction to statistics (Read)
This page is part of the HAP 819 course on Advanced Statistics and was
organized by Farrokh Alemi, PhD.