# Ordinary Regression Missing Values

## Learning Objectives

After completing the activities in this module, you should be able to:

• Adjust for missing values in the dependent and independent data
• Determine if data is missing at random
• Check the accuracy of mean imputation
• Check the accuracy of assuming that a missing value in electronic health records (EHRs) indicates absence of disease
• Use a series of regressions (structural equation models) to predict the value of missing variables

## Assignments

Assignments should be submitted in Blackboard. The submission must have a summary statement, with one statement per question. All assignments should be done in R if possible.

Question 1: Regress progression in Infectious and Parasite body system on all other variables (except diabetes). In the attached data, the variables indicate incidence of diabetes (a binary variable) and progression of diseases in body systems. You can run the analysis on a 10% sample first, before running it on the entire data set, which may take several hours.

1. Remove all instances where progression in Infectious and Parasite body system is missing. Report the number of cases that remain.
2. Plot the progression in Infectious and Parasite body system. Is the data bimodal? Is the data symmetric around the mean? Transform the data to improve the Q-Q plot.
3. Include in your analysis all pairwise and three-way interactions of the independent variables. Exclude any variable or interaction term that is always missing. What is the total number of independent variables included in the regression?
4. Impute missing independent variables from other variables that are present. Regress progression in Infectious and Parasite body system on the independent variables and report the percent of variation explained.
5. Assume that missing independent variables indicate the patient does not have any disease in the body system (i.e., a score of 0). Regress progression in Infectious and Parasite body system on the independent variables and report the percent of variation explained.
6. Assume the value of a missing independent variable can be replaced by the average value of that variable. Regress progression in Infectious and Parasite body system on the independent variables and report the percent of variation explained.
7. Indicate which method of replacing missing values fits the data best.
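
The three replacement strategies in steps 4 through 6 can be compared side by side. The following is a minimal sketch on simulated data, not the assigned solution: the data frame `d`, the column names (`circulatory`, `respiratory`, `digestive`, `infectious`), and the helper `pct_explained` are all hypothetical stand-ins for the course data, and a single regression on the other predictors is only one possible way to "impute from other variables that are present".

```r
# Simulated stand-in for the course data; every column name here is hypothetical.
set.seed(1)
n <- 500
d <- data.frame(circulatory = rnorm(n), respiratory = rnorm(n))
d$digestive  <- 0.6 * d$circulatory + rnorm(n, sd = 0.5)
d$infectious <- 1 + 0.5 * d$circulatory + 0.8 * d$digestive + rnorm(n)
d$digestive[sample(n, 150)] <- NA        # make 30% of one predictor missing

# Hypothetical helper: percent of variation explained (100 * R-squared)
pct_explained <- function(data) {
  fit <- lm(infectious ~ circulatory + respiratory + digestive, data = data)
  100 * summary(fit)$r.squared
}

miss <- is.na(d$digestive)

# Step 4: impute from the other predictors with a regression
# (lm drops the rows where digestive is missing when fitting)
fit_imp <- lm(digestive ~ circulatory + respiratory, data = d)
d_reg <- d
d_reg$digestive[miss] <- predict(fit_imp, newdata = d[miss, ])

# Step 5: treat missing as absence of disease (score of 0)
d_zero <- d
d_zero$digestive[miss] <- 0

# Step 6: replace missing values with the observed mean
d_mean <- d
d_mean$digestive[miss] <- mean(d$digestive, na.rm = TRUE)

results <- c(regression = pct_explained(d_reg),
             zero       = pct_explained(d_zero),
             mean       = pct_explained(d_mean))
print(round(results, 1))
```

With the real data, the same pattern would be repeated for every independent variable that has missing values, and the strategy with the highest percent of variation explained is the candidate answer for step 7.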

Question 2: Consider the regression of progression in Infectious and Parasite body system on all other variables (except diabetes). In the attached data, the variables indicate incidence of diabetes (a binary variable) and progression of diseases in body systems. You can run the analysis on a 10% sample first, before running it on the entire data set, which may take several hours.

1. For each variable, create a binary indicator that is 1 whenever the variable is missing and 0 otherwise. Predict progression in Infectious and Parasite body system from these binary indicators and report whether any of them is statistically significant. List the variables that are not missing at random. Variables that are not missing at random have a statistically significant relationship to the response (outcome) variable.
2. Replace missing values using MICE (multivariate imputation by chained equations). Report the percent of variation explained before and after the MICE adjustments.
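
Both steps of Question 2 can be sketched on simulated data. This is an illustration under stated assumptions, not the assigned solution: the data frame `d` and its column names are hypothetical, missingness in `digestive` is deliberately tied to the outcome so that the indicator test has something to detect, and averaging R-squared across the completed data sets is a simplification (the `mice` package also provides `pool.r.squared` for pooling).

```r
# Simulated stand-in for the course data; every column name here is hypothetical.
set.seed(2)
n <- 400
d <- data.frame(circulatory = rnorm(n), digestive = rnorm(n))
d$infectious <- 1 + 0.5 * d$circulatory + 0.8 * d$digestive + rnorm(n)
# digestive is more likely to be missing when the outcome is high
d$digestive[runif(n) < plogis(d$infectious - 2)] <- NA
# circulatory is missing completely at random
d$circulatory[sample(n, 60)] <- NA

# Step 1: binary indicators (1 = missing, 0 = observed), then regress the outcome on them
predictors <- c("circulatory", "digestive")
ind <- as.data.frame(lapply(d[predictors], function(x) as.integer(is.na(x))))
names(ind) <- paste0(predictors, "_missing")
ind$infectious <- d$infectious
ind_fit  <- lm(infectious ~ ., data = ind)
p_values <- summary(ind_fit)$coefficients[-1, "Pr(>|t|)"]   # drop the intercept row
print(p_values)   # small p-values flag variables that are not missing at random

# Step 2: percent of variation explained before and after MICE
# (runs only if the mice package is installed)
if (requireNamespace("mice", quietly = TRUE)) {
  # "Before": lm uses complete cases only
  r2_before <- summary(lm(infectious ~ circulatory + digestive, data = d))$r.squared
  imp <- mice::mice(d, m = 5, method = "pmm", seed = 2, printFlag = FALSE)
  # "After": fit the model on each completed data set and average R-squared (a simplification)
  r2_each <- sapply(1:5, function(i) {
    summary(lm(infectious ~ circulatory + digestive,
               data = mice::complete(imp, i)))$r.squared
  })
  r2_after <- mean(r2_each)
  print(round(100 * c(before = r2_before, after = r2_after), 1))
}
```

On the course data, the indicator regression would include one indicator per independent variable, and any indicator that is constant (never or always missing) would need to be dropped before fitting.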