George Mason University HAP 719 Advanced Statistics

HAP 719: Advanced Statistics I

Hierarchical Regression Model Building  

 

Overview

This section provides more detail on constructing regression models.

Learning Objectives

After completing the activities in this module, you should be able to:

  • Examine whether a combination of variables is a more accurate predictor than each variable by itself.
  • Distinguish among several methods of building a multiple linear regression model, including models with interaction terms.
  • Interpret findings from statistical outputs pertaining to a regression model building technique.

Lecture

  • Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020, pages 274 through 281 Slides► Video►
  • OpenIntro's model building lecture YouTube►
  • Coefficient of Determination using R software Read►
  • Model selection using R software Read►

Assignments

Assignments should be submitted in Blackboard. The submission must have a summary statement, with one statement per question. All assignments should be done in R if possible.

Question 1: The following data provide a large number of factors that affect COVID-19 vaccination rates in counties in the United States. Use hierarchical modeling to see which subset of factors explains the largest portion of variance in the Complete Series Vaccination rate.

Social-determinants,-political-leaning,-and-vaccination-hesitancy

  1. Initially explain variation in Complete Series Vaccination rates by demographics (including age, race, gender) of the county's residents. Report the percent of variation explained.
  2. Explain variation in Complete Series Vaccination rates by demographics (age, race, gender) and social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, and median household income). Report the percent of variation explained.
  3. Explain variation in Complete Series Vaccination rates by demographics (age, race, gender), social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, median household income), and health of residents (including percent population disabled, life expectancy, percent population having premature morbidity). Report the percent of variation explained.
  4. Explain variation in Complete Series Vaccination rates by demographics (age, race, gender), social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, median household income), health of residents (including percent population disabled, life expectancy, percent population having premature morbidity), and political leaning of the population (including Republican leaning, Democrat leaning). Report the percent of variation explained.
  5. Does a county's political leaning affect vaccination rates?
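
The steps in Question 1 add one block of predictors at a time and report R-squared after each block. The following is a minimal sketch of that pattern, not the course's solution. It is written in plain Python so it runs without extra packages; in R the same steps would typically use `lm()`, `summary()`, and `anova()` on nested models. The data are randomly generated, and every variable name (`pct_over_65`, `median_income`, `life_expectancy`, `pct_republican`, `vaccination_rate`) is a hypothetical stand-in with one variable per block, not a column from the assignment file.

```python
import random


def fit_r2(columns, y):
    """Fit ordinary least squares with an intercept and return R-squared."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic county data (assumption: made up for illustration only).
random.seed(0)
n = 300


def draw():
    return [random.gauss(0, 1) for _ in range(n)]


pct_over_65, median_income, life_expectancy, pct_republican = (
    draw(), draw(), draw(), draw())
error = draw()
vaccination_rate = [
    0.3 * a + 0.5 * s + 0.2 * h - 0.6 * r + e
    for a, s, h, r, e in zip(pct_over_65, median_income,
                             life_expectancy, pct_republican, error)
]

# Blocks are entered in a fixed, theory-driven order.
blocks = [
    ("demographics", [pct_over_65]),
    ("social determinants", [median_income]),
    ("health", [life_expectancy]),
    ("political leaning", [pct_republican]),
]

predictors = []
r2_by_step = []
for name, cols in blocks:
    predictors = predictors + cols
    r2_by_step.append(fit_r2(predictors, vaccination_rate))
    print(f"after adding {name}: R-squared = {r2_by_step[-1]:.3f}")

# F statistic for the R-squared change when the last block is added.
q = len(blocks[-1][1])   # number of predictors added in the last block
p_full = len(predictors)  # predictors in the full model, excluding intercept
r2_prev, r2_full = r2_by_step[-2], r2_by_step[-1]
f_change = ((r2_full - r2_prev) / q) / ((1.0 - r2_full) / (n - p_full - 1))
print(f"F for political-leaning block: {f_change:.2f} on {q} and {n - p_full - 1} df")
```

Because each model contains all predictors of the previous one, in-sample R-squared cannot decrease from step to step. The increase at each step, together with the F statistic for that increase, indicates how much the newly added block explains beyond the blocks already in the model.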

Resources for Question 1:

Question 2: The following data provide a large number of factors that affect diabetes rates in counties in the United States. Use hierarchical modeling to see which subset of factors explains the largest portion of variance in the rate of diabetes in the county.

Network-model-of-diabetes-over-two-years.

  1. Using only independent variables measured in 2015, predict incidence of diabetes in the county. Report the percent of variation explained.
  2. Using only independent variables measured in 2016, predict incidence of diabetes in the county. Report the percent of variation explained.
  3. Using both independent variables measured in 2015 and independent variables measured in 2016, predict incidence of diabetes in the county. Report the percent of variation explained.
  4. List variables that have an impact on incidence of diabetes within a year.
  5. List variables that have an impact on incidence of diabetes within 2 years.
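
The 2015-only and 2016-only models in Question 2 are not nested in each other, and the combined model has more predictors than either one. Adjusted R-squared is one common way to compare such models, since it penalizes the number of predictors. The sketch below applies the standard formula. The R-squared values, predictor counts, and number of counties are hypothetical placeholders, not results from the assignment data.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical placeholders: (R-squared, number of predictors) per model.
models = {
    "2015 only": (0.40, 8),
    "2016 only": (0.42, 8),
    "2015 and 2016": (0.45, 16),
}
n_counties = 3000  # assumed sample size, for illustration only

adjusted = {
    name: adjusted_r2(r2, n_counties, k) for name, (r2, k) in models.items()
}
for name, value in adjusted.items():
    print(f"{name}: adjusted R-squared = {value:.4f}")
```

With thousands of counties and a few dozen predictors the penalty is small, so raw and adjusted R-squared will be close. The gap widens as the number of predictors grows relative to the number of counties.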

Resources for Question 2:

More

For additional information (not part of the required reading), please see the following links:

  1. Introduction to regression by others YouTube► Slides►
  2. Regression using R Read►
  3. Statistical learning with R Read►
  4. Open introduction to statistics Read►

This page is part of the HAP 719 course on Advanced Statistics and was organized by Farrokh Alemi PhD Home►  Email►