HAP 719: Advanced Statistics I

Hierarchical Regression Model Building

Hierarchical modeling
Generated by ChatGPT

Overview

This module covers the construction of regression models, a critical aspect of statistical analysis. You will examine whether a combination of variables predicts an outcome more accurately than each variable does individually, and you will distinguish among several methods of building a multiple linear regression model, including models with interaction terms. By interpreting statistical output from these techniques, you will learn how regression models are built and how to use them to make informed decisions based on your analyses.

Learning Objectives

After completing the activities in this module, you should be able to:

Examine whether a combination of variables is a more accurate predictor than each variable by itself.
Distinguish among several methods of building a multiple linear regression model, including models with interaction terms.
Interpret findings from statistical outputs pertaining to a regression model building technique.
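
The first two objectives can be illustrated with a minimal R sketch. It is not taken from the course materials: the data are simulated, and the names x1, x2, y, and dat are hypothetical. It fits each predictor alone, both together, and both with an interaction term, then compares the share of variance explained (R-squared).

```r
# Simulated data for illustration only; x1, x2, and y are hypothetical names.
set.seed(719)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.8 * x2 + 0.6 * x1 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Each predictor alone, both together, and both plus their interaction.
m_x1   <- lm(y ~ x1, data = dat)
m_x2   <- lm(y ~ x2, data = dat)
m_both <- lm(y ~ x1 + x2, data = dat)
m_int  <- lm(y ~ x1 * x2, data = dat)   # expands to x1 + x2 + x1:x2

# Proportion of variance explained by each model.
r2 <- sapply(
  list(x1_only = m_x1, x2_only = m_x2, both = m_both, interaction = m_int),
  function(m) summary(m)$r.squared
)
print(round(r2, 3))

# F-test: does the interaction term add explanatory power beyond the main effects?
print(anova(m_both, m_int))
```

Because R-squared never decreases when predictors are added to a nested model, the F-test from anova() (or adjusted R-squared) is the more informative check on whether the added term is worth keeping.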

Lecture

AI assisted content, image, or video

Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020, pages 274 through 281 Slides► Video►
Coefficient of Determination using R software Read►
Model selection using R software Read►
Yili Lin's lecture on model building and selection Slides► Video 1► Video 2►
Yili Lin's Model building using R Slides► Video►

Assignments

Assignments should be submitted in Blackboard. The submission must have a summary statement, with one statement per question. All assignments should be done in R if possible.

Question 1: The following data provide a large number of factors that affect COVID-19 vaccination rates in counties in the United States. Use hierarchical modeling to see which subset of factors explains the largest portion of variance in the Complete Series Vaccination rate.

Social-determinants,-political-leaning,-and-vaccination-hesitancy

Initially, explain variation in Complete Series Vaccination rates by demographics (including age, race, gender) of the county's residents. Report the percent of variation explained.
Explain variation in Complete Series Vaccination rates by demographics (age, race, gender) and social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, and median household income). Report the percent of variation explained.
Explain variation in Complete Series Vaccination rates by demographics (age, race, gender), social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, median household income), and health of residents (including percent population disabled, life expectancy, percent population having premature morbidity). Report the percent of variation explained.
Explain variation in Complete Series Vaccination rates by demographics (age, race, gender), social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, median household income), health of residents (including percent population disabled, life expectancy, percent population having premature morbidity), and political leaning of the population (including Republican leaning, Democrat leaning). Report the percent of variation explained.
Does a county's political leaning affect vaccination rates?
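
The block-by-block procedure in the steps above can be sketched in R as follows. This is an illustration under stated assumptions, not the posted answer: the data frame is simulated, and every column name (pct_over_65, hs_completion, pct_republican, vax_rate, and so on) is a hypothetical stand-in for the variables in the downloaded data set.

```r
# Simulated county data for illustration only; all column names are hypothetical.
set.seed(1)
n <- 300
county <- data.frame(
  pct_over_65     = rnorm(n),
  pct_female      = rnorm(n),
  hs_completion   = rnorm(n),
  median_income   = rnorm(n),
  life_expectancy = rnorm(n),
  pct_republican  = rnorm(n)
)
county$vax_rate <- with(
  county,
  50 + 2 * pct_over_65 + 3 * hs_completion + 1.5 * life_expectancy -
    4 * pct_republican + rnorm(n, sd = 5)
)

# Predictor blocks, listed in the order they enter the model.
blocks <- list(
  demographics = c("pct_over_65", "pct_female"),
  social       = c("hs_completion", "median_income"),
  health       = c("life_expectancy"),
  political    = c("pct_republican")
)

# Fit one model per step; each step keeps all earlier blocks and adds one more.
models <- list()
predictors <- character(0)
for (b in names(blocks)) {
  predictors <- c(predictors, blocks[[b]])
  models[[b]] <- lm(reformulate(predictors, response = "vax_rate"), data = county)
}

# R-squared at each step and the increment contributed by the newly added block.
r2 <- sapply(models, function(m) summary(m)$r.squared)
result <- data.frame(
  block_added     = names(blocks),
  r_squared       = unname(r2),
  delta_r_squared = unname(c(r2[1], diff(r2)))
)
print(result, digits = 3)

# Sequential F-tests: does each added block improve on the previous model?
print(anova(models$demographics, models$social, models$health, models$political))
```

In this layout, the last row of the anova() table tests whether the political-leaning block explains additional variance once demographics, social determinants, and health are already in the model. That is evidence of association after adjustment, and it is one way to approach the final step of the question.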

Resources for Question 1:

Data Download►
Yili Lin's Answer► R-code►
S B Yuvaraj Rejeti's Teach One YouTube►
Nahida Farheen Shaik's Teach One YouTube►

Question 2: The following data provide a large number of factors that affect diabetes rates in counties in the United States. Use hierarchical modeling to see which subset of factors explains the largest portion of variance in the rate of diabetes in the county.

Network-model-of-diabetes-over-two-years.

Using only independent variables measured in 2015, predict incidence of diabetes in the county. Report the percent of variation explained.
Using only independent variables measured in 2016, predict incidence of diabetes in the county. Report the percent of variation explained.
Using both independent variables measured in 2015 and independent variables measured in 2016, predict incidence of diabetes in the county. Report the percent of variation explained.
List variables that have an impact on incidence of diabetes within a year.
List variables that have an impact on incidence of diabetes within 2 years.
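
One possible way to organize these steps in R is sketched below. It rests on several assumptions that may not match the actual data set: the data are simulated, the column names are hypothetical, predictors are assumed to carry a _2015 or _2016 suffix, and a 0.05 significance threshold is used to flag variables.

```r
# Simulated data for illustration only; column names and suffixes are hypothetical.
set.seed(2)
n <- 250
d <- data.frame(
  obesity_2015    = rnorm(n),
  inactivity_2015 = rnorm(n),
  obesity_2016    = rnorm(n),
  inactivity_2016 = rnorm(n)
)
d$diabetes <- with(
  d,
  9 + 0.8 * obesity_2015 + 1.2 * obesity_2016 + 0.5 * inactivity_2016 + rnorm(n)
)

# Select predictors by measurement year (assumes a year suffix on each name).
vars_2015 <- grep("_2015$", names(d), value = TRUE)
vars_2016 <- grep("_2016$", names(d), value = TRUE)

# Helper: regress diabetes on a given set of predictor names.
fit <- function(vars) lm(reformulate(vars, response = "diabetes"), data = d)
m_2015 <- fit(vars_2015)
m_2016 <- fit(vars_2016)
m_both <- fit(c(vars_2015, vars_2016))

# Percent of variation explained by each model.
r2 <- c(
  only_2015 = summary(m_2015)$r.squared,
  only_2016 = summary(m_2016)$r.squared,
  both      = summary(m_both)$r.squared
)
print(round(100 * r2, 1))

# Predictors in the combined model with p < 0.05 (threshold is an assumption).
coefs <- summary(m_both)$coefficients
significant <- setdiff(rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05], "(Intercept)")
print(significant)
```

Splitting the flagged names by suffix then separates variables measured in the earlier year from those measured in the later year, which is one way to approach the last two steps.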

Resources for Question 2:

Data Download►
Sowmya Chakravarthy's Answer► R-code►
Kai Muniz's Teach One YouTube►

For additional information (not part of the required reading), please see the following links:

Open Intro's model building lecture YouTube►
Introduction to regression YouTube► Slides►

This page is part of the HAP 719 course on Advanced Statistics I and was organized by Farrokh Alemi PhD Home► Email►