Ordinary Regression Fitting Models

# Hierarchical Regression Model Building

## Overview

This section provides more detail on constructing regression models.

## Learning Objectives

After completing the activities in this module, you should be able to:

• Examine whether a combination of variables is a more accurate predictor than each variable by itself.
• Distinguish among several methods of building a multiple linear regression model, including models with interaction terms.
• Interpret findings from statistical output pertaining to a regression model-building technique.

## Lecture

• Statistical Analysis of Electronic Health Records by Farrokh Alemi, 2020, pages 274 through 281 Slides► Video►
• OpenIntro's model-building lecture YouTube►
• Coefficient of Determination using R software Read►
• Model selection using R software Read►

## Assignments

Assignments should be submitted in Blackboard. The submission must have a summary statement, with one statement per question. All assignments should be done in R if possible.

Question 1: The following data provide a large number of factors that affect COVID-19 vaccination rates in counties in the United States. Use hierarchical modeling to see which subset of factors explains the largest portion of variance in the Complete Series Vaccination rate.

1. First, explain variation in Complete Series Vaccination rates by demographics (including age, race, gender) of the county's residents. Report the percent of variation explained.
2. Explain variation in Complete Series Vaccination rates by demographics (age, race, gender) and social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, and median household income). Report the percent of variation explained.
3. Explain variation in Complete Series Vaccination rates by demographics (age, race, gender), social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, median household income), and health of residents (including percent population disabled, life expectancy, percent population having premature morbidity). Report the percent of variation explained.
4. Explain variation in Complete Series Vaccination rates by demographics (age, race, gender), social determinants (including high school completion rate, percent not proficient in English, percent employed, percent of children in poverty, median household income), health of residents (including percent population disabled, life expectancy, percent population having premature morbidity), and political leaning of the population (including Republican leaning, Democratic leaning). Report the percent of variation explained.
5. Does a county's political leaning affect vaccination rates?
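The block-by-block procedure in steps 1 through 4 can be sketched as follows. This is only an illustration of the logic, written in Python on synthetic data; the assignment itself asks for R. The column names (`pct_over_65`, `pct_hs_grad`, `life_expectancy`, `pct_republican`), the one-variable-per-block layout, and the simulated coefficients are all hypothetical stand-ins, not the variables or results of the course data set.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(rows, y, predictors):
    """Fit ordinary least squares with an intercept and return R^2."""
    X = [[1.0] + [row[p] for p in predictors] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic, standardized "county" data (hypothetical; not the course data).
rng = random.Random(0)
names = ["pct_over_65", "pct_hs_grad", "life_expectancy", "pct_republican"]
rows = [{n: rng.gauss(0, 1) for n in names} for _ in range(300)]
y = [2.0 * r["pct_over_65"] + 1.5 * r["pct_hs_grad"] + 1.0 * r["life_expectancy"]
     - 2.5 * r["pct_republican"] + rng.gauss(0, 1) for r in rows]

# Blocks are entered in a fixed order; each model keeps all earlier blocks.
blocks = [
    ("demographics", ["pct_over_65"]),
    ("social determinants", ["pct_hs_grad"]),
    ("health", ["life_expectancy"]),
    ("political leaning", ["pct_republican"]),
]

results = []   # (block label, cumulative R^2, gain over previous model)
included = []
previous = 0.0
for label, cols in blocks:
    included = included + cols
    r2 = r_squared(rows, y, included)
    results.append((label, r2, r2 - previous))
    previous = r2

for label, r2, gain in results:
    print(f"+ {label:20s} R^2 = {r2:.3f}  (gain {gain:+.3f})")
```

Because each model contains every predictor of the one before it, R² cannot decrease from one step to the next. The gain at the last step is the share of variation attributable to political leaning after the earlier blocks are accounted for, which is one way to approach step 5.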

Resources for Question 1:

Question 2: The following data provide a large number of factors that affect the diabetes rate in counties in the United States. Use hierarchical modeling to see which subset of factors explains the largest portion of variance in the rate of diabetes in the county.

1. Using only independent variables measured in 2015, predict the incidence of diabetes in the county. Report the percent of variation explained.
2. Using only independent variables measured in 2016, predict the incidence of diabetes in the county. Report the percent of variation explained.
3. Using both the independent variables measured in 2015 and those measured in 2016, predict the incidence of diabetes in the county. Report the percent of variation explained.
4. List variables that have an impact on the incidence of diabetes within a year.
5. List variables that have an impact on the incidence of diabetes within 2 years.
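When the three models in steps 1 through 3 are compared, the combined model has more predictors than either single-year model, so raw R² alone favors it. A minimal sketch of two standard adjustments is given below: adjusted R², and the F statistic for the change in R² when a block is added to a nested model. The sample size, predictor counts, and R² values are made-up placeholders chosen only to show the arithmetic; they are not results from the assignment data.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (plus intercept) fit on n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def f_change(r2_reduced, r2_full, n, p_full, q):
    """F statistic for adding q predictors to a nested model.

    p_full is the number of predictors in the larger model; the statistic
    has q and (n - p_full - 1) degrees of freedom.
    """
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - p_full - 1))

# Placeholder inputs (hypothetical): 500 counties, 5 predictors per year.
n = 500
models = {
    "2015 only": (0.40, 5),
    "2016 only": (0.42, 5),
    "2015 + 2016": (0.46, 10),
}

adj = {name: adjusted_r2(r2, n, p) for name, (r2, p) in models.items()}

# The 2015-only model is nested in the combined model, so the change in R^2
# from adding the five 2016 variables can be tested with an F statistic.
f_2016_added = f_change(0.40, 0.46, n, p_full=10, q=5)

for name, value in adj.items():
    print(f"{name:12s} adjusted R^2 = {value:.4f}")
print(f"F for adding the 2016 block = {f_2016_added:.2f}")
```

The 2015-only and 2016-only models are not nested in each other, so the F statistic applies only to comparisons between a single-year model and the combined model.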

Resources for Question 2: