HAP 719: Advanced Statistics I

Analysis of Co-Variance & Covariate Balancing

Objectives

After completing the activities in this module, you should be able to:

Analyze data using ANCOVA
Analyze data after balancing covariates
Interpret statistical outputs as they relate to these techniques

Lecture

Yili's lecture on ANCOVA Slides►
Alemi's lecture on covariate balancing PubMed► Slides► YouTube► Video►
Analysis of covariance Video► Slides►
Stratified Covariate balancing R Code► R Package► Slides► YouTube►
Debapriya's Teach One YouTube► SAS Code►
Application of covariate balancing to lung cancer SQL►
Stratified covariate balancing in high dimensional data Read►
Lee's improving overlap through folding back Slides► SQL►

Assignments

Assignments are submitted on Blackboard and graded pass/fail. Include a one-page Word document summarizing your work; in the summary, state whether you were able to get the same answers as those provided. Include your R, STATA, or Python code in separate files. Late assignments are not accepted. You may help each other with the assignments, but you may not copy and paste the work of others. You may use ChatGPT or other large language models to generate the R code, but you must be transparent about it and report its use.

The data set you will be using for this project is a subsample of a larger data set, the 2011/2012 National Survey of Children’s Health (NSCH), which was conducted by the Centers for Disease Control and Prevention (CDC) and the National Center for Health Statistics. The survey included around 95,000 children aged 0 to 17 years. Its purpose was to measure children’s health status, insurance coverage, parental health, and several other characteristics.

Background information Read►
2021 Data Download STATA►
List of variable names and corresponding response options Screener Variables► Topical Variables►

Question 1: The following data provide the length of stay of patients seen by Dr. Smith (variable Dr. Smith = 1) and by his peer group (variable Dr. Smith = 0).

Balance the data through stratified covariate balancing. Show graphically that the weighting procedure of stratified covariate balancing results in the same number of each type of patient treated by Dr. Smith and by his peers. Switch the tree structure of the peer group (but not the length of stay) with Dr. Smith's tree. This could be the output from the R package on covariate balancing.
Report the un-confounded impact of Dr. Smith on length of stay using the common odds ratio of having above average length of stay.
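The weighting step in Question 1 can be illustrated with a minimal Python sketch. It assumes that each stratum is one combination of patient characteristics and that every peer-group patient in a stratum receives the weight (number of Dr. Smith's patients in the stratum) / (number of peer patients in the stratum). The function name, the stratum labels, and the patient counts are hypothetical; the lecture materials and the R package remain the reference for the actual procedure.

```python
from collections import Counter

def stratum_weights(records):
    """Return {stratum: weight applied to each peer patient in that stratum}.

    records: list of (stratum, dr_smith) pairs, dr_smith is 1 or 0.
    Assumed weight: count of Dr. Smith's patients / count of peer patients.
    """
    smith = Counter(s for s, t in records if t == 1)
    peers = Counter(s for s, t in records if t == 0)
    # Strata seen by only one group have no overlap and are left out.
    return {s: smith[s] / peers[s] for s in smith if peers[s] > 0}

# Hypothetical patients: (stratum label, Dr. Smith indicator)
records = (
    [("diabetic", 1)] * 3 + [("diabetic", 0)] * 6
    + [("non-diabetic", 1)] * 2 + [("non-diabetic", 0)] * 1
)

weights = stratum_weights(records)
peer_counts = Counter(s for s, t in records if t == 0)
smith_counts = Counter(s for s, t in records if t == 1)

# After weighting, the peer group has Dr. Smith's case mix.
weighted_peer_counts = {s: weights[s] * peer_counts[s] for s in weights}
print(weighted_peer_counts)
print(dict(smith_counts))
```

In this toy example the weighted peer counts match Dr. Smith's counts in every stratum, which is the balance that the requested graph is meant to show.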

Common odds ratio SQL►
Data Download►
Bindu's Python Teach One YouTube►
Joseph's Python Teach One YouTube►
Weighted Length of Stay SQL►
Sowmya Chakravarthy's Answer► R code►
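The common odds ratio requested in Question 1 can be sketched as follows, assuming the Mantel-Haenszel estimator is intended; the linked SQL may compute it differently. Each stratum contributes a 2x2 table of counts (a, b, c, d): Dr. Smith's patients with and without above-average length of stay, then peer patients with and without above-average length of stay. The function name and the counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(tables):
    """Common odds ratio across strata (Mantel-Haenszel estimator).

    tables: iterable of (a, b, c, d) counts per stratum, where
      a = Dr. Smith, above-average stay   b = Dr. Smith, not above average
      c = peers, above-average stay       d = peers, not above average
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("common odds ratio is undefined for these tables")
    return numerator / denominator

# Hypothetical counts for two strata
tables = [(4, 6, 2, 8), (6, 4, 4, 6)]
common_or = mantel_haenszel_odds_ratio(tables)
print(round(common_or, 2))
```

A value above 1 would suggest that, within strata of similar patients, Dr. Smith's patients have higher odds of an above-average stay than the peer group's patients.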

Question 2: The following data show the grades students received while working in teams. Estimate what the grade of each student should be, controlling for whom they teamed up with. Here is how you can proceed. The grade for each student is a weighted average of their project grades. The project grades are weighted by each team member's likelihood of increasing the team's grade. For example, to calculate whether Student B increases Student A's grade, count the number of projects in which Student B was present and the grade was above the average of all project grades. Then divide this count by the total number of projects that Student B worked on:

Count teams where Student B was present and grade was above average of 83.75: 2 (project 1 and 2)
Count number of teams where Student B was present: 3 (project 1, 2, and 6)
Probability = (Number of teams with above average grade) / (Total Number of teams)
Probability = 2/3 = 0.67
Team score: 100

Count projects where Student C was present and project's grade was above average grade of 83.75: 3
Count projects where Student C was present: 4
Probability that Student C will increase the grade of the team: 3/4 = 0.75
Team score: 90

Count projects where Student D was present and project's grade was above average of 83.75: 2
Count number of teams where Student D was present: 4
Probability that Student D will increase the grade of the team: 2/4 = 0.5
Team score: 60
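The counting rule used in the three examples above can be sketched in Python. The project list below is hypothetical and is not the assignment data, so its average and probabilities differ from the values shown above; the function name is also an assumption.

```python
def probability_above_average(projects, student):
    """Share of a student's projects whose grade exceeds the overall average.

    projects: list of (members, grade) pairs, members is a set of names.
    """
    overall_average = sum(grade for _, grade in projects) / len(projects)
    grades = [grade for members, grade in projects if student in members]
    if not grades:
        raise ValueError(f"{student} did not take part in any project")
    above = sum(1 for grade in grades if grade > overall_average)
    return above / len(grades)

# Hypothetical projects: (team members, project grade)
projects = [
    ({"A", "B"}, 95),
    ({"A", "C"}, 85),
    ({"B", "C"}, 70),
    ({"A", "B"}, 90),
]

p_b = probability_above_average(projects, "B")  # 2 of B's 3 projects beat the average of 85
p_c = probability_above_average(projects, "C")  # none of C's 2 projects beat the average
print(round(p_b, 2), p_c)
```

Note that a grade equal to the average is not counted as above average in this sketch.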

A separate analysis must be done for each student. Here is an example of the inverse propensity weighted average score for Student A:

Inverse Weighted average

Calculate the inverse propensity weighted grades for all students.
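As a rough illustration of the calculation for Student A, the sketch below assumes that the three team scores listed above (100, 90, and 60) are Student A's project grades with Students B, C, and D, and that each grade is weighted by the inverse of the teammate's probability of increasing the team's grade. Both the pairing and the normalization by the sum of weights are assumptions; check them against the worked example provided with the assignment.

```python
def inverse_weighted_average(scores, probabilities):
    """Average of scores, each weighted by 1 / probability (assumed scheme)."""
    weights = [1 / p for p in probabilities]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Values taken from the examples above: teammates B, C, D
team_scores = [100, 90, 60]
teammate_probabilities = [2 / 3, 3 / 4, 2 / 4]

adjusted_grade = inverse_weighted_average(team_scores, teammate_probabilities)
plain_average = sum(team_scores) / len(team_scores)
print(round(adjusted_grade, 2), round(plain_average, 2))  # 80.69 83.33
```

Under these assumptions the grade earned with the weakest teammate counts most, because a low probability produces a large weight.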

Data Download►
Vladimir Cardenas's Answer► R-Code►(password protected)
Sahila Qureshi's Teach One YouTube►

Question 3: The following data provide the relationship between Y and variables A, B, C, and D. Create an interaction plot to examine the interactions between variable A and each of the variables B, C, and D.

Data Download►
How to create an interaction plot? Read►
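An interaction plot for Question 3 shows the mean of Y at each level of one variable, with a separate line for each level of A. A minimal sketch of the numbers behind such a plot follows; the rows are hypothetical, the variables are assumed to be categorical, and the function name is not from the course materials.

```python
from collections import defaultdict

def cell_means(rows, factor, by="A", outcome="Y"):
    """Mean outcome for every (level of `by`, level of `factor`) cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of Y, count]
    for row in rows:
        cell = (row[by], row[factor])
        totals[cell][0] += row[outcome]
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}

# Hypothetical rows with binary A and B
rows = [
    {"A": 0, "B": 0, "Y": 10}, {"A": 0, "B": 0, "Y": 12},
    {"A": 0, "B": 1, "Y": 12},
    {"A": 1, "B": 0, "Y": 11},
    {"A": 1, "B": 1, "Y": 20}, {"A": 1, "B": 1, "Y": 22},
]

means = cell_means(rows, "B")
# Change in mean Y when B goes from 0 to 1, at each level of A
effect_when_a0 = means[(0, 1)] - means[(0, 0)]
effect_when_a1 = means[(1, 1)] - means[(1, 0)]
print(effect_when_a0, effect_when_a1)
```

Plotting these cell means with B on the horizontal axis and one line per level of A gives the interaction plot. Lines that are far from parallel, as in this toy example, suggest an interaction between A and B; the same steps apply to C and D.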

Visual display of one-way ANOVA PubMed►
Chapter 5 ANOVA Fixed Effect Models Open Book►
