Learning Objectives
- Analyze data by searching for key cases within the database
- Detect interactions among variables
- Estimate regression parameters for massive data without matrix manipulations
Lecture
Assignments
Question 1: In the following, estimate the regression equations from a plot of the data across strata. On the X axis, each number refers to a unique combination of diseases in the patient's medical history, so the X axis is a variable that captures the patients' prognosis based on their medical history. The Y axis shows the probability of mortality. Three lines are plotted. The line with diamonds shows the probability of mortality for each of the strata on the X axis. The line with squares shows the probability of mortality for each stratum combined with stomach cancer. The dashed line shows the average probability of stomach cancer across all strata.
Answer the following questions:
Answer the following questions?
- What are the approximate odds of mortality from stomach cancer? This estimate includes the average effect of comorbidities and cancer.
- For patients without cancer, which set of comorbidities has the highest risk of mortality?
- Does mortality from stomach cancer depend on the medical history (comorbidities) of the patient?
- In stratum 7, what is the impact of stomach cancer on mortality risk?
- In stratum 20, what is the impact of stomach cancer on mortality risk?
- If we regress 6-month mortality on (a) stomach cancer and (b) combinations of comorbidities of patients, what is the approximate coefficient of stomach cancer in the regression equation? To answer this question, calculate the approximate change in mortality associated with the presence of stomach cancer, averaged across all 31 strata.
- What is the coefficient associated with the combination of cancer and medical history in stratum 31?
- In which stratum does cancer add the most to the risk of mortality?
- If we construct a linear regression model with two variables, cancer and the patients' prognosis based on the comorbidities captured in the strata, in which strata are we likely to have the highest residuals?
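The arithmetic behind several of these questions can be sketched in a few lines of Python. The mortality values below are hypothetical numbers standing in for points read off the plot (five strata rather than 31). The residual calculation assumes each stratum gets its own baseline, a simplification of the single prognosis variable, and weights the cancer and no-cancer cells equally.

```python
# Hypothetical mortality probabilities, as if read off a plot like the one in
# Question 1. Made-up numbers for illustration; 5 strata instead of 31.
p_without = [0.05, 0.10, 0.20, 0.30, 0.50]  # strata alone (diamond line)
p_with = [0.25, 0.30, 0.35, 0.60, 0.55]     # strata plus stomach cancer (square line)


def cancer_effects(p_without, p_with):
    """Per-stratum change in mortality when stomach cancer is present."""
    return [w - wo for wo, w in zip(p_without, p_with)]


def cancer_coefficient(effects):
    """Unweighted average change across strata. Assuming one intercept per
    stratum and equally weighted cells, this is the least-squares cancer slope."""
    return sum(effects) / len(effects)


def additive_residuals(effects, beta):
    """Residual of the cancer cell in each stratum under the additive model
    mortality = stratum baseline + beta * cancer (assumed, equal weights).
    The no-cancer cell has the same residual with the opposite sign."""
    return [(d - beta) / 2 for d in effects]


effects = cancer_effects(p_without, p_with)
beta = cancer_coefficient(effects)
residuals = additive_residuals(effects, beta)

largest_effect_stratum = max(range(len(effects)), key=lambda s: effects[s]) + 1
largest_residual_stratum = max(range(len(residuals)), key=lambda s: abs(residuals[s])) + 1

print("per-stratum effects:", [round(d, 2) for d in effects])
print("approximate cancer coefficient:", round(beta, 3))
print("stratum where cancer adds most:", largest_effect_stratum)
print("stratum with largest residual:", largest_residual_stratum)
```

In this toy data, cancer adds the most in stratum 4, yet the largest residual falls in stratum 5, because residuals track how far a stratum's cancer effect sits from the average effect, not how large the effect is.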
Resources for Corner Case Question 1:
Question 2: In the following, Y indicates the logit of the probability of an outcome. It is regressed on X1 through X4. All variables are binary. All independent variables are monotone, and all are related to Y. Y values are standardized so that when all independent variables are absent, the logit of Y is 0, and when all are present, the logit of Y is 1. Using the technique of searching the data to construct approximate regression equations, answer the following questions:
- What is the coefficient of X1 in the regression of Y on X1 through X4?
- What is the coefficient of X1X2 in the regression of the logit of Y on X1 through X4 plus all possible interactions among the independent variables?
- To get an exact estimate of the regression coefficient for X1, what do we need to do?
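A minimal sketch of the search technique, assuming a complete table with one logit value for each of the 16 combinations of X1 through X4. The values come from a made-up formula (`assumed_logit`, hypothetical) that includes an X1·X2 interaction and satisfies the standardization above. The additive coefficient of X1 is approximated by searching for pairs of cases that differ only in X1 and averaging the change. In the saturated model with 0/1 coding, the X1 coefficient is the effect of X1 when all other variables are absent, and the X1X2 coefficient is a difference of differences.

```python
from itertools import product


# Hypothetical formula for logit(Y): all absent -> 0, all present -> 1.
def assumed_logit(x1, x2, x3, x4):
    return 0.1 * x1 + 0.2 * x2 + 0.3 * x3 + 0.3 * x4 + 0.1 * x1 * x2


# One row per combination of the four binary variables.
table = {xs: assumed_logit(*xs) for xs in product((0, 1), repeat=4)}


def search_main_effect(table, var):
    """Average change in logit(Y) over all pairs of cases differing only in `var`."""
    diffs = []
    for xs, y in table.items():
        if xs[var] == 0:
            partner = xs[:var] + (1,) + xs[var + 1:]
            if partner in table:  # search the data for the matching case
                diffs.append(table[partner] - y)
    return sum(diffs) / len(diffs)


def saturated_x1(table):
    """X1 coefficient with all interactions in the model: X1 alone vs. nothing."""
    return table[(1, 0, 0, 0)] - table[(0, 0, 0, 0)]


def saturated_x1x2(table):
    """X1*X2 coefficient in the saturated model (difference of differences)."""
    return (table[(1, 1, 0, 0)] - table[(1, 0, 0, 0)]
            - table[(0, 1, 0, 0)] + table[(0, 0, 0, 0)])


print("logit range:", round(table[(0, 0, 0, 0)], 3), round(table[(1, 1, 1, 1)], 3))
print("X1, searched (additive):", round(search_main_effect(table, 0), 3))
print("X1, saturated:", round(saturated_x1(table), 3))
print("X1*X2, saturated:", round(saturated_x1x2(table), 3))
```

With this assumed formula, the searched X1 effect is 0.15 (0.1 when X2 is absent, 0.2 when it is present), while the saturated model gives 0.1 for X1 and 0.1 for X1X2. In a complete, balanced table like this one, the averaged difference matches the ordinary least-squares coefficient of X1 in the main-effects model; with missing or unevenly populated cells it is only an approximation.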
Resources for Corner Case Question 2:
Question 3: In the following, we regress Y on five binary variables that are positively related to Y.
- Using the search technique, what is the coefficient of the variable A?
- Using regression with no intercept, what is the coefficient of the variable A?
- Create an interaction plot for variable A
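The two approaches in Question 3 can be compared on a small synthetic table. This sketch assumes NumPy, a complete table covering all 32 combinations of five binary variables (named A to E here), and a made-up formula for Y with one A·B interaction; none of these values come from the course data. The search estimate averages the change in Y over pairs of rows that differ only in A, `np.linalg.lstsq` fits the regression without an intercept column, and the sorted pairs are the points an interaction plot for A would connect.

```python
from itertools import product

import numpy as np

# Hypothetical, made-up data: Y for all 32 combinations of five binary
# variables A-E, built from an assumed formula with one A*B interaction.
names = ["A", "B", "C", "D", "E"]
X = np.array(list(product((0, 1), repeat=5)), dtype=float)
y = X @ np.array([0.1, 0.2, 0.2, 0.2, 0.2]) + 0.1 * X[:, 0] * X[:, 1]


def matched_pairs(X, y, j):
    """Rows identical except column j: (other values, Y with j=0, Y with j=1)."""
    lookup = {tuple(row): value for row, value in zip(X.astype(int).tolist(), y.tolist())}
    pairs = []
    for key, y0 in lookup.items():
        if key[j] == 0:
            partner = key[:j] + (1,) + key[j + 1:]
            if partner in lookup:  # search the data for the matching case
                pairs.append((key[:j] + key[j + 1:], y0, lookup[partner]))
    return pairs


def no_intercept_coefficients(X, y):
    """Ordinary least squares on X as given, with no column of ones added."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


a = names.index("A")
pairs = matched_pairs(X, y, a)
search_A = sum(y1 - y0 for _, y0, y1 in pairs) / len(pairs)
coef = no_intercept_coefficients(X, y)

print("A via search:", round(search_A, 4))
print("A via no-intercept regression:", round(float(coef[a]), 4))

# Interaction plot data for A: one line per combination of B-E,
# ordered by Y when A is absent.
for others, y0, y1 in sorted(pairs, key=lambda p: p[1]):
    print(others, round(y0, 2), "->", round(y1, 2))
```

In this toy table the search estimate for A is 0.15 and the no-intercept regression gives about 0.142, so the two methods need not agree when interactions are present. Each line in the interaction plot rises by 0.1 when B is absent and by 0.2 when B is present; non-parallel lines like these are the signature of an interaction.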
Resources for Question 3
- Data (download)
- How to estimate coefficients of a variable through search (slides)
- Shreya Prasanna's answers (YouTube)
- Interaction plot for variable A before sorting:
More
For additional information (not part of the required reading), please see the following links:
- Open introduction to statistics
This page is part of the HAP 819 course on Advanced Statistics by Farrokh Alemi, PhD.