Question 1: For the following X and Y data, calculate the regression equation using Excel. Plot the points and the line. Calculate the residuals and sum of squared residuals. Recalculate the sum of squared residuals and re-plot the line when the intercept is increased and decreased by 20%. Recalculate the sum of squared residuals and plot the line when the coefficient for X is increased and decreased by 20%. Which of these 5 lines minimizes the sum of squared residuals and by how much? Sample Plots in Reading► Cheema & Shih's Answer►
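The Excel calculation can also be cross-checked in code. The following is a minimal Python sketch that assumes made-up X and Y values, since the assignment's data are not reproduced here. The helper names `fit_line` and `sum_squared_residuals` are illustrative and not part of the course materials. It fits the least-squares line, then recomputes the sum of squared residuals after the intercept or the slope is changed by 20%.

```python
# Illustrative data only (an assumption); replace with the assignment's X and Y values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = fit_line(x, y)

# The fitted line plus the four perturbed lines the question asks for.
lines = {
    "fitted": (intercept, slope),
    "intercept +20%": (intercept * 1.2, slope),
    "intercept -20%": (intercept * 0.8, slope),
    "slope +20%": (intercept, slope * 1.2),
    "slope -20%": (intercept, slope * 0.8),
}

ssr = {name: sum_squared_residuals(x, y, a, b) for name, (a, b) in lines.items()}
for name, value in ssr.items():
    print(f"{name}: SSR = {value:.4f}")
```

Because ordinary least squares chooses the intercept and slope that minimize the sum of squared residuals, the fitted line should show the smallest value of the five.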
Question 2: Regress the cost of health care on the patients' comorbidities, age, gender, and participation in the medical foster home (MFH) program. MFH is an intervention for nursing home patients. In this program, nursing home patients are diverted to a community home, and health care services are delivered within the community home. The resident eats with the family and relies on the family members for socialization, food, and comfort. It is called a "foster" home because the family already living in the community home is supposed to act like the resident's family. Enrollment in MFH is indicated by the variable MFH=1.
Various costs are reported in the file, including cost inside and outside the organization. Rely on cost per day, but exclude patients who have 0 cost within the organization. The cost is reported for a specific time period after admission, which is short for some patients and longer for others. Use daily cost so that differences in length of follow-up do not distort the comparison.
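The exclusion and the daily-cost step might look like the following Python sketch. The field names `cost_inside` and `followup_days` and the three records are assumptions made for illustration, not the file's actual columns or values. If the file already reports a cost-per-day column, only the filter is needed.

```python
# Hypothetical records; field names and values are assumptions, not the actual file layout.
patients = [
    {"id": 1, "cost_inside": 12000.0, "followup_days": 60},
    {"id": 2, "cost_inside": 0.0, "followup_days": 30},
    {"id": 3, "cost_inside": 4500.0, "followup_days": 15},
]


def daily_costs(records):
    """Drop patients with zero cost inside the organization; return cost per day by id."""
    result = {}
    for r in records:
        # Skip zero-cost patients and guard against division by zero.
        if r["cost_inside"] <= 0 or r["followup_days"] <= 0:
            continue
        result[r["id"]] = r["cost_inside"] / r["followup_days"]
    return result


print(daily_costs(patients))
```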
CCS in these data refers to the Clinical Classifications Software of the Agency for Healthcare Research and Quality. These data indicate the comorbidities of the patient. When the entry is null, it is assumed that the patient did not have the comorbidity. When a value is entered, it is assumed that the patient had the comorbidity, and the reported value is the number of days from the first occurrence (maximum) or the most recent occurrence (minimum) until admission to either the nursing home or the MFH. Thus an entry of 20 under the minimum CCS indicates that 20 days passed between the most recent occurrence of the comorbidity and admission. An entry of 400 under the maximum CCS indicates that 400 days passed between the first occurrence of the comorbidity and admission. You choose which data (minimum, maximum, or simple occurrence) are relevant for the analysis. Keep in mind that for an acute illness the most recent event may be predictive of cost, while for a chronic illness the first occurrence may be predictive.
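One possible way to code a comorbidity from its minimum and maximum entries is sketched below in Python. The category names, the `CHRONIC` set, and the function `code_comorbidity` are hypothetical. Which conditions count as chronic, and whether to use the days value at all, remains the analyst's choice.

```python
# Hypothetical set of CCS categories treated as chronic; this is an analyst's assumption.
CHRONIC = {"ccs_diabetes"}


def code_comorbidity(name, min_days, max_days):
    """Return (indicator, days) for one comorbidity.

    min_days: days from the most recent occurrence to admission (None if null)
    max_days: days from the first occurrence to admission (None if null)
    """
    # A null entry is taken to mean the patient did not have the comorbidity.
    if min_days is None and max_days is None:
        return 0, None
    # Chronic illness: use the first occurrence; acute illness: use the most recent.
    days = max_days if name in CHRONIC else min_days
    return 1, days


print(code_comorbidity("ccs_diabetes", 20, 400))
print(code_comorbidity("ccs_pneumonia", 20, 400))
print(code_comorbidity("ccs_pneumonia", None, None))
```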
The functional disabilities are probabilities that the patient has the disability. These probabilities are generated from the CCS diagnoses and demographics of the person by a P2C2E process. P2C2E stands for a process too complicated to explain. In completing this assignment follow these steps:
Question 3: In the following data, examine whether age, gender and last year's cost predict next year's cost. If you are using R code make sure that you reformat currency into a number. Data► R Code► Cheema & Shih's Answer►
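The currency reformatting step can be illustrated with a short Python sketch. The function name `currency_to_number` and the sample strings are assumptions, not taken from the course data. The same cleaning idea (strip the currency symbol and thousands separators, then convert to a number) applies in R.

```python
def currency_to_number(text):
    """Convert a currency string such as "$1,234.50" to a float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    # Accounting formats sometimes show negatives in parentheses, e.g. "($25.00)".
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    return float(cleaned)


print(currency_to_number("$1,234.50"))
print(currency_to_number("($25.00)"))
```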
Question 4: A Markov Blanket refers to a set of variables that would make all other variables irrelevant in predicting the response variable. If X1 and X2 are significant predictors of Y, X3 and X4 are not, and no interactions are significant, then what is the Markov Blanket for Y? How is the concept of a Markov Blanket related to multicollinearity?
Suppose X2 occurs after Y and X1 occurs prior to Y. What is a Markov Blanket that separates the variables that are irrelevant from those that could possibly be causes of Y? Keep in mind that a cause is something that occurs prior to the effect, has a significant association with the effect, has a mechanism leading from cause to effect, and, if removed, makes the effect less likely to occur, ceteris paribus.
For additional information (not part of the required reading), please see the following links: