Assignments should be submitted in Blackboard. All assignments should be done in Python, if possible. At the top of your report, include a summary section, or include a separate file that summarizes your work. In the summary, write brief sentences comparing your work to the answers or videos provided. For example, "I got the same answers as the Teach One video for question 1."
Question 1: For the following X and Y data, calculate the regression equation using Excel. Plot the points and the line. Calculate the residuals and the sum of squared residuals. Recalculate the sum of squared residuals and re-plot the line when the intercept is increased by 20% and when it is decreased by 20%. Do the same when the coefficient for X is increased by 20% and when it is decreased by 20%. Which of these five lines minimizes the sum of squared residuals, and by how much? Sample Plots in Reading► Cheema & Shih's Answer► Chelsea's Answer►
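Since the course asks for Python where possible, the Question 1 calculation can be sketched as follows. The X and Y values below are hypothetical placeholders, not the assignment's data; the sketch fits the least-squares line with NumPy, then compares the sum of squared residuals (SSR) of the fitted line with the four perturbed lines. Plotting is left out to keep the example dependency-light.

```python
import numpy as np

# Hypothetical X and Y values; substitute the data given in the assignment.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))

# Ordinary least squares fit; for deg=1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# The fitted line plus the four lines with a +/-20% change in one parameter.
lines = {
    "OLS": (intercept, slope),
    "intercept +20%": (intercept * 1.2, slope),
    "intercept -20%": (intercept * 0.8, slope),
    "slope +20%": (intercept, slope * 1.2),
    "slope -20%": (intercept, slope * 0.8),
}
results = {name: ssr(a, b) for name, (a, b) in lines.items()}

for name, value in results.items():
    excess = value - results["OLS"]
    print(f"{name:15s} SSR = {value:.4f}  (excess over OLS: {excess:.4f})")
```

The excess values answer "by how much": because OLS residuals sum to zero and are uncorrelated with X, shifting the intercept by d raises the SSR by exactly n·d², and scaling the slope by a change s raises it by s²·Σx².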
Question 2: Regress the cost of healthcare on the patients' comorbidities, age, and gender, and on whether they participated in the medical foster home (MFH) program. MFH is an intervention for nursing home patients. In this program, nursing home patients are diverted to a community home, and health care services are delivered within that home. The resident eats with the family and relies on its members for socialization, food, and comfort. It is called a "foster" home because the family already living in the community home is expected to act as the resident's family. Enrollment in MFH is indicated by the variable MFH=1.
The file reports several costs, including costs incurred inside and outside the organization. Use cost per day, and exclude patients who have zero cost within the organization. Costs are reported for a specific period after admission, which is short for some patients and longer for others. Using daily cost keeps differences in the length of follow-up from distorting the comparison.
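A minimal pandas sketch of the daily-cost step follows. The column names (`cost_inside`, `cost_outside`, `days_followed`) and the rows are hypothetical; the data file will use its own names. It also assumes that daily cost is total cost (inside plus outside) divided by days of follow-up; if you decide to model only cost within the organization, drop the `cost_outside` term.

```python
import pandas as pd

# Hypothetical columns and values; replace with the names used in the data file.
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "cost_inside": [12000.0, 0.0, 4500.0, 30000.0],  # cost within the organization
    "cost_outside": [3000.0, 800.0, 0.0, 5000.0],    # cost outside the organization
    "days_followed": [120, 60, 30, 365],             # follow-up days after admission
})

# Exclude patients with zero cost within the organization.
df = df[df["cost_inside"] > 0].copy()

# Assumption: daily cost = (inside + outside cost) / days of follow-up,
# so patients followed for different lengths of time are comparable.
df["daily_cost"] = (df["cost_inside"] + df["cost_outside"]) / df["days_followed"]
print(df[["patient_id", "daily_cost"]])
```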
CCS in these data refers to the Clinical Classifications Software of the Agency for Healthcare Research and Quality (AHRQ). These variables indicate the patient's comorbidities. When a value is null, the patient is assumed not to have had the comorbidity. When a value is present, the patient is assumed to have had the comorbidity, and the value is the number of days from its first occurrence (maximum) or its most recent occurrence (minimum) until admission to either the nursing home or the MFH. Thus, an entry of 20 under the minimum CCS indicates that 20 days passed between the most recent occurrence of the comorbidity and admission. An entry of 400 under the maximum CCS indicates that 400 days passed between the first occurrence of the comorbidity and admission. You decide which data (minimum, maximum, or occurrence) are relevant to the analysis and use what you think should be used. Keep in mind that for acute illnesses the most recent event may be predictive of cost, whereas for chronic illnesses the first occurrence may be.
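One way to turn a CCS pair into regression features is sketched below, assuming hypothetical column names `min_ccs_101` and `max_ccs_101` for a single comorbidity. It builds an occurrence indicator from the null pattern plus zero-filled day counts. Entering the indicator together with the zero-filled days lets the regression give absent comorbidities no effect while estimating a days-dependent effect for present ones; this coding is one choice among several, not the prescribed method.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: days from the most recent (min) and first (max)
# occurrence of a comorbidity to admission; NaN means the comorbidity is absent.
ccs = pd.DataFrame({
    "min_ccs_101": [20.0, np.nan, 5.0],
    "max_ccs_101": [400.0, np.nan, 5.0],
})

# Occurrence indicator: 1 if a value was recorded, 0 if null.
ccs["has_ccs_101"] = ccs["min_ccs_101"].notna().astype(int)

# Acute conditions: recency (minimum) may matter.
# Chronic conditions: duration since first occurrence (maximum) may matter.
# Absent comorbidities are filled with 0; pair these with the indicator in the model.
ccs["recent_ccs_101"] = ccs["min_ccs_101"].fillna(0)
ccs["duration_ccs_101"] = ccs["max_ccs_101"].fillna(0)
print(ccs)
```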
The functional disability variables give the probability that the patient has each disability. These probabilities are generated from the patient's CCS diagnoses and demographics. In completing this assignment, follow these steps: Python Code►
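The regression itself can be sketched with NumPy least squares. The data below are simulated with known coefficients so the estimates can be checked; the column names (`age`, `female`, `mfh`, `has_ccs_101`, `daily_cost`) are hypothetical stand-ins for the prepared file. Standard errors use the classical OLS formula σ̂²(XᵀX)⁻¹.

```python
import numpy as np
import pandas as pd

# Simulated stand-in data with known effects; real column names will differ.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(65, 95, n),
    "female": rng.integers(0, 2, n),
    "mfh": rng.integers(0, 2, n),          # 1 = enrolled in MFH
    "has_ccs_101": rng.integers(0, 2, n),  # hypothetical comorbidity indicator
})
df["daily_cost"] = (50 + 1.5 * df["age"] + 10 * df["female"] - 40 * df["mfh"]
                    + 25 * df["has_ccs_101"] + rng.normal(0, 5, n))

# Design matrix with an intercept column.
predictors = ["age", "female", "mfh", "has_ccs_101"]
X = np.column_stack([np.ones(len(df)), df[predictors].to_numpy(dtype=float)])
y = df["daily_cost"].to_numpy(dtype=float)

# OLS coefficients and classical standard errors.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
dof = X.shape[0] - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

for name, b, s in zip(["intercept"] + predictors, coef, se):
    print(f"{name:12s} coef={b:9.3f}  se={s:7.3f}  t={b / s:7.2f}")
```

If statsmodels is installed, `statsmodels.formula.api.ols("daily_cost ~ age + female + mfh + has_ccs_101", data=df).fit().summary()` produces the same estimates with a full results table.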
Question 3: Using the following data, examine whether age, gender, and last year's cost predict next year's cost. If you use R, make sure you convert currency-formatted values into numbers. Data► R Code► Cheema & Shih's Answer► Chelsea's Answer►
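The same currency issue arises in Python: values such as "$1,200.00" load as strings. The sketch below strips the dollar sign and thousands separators before fitting next year's cost on age, gender, and last year's cost. The rows and column names are hypothetical, and gender is dummy-coded as 1 = female by assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the assignment's data file.
df = pd.DataFrame({
    "age": [70, 82, 65, 90, 77, 68],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "cost_last_year": ["$1,200.00", "$3,450.50", "$800.00",
                       "$5,100.00", "$2,000.00", "$950.25"],
    "cost_next_year": ["$1,500.00", "$3,900.00", "$700.00",
                       "$5,600.00", "$2,300.00", "$1,100.00"],
})

def currency_to_number(s: pd.Series) -> pd.Series:
    """Remove '$' and thousands separators, then parse the strings as numbers."""
    return pd.to_numeric(s.str.replace(r"[$,]", "", regex=True))

for col in ["cost_last_year", "cost_next_year"]:
    df[col] = currency_to_number(df[col])

# Dummy-code gender (1 = female) and fit next_year ~ age + female + last_year.
df["female"] = (df["gender"] == "F").astype(int)
X = np.column_stack([np.ones(len(df)),
                     df[["age", "female", "cost_last_year"]].to_numpy(dtype=float)])
y = df["cost_next_year"].to_numpy(dtype=float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "age", "female", "cost_last_year"], coef.round(3))))
```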
Question 4: Throughout this course we emphasize the concept of a Markov blanket. A Markov blanket of the response variable is a set of variables that, once known, makes all other variables irrelevant for predicting the response. You have not been exposed to this concept yet, but you have learned about multicollinearity in regression. The point of this question is to push you to think harder about how these two concepts relate.
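A small simulation can make the connection concrete. In the hypothetical setup below, x1 is the only cause of y and x2 is a noisy copy of x1, so {x1} is y's Markov blanket. The two predictors are highly collinear (large variance inflation factor), yet once x1 is in the model the coefficient on x2 is near zero, while x2 alone looks strongly predictive. This is an illustrative sketch, not a required part of the answer.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Hypothetical causal structure: x1 -> y and x1 -> x2.
# y has no children here, so its Markov blanket is just its parent, {x1}.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
y = 2.0 * x1 + rng.normal(size=n)

def ols(y, *cols):
    """Least-squares coefficients, with an intercept as the first entry."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_x2_alone = ols(y, x2)[1]   # x2 by itself appears predictive
b_both = ols(y, x1, x2)      # given x1, x2 adds almost nothing

# Variance inflation factor for two predictors: 1 / (1 - r^2).
r = np.corrcoef(x1, x2)[0, 1]
vif = 1.0 / (1.0 - r ** 2)

print(f"x2 alone: {b_x2_alone:.3f}")
print(f"x1 and x2 together: x1={b_both[1]:.3f}, x2={b_both[2]:.3f}")
print(f"VIF: {vif:.1f}")
```

The VIF flags that x1 and x2 are redundant but does not say which one to keep; the Markov blanket does, since dropping x1 instead of x2 would remove the collinearity while discarding the variable that actually screens y off from the rest.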
For additional information (not part of the required reading), please see the following links: