## Lecture: Ordinary Regression
## Assigned Reading
- Session overview YouTube►
- Introduction to regression
- Verifying regression assumptions Slides► YouTube► Video►
- Model selection Video►
- Diagnostics using graphs Video►
- LASSO regression Read►
- Markov blanket in a network Media►
## Assignment
Assignments should be submitted in Blackboard.

Various costs are reported in the file, including costs inside and outside the organization. Rely on cost per day, but exclude patients who have zero cost within the organization. Costs are reported for a specific time period after admission, some short and some longer; using daily cost avoids problems caused by unequal follow-up.

CCS in these data refers to the Clinical Classifications Software of the Agency for Healthcare Research and Quality. These data indicate the comorbidities of the patient. A null value means the patient is assumed not to have had the comorbidity. An entered value means the patient is assumed to have had the comorbidity, and the value is the number of days from the first (maximum) or last (minimum) occurrence until admission to either the nursing home or the MFH. Thus an entry of 20 under the minimum CCS indicates that the most recent occurrence of the comorbidity was 20 days before admission. An entry of 400 under the maximum CCS indicates that the first occurrence of the comorbidity was 400 days before admission. You choose which data (minimum, maximum, occurrence) are relevant for the analysis. Keep in mind that for acute illness the most recent event may be predictive, while for chronic illness the first occurrence may be predictive of cost.

The functional disabilities are probabilities that the patient has the disability. These probabilities are generated from the CCS diagnoses and demographics of the person.

In completing this assignment, follow these steps: Python Code►
- Clean the data using SQL. Check whether cases repeat and should be deleted from the analysis. There are many null values; make sure your solution takes null values into account. Gender is indicated as "M" and "F"; recode M as 1 and F as 0. In survival days, null values indicate zero. In other variables, a null value can be imputed from the mode or the case can be ignored. Taheeri's Teach One► Marla's Teach One►
- Describe the data using univariate analysis.
- Check whether the cost distribution is normal; if it is not, decide on a transformation of cost that makes it closer to normal.
- Check that age and cost have a linear relationship.
- Check the impact of age and gender interaction on cost.
- Check the impact of survival on cost.
- Create a regression model to explain the relationship among the variables and cost.
- Use plots of residuals to test regression assumptions.
- Explain the percentage of variation in cost explained by the model.
- List the top 10 predictors of cost (describe these predictors in plain English, not as coded variable names).
- Describe in English whether MFH contributes to the cost of care.
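The cleaning step can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the course's solution: the table name `patients`, the column names, and the five rows are all hypothetical and are not taken from the assignment file.

```python
import sqlite3

# Hypothetical rows: (patient_id, gender, survival_days, daily_cost).
# Column names and values are made up for illustration only.
ROWS = [
    (1, "M", 120, 310.0),
    (1, "M", 120, 310.0),   # exact duplicate of the row above
    (2, "F", None, 95.5),   # null survival days are treated as zero
    (3, "F", 40, 0.0),      # zero cost inside the organization: excluded
    (4, "M", None, 210.0),
]

CLEAN_SQL = """
SELECT DISTINCT
    patient_id,
    CASE gender WHEN 'M' THEN 1 WHEN 'F' THEN 0 END AS male,
    COALESCE(survival_days, 0) AS survival_days,
    daily_cost
FROM patients
WHERE daily_cost > 0
ORDER BY patient_id
"""


def clean_rows(rows):
    """Load rows into an in-memory SQLite table and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE patients ("
            "patient_id INTEGER, gender TEXT, "
            "survival_days INTEGER, daily_cost REAL)"
        )
        conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)
        return conn.execute(CLEAN_SQL).fetchall()
    finally:
        conn.close()


cleaned = clean_rows(ROWS)
for row in cleaned:
    print(row)
```

`SELECT DISTINCT` removes only rows that are identical in every selected column; deciding whether near-duplicates should also be dropped is a judgment call on the real data. Note that `WHERE daily_cost > 0` also drops rows whose cost is null, because a comparison with null is never true in SQL.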
Use the instructor's last name as the password for the data. Data► CCS► Cheema & Shih's Answer► Chelsea's Answer►
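The transformation and modeling steps might look like the following sketch. It uses synthetic data with an assumed data-generating process, so the variable names, coefficients, and sample size are illustrative assumptions rather than properties of the course data. The log transform is one common choice for right-skewed cost; it is not necessarily the one the assignment expects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the assignment's variables (not the course data).
age = rng.uniform(50, 95, n)
male = rng.integers(0, 2, n).astype(float)
survival_days = rng.uniform(0, 1000, n)

# Assumed data-generating process, chosen only so the example has structure.
log_cost = (4.0 + 0.02 * age + 0.3 * male - 0.005 * age * male
            - 0.001 * survival_days + rng.normal(0, 0.2, n))
daily_cost = np.exp(log_cost)  # right-skewed, as cost data often are


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (beta, residuals, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, residuals, 1.0 - ss_res / ss_tot


# Predictors: age, gender, the age-by-gender interaction, and survival.
X = np.column_stack([age, male, age * male, survival_days])

_, _, r2_raw = fit_ols(X, daily_cost)                 # untransformed cost
beta, resid, r2_log = fit_ols(X, np.log(daily_cost))  # log-transformed cost

print(f"R^2 on raw cost: {r2_raw:.3f}")
print(f"R^2 on log cost: {r2_log:.3f}")
print("Coefficients (intercept, age, male, age*male, survival):",
      np.round(beta, 4))
```

R² is the share of variation in the outcome explained by the model, which is the quantity the assignment asks you to interpret. The `resid` array can be plotted against the fitted values or against each predictor to look for curvature or unequal spread when checking the regression assumptions.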
- If X1 and X2 are significant predictors of Y, X3 and X4 are not, and no interactions are significant, then what is the Markov blanket for Y? How is the concept of a Markov blanket related to multicollinearity?
- Suppose X2 occurs after Y and X1 occurs prior to Y. What is a Markov blanket that separates the variables that are irrelevant from those that could possibly be causes of Y? Keep in mind that a cause is something that occurs prior to the effect, has a significant association with the effect, has a mechanism leading from cause to effect, and, if removed, makes the effect less likely to occur, ceteris paribus.
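For reference, in a directed acyclic graph the Markov blanket of a node consists of its parents, its children, and the other parents of its children. The sketch below computes that set for a small hypothetical network; the node names and edges are invented for illustration and are not the answer to the questions above.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node` in a DAG given as {child: set_of_parents}."""
    own_parents = set(parents.get(node, ()))
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = set()
    for child in children:
        co_parents |= set(parents[child])
    return (own_parents | children | co_parents) - {node}


# Hypothetical network, not taken from the assignment data.
dag = {
    "Y": {"A", "B"},  # A and B are direct causes of Y
    "C": {"Y", "D"},  # C is an effect of Y; D is another cause of C
    "E": {"A"},       # E depends only on A, so it is outside Y's blanket
}

blanket = markov_blanket(dag, "Y")
print(sorted(blanket))
```

Given the variables in its blanket, a node is conditionally independent of every other variable in the network, which is why variables outside the blanket add no further information for predicting it.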
Markov chains explained visually► Markov blanket's definition► Answer by Chelsea►
## More
For additional information (not part of the required reading), please see the following links:
- Introduction to regression by others Video► Slides►
- Regression using R Read►
- Statistical learning with R Read►
- Open introduction to statistics Read►
This page is part of the course on Comparative Effectiveness by Farrokh Alemi PhD Home► Email►