## Lecture: LASSO Regression
## Assigned Reading
- Purpose of LASSO regression Slides►
- Tutorial on LASSO regression Python► R► YouTube►
- Clusters of COVID-19 symptoms: Application of LASSO regression (use instructor's last name as password) Read►
- What to do about a negative McFadden R-squared? ChatGPT►
## Assignment
- Describe the order of occurrence of the variables. Assume that age and gender occur at birth, that home tests occur after the onset of symptoms, and that the laboratory PCR test occurs after the home test. Establish the order in which symptoms occur by counting, for each pair of symptoms, the number of times one symptom occurs before the other. Use each symptom's sum of pairwise counts of occurring before all other symptoms to establish which symptom occurs first.
- Using logistic LASSO, regress the PCR test results on all variables that precede the PCR test (age, gender, symptoms, and home test results), including pairwise and triple combinations of these variables. List the variables that are direct predictors of PCR test results. This list should include the non-zero coefficients from the logistic LASSO regression, including coefficients for pairs and triplets of variables. Report the percent of variation explained by the LASSO regression of PCR test results on the independent variables. Calculate and report the McFadden pseudo R-squared.
- Using LASSO, regress Fever on the variables that precede it (include main effects, pairwise combinations, and triplets of variables). Report the independent variables that are significant (non-zero) predictors of Fever. Report the percent of variation explained by the regression.
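The pairwise counting rule in the first question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes each person's record gives an onset time for every symptom that person had, and the symptom names and onset days below are hypothetical, not taken from the course data. Ties are counted for neither symptom.

```python
from itertools import permutations

def order_symptoms(onsets):
    """Rank symptoms by how often each one precedes the others.

    onsets: list of dicts, one per person, mapping symptom name to onset
    time (any comparable number, such as day of illness). Symptoms a
    person never had are simply absent from that person's dict.
    """
    symptoms = sorted({s for person in onsets for s in person})
    # before[a][b] = number of people in whom a started strictly before b
    before = {a: {b: 0 for b in symptoms if b != a} for a in symptoms}
    for person in onsets:
        for a, b in permutations(person, 2):
            if person[a] < person[b]:
                before[a][b] += 1
    # Row sums: total number of times a symptom came before any other symptom
    totals = {a: sum(before[a].values()) for a in symptoms}
    # Largest total first; alphabetical order breaks ties
    ranking = sorted(symptoms, key=lambda s: (-totals[s], s))
    return before, totals, ranking

# Hypothetical onset days for three people (illustrative only)
people = [
    {"cough": 1, "fever": 2, "loss_of_smell": 4},
    {"fever": 1, "cough": 3},
    {"cough": 2, "loss_of_smell": 2, "fever": 5},
]
before, totals, ranking = order_symptoms(people)
print(totals)
print(ranking)
```

The resulting ranking determines which symptoms may enter each regression as predictors: a symptom is regressed only on variables ranked before it, together with age and gender.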
The following resources may be helpful:
- Data Download► Dictionary►
- How to include interaction terms in Python ChatGPT►
- Count of number of times symptoms occur together for the same person Data►
- Percent of times symptom listed in the row occurs before the symptom listed in the column Data►
- Python code for repeated LASSO regressions, in order of variables' time of occurrence Python►
- Jieun Jan's Teach One SQL►
- Tejaswi Pulusu's Teach One Slides►
- Vladimir Cardenas's Answer► R-Code►
- Dharmi Desai's Teach One YouTube► Slides►
- Plot the Circulatory Body System and notice that it is bimodal.
- Recode Circulatory Body System (the dependent variable) as a binary variable: 0 for values in the left mode of the bimodal distribution and 1 for values in the right mode. Drop from the analysis any record where Circulatory Body System is missing.
- When an independent variable is missing, impute its value from the other variables.
- Include pairwise, triple, and four-way combinations of independent variables in your analysis. Print the values of the dependent and independent variables for the first 4 rows.
- Adjust the hyperparameter so that about 10 to 15 variables remain in the equation. Report the predictors of progression of diseases in the circulatory body system. List the variables, pairs of variables, triplets, and four-way combinations of variables that have non-zero coefficients in the LASSO regression.
- Evaluate the McFadden R-squared (percent of variation in circulatory diseases explained by other variables that occur prior to it).
- Data Download► Dictionary►
- Gidewon Tesfai's Answer► R-code► (Password protected)
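Two steps in the assignments above lend themselves to a short sketch: building pairwise, triple, and four-way combinations of binary variables, and computing the McFadden pseudo R-squared, defined as 1 minus the ratio of the fitted model's log-likelihood to the log-likelihood of a null model that predicts the overall outcome rate for everyone. The code below is a minimal illustration using only the Python standard library. It assumes the independent variables are 0/1 indicators and that the outcome contains both 0s and 1s; the variable names, outcomes, and predicted probabilities are hypothetical. In practice the probabilities would come from the fitted L1-penalized logistic regression.

```python
import math
from itertools import combinations

def add_interactions(row, max_order=4):
    """Return a copy of a row of 0/1 indicators with product terms added.

    A combination such as "a*b*c" is 1 only when all of its members are 1.
    """
    out = dict(row)
    names = sorted(row)
    for order in range(2, max_order + 1):
        for combo in combinations(names, order):
            out["*".join(combo)] = int(all(row[n] for n in combo))
    return out

def log_likelihood(y, p, eps=1e-15):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # keep log() away from 0
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total

def mcfadden_r2(y, p):
    """McFadden pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    null_p = sum(y) / len(y)  # null model: same probability for everyone
    return 1 - log_likelihood(y, p) / log_likelihood(y, [null_p] * len(y))

# Hypothetical indicator names; the real ones come from the data dictionary
row = {"diabetes": 1, "hypertension": 1, "obesity": 0, "smoking": 1}
expanded = add_interactions(row)
print(len(expanded))  # 4 main effects + 6 pairs + 4 triplets + 1 four-way

# Hypothetical outcomes with two sets of predicted probabilities
y = [1, 0, 1, 0]
r2_good = mcfadden_r2(y, [0.8, 0.2, 0.8, 0.2])
r2_bad = mcfadden_r2(y, [0.2, 0.8, 0.2, 0.8])
print(round(r2_good, 3), round(r2_bad, 3))
```

The second set of predictions fits worse than the null model, so its pseudo R-squared is negative. The same thing can happen when a model is scored on data it was not fitted to.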
## More
- Graphical LASSO PubMed►
- Time varying graphical LASSO YouTube►
- Graphical LASSO more accurate than logistic regression PubMed►
This page is part of the HAP 819 course on Advanced Statistics organized by Farrokh Alemi, PhD Home► Email►