Assigned Reading
- Purpose of LASSO regression Slides►
- Clusters of COVID-19 symptoms: Application of LASSO regression (use instructor's last name as password)
Read►
- What to do about a negative McFadden R-squared?
ChatGPT►
- Tutorial on LASSO regression
Python►
R►
YouTube►
- Python and R code for LASSO regressions, temporal analysis, R-squared calculations, and other topics
Zip►
Assignment
Question 1: Use LASSO to regress COVID-19 test results on COVID-19 symptoms.
Identify the relative weight of each symptom, pair of symptoms, and
triplet of symptoms. Clarify whether clusters of symptoms are more accurate
than individual symptoms. Report a measure of goodness of fit and the
coefficients of the regression.
Resources for Question 1:
- Prepare the data by setting variables that are present to 1 and
absent to 0. When the COVID-19 test result is missing, drop the case
from the analysis. When a symptom is missing, replace it with its
mode, almost always 0. Data►
- 30 subset of IDs Data Subset►
- Data pre-processing
Python►
- Main effect model
Python►
- How to include interaction terms in Python
ChatGPT►
- Symptom cluster model
Python►
- Dr Vang's code
Python►
- Chris Naso's Teach One
Slides►
Python►
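In pandas, the data-preparation rules above might look like the sketch below. The column names and the "Yes"/"No" and "Positive"/"Negative" codings are assumptions, so check them against the actual Data► file before reusing this.

```python
import pandas as pd

# Hypothetical raw extract; real column names and codings may differ.
raw = pd.DataFrame({
    "test_result": ["Positive", "Negative", None, "Positive", "Negative"],
    "fever":       ["Yes", "No", "Yes", None, "No"],
    "cough":       ["No", None, "Yes", "Yes", "No"],
})
symptom_cols = ["fever", "cough"]

# 1. Drop cases whose COVID-19 test result is missing.
df = raw.dropna(subset=["test_result"]).copy()

# 2. Code positive/present as 1 and negative/absent as 0 (missing stays NaN for now).
df["test_result"] = df["test_result"].map({"Positive": 1, "Negative": 0})
for col in symptom_cols:
    df[col] = df[col].map({"Yes": 1, "No": 0})

# 3. Replace a missing symptom with that symptom's mode (almost always 0).
for col in symptom_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0]).astype(int)

print(df)
```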
Question 2: Using LASSO
logistic regressions, identify the relative weights of each symptom,
pair of symptoms, and triplet of symptoms in the diagnosis
of COVID-19. Use a polynomial feature function (such as scikit-learn's
PolynomialFeatures) to create interaction terms for symptoms. Run the LASSO
regressions with each of three values of the C hyper-parameter: 0.1, 0.01, and
0.001. List the non-zero coefficients for each value of C.
Report the McFadden pseudo R-squared for each value of C.
Resources for Question 2:
- COVID-19 test results and symptoms
Data►
- Count of the number of times symptoms occur together in the same person
Data►
- Percent of times symptom listed in the row occurs before the symptom listed in the column
Data►
- Python code for repeated LASSO regressions, in order of variables' time of occurrence
Python►
- Jieun Jan's Teach One SQL►
- Tejaswi Pulusu's Teach One Slides►
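The co-occurrence and "occurs before" tables above are provided as Data► files. The sketch below shows one way such tables could be computed. It assumes person-level symptom onset days, which is a hypothetical input format (NaN means the person never had the symptom). Ties on the same day count as neither symptom occurring first.

```python
import numpy as np
import pandas as pd

# Hypothetical onset data: day each symptom began for each person (NaN = never).
onset = pd.DataFrame(
    {"fever": [1, 2, np.nan, 3], "cough": [2, 1, 4, np.nan], "anosmia": [3, np.nan, 5, 1]},
    index=["p1", "p2", "p3", "p4"],
)
present = onset.notna().astype(int)

# Co-occurrence counts: entry (a, b) = number of people who had both a and b.
# The diagonal is the number of people who had each symptom.
cooccur = present.T @ present

# Precedence: among people with both symptoms, percent whose row symptom
# began strictly before the column symptom.
cols = onset.columns
before = pd.DataFrame(np.nan, index=cols, columns=cols)
for a in cols:
    for b in cols:
        if a == b:
            continue
        both = onset[a].notna() & onset[b].notna()
        if both.any():
            before.loc[a, b] = 100 * (onset.loc[both, a] < onset.loc[both, b]).mean()

print(cooccur)
print(before.round(1))
```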
Question 3: The following provides the results from
a recent LASSO regression of "symptom remission" on patients' "medical
history" for patients taking 15 different antidepressants.
- For patients taking Bupropion, what are the 5 most important
features that increase symptom remission? Ask ChatGPT whether a
person with these features should take Bupropion. Report the difference
between the regression results and ChatGPT's advice.
- For patients taking Bupropion, what are the 5 least important
features that can be used to rule out the use of Bupropion?
- In comparing Bupropion and Citalopram, which features
affect both medications? If the first 3 digits of two International
Classification of Diseases (ICD) codes are the same, consider them the same
feature.
- Suppose we can ask about the features listed in the two
regressions. In what order should the questions be asked
to differentiate between the two medications with the fewest
queries? List the first 10 questions that are most likely to
resolve which of the two medications the patient should take.
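The following Python sketch covers the mechanical steps of Question 3 under several stated assumptions. The coefficient dictionaries are hypothetical stand-ins for the "Rem Coef" sheet, whose exact layout is not described here. The "5 least important features" are read as the most negative coefficients. ICD codes are matched on their first three characters after removing the dot. Ordering questions by coefficient difference is a simple heuristic, not the course's method.

```python
# Hypothetical coefficients (ICD-10 code -> LASSO coefficient on remission).
bupropion = {"F32.1": 0.40, "F41.1": -0.35, "E66.9": 0.22, "G47.00": -0.10, "F17.210": 0.31}
citalopram = {"F32.9": 0.18, "F41.0": 0.25, "I10": -0.12, "G47.33": 0.05}


def ranked(coefs, k, largest=True):
    """Return the k features with the largest (or, if largest=False, most negative) coefficients."""
    return sorted(coefs, key=coefs.get, reverse=largest)[:k]


def collapse_icd(coefs):
    """Merge codes that share their first three characters.

    Keeps the coefficient with the largest magnitude (an assumption;
    summing or averaging would be equally defensible).
    """
    merged = {}
    for code, c in coefs.items():
        key = code.replace(".", "")[:3]
        if key not in merged or abs(c) > abs(merged[key]):
            merged[key] = c
    return merged


print("Raise remission:", ranked(bupropion, 5))
print("Lower remission:", ranked(bupropion, 5, largest=False))

b3, c3 = collapse_icd(bupropion), collapse_icd(citalopram)
shared = sorted(set(b3) & set(c3))
print("Features in both regressions:", shared)

# Heuristic question order: ask first about features whose coefficients differ
# most between the two drugs; a feature missing from one regression counts as 0.
order = sorted(set(b3) | set(c3),
               key=lambda f: abs(b3.get(f, 0.0) - c3.get(f, 0.0)),
               reverse=True)
print("First questions to ask:", order[:10])
```

Coefficient differences ignore how common each feature is among patients. An ordering that also weighs prevalence, such as the splits a decision tree would choose, may resolve the choice of medication in fewer questions.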
Here are resources for Q3:
- For regression coefficients look at the sheet "Rem Coef"
Download►
- How to code the response to 3c in Python
ChatGPT►
- Nishita's Teach One Slides►
Question 4 (Optional): In the upcoming course project,
you will be asked to analyze data within All of Us. Set up the
database to analyze the impact of antidepressants on remission.
Here are resources for Question 4:
More
- Graphical LASSO
PubMed►
- Time varying graphical LASSO YouTube►
- Graphical LASSO more accurate than logistic regression
PubMed►
This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD Home►