- Tutorial on LASSO regression
- Clusters of COVID-19 symptoms: Application of LASSO regression (use instructor's last name as password)
Question 1: LASSO regress COVID-19 test results on COVID-19 symptoms.
Identify the relative weight of each symptom, pair of symptoms, and
triplet of symptoms. Clarify whether clusters of symptoms are more accurate
than individual symptoms. LASSO regress COVID-19 test results on the symptoms:
Report a measure of goodness of fit and the coefficients for the
Resources for Question 1:
- Prepare the data by setting variables that are present to 1 and
absent to 0. When the COVID-19 test result is missing, drop the case
from the analysis. When a symptom is missing, replace it with its
mode, almost always 0. Data►
- 30 subset of IDs Data Subset►
- Data pre-processing
- Main effect model
- How to include interaction terms in Python
- Symptom cluster model
- Dr Vang's code
- Chris Naso's Teach One
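The data preparation rules listed above (code present as 1 and absent as 0, drop cases with a missing test result, replace a missing symptom with its mode) can be sketched with pandas. This is a minimal sketch and not the course's posted pre-processing code: the toy table, the column names, the `prepare` helper, and the "yes"/"positive" labels are all hypothetical stand-ins for the real data file.

```python
import pandas as pd

# Hypothetical toy data; the real file's column names and labels may differ.
raw = pd.DataFrame({
    "test_result": ["positive", "negative", None, "positive", "negative"],
    "cough":       ["yes", "no", "yes", None, "no"],
    "fever":       ["no", "no", "yes", "yes", None],
})

def prepare(df, outcome, present_values):
    """Code present/absent as 1/0, drop rows with no outcome, fill symptoms with the mode."""
    # Drop cases where the test result is missing.
    df = df.dropna(subset=[outcome]).copy()
    for col in df.columns:
        # 1.0 if the value marks "present", 0.0 for any other recorded value,
        # NaN where the original value was missing.
        coded = df[col].isin(present_values).astype(float).where(df[col].notna())
        if col != outcome:
            # Replace a missing symptom with that symptom's mode (often 0).
            coded = coded.fillna(coded.mode().iloc[0])
        df[col] = coded.astype(int)
    return df

clean = prepare(raw, "test_result", present_values={"yes", "positive"})
print(clean)
```

In this sketch the mode is computed after cases with a missing test result are dropped; whether to compute it before or after that step is a design choice the assignment text does not settle.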
Question 2: Using LASSO
logistic regressions, identify the relative weights of each symptom,
pair of symptoms, and triplet of symptoms in the diagnosis
of COVID-19. Make sure that you use a polynomial function to create
interaction terms for symptoms. Make sure that you try the LASSO
regressions with the following three values of the C hyper-parameter: 0.1,
0.01, and 0.001. List the non-zero coefficients for each value of C.
Report the McFadden pseudo R-square for each value of C.
Resources for Question 2:
- COVID-19 test results and symptoms
- Count of number of times symptoms occur together for the same person
- Percent of times symptom listed in the row occurs before the symptom listed in the column
- Python code for repeated LASSO regressions, in order of variables' time of occurrence
- Jieun Jan's Teach One SQL►
- Tejaswi Pulusu's Teach One Slides►
Question 3: The following provides the results from
a recent LASSO regression of "symptom remission" on patients' "medical
history" for patients taking 15 different antidepressants.
- For patients taking Bupropion, what are the 5 most important
features that increase symptom remission? Ask ChatGPT whether a
person with these features should take Bupropion. Report the difference
between the regression and the advice of ChatGPT.
- For patients taking Bupropion, what are the 5 least important
features that can be used to rule out the use of Bupropion?
- In comparing Bupropion and Citalopram, what are the features that
affect both medications? If the first 3 digits of the International
Classification of Disease codes are the same, consider them the same.
- Suppose we can ask about the features listed in the two
regressions. In what order should the questions be asked
if we want to differentiate between the two medications with the
fewest queries? List the first 10 questions that are most likely to
resolve which of the two medications to take.
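The comparison of two medications by the first 3 characters of the diagnosis code, and one possible ordering of questions, can be sketched in plain Python. Everything here is an assumption for illustration: the codes and coefficients are made up and are not taken from the "Rem Coef" sheet, the `by_prefix` helper is hypothetical, and ranking questions by the absolute gap between the two medications' coefficients is only one simple heuristic, not necessarily the method the course intends.

```python
# Hypothetical coefficients keyed by diagnosis code; NOT values from "Rem Coef".
bupropion = {"F32.1": 0.40, "E11.9": -0.25, "I10": 0.10, "G47.0": 0.30}
citalopram = {"F32.9": 0.15, "E11.65": -0.20, "J45.9": 0.05, "G47.3": -0.35}

def by_prefix(coefs):
    """Group codes by their first three characters, summing their coefficients."""
    grouped = {}
    for code, value in coefs.items():
        key = code[:3]
        grouped[key] = grouped.get(key, 0.0) + value
    return grouped

bup, cit = by_prefix(bupropion), by_prefix(citalopram)

# Features that affect both medications after matching on the 3-character prefix.
shared = sorted(set(bup) & set(cit))

# Heuristic ordering: ask first about the features whose coefficients differ
# most between the two medications (a missing feature counts as 0).
gaps = {k: abs(bup.get(k, 0.0) - cit.get(k, 0.0)) for k in set(bup) | set(cit)}
order = sorted(gaps, key=lambda k: (-gaps[k], k))

print("Shared features:", shared)
print("First questions to ask:", order[:10])
```

Summing coefficients within a prefix is itself a modeling choice; taking the largest coefficient or keeping the codes separate would be equally defensible under the assignment wording.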
Here are resources for Q3:
- For regression coefficients look at the sheet "Rem Coef"
- How to code response to 3c in Python
- Nishita's Teach One Slides►
Question 4: In the upcoming project for the course,
you are asked to analyze data within All of Us. Access to these
data requires several training programs and registration. In order
to make sure that you will have access to these data in time, please
register for access to de-identified data in All of Us. There are
several steps to registration and the process takes 60 to 90
- Register for an account on @researchallofus.org
- Change from temporary password to a new password and record your
password on paper somewhere.
- Turn on Google 2-Step Verification
- Verify your identity with Login.gov. This step
requires a state ID or driver's license, a Social Security number, and
a phone that can receive text messages. There are multiple passwords to
keep track of: your GMU password, your Researcher Workbench password on
All of Us, your computer password, and your Google password. Please
make sure that you keep these accounts separate and read the messages
carefully to see which password is needed.
- Complete All of Us Registered Tier Training
- You do not need to get additional data access beyond
registration data. George Mason University does not allow access to
- Sign the Data User Code of Conduct
Here are resources for question 4:
- Graphical LASSO
- Time varying graphical LASSO YouTube►
- Graphical LASSO more accurate than logistic regression
This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD Home►