Lecture: LASSO Regression  


Assigned Reading

  • Tutorial on LASSO regression Python► R► YouTube►
  • Clusters of COVID-19 symptoms: Application of LASSO regression (use instructor's last name as password) Read►


Question 1: Use LASSO to regress COVID-19 test results on COVID-19 symptoms. Identify the relative weight of each symptom, pair of symptoms, and triplet of symptoms. Clarify whether clusters of symptoms are more accurate than individual symptoms. Report a measure of goodness of fit and the coefficients of the regression.

Resources for Question 1:

  • Prepare the data by setting variables that are present to 1 and absent to 0. When the COVID-19 test result is missing, drop the case from the analysis. When a symptom is missing, replace it with its mode, almost always 0. Data►
  • 30 subset of IDs Data Subset►
  • Data pre-processing Python►
  • Main effect model Python►
  • How to include interaction terms in Python ChatGPT►
  • Symptom cluster model Python►
  • Dr Vang's code Python►
  • Chris Naso's Teach One Slides► Python►
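
The data preparation rule in the first resource above can be sketched as follows. This is a minimal illustration on a toy table, not the course's pre-processing script: the column names (`test_result`, `cough`, `fever`) and the helper `prepare` are hypothetical, and the values are made up.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are hypothetical, not the course file.
raw = pd.DataFrame({
    "test_result": [1, 0, np.nan, 1, 0, 0],
    "cough":       [1, 0, 1, np.nan, 0, 0],
    "fever":       [np.nan, 0, 1, 1, 0, 0],
})

def prepare(df, outcome="test_result"):
    """Drop cases with a missing outcome; mode-impute missing symptoms."""
    # When the test result is missing, drop the case.
    out = df.dropna(subset=[outcome]).copy()
    symptoms = [c for c in out.columns if c != outcome]
    for col in symptoms:
        # When a symptom is missing, replace it with the column's mode.
        mode = out[col].mode(dropna=True).iloc[0]
        out[col] = out[col].fillna(mode)
    # Present = 1, absent = 0.
    return out.astype(int)

clean = prepare(raw)
print(clean)
```

The mode is computed after the cases with a missing test result are dropped, so the imputed value reflects only the cases that enter the regression.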

Question 2: Using LASSO logistic regressions, identify the relative weights of each symptom, pair of symptoms, and triplet of symptoms in the diagnosis of COVID-19. Make sure that you use a polynomial function to create interaction terms for the symptoms. Run the LASSO regressions with each of the following three values of the C hyper-parameter: 0.1, 0.01, and 0.001. List the non-zero coefficients for each value of C. Report the McFadden pseudo R-squared for each value of C.

Resources for Question 2:

  • COVID-19 test results and symptoms Data►
  • Count of number of times symptoms occur together for the same person Data►
  • Percent of times symptom listed in the row occurs before the symptom listed in the column Data►
  • Python code for repeated LASSO regressions, in order of variables' time of occurrence Python►
  • Jieun Jan's Teach One SQL►
  • Tejaswi Pulusu's Teach One Slides►

Question 3: The following provides the results from a recent LASSO regression of "symptom remission" on patients' "medical history" for patients taking 15 different antidepressants.

  1. For patients taking Bupropion, what are the 5 most important features that increase symptom remission? Ask ChatGPT whether a person with these features should take Bupropion. Report the difference between the regression and the advice of ChatGPT.
  2. For patients taking Bupropion, what are the 5 least important features that can be used to rule out the use of Bupropion?
  3. In comparing Bupropion and Citalopram, what are the features that affect both medications? If the first 3 digits of the International Classification of Disease codes are the same, consider them the same feature. 
  4. Suppose we can ask about the features listed in the two regressions. In what order should the questions be asked if we want to differentiate between the two medications with the fewest queries? List the first 10 questions that are most likely to resolve which of the two medications to take.
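
For items 3 and 4, the matching rule on the first 3 characters of the diagnosis codes can be sketched as below. The coefficients are placeholders invented for illustration, not results from the course regressions. Summing coefficients that share a prefix, and ordering questions by the absolute gap between the two medications, are both assumptions; other rules are defensible.

```python
from collections import defaultdict

# Placeholder coefficients, NOT the results of the course regressions.
bupropion = {"F32.1": 0.40, "F41.9": -0.25, "E11.9": 0.10, "I10": 0.05}
citalopram = {"F32.9": 0.15, "F41.1": 0.30, "G47.0": -0.20, "I10": 0.05}

def by_prefix(coefs):
    """Treat codes with the same first three characters as one feature.

    Coefficients sharing a prefix are summed (an assumption).
    """
    grouped = defaultdict(float)
    for code, beta in coefs.items():
        grouped[code[:3]] += beta
    return dict(grouped)

bup, cit = by_prefix(bupropion), by_prefix(citalopram)

# Item 3: features that appear in both regressions.
shared = sorted(set(bup) & set(cit))

# Item 4 (one possible heuristic): ask first about the features whose
# coefficients differ most between the two medications.
gaps = {k: abs(bup.get(k, 0.0) - cit.get(k, 0.0)) for k in set(bup) | set(cit)}
question_order = sorted(gaps, key=lambda k: (-gaps[k], k))[:10]

print(shared)
print(question_order)
```

A feature with the same coefficient in both regressions has a gap of zero and falls to the end of the list, since asking about it does not separate the two medications.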

Here are resources for Q3:

Question 4: In the upcoming project for the course, you are asked to analyze data within All of Us. Access to these data requires several training programs and registration. To make sure that you have access in time, please register for access to de-identified data in All of Us. There are several steps to registration, and the process takes 60 to 90 minutes:

  1.  Register for an account on @researchallofus.org
  2. Change from temporary password to a new password and record your password on paper somewhere.
  3. Turn on Google 2-Step Verification
  4.  Verify your identity with Login.gov. This step requires a state ID or driver's license, a Social Security number, and a phone that can receive text messages. You will be working with several passwords: your GMU password, your Researcher Workbench password on All of Us, your computer password, and your Google password. Keep these accounts separate and read each message carefully to see which password is needed.
  5.  Complete All of Us Registered Tier Training
  6.  You do not need data access beyond the Registered Tier. George Mason University does not allow access to the Controlled Tier.
  7. Sign the Data User Code of Conduct.

Here are resources for question 4:


This page is part of the course on Comparative Effectiveness by Farrokh Alemi, PhD Home► Email►