HAP 719 Advanced Statistics

HAP 719: Advanced Statistics I

Final Exam 2024  

Submit your Jupyter notebook in PDF format. Include both your code and your findings.

Question 1: Using the following corpus of training data, classify the sentiment of this sentence: "The doctor was terrible but the nurses were terrific."

  1. Regress the classification labels in the training set on the words in the target sentence: "doctor", "was", "terrible", "but", "nurses", and "terrific". Report the shape of the data, i.e., the number of variables and the number of cases. Report the coefficient for each word and the intercept. Report which coefficients are statistically significant. Report the McFadden R2 in 5-fold cross-validation. What is the predicted probability that the sentence is a complaint?
  2. Regress the classification labels in the training set on the words, pairs of words, and triplets of consecutive words in the target sentence. Report the shape of the data and show that the number of variables exceeds the number of variables in part 1. Report the intercept and the coefficients for the words and word combinations. Indicate which coefficients are statistically significant.
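
As a starting point for both parts, the feature construction and the McFadden R2 calculation might be sketched as follows. This is only an illustration under stated assumptions: the four comments and their labels are a hypothetical toy corpus (not the course training data), the helper names are made up for this sketch, and pairs are taken to be consecutive words, like the triplets. Model fitting is deliberately left out.

```python
import math

# Target sentence and the six words listed in part 1 of Question 1.
SENTENCE = "The doctor was terrible but the nurses were terrific."
WORDS = ["doctor", "was", "terrible", "but", "nurses", "terrific"]

# HYPOTHETICAL toy corpus (1 = complaint, 0 = not a complaint).
# Replace these with the course training data.
TOY_COMMENTS = [
    "The doctor was terrible",
    "The nurses were terrific",
    "The doctor was kind but the wait was long",
    "Terrific nurses and a great doctor",
]
TOY_LABELS = [1, 0, 1, 0]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def target_terms(sentence, words, max_n):
    """Listed single words plus the unique consecutive n-grams (2..max_n)."""
    tokens = tokenize(sentence)
    terms = [(w,) for w in words]
    for n in range(2, max_n + 1):
        terms.extend(dict.fromkeys(ngrams(tokens, n)))  # keeps order, drops repeats
    return terms


def build_features(comments, terms):
    """One row per comment, one 0/1 indicator column per term."""
    rows = []
    for comment in comments:
        tokens = tokenize(comment)
        present = set()
        for n in {len(t) for t in terms}:
            present.update(ngrams(tokens, n))
        rows.append([int(term in present) for term in terms])
    return rows


def mcfadden_r2(y, p_model, base_rate=None):
    """McFadden R2 = 1 - LL(model) / LL(intercept-only model).

    base_rate is the intercept-only probability; by default it is the mean
    of y. In cross-validation one might instead pass the training-fold rate.
    """
    eps = 1e-12  # guards against log(0)

    def loglik(probs):
        return sum(
            yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
            for yi, pi in zip(y, probs)
        )

    if base_rate is None:
        base_rate = sum(y) / len(y)
    ll_null = loglik([base_rate] * len(y))
    if ll_null == 0:
        raise ValueError("y must contain both classes")
    return 1 - loglik(p_model) / ll_null


X1 = build_features(TOY_COMMENTS, target_terms(SENTENCE, WORDS, max_n=1))
X2 = build_features(TOY_COMMENTS, target_terms(SENTENCE, WORDS, max_n=3))
print("Part 1 shape (cases, variables):", (len(X1), len(X1[0])))
print("Part 2 shape (cases, variables):", (len(X2), len(X2[0])))
```

The indicator matrices can then be passed to a logistic regression routine of your choice, for example the `Logit` class in statsmodels, whose fitted results expose `params`, `pvalues`, and `prsquared`. For the cross-validated figure, `mcfadden_r2` would be applied to the predicted probabilities on each held-out fold.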

Question 2: Create an interaction plot for diabetes. The x-axis should show combinations of the other variables and the y-axis the cost of care. Plot two lines, one for patients with diabetes and one for patients without. Indicate which interactions are likely to be statistically significant in these data.
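
The data behind such a plot could be assembled roughly as shown below. This is a sketch, not a prescribed method: the records, the variable names `hypertension` and `smoker`, and the function names are all hypothetical stand-ins for the actual dataset, and the plotting helper assumes matplotlib is installed.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL records: (diabetes, hypertension, smoker, cost).
# Replace these with the actual data for Question 2.
RECORDS = [
    (0, 0, 0, 100), (0, 0, 0, 120), (1, 0, 0, 200), (1, 0, 0, 220),
    (0, 0, 1, 150), (1, 0, 1, 260),
    (0, 1, 0, 180), (1, 1, 0, 400),
    (0, 1, 1, 220), (1, 1, 1, 600),
]


def interaction_series(records):
    """Mean cost per combination of the other variables, split by diabetes."""
    groups = defaultdict(list)
    for diabetes, *others, cost in records:
        groups[(tuple(others), diabetes)].append(cost)
    combos = sorted({combo for combo, _ in groups})
    series = {
        d: [mean(groups[(c, d)]) if (c, d) in groups else float("nan")
            for c in combos]
        for d in (0, 1)
    }
    return combos, series


def plot_interaction(combos, series, path="interaction.png"):
    """Draw the two lines and save the figure (assumes matplotlib is available)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    x = list(range(len(combos)))
    fig, ax = plt.subplots()
    ax.plot(x, series[0], marker="o", label="No diabetes")
    ax.plot(x, series[1], marker="o", label="Diabetes")
    ax.set_xticks(x)
    ax.set_xticklabels([str(c) for c in combos])
    ax.set_xlabel("Combination of other variables (hypertension, smoker)")
    ax.set_ylabel("Mean cost of care")
    ax.legend()
    fig.savefig(path)


combos, series = interaction_series(RECORDS)
# Gap between the two lines at each combination; a gap that changes across
# combinations means the lines are not parallel, which suggests an interaction.
gaps = [with_d - without_d for without_d, with_d in zip(series[0], series[1])]
print(combos)
print(gaps)
```

Calling `plot_interaction(combos, series)` writes the figure to a file. Non-parallel lines only suggest an interaction; whether it is statistically significant still has to be checked, for example by testing the corresponding interaction terms in a regression of cost on the variables.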


This page is part of the HAP 719 course on Advanced Statistics I by Farrokh Alemi PhD