Submit your Jupyter notebook in PDF format. Include both your code and your findings.
Question 1: Using the following corpus of training data, classify the sentiment of this sentence: "The doctor was terrible but the nurses were terrific."
- Regress the classification labels in the training set on the words in the target sentence: "doctor", "was", "terrible", "but", "nurses", "terrific". Report the shape of the data, i.e., the number of variables and the number of cases. Report the coefficient for each word and the intercept, and report which coefficients are statistically significant. Report the McFadden R² in 5-fold cross-validation. What is the predicted probability that the sentence is a complaint?
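The training corpus is not reproduced on this page, so the sketch below covers only the steps that do not depend on it: turning texts into 0/1 word indicators (whose row and column counts give the shape of the data), splitting cases into 5 folds, scoring held-out predictions with McFadden's R² = 1 − LL(model) / LL(null), and converting an intercept and coefficients into a predicted probability. It assumes the corpus is a list of texts with 0/1 labels where 1 means "complaint"; all function names are hypothetical. The logistic fit itself, with coefficients and p-values, would come from a library such as statsmodels, e.g. `sm.Logit(y, sm.add_constant(X)).fit()`, whose results expose `params`, `pvalues`, and `prsquared` (McFadden's pseudo-R²).

```python
import math
import random

# Target words exactly as listed in part "a" ("the" and "were" are not listed).
TARGET_WORDS = ["doctor", "was", "terrible", "but", "nurses", "terrific"]


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [w.strip('.,!?;:"\'') for w in text.lower().split()]


def word_indicators(texts, vocab=TARGET_WORDS):
    """One row per case, one 0/1 column per vocabulary word (1 = word occurs)."""
    rows = []
    for text in texts:
        tokens = set(tokenize(text))
        rows.append([1 if w in tokens else 0 for w in vocab])
    return rows


def kfold_indices(n, k=5, seed=0):
    """Shuffle case indices 0..n-1 and return k (train_idx, test_idx) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def log_likelihood(y, p, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 labels y under predicted probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # guard against log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total


def mcfadden_r2(y, p, p_null=None):
    """McFadden R^2 = 1 - LL(model) / LL(null).

    The null model predicts one constant probability for every case: by
    default the mean of y; in cross-validation, pass the training-fold mean.
    """
    if p_null is None:
        p_null = sum(y) / len(y)
    return 1 - log_likelihood(y, p) / log_likelihood(y, [p_null] * len(y))


def predicted_probability(intercept, coefs, x):
    """Logistic model: P(y = 1) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    z = intercept + sum(b * xj for b, xj in zip(coefs, x))
    return 1 / (1 + math.exp(-z))
```

One common way to get a cross-validated McFadden R² is to fit the model on each fold's training rows, predict the held-out rows, call `mcfadden_r2(y_test, p_test, p_null=mean(y_train))`, and average the five values. The target sentence's own row, `word_indicators([sentence])[0]`, is what goes into `predicted_probability` with the fitted intercept and coefficients.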
- Regress the classification labels in the training set on the words, pairs of consecutive words, and triplets of consecutive words in the target sentence. Report the shape of the data and show that the number of variables exceeds the number of variables in part "a" (the previous item). Report the intercept and the coefficients for the words and word combinations. Indicate which coefficients are statistically significant.
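A minimal sketch of the n-gram features for this part, assuming every consecutive run of one, two, or three tokens in the full target sentence (including "the" and "were") becomes its own 0/1 variable; the function names are hypothetical. Under that assumption the sentence yields 23 columns (8 distinct words, 8 pairs, 7 triplets), compared with the 6 word columns of part "a".

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [w.strip('.,!?;:"\'') for w in text.lower().split()]


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vocab(sentence, max_n=3):
    """Distinct 1..max_n-grams of the target sentence, in order of appearance."""
    tokens = tokenize(sentence)
    vocab = []
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            if gram not in vocab:
                vocab.append(gram)
    return vocab


def ngram_indicators(texts, vocab):
    """One row per case, one 0/1 column per n-gram (1 = n-gram occurs in the text)."""
    sizes = {len(g) for g in vocab}
    rows = []
    for text in texts:
        tokens = tokenize(text)
        present = set()
        for n in sizes:
            present.update(ngrams(tokens, n))
        rows.append([1 if g in present else 0 for g in vocab])
    return rows
```

With a small corpus, many pair and triplet columns may never occur, or may duplicate another column exactly; such columns would likely need to be dropped before a logistic fit will converge, which is worth reporting alongside the shape of the data.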
Question 2: Create an interaction plot for diabetes. The x-axis should show combinations of the other variables and the y-axis the cost of care. Plot two lines, one for patients with diabetes and one for patients without diabetes. Indicate which interactions are likely to be statistically significant in these data.
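The dataset for this question is not shown here, so the sketch below assumes each record is a dict with a numeric `"cost"`, a 0/1 `"diabetes"` flag, and other categorical columns; these names are hypothetical. It averages cost for every combination of the other variables, separately with and without diabetes, and plots the two lines. Lines that are far from parallel, i.e. a diabetes effect that changes a lot from one combination to the next, hint at an interaction; confirming significance would still take a regression with interaction terms, and cells with few cases give noisy means.

```python
from collections import defaultdict


def interaction_table(records, factors, outcome="cost", moderator="diabetes"):
    """Mean outcome per combination of `factors`, split by moderator (0/1).

    Returns (combo, mean_without, mean_with) tuples; a mean is None when
    that cell has no cases.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key = (tuple(r[f] for f in factors), r[moderator])
        sums[key] += r[outcome]
        counts[key] += 1
    combos = sorted({combo for combo, _ in counts})
    table = []
    for combo in combos:
        means = []
        for m in (0, 1):
            n = counts.get((combo, m), 0)
            means.append(sums[(combo, m)] / n if n else None)
        table.append((combo, means[0], means[1]))
    return table


def diabetes_effects(table):
    """Mean cost with minus without diabetes, per combination with both cells filled."""
    return [(combo, w - wo) for combo, wo, w in table if wo is not None and w is not None]


def plot_interaction(table, factors):
    """Two lines over the combinations: with and without diabetes."""
    import matplotlib.pyplot as plt  # imported here so the helpers above need only the stdlib

    labels = [", ".join(f"{f}={v}" for f, v in zip(factors, combo)) for combo, _, _ in table]
    nan = float("nan")  # empty cells become gaps in the line
    without = [nan if wo is None else wo for _, wo, _ in table]
    with_dm = [nan if w is None else w for _, _, w in table]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(labels, without, marker="o", label="Without diabetes")
    ax.plot(labels, with_dm, marker="o", label="With diabetes")
    ax.set_xlabel("Combination of other variables")
    ax.set_ylabel("Mean cost of care")
    ax.tick_params(axis="x", labelrotation=45)
    ax.legend()
    fig.tight_layout()
    return fig
```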
This page is part of the HAP 719 course on Advanced Statistics I by Farrokh Alemi, PhD.