For this assignment you may use any statistical software you are familiar with, or R. The objective is to find a set of variables that predicts response to citalopram, an antidepressant abbreviated as CIT in the data. The data come from the STAR*D (Sequenced Treatment Alternatives to Relieve Depression) study conducted by the National Institute of Mental Health (NIMH).

  1. Read about the study protocol. Protocol►
  2. Download the data: Data 2010►  Data 2003►. Use the instructor's last name as the password; you must enter it twice.
  3. Summarize the data.
  4. Select a set of variables and construct a logistic regression model to predict the success of CIT.
  5. Check the model's assumptions with visual plots, including whether:
    • the residuals of the model are nearly normal,
    • the variability of the residuals is nearly constant,
    • the residuals are independent, and
    • each predictor is linearly related to the log-odds of the outcome.
  6. Describe which variables predict the success of CIT.
  7. Describe how well the model predicts response to CIT.
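Steps 4 and 7 can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the course's required method. It fits a logistic regression by maximum likelihood using Newton-Raphson (iteratively reweighted least squares) and then reports accuracy at a 0.5 cutoff and the area under the ROC curve (AUC). The data are simulated. The predictors `baseline_severity` and `age`, along with all coefficient values, are hypothetical placeholders, not actual STAR*D variables. In R, the equivalent fit is `glm(..., family = binomial)`.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).
    Each row of X must start with 1.0 for the intercept; y holds 0/1 outcomes."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(max_iter):
        grad = [0.0] * d                      # score: X^T (y - mu)
        hess = [[0.0] * d for _ in range(d)]  # information: X^T W X, W = mu(1 - mu)
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(d):
                grad[j] += (yi - mu) * row[j]
                for k in range(d):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)              # Newton step: (X^T W X)^-1 X^T (y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def auc(y, p):
    """AUC = P(score of a random responder > score of a random non-responder); ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# --- Simulated data: hypothetical predictors, NOT real STAR*D variables ---
random.seed(1)
true_beta = [1.0, -0.08, 0.02]  # intercept, baseline_severity, age (made-up values)
X, y = [], []
for _ in range(500):
    row = [1.0, random.gauss(20, 5), random.gauss(40, 12)]
    X.append(row)
    y.append(1 if random.random() < sigmoid(sum(b * x for b, x in zip(true_beta, row))) else 0)

beta_hat = fit_logistic(X, y)
p_hat = [sigmoid(sum(b * x for b, x in zip(beta_hat, row))) for row in X]
accuracy = sum((pi >= 0.5) == (yi == 1) for pi, yi in zip(p_hat, y)) / len(y)
auc_value = auc(y, p_hat)

print("coefficients (log-odds):", [round(b, 3) for b in beta_hat])
print("odds ratios per unit increase:", [round(math.exp(b), 3) for b in beta_hat[1:]])
print(f"in-sample accuracy at 0.5 cutoff: {accuracy:.3f}")
print(f"in-sample AUC: {auc_value:.3f}")
```

Coefficients are on the log-odds scale, and exponentiating them gives odds ratios. Accuracy and AUC computed on the same data used to fit the model tend to be optimistic. Holding out part of the data or cross-validating gives a fairer estimate of how well the model predicts response to CIT.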
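For step 5, note that with a 0/1 outcome the raw residual y − p̂ can take only two values at any fitted probability p̂, so a histogram of raw residuals is rarely informative. One common visual check for logistic regression is the binned residual plot. Sort observations by fitted probability, or by a single predictor when checking linearity in the log-odds. Split them into bins, then compare each bin's mean residual with a band of about ±2 standard errors. The sketch below computes these bin summaries with the Python standard library. The fitted probabilities are simulated placeholders. The function name `binned_residuals` and the model-based standard error are illustrative choices, not a standard API. In R, the `arm` package's `binnedplot()` draws a similar plot.

```python
import math
import random

def binned_residuals(y, p, key, n_bins=10):
    """Sort observations by `key` (fitted probability or one predictor), split them
    into about n_bins equal-size bins, and return (mean_key, mean_residual, two_se)
    per bin. Residual = y - p; two_se = 2 * model-based SE of the bin's mean residual."""
    order = sorted(range(len(y)), key=lambda i: key[i])
    size = math.ceil(len(order) / n_bins)
    out = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        n = len(idx)
        mean_key = sum(key[i] for i in idx) / n
        mean_res = sum(y[i] - p[i] for i in idx) / n
        # If the model is correct, Var(mean of y - p) = sum p(1 - p) / n^2
        se = math.sqrt(sum(p[i] * (1 - p[i]) for i in idx)) / n
        out.append((mean_key, mean_res, 2 * se))
    return out

# Hypothetical fitted probabilities and outcomes drawn from them (placeholder data)
random.seed(2)
p = [random.uniform(0.05, 0.95) for _ in range(400)]
y = [1 if random.random() < pi else 0 for pi in p]

for mean_key, mean_res, band in binned_residuals(y, p, key=p):
    flag = "" if abs(mean_res) <= band else "  <- outside +/-2 SE"
    print(f"{mean_key:5.2f} {mean_res:+.3f} +/-{band:.3f}{flag}")
```

If the model fits, roughly 95% of the bin means should fall inside their bands. A systematic curve in the bin means plotted against a predictor suggests that the predictor needs a transformation, such as a squared term or a spline.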

See Bushra GM's response to this assignment (Read►  Excel►  SPSS►  SAV►) and the work of Jamie and Shruti (Read►).


For additional information (not part of the required reading), please see the following links:

  1. Regression using R Read►
  2. Statistical learning with R Read►
  3. Open introduction to statistics Read►

