George Mason University HAP 719 Advanced Statistics

HAP 719: Advanced Statistics I

Basic data 

Introduction to Data (review)



Assignments are submitted on Blackboard and are graded pass/fail. Include a one-page summary as a Word document; in the summary, state whether you were able to reproduce the answers provided. Submit your R, Stata, or Python code in separate files. Late assignments are not accepted. You may help one another with the assignments, but you may not copy and paste other people's work. You may use ChatGPT or other large language models to generate your code, but you must be transparent about it and report its use.

Question 1: Read the following data, then (a) replace missing values with zeros, (b) drop variables for which all values are missing (identify these before filling in zeros, because after step (a) no variable is entirely missing), and (c) plot progression of illness in the neoplasm body system against progression of illness in the endocrine body system. These data concern the incidence of diabetes among patients with deterioration in a variety of body systems.
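The Question 1 steps can be sketched in Python, one of the permitted languages. This is a minimal, dependency-free sketch, not the expected solution: the actual data file is not reproduced here, so `SAMPLE`, the column names (`neoplasm`, `endocrine`, `mental`, `unused`), and the missing-value markers (`""` and `"NA"`) are hypothetical. It drops all-missing columns before filling zeros, since filling first would leave no column entirely missing. The plotting helper assumes the third-party matplotlib package is installed; pandas users can get the same cleaning with `df.dropna(axis=1, how="all").fillna(0)`.

```python
import csv
import io

# Hypothetical stand-in for the course data file; real column names may differ.
SAMPLE = """neoplasm,endocrine,mental,unused
1.2,0.5,,
,0.8,2.0,
0.4,,1.5,
"""

MISSING = {"", "NA"}  # assumed missing-value markers


def load_columns(text):
    """Parse CSV text into {column name: list of floats, None for missing cells}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            cell = (row[name] or "").strip()  # short rows give None for absent cells
            columns[name].append(None if cell in MISSING else float(cell))
    return columns


def clean(columns):
    """Drop all-missing columns first, then replace remaining missing values with 0."""
    kept = {name: vals for name, vals in columns.items()
            if any(v is not None for v in vals)}
    return {name: [0.0 if v is None else v for v in vals]
            for name, vals in kept.items()}


def plot_progression(data, x_name="endocrine", y_name="neoplasm"):
    """Scatterplot of y_name against x_name; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    plt.scatter(data[x_name], data[y_name], s=5, alpha=0.5)
    plt.xlabel(f"Progression in {x_name} body system")
    plt.ylabel(f"Progression in {y_name} body system")
    plt.show()


data = clean(load_columns(SAMPLE))
print(data)
# {'neoplasm': [1.2, 0.0, 0.4], 'endocrine': [0.5, 0.8, 0.0], 'mental': [0.0, 2.0, 1.5]}
```

With the real file, pass its contents (read with `open(path, newline="")`) to `load_columns`, then call `plot_progression(data)` with the actual column names.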

Question 2: Plot a histogram of progression in mental disorders. Then repeat the histogram using the natural log of progression in mental disorders. You can do this on a 10% sample of the data.
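Question 2 can be sketched as follows, assuming the mental-disorder values are already a list of numbers (for example, the hypothetical `mental` column from Question 1). The helper names are illustrative, and the fixed seed `719` is an arbitrary choice for reproducibility. Because the natural log of zero is undefined, and Question 1 fills missing values with zeros, this sketch simply drops non-positive values before taking logs; using `math.log1p` (the log of 1 + x) is a common alternative, but it is an assumption, not something the assignment specifies. The plotting helper needs matplotlib.

```python
import math
import random


def sample_fraction(values, fraction=0.1, seed=719):
    """Simple random sample, without replacement, of about `fraction` of the values."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = max(1, round(len(values) * fraction))
    return rng.sample(values, k)


def histogram(values, bins=10):
    """Return (edges, counts) for equal-width bins spanning min..max of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def log_values(values):
    """Natural log of the strictly positive values; zeros and negatives are dropped."""
    return [math.log(v) for v in values if v > 0]


def plot_histograms(values, bins=20):
    """Side-by-side histograms of raw and logged values; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
    ax_raw.hist(values, bins=bins)
    ax_raw.set_title("Progression in mental disorders")
    ax_log.hist(log_values(values), bins=bins)
    ax_log.set_title("ln(progression in mental disorders)")
    plt.show()
```

A typical call is `plot_histograms(sample_fraction(mental))`; `histogram` returns the bin counts if you want to report them in the summary rather than only show a figure.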

Question 3: Calculate the correlation matrix for every pair of variables, and show scatterplots of all pairs. You can do this on a 10% random sample of the data.
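A minimal sketch of Question 3, assuming the cleaned data are held as a dict mapping each variable name to an equal-length list of numbers (a hypothetical layout, not one the assignment prescribes). It computes Pearson correlations directly, samples the same 10% of rows from every column so that pairs stay aligned, and returns NaN for a constant column, whose correlation is undefined. The scatterplot-matrix helper needs matplotlib; in pandas, `df.corr()` and `pandas.plotting.scatter_matrix(df)` cover the same ground.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Return nested dict r[a][b] of Pearson correlations for every pair of columns."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


def sample_rows(data, fraction=0.1, seed=719):
    """Keep the same random ~10% of row positions in every column."""
    n = len(next(iter(data.values())))
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(n), max(1, round(n * fraction))))
    return {name: [vals[i] for i in idx] for name, vals in data.items()}


def plot_scatter_matrix(data):
    """Grid of pairwise scatterplots, histograms on the diagonal; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    names = list(data)
    k = len(names)
    fig, axes = plt.subplots(k, k, figsize=(2 * k, 2 * k), squeeze=False)
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            ax = axes[i][j]
            if i == j:
                ax.hist(data[a], bins=20)
            else:
                ax.scatter(data[b], data[a], s=2)
            if i == k - 1:
                ax.set_xlabel(b)
            if j == 0:
                ax.set_ylabel(a)
    plt.show()
```

For example, `r = correlation_matrix(sample_rows(data))` followed by `plot_scatter_matrix(sample_rows(data))` uses the same rows for both, because the seed is fixed.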

Question 4: Calculate a summary of the data that includes the mean and standard deviation of each variable. Identify the variable with the largest mean and the variable with the largest standard deviation. You can do this on a 10% sample of the data.
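Question 4 can be sketched with Python's standard `statistics` module, again assuming a hypothetical dict of equal-length numeric columns. `statistics.stdev` is the sample standard deviation (divisor n - 1), which matches R's `sd()` and the pandas default `DataFrame.std(ddof=1)`; it needs at least two values per column. The helper names and the seed are illustrative. In pandas, `df.describe()`, `df.mean().idxmax()`, and `df.std().idxmax()` give the same answers.

```python
import random
import statistics


def sample_rows(data, fraction=0.1, seed=719):
    """Keep the same random ~10% of row positions in every column."""
    n = len(next(iter(data.values())))
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(n), max(1, round(n * fraction))))
    return {name: [vals[i] for i in idx] for name, vals in data.items()}


def summarize(data):
    """Return {name: (mean, sample standard deviation)} for each column."""
    return {name: (statistics.mean(vals), statistics.stdev(vals))
            for name, vals in data.items()}


def largest(summary):
    """Names of the variables with the largest mean and the largest SD."""
    by_mean = max(summary, key=lambda name: summary[name][0])
    by_sd = max(summary, key=lambda name: summary[name][1])
    return by_mean, by_sd


def print_summary(summary):
    """Print a small mean/SD table, one row per variable."""
    print(f"{'variable':<15}{'mean':>10}{'sd':>10}")
    for name, (m, s) in summary.items():
        print(f"{name:<15}{m:>10.3f}{s:>10.3f}")
```

A typical run is `s = summarize(sample_rows(data))`, then `print_summary(s)` and `largest(s)`. Ties are resolved in favor of the first column encountered, since that is how `max` behaves.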