- Stratified Regression (use instructor's last name for password)
- Read Chapter 18 in Statistical Analysis of Electronic Health
Records by Farrokh Alemi, 2020
- Cursor and do-while SQL commands
Submit one file for all questions. Include all charts, code, and
output in the same file. Start each question on a separate page or
sheet. Make the first page a summary page. On the summary page,
write statements comparing your work to the answers given or to the videos. For
example, "I got the same answers as the Teach One video for question 1."
Question 1: Estimate mortality rate in 6 months for lung cancer
patients with various
SQL & R Code Combined►
- Identify parents in the Markov blanket of lung cancer.
- Verify that all comorbidities make prognosis of lung cancer
- Use SQL code and the parents in the Markov blanket of lung cancer to
estimate survival from lung cancer.
- Use SQL to construct case/control
comparisons for each comorbidity of lung cancer.
- Use SQL to estimate the intercept for parameters of the multiplicative functional form.
- Estimate the overall k parameter for the multiplicative model.
- Report the mortality rate for patients who just have lung cancer and no
- Provide the equation that calculates the risk
for a combination of lung cancer and its comorbidities.
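The step of estimating the overall k parameter can be sketched as a grid search over candidate values. This is a minimal sketch that assumes the multiplicative model is normalized by the standard multi-attribute condition 1 + k = Π(1 + k·k_i), where each k_i is the parameter for one comorbidity. That condition, the function names, and the k_i values below are illustrative assumptions, not values from the chapter or the data. The assignment asks for SQL; Python is used here only to keep the example self-contained.

```python
import math

def normalization_gap(k, weights):
    """Return prod(1 + k*w) - (1 + k); this is zero when k satisfies
    the assumed normalization condition 1 + k = prod(1 + k*w)."""
    return math.prod(1 + k * w for w in weights) - (1 + k)

def search_k(weights, lo=-0.999, hi=10.0, steps=200_000, exclude=0.01):
    """Generate candidate k values on an even grid and keep the one
    with the smallest absolute gap. Values near k = 0 are skipped
    because k = 0 satisfies the condition trivially."""
    best_k, best_gap = None, float("inf")
    for i in range(steps + 1):
        k = lo + (hi - lo) * i / steps
        if abs(k) < exclude:
            continue
        gap = abs(normalization_gap(k, weights))
        if gap < best_gap:
            best_k, best_gap = k, gap
    return best_k

# Hypothetical k_i parameters for three comorbidities (illustrative only).
example_weights = [0.5, 0.4, 0.3]
k_hat = search_k(example_weights)
print(round(k_hat, 3))
```

When the k_i sum to more than 1, as in this example, the nontrivial solution is negative and lies between -1 and 0; when they sum to less than 1 it is positive, which is why the grid spans both sides of zero.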
Question 2: Many patients, at end of life, experience
disabilities. In fact, disabilities are often used to anticipate
end of life. The attached data show the disabilities residents of
Veterans Administration nursing homes have experienced. Estimate how various disabilities predict mortality
in 6 months. The data do not have headers. The variables are listed in the following order: ID, age,
gender (M for male, F for female), number of assessments completed on
the person, number of days followed, days since first assessment, days
to last assessment, unable to eat, unable to transfer, unable to groom,
unable to toilet, unable to bathe, unable to walk, unable to dress,
unable to bowel, unable to urine, dead (1) or alive (0), and assessment
number. The following table should assist in organizing the data.
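The variable order listed above can be attached to the headerless file when it is read. The following is a minimal sketch, assuming the file is comma-delimited; the column names, the function name `read_assessments`, and the sample row are hypothetical and chosen only for illustration.

```python
import csv
import io

# Column names in the order the assignment lists the variables.
COLUMNS = [
    "id", "age", "gender", "n_assessments", "days_followed",
    "days_since_first", "days_to_last",
    "unable_eat", "unable_transfer", "unable_groom", "unable_toilet",
    "unable_bathe", "unable_walk", "unable_dress", "unable_bowel",
    "unable_urine", "dead", "assessment_number",
]

def read_assessments(handle):
    """Yield one dict per row, keyed by the assumed column names.
    Raises ValueError if a row does not have exactly 18 fields."""
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} fields, got {len(row)}")
        yield dict(zip(COLUMNS, row))

# Illustrative row only; it is not taken from the course data.
sample = io.StringIO("1,70.5,M,3,400,0,400,0,1,0,0,1,0,0,0,0,0,1\n")
rows = list(read_assessments(sample))
print(rows[0]["gender"], rows[0]["unable_transfer"])
```

All values come back as strings; converting age and the day counts to numbers is left to the cleaning steps.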
- Clean the data using the following steps: The age at death is
given as a row of data. For each assessment, calculate whether the
patient dies within 6 months of the assessment. If the patient
never dies, assume not dead in 6 months. At death, assume that the
patient has all disabilities, as the data indicate no disabilities
at death. Drop the last assessment, as no outcome can be calculated
from it. Assume age at assessment is age at first
assessment (given as the second variable) plus days to assessment/365.
Residents with negative age should be dropped because of date of birth
errors. Residents 100 years or older should be dropped because of
small sample size. Note that the analysis is done at the assessment level
and not at the patient level. Data►
- Predict from the patient's assessments (i.e., their age, gender, and disabilities
at time of assessment) whether the patient is likely to die in the next 6
months and may be
a candidate for hospice care. Do not use regression in this
analysis; estimate the parameters using SQL. SQL►
- Calculate the k constant for the multiplicative model using SQL.
Generate possible k values and see which one of the k values satisfy
- Use the model you have developed to predict the probability of
mortality for a 75-year-old resident with urine, bowel, and toilet disabilities. Enter the case description into a table
called RecentCases, using
Create Table and Insert Value commands. Then use this table to
predict the probability of mortality for this resident.
Make sure that the probability of mortality is adjusted to range
between the minimum and maximum probabilities for the different strata.
Stratified regression provides a transformed probability that should be
adjusted to estimate the actual probability using this formula:
Where Max is the maximum and Min is the minimum probabilities for each
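The RecentCases step can be sketched with Create Table and Insert commands run through Python's built-in sqlite3 module. This is a minimal sketch under several assumptions: the column names are hypothetical, disabilities not mentioned in the case are taken as absent, gender is omitted because the case does not state it, and the `Strata` table with its probabilities is a placeholder standing in for whatever model output you have produced. The sketch stops at the lookup and does not apply the Min/Max adjustment.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE RecentCases (
    CaseID INTEGER PRIMARY KEY, Age REAL,
    UnableEat INT, UnableTransfer INT, UnableGroom INT,
    UnableToilet INT, UnableBathe INT, UnableWalk INT,
    UnableDress INT, UnableBowel INT, UnableUrine INT
);
-- 75-year-old resident with toilet, bowel, and urine disabilities.
INSERT INTO RecentCases VALUES (1, 75, 0, 0, 0, 1, 0, 0, 0, 1, 1);

-- Hypothetical stratum table; the probabilities are placeholders.
-- Only three disabilities are used here to keep the sketch short.
CREATE TABLE Strata (
    AgeLow REAL, AgeHigh REAL,
    UnableToilet INT, UnableBowel INT, UnableUrine INT,
    ProbDead6m REAL
);
INSERT INTO Strata VALUES (70, 80, 1, 1, 1, 0.40);
INSERT INTO Strata VALUES (70, 80, 0, 0, 0, 0.10);
""")

# Match each recent case to the stratum with the same age band
# and the same pattern of disabilities.
prediction = con.execute("""
    SELECT c.CaseID, s.ProbDead6m
    FROM RecentCases AS c
    JOIN Strata AS s
      ON c.Age >= s.AgeLow AND c.Age < s.AgeHigh
     AND c.UnableToilet = s.UnableToilet
     AND c.UnableBowel  = s.UnableBowel
     AND c.UnableUrine  = s.UnableUrine
""").fetchall()
print(prediction)
```

Keeping the case in its own table means additional residents can be scored by adding rows to RecentCases without changing the query.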
Question 3: The following data show the variation in
diabetes in select counties across the United States. Using stratified
covariate balancing, report the impact of access to supermarkets on
diabetes after controlling for other variables.
- Check that all variables are positively and monotonically related to prevalence of diabetes in the county.
- Assign a binary variable to each variable in such a manner that
when the variable is 1, diabetes is more likely.
- Create a multiplicative model for predicting diabetes.
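The binary coding step above can be sketched as follows. This is a minimal sketch that assumes a median split as the cutoff and compares mean diabetes prevalence on each side of it to decide the direction; both choices, the function name `binary_code`, and the numbers are illustrative assumptions rather than the method or data of the assignment.

```python
from statistics import mean, median

def binary_code(values, outcome):
    """Return 0/1 codes for one county-level variable so that 1 marks
    the side of the median where the mean outcome is higher.
    Assumes both sides of the median contain at least one county."""
    cut = median(values)
    high = [y for x, y in zip(values, outcome) if x > cut]
    low = [y for x, y in zip(values, outcome) if x <= cut]
    high_is_risky = mean(high) >= mean(low)
    # Code 1 when the county falls on the higher-prevalence side.
    return [int((x > cut) == high_is_risky) for x in values]

# Hypothetical numbers: more supermarket access, less diabetes,
# so LOW access should be coded as 1.
access = [1, 2, 3, 4, 5, 6]
diabetes = [12, 11, 10, 9, 8, 7]
codes = binary_code(access, diabetes)
print(codes)
```

A variable that is not monotonically related to diabetes would need more than one cutoff, which this sketch does not handle.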
Question 4: These data come
from the STAR*D experiment conducted by the National Institute of Mental Health. Use instructor's last name as password.
The data report the experience of approximately 4,000 patients with
citalopram, bupropion, mirtazapine, buspirone, lithium, nortriptyline,
sertraline, thyroid, tranylcypromine, and venlafaxine.
The data are reported for a total of 22,254 visits. Visits may
be 2 weeks or more apart. Not every patient shows up for every
scheduled visit. Organize the data so there is one
row for each patient and each antidepressant trial (known in the
data as Concat). Note that this field considers
a combination of antidepressants as a new antidepressant. Ignore the dose of the medication. Patients received multiple antidepressants
during these trials until something worked for them. Include each time a
new antidepressant was tried as a separate trial. If the patient has taken the
antidepressant at any time during the trial, then mark it
as 1, otherwise 0. Notice that some patients have taken the
medication and others have not. Patients who have not taken a
particular medication have taken other medications, so at any
time we are comparing one medication to alternative treatments.
The medication is considered to
have caused the remission if the patient is referred to the
follow-up portion of the study at any point while taking the
medication; i.e., the variable
"Treatment_plan_equal_3" is set to 1 while taking the
- Clean and organize the data for analysis of bupropion
- Identify the parents in the Markov blanket of bupropion
- Create a multiplicative model of the impact of variables in the
parents in the Markov blanket of bupropion and bupropion itself on
- Predict remission rate for bupropion using two nearest strata
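The reorganization from visits to one row per patient and trial can be sketched in SQL, run here through Python's built-in sqlite3 module. This is a minimal sketch: `Concat` and `Treatment_plan_equal_3` are the field names given in the question, while the table name `Visits`, the columns `PatientID`, `VisitDay`, and `Bupropion`, and all rows are hypothetical. Grouping by patient and Concat also merges a regimen that is tried, dropped, and tried again, which is a simplification.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Visits (
    PatientID INT, VisitDay INT, Concat TEXT,
    Bupropion INT, Treatment_plan_equal_3 INT
);
-- Illustrative visits only; not taken from the STAR*D data.
INSERT INTO Visits VALUES
    (1,  0, 'citalopram',           0, 0),
    (1, 14, 'citalopram',           0, 0),
    (1, 28, 'bupropion+citalopram', 1, 0),
    (1, 42, 'bupropion+citalopram', 1, 1),
    (2,  0, 'citalopram',           0, 1);
""")

# MAX over the visits in a trial gives 1 if the flag was ever 1:
# the drug was taken at any time, or remission was reached at any
# point while on that regimen.
trials = con.execute("""
    SELECT PatientID, Concat,
           MAX(Bupropion)              AS TookBupropion,
           MAX(Treatment_plan_equal_3) AS Remission
    FROM Visits
    GROUP BY PatientID, Concat
    ORDER BY PatientID, MIN(VisitDay)
""").fetchall()
for row in trials:
    print(row)
```

The resulting table has one row per trial, so remission rates for bupropion versus alternatives can be computed with a further GROUP BY on TookBupropion within each stratum.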
For additional information (not part of the required reading), please see the following links:
- Multi-attribute preference functions. Health Utilities Index. PubMed►
- Utility functions for health profiles PubMed►
- How decisions reveal our preferences
This page is part of the course on Comparative Effectiveness by Farrokh Alemi PhD