Final Exam. Thursday the 23rd of March, 2017



Final Exam, Thursday the 23rd of March, 2017

Name:

General Comments: This exam is closed book. However, you may use four pages, front and back, of notes and formulas. Write your answers on the exam sheets. If you need more space, continue your answer on the back of the page. A standard normal table is attached at the end of the exam. Make sure you have all 19 pages! The exam is 180 minutes long. There are 4 questions, worth a total of 100 points. They are not equally weighted, nor are they of equal difficulty. The number of points each question is worth is printed with the problem. Read the questions carefully. If you are unsure of the interpretation, come ask. You must show your work to obtain full credit. If you use a result from class, state what result you are using. If you can't complete a problem for any reason, explain what concepts are at issue, and how you would attack the problem. If you can't work out a number you need for a later part of a problem, give it a symbol and show how you would do the calculations with the symbol in place of the missing number. It is a good idea to explain your reasoning briefly in English. If I can't tell that you understood what you were doing, I can't give you credit, particularly if you get the wrong numerical answer. GOOD LUCK!

Question Number | Total Points Possible | Score Received
Total | 100 |

THE STORY BEHIND THE EXAM: Dr. Biram Paul "BP" Moody is a psychiatry professor at our favorite school, the University of Calculationally Literate Adults, specializing in Bipolar Disorder. Bipolar disorder is characterized by large swings in mood, energy levels and the ability to engage in daily activities and tasks. The name bipolar refers to the two types of abnormal mood conditions, depression and mania (a state of elevated mood, euphoria and/or irritability), which characterize the disease. Scientists estimate that 1-2 million Americans currently suffer from BP and that anywhere from 1-4% of Americans may experience the disease during their lifetime. The emotional and financial costs to both individuals and society are enormous. You will help Dr. Moody analyze some of his data related to risk factors, frequencies of mood episodes and effects of treatments during this exam.

Question 1: Risky Business (28 points, 45 minutes)

Dr. Moody's first data set is designed to examine various risk factors for bipolar disorder. A sample of n = 500 subjects believed to be at high risk for developing the disease was followed over an extended period of time. The outcome is a multicategory variable recording whether the subject developed Bipolar Disorder Type I (BPI), Bipolar Disorder Type II (BPII) or did not develop the disease (NoBP). BPI is characterized by more severe and longer lasting episodes of depression and higher levels of mania. BPII is characterized predominantly by depression with occasional episodes of hypomania (similar to mania but less severe). Although BPII subjects' symptoms are less severe, they have larger numbers of depressive episodes and less time well between episodes. (There are actually other subcategories of BP but for simplicity we will ignore them in this problem.)

In addition to the outcome variable, Dr. Moody has recorded the following possible predictors: gender ($X_1 = 1$ for women and $X_1 = 0$ for men); family history ($X_2 = 1$ if the subject has a blood relative with BP or major depression, $X_2 = 0$ if not); history of subclinical depression ($X_3 = 1$ if the subject has such a history and $X_3 = 0$ if not); history of hypomania ($X_4 = 1$ if the subject has such a history and $X_4 = 0$ if not); substance abuse ($X_5 = 1$ if the subject has a history of drug or alcohol abuse and $X_5 = 0$ if not); and stress/anxiety ($X_6$ is an index score, with higher scores indicating greater levels of life stress). The printout on the following page shows the results of a multinomial logistic regression fit to Dr. Moody's data along with some follow up tests. Use it to answer parts (a)-(d).

Part a (6 points) Find the odds ratio associated with the history of hypomania variable in the BPI level of the model and provide a brief interpretation of it. Do early episodes of hypomania seem to be a significant risk factor for BPI? What about for BPII? Does this make sense based on the problem statement?

Solution: To get the odds ratio associated with a variable in a particular level of a multinomial model we simply exponentiate the coefficient. For the history of hypomania variable we have $b_4 = .576$ so the corresponding odds ratio is $e^{.576} = 1.78$. This means that subjects with a history of hypomanic symptoms have odds (or relative risks) of getting BPI (as opposed to no BP, the reference category) that are 1.78 times as high, or 78% higher, than those of subjects without such a history, all else equal. A history of subthreshold manic symptoms is a significant risk factor for BPI (p-value = .032) but not for BPII (p-value = .756). This makes sense since BPI is characterized by higher levels of mania whereas BPII involves primarily depression with only occasional episodes of hypomania.
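As a quick numerical check of the part (a) solution, the odds ratio is just the exponentiated coefficient. The sketch below uses the value .576 quoted in the solution; the function and variable names are illustrative choices, not part of the exam.

```python
import math

def odds_ratio(coef):
    """Exponentiate a logit-scale coefficient to get an odds ratio."""
    return math.exp(coef)

# Hypomania-history coefficient in the BPI equation, as quoted in the solution.
b4_bpi = 0.576
or_bpi = odds_ratio(b4_bpi)
pct_higher = (or_bpi - 1.0) * 100.0  # percent increase in the odds

print(f"odds ratio = {or_bpi:.2f}, i.e. {pct_higher:.0f}% higher odds")
```

The same function applies to any coefficient on the printout, provided the comparison is stated against the reference category (NoBP here).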

. mlogit bptype gender famhist dephist hypomaniahist substance stress, baseoutcome(nobp)

Multinomial logistic regression
Number of obs = 500
LR chi2(12) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =

bptype | Coef. | Std. Err. | z | P>|z| | [95% Conf. Interval]
BPI
  gender |
  famhist |
  dephist |
  hypmaniahist |
  substance |
  stress |
  _cons |
BPII
  gender |
  famhist |
  dephist |
  hypmaniahist |
  substance |
  stress |
  _cons |

******************************************************************************
test (each variable, both equations) | chi2(2) | Prob > chi2
test gender | | .001
test famhist | | .000
test dephist | | .000
test hypomaniahist | 7.94 | .019
test substance | .58 | .749
test stress | | .0003

******************************************************************************
test [1=2] (equality across equations) | chi2(1) | Prob > chi2
test [1=2]: gender | 8.09 | .001
test [1=2]: famhist | .04 | .835
test [1=2]: dephist | | .001
test [1=2]: hypmaniahist | 6.36 | .012
test [1=2]: substance | .01 | .911
test [1=2]: stress | 7.35 |

Part b (2 points) Which variables are significant overall in this model?

Solution: From the first block of tests following the regression printout, all the variables except substance abuse are significant overall in this model (p-values all < .05).

Part c (5 points) For which variables do the effects differ across the levels of the model? Write the hypotheses mathematically and in words for one of these tests.

Solution: The second set of tests below the printout evaluates whether or not the coefficients of the predictors are equal in both levels of the model. A significant p-value implies that the effect of the variable differs across the levels of the model. Using the first variable, gender, as an example, the hypotheses would be

$H_0: \beta_{1,\mathrm{BPI}} = \beta_{1,\mathrm{BPII}}$: the effect of gender on the odds of getting BPI is the same as the effect of gender on the odds of getting BPII (as opposed to no BP), all else equal. Or, gender is equally a risk factor (or not) for BPI and BPII.

$H_A: \beta_{1,\mathrm{BPI}} \neq \beta_{1,\mathrm{BPII}}$: the odds ratio for women versus men for BPI vs no BP is different from the OR for BPII vs no BP. Or, gender is a differential risk factor for the two subtypes of bipolar disorder.

Note that the test is simply of whether the effects are equal. We are definitely not testing whether the $\beta$'s are equal to 0, but we also don't have to assume they are not 0. If both are 0 then there's definitely no difference in effect! The hypotheses for the other tests are parallel. From the printout we see that the effects of gender, history of subclinical depression, history of hypomanic symptoms and stress/anxiety scores differ with respect to BPI and BPII, with p-values of .001, .001, .012 and .0067 respectively.

Part d (4 points) Based on your answers to (a)-(c) plus the coefficient values from the printouts, provide a summary of how family and personal clinical and demographic factors affect the risks of various types of bipolar disorder.

Solution: This part caused some problems as there are a lot of different ways to group the concepts. Given what was already answered in parts (a)-(c), the key is really to indicate, in the cases where the various predictors have different effects, for which subtype of BP they were a greater risk factor. Being female, having a history of hypomanic or depressive symptoms, having a relative with BP or major depression, and higher levels of stress are all associated with a higher risk of getting some form of bipolar disorder, as one would expect. Substance abuse, however, does not seem to be a significant risk factor. Female gender, depressive symptoms and stress appear to be significantly greater risk factors for BPII than BPI. This follows from the fact that (i) the estimated coefficients are higher in the BPII part of the model than the BPI part of the model and (ii) the tests in part (c) show that the coefficients differ significantly between the two parts of the model. Indeed, gender and stress are not even significant in the BPI part of the model, although unsurprisingly a history of depressive symptoms is a significant risk factor for both BP subtypes. In contrast, a history of hypomania is a significant risk factor for BPI but not BPII, while having a family member with BP or major depression is an equally significant risk factor for both subtypes. This is consistent with the fact that BPII is characterized by more frequent (although less severe) episodes of depression while BPI has higher levels of mania.

Dr. Moody's graduate student suggests fitting an ordinal logistic model to these data instead of the multinomial logistic model of parts (a)-(d) to save degrees of freedom and because she thinks the outcome categories are ordered in terms of severity. The corresponding printout is shown below. The outcome categories are ordered so that the coefficients correspond to changes in log odds of increasing putative severity of mood disorder (NoBP, BPI, BPII). Use the printout to answer the remaining parts of the question.

. ologit bptype gender famhist dephist hypomaniahist substance stress

Ordered logistic regression
Number of obs = 500
LR chi2(6) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =

bptype | Coef. | Std. Err. | z | P>|z| | [95% Conf. Interval]
  gender |
  famhist |
  dephist |
  hypmaniahist |
  substance |
  stress |
  /cut1 |
  /cut2 |

Note: You may find it helpful to know that the $\alpha = .05$ critical point for a chi-squared distribution with 6 degrees of freedom is $\chi^2_{6,.05} = 12.6$.

Part e Find and give a careful interpretation of the odds ratio for the stress variable in the ordinal logistic regression model. Explain briefly what the proportional odds assumption means with respect to this variable and how you could check it.

Solution: As with standard logistic and multinomial logistic models, in an ordinal logistic regression we obtain the odds ratio by exponentiating the regression coefficients. Here the odds ratio associated with the stress variable is $e^{.013} = 1.013$. This means that each additional point on the stress scale is associated with 1.013 times the odds, or 1.3% higher odds, of more severe (vs less severe) bipolar disorder. The ordering was assumed to be No BP < BPI < BPII the way the model was fit, though in fact trying to order these at all is not a good idea, as described in part (f)! The corresponding p-value is .000 so increased stress is clearly a highly significant risk factor!

In this context, the proportional odds assumption means intuitively that the effect of stress is the same whether you are looking at the odds of no BP vs some BP or of no BP/BPI vs BPII, which is somewhat hard to believe. Mathematically it means that if you plotted the stress relationship for these two comparisons on the log odds scale you would get parallel lines. In order to do such a plot you would need to bin the data, as the raw data don't give you log odds. Alternatively you could perform a score test, or you could fit separate logistic models for some vs no BP and for BPII vs no BP/BPI and compare the coefficients of the stress variable in the two models to see whether they were approximately equal.

The biggest mistake people made on this problem was in understanding what groups of outcomes were being compared at the various cutpoints. It is not BPI vs none and BPII vs none; that is what a multinomial model does. It is not BPI vs none and BPII vs BPI; that is a "next highest" model, which we never formally studied though it can be done. Also, the proportional odds assumption is NOT equivalent to testing equality of coefficients for the various levels of the multinomial model. Indeed, if the effects of stress on the odds of BPI and BPII are the same, then the effect for BPII vs none/BPI will have to be less strong than for the comparison of some vs no BP, as in the first of these instances groups with lower and higher risks are being lumped together while in the second case the two higher risk groups get bundled.
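The point about which groups are compared at each cutpoint can be made concrete with a small sketch. It assumes the ordering NoBP < BPI < BPII used when the ordinal model was fit; the helper name `cumulative_splits` is hypothetical and only illustrates how the outcome is dichotomized.

```python
# Ordering assumed when the ordinal model was fit.
ORDER = ["NoBP", "BPI", "BPII"]

def cumulative_splits(outcome):
    """Return the binary indicators compared at each cutpoint.

    Element k is 1 if the outcome lies above cutpoint k+1, else 0.
    """
    rank = ORDER.index(outcome)
    return [int(rank > k) for k in range(len(ORDER) - 1)]

for outcome in ORDER:
    some_bp, bpii_vs_rest = cumulative_splits(outcome)
    print(f"{outcome:5s} some BP vs none: {some_bp}   BPII vs none/BPI: {bpii_vs_rest}")
```

Fitting one ordinary logistic regression to each of these two indicators and comparing the stress coefficients is the informal check of proportional odds described in the solution.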
Part f (5 points) Do you think using an ordinal logistic model for these data was a good idea? Answer (i) based on the results of parts (a)-(d) and (ii) by performing an appropriate test. (You do not need to write out the hypotheses. Simply compute the test statistic and explain your conclusions.)

Solution: This definitely does NOT seem like a good idea! From parts (a)-(d) the effects of the assorted predictor variables are very different on different levels of the model. Although most of the variables seem overall to be risk factors for BP in the expected direction, the ordering with respect to BP subtype is NOT the same for all the predictors. As noted in part (d), female gender, depressive symptoms and stress are much greater risk factors for BPII than BPI (and indeed stress and gender are not even significant for BPI), while the reverse is true for history of hypomania, and the effect of family history is essentially the same for the subtypes. It is not enough to just say the coefficients differed for some variables across the levels of the model. As noted above, the effect being equal for the subtypes is NOT the same as the proportional odds assumption, which says that the BREAKPOINT between the ordered levels doesn't matter, not that all other levels are equally different from the base/reference level! Indeed, if the proportional odds assumption is correct we would EXPECT the effects to differ across the levels of the multinomial model.

We can explicitly check whether the multinomial model (which is more flexible and allows each predictor to have its own effect on each level of the model) is superior to the ordinal logistic model (which allows only a single coefficient for each variable) by calculating our usual statistic of negative twice the difference in log likelihoods. The difference in degrees of freedom is equal to the difference in the number of parameters between the two models. Here we have $\chi^2 = -2\,(\ell_{\text{ordinal}} - \ell_{\text{multinomial}})$, where the two log likelihoods are read from the printouts. The resulting statistic is much bigger than the critical value of $\chi^2_{.05,6} = 12.6$ that was given in the problem statement, so we conclude that the more flexible multinomial model is a significant improvement over the more parsimonious ordinal logistic model, as expected. (Note that both the multinomial model and the ordinal logistic model have two intercepts here, so the difference in the degrees of freedom is just the number of predictor variables.)

Part g (Optional Bonus) Using the multinomial logistic model, calculate the probability of developing BPI for (i) a man and (ii) a woman who has no family history of depressive disorders and no personal history of hypomania, substance abuse or stress but does have a history of subclinical depression. Are you surprised by your answers given the sign of the gender coefficient in the BPI part of the model? Explain what has happened.

Solution: We are told that the values of the predictors are all 0 except for the history of subclinical depression, which is 1, and gender, which is 0 for the man and 1 for the woman. In our multinomial model the formula for the required probability is

$P(\mathrm{BPI}) = \dfrac{e^{X\beta_{\mathrm{BPI}}}}{1 + e^{X\beta_{\mathrm{BPI}}} + e^{X\beta_{\mathrm{BPII}}}}$

so all we need to do is get the relevant linear combinations for the man and the woman and plug them into this formula. Doing so, the man has a 28.4% chance of developing BPI (at least in our study population) while the woman has only a 25.3% chance. This may at first seem odd since the coefficient of the gender variable in the BPI level of the model is positive, suggesting that women have higher odds of BPI vs no BP than men. However, the probabilities are calculated relative to ALL THREE outcome categories, not just BPI vs no BP. The effect of gender on BPII is even greater than on BPI relative to no BP, and so when you combine them with the other covariates set at these levels the probability actually goes up a fair bit for women relative to men for BPII but drops for BPI and no BP.
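The apparent paradox in part (g) can be reproduced with a short sketch of the multinomial probability formula. The linear predictors and gender effects below are hypothetical values chosen for illustration (the fitted coefficients are not reproduced here); the only feature they share with the exam is that the gender effect is positive in both equations and larger for BPII.

```python
import math

def multinomial_probs(eta_bpi, eta_bpii):
    """Probabilities of (NoBP, BPI, BPII) with NoBP as the reference category."""
    num_bpi, num_bpii = math.exp(eta_bpi), math.exp(eta_bpii)
    denom = 1.0 + num_bpi + num_bpii
    return 1.0 / denom, num_bpi / denom, num_bpii / denom

# Hypothetical linear predictors for a man (NOT the exam's fitted values).
eta_bpi_man, eta_bpii_man = -0.5, -0.3
# Hypothetical gender effects: positive in both equations, larger for BPII.
gender_bpi, gender_bpii = 0.2, 1.0

man = multinomial_probs(eta_bpi_man, eta_bpii_man)
woman = multinomial_probs(eta_bpi_man + gender_bpi, eta_bpii_man + gender_bpii)

for label, probs in (("man", man), ("woman", woman)):
    print(f"{label:5s} P(NoBP)={probs[0]:.3f}  P(BPI)={probs[1]:.3f}  P(BPII)={probs[2]:.3f}")
```

Even though the woman's odds of BPI versus NoBP are higher, her probability of BPI is lower because the BPII term in the denominator grows faster.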

Part h (Optional Bonus) Do you think your preferred model from parts (a)-(f) will give accurate estimates of the probabilities of getting BPI and BPII? If so, why? If not, explain conceptually what you might need to do to adjust your model.

Solution: The model will not give accurate predicted probabilities for developing BPI and BPII in the general population since we were told this was a high risk sample (though it will give accurate probabilities conditional on the person being high risk, however that was defined). To get accurate predicted probabilities in the general population we would have to look at the sampling weights that were used to get the high risk sample. It is actually not as simple as just adjusting the intercept here because this is not a case-control study. We prospectively followed people to see whether or not they developed BPI or BPII. What we did do was over-sample people based on covariate characteristics that increased their risk profile, so that is what we need to adjust for.

Question 2: An Episode in the Life... (28 points, 45 minutes)

Next Dr. Moody wants to look at the number of episodes of mania and depression experienced by people with bipolar disorder. He obtains a new sample of people who have had the disease for a moderate amount of time and records their lifetime number of manias to date. A printout of his Poisson regression with gender (0 = male, 1 = female), BP type (0 = BPI, 1 = BPII) and anxiety/stress (on a scale with 100 being the highest) as predictors is shown below. He uses illness duration (number of years since the subject was diagnosed with BP) as the offset variable.

. poisson manias gender bptype stress, exposure(illdur)

Poisson regression
Number of obs = 400
LR chi2(3) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =

manias | Coef. | Std. Err. | z | P>|z| | [95% Conf. Interval]
  gender |
  bptype |
  stress |
  _cons |
  illdur | (exposure)

Part a (5 points) Give brief interpretations of the effects of BP type and the stress index on the number of manic episodes based on this model. (Your interpretations should be on the raw data scale, not the log scale.)

Solution: In a Poisson regression model with an offset variable the exponentiated coefficients from the GLM fit are interpretable as rate ratios for the number of events per unit time/area/size/etc. Here we see that, all else equal, the average number of manic episodes per year is $e^{-2.92} = .054$ times as high (or 94.6% lower) in those with BPII (bptype=1) than in those with BPI (bptype=0). Similarly, the rate of manic episodes goes up by a factor of $e^{.0092} = 1.009$, an increase of roughly 1%, for each additional point on the stress scale. Both variables are highly significant with p-values of 0 to three decimal places.

Part b (4 points) Find the predicted number of lifetime (to date) manic episodes for a man with BPI with a stress index score of 50 who has had the disorder for 10 years.

Solution: To get the predicted number of manias, we plug the values of the predictor variables into the estimated regression equation, exponentiate to get from the log link scale to the original scale, and then multiply by the appropriate offset variable. Here we have a man (gender=0) with BPI (bptype=0) and a stress score of 50 who has had the disorder for 10 years, so the estimated number of manias is

$10 \times e^{b_0 + b_1(0) - 2.915(0) + .0092(50)} = 10\,e^{.71} \approx 20.3$

where $b_0$ is the intercept and $b_1$ the gender coefficient from the printout. We thus expect such a person to have had around 20 manic episodes over the course of their 10 year illness.

Next Dr. Moody fits a zero-inflated Poisson model to his data. The printout is shown below.

. zip manias gender bptype stress, inflate(bptype) exposure(illdur) vuong

Zero-inflated Poisson regression
Number of obs = 400
Nonzero obs = 208
Zero obs = 192
Inflation model = logit
LR chi2(3) =
Log likelihood =
Prob > chi2 =

 | Coef. | Std. Err. | z | P>|z| | [95% Conf. Interval]
manias
  gender |
  bptype |
  stress |
  _cons |
  illdur | (exposure)
inflate
  bptype |
  _cons |

Vuong test of zip vs. standard Poisson: z = 7.89 Pr>z =

Part c (5 points) Explain what zero-inflation means in the context of this problem and why using BP type as the predictor in the inflation part of the model makes sense. Does there appear to have been zero inflation? Briefly justify your answer.

Solution: Zero inflation means having more subjects with no events than the assumed distribution of the outcome variable (here Poisson) would suggest. In the context of this problem that would mean having more people than expected with no full manic episodes since their formal diagnosis. That might be fairly common in subjects with BPII, which is characterized principally by depression with occasional hypomanic episodes, so it makes sense to use BP subtype as the predictor in the inflation portion of the model. There clearly was zero inflation in this problem since the Vuong test of the zip (zero-inflated Poisson) model versus the standard Poisson model is highly significant, with a whopping Z value of 7.89 and a p-value of 0 to as many decimal places as reported.

Part d (6 points) Interpret the regression coefficients of BP type in both parts of the ZIP model. (You may find it helpful to transform them to an appropriate scale.)

Solution: The inflation portion of the model is essentially a logistic regression looking at whether or not a subject is a "certain 0". Thus if we exponentiate the coefficients in that part of the model we get odds ratios. We have $\hat\beta = 2.74$ so the corresponding odds ratio is $e^{2.74} = 15.5$. This means that someone with BPII has about 15 and a half times the odds of being a certain 0 (i.e. of never having manic episodes) of someone with BPI. This makes perfect sense. In the regression piece of the model the exponentiated coefficients are again interpreted as rate ratios, but for people who are NOT certain 0's. Here that means that, all else equal, among people who could have manic episodes the rate per year is $e^{\hat\beta} = .19$ times as high (or 81% lower) in people with BPII than in people with BPI, which also makes sense.

Next Dr. Moody decides to look at numbers of depressive episodes. Since depressive episodes are more common than manic episodes he has simply recorded the number of episodes in the previous year. He starts by fitting a simple Poisson model with BP type (0 = BPI and 1 = BPII) as the only predictor since BPII is supposed to be characterized by shorter but more frequent depressive episodes. His printout is shown below. Use it to answer the remaining parts of the problem.

. poisson depressions bptype

Poisson regression
Number of obs = 400
LR chi2(1) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =

depressions | Coef. | Std. Err. | z | P>|z| | [95% Conf. Interval]
  bptype |
  _cons |

******************************************************************************
. estat gof, pearson

Goodness-of-fit chi2 =
Prob > chi2(398) =
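The transformations used in parts (a), (b) and (d) of this question can be checked numerically. This is a minimal sketch that uses only the values quoted in the solutions (the coefficients -2.915, .0092 and 2.74, and the linear predictor .71); the function name `predicted_count` is an illustrative assumption.

```python
import math

def predicted_count(linear_predictor, exposure):
    """Expected count under a log link: exposure * exp(linear predictor)."""
    return exposure * math.exp(linear_predictor)

# Rate ratios from the standard Poisson model (part a).
rate_ratio_bptype = math.exp(-2.915)  # BPII vs BPI
rate_ratio_stress = math.exp(0.0092)  # per additional stress point

# Part (b): linear predictor of .71 for the man described, 10 years of illness.
expected_manias = predicted_count(0.71, exposure=10)

# Part (d): odds ratio for being a "certain 0" from the inflation equation.
zip_odds_ratio = math.exp(2.74)

print(f"rate ratio, BPII vs BPI : {rate_ratio_bptype:.3f}")
print(f"rate ratio, stress      : {rate_ratio_stress:.4f}")
print(f"expected manias         : {expected_manias:.1f}")
print(f"odds ratio, certain zero: {zip_odds_ratio:.1f}")
```

Multiplying by the exposure after exponentiating is equivalent to adding log(exposure) to the linear predictor with a coefficient fixed at 1, which is what the offset does.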

Part e (5 points) Explain what over-dispersion means in the context of this problem. Does there appear to be over-dispersion in this model? If not, why not? If so, what do you think might have caused it?

Solution: Over-dispersion means higher variance in the outcome variable (after controlling for the covariates) than would be expected based on the assumed distribution. For a Poisson distribution the mean and variance are equal, so over-dispersion means that the variance of the number of depressive episodes is larger than the mean number of depressive episodes. We can check for this using a chi-squared goodness of fit test. From the printout the p-value of the test is 0, meaning there is extremely strong evidence of a lack of fit and in particular a suggestion of over-dispersion. If we take the ratio of the goodness of fit statistic to the number of degrees of freedom we get an estimate of the degree of over-dispersion. Here $\chi^2/df = \chi^2/398 = 2.16$, so the variance is about twice as high as it should be. One possible cause for this is heterogeneity caused by the failure to include important predictors such as gender, stress, whether the person is a rapid cycler and so on, which we have already seen are important in this disorder.

Part f (3 points) Based on your answer to part (e), Dr. Moody is concerned that the standard error estimates for his regression coefficients may not be valid. Name two techniques we have learned that he could use to try to adjust for this problem.

Solution: We could adjust the standard errors directly using the inflation factor calculated from the goodness of fit test. Specifically this would mean multiplying the standard errors given on the Poisson regression printout by $\sqrt{2.16} = 1.47$, so they would be increased by 47%, which is quite a lot. Alternatively we could fit a negative binomial model, which allows for more dispersion than the Poisson model.

Part g (Optional Bonus) Dr. Moody's graduate student suggests that instead of using the techniques you named in part (f) you could take a non-parametric approach. Explain briefly (i) how you might use a permutation test to check whether there is a relationship between BP type and the average yearly number of depressive episodes and (ii) how you could use a bootstrap procedure to get a valid confidence interval for the coefficient of the BP type indicator and hence for the rate ratio.

Solution: (i) To do the permutation test we would randomly shuffle the BP type labels many times, rerun the Poisson model for each permuted data set, and sort the resulting regression coefficients for BP type to get the null distribution. We would then see if our observed regression coefficient of 1.57 is outside the 2.5% to 97.5% range of the null distribution. (ii) For the bootstrap procedure we would take around 2000 samples of size n = 400 with replacement from the original data (probably sampling separately from the BPI and BPII subjects to maintain their relative proportions in the sample), run the Poisson model for each of the bootstrap samples to get the corresponding regression coefficient, order them and use the 2.5% and 97.5% values as the boundaries of our 95% confidence interval.
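The two resampling procedures in part (g) can be sketched as follows. This is an illustration under stated assumptions, not Dr. Moody's analysis: the counts are made-up toy data, and instead of refitting a Poisson model each time the sketch uses the fact that with a single 0/1 predictor and equal exposure the fitted coefficient equals the log of the ratio of the two group means. The permutation p-value is computed as a two-sided tail proportion, a close cousin of the 2.5% to 97.5% range check described in the solution.

```python
import math
import random

def log_rate_ratio(counts_bpi, counts_bpii):
    """Poisson coefficient for one 0/1 predictor: log of the ratio of group means."""
    mean_bpi = sum(counts_bpi) / len(counts_bpi)
    mean_bpii = sum(counts_bpii) / len(counts_bpii)
    return math.log(mean_bpii / mean_bpi)

def permutation_pvalue(counts_bpi, counts_bpii, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling the BP type labels."""
    rng = random.Random(seed)
    observed = abs(log_rate_ratio(counts_bpi, counts_bpii))
    pooled = list(counts_bpi) + list(counts_bpii)
    n_bpi = len(counts_bpi)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(log_rate_ratio(pooled[:n_bpi], pooled[n_bpi:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def bootstrap_ci(counts_bpi, counts_bpii, n_boot=2000, seed=0):
    """95% percentile interval, resampling within each BP type separately."""
    rng = random.Random(seed)
    coefs = []
    for _ in range(n_boot):
        resample_bpi = rng.choices(counts_bpi, k=len(counts_bpi))
        resample_bpii = rng.choices(counts_bpii, k=len(counts_bpii))
        coefs.append(log_rate_ratio(resample_bpi, resample_bpii))
    coefs.sort()
    return coefs[int(0.025 * n_boot)], coefs[int(0.975 * n_boot) - 1]

# Toy yearly depression counts (hypothetical, all positive to avoid log(0)).
bpi = [1, 2, 1, 3, 1, 2, 1, 1, 2, 1]
bpii = [5, 6, 4, 7, 5, 8, 6, 5, 4, 6]

p_value = permutation_pvalue(bpi, bpii)
lower, upper = bootstrap_ci(bpi, bpii)
print(f"permutation p-value: {p_value:.4f}")
print(f"95% CI for coefficient: ({lower:.2f}, {upper:.2f})")
print(f"95% CI for rate ratio : ({math.exp(lower):.2f}, {math.exp(upper):.2f})")
```

With additional covariates or unequal exposure the shortcut no longer applies, and each resample would need an actual Poisson fit.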

Question 3: Quality Control (24 points, 40 minutes)

Dr. Moody has developed a new treatment for depression in bipolar subjects. This is a little tricky since use of standard antidepressants may push subjects over into mania. He has randomly assigned 50 BPII depressed subjects each to his new treatment and to the standard treatment and has followed them over a 6 month period. His outcome variable is a quality of life index (with 100 being the best) which he has recorded at baseline, 3 months and 6 months. Below is the printout from a mixed effects model with group (0 = standard treatment, 1 = new treatment), time (in months) and a group by time interaction as fixed effects, along with a random intercept. The summary information from an OLS regression and a model with both a random intercept and a random slope are also included. The corresponding spaghetti plot is on the next page.

. xtreg qol group time groupbytime, mle

Random-effects ML regression
Number of obs = 300
Group variable (i): id
Number of groups = 100
Random effects u_i ~ Gaussian
Obs per group: min = 3, avg = 3.0, max = 3
LR chi2(3) =
Log likelihood =
Prob > chi2 =

qol | Coef. | Std. Err. | z | P>|z| | [95% Conf. Interval]
  group |
  time |
  groupbytime |
  _cons |
  /sigma_u |
  /sigma_e |
  rho |

Likelihood-ratio test of sigma_u=0: chibar2(01) =      Prob>=chibar2 =

******************************************************************************
Model | Log-Likelihood | df | AIC | BIC
OLS |
Random Intercept |
Random Intercept + Slope |

Note: for AIC and BIC, lower scores are better.

Part a (4 points) Based on the spaghetti plot, which of the three models would you expect to be the best: the OLS regression, the random intercept model, or the model that includes both the random intercept and slope? Briefly explain your reasoning and say how you would interpret the observed random effects (if any) in practical real-world terms.

Solution: Based on the spaghetti plot, the random intercept model should be the best. The lines for the individual subjects within each group are very parallel (no fanning effect) but there are substantial shifts of the lines up and down about the group average. This suggests that each person has their own base level of QOL (so a random intercept is needed) but that there isn't much individual variation in the rate of change of QOL (so a random slope is not needed). The random intercept effects represent the innate view that individuals have of their basic life circumstances (or their basic mood personality), which causes their ratings to be systematically higher or lower than those of other people across time.

Part b (5 points) Do the model printouts confirm your expectations? Briefly justify your answer, including stating which models you believe are significant improvements compared to others.

Solution: The printouts are consistent with our expectations from part (a). The random intercept model has a far higher (less negative) log likelihood and much lower information criteria values than OLS regression, meaning that there is a significant improvement in model fit from using the random intercepts. Moreover, the likelihood ratio chi-squared test on the printout for the variance of the random intercept is hugely significant (p-value = 0). Of course we could have calculated this directly by taking -2 times the difference in log likelihoods from the table below the printout! However, the log likelihood and information criteria values for the random intercept plus random slope model are essentially identical to those for the random intercept only model, meaning there is no additional improvement in fit from adding the random slope term. The random intercept and random intercept + slope models are both significantly better than OLS, but the model with the random slope is no better than the intercept only model.
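The "-2 times the difference in log likelihoods" calculation mentioned in part (b) can be sketched as below. The two log likelihoods are hypothetical placeholders, not values from the table. The p-value follows the chibar2(01) convention shown on the printout: because the null value sigma_u = 0 sits on the boundary of the parameter space, the reference distribution is a 50:50 mixture of a point mass at 0 and a chi-squared with 1 degree of freedom, so the usual 1 df tail area is halved.

```python
import math

def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic: -2 * (restricted - full log likelihood)."""
    return -2.0 * (loglik_restricted - loglik_full)

def chi2_1df_upper_tail(stat):
    """P(chi-squared with 1 df > stat), computed from the normal distribution."""
    return math.erfc(math.sqrt(stat / 2.0))

def chibar01_pvalue(stat):
    """Boundary-corrected p-value: half the chi-squared(1) tail area."""
    return 0.5 * chi2_1df_upper_tail(stat) if stat > 0 else 1.0

# Hypothetical log likelihoods (NOT the values from the exam's table).
loglik_ols = -1000.0
loglik_random_intercept = -900.0

stat = lr_statistic(loglik_ols, loglik_random_intercept)
p_value = chibar01_pvalue(stat)
print(f"LR statistic = {stat:.1f}, boundary-corrected p-value = {p_value:.3g}")
```

The same `lr_statistic` function applies to the multinomial versus ordinal comparison in Question 1, part (f), but there the statistic is compared with an ordinary chi-squared distribution with 6 degrees of freedom, since no parameter is on a boundary.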

15 Part c (7 points) Explain as carefully as you can what the model tells you about how the quality of life of the patients in this study changes over time as a function of treatment. Your answer should incorporate the signs, numerical values and significance of the fixed-effect model coefficients. Solution: The various coefficients can be interpreted as follows. At baseline (time = 0) the average QOL score in both groups is around 30. The intercept in the model ( ˆβ 0 = 30.06) corresponds to the average QOL score at time 0 in the control group. The group coefficient ( ˆβ 1 =.04) gives the difference in QOl between the new treatment and control (standard treatment) groups at time 0 but it is miniscule and indeed not statistically significant (p-value =.945) which tells us that at baseline there is no group difference. This is good since we are supposed to be doing a randomized study! The coefficient of the time variable tells us the change in QOL PER MONTH in the group receiving the standard treatment. Their QOL scores are going up by about ˆβ 2 = 2.02 points per month and that improvement is significant given the p-value of 0. The coefficient of the group by time interaction tells us the DIFFERENCE in the rate of change in QOL between the new and standard treatment groups. Here ˆβ 4 = 4.98 means that the the rate of improvement in QOL is 5 points higher per month with Dr. Moody s new treatment than it would have been on the standard treatment. The corresponding p-value of 0 tells us that the subjects in the new treatment group are improving significantly faster than those on the standard treatment. Their rate of improvement is in fact = 7 points per month. Thus we see that the subjects all started with fairly low QOL scores and while both treatments seem to be helping, Dr. Moody s new treatment is helping more, just as he would hope. Part d (6 points) Dr. 
Moody says his primary interests are really to find out (i) whether there has been significant change in quality of life within the group of subjects getting his new treatment from baseline to 6 months and (ii) whether there is a significant difference in quality of life between the two treatment groups at the 6 month time point. For each of these scenarios name two other techniques which could be used to address those questions and say how you would decide which was more appropriate.

Solution: For point (i), Dr. Moody is interested in the before/after change in QOL in the group getting his new treatment. Since this is a within-subjects test, we have pairing. We could use a paired t-test, a sign test or a signed rank test to address this question. If the distribution of QOL scores is normal (as our mixed model from parts (a)-(c) assumed) then the paired t-test will be the most powerful. However, if the distribution of changes in QOL scores is skewed then we would probably want the signed rank test, which is more powerful than the sign test but doesn't make as strong assumptions as the t-test. (Note that we could in fact get the answer to this problem from the mixed model by using an appropriate contrast, though I didn't actually show you how to run that sort of test in the mixed model context.)

For point (ii) we are interested in comparing the two treatment groups at a single time point. Since the groups are independent we can use a two-sample t-test if the QOL scores are normally distributed or a Wilcoxon rank-sum test if they are not. As above, the parametric test will be the more powerful if the distributional assumptions are met. (We could also get this result as a

contrast within the mixed model.)

Part e (2 points) Dr. Moody is considering adding covariates such as age, gender, and current life stress to his mixed effects model. What do you think this is likely to do to the magnitude of the random effects and why?

Solution: We would expect the random effects to shrink in magnitude (i.e. there would be a decrease in the estimated variance of the random intercepts, $\hat{\sigma}_i$) since the covariates Dr. Moody is proposing represent some of the individual subject level characteristics that could lead to someone having systematically low or high QOL. For instance, if someone had a particularly stressful set of life situations, their QOL scores might be low all along. If you adjust for differences in stress level, that would explain some of the variance in the intercepts.

Part f (Optional Bonus) Dr. Moody could have chosen to use either a hierarchical or repeated measures formulation of his model instead of the mixed effects model used above. You can get bonus credit for (i) mathematically rewriting the model you chose in (a)-(b) as a hierarchical model, (ii) saying what repeated measures covariance structure would be equivalent to the model you selected in parts (a)-(b) and (iii) suggesting (with justification) one other repeated measures covariance structure that might be appropriate here and describing briefly how you would compare it to the current model.

Solution: (i) Thinking of this problem in the hierarchical framework, we start with the within-subjects model. We let $Y_{it}$ represent the QOL of subject $i$ at time $t$ and have

$$Y_{it} = \gamma_{0,i} + \gamma_{1,i}\,\text{Time} + \epsilon_{it}$$

Next we write the coefficients from the within-subjects model as a function of population average treatment effects and individual effects.
Based on the earlier parts of the problem we expect to have a subject level random intercept but no random slope, so we get

$$\gamma_{0,i} = \beta_0 + \beta_1\,\text{Tx} + b_{0,i}$$

$$\gamma_{1,i} = \beta_2 + \beta_3\,\text{Tx}$$

where $b_{0,i}$ and $\epsilon_{it}$ are all mean 0, normally distributed, and independent of one another.

(ii) The random intercept model is equivalent to a compound symmetric covariance structure if all subjects are measured at the same equally spaced time points.

(iii) Since we have three time points we could consider either an unstructured covariance matrix (allowing the variances and correlations across time to vary freely) or an AR(1) model in which the variances are constant but the correlations die off over time. This last model probably makes a lot of sense since QOL is subject to change over time due to a variety of life circumstances, which will tend to attenuate correlations if there are big gaps between measurements.
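As a sketch of what parts (i)-(iii) imply, the two levels of the hierarchical model can be substituted into one another, and the resulting covariance structures can be written out for the three time points. The variance symbols $\sigma_b^2$, $\sigma_\epsilon^2$, $\sigma^2$ and the correlation $\rho$ are notation assumed here for illustration; they do not appear in the original solution.

```latex
% Substituting the subject-level equations into the within-subjects model
% gives the single-equation mixed model:
\[
Y_{it} = \beta_0 + \beta_1\,\text{Tx} + \beta_2\,\text{Time}
       + \beta_3\,\text{Tx}\times\text{Time} + b_{0,i} + \epsilon_{it}
\]

% With Var(b_{0,i}) = \sigma_b^2 and Var(\epsilon_{it}) = \sigma_\epsilon^2
% (assumed notation), the random intercept model implies a compound
% symmetric covariance matrix for a subject's three measurements:
\[
\Sigma_{\text{CS}} =
\begin{pmatrix}
\sigma_b^2 + \sigma_\epsilon^2 & \sigma_b^2 & \sigma_b^2 \\
\sigma_b^2 & \sigma_b^2 + \sigma_\epsilon^2 & \sigma_b^2 \\
\sigma_b^2 & \sigma_b^2 & \sigma_b^2 + \sigma_\epsilon^2
\end{pmatrix}
\]

% An AR(1) structure keeps a constant variance \sigma^2 but lets the
% correlation decay with the number of time steps between measurements:
\[
\Sigma_{\text{AR(1)}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 \\
\rho & 1 & \rho \\
\rho^2 & \rho & 1
\end{pmatrix}
\]
```

Both structures use two covariance parameters and neither is nested in the other, so one reasonable way to compare them would be to fit both and look at the log likelihoods or information criteria.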

Question 4: What's Your Response? (20 points, 35 minutes) Dr. Moody is encouraged by the results of Question 3, which suggest his new depression treatment is improving BP subjects' quality of life. Now he wants to learn more about how long it takes them to respond to it symptomatically. (Someone is considered a responder if their score on a standard index of depression symptoms both drops by at least 50% and falls below the cutoff for clinical depression.) He has performed a small 16-week (112 day) pilot study, randomizing 10 subjects to his new treatment and 10 to the standard treatment, and has recorded how long it took each subject to achieve responder status. The response times in days for each group are shown below along with their respective Kaplan-Meier curves. Subjects whose times were censored are marked with a (+).

Standard Treatment (Group 0): 28+, 49, 50, 67, 80, 90, 100, 105, 110, 112+

New Treatment (Group 1): 10, 12, 14, 20, 20, 25, 30, 37, 50, 112+

Part a (4 points) Give an example of an informative reason for censoring and a non-informative reason for censoring that could occur in the context of this study and identify a subject to whom each of these things could reasonably apply (by circling their data value), briefly explaining your reasoning.

Solution: Informative censoring occurs if the subject drops out for a reason that has something to do with the outcome of interest. You do not have to know the reason for the censoring to be informative. In the context of this study, informative censoring could occur, for example, if someone dropped out because their condition worsened or they were experiencing bad medication side effects. The person who was censored early in the standard treatment group at 28 days could fall in this category, if the treatment wasn't working for them. Non-informative censoring occurs if the subject drops out for a reason that has nothing to do with the outcome of interest.
Again, it makes no difference whether you know the reason for dropout or not. A common form of non-informative censoring is when the study ends before people have the event of interest. We've got one person in each treatment group who was censored at day 112, which was the stated end time

of the study. This is non-informative censoring. You can't say this sort of censoring is informative just because we know the people lasted a long time. When you have censoring you always know that the people have gone without an event for that long, however long it is. What matters is whether the reason you lost track of them is because they were doing well (or poorly), not just that they were doing well or poorly.

Part b (5 points) Based on the Kaplan-Meier plot what is the approximate median time to response in each of the treatment groups (i.e. the time by which 50% of subjects have responded)? What is the estimated response rate in each of the groups at the end of the 16 week pilot study? Overall what do the curves suggest about how the new treatment works relative to the standard treatment?

Solution: To find the median time to event we just need to see where the survival curves cross the 50% line. This appears to be at about 20 days for the new treatment group and about 90 days for the standard treatment. By the end of the 16 weeks, however, the two curves have come together, with about 90% of subjects in each group having responded. (Note that in this case failure to survive is a good thing: it means people have responded!) The implication is that both treatments actually work very well in the long run, but Dr. Moody's new treatment relieves symptoms much faster, which is probably still of great interest to doctors and their patients.

Now Dr. Moody decides to examine how dose is related to the probability of treatment response. Specifically, he has taken n = 40 subjects, given them a range of doses of the medication, and recorded whether or not they responded. The printout for a probit regression for his data is shown below. Use it to answer the remainder of the question.

    . probit responder dose

    Probit regression                 Number of obs =  40
                                      LR chi2(1)    =
                                      Prob > chi2   =
    Log likelihood =                  Pseudo R2     =

    responder |  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
    dose      |
    _cons     |

    ******************************************************************************

Part c (4 points) Based on this model, find the probability of response and a corresponding 95% confidence interval for people who receive no treatment (i.e. a dose of 0), briefly explaining your reasoning.

Solution: We are asked to find the probability of response and a corresponding 95% confidence interval at a dose of 0 (i.e. the probability of spontaneous recovery without treatment). At a

dose of zero the Z-score is just the intercept, $Z = -1.48$, and the corresponding probability is $P(Z \le -1.48) = .5 - P(0 \le Z \le 1.48) = .0694$, or just under 7%. Now to get the confidence interval we just have to use the transformation principle. The printout gives us a confidence interval of $-2.34$ to $-.625$ for the intercept. To get the confidence interval for the spontaneous recovery rate we just convert these values to probabilities: $P(Z \le -2.34) \approx 1\%$ and $P(Z \le -.625) \approx 26.5\%$. Thus the spontaneous recovery rate is somewhere between 1% and 26.5%. This is a pretty wide range. The imprecision is due to a combination of our small sample size (n = 40) and the fact that Dr. Moody probably wasn't trying doses near 0, so the model probably doesn't have very much information in that range.

Part d (3 points) Dr. Moody believes that for his treatment to be widely used clinically he needs to be able to get at least a 75% response rate. However, if he uses a dose greater than 70 there will be an unacceptably high risk of patients switching into a manic episode. According to your best estimate, will Dr. Moody's new treatment have the desired response rate at a dose of 70? Show your work.

Solution: In a probit regression model, plugging values into the estimated regression equation creates a Z-score which can be transformed into a probability using a standard normal table. At the maximum tolerable dose of 70, according to the printout, we have

$$Z = -1.48 + \hat{\beta}_{\text{dose}}\,(70) = .43$$

where $\hat{\beta}_{\text{dose}}$ is the dose coefficient from the printout. Using the Z-table, the corresponding probability of treatment response at this dose is $P(Z \le .43) = .6664$, or 66.64%. This is below Dr. Moody's target of 75%, so he's not going to be thrilled. Note that we are implicitly using the fact that the higher the dose, the higher the probability of treatment response (since the coefficient of dose is positive), so it is enough to know what happens at the highest acceptable dose.
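The table lookups in parts (c) and (d) can be double-checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the minus signs on the intercept and its confidence limits are an assumption inferred from the stated probabilities (a probability of .0694 corresponds to Z = -1.48, not +1.48).

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, sd 1


def probit_probability(z):
    """Convert a probit linear predictor (a Z-score) into a probability."""
    return std_normal.cdf(z)


# Values quoted in the solutions. The negative signs are assumed, since
# they are the only signs consistent with the stated probabilities.
intercept = -1.48                # Z-score at a dose of 0
ci_low, ci_high = -2.34, -0.625  # 95% CI limits for the intercept
z_at_dose_70 = 0.43              # Z-score at the maximum tolerable dose

# Part (c): probability of response at dose 0, plus the transformed CI.
p_dose_0 = probit_probability(intercept)
p_ci = (probit_probability(ci_low), probit_probability(ci_high))

# Part (d): probability of response at dose 70, compared to the 75% target.
p_dose_70 = probit_probability(z_at_dose_70)

print(f"P(response | dose 0)  = {p_dose_0:.4f}")
print(f"95% CI at dose 0      = ({p_ci[0]:.4f}, {p_ci[1]:.4f})")
print(f"P(response | dose 70) = {p_dose_70:.4f}")
print("Meets 75% target:", p_dose_70 >= 0.75)
```

Because the normal CDF is monotone increasing, transforming the two endpoints of the intercept's interval gives a valid interval for the probability, which is the transformation principle used in part (c).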
An alternative way to solve this problem is to find the Z-score associated with a 75% treatment response rate and see if the dose necessary to achieve this is below 70. According to the Z-table, a value of roughly Z = .675 is associated with a probability of .75. To get the dose D associated with a Z-score of .675 we solve the following:

$$.675 = -1.48 + \hat{\beta}_{\text{dose}}\,D$$

where $\hat{\beta}_{\text{dose}}$ is the dose coefficient from the printout. The required dose is 79, which is above the safety limit.

Part e (4 points) Do you believe a linear relationship in dose is likely to be appropriate here? Say why or why not and briefly describe two ways you could check.

Solution: It seems unlikely that the improvement in response rate can continue indefinitely, even on the log odds scale; eventually there should be some levelling off when additional drug has no further effect. We could check this by binning the data and plotting log odds as a function of the dose bins, or by including curvilinear terms in our model and testing for their significance, or perhaps by running a goodness of fit test to see if we have model mis-specification.
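Returning to the alternative approach in part (d), the inverse calculation can be sketched in Python as well. The dose coefficient itself is not legible in this copy of the printout, so the sketch backs it out from the Z-score of .43 quoted at a dose of 70; that step, the negative sign on the intercept, and all variable names are assumptions made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, sd 1

intercept = -1.48    # assumed negative, consistent with P = .0694 at dose 0
z_at_dose_70 = 0.43  # Z-score quoted in the solution for a dose of 70

# Assumption: recover the dose coefficient from the two quoted Z-scores
# rather than reading it off the printout.
slope = (z_at_dose_70 - intercept) / 70

# Z-score at which the response probability reaches the 75% target.
target_rate = 0.75
z_target = std_normal.inv_cdf(target_rate)

# Solve z_target = intercept + slope * dose for the required dose.
required_dose = (z_target - intercept) / slope

print(f"Implied dose coefficient: {slope:.4f}")
print(f"Z-score for a 75% rate:   {z_target:.3f}")
print(f"Required dose:            {required_dose:.1f}")
print("Within safety limit of 70:", required_dose <= 70)
```

Under these assumptions the required dose comes out at about 79, in line with the solution, so the 75% target is not reached within the safety limit.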

Part f (Optional Bonus) What are the implications of your answer to part (e) for your findings from part (d)? Explain your reasoning.

If our guess is correct that eventually the dose effect levels off, then the slope in our linear model is probably too shallow (it is having to average out the steep initial slope and the eventual flatter slope). If we build that change in slope into the model, the initial drop-off will probably be steeper and there will be a good chance we hit our desired target at a dose below
