Final Exam. Thursday the 23rd of March


Final Exam, Thursday the 23rd of March, 2017

Name:

General Comments:

This exam is closed book. However, you may use four pages, front and back, of notes and formulas. Write your answers on the exam sheets. If you need more space, continue your answer on the back of the page. A standard normal table is attached at the end of the exam. Make sure you have all 19 pages!

The exam is 180 minutes long. There are 4 questions, worth a total of 100 points. They are not equally weighted, nor are they of equal difficulty. The number of points each question is worth is printed with the problem.

Read the questions carefully. If you are unsure of the interpretation, come ask.

You must show your work to obtain full credit. If you use a result from class, state what result you are using. If you can't complete a problem for any reason, explain what concepts are at issue and how you would attack the problem. If you can't work out a number you need for a later part of a problem, give it a symbol and show how you would do the calculations with the symbol in place of the missing number.

It is a good idea to explain your reasoning briefly in English. If I can't tell that you understood what you were doing, I can't give you credit, particularly if you get the wrong numerical answer.

GOOD LUCK!

Question Number    Total Points Possible    Score Received
Total              100

THE STORY BEHIND THE EXAM:

Dr. Biram Paul "BP" Moody is a psychiatry professor at our favorite school, the University of Calculationally Literate Adults, specializing in Bipolar Disorder. Bipolar disorder is characterized by large swings in mood, energy levels and the ability to engage in daily activities and tasks. The name bipolar refers to the two types of abnormal mood conditions, depression and mania (a state of elevated mood, euphoria and/or irritability), which characterize the disease. Scientists estimate that 1-2 million Americans currently suffer from BP and that anywhere from 1-4% of Americans may experience the disease during their lifetime. The emotional and financial costs to both individuals and society are enormous. You will help Dr. Moody analyze some of his data related to risk factors, frequencies of mood episodes and effects of treatments during this exam.

Question 1: Risky Business (28 points, 45 minutes)

Dr. Moody's first data set is designed to examine various risk factors for bipolar disorder. A sample of n = 500 subjects believed to be at high risk for developing the disease was followed over an extended period of time. The outcome is a multicategory variable recording whether the subject developed Bipolar Disorder Type I (BPI), Bipolar Disorder Type II (BPII) or did not develop the disease (NoBP). BPI is characterized by more severe and longer lasting episodes of depression and higher levels of mania. BPII is characterized predominantly by depression with occasional episodes of hypomania (similar to mania but less severe). Although BPII subjects' symptoms are less severe, they have larger numbers of depressive episodes and less time well between episodes. (There are actually other subcategories of BP but for simplicity we will ignore them in this problem.)

In addition to the outcome variable, Dr. Moody has recorded the following possible predictors: gender ($X_1 = 1$ for women and $X_1 = 0$ for men); family history ($X_2 = 1$ if the subject has a blood relative with BP or major depression, $X_2 = 0$ if not); history of subclinical depression ($X_3 = 1$ if the subject has such a history and $X_3 = 0$ if not); history of hypomania ($X_4 = 1$ if the subject has such a history and $X_4 = 0$ if not); substance abuse ($X_5 = 1$ if the subject has a history of drug or alcohol abuse and $X_5 = 0$ if not); and stress/anxiety ($X_6$ is an index score, with higher scores indicating greater levels of life stress).

The printout on the following page shows the results of a multinomial logistic regression fit to Dr. Moody's data along with some follow-up tests. Use it to answer parts (a)-(d).

Part a (6 points) Find the odds ratio associated with the history of hypomania variable in the BPI level of the model and provide a brief interpretation of it. Do early episodes of hypomania seem to be a significant risk factor for BPI?
What about for BPII? Does this make sense based on the problem statement?

Solution: To get the odds ratio associated with a variable in a particular level of a multinomial model we simply exponentiate the coefficient. For the history of hypomania variable we have $b_4 = .576$, so the corresponding odds ratio is $e^{.576} = 1.78$. This means that subjects with a history of hypomanic symptoms have odds (or relative risks) of getting BPI (as opposed to no BP, the reference category) that are 1.78 times as high, or 78% higher, than those of subjects without such a history, all else equal. A history of subthreshold manic symptoms is a significant risk factor for BPI (p-value = .032) but not for BPII (p-value = .756). This makes sense since BPI is characterized by higher levels of mania whereas BPII involves primarily depression with only occasional episodes of hypomania.
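The exponentiation step in the solution above can be checked with a few lines of Python. This is only a sketch: the coefficient .576 is taken from the solution text, while the function name `odds_ratio` is made up for illustration.

```python
import math

def odds_ratio(coef):
    """Turn a logit-scale coefficient into an odds ratio by exponentiating it."""
    return math.exp(coef)

# Coefficient of history of hypomania in the BPI equation, as quoted in the solution.
b4_bpi = 0.576

or_bpi = odds_ratio(b4_bpi)
pct_higher = (or_bpi - 1.0) * 100.0

print(f"odds ratio = {or_bpi:.2f}, i.e. about {pct_higher:.0f}% higher odds")
```

The same function applies to any coefficient in either level of the multinomial model, since each level is a comparison against the NoBP reference category.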

. mlogit bptype gender famhist dephist hypomaniahist substance stress, baseoutcome(nobp)

Multinomial logistic regression          Number of obs =  500
                                         LR chi2(12)   =
                                         Prob > chi2   =
Log likelihood =                         Pseudo R2     =

bptype            Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
BPI
  gender
  famhist
  dephist
  hypmaniahist
  substance
  stress
  _cons
BPII
  gender
  famhist
  dephist
  hypmaniahist
  substance
  stress
  _cons

******************************************************************************
test gender                  chi2(2) =          Prob > chi2 = .001
test famhist                 chi2(2) =          Prob > chi2 = .000
test dephist                 chi2(2) =          Prob > chi2 = .000
test hypomaniahist           chi2(2) = 7.94     Prob > chi2 = .019
test substance               chi2(2) = .58      Prob > chi2 = .749
test stress                  chi2(2) =          Prob > chi2 = .0003
******************************************************************************
test [1=2]: gender           chi2(1) = 8.09     Prob > chi2 = .001
test [1=2]: famhist          chi2(1) = .04      Prob > chi2 = .835
test [1=2]: dephist          chi2(1) =          Prob > chi2 = .001
test [1=2]: hypmaniahist     chi2(1) = 6.36     Prob > chi2 = .012
test [1=2]: substance        chi2(1) = .01      Prob > chi2 = .911
test [1=2]: stress           chi2(1) = 7.35     Prob > chi2 = .0067

Part b (2 points) Which variables are significant overall in this model?

Solution: From the first block of tests following the regression printout, all the variables except substance abuse are significant overall in this model (p-values all < .05).

Part c (5 points) For which variables do the effects differ across the levels of the model? Write the hypotheses mathematically and in words for one of these tests.

Solution: The second set of tests below the printout evaluates whether or not the coefficients of the predictors are equal in both levels of the model. A significant p-value implies that the effect of the variable differs across the levels of the model. Using the first variable, gender, as an example, the hypotheses would be:

$H_0: \beta_{1,\mathrm{BPI}} = \beta_{1,\mathrm{BPII}}$: the effect of gender on the odds of getting BPI is the same as the effect of gender on the odds of getting BPII (as opposed to no BP), all else equal. Or, gender is equally a risk factor (or not) for BPI and BPII.

$H_A: \beta_{1,\mathrm{BPI}} \neq \beta_{1,\mathrm{BPII}}$: the odds ratio for women versus men for BPI vs no BP is different from the odds ratio for BPII vs no BP. Or, gender is a differential risk factor for the two subtypes of bipolar disorder.

Note that the test is simply of whether the effects are equal. We are definitely not testing whether the $\beta$'s are equal to 0, but we also don't have to assume they are not 0. If both are 0 then there's definitely no difference in effect! The hypotheses for the other tests are parallel. From the printout we see that the effects of gender, history of subclinical depression, history of hypomanic symptoms and stress/anxiety scores differ with respect to BPI and BPII, with p-values of .001, .001, .012 and .0067 respectively.

Part d (4 points) Based on your answers to (a)-(c) plus the coefficient values from the printouts, provide a summary of how family and personal clinical and demographic factors affect the risks of various types of bipolar disorder.
Solution: This part caused some problems as there are a lot of different ways to group the concepts. Given what was already answered in parts (a)-(c), the key is to indicate, in the cases where a predictor has different effects, for which subtype of BP it is the greater risk factor.

Being female, having a history of hypomanic or depressive symptoms, having a relative with BP or major depression, and higher levels of stress are all associated with a higher risk of getting some form of bipolar disorder, as one would expect. Substance abuse, however, does not seem to be a significant risk factor. Female gender, depressive symptoms and stress appear to be significantly greater risk factors for BPII than BPI. This follows from the fact that (i) the estimated coefficients are higher in the BPII part of the model than in the BPI part and (ii) the tests in part (c) show that the coefficients differ significantly between the two parts of the model. Indeed, gender and stress are not even significant in the BPI part of the model, although unsurprisingly a history of depressive symptoms is a significant risk factor for both BP subtypes. In contrast, a history of hypomania is a significant risk factor for BPI but not BPII, while having a family member with BP or major depression is an equally significant risk factor for both subtypes. This is consistent with the fact that BPII is characterized by more frequent (although less severe) episodes of depression while BPI has higher levels of mania.

Dr. Moody's graduate student suggests fitting an ordinal logistic model to these data instead of the multinomial logistic model of parts (a)-(d), to save degrees of freedom and because she thinks the outcome categories are ordered in terms of severity. The corresponding printout is shown below. The outcome categories are ordered so that the coefficients correspond to changes in log odds of increasing putative severity of mood disorder (NoBP, BPI, BPII). Use the printout to answer the remaining parts of the question.

. ologit bptype gender famhist dephist hypomaniahist substance stress

Ordered logistic regression              Number of obs =  500
                                         LR chi2(6)    =
                                         Prob > chi2   =
Log likelihood =                         Pseudo R2     =

bptype            Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
  gender
  famhist
  dephist
  hypmaniahist
  substance
  stress
  /cut1
  /cut2

Note: You may find it helpful to know that the $\alpha = .05$ critical point for a chi-squared distribution with 6 degrees of freedom is $\chi^2_{6,.05} = 12.6$.

Part e Find and give a careful interpretation of the odds ratio for the stress variable in the ordinal logistic regression model. Explain briefly what the proportional odds assumption means with respect to this variable and how you could check it.

Solution: As with standard logistic and multinomial logistic models, in an ordinal logistic regression we obtain the odds ratio by exponentiating the regression coefficients. Here the odds ratio associated with the stress variable is $e^{.013} = 1.013$. This means that each additional point on the stress scale is associated with 1.013 times, or 1.3% higher, odds of more severe (vs less severe) bipolar disorder. The ordering was assumed to be No BP < BPI < BPII the way the model was fit, though in fact trying to order these at all is not a good idea, as described in part (f)! The corresponding p-value is .000, so increased stress is clearly a highly significant risk factor!

In this context, the proportional odds assumption means intuitively that the effect of stress is the same whether you are looking at the odds of no BP vs some BP or of no BP/BPI vs BPII, which is somewhat hard to believe. Mathematically it means that if you plotted the stress relationship for these two comparisons on the log odds scale you would get parallel lines. In order to do such a plot you would need to bin the data, as the raw data don't give you log odds. Alternatively you could perform a score test, or you could fit separate logistic models for some vs no BP and for BPII vs no BP/BPI and compare the coefficients of the stress variable in the two models to see whether they were approximately equal.

The biggest mistake people made on this problem was in understanding what groups of outcomes were being compared at the various cutpoints. It is not BPI vs none and BPII vs none; that is what a multinomial model does. It is not BPI vs none and BPII vs BPI; that is a "next highest" model which we never formally studied, though it can be done. Also, the proportional odds assumption is NOT equivalent to testing equality of coefficients for the various levels of the multinomial model. Indeed, if the effects of stress on the odds of BPI and BPII are the same, then the effect for BPII vs none/BPI will have to be less strong than for the comparison of some vs no BP, since in the first of these instances groups with lower and higher risks are being lumped together while in the second case the two higher risk groups get bundled.
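To make the proportional odds idea concrete, here is a small sketch using the cumulative-logit form P(Y <= j) = logistic(cut_j - xb), which is the parameterization Stata's `ologit` reports. The stress coefficient .013 comes from the solution; the two cutpoint values are hypothetical, since the printout values did not survive in this copy. Under the model, the odds ratio per stress point is the same at both cutpoints.

```python
import math

def odds_above(cut, xb):
    """Odds of falling above a cutpoint when P(Y <= j) = logistic(cut_j - xb)."""
    p_above = 1.0 / (1.0 + math.exp(cut - xb))
    return p_above / (1.0 - p_above)

beta_stress = 0.013  # stress coefficient quoted in the solution

# HYPOTHETICAL cutpoints, chosen only for illustration.
cutpoints = {"some BP vs no BP": 1.0, "BPII vs no BP/BPI": 2.5}

ratios = {}
for label, cut in cutpoints.items():
    # Compare a stress score of 41 with a stress score of 40 at this cutpoint.
    ratios[label] = odds_above(cut, beta_stress * 41) / odds_above(cut, beta_stress * 40)
    print(f"{label}: odds ratio per stress point = {ratios[label]:.4f}")
```

Both printed ratios equal exp(.013) whatever cutpoints are used, which is exactly what the assumption imposes. Checking it on real data means estimating the two comparisons separately and seeing whether the coefficients agree.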
Part f (5 points) Do you think using an ordinal logistic model for these data was a good idea? Answer (i) based on the results of parts (a)-(d) and (ii) by performing an appropriate test. (You do not need to write out the hypotheses. Simply compute the test statistic and explain your conclusions.)

Solution: This definitely does NOT seem like a good idea! From parts (a)-(d) the effects of the assorted predictor variables are very different on different levels of the model. Although most of the variables seem overall to be risk factors for BP in the expected direction, the ordering with respect to BP subtype is NOT the same for all the predictors. As noted in part (d), female gender, depressive symptoms and stress are much greater risk factors for BPII than BPI (and indeed stress and gender are not even significant for BPI), while the reverse is true for history of hypomania, and the effect of family history is essentially the same for the subtypes.

It is not enough to just say the coefficients differed for some variables across the levels of the model. As noted above, the effect being equal for the subtypes is NOT the same as the proportional odds assumption, which says that the BREAKPOINT between the ordered levels doesn't matter, not that all other levels are equally different from the base/reference level! Indeed, if the proportional odds assumption is correct we would EXPECT the effects to differ across the levels of the multinomial model.

We can explicitly check whether the multinomial model (which is more flexible and allows each predictor to have its own effect on each level of the model) is superior to the ordinal logistic model (which allows only a single coefficient for each variable) by calculating our usual statistic of negative twice the difference in log likelihoods. The difference in degrees of freedom is equal to the difference in the number of parameters between the two models. Here we have $\chi^2 = -2(\ell_{\mathrm{ordinal}} - \ell_{\mathrm{multinomial}})$, computed from the log likelihoods on the two printouts. This is much bigger than the critical value of $\chi^2_{6,.05} = 12.6$ that was given in the problem statement, so we conclude that the more flexible multinomial model is a significant improvement over the more parsimonious ordinal logistic model, as expected. (Note that both the multinomial model and the ordinal logistic model have two intercepts here, so the difference in the degrees of freedom is just the number of predictor variables.)

Part g (Optional Bonus) Using the multinomial logistic model, calculate the probability of developing BPI for (i) a man and (ii) a woman who has no family history of depressive disorders and no personal history of hypomania, substance abuse or stress but does have a history of subclinical depression. Are you surprised by your answers given the sign of the gender coefficient in the BPI part of the model? Explain what has happened.

Solution: We are told that the values of the predictors are all 0 except for the history of subclinical depression, which is 1, and gender, which is 0 for the man and 1 for the woman. In our multinomial model the formula for the required probability is

$$P(\mathrm{BPI}) = \frac{e^{X\beta_{\mathrm{BPI}}}}{1 + e^{X\beta_{\mathrm{BPI}}} + e^{X\beta_{\mathrm{BPII}}}}$$

so all we need to do is get the relevant linear combinations for the man and the woman. Plugging the coefficients from the printout into this formula, the man has a 28.4% chance of developing BPI (at least in our study population) while a woman has only a 25.3% chance. This may at first seem odd since the coefficient of the gender variable in the BPI level of the model is positive, suggesting that women have higher odds of BPI vs no BP than men. However, the probabilities are calculated relative to ALL THREE outcome categories, not just BPI vs no BP. The effect of gender on BPII is even greater than on BPI relative to no BP, and so when you combine them with the other covariates set at these levels the probability actually goes up a fair bit for women relative to men for BPII but drops for BPI and no BP.
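The two calculations in parts (f) and (g) can be sketched in Python as follows. The log likelihoods and linear predictors below are hypothetical stand-ins, because the printout values are not available in this copy; only the 6 degrees of freedom and the critical value 12.6 come from the exam. The chi-squared tail function uses a closed form that holds for even degrees of freedom only.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; this closed form is valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

def lr_test(ll_restricted, ll_full, df):
    """Likelihood ratio statistic, -2 * (restricted - full), and its p-value."""
    stat = -2.0 * (ll_restricted - ll_full)
    return stat, chi2_sf_even_df(stat, df)

def multinomial_probs(eta_bpi, eta_bpii):
    """Probabilities of (NoBP, BPI, BPII) with NoBP as the base outcome."""
    denom = 1.0 + math.exp(eta_bpi) + math.exp(eta_bpii)
    return 1.0 / denom, math.exp(eta_bpi) / denom, math.exp(eta_bpii) / denom

# Part (f): HYPOTHETICAL log likelihoods for the ordinal and multinomial fits.
stat, p_value = lr_test(ll_restricted=-480.0, ll_full=-450.0, df=6)
print(f"LR chi2(6) = {stat:.1f}, p = {p_value:.2g}, exceeds 12.6: {stat > 12.6}")

# Part (g): HYPOTHETICAL linear predictors for a man, plus hypothetical gender
# coefficients that are positive in both equations but larger for BPII.
man_bpi, man_bpii = 0.0, 0.0
gender_bpi, gender_bpii = 0.2, 1.0
p_man = multinomial_probs(man_bpi, man_bpii)
p_woman = multinomial_probs(man_bpi + gender_bpi, man_bpii + gender_bpii)
print(f"P(BPI): man = {p_man[1]:.3f}, woman = {p_woman[1]:.3f}")
```

With these made-up values the woman's probability of BPI is lower than the man's even though her BPI coefficient is positive, because the much larger BPII coefficient pulls probability away from both other categories.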

Part h (Optional Bonus) Do you think your preferred model from parts (a)-(f) will give accurate estimates of the probabilities of getting BPI and BPII? If so, why? If not, explain conceptually what you might need to do to adjust your model.

Solution: The model will not give accurate predicted probabilities for developing BPI and BPII in the general population since we were told this was a high risk sample (though it will give accurate probabilities conditional on the person being high risk, however that was defined). To get accurate predicted probabilities in the general population we would have to look at the sampling weights that were used to get the high risk sample. It is actually not as simple as just adjusting the intercept here because this is not a case-control study. We prospectively followed people to see whether or not they developed BPI or BPII. What we did do was over-sample people based on covariate characteristics that increased their risk profile, so that is what we need to adjust for.

Question 2: An Episode in the Life... (28 points, 45 minutes)

Next Dr. Moody wants to look at the number of episodes of mania and depression experienced by people with bipolar disorder. He obtains a new sample of people who have had the disease for a moderate amount of time and records their lifetime number of manias to date. A printout of his Poisson regression with gender (0 = male, 1 = female), BP type (0 = BPI, 1 = BPII) and anxiety/stress (on a scale with 100 being the highest) as predictors is shown below. He uses illness duration (number of years since the subject was diagnosed with BP) as the offset variable.

. poisson manias gender bptype stress, exposure(illdur)

Poisson regression                       Number of obs =  400
                                         LR chi2(3)    =
                                         Prob > chi2   =
Log likelihood =                         Pseudo R2     =

manias            Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
  gender
  bptype
  stress
  _cons
  illdur   (exposure)

Part a (5 points) Give brief interpretations of the effects of BP type and the stress index on the number of manic episodes based on this model. (Your interpretations should be on the raw data scale, not the log scale.)

Solution: In a Poisson regression model with an offset variable, the exponentiated coefficients from the GLM fit are interpretable as rate ratios for the number of events per unit time/area/size/etc. Here we see that, all else equal, the average number of manic episodes per year is $e^{-2.92} = .054$ times as high (or 94.6% lower) in those with BPII (bptype=1) than in those with BPI (bptype=0). Similarly, the rate of manic episodes goes up by a factor of $e^{.0092} = 1.009$, an increase of roughly 1%, for each additional point on the stress scale. Both variables are highly significant with p-values of 0 to three decimal places.

Part b (4 points) Find the predicted number of lifetime (to date) manic episodes for a man with BPI with a stress index score of 50 who has had the disorder for 10 years.
Solution: To get the predicted number of manias, we plug the values of the predictor variables into the estimated regression equation, exponentiate to get from the log link scale to the original scale, and then multiply by the appropriate offset variable. Here we have a man (gender=0) with BPI (bptype=0) and a stress score of 50 who has had the disorder for 10 years, so the estimated number of manias is

$$10 \times e^{\hat\beta_0 + \hat\beta_{\mathrm{gender}}(0) - 2.915(0) + .0092(50)} = 10 \times e^{.71} \approx 20.3$$

We thus expect such a person to have had around 20 manic episodes over the course of their 10 year illness.

Next Dr. Moody fits a zero-inflated Poisson model to his data. The printout is shown below.

. zip manias gender bptype stress, inflate(bptype) exposure(illdur) vuong

Zero-inflated Poisson regression         Number of obs =  400
                                         Nonzero obs   =  208
                                         Zero obs      =  192
Inflation model = logit                  LR chi2(3)    =
Log likelihood  =                        Prob > chi2   =

                  Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
manias
  gender
  bptype
  stress
  _cons
  illdur   (exposure)
inflate
  bptype
  _cons

Vuong test of zip vs. standard Poisson:  z = 7.89   Pr>z =

Part c (5 points) Explain what zero-inflation means in the context of this problem and why using BP type as the predictor in the inflation part of the model makes sense. Does there appear to have been zero inflation? Briefly justify your answer.

Solution: Zero inflation means having more subjects with no events than the assumed distribution of the outcome variable (here Poisson) would suggest. In the context of this problem that would mean having more people than expected with no full manic episodes since their formal diagnosis. That might be fairly common in subjects with BPII, which is characterized principally by depression with occasional hypomanic episodes, so it makes sense to use BP subtype as the predictor in the inflation portion of the model. There clearly was zero inflation in this problem since the Vuong test of the zip (zero-inflated Poisson) model versus the standard Poisson model is highly significant, with a whopping Z value of 7.89 and a p-value of 0 to as many decimal places as reported.

Part d (6 points) Interpret the regression coefficients of BP type in both parts of the ZIP model. (You may find it helpful to transform them to an appropriate scale.)

Solution: The inflation portion of the model is essentially a logistic regression looking at whether or not a subject is a "certain 0". Thus if we exponentiate the coefficients in that part of the model we get odds ratios. We have $\hat\beta = 2.74$, so the corresponding odds ratio is $e^{2.74} = 15.5$. This means that someone with BPII has odds of being a certain 0 (i.e. never having manic episodes) that are 15 and a half times as high as those of someone with BPI. This makes perfect sense. In the regression piece of the model the exponentiated coefficients are again interpreted as rate ratios, but for people who are NOT certain 0's. Here that means that, all else equal, among people who could have manic episodes the rate per year is $e^{\hat\beta} = .19$ times as high (or 81% lower) in people with BPII than in people with BPI, which also makes sense.

Next Dr. Moody decides to look at numbers of depressive episodes. Since depressive episodes are more common than manic episodes he has simply recorded the number of episodes in the previous year. He starts by fitting a simple Poisson model with BP type (0 = BPI and 1 = BPII) as the only predictor, since BPII is supposed to be characterized by shorter but more frequent depressive episodes. His printout is shown below. Use it to answer the remaining parts of the problem.

. poisson depressions bptype

Poisson regression                       Number of obs =  400
                                         LR chi2(1)    =
                                         Prob > chi2   =
Log likelihood =                         Pseudo R2     =

depressions       Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
  bptype
  _cons

******************************************************************************
. estat gof, pearson

Goodness-of-fit chi2 =
Prob > chi2(398)     =
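The conversions used in parts (a), (b) and (d) of this question are all simple exponentiations, and can be checked with the sketch below. The coefficients -2.915, .0092 and 2.74 and the linear predictor .71 are the values quoted in the solutions; the intercept itself is not legible in this copy, so the sketch starts from the already-summed linear predictor rather than guessing it.

```python
import math

# Part (a): rate ratios from the Poisson model with an exposure (offset) term.
rr_bptype = math.exp(-2.915)   # BPII vs BPI rate of manic episodes per year
rr_stress = math.exp(0.0092)   # per additional point on the stress scale

# Part (b): expected count = exposure * exp(linear predictor).
linear_predictor = 0.71        # intercept + .0092 * 50, as summed in the solution
years_ill = 10
expected_manias = years_ill * math.exp(linear_predictor)

# Part (d): odds ratio from the logit (inflation) part of the ZIP model.
or_certain_zero = math.exp(2.74)

print(f"rate ratio, BPII vs BPI:       {rr_bptype:.3f}")
print(f"rate ratio per stress point:   {rr_stress:.4f}")
print(f"expected manias over 10 years: {expected_manias:.1f}")
print(f"odds ratio for a certain zero: {or_certain_zero:.1f}")
```

Multiplying by the exposure after exponentiating is equivalent to adding log(exposure) to the linear predictor with a coefficient fixed at 1, which is what the `exposure()` option does.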

Part e (5 points) Explain what over-dispersion means in the context of this problem. Does there appear to be over-dispersion in this model? If not, why not? If so, what do you think might have caused it?

Solution: Over-dispersion means higher variance in the outcome variable (after controlling for the covariates) than would be expected based on the assumed distribution. For a Poisson distribution the mean and variance are equal, so over-dispersion means that the variance of the number of depressive episodes is larger than the mean number of depressive episodes. We can check for this using a chi-squared goodness of fit test. From the printout the p-value for the goodness of fit statistic is 0, meaning there is extremely strong evidence of a lack of fit and in particular a suggestion of over-dispersion. If we take the ratio of the goodness of fit statistic to the number of degrees of freedom we get an estimate of the degree of over-dispersion. Here $\chi^2/df = \chi^2/398 = 2.16$, so the variance is about twice as high as it should be. One possible cause for this is heterogeneity caused by the failure to include important predictors such as gender, stress, whether the person is a rapid cycler and so on, which we have already seen are important in this disorder.

Part f (3 points) Based on your answer to part (e), Dr. Moody is concerned that the standard error estimates for his regression coefficients may not be valid. Name two techniques we have learned that he could use to try to adjust for this problem.

Solution: We could adjust the standard errors directly using the inflation factor calculated from the goodness of fit test. Specifically this would mean multiplying the standard errors given on the Poisson regression printout by $\sqrt{2.16} = 1.47$, so they would be increased by 47%, which is quite a lot. Alternatively we could fit a negative binomial model, which allows for more dispersion than the Poisson model.

Part g (Optional Bonus) Dr. Moody's graduate student suggests that instead of using the techniques you named in part (f) you could take a non-parametric approach. Explain briefly (i) how you might use a permutation test to check whether there is a relationship between BP type and the average yearly number of depressive episodes and (ii) how you could use a bootstrap procedure to get a valid confidence interval for the coefficient of the BP type indicator and hence for the rate ratio.

Solution: (i) To do the permutation test we would randomly shuffle the BP type labels many times, rerun the Poisson model for each permuted data set, and sort the resulting regression coefficients for BP type to get the null distribution. We would then see if our observed regression coefficient of 1.57 is outside the 2.5% to 97.5% range of the null distribution. (ii) For the bootstrap procedure we would take around 2000 samples of size n=400 with replacement from the original data (probably sampling separately from the BPI and BPII subjects to maintain their relative proportions in the sample), run the Poisson model for each of the bootstrap samples to get the corresponding regression coefficient, order them, and use the 2.5% and 97.5% values as the boundaries of our 95% confidence interval.
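The permutation and bootstrap procedures described in part (g) can be sketched with the standard library alone. The sketch relies on one fact: in a Poisson regression with a single 0/1 predictor, the fitted coefficient equals the log of the ratio of the two group means, so no model-fitting routine is needed. The data are simulated and entirely hypothetical (200 subjects per group, a BPI mean of 2 episodes per year); only n = 400 and the coefficient 1.57 come from the exam.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def log_rate_ratio(counts, groups):
    """Poisson coefficient for a 0/1 predictor: log(mean of group 1 / mean of group 0)."""
    ones = [c for c, g in zip(counts, groups) if g == 1]
    zeros = [c for c, g in zip(counts, groups) if g == 0]
    return math.log((sum(ones) / len(ones)) / (sum(zeros) / len(zeros)))

def quantile(sorted_vals, q):
    """Simple empirical quantile of an already sorted list."""
    return sorted_vals[int(q * (len(sorted_vals) - 1))]

def permutation_range(counts, groups, n_perm, rng):
    """2.5% and 97.5% points of the coefficient when BP type labels are shuffled."""
    labels = list(groups)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(log_rate_ratio(counts, labels))
    null.sort()
    return quantile(null, 0.025), quantile(null, 0.975)

def bootstrap_ci(counts, groups, n_boot, rng):
    """Percentile 95% interval, resampling within each BP type separately."""
    ones = [c for c, g in zip(counts, groups) if g == 1]
    zeros = [c for c, g in zip(counts, groups) if g == 0]
    coefs = []
    for _ in range(n_boot):
        b1 = rng.choices(ones, k=len(ones))
        b0 = rng.choices(zeros, k=len(zeros))
        coefs.append(math.log((sum(b1) / len(b1)) / (sum(b0) / len(b0))))
    coefs.sort()
    return quantile(coefs, 0.025), quantile(coefs, 0.975)

# HYPOTHETICAL simulated data: group sizes and the BPI mean are made up.
rng = random.Random(0)
groups = [0] * 200 + [1] * 200
counts = [rpois(2.0 if g == 0 else 2.0 * math.exp(1.57), rng) for g in groups]

observed = log_rate_ratio(counts, groups)
null_lo, null_hi = permutation_range(counts, groups, 1000, rng)
ci_lo, ci_hi = bootstrap_ci(counts, groups, 2000, rng)

print(f"observed coefficient: {observed:.2f}")
print(f"permutation null range: ({null_lo:.2f}, {null_hi:.2f})")
print(f"bootstrap 95% CI for the rate ratio: ({math.exp(ci_lo):.2f}, {math.exp(ci_hi):.2f})")
```

Exponentiating the endpoints of the coefficient interval gives the interval for the rate ratio. With more than one predictor the shortcut through group means no longer applies and each resample would need a full Poisson fit.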

Question 3: Quality Control (24 points, 40 minutes)

Dr. Moody has developed a new treatment for depression in bipolar subjects. This is a little tricky since use of standard antidepressants may push subjects over into mania. He has randomly assigned 50 BPII depressed subjects each to his new treatment and to the standard treatment and has followed them over a 6 month period. His outcome variable is a quality of life index (with 100 being the best) which he has recorded at baseline, 3 months and 6 months. Below is the printout from a mixed effects model with group (0 = standard treatment, 1 = new treatment), time (in months) and a group by time interaction as fixed effects, along with a random intercept. The summary information from an OLS regression and a model with both a random intercept and a random slope are also included. The corresponding spaghetti plot is on the next page.

. xtreg qol group time groupbytime, mle

Random-effects ML regression             Number of obs    =  300
Group variable (i): id                   Number of groups =  100
Random effects u_i ~ Gaussian            Obs per group: min =  3
                                                        avg =  3.0
                                                        max =  3
                                         LR chi2(3)       =
Log likelihood =                         Prob > chi2      =

qol               Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
  group
  time
  groupbytime
  _cons
  /sigma_u
  /sigma_e
  rho

Likelihood-ratio test of sigma_u=0: chibar2(01) =       Prob>=chibar2 =

******************************************************************************
Model                       Log-Likelihood   df   AIC   BIC
OLS
Random Intercept
Random Intercept + Slope

Note: for AIC and BIC, lower scores are better.

[Figure: spaghetti plot of individual QOL trajectories over time by treatment group]

Part a (4 points) Based on the spaghetti plot, which of the three models would you expect to be the best: the OLS regression, the random intercept model, or the model that includes both the random intercept and slope? Briefly explain your reasoning and say how you would interpret the observed random effects (if any) in practical real-world terms.

Solution: Based on the spaghetti plot, the random intercept model should be the best. The lines for the individual subjects within each group are very parallel (no fanning effect) but there are substantial shifts of the lines up and down about the group average. This suggests that each person has their own base level of QOL (so a random intercept is needed) but that there isn't much individual variation in the rate of change of QOL (so a random slope is not needed). The random intercept effects represent the innate view that individuals have of their basic life circumstances (or their basic mood personality), which causes their ratings to be systematically higher or lower than those of other people across time.

Part b (5 points) Do the model printouts confirm your expectations? Briefly justify your answer, including stating which models you believe are significant improvements compared to others.

Solution: The printouts are consistent with our expectations from part (a). The random intercept model has a far higher (less negative) log likelihood and much lower information criteria values than OLS regression, meaning that there is a significant improvement in model fit from using the random intercepts. Moreover, the likelihood ratio chi-squared test on the printout for the variance of the random intercept is hugely significant (p-value = 0). Of course we could have calculated this directly by taking -2 times the difference in log likelihoods from the table below the printout! However, the log likelihood and information criteria values for the random intercept plus random slope model are essentially identical to those for the random intercept only model, meaning there is no additional improvement in fit from adding the random slope term. The random intercept and random intercept+slope models are both significantly better than OLS, but the model with the random slope is no better than the intercept only model.
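The comparison in part (b) can be sketched as follows. All log likelihoods and parameter counts below are hypothetical placeholders, since the table values are not legible in this copy; only n = 300 observations comes from the printout. The p-value follows the chibar2(01) convention on the printout: because a variance cannot be negative, the reference distribution is an equal mixture of a point mass at 0 and a chi-squared with 1 degree of freedom, so the usual chi-squared(1) tail probability is halved.

```python
import math

def chi2_1_sf(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lr_test_variance_component(ll_without, ll_with):
    """LR test that one variance component is 0, using the chibar2(01) mixture."""
    stat = max(0.0, -2.0 * (ll_without - ll_with))
    if stat == 0.0:
        return stat, 1.0
    return stat, 0.5 * chi2_1_sf(stat)

def aic(ll, k):
    """Akaike information criterion for a model with k parameters."""
    return 2.0 * k - 2.0 * ll

def bic(ll, k, n):
    """Bayesian information criterion; n is taken here as the number of observations."""
    return k * math.log(n) - 2.0 * ll

# HYPOTHETICAL log likelihoods and parameter counts for two of the models.
ll_ols, k_ols = -1100.0, 5
ll_ri, k_ri = -950.0, 6
n_obs = 300

stat, p_value = lr_test_variance_component(ll_ols, ll_ri)
print(f"chibar2(01) = {stat:.1f}, p = {p_value:.2g}")
print(f"AIC: OLS = {aic(ll_ols, k_ols):.1f}, random intercept = {aic(ll_ri, k_ri):.1f}")
print(f"BIC: OLS = {bic(ll_ols, k_ols, n_obs):.1f}, random intercept = {bic(ll_ri, k_ri, n_obs):.1f}")
```

Comparing the random intercept model with the random intercept plus slope model works the same way, except that adding a slope variance and its covariance with the intercept changes the reference distribution, so the simple halving rule above would not apply directly.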

Part c (7 points) Explain as carefully as you can what the model tells you about how the quality of life of the patients in this study changes over time as a function of treatment. Your answer should incorporate the signs, numerical values and significance of the fixed-effect model coefficients.

Solution: The various coefficients can be interpreted as follows. At baseline (time = 0) the average QOL score in both groups is around 30. The intercept in the model ($\hat{\beta}_0 = 30.06$) corresponds to the average QOL score at time 0 in the control group. The group coefficient ($\hat{\beta}_1 = 0.04$) gives the difference in QOL between the new treatment and control (standard treatment) groups at time 0, but it is minuscule and indeed not statistically significant (p-value = 0.945), which tells us that at baseline there is no evidence of a group difference. This is good since we are supposed to be doing a randomized study! The coefficient of the time variable tells us the change in QOL PER MONTH in the group receiving the standard treatment. Their QOL scores are going up by about $\hat{\beta}_2 = 2.02$ points per month, and that improvement is significant given the reported p-value of 0. The coefficient of the group by time interaction tells us the DIFFERENCE in the rate of change in QOL between the new and standard treatment groups. Here $\hat{\beta}_3 = 4.98$ means that the rate of improvement in QOL is about 5 points higher per month with Dr. Moody's new treatment than it would have been on the standard treatment. The corresponding p-value of 0 tells us that the subjects in the new treatment group are improving significantly faster than those on the standard treatment. Their rate of improvement is in fact $2.02 + 4.98 = 7$ points per month. Thus we see that the subjects all started with fairly low QOL scores and, while both treatments seem to be helping, Dr. Moody's new treatment is helping more, just as he would hope.

Part d (6 points) Dr. Moody says his primary interests are really to find out (i) whether there has been significant change in quality of life within the group of subjects getting his new treatment from baseline to 6 months and (ii) whether there is a significant difference in quality of life between the two treatment groups at the 6 month time point. For each of these scenarios name two other techniques which could be used to address those questions and say how you would decide which was more appropriate.

Solution: For point (i), Dr. Moody is interested in the before/after change in QOL in the group getting his new treatment. Since this is a within-subjects test we have pairing. We could use a paired t-test, a sign test or a signed rank test to address this question. If the distribution of QOL scores is normal (as our mixed model from parts (a)-(c) assumed) then the paired t-test will be the most powerful. However, if the distribution of changes in QOL scores is skewed then we would probably want the signed rank test, which is more powerful than the sign test but doesn't make as strong assumptions as the t-test. (Note that we could in fact get the answer to this problem from the mixed model by using an appropriate contrast, though I didn't actually show you how to run that sort of test in the mixed model context.)

For point (ii) we are interested in comparing the two treatment groups at a single time point. Since the groups are independent we can use a two-sample t-test if the QOL scores are normally distributed or a Wilcoxon rank-sum test if they are not. As above, the parametric test will be the more powerful if the distributional assumptions are met. (We could also get this result as a contrast within the mixed model.)

Part e (2 points) Dr. Moody is considering adding covariates such as age, gender, and current life stress to his mixed effects model. What do you think this is likely to do to the magnitude of the random effects and why?

Solution: We would expect the random effects to shrink in magnitude (i.e. there would be a decrease in the estimated variance of the random intercepts, $\hat{\sigma}_i$), since the covariates Dr. Moody is proposing represent some of the individual subject level characteristics that could lead to someone having systematically low or high QOL. For instance, if someone had a particularly stressful set of life situations, their QOL scores might be low all along. If you adjust for differences in stress level, that would explain some of the variance in the intercepts.

Part f (Optional Bonus) Dr. Moody could have chosen to use either a hierarchical or repeated measures formulation of his model instead of the mixed effects model used above. You can get bonus credit for (i) mathematically rewriting the model you chose in (a)-(b) as a hierarchical model, (ii) saying what repeated measures covariance structure would be equivalent to the model you selected in parts (a)-(b) and (iii) suggesting (with justification) one other repeated measures covariance structure that might be appropriate here and describing briefly how you would compare it to the current model.

Solution: (i) Thinking of this problem in the hierarchical framework, we start with the within-subjects model. We let $Y_{it}$ represent the QOL of subject $i$ at time $t$ and have

$Y_{it} = \gamma_{0,i} + \gamma_{1,i}\,\text{Time} + \epsilon_{it}$

Next we write the coefficients from the within-subjects model as a function of population average treatment effects and individual effects.
Based on the earlier parts of the problem we expect to have a subject level random intercept but no random slope, so we get

$\gamma_{0,i} = \beta_0 + \beta_1\,\text{Tx} + b_{0,i}$
$\gamma_{1,i} = \beta_2 + \beta_3\,\text{Tx}$

where $b_{0,i}$ and $\epsilon_{it}$ are all mean 0, normally distributed, and independent of one another.

(ii) The random intercept model is equivalent to a compound symmetric covariance structure if all subjects are measured at the same equally spaced time points.

(iii) Since we have three time points we could consider either an unstructured covariance matrix (allowing the variances and correlations across time to vary freely) or an AR(1) model in which the variances are constant but the correlations die off over time. This last model probably makes a lot of sense since QOL is subject to change over time due to a variety of life circumstances, which will tend to attenuate correlations if there are big gaps.
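To make (ii) and (iii) concrete, here is a sketch of the covariance structures involved. The symbols $\sigma_b^2$ (random-intercept variance), $\sigma_\epsilon^2$ (residual variance), $\sigma^2$ and $\rho$ are notation introduced here for illustration and are not taken from the exam printout; three equally spaced time points are assumed.

```latex
% Random intercept model: Y_{it} = (fixed part) + b_{0,i} + \epsilon_{it}
\operatorname{Var}(Y_{it}) = \sigma_b^2 + \sigma_\epsilon^2,
\qquad
\operatorname{Cov}(Y_{it}, Y_{is}) = \sigma_b^2 \quad (t \neq s)

% Implied compound symmetric structure for three time points
\Sigma_{\mathrm{CS}} =
\begin{pmatrix}
\sigma_b^2 + \sigma_\epsilon^2 & \sigma_b^2 & \sigma_b^2 \\
\sigma_b^2 & \sigma_b^2 + \sigma_\epsilon^2 & \sigma_b^2 \\
\sigma_b^2 & \sigma_b^2 & \sigma_b^2 + \sigma_\epsilon^2
\end{pmatrix}

% AR(1) alternative: constant variance, correlation decaying with the gap
\Sigma_{\mathrm{AR(1)}} = \sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 \\
\rho & 1 & \rho \\
\rho^2 & \rho & 1
\end{pmatrix}
```

Compound symmetry implies the same correlation, $\sigma_b^2 / (\sigma_b^2 + \sigma_\epsilon^2)$, between any two visits, whereas AR(1) lets the correlation shrink as the gap grows. Both structures use two covariance parameters and neither is nested in the other, so one reasonable way to compare them would be AIC or BIC. Each is nested in the unstructured matrix (six parameters for three time points), against which a likelihood ratio test could be used.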

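Looking back at Part c of Question 3, the interpretation of the fixed effects can be checked with a little arithmetic. The sketch below uses only the four coefficient values quoted in that solution, taking the group coefficient as printed (0.04); the function names are hypothetical and chosen for illustration.

```python
# Fixed-effect estimates quoted in the Part c solution of Question 3.
INTERCEPT = 30.06        # mean QOL at time 0, standard treatment
GROUP = 0.04             # baseline difference, new minus standard
TIME = 2.02              # QOL change per month, standard treatment
GROUP_BY_TIME = 4.98     # extra QOL change per month on the new treatment


def monthly_slope(new_treatment):
    """Rate of change in mean QOL (points per month) for one group."""
    return TIME + (GROUP_BY_TIME if new_treatment else 0.0)


def predicted_qol(new_treatment, months):
    """Population-average QOL implied by the fixed effects alone."""
    group = 1 if new_treatment else 0
    return (INTERCEPT + GROUP * group
            + TIME * months + GROUP_BY_TIME * group * months)


for months in (0, 6):
    standard = predicted_qol(False, months)
    new = predicted_qol(True, months)
    print(f"month {months}: standard = {standard:.2f}, new = {new:.2f}, "
          f"difference = {new - standard:.2f}")

print(f"slope on new treatment = {monthly_slope(True):.2f} points per month")
```

The baseline gap is the group coefficient alone, while the gap at month 6 is the group coefficient plus six times the interaction, which is why the interaction term carries the treatment story.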
Question 4: What's Your Response? (20 points, 35 minutes)

Dr. Moody is encouraged by the results of Question 3, which suggest his new depression treatment is improving BP subjects' quality of life. Now he wants to learn more about how long it takes them to respond to it symptomatically. (Someone is considered a responder if their score on a standard index of depression symptoms both drops by at least 50% and falls below the cutoff for clinical depression.) He has performed a small 16 week (112 day) pilot study, randomizing 10 subjects to his new treatment and 10 to the standard treatment, and has recorded how long it took each subject to achieve responder status. The response times in days for each group are shown below along with their respective Kaplan-Meier curves. Subjects whose times were censored are marked with a (+).

Standard Treatment (Group 0): 28+, 49, 50, 67, 80, 90, 100, 105, 110, 112+
New Treatment (Group 1): 10, 12, 14, 20, 20, 25, 30, 37, 50, 112+

Part a (4 points) Give an example of an informative reason for censoring and a non-informative reason for censoring that could occur in the context of this study and identify a subject to whom each of these things could reasonably apply (by circling their data value), briefly explaining your reasoning.

Solution: Informative censoring occurs if the subject drops out for a reason that has something to do with the outcome of interest. You do not have to know the reason for it to be informative. In the context of this study, informative censoring could occur, for example, if someone dropped out because their condition worsened or they were experiencing bad medication side effects. The person who was censored early in the standard treatment group at 28 days could fall in this category, if the treatment wasn't working for them. Non-informative censoring occurs if the subject drops out for a reason that has nothing to do with the outcome of interest.
Again, it makes no difference whether you know the reason for dropout or not. A common form of non-informative censoring is when the study ends before people have the event of interest. We've got one person in each treatment group who was censored at day 112, which was the stated end time of the study. This is non-informative censoring. You can't say this sort of censoring is informative because we knew the people had lasted a long time. When you have censoring you always know that the people have gone without an event for that long, however long it is. What matters is whether the reason you lost track of them is because they were doing well (or poorly), not just that they were doing well or poorly.

Part b (5 points) Based on the Kaplan-Meier plot, what is the approximate median time to response in each of the treatment groups (i.e. the time by which 50% of subjects have responded)? What is the estimated response rate in each of the groups at the end of the 16 week pilot study? Overall, what do the curves suggest about how the new treatment works relative to the standard treatment?

Solution: To find the median time to event we just need to see where the survival curves cross the 50% line. This appears to be at about 20 days for the new treatment group and about 90 days for the standard treatment. By the end of the 16 weeks, however, the two curves have come together, with about 90% of subjects in each group having responded. (Note that in this case failure to survive is a good thing: it means people have responded!) The implication is that both treatments actually work very well in the long run, but Dr. Moody's new treatment relieves symptoms much faster, which is probably still of great interest to doctors and their patients.

Now Dr. Moody decides to examine how dose is related to the probability of treatment response. Specifically, he has taken n = 40 subjects, given them a range of doses of the medication, and recorded whether or not they responded. The printout for a probit regression for his data is shown below. Use it to answer the remainder of the question.

    . probit responder dose
    Probit regression                    Number of obs = 40
    [LR chi2(1), Prob > chi2, Log likelihood and Pseudo R2 values not preserved]
    responder | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
    dose      | [values not preserved]
    _cons     | [values not preserved]

Part c (4 points) Based on this model, find the probability of response and a corresponding 95% confidence interval for people who receive no treatment (i.e. a dose of 0), briefly explaining your reasoning.

Solution: We are asked to find the probability of response and a corresponding 95% confidence interval at a dose of 0 (i.e. the probability of spontaneous recovery without treatment). At a dose of zero the Z-score is just the intercept, $Z = -1.48$, and the corresponding probability is $P(Z \le -1.48) = 0.5 - P(0 \le Z \le 1.48) = 0.0694$, or just under 7%. Now to get the confidence interval we just have to use the transformation principle. The printout gives us a confidence interval of about $-2.34$ to $-0.625$ for the intercept. To get the confidence interval for the spontaneous recovery rate we just convert these values to probabilities: $P(Z \le -2.34) \approx 1\%$ and $P(Z \le -0.625) \approx 26.5\%$. Thus the spontaneous recovery rate is somewhere between 1% and 26.5%. This is a pretty wide range. The imprecision is due to a combination of our small sample size (n = 40) and the fact that Dr. Moody probably wasn't trying doses near 0, so the model probably doesn't have very much information in that range.

Part d (3 points) Dr. Moody believes that for his treatment to be widely used clinically he needs to be able to get at least a 75% response rate. However, if he uses a dose greater than 70 there will be an unacceptably high risk of patients switching into a manic episode. According to your best estimate, will Dr. Moody's new treatment have the desired response rate at a dose of 70? Show your work.

Solution: In a probit regression model, plugging values into the estimated regression equation creates a Z-score which can be transformed into a probability using a standard normal table. At the maximum tolerable dose of 70, according to the printout we have $Z = \hat{\beta}_0 + \hat{\beta}_{\text{dose}}(70) = 0.43$, where $\hat{\beta}_0$ and $\hat{\beta}_{\text{dose}}$ are the intercept and dose coefficient from the printout. Using the Z-table, the corresponding probability of treatment response at this dose is $P(Z \le 0.43) = 0.6664$, or 66.64%. This is below Dr. Moody's target of 75%, so he's not going to be thrilled. Note that we are implicitly using the fact that the higher the dose, the higher the probability of treatment response (since the coefficient of dose is positive), so it is enough to know what happens at the highest acceptable dose.
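The probit calculations in Parts c and d can be reproduced without a Z-table. The following is a minimal sketch, not the exam's own method. It uses the intercept of -1.48 (the probability of 0.0694 in Part c implies the negative sign) and the interval endpoints -2.34 and -0.625 that Part c converts to probabilities. The dose coefficient is not legible in this transcription, so the value below is an assumption backed out from the stated Z-score of 0.43 at a dose of 70.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

INTERCEPT = -1.48                  # _cons, as used in Part c
INTERCEPT_CI = (-2.34, -0.625)     # 95% CI endpoints converted in Part c
# ASSUMPTION: slope recovered from Z(70) = 0.43, not read from the printout.
DOSE_COEF = (0.43 - INTERCEPT) / 70


def response_probability(dose):
    """Estimated P(response) at a given dose under the probit model."""
    return STD_NORMAL.cdf(INTERCEPT + DOSE_COEF * dose)


def dose_for_probability(target):
    """Dose at which the estimated response probability equals target."""
    return (STD_NORMAL.inv_cdf(target) - INTERCEPT) / DOSE_COEF


p_zero = response_probability(0)
ci_zero = tuple(STD_NORMAL.cdf(z) for z in INTERCEPT_CI)
p_max_dose = response_probability(70)
dose_75 = dose_for_probability(0.75)

print(f"P(response | dose 0)  = {p_zero:.4f}")
print(f"95% CI at dose 0      = ({ci_zero[0]:.4f}, {ci_zero[1]:.4f})")
print(f"P(response | dose 70) = {p_max_dose:.4f}")
print(f"dose needed for 75%   = {dose_75:.1f}")
```

Transforming the endpoints of the intercept's interval works because the normal CDF is monotone increasing, which is the transformation principle used in Part c. The last line inverts the same equation to find the dose at which the estimated response rate would reach 75%.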
An alternative way to solve this problem is to find the Z-score associated with a 75% treatment response rate and see if the dose necessary to achieve this is below 70. According to the Z-table, a value of roughly $Z = 0.675$ is associated with a probability of 0.75. To get the dose $D$ associated with a Z-score of 0.675 we solve $0.675 = \hat{\beta}_0 + \hat{\beta}_{\text{dose}} D$, using the intercept and dose coefficient from the printout. The required dose is 79, which is above the safety limit.

Part e (4 points) Do you believe a linear relationship in dose is likely to be appropriate here? Say why or why not and briefly describe two ways you could check.

Solution: It seems unlikely that the improvement in response rate can continue indefinitely, even on the transformed (probit) scale; eventually there should be some levelling off when additional drug has no further effect. We could check this by binning the data and plotting the transformed response rate (the empirical probit or log odds) as a function of the dose bins, or by including curvilinear terms in our model and testing for their significance, or perhaps by running a goodness of fit test to see if we have model mis-specification.
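Looking back at Part b, the medians and end-of-study response rates read off the Kaplan-Meier plot can be recomputed from the response times listed at the start of this question. The sketch below is a plain product-limit calculation written for illustration; the function names are hypothetical. The median is taken as the first time at which the curve is at or below 50%, and exact fractions are used so that a curve sitting exactly on the 50% line is not missed through rounding.

```python
from fractions import Fraction


def kaplan_meier(times, responded):
    """Product-limit estimate: list of (time, fraction not yet responded)."""
    subjects = sorted(zip(times, responded))
    at_risk = len(subjects)
    surviving = Fraction(1)
    curve = []
    index = 0
    while index < len(subjects):
        t = subjects[index][0]
        flags = [r for (u, r) in subjects if u == t]   # everyone observed at t
        events = sum(flags)                             # responders at t
        if events:
            surviving *= Fraction(at_risk - events, at_risk)
            curve.append((t, surviving))
        at_risk -= len(flags)      # responders and censored both leave the risk set
        index += len(flags)
    return curve


def median_time(curve):
    """First time at which no more than half have yet to respond."""
    for t, surviving in curve:
        if surviving <= Fraction(1, 2):
            return t
    return None


# Response times from the question; 1 = responded, 0 = censored (+).
standard_times = [28, 49, 50, 67, 80, 90, 100, 105, 110, 112]
standard_resp = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
new_times = [10, 12, 14, 20, 20, 25, 30, 37, 50, 112]
new_resp = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

km_standard = kaplan_meier(standard_times, standard_resp)
km_new = kaplan_meier(new_times, new_resp)

for label, curve in (("standard", km_standard), ("new", km_new)):
    final_response = 1 - curve[-1][1]
    print(f"{label}: median = {median_time(curve)} days, "
          f"response by day 112 = {float(final_response):.3f}")
```

Under these conventions the calculation agrees with the values read from the plot: medians of 90 and 20 days, and roughly 90% responding in each group by the end of the study.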

Part f (Optional Bonus) What are the implications of your answer to part (e) for your findings from part (d)? Explain your reasoning.

Solution: If our guess is correct that the dose effect eventually levels off, then the slope in our linear model is probably too shallow (it is having to average out the steep initial slope and the eventual flatter slope). If we build that change in slope into the model, the initial rise in response will probably be steeper and there will be a good chance we hit our desired target at a dose below the safety limit of 70.


More information

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Dr. Kelly Bradley Final Exam Summer {2 points} Name {2 points} Name You MUST work alone no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. This exam is being scored out of 00 points.

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to Multinominal Logistic Regression SPSS procedure of MLR Example based on prison data Interpretation of SPSS output Presenting

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

Standard Deviation and Standard Error Tutorial. This is significantly important. Get your AP Equations and Formulas sheet

Standard Deviation and Standard Error Tutorial. This is significantly important. Get your AP Equations and Formulas sheet Standard Deviation and Standard Error Tutorial This is significantly important. Get your AP Equations and Formulas sheet The Basics Let s start with a review of the basics of statistics. Mean: What most

More information

Multiple Regression Models

Multiple Regression Models Multiple Regression Models Advantages of multiple regression Parts of a multiple regression model & interpretation Raw score vs. Standardized models Differences between r, b biv, b mult & β mult Steps

More information

APPENDIX N. Summary Statistics: The "Big 5" Statistical Tools for School Counselors

APPENDIX N. Summary Statistics: The Big 5 Statistical Tools for School Counselors APPENDIX N Summary Statistics: The "Big 5" Statistical Tools for School Counselors This appendix describes five basic statistical tools school counselors may use in conducting results based evaluation.

More information

Student Performance Q&A:

Student Performance Q&A: Student Performance Q&A: 2009 AP Statistics Free-Response Questions The following comments on the 2009 free-response questions for AP Statistics were written by the Chief Reader, Christine Franklin of

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Chapter 12: Analysis of covariance, ANCOVA

Chapter 12: Analysis of covariance, ANCOVA Chapter 12: Analysis of covariance, ANCOVA Smart Alex s Solutions Task 1 A few years back I was stalked. You d think they could have found someone a bit more interesting to stalk, but apparently times

More information

Multifactor Confirmatory Factor Analysis

Multifactor Confirmatory Factor Analysis Multifactor Confirmatory Factor Analysis Latent Trait Measurement and Structural Equation Models Lecture #9 March 13, 2013 PSYC 948: Lecture #9 Today s Class Confirmatory Factor Analysis with more than

More information

How To Treat Resistant Bipolar Patients

How To Treat Resistant Bipolar Patients Transcript Details This is a transcript of an educational program accessible on the ReachMD network. Details about the program and additional media formats for the program are accessible by visiting: https://reachmd.com/programs/clinicians-roundtable/how-to-treat-resistant-bipolar-patients/3560/

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Exemplar for Internal Assessment Resource Mathematics Level 3. Resource title: Sport Science. Investigate bivariate measurement data

Exemplar for Internal Assessment Resource Mathematics Level 3. Resource title: Sport Science. Investigate bivariate measurement data Exemplar for internal assessment resource Mathematics 3.9A for Achievement Standard 91581 Exemplar for Internal Assessment Resource Mathematics Level 3 Resource title: Sport Science This exemplar supports

More information

Biostatistics II

Biostatistics II Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,

More information

STP 231 Example FINAL

STP 231 Example FINAL STP 231 Example FINAL Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

CHAPTER ONE CORRELATION

CHAPTER ONE CORRELATION CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to

More information

Midterm project due next Wednesday at 2 PM

Midterm project due next Wednesday at 2 PM Course Business Midterm project due next Wednesday at 2 PM Please submit on CourseWeb Next week s class: Discuss current use of mixed-effects models in the literature Short lecture on effect size & statistical

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 10: Introduction to inference (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 17 What is inference? 2 / 17 Where did our data come from? Recall our sample is: Y, the vector

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Collecting & Making Sense of

Collecting & Making Sense of Collecting & Making Sense of Quantitative Data Deborah Eldredge, PhD, RN Director, Quality, Research & Magnet Recognition i Oregon Health & Science University Margo A. Halm, RN, PhD, ACNS-BC, FAHA Director,

More information

Self-assessment test of prerequisite knowledge for Biostatistics III in R

Self-assessment test of prerequisite knowledge for Biostatistics III in R Self-assessment test of prerequisite knowledge for Biostatistics III in R Mark Clements, Karolinska Institutet 2017-10-31 Participants in the course Biostatistics III are expected to have prerequisite

More information

t-test for r Copyright 2000 Tom Malloy. All rights reserved

t-test for r Copyright 2000 Tom Malloy. All rights reserved t-test for r Copyright 2000 Tom Malloy. All rights reserved This is the text of the in-class lecture which accompanied the Authorware visual graphics on this topic. You may print this text out and use

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Name Date Per Key Vocabulary: response variable explanatory variable independent variable dependent variable scatterplot positive association negative association linear correlation r-value regression

More information

Sample Exam Paper Answer Guide

Sample Exam Paper Answer Guide Sample Exam Paper Answer Guide Notes This handout provides perfect answers to the sample exam paper. I would not expect you to be able to produce such perfect answers in an exam. So, use this document

More information

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale. Model Based Statistics in Biology. Part V. The Generalized Linear Model. Single Explanatory Variable on an Ordinal Scale ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10,

More information

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj Statistical Techniques Masoud Mansoury and Anas Abulfaraj What is Statistics? https://www.youtube.com/watch?v=lmmzj7599pw The definition of Statistics The practice or science of collecting and analyzing

More information

Correlation and regression

Correlation and regression PG Dip in High Intensity Psychological Interventions Correlation and regression Martin Bland Professor of Health Statistics University of York http://martinbland.co.uk/ Correlation Example: Muscle strength

More information

Probability Models for Sampling

Probability Models for Sampling Probability Models for Sampling Chapter 18 May 24, 2013 Sampling Variability in One Act Probability Histogram for ˆp Act 1 A health study is based on a representative cross section of 6,672 Americans age

More information

1 Simple and Multiple Linear Regression Assumptions

1 Simple and Multiple Linear Regression Assumptions 1 Simple and Multiple Linear Regression Assumptions The assumptions for simple are in fact special cases of the assumptions for multiple: Check: 1. What is external validity? Which assumption is critical

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc. Chapter 23 Inference About Means Copyright 2010 Pearson Education, Inc. Getting Started Now that we know how to create confidence intervals and test hypotheses about proportions, it d be nice to be able

More information

How to analyze correlated and longitudinal data?

How to analyze correlated and longitudinal data? How to analyze correlated and longitudinal data? Niloofar Ramezani, University of Northern Colorado, Greeley, Colorado ABSTRACT Longitudinal and correlated data are extensively used across disciplines

More information

The Lens Model and Linear Models of Judgment

The Lens Model and Linear Models of Judgment John Miyamoto Email: jmiyamot@uw.edu October 3, 2017 File = D:\P466\hnd02-1.p466.a17.docm 1 http://faculty.washington.edu/jmiyamot/p466/p466-set.htm Psych 466: Judgment and Decision Making Autumn 2017

More information

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) Research Methods and Ethics in Psychology Week 4 Analysis of Variance (ANOVA) One Way Independent Groups ANOVA Brief revision of some important concepts To introduce the concept of familywise error rate.

More information

Today. HW 1: due February 4, pm. Matched case-control studies

Today. HW 1: due February 4, pm. Matched case-control studies Today HW 1: due February 4, 11.59 pm. Matched case-control studies In the News: High water mark: the rise in sea levels may be accelerating Economist, Jan 17 Big Data for Health Policy 3:30-4:30 222 College

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Standard Scores. Richard S. Balkin, Ph.D., LPC-S, NCC

Standard Scores. Richard S. Balkin, Ph.D., LPC-S, NCC Standard Scores Richard S. Balkin, Ph.D., LPC-S, NCC 1 Normal Distributions While Best and Kahn (2003) indicated that the normal curve does not actually exist, measures of populations tend to demonstrate

More information