Name:

Biostatistics 1st Year Comprehensive Examination: Applied Take-Home Exam

Due May 29th, 2015 by 5pm. Late exams will not be accepted.

Instructions:

1. There are 2 questions and 4 pages. Answer each question to the best of your ability.
2. Be as specific as possible and type up your answers.
3. This is a take-home examination; you may consult books, your notes, or journal articles. You may use any computer software you desire.
4. DO NOT DISCUSS ANY PART OF THIS EXAM WITH A LIVING HUMAN BEING, ORACLE OR ARTIFICIAL INTELLIGENCE. THIS WORK MUST BE DONE COMPLETELY INDEPENDENTLY.
5. If you have any questions, contact Professor Jeffrey Blume (at any time by email at j.blume@vanderbilt.edu) or Linda Wilson during working hours.
6. Turn in your exam either in person or by email to Professor Blume or Ms. Wilson. If you choose to turn in your exam by email, make sure that you receive confirmation that your exam has been received before 5pm on May 29th. If you do not receive confirmation, you should assume that your exam has not been received.
7. Vanderbilt's academic honor code applies.
8. The exam, data, and data dictionary can be downloaded from:
   https://dl.dropboxusercontent.com/u/25204698/comps/appliedexam2015.pdf
   http://biostat.mc.vanderbilt.edu/wiki/pub/main/datasets/stressecho.csv
   http://biostat.mc.vanderbilt.edu/wiki/pub/main/datasets/cstressecho.html

Question   Points   Score   Comments
1          100
2          100
Total
1. A new antihypertensive drug is being compared to placebo in a randomized trial with 200 participants (100 placebo; 100 drug). Assume that systolic blood pressure (SBP, in mmHg) is normally distributed with a standard deviation of 10 mmHg (this applies to both arms). The statistician designing the trial is considering two different testing procedures to detect a true difference in mean SBP. These are:

Point Null Test (PNT): This test is the routine unequal-variance two-sample t-test of the point null hypothesis that there is no difference between means, i.e., $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$. This test rejects the null hypothesis whenever its p-value is less than 0.05.

Interval Null Test (INT): This test also yields a p-value for testing the null hypothesis that there is no difference between means. However, the p-value it uses is the largest p-value over a range of clinically unimportant null hypotheses, defined here as $\mu_1 - \mu_2 \in [-1, 1]$ mmHg. The idea is: compute the p-value for every null hypothesis in the interval and take the maximum (least significant) p-value as the final p-value for testing the null hypothesis that there is no difference between means. The interval $[-1, 1]$ is the clinical indifference zone (i.e., the interval null hypothesis), and the test rejects when the maximum p-value over that zone is less than 0.05.

Simulate this situation to answer the following questions. [Note: The indifference zone remains fixed at $\mu_1 - \mu_2 \in [-1, 1]$ mmHg for these questions.]

a. What is the rate of false rejections for PNT and INT when $\mu_1 - \mu_2 = 0$?
b. What is the rate of false rejections for PNT and INT when $\mu_1 - \mu_2 = 4$?
c. Graphically display the rejection rates for PNT and INT over the range of values $\mu_1 - \mu_2 \in [-7, 7]$. Discuss your findings.
d. Let $f_{PNT,0}(p)$ and $f_{INT,0}(p)$ be the p-value distributions (pdfs) for the PNT and INT procedures when $\mu_1 - \mu_2 = 0$, respectively. Graph $f_{PNT,0}(p)$ and $f_{INT,0}(p)$ individually and against each other. Discuss your findings.
e. Let $f_{PNT,4}(p)$ and $f_{INT,4}(p)$ be the p-value distributions (pdfs) for the PNT and INT procedures when $\mu_1 - \mu_2 = 4$, respectively. Graph $f_{PNT,4}(p)$ and $f_{INT,4}(p)$ individually and against each other. Discuss your findings.
f. What is the false discovery rate, $P(H_0 \text{ is true} \mid H_0 \text{ is rejected})$, for PNT and INT when the null hypothesis is $\mu_1 - \mu_2 = 0$, the alternative hypothesis is $\mu_1 - \mu_2 = 4$, and the two hypotheses are thought to be equally likely? Discuss your findings.
g. The INT procedure does technically yield a single p-value, which is loosely defined as the maximum p-value over the interval null hypothesis. Consider the following common definitions and interpretations of p-values and discuss how they apply or do not apply to the PNT and INT.
   i. A p-value is the probability of having observed results as extreme or more extreme than that observed when the null hypothesis is true.
   ii. A p-value is the smallest Type I Error rate, $\alpha$, at which the current study would still have rejected the null hypothesis.
   iii. A p-value is a measure of the strength of evidence against the null hypothesis, and smaller p-values denote stronger evidence.
   iv. Equal p-values from experiments with the same sample size represent the same amount of evidence against the null hypothesis.
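The PNT and INT procedures described in Question 1 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not a prescribed solution: the function names (`p_value`, `pnt_int_p_values`, `rejection_rates`), the grid size, the number of replicates, and the seed are all hypothetical choices, and the two-sided p-value uses a large-sample normal approximation in place of the unequal-variance t reference distribution so that the code needs only the Python standard library.

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()  # standard normal, used as a large-sample stand-in for the t distribution


def p_value(diff, se, delta0=0.0):
    """Two-sided p-value for H0: mu1 - mu2 = delta0 (normal approximation, an assumption)."""
    z = (diff - delta0) / se
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def pnt_int_p_values(x, y, zone=(-1.0, 1.0), grid=201):
    """Return (PNT p-value, INT p-value) for two samples x and y."""
    diff = mean(x) - mean(y)
    # Unequal-variance (Welch-style) standard error of the difference in means.
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    lo, hi = zone
    # INT: evaluate the p-value on a grid of nulls in the indifference zone, keep the largest.
    nulls = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    p_int = max(p_value(diff, se, d0) for d0 in nulls)
    return p_value(diff, se, 0.0), p_int


def rejection_rates(true_diff, n=100, sd=10.0, reps=2000, alpha=0.05, seed=1):
    """Simulate the trial `reps` times; return (PNT rejection rate, INT rejection rate)."""
    rng = random.Random(seed)
    rej_pnt = rej_int = 0
    for _ in range(reps):
        x = [rng.gauss(true_diff, sd) for _ in range(n)]  # arm 1
        y = [rng.gauss(0.0, sd) for _ in range(n)]        # arm 2
        p_pnt, p_int = pnt_int_p_values(x, y)
        rej_pnt += p_pnt < alpha
        rej_int += p_int < alpha
    return rej_pnt / reps, rej_int / reps
```

Because the grid of nulls includes 0, the INT p-value in this sketch can never be smaller than the PNT p-value on the same data. If SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=False)` gives the unequal-variance t-test p-value for the point null; a shifted null can be tested by subtracting the shift from one sample first.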
2. The goal of this study is to determine if an array of measures (e.g., stress echocardiography (SE), history of hypertension, age) can be used to measure a patient's risk of having a cardiac event. For younger patients, stress echocardiography is a typical test of this risk. It involves raising the patient's heart rate by exercise, often by having the patient run on a treadmill, and then taking various measurements, such as heart rate and blood pressure, as well as more complicated measurements of the heart. The problem with this test is that it often cannot be used on older patients whose bodies can't take the stress of hard exercise. The key to assessing risk, however, is putting stress on the heart before taking the relevant measurements. While exercise can't be used to create this stress for older patients, the drug dobutamine (DOB) can. This study, then, is partly an attempt to see if the stress echocardiography test is effective in predicting cardiac events when the stress on the heart was produced by dobutamine instead of exercise. More specifically, though, the study seeks to pinpoint which measurements taken during the stress echocardiography test are most helpful in predicting whether or not a patient suffered a cardiac event over the next year.

The data are available as a comma-separated-value file at
http://biostat.mc.vanderbilt.edu/wiki/pub/main/datasets/stressecho.csv
A data dictionary is available at
http://biostat.mc.vanderbilt.edu/wiki/pub/main/datasets/cstressecho.html

a. The key outcome variable is any_event, a composite outcome indicating a cardiac-related event during the year post-baseline. Death is treated as a cardiac event. Create your own version of this variable from the four component outcomes (see data dictionary). Suggest a resolution for any errors uncovered.
b. What percent of the sample experienced a cardiac event? Describe the associations of the presence of a cardiac event with: positive stress echocardiogram, positive ECG (MI), baseline ejection fraction, ejection fraction on dobutamine, resting wall motion abnormality on echocardiogram (a value of 0 means there is an abnormality), history of hypertension, gender, and age. For the continuous measures, describe the relationship with the probability (or a transformation of the probability) of a cardiac event. (For example, is the relationship linear?)
c. Suppose we predict a cardiac event when a patient presents with a positive stress echocardiogram. Describe the sensitivity, specificity, positive and negative predictive values, and false positive/negative rates.
d. Can we improve the sensitivity and/or specificity by adding the classification of ECG results (normal, equivocal, MI) to the model? Explain.
e. Suggest a model to predict cardiac events with measures chosen from those you described in part (b) with the goal of improving the test. Use the predicted probabilities from this model to produce an ROC curve. Discuss (include the optimal cutpoint, values of measures described in part (c), etc.).
f. Consider a 75-year-old male with a positive stress echocardiogram, resting wall abnormality, a positive ECG, baseline ejection fraction of 40, a dobutamine ejection fraction of 42, and a history of hypertension. Do you predict he will have a cardiac event in the next year? What is his estimated risk? Provide a confidence interval for this estimate.
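The classification measures named in part (c) of Question 2 all come from a 2x2 table of predicted versus observed events. The sketch below shows one way to compute them; the function name `diagnostic_summary` and the counts in the example call are hypothetical and are not taken from stressecho.csv.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Summarize a 2x2 table of test result (rows) by true event status (columns).

    tp: test positive, event occurred      fp: test positive, no event
    fn: test negative, event occurred      tn: test negative, no event
    """
    return {
        "sensitivity": tp / (tp + fn),           # P(test + | event)
        "specificity": tn / (tn + fp),           # P(test - | no event)
        "ppv": tp / (tp + fp),                   # P(event | test +)
        "npv": tn / (tn + fn),                   # P(no event | test -)
        "false_positive_rate": fp / (fp + tn),   # 1 - specificity
        "false_negative_rate": fn / (fn + tp),   # 1 - sensitivity
    }


# Hypothetical counts, for illustration only (not the study data).
summary = diagnostic_summary(tp=40, fp=10, fn=20, tn=130)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity condition on the true event status, while the predictive values condition on the test result, so the predictive values also depend on how common the event is in the sample.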