Name:

Biostatistics 1st year Comprehensive Examination: Applied in-class exam
May 28th, 2015: 9am to 1pm

Instructions:
1. There are seven questions and 12 pages.
2. Read each question carefully. Answer to the best of your ability.
3. Be as specific as possible and write as clearly as possible.
4. This is an in-class examination; do not discuss any part of this exam with anyone while you are taking the exam. NO BOOKS, NO NOTES, NO INTERNET DEVICES, NO CALCULATORS, NO OUTSIDE ASSISTANCE.
5. You may leave the examination room to use the restroom or to step out into the hallway for a short breather. HOWEVER, YOU MUST LEAVE YOUR CELL PHONE AND ALL EXAM MATERIALS IN THE EXAMINATION ROOM. If there is an emergency, please discuss this with the exam proctor.
6. Vanderbilt's academic honor code applies.

Question   Points   Score   Comments
1          36
2          36
3          40
4          36
5          36
6          36
7          36
Total      256

1. These are True or False questions. Use a separate sheet of paper to indicate which option (True or False) you are choosing for each answer. Write a brief justification for each answer (1-3 sentences).

A new blood pressure medication is tested against a placebo. The p-value testing the null hypothesis of no effect of medication is 0.51.
a. True or False: It is more likely than not that the drug has no effect.

A new blood pressure medication is tested against a placebo. The likelihood ratio comparing the hypothesis of a 5-mmHg difference in favor of the drug working versus the hypothesis of no difference between drug and placebo is 0.51.
b. True or False: It is more likely than not that the drug has no effect.

A new blood pressure medication is tested against a placebo. The posterior probability that the mean difference between the drug and placebo is less than 0 is 0.51.
c. True or False: It is more likely than not that the drug has no effect.

A new blood pressure medication is tested against a placebo in a randomized controlled trial with 10 subjects in each arm. The outcome measure is systolic blood pressure, in mmHg units, at 1 month after starting therapy.
d. True or False: In this setting, a two-sample unequal variance t-test will be more efficient (more powerful) than a Wilcoxon-Mann-Whitney test.
e. True or False: In this setting, a two-sample unequal variance t-test will be more efficient (more powerful) than a Z-test that uses pilot data to determine the assumed standard deviations.
f. True or False: For any set of outcome data, there exists a prior distribution such that a 95% credible interval for the difference in mean systolic blood pressure will exclude 0.

2. Consider the following R code:

    # initialize variables
    reps <- 10^4
    x <- rep( NA, reps )
    y <- rep( NA, reps )
    z <- rep( NA, reps )

    # run loops
    for( i in 1:reps ){
      a <- rnorm( n=2, mean=1, sd=1 )
      b <- rnorm( n=2, mean=1, sd=1 )
      c <- mean(a) - mean(b)
      d <- 2*pnorm( abs(c), lower.tail=FALSE )
      x[i] <- (d < 0.05 )

      f <- rnorm( n=2, mean=2.96, sd=1 )
      g <- mean(a) - mean(f)
      h <- 2*pnorm( abs(g), lower.tail=FALSE )
      y[i] <- (h < 0.05 )

      k <- wilcox.test( a, f )$p.value
      z[i] <- (k < 0.05 )
    }

    # summarize results
    x.mean <- mean(x)
    y.mean <- mean(y)
    z.mean <- mean(z)

a. Describe the values f will take as explicitly as possible.
b. Describe the values c will take as explicitly as possible.
c. Make an educated guess for the value of x.mean. Explain your guess or explain why no reasonable guess can be made.
d. Make an educated guess for the value of y.mean. Explain your guess or explain why no reasonable guess can be made.
e. Make an educated guess for the value of z.mean. Explain your guess or explain why no reasonable guess can be made.
f. As reps goes to infinity, x.mean will converge to some constant, say x.mu. What value of reps will ensure that x.mean is sufficiently close to x.mu? That is, find reps such that $P\left(|\texttt{x.mean} - \texttt{x.mu}| \le 0.001\right) = 0.997$. Write out a formula and simplify as much as possible; you do not have to solve this numerically.

3. In a two-arm randomized controlled trial, a new anti-diabetic medication was tested against a placebo. HbA1c, glycated haemoglobin, was measured three months after randomly assigned therapy was begun. HbA1c is used to assess a patient's average blood sugar levels over a period of months. A table summarizing key data from this trial follows; STATA output for these data is on the following page.

                HbA1c
Treatment   N   Mean   Standard Deviation
Drug        8   6.7    2.0
Placebo     8   9.2    2.1

a. Using standard notation, write out the null and alternative hypotheses for a two-sample equal variance t-test of HbA1c levels on drug and placebo.
b. Write out a test statistic that can be used to test the hypothesis from part (a) and insert the appropriate numbers from the table above (do not solve it).
c. Interpret the STATA output using a formal hypothesis test with a pre-specified size of 5%. Provide a correct interpretation that is also suitable for a non-statistician.
d. Interpret the STATA output using a formal significance test with a 5% significance level. Provide a correct interpretation that is also suitable for a non-statistician.
e. Interpret the STATA output using an approach other than classical testing. Provide a correct interpretation that is also suitable for a non-statistician. If your ideal statistics are not reported here, define the missing statistics and provide an example to illustrate how they would be interpreted.
f. The sample standard deviations are very close in this example. What would be a potential advantage of using an equal-variance t-test in this case?
g. Suppose the data on the placebo arm was replaced by historical data from ten million (10^7) patients. As a group, these patients were known to be highly representative of the population of interest in terms of the central tendency of HbA1c but not representative in terms of its dispersion. Propose a test statistic for comparing drug to placebo that makes use of this essentially infinite sample of placebo patients.
h. Propose and justify the degrees of freedom for the test you suggest in part (g).

STATA Output for Question #3

    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
             |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |       8         6.7    .7071068           2    5.027958    8.372042
           y |       8         9.2    .7424621         2.1    7.444356    10.95564
    ---------+--------------------------------------------------------------------
    combined |      16        7.95      .59115      2.3646    6.689994    9.210006
    ---------+--------------------------------------------------------------------
        diff |                -2.5    1.025305               -4.699551   -.3004494
    ------------------------------------------------------------------------------
        diff = mean(x) - mean(y)                                      t =  -2.4383
    Ho: diff = 0                     Satterthwaite's degrees of freedom =  13.9668

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.0144         Pr(|T| > |t|) = 0.0287          Pr(T > t) = 0.9856
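The standard error, t statistic, Satterthwaite degrees of freedom, p-value, and interval in the output above can be recomputed from the summary table alone. The following is a minimal R sketch under that assumption; the variable names (n_x, m_x, s_x, and so on) are illustrative and are not part of the exam.

```r
# Summary statistics from the Question 3 table (x = drug, y = placebo).
# All variable names here are illustrative assumptions.
n_x <- 8; m_x <- 6.7; s_x <- 2.0
n_y <- 8; m_y <- 9.2; s_y <- 2.1

# Variance of each sample mean
v_x <- s_x^2 / n_x
v_y <- s_y^2 / n_y

# Unequal-variance (Welch) t statistic
se_diff <- sqrt(v_x + v_y)
t_stat  <- (m_x - m_y) / se_diff

# Satterthwaite approximation to the degrees of freedom
df_satt <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_two <- 2 * pt(-abs(t_stat), df = df_satt)
ci    <- (m_x - m_y) + c(-1, 1) * qt(0.975, df = df_satt) * se_diff

print(round(c(se = se_diff, t = t_stat, df = df_satt, p = p_two), 4))
print(round(ci, 4))
```

With only summary statistics available, computing the pieces by hand like this is a simple alternative to t.test(), which needs the raw observations.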

4. A three-arm randomized controlled trial for treating seasonal affective disorder (SAD) looked at the alleviation of SAD symptoms three weeks after beginning cognitive-behavioral therapy (CBT), light therapy (LT), or placebo. A table summarizing key trial data follows; STATA output for these data is on the following page.

                               Symptoms alleviated at three weeks?
Therapy                        Yes    No
Cognitive-behavioral therapy   44     43
Light therapy                  32     27
Placebo                        11     31

a. Using standard notation, write out the null and alternative hypotheses for comparing the effectiveness of therapy between two arms. Write down the standard large-sample Wald test statistic for testing the hypotheses.
b. Use the STATA output to find an observed Z-statistic for comparing the proportion of alleviated symptoms between the cognitive-behavioral therapy and placebo arms. Does this Z-statistic correspond to the test in part (a)? If not, explain why.
c. Using standard notation, write out the null and alternative hypotheses that are associated with the standard Chi-square test for the table above. Comment on the role these hypotheses play in understanding the effectiveness of therapy over placebo.

The analysis plan called for first using a 5% significance level for the omnibus test. Then, if the omnibus test rejects, all pairwise tests between arms would be performed at a 1.67% significance level.

d. Use the STATA output to implement this plan and interpret the results. Explain your conclusions and translate them into practical advice for health care providers. If there is additional information you would have liked to see, explain why that information is important/useful.
e. The p-values for two of three pairwise comparisons are less than the p-value for the omnibus test. However, the omnibus test uses all of the data and thus has a larger sample size than the pairwise tests. How is it possible that the omnibus test has a larger p-value? Explain your reasoning.
f. Is the family-wise Type I Error rate for the three pairwise tests, as implemented in the analysis plan, less than 5%, equal to 5% or greater than 5%? Justify your answer.

STATA Output for Question #4 (Page 1 of 2)

    . tabi 44 43 \ 32 27 \ 11 31, row chi2

               |          col
           row |         1          2 |     Total
    -----------+----------------------+----------
             1 |        44         43 |        87
               |     50.57      49.43 |    100.00
    -----------+----------------------+----------
             2 |        32         27 |        59
               |     54.24      45.76 |    100.00
    -----------+----------------------+----------
             3 |        11         31 |        42
               |     26.19      73.81 |    100.00
    -----------+----------------------+----------
         Total |        87        101 |       188
               |     46.28      53.72 |    100.00

              Pearson chi2(2) =   8.9662   Pr = 0.011

    . cii 87 44, wald

                                                             -- Binomial Wald ---
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
                 |         87    .5057471    .0536021         .400689    .6108053

    . cii 59 32, wald

                                                             -- Binomial Wald ---
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
                 |         59    .5423729    .0648603        .4152491    .6694967

    . cii 42 11, wald

                                                             -- Binomial Wald ---
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
                 |         42    .2619048    .0678427        .1289355    .3948741

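STATA Output for Question #4 (Page 2 of 2)

    . tabi 44 43 \ 32 27, row chi2

               |          col
           row |         1          2 |     Total
    -----------+----------------------+----------
             1 |        44         43 |        87
               |     50.57      49.43 |    100.00
    -----------+----------------------+----------
             2 |        32         27 |        59
               |     54.24      45.76 |    100.00
    -----------+----------------------+----------
         Total |        76         70 |       146
               |     52.05      47.95 |    100.00

              Pearson chi2(1) =   0.1890   Pr = 0.664

    . tabi 44 43 \ 11 31, row chi2

               |          col
           row |         1          2 |     Total
    -----------+----------------------+----------
             1 |        44         43 |        87
               |     50.57      49.43 |    100.00
    -----------+----------------------+----------
             2 |        11         31 |        42
               |     26.19      73.81 |    100.00
    -----------+----------------------+----------
         Total |        55         74 |       129
               |     42.64      57.36 |    100.00

              Pearson chi2(1) =   6.8862   Pr = 0.009

    . tabi 32 27 \ 11 31, row chi2

               |          col
           row |         1          2 |     Total
    -----------+----------------------+----------
             1 |        32         27 |        59
               |     54.24      45.76 |    100.00
    -----------+----------------------+----------
             2 |        11         31 |        42
               |     26.19      73.81 |    100.00
    -----------+----------------------+----------
         Total |        43         58 |       101
               |     42.57      57.43 |    100.00

              Pearson chi2(1) =   7.8939   Pr = 0.005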
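The Pearson chi-square statistics reported for Question 4 can be reproduced from the observed counts. The following is a minimal R sketch, assuming the counts in the Question 4 table; the object names (counts, omnibus, pairwise) and the row labels are illustrative and are not part of the exam.

```r
# Observed counts from the Question 4 table.
# Rows: CBT, LT, placebo; columns: symptoms alleviated yes / no.
# Object names and labels are illustrative assumptions.
counts <- matrix(c(44, 43,
                   32, 27,
                   11, 31),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(therapy = c("CBT", "LT", "Placebo"),
                                 alleviated = c("Yes", "No")))

# Omnibus Pearson chi-square test on the full 3 x 2 table
omnibus <- chisq.test(counts, correct = FALSE)

# Pairwise 2 x 2 tests; correct = FALSE turns off the continuity
# correction that chisq.test() applies to 2 x 2 tables by default
pairs <- combn(rownames(counts), 2, simplify = FALSE)
pairwise <- t(sapply(pairs, function(p) {
  res <- chisq.test(counts[p, ], correct = FALSE)
  c(chi2 = unname(res$statistic), p = res$p.value)
}))
rownames(pairwise) <- sapply(pairs, paste, collapse = " vs ")

print(round(c(chi2 = unname(omnibus$statistic), p = omnibus$p.value), 4))
print(round(pairwise, 4))
```

The uncorrected statistic is used here because the output above reports the plain Pearson chi-square; leaving the continuity correction on would give smaller 2 x 2 statistics than those shown.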

5. For each of the following models, indicate whether it is a linear regression model, an intrinsically linear regression model, or neither of these. Justify your indication. [A model is intrinsically linear if it can be expressed in a linear form by a suitable transformation.] In each case, ε is a random error term.

a. Model (a): Y = β + β X + β log X + β X + ε
b. Model (b): Y = ε exp β + β X + β X
c. Model (c): Y = β + exp β X + β X + ε
d. Model (d): Y = δ X / δ ε + 1 X + δ
e. Propose a computational solution that would allow you to compute an approximate 95% confidence interval for δ + δ^2 from model (d). Provide enough detail to explain how to implement the procedure. [Code not necessary.]
f. Suppose you wanted to compare model (a) and model (c) for model selection. Explain how to do this using a likelihood ratio test with a 5% significance level. Write down the computational steps needed to carry out this test. Provide enough detail to explain how to implement the procedure. [Code not necessary.]

6. National health, welfare, and education statistics for 210 places, mostly UN members, were collected. Measured social and health variables included fertility (number of children per woman), ppgdp (per capita gross domestic product in US dollars), and lifeexpf (female life expectancy in years). The results of the regression are shown below.

$\log(\text{fertility}) = \beta_0 + \beta_1 \log(\text{ppgdp}) + \beta_2\,\text{lifeexpf} + e$

a. Provide a 95% confidence interval for the coefficient of lifeexpf and interpret both the interval and the coefficient.
b. Provide a 95% confidence interval for the intercept and interpret both the interval and the coefficient.
c. Suppose lifeexpf was re-centered at the population mean of lifeexpf and the regression refit. Which coefficients would change? Explain your reasoning.
d. What is the correlation between the transformed response, $\log(\text{fertility})$, and its fitted value?

STATA Output for Question #6

          Source |       SS       df       MS              Number of obs =     199
    -------------+------------------------------           F(  2,   196) =  220.78
           Model |  27.1479253     2  13.5739627           Prob > F      =  0.0000
        Residual |  12.0502857   196   .06148105           R-squared     =  0.6926
    -------------+------------------------------           Adj R-squared =  0.6894
           Total |   39.198211   198  .197970763           Root MSE      =  .24795

    ------------------------------------------------------------------------------
           lfert |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            lppg |  -.0654373   .0178054    -3.68   0.000     -.100552   -.0303226
        lifeexpf |  -.0282361   .0027397   -10.31   0.000    -.0336393    -.022833
           _cons |   3.507362   .1270742    27.60   0.000     3.256754     3.75797
    ------------------------------------------------------------------------------
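The summary figures in the regression output above are linked by simple arithmetic. The following is a minimal R sketch that recomputes them from the sums of squares, coefficients, and standard errors printed in the output; the object names (ss_model, coefs, t_crit, and so on) are illustrative and are not part of the exam.

```r
# Values copied from the Question 6 output.
# Object names are illustrative assumptions.
ss_model <- 27.1479253; df_model <- 2
ss_resid <- 12.0502857; df_resid <- 196

# ANOVA-table summaries
ms_model <- ss_model / df_model
ms_resid <- ss_resid / df_resid
f_stat   <- ms_model / ms_resid
r2       <- ss_model / (ss_model + ss_resid)
adj_r2   <- 1 - (1 - r2) * (df_model + df_resid) / df_resid
root_mse <- sqrt(ms_resid)

# Coefficient table: t = Coef / Std. Err.,
# interval = Coef +/- t critical value * Std. Err.
coefs <- data.frame(
  term = c("lppg", "lifeexpf", "_cons"),
  coef = c(-0.0654373, -0.0282361, 3.507362),
  se   = c(0.0178054, 0.0027397, 0.1270742)
)
t_crit <- qt(0.975, df = df_resid)
coefs$t     <- coefs$coef / coefs$se
coefs$lower <- coefs$coef - t_crit * coefs$se
coefs$upper <- coefs$coef + t_crit * coefs$se

print(round(c(F = f_stat, R2 = r2, adjR2 = adj_r2, rootMSE = root_mse), 4))
print(coefs)
```

The same relationships apply to any regression table in this layout, so the sketch can be reused by swapping in the corresponding printed values.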

7. A business analyst studied one-way airfare (US dollars) and distance (miles) from city A to 17 other cities in the US. The focus was on modeling airfare as a function of distance. The first model fit to the data was

$\text{Fare} = \beta_0 + \beta_1\,\text{Distance} + e$

a. Based on the output for the model (shown on next page) the analyst concluded the following: "The regression coefficient of the predictor variable, Distance, is highly statistically significant and the model explains 99.4% of the variability in the Y-variable, Fare. Thus the model is highly effective for both understanding the effects of Distance on Fare and for predicting future values of Fare given the value of the predictor variable, Distance." Critique this conclusion.
b. Does the ordinary simple regression model appear to fit the data well? Model output and diagnostics are shown on the next page. Explain your answer. For example, if your answer is no, also describe in detail how the model can be improved.

STATA Output for Question #7

          Source |       SS       df       MS              Number of obs =      17
    -------------+------------------------------           F(  1,    15) = 2469.30
           Model |  267705.676     1  267705.676           Prob > F      =  0.0000
        Residual |  1626.20673    15  108.413782           R-squared     =  0.9940
    -------------+------------------------------           Adj R-squared =  0.9936
           Total |  269331.882    16  16833.2426           Root MSE      =  10.412

    ------------------------------------------------------------------------------
            fare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        distance |   .2196873    .004421    49.69   0.000     .2102642    .2291104
           _cons |   48.97177   4.405493    11.12   0.000     39.58168    58.36186
    ------------------------------------------------------------------------------

Plot left shows data and fitted regression line. Plot right shows the standardized residual plot for the simple regression.