Data Analysis in the Health Sciences. Final Exam 2010 EPIB 621


Student's Name:

Student's Number:

INSTRUCTIONS

This examination consists of 8 questions on 17 pages, including this one. Tables of the normal distribution are provided on the last page. Please write your answers (neatly) in the spaces provided. Fully explain all of your answers. Each question is worth 10 points, for a total of 80.

1.   2.   3.   4.   5.   6.   7.   8.   Total (out of 80)

1. A new lipid drug is claimed to both increase HDL cholesterol and decrease LDL cholesterol (so it increases good cholesterol while lowering bad cholesterol). Suppose that a 10 milligram per deciliter (mg/dl) change is considered clinically important for HDL, while a 0.4 gram per liter (g/l) change is considered clinically important for LDL. A clinical trial is carried out, comparing this new drug to a placebo control. The main results are given in the table below:

                                   New Drug    Placebo
Sample size                        200         200
Mean increase in HDL (mg/dl)       12          3
SD of increase in HDL (mg/dl)      5           4
Mean reduction in LDL (g/l)        0.6         0.4
SD of reduction in LDL (g/l)       0.3         0.3

(a) Calculate a 95% confidence interval for the mean difference in HDL increase between the two treatment groups.

(b) Calculate a 95% confidence interval for the mean difference in LDL decrease between the two treatment groups.

(c) Considering your answers in parts (a) and (b) above, from a clinical viewpoint, what is your overall conclusion concerning the claim that the new drug both increases HDL cholesterol and decreases LDL cholesterol?
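As a hedged illustration of the kind of interval parts (a) and (b) ask for, the R sketch below computes a normal-approximation 95% confidence interval for a difference in two independent means from summary statistics alone. The helper name diff_ci is hypothetical, and using the normal quantile rather than a t quantile is an assumption that seems reasonable with 200 subjects per group.

```r
# Normal-approximation CI for a difference in two independent means,
# computed from summary statistics (diff_ci is a hypothetical helper).
diff_ci <- function(m1, s1, n1, m2, s2, n2, level = 0.95) {
  est <- m1 - m2                          # difference in sample means
  se  <- sqrt(s1^2 / n1 + s2^2 / n2)      # SE for independent groups
  z   <- qnorm(1 - (1 - level) / 2)       # about 1.96 for a 95% interval
  c(estimate = est, lower = est - z * se, upper = est + z * se)
}

# Summary statistics taken from the table in question 1.
hdl <- diff_ci(12, 5, 200, 3, 4, 200)
ldl <- diff_ci(0.6, 0.3, 200, 0.4, 0.3, 200)
print(round(hdl, 2))
print(round(ldl, 3))
```

Each interval can then be compared with the clinically important change stated in the question (10 mg/dl for HDL, 0.4 g/l for LDL).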

2. Consider again the LDL cholesterol data from the table given in question #1. Suppose that a simple linear regression model is estimated using these same data, that is, the model

Y = α + β X

is fitted, where Y is the decrease in LDL cholesterol in g/l, and X is a dummy variable representing the treatment group, with X = 1 representing the new drug group, and X = 0 representing the placebo group.

(a) State the value of α̂, that is, the estimate of α in the above equation, when the regression is estimated from the data from question #1. Provide an approximate 95% confidence interval for α.

(b) Is it possible for you to state the value of β̂, that is, the estimate of β in the above equation, when the regression is estimated from the data from question #1? If so, provide your estimate, and if not, state what information is missing.
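A minimal R sketch of how a regression on a 0/1 dummy variable relates to group means may help here. The six data values are hypothetical toy numbers (not the trial data, which are not available), chosen only so the arithmetic is easy to follow.

```r
# Hypothetical toy data: three LDL reductions (g/l) per group.
y <- c(0.1, 0.4, 0.7,   # X = 0 (placebo)
       0.3, 0.6, 0.9)   # X = 1 (new drug)
x <- c(0, 0, 0, 1, 1, 1)

fit <- lm(y ~ x)
group_means <- tapply(y, x, mean)

# The intercept equals the mean of the X = 0 group, and the slope equals
# the difference between the two group means.
print(coef(fit))
print(group_means)
```

In this sketch the fitted intercept reproduces the X = 0 group mean and the fitted slope reproduces the difference in group means, which is a general property of least squares with a single dummy predictor.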

3. Cerebral cortex thickness changes as children age. Cross-sectional data are collected for a group of 200 children, each child providing their age in months (age) and cortical thickness in millimeters (cort). Background information suggests that a polynomial model of degree three may fit the data well. Therefore, the following model is fit with results given below:

cort = α + β₁ age + β₂ age² + β₃ age³

> summary(cort)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.3985  1.4980  1.9680  2.0910  2.6600  4.3240

> summary(age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   13.00   24.50   24.35   35.00   48.00

> summary(lm(cort ~ age + age2 + age3))

Call:
lm(formula = cort ~ age + age2 + age3)

Residuals:
     Min       1Q   Median       3Q      Max
-1.33113 -0.33527  0.01559  0.30434  1.58140

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.409e+00  1.712e-01   8.230 2.58e-14 ***
age          1.169e-02  2.976e-02   0.393    0.695
age2        -3.068e-04  1.400e-03  -0.219    0.827
age3         2.353e-05  1.881e-05   1.251    0.213
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5054 on 196 degrees of freedom
Multiple R-Squared: 0.6247, Adjusted R-squared: 0.619
F-statistic: 108.8 on 3 and 196 DF, p-value: < 2.2e-16

> confint(lm(cort ~ age + age2 + age3))
                    2.5 %       97.5 %
(Intercept)  1.071200e+00 1.746416e+00
age         -4.700213e-02 7.038407e-02
age2        -3.067421e-03 2.453775e-03
age3        -1.357263e-05 6.062833e-05

The scatter plot between age and cort is:

[Figure: scatter plot of age and cort]

(a) Note that the scatter plot seems to show increasing values of cortical thickness with increasing age, and yet none of the beta coefficients in the model are statistically significant, with the smallest p-value being p = 0.213, and all confidence intervals including the null value of 0. Can you explain the most likely reason why this occurred?

(b) Provide a very rough guess as to what the beta coefficient for age might be if both age2 and age3 were dropped from the model. Explain how you derived this guess.
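One thing that may be worth checking when thinking about part (a) is how strongly the three polynomial terms are correlated with one another. The R sketch below does this for a hypothetical set of ages, one child at each month from 1 to 48; the actual 200 ages are not given, so the evenly spaced grid is an assumption based only on the minimum and maximum reported by summary(age).

```r
# Hypothetical ages: one value per month over the observed range 1 to 48.
age <- 1:48

# The three predictors used in the cubic model.
powers <- cbind(age = age, age2 = age^2, age3 = age^3)

# Pairwise Pearson correlations among the predictors.
cors <- cor(powers)
print(round(cors, 3))
```

When predictors are this closely related, individual coefficient standard errors tend to be inflated even if the model as a whole fits well.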

4. The Food and Drug Administration (FDA) in the United States recently issued a warning concerning a type of asthma medication. In particular, long-acting beta agonists (LABA) might increase the risk of death if used in the absence of other medications, such as corticosteroids. Suppose that a study wishes to investigate this issue further, and so follows 10,000 subjects with asthma for one year, some of whom are LABA users, others not. The following logistic regression equation is estimated:

logit(P(Y = 1)) = α + β₁ age + β₂ LABA + β₃ severity + β₄ age × LABA

where the outcome Y = 1 indicates a death and Y = 0 indicates no death for a subject, age is dichotomized at 18 (so is a dummy variable, with 0 = child, 1 = adult), LABA is a dummy variable for treatment (1 = treatment with a LABA, 0 = non-LABA treatment), severity is a measure of asthma severity on a 0 to 10 scale, and age × LABA (age_laba in the output) is an interaction term formed by multiplying the age variable by the LABA variable. The following results are observed:

> summary(glm(Y ~ age + LABA + severity + age_laba, family = binomial))

Call:
glm(formula = Y ~ age + LABA + severity + age_laba, family = binomial)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.402  -1.225   1.005   1.117   1.223

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.042288   0.038839  -1.089   0.2762
age          0.247480   0.051796   4.778 1.77e-06 ***
LABA         0.153592   0.103470   1.484   0.1377
severity     0.030816   0.006387   4.825 1.40e-06 ***
age_laba    -0.465616   0.221375  -2.103   0.0354 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 13798 on 9999 degrees of freedom
Residual deviance: 13751 on 9995 degrees of freedom
AIC: 13761

Number of Fisher Scoring iterations: 4

> confint(glm(Y ~ age + LABA + severity + age_laba, family = binomial))
                  2.5 %      97.5 %
(Intercept) -0.11842489  0.03383062
age          0.14614048  0.34920164
LABA        -0.04853973  0.35735594
severity     0.01830410  0.04334381
age_laba    -0.89993458 -0.03075295

(a) What is the odds ratio for LABA use within children?

(b) What is the odds ratio for LABA use within adults?
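The R sketch below shows one way to turn coefficients from a logistic model with an interaction term into stratum-specific odds ratios. When age = 0 the interaction term drops out, so the log odds ratio for LABA is the LABA coefficient alone; when age = 1 it is the sum of the LABA and interaction coefficients. The variable names are hypothetical, and the point estimates are copied from the output above.

```r
# Point estimates copied from the glm() output in question 4.
b_laba     <- 0.153592    # coefficient for LABA
b_age_laba <- -0.465616   # coefficient for the age x LABA interaction

# age = 0 (children): log OR for LABA is b_laba.
or_children <- exp(b_laba)

# age = 1 (adults): log OR for LABA is b_laba + b_age_laba.
or_adults <- exp(b_laba + b_age_laba)

print(round(c(children = or_children, adults = or_adults), 3))
```

A confidence interval for the adult odds ratio would also need the covariance between the two coefficient estimates, which the printed output does not report.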

5. A researcher wishes to estimate the effects of variable X on a dichotomous outcome Y, while adjusting for possible confounding effects from variable Z. She calculates the following table of descriptive statistics:

Variable    Mean    Standard deviation
X           0       20
Y           0.7     0.46
Z           0       30

and runs a series of regression models using the bic.glm command:

> output <- bic.glm(y ~ x + z, data = a, glm.family = "binomial", OR = 1000000)
> summary(output)

Call:
bic.glm.formula(f = y ~ x + z, data = a, glm.family = "binomial", OR = 1e+06)

4 models were selected
Best 4 models (cumulative posterior probability = 1 ):

           p!=0   EV       SD      model 1     model 2     model 3     model 4
Intercept  100    0.97674  0.1705   9.769e-01   9.782e-01   9.448e-01   8.712e-01
x          7.3    0.00030  0.0040   .           1.737e-03   2.880e-02   .
z          99.3   0.02725  0.0071   2.750e-02   2.668e-02   .           .

nvar                                1           2           1           0
BIC                                -8.275e+02  -8.222e+02  -8.176e+02  -8.117e+02
post prob                           0.927       0.066       0.007       0.000

> output$mle
          [,1]        [,2]       [,3]
[1,] 0.9769086 0.000000000 0.02749871
[2,] 0.9782191 0.001737433 0.02667911
[3,] 0.9448433 0.028802899 0.00000000
[4,] 0.8712224 0.000000000 0.00000000

> output$se
          [,1]        [,2]        [,3]
[1,] 0.1705511 0.000000000 0.006566584
[2,] 0.1708567 0.012512714 0.008817639
[3,] 0.1643575 0.009017963 0.000000000
[4,] 0.1550527 0.000000000 0.000000000

> output$probne0
[1]  7.3 99.3

> output$label
[1] "z"    "xz"   "x"    "NULL"

> output$postprob
[1] 0.9268724756 0.0661748385 0.0065994672 0.0003532188

(a) Do you think there was any confounding between X and Z? Explain why or why not.

(b) Provide a point estimate and 95% confidence interval for the odds ratio associated with the effect of X on Y.

(c) Do you think it is reasonable to conclude there is no effect of X on Y? Explain why or why not.
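As a rough sketch only, the R code below converts a log odds ratio and its standard deviation into an odds ratio with an approximate 95% interval, using the model-averaged EV and SD reported for x in the summary above. Treating the model-averaged posterior as approximately normal is an assumption, as is the choice to use the averaged values rather than those from a single model; the result is per one-unit change in X.

```r
# Model-averaged posterior mean (EV) and SD for x, copied from summary(output).
ev_x <- 0.00030
sd_x <- 0.0040

# Approximate 95% limits on the log odds scale (normal approximation assumed).
z <- qnorm(0.975)
log_limits <- ev_x + c(-1, 1) * z * sd_x

# Back-transform to the odds ratio scale.
or_est    <- exp(ev_x)
or_limits <- exp(log_limits)

print(round(c(OR = or_est, lower = or_limits[1], upper = or_limits[2]), 4))
```

Because X has a standard deviation of 20, the same calculation could be repeated for a larger change in X by multiplying the log-scale quantities by that change before exponentiating.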

6. A logistic regression model is run, and a Hosmer-Lemeshow plot is created. The sample size was n = 1000, and the plot was created by forming 20 groups, each of sample size 50. The plot is given below:

[Figure: Hosmer-Lemeshow plot]

(a) Overall, do you think the model fits the data well?

(b) Taking into account the manner in which the Hosmer-Lemeshow plot was constructed, consider a predicted probability of 0.5 on the x-axis. On average, how far would you expect the observed value (on the y-axis) to be from 0.5 for these data, for a good fitting model?
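A short R sketch of the sampling variability of an observed proportion in a group of 50 may be useful for part (b). It assumes, as a simplification, that every subject in the group has the same true probability of 0.5, so the group count is binomial; the variable names are hypothetical. Two summaries of "how far" are shown, since the question could be read either way.

```r
n <- 50    # subjects per Hosmer-Lemeshow group
p <- 0.5   # assumed common true probability within the group

# Standard deviation of the observed proportion.
sd_prop <- sqrt(p * (1 - p) / n)

# Exact mean absolute deviation of the observed proportion from p,
# summing over all possible binomial counts.
k <- 0:n
mad_prop <- sum(abs(k / n - p) * dbinom(k, n, p))

print(round(c(sd = sd_prop, mean_abs_dev = mad_prop), 4))
```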

7. A study of skin cancer rates is conducted across Canada. Data on 50,000 randomly selected subjects are surveyed within each province, and the numbers of subjects with skin cancer are counted. Since it is plausible that skin cancer rates may vary from province to province, a hierarchical model is run. The model and results are given below:

model
{
  for (i in 1:10)
  {
    x[i] ~ dbin(p[i], n[i])
    logit(p[i]) <- z[i]
    z[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.001)
  sigma ~ dunif(0.001, 10)
  tau <- 1/(sigma*sigma)
  y ~ dnorm(mu, tau)
  w <- exp(y)/(1+exp(y))
  pdiff <- p[10] - p[6]
  pstep <- step(p[10] - p[6])
}

# Data
list(n=c(50000, 50000, 50000, 50000, 50000, 50000, 50000, 50000, 50000, 50000),
     x=c(458, 448, 466, 497, 470, 333, 226, 489, 261, 456))

# n = sample size within each province
# x = number of skin cancer cases within each province
#
# Order of provinces:
# [1] = Newfoundland
# [2] = Prince Edward Island
# [3] = Nova Scotia
# [4] = New Brunswick
# [5] = Quebec
# [6] = Ontario
# [7] = Manitoba
# [8] = Saskatchewan
# [9] = Alberta
# [10] = British Columbia

# Results

node     mean       sd         2.5%       median     97.5%      start   sample
mu       -4.826     0.1095     -5.047     -4.826     -4.608     2001    50000
p[1]     0.009128   4.197E-4   0.008325   0.009123   0.00997    2001    50000
p[2]     0.008935   4.173E-4   0.008137   0.008929   0.009769   2001    50000
p[3]     0.009284   4.267E-4   0.008473   0.009279   0.01014    2001    50000
p[4]     0.009889   4.407E-4   0.009048   0.009883   0.01078    2001    50000
p[5]     0.009359   4.237E-4   0.008541   0.009353   0.01021    2001    50000
p[6]     0.0067     3.58E-4    0.006021   0.006693   0.007417   2001    50000
p[7]     0.004645   3.047E-4   0.004069   0.004638   0.005267   2001    50000
p[8]     0.009729   4.36E-4    0.008896   0.009721   0.0106     2001    50000
p[9]     0.005313   3.22E-4    0.004698   0.005307   0.005965   2001    50000
p[10]    0.009092   4.212E-4   0.008288   0.009088   0.009938   2001    50000
pdiff    0.002393   5.508E-4   0.001317   0.002391   0.003466   2001    50000
pstep    1.0        0.006324   1.0        1.0        1.0        2001    50000
sigma    0.3279     0.09776    0.1957     0.3092     0.5725     2001    50000
w        0.008477   0.003317   0.003911   0.007952   0.01625    2001    50000
y        -4.827     0.3606     -5.54      -4.826     -4.103     2001    50000

(a) From the above results, what is your best estimate of the overall skin cancer rate across the 10 provinces, with 95% credible interval?

(b) Do you think Ontario has a different skin cancer rate compared to British Columbia? Explain your answer.
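Because mu is on the logit scale, its posterior summaries need the inverse logit transformation before they can be read as rates. The R sketch below applies that transformation to the posterior quantiles of mu reported above; since the inverse logit is monotone increasing, quantiles map directly to quantiles. Whether mu, w, or some other node best answers part (a) is left open here, and the variable names are hypothetical.

```r
# Posterior 2.5%, median, and 97.5% quantiles of mu, copied from the results table.
mu_summary <- c(lower = -5.047, median = -4.826, upper = -4.608)

# plogis() is the inverse logit: exp(x) / (1 + exp(x)).
p_summary <- plogis(mu_summary)

print(signif(p_summary, 3))
```

Note that the transformed median of mu is close to the reported median of w, while the interval for w is wider because w also reflects province-to-province variation through sigma.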

8. A researcher is trying to estimate the effect of dosage (X) of a blood pressure lowering drug, measured in milligrams (mg), on blood pressure change (Y), measured in mm Hg. He gathers n = 50 subjects, and gives each a different dosage of the drug (mean dose = 20 mg, standard deviation of the dose is 3 mg), and measures their blood pressure changes. He then estimates the following linear regression equation:

Y = 2 + 3 X

He later discovers that the scale he was using to measure dose was not very accurate, providing an unbiased measurement, but with measurement error standard deviation of 1 mg. After considering measurement error in X, what is your adjusted estimate of the slope of the linear regression model that predicts Y from X?
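A minimal R sketch of the usual attenuation correction is given below, assuming a classical measurement error model (error independent of the true dose and of Y), under which the observed slope equals the true slope multiplied by the reliability ratio var(true X) / var(observed X). The question does not say whether the 3 mg standard deviation refers to the observed or the true doses, so both readings are shown; the variable names are hypothetical.

```r
b_obs   <- 3   # slope estimated from the error-prone doses
sd_dose <- 3   # reported SD of dose (mg)
sd_err  <- 1   # measurement error SD (mg)

# Reading 1: 3 mg is the SD of the observed (error-contaminated) doses,
# so var(true X) = var(observed X) - var(error).
rel_if_observed <- (sd_dose^2 - sd_err^2) / sd_dose^2

# Reading 2: 3 mg is the SD of the true doses,
# so var(observed X) = var(true X) + var(error).
rel_if_true <- sd_dose^2 / (sd_dose^2 + sd_err^2)

# Corrected slope = observed slope / reliability ratio.
slope_if_observed <- b_obs / rel_if_observed
slope_if_true     <- b_obs / rel_if_true

print(round(c(sd_is_observed = slope_if_observed, sd_is_true = slope_if_true), 3))
```

Under either reading the corrected slope is larger in magnitude than 3, since classical measurement error in X biases the estimated slope toward zero.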

Normal Density Table

       0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3    0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4    0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5    0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7    0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8    0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9    0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0    0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1    0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2    0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3    0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4    0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5    0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6    0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7    0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8    0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9    0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0    0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990

Table of standard normal distribution probabilities. Each number in the table provides the probability that a standard normal random variable will be less than the number indicated.