EPI 200C Final, June 4th, 2009. This exam includes 24 questions.


Greenland/Arah, Epi 200C Sp 2000

INSTRUCTIONS: Write all answers on the answer sheets supplied; PRINT YOUR NAME and STUDENT ID NUMBER AT THE TOP OF THAT SHEET. Keep your exam questions for the review session, Thursday, June 12th, 2008. The questions are multiple choice. There may be more than one correct answer for each question. List all the answers you agree with on the answer sheet. USE BLOCK CAPITAL LETTERS. Legibility is your responsibility. Each question is 2 points maximum credit, -2 points minimum, with +2/(# correct letters) for each correct letter and -2/(# incorrect letters) for each incorrect letter. THIS MEANS YOU ARE PENALIZED FOR GUESSING WRONG!

1. Random error
A. is always chance variation.
B. leads to incomplete predictability in epidemiology studies only when there is sampling error.
C. can be reduced by increasing the size of the study.
D. is still considered present even if the entire population of interest is included in the study.
E. is low when study power is high.
F. None of the above.

2. Good data analysis will always entail the following
A. Data editing to identify incomplete records and check for consistency and accuracy of entries.
B. Data summarization for inference using smoothing, modeling and hypothesis testing techniques.
C. An inferential stage aimed at concise description of observations.
D. Use of special methods to handle missing values.

3. Suppose the following table were obtained from a cohort study of the effect of binary treatment X on binary outcome Y, with a measured binary covariate Z that is not affected by either X or Y:

                     Z = 1             Z = 0        Summary table (ignoring covariate Z)
                 X = 1   X = 0     X = 1   X = 0         X = 1   X = 0
Outcome Y = 1      40      25        20       5            60      30
Outcome Y = 0      10      25        30      45            40      70

A. The crude odds ratio is an average of the stratum-specific odds ratios.
B. The crude risk difference is an average of the stratum-specific risk differences.
C. The crude risk difference is a valid estimate of the absolute effect of X on Y.
D. If there are no other potential confounders in this study, the covariate Z is not a confounder.

4. Which of the following is/are true?
A. Loss to follow-up can lead to selection bias in cohort studies.
B. Selection bias can arise from conditioning on the common effect of the exposure and an uncontrolled independent risk factor for the outcome.
C. Case-control studies are no more prone to selection bias than are cohort studies.
D. Selection bias cannot arise from conditioning on a common effect of an outcome and an unobserved independent predictor for the exposure.

5. In assessing interactions in epidemiology the following is/are true
A. The definition of interaction response types under the potential-outcomes model is not specific to the outcome of interest.
B. Additivity implies absence of interaction types.
C. Presence of different interaction types is not compatible with an observation of additivity of risks.
D. Superadditivity refers to an interaction contrast equal to or greater than 0.
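As a study aid rather than an answer key, the arithmetic behind the table in question 3 can be sketched in a few lines of Python. The counts are copied from the table; the function and variable names are illustrative and not part of the exam.

```python
# Counts from the question 3 table, stored as (Y=1, Y=0) for X=1 and X=0.
strata = {
    "Z=1": {"x1": (40, 10), "x0": (25, 25)},
    "Z=0": {"x1": (20, 30), "x0": (5, 45)},
}


def risk(cases, noncases):
    """Proportion with Y=1 in one exposure group."""
    return cases / (cases + noncases)


def odds_ratio(table):
    """Cross-product ratio of a 2 x 2 table."""
    a, b = table["x1"]
    c, d = table["x0"]
    return (a * d) / (b * c)


def risk_difference(table):
    """Risk among X=1 minus risk among X=0."""
    return risk(*table["x1"]) - risk(*table["x0"])


def collapse(tables):
    """Sum the stratum tables cell by cell to obtain the crude table."""
    return {
        key: tuple(sum(t[key][i] for t in tables) for i in range(2))
        for key in ("x1", "x0")
    }


crude = collapse(list(strata.values()))
results = {name: (odds_ratio(t), risk_difference(t)) for name, t in strata.items()}
results["crude"] = (odds_ratio(crude), risk_difference(crude))

for name, (or_value, rd_value) in results.items():
    print(f"{name}: OR = {or_value:.2f}, RD = {rd_value:.2f}")
```

Running the sketch gives odds ratios of 4.00 and 6.00 in the two strata and 3.50 in the collapsed table, while the risk difference is 0.30 in all three tables. Comparing the two measures side by side is the point of the exercise.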

6. Suppose in the DAG below all the variables are dichotomous and all the arrows represent positive but not perfect associations. Which of the following must be true?

[DAG not recoverable from the source; nodes shown: C, X, (Y), R, Y*]

A. The marginal X-Y* association will be attenuated relative to the marginal association of X with Y.
B. Controlling for C will make the X-Y* association less biased for the causal effect of X on Y than if C were not controlled.
C. If C is controlled, the X-Y* association will be attenuated relative to the causal effect of X on Y.
D. C is a valid instrumental variable for the effect of Y on Y*.

7. Regarding categorical analysis methods:
A. Methods that stratify on follow-up time are needed for person-count data from studies with substantial loss to follow-up.
B. Sparse-data methods are advisable when both expected and observed numbers of cases for each exposure category in some analysis strata are less than four.
C. Categorical methods are needed to avoid assumptions about the shape of the X-Y dose-response.
D. The method of choosing category boundaries is unimportant as long as the number of categories is at least five.

8. Regarding stratified analysis of epidemiologic data:
A. Controlling for more variables can lead to the data becoming sparse across strata, which may lead to association changes being misinterpreted as evidence of confounding.
B. One should use percentile categories of potential confounder variables to avoid unequal sizes of the analysis strata.
C. Bias produced by confounder selection using statistical testing can be reduced by raising the α level.
D. In order to reduce distortions caused by using the data to select variables for adjustment, one should base selection on changes in the point estimate.

9. Bayesian statistical analysis:
A. requires the use of specialized software, as most software packages are not equipped to incorporate priors.
B. requires posterior sampling.
C. requires data augmentation.
D. should compare priors with likelihoods before combining them.
E. requires complete prior specification for all unknown parameters.
F. None of the above.

10. Regarding bias analysis:
A. A formal bias analysis is an objective assessment of the degree of bias that could have occurred in a study.
B. Bias analysis should be done in all epidemiologic study reports.
C. Bias analysis is more important for large studies because random error tends to be smaller in large studies.
D. A bias analysis that uses prior distributions rather than specific values for the bias parameters could be interpreted as a semi-Bayesian bias analysis.

11. A study collected data on the use (X) of selective serotonin reuptake inhibitors (SSRI: X=1 if used, 0 if not) and a binary measure (Y) of subsequent depression improvement (Y=1 if depression improved, 0 otherwise) among a cohort of middle-aged men suffering from depression. It also had a measure of income Z as a potential confounder. Suppose you fit this logistic regression model to the data:

$g[E(Y \mid X=x, Z=z)] = \beta_Y + \beta_{YX} X + \beta_{YZ} Z + \beta_{YXZ} XZ$.

Which of the following statements is/are true?
A. The link function $g[\cdot]$ is the logit link.
B. The model implies that the odds of depression improvement among men with income of Z=2 using an SSRI is $\exp(\beta_Y + \beta_{YX} + 2\beta_{YZ} + 2\beta_{YXZ})$.
C. The model implies that the log odds of depression improvement among men with an income of Z=1 who were not using any SSRI is $\beta_{YZ}$.
D. The inverse of the link function for the model is the antilog or exponential function ("exp").
E. The model is saturated.
F. None of the above.
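For readers reviewing question 11, the following sketch shows how a logistic model with a product term maps covariate values to log odds, odds, and probabilities. The coefficient values are hypothetical, chosen only for illustration (the exam supplies none), and the helper names are assumptions of this sketch.

```python
import math


def expit(t):
    """Inverse of the logit function: maps a log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log odds corresponding to probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients; keys mirror the subscripts in the model.
beta = {"Y": -1.0, "YX": 0.8, "YZ": 0.3, "YXZ": -0.1}


def log_odds(x, z, b=beta):
    """Linear predictor: beta_Y + beta_YX*x + beta_YZ*z + beta_YXZ*x*z."""
    return b["Y"] + b["YX"] * x + b["YZ"] * z + b["YXZ"] * x * z


def probability(x, z, b=beta):
    """Fitted probability of Y=1 at X=x, Z=z."""
    return expit(log_odds(x, z, b))


def odds(x, z, b=beta):
    """Fitted odds of Y=1 at X=x, Z=z, computed from the probability."""
    p = probability(x, z, b)
    return p / (1.0 - p)


for x, z in [(0, 0), (0, 1), (1, 2)]:
    print(
        f"x={x}, z={z}: log odds={log_odds(x, z):.3f}, "
        f"odds={odds(x, z):.3f}, p={probability(x, z):.3f}"
    )
```

Substituting particular values of x and z into `log_odds` and writing out which coefficients survive is a quick way to check statements about what the model implies for a given subgroup.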

12. A closed cohort of 15,000 retired teachers aged at least 65 years was followed for 6 years to study the effect of statins on the occurrence of stroke. At the inception of the cohort, no teacher had ever used statins or had a history of heart disease or stroke. At baseline and every two years the teachers were evaluated for diagnosis of coronary heart disease (CHD), use of statins, and diagnosis of stroke since last evaluation. Suppose the causal diagram for this study could be drawn as follows where the evaluation times 0, 1 and 2 represent years 2, 4, and 6 after baseline:

[Causal diagram not recoverable from the source; nodes shown: CHD 0, CHD 1, CHD 2; Statins 0, Statins 1, Statins 2; Stroke; Not censored 0, Not censored 1, Not censored 2]

Which of the following statements is/are true in the analysis of the longitudinal data from this cohort?
A. Selection bias from the loss to follow-up (censoring) could be accounted for in the analysis provided there are no unmeasured confounders of the censoring process.
B. To estimate the cumulative effect of statins on stroke incidence without bias, the diagnosis of CHD at each time point must be adjusted for in a regression model.
C. Propensity score matching can be used to estimate the cumulative effect of statins use on stroke incidence.
D. Conventional regression analysis will yield biased results because post-baseline CHD is affected by statins and confounds the effect of subsequent statins use.
E. Standardization techniques can be used for the analysis of time-varying statins use.
F. None of the above.

13. For a case-control study with an unmeasured binary confounder U, you were given the following general equation for filling in the unobserved U-stratified 2 x 2 tables for the association between a binary exposure X and a binary outcome Y: For a cell count with Y=y and X=x in the crude (marginal) X-Y table, the proportion that should go into the U=1 stratum is $\operatorname{expit}(\beta_U + \beta_{UY} y + \beta_{UX} x + \beta_{UYX} yx)$.

Crude X-Y table
          X=1         X=0
Y=1    $A_{1+}$    $A_{0+}$
Y=0    $B_{1+}$    $B_{0+}$

              U=1                          U=0
          X=1         X=0              X=1         X=0
Y=1    $A_{11}$    $A_{01}$    Y=1  $A_{10}$    $A_{00}$
Y=0    $B_{11}$    $B_{01}$    Y=0  $B_{10}$    $B_{00}$

Which of the statements is/are always true?
A. The cell count $A_{10}$ is given by $A_{1+} \operatorname{expit}(\beta_{UY} + \beta_{UX} + \beta_{UYX})$.
B. The expression $\operatorname{expit}(\beta_U + \beta_{UY} + \beta_{UX} + \beta_{UYX})$ is the log odds of U when Y=1 and X=1.
C. $\exp(\beta_{UYX})$ quantifies how much the UX odds ratio changes when moving from Y=0 to Y=1.
D. $\beta_{UYX} = 0$ implies we are assuming there is no biologic interaction.

14. Which of the following is/are true regarding analysis of selection bias?
A. There tend to be plenty of relevant data about bias parameters in analysis of selection bias.
B. Because the concepts of selection bias and confounding overlap, the same bias correction factor is used to address them.
C. Sensitivity analysis of selection bias sometimes simplifies to consideration of one bias factor.
D. There will be no selection bias if the probability of selection in cases and noncases at every exposure level is 1.

15. If X, Z are binary exposures, Y is a binary outcome and $\text{odds}(Z=1 \mid X=x, Y=y) = \exp(\beta_0 + \beta_X X + \beta_Y Y + \beta_{XY} XY)$, which of the following is/are true?

A. $\beta_Y = 0$ implies that there is no association between Z and Y among those with X=1.
B. $\beta_X = 0$ and $\beta_{XY} = 0$ together imply that X and Z are not associated given Y.
C. $\beta_X = 0$ implies no statistical interaction between X, Y and Z on the odds-ratio scale.
D. $\beta_{XY} = 0$ implies no biologic interaction between X and Z in producing the outcome Y.

16. In a study of the association between height (X) measured in centimeters and developing hypertrophic cardiomyopathy or chronic enlargement of the heart (Y),
A. the model $\ln[R(X=x)] = \alpha_Y + \alpha_{YX} x$ is a logistic risk model for developing hypertrophic cardiomyopathy when X=x.
B. $\ln[R(X=0)] = \alpha_Y$ can be interpreted as the background risk of hypertrophic cardiomyopathy.
C. $\ln[\text{odds}(X=x)] = \alpha_Y + \alpha_{YX} x$ is a log-linear odds model for developing hypertrophic cardiomyopathy when X=x.
D. it is advisable to recenter X around its mean in the study population.
E. it is advisable to rescale X by transforming it into a Z-score.
F. $\text{logit}[R(X=x)] \approx \ln[R(X=x)]$ when the risk of developing hypertrophic cardiomyopathy is very small when X=x.
G. None of the above.

17. Regarding regression models for a target population,
A. In the two logistic regression models $Y = \beta_Y + \beta_{YX} X + \beta_{YZ} Z$ and $X = \gamma_X + \gamma_{XY} Y + \gamma_{XZ} Z$ relating the same three binary variables X, Y and Z, the parameters $\beta_{YX}$ and $\gamma_{XY}$ are equal.
B. The rate model $E(Y \mid X=x) = \exp(\alpha + \beta x)$ can never give negative rates.
C. $\Pr(Y=1 \mid \text{set}[\text{sex}=\text{male}]) - \Pr(Y=1 \mid \text{set}[\text{sex}=\text{female}])$ can be estimated when all confounding, selection bias, and misclassification are eliminated.
D. Model specification is a form of model fitting.
E. The model $E(Y \mid X=x) = \exp(-\beta_Y - \beta_{YX} x)$ differs from the model $E(Y^{-1} \mid X=x) = \exp(\beta_Y + \beta_{YX} x)$.
F. None of the above.

18. Monte-Carlo Sensitivity Analysis:
A) Should correct biases in the reverse order that they occurred.
B) Will give similar results to a semi-Bayesian analysis when no identified parameter is given a prior.
C) Can incorporate random error in the corrections.

D) Treats every possible value of the bias parameters as equally probable.
E) None of the above.

19. For a binary (0,1) outcome Y with antecedent variables X and Z, the expression $\Pr(Y=1 \mid X=x, Z=z)$ represents
A) the probability that having X=x will cause Y=1 if Z=z.
B) the probability that having Z=z will cause Y=1 if X=x.
C) the probability of observing Y=1 if X=x and Z=z.
D) the mean of Y when X=x and Z=z.
E) None of the above.

20. Bootstrapping requires
A) drawing smaller subsamples from your study sample to see how your estimate changes.
B) a large sample size.
C) taking percentiles of the resampling distribution of estimates as confidence limits.
D) specification of a prior distribution.
E) None of the above.

21. If the true exposure X and its measurement X* are positively associated, nondifferential error with respect to the outcome Y
A) Always results in bias towards the null.
B) Can be reasonably assumed if the exposure measure was recorded before the outcome occurred.
C) Can be reasonably assumed if the exposure assignment is X* and is randomized, and intention to treat analysis is performed.
D) Absent other biases, allows a valid test of the null hypothesis that X does not affect Y.
E) None of the above.

22. Regarding model selection strategies, which of the following is/are true?
A) For a regressand together with a given set of regressor variables, there is a unique minimal model and a unique maximal model that are not conflicting with background information about relationships among the variables.
B) An expanding search process starts with a model form that is highly flexible.

C) A limitation of a purely contracting search process is that it may encounter sparse-data problems.
D) A combination of expanding and contracting processes, such as the stepwise automated selection algorithm, is the best strategy in model searching.
E) None of the above.

23. Which of the following is/are true about model checking?
A) A good fitting model must be a correct or approximately correct model.
B) Model diagnostics not only helps to detect discrepancies between the data and the model but also indicates whether or not the model holds beyond the range of observed data.
C) The usefulness of model diagnostic statistics is not affected by sample size.
D) Comparing regression-model-based results with corresponding basic categorical-analysis results is helpful in understanding the extent to which the model-based results possibly do not reflect the data.
E) None of the above.

24. Which of the following is/are described by Gilovich as example(s) of the influence of people's expectation and prior beliefs on their evaluation of evidence?
A) Parents expect a child who excels in school one year to do as well or better the following year.
B) Clergymen doubted Galileo's claim that the earth was not the center of the solar system.
C) Football and hockey sport teams that wear black uniforms have been penalized more often than average.
D) Scientists are more likely to run additional experiments if the results of an initial study appear to refute a favored hypothesis.
E) None of the above.
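As a review aid for question 20, here is a minimal sketch of one common bootstrap variant, the percentile interval. It is only one of several bootstrap procedures and is not presented as the method intended by the exam. The data are invented for illustration, and the function name and defaults are assumptions of this sketch.

```python
import random
import statistics


def percentile_bootstrap(data, estimator, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval from resamples drawn with replacement.

    Each resample has the same size as the original sample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the limits off the sorted resampling distribution.
    lower = estimates[round((alpha / 2) * n_resamples)]
    upper = estimates[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical binary outcomes (1 = event), invented for illustration.
sample = [1] * 30 + [0] * 70
low, high = percentile_bootstrap(sample, statistics.mean)
print(
    f"observed risk = {statistics.mean(sample):.2f}, "
    f"95% percentile limits = ({low:.2f}, {high:.2f})"
)
```

The sketch fixes the random seed so repeated runs give the same limits; in practice the number of resamples and the interval type would be chosen to suit the analysis.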