EPI 200C Final, June 4th, 2009. This exam includes 24 questions.

Greenland/Arah, Epi 200C Sp 2000

INSTRUCTIONS: Write all answers on the answer sheets supplied; PRINT YOUR NAME and STUDENT ID NUMBER AT THE TOP OF THAT SHEET. Keep your exam questions for the review session, Thursday, June 12th, 2008. The questions are multiple choice. There may be more than one correct answer for each question. List all the answers you agree with on the answer sheet. USE BLOCK CAPITAL LETTERS. Legibility is your responsibility. Each question is 2 points maximum credit, −2 points minimum, with 2/(# correct letters) for each correct letter and −2/(# incorrect letters) for each incorrect letter. THIS MEANS YOU ARE PENALIZED FOR GUESSING WRONG!

1. Random error
A. is always chance variation.
B. leads to incomplete predictability in epidemiology studies only when there is sampling error.
C. can be reduced by increasing the size of the study.
D. is still considered present even if the entire population of interest is included in the study.
E. is low when study power is high.
F. None of the above.

2. Good data analysis will always entail the following
A. Data editing to identify incomplete records and check for consistency and accuracy of entries.
B. Data summarization for inference using smoothing, modeling and hypothesis testing techniques.
C. An inferential stage aimed at concise description of observations.
D. Use of special methods to handle missing values.

3. Suppose the following table were obtained from a cohort study of the effect of binary treatment X on binary outcome Y, with a measured binary covariate Z that is not affected by either X or Y:

                      Z = 1             Z = 0         Summary table (ignoring covariate Z)
                  X = 1   X = 0     X = 1   X = 0         X = 1   X = 0
Outcome Y = 1       40      25        20       5            60      30
Outcome Y = 0       10      25        30      45            40      70

A. The crude odds ratio is an average of the stratum-specific odds ratios.
B. The crude risk difference is an average of the stratum-specific risk differences.
C. The crude risk difference is a valid estimate of the absolute effect of X on Y.
D. If there are no other potential confounders in this study, the covariate Z is not a confounder.

4. Which of the following is/are true?
A. Loss to follow-up can lead to selection bias in cohort studies.
B. Selection bias can arise from conditioning on the common effect of the exposure and an uncontrolled independent risk factor for the outcome.
C. Case-control studies are no more prone to selection bias than are cohort studies.
D. Selection bias cannot arise from conditioning on a common effect of an outcome and an unobserved independent predictor for the exposure.

5. In assessing interactions in epidemiology the following is/are true
A. The definition of interaction response types under the potential-outcomes model is not specific to the outcome of interest.
B. Additivity implies absence of interaction types.
C. Presence of different interaction types is not compatible with an observation of additivity of risks.
D. Superadditivity refers to an interaction contrast equal to or greater than 0.
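The crude and stratum-specific measures named in question 3 can be computed directly from the table. The following is a minimal Python sketch for review purposes; the function names and the cell ordering (Y=1 and Y=0 counts for X=1, then for X=0) are illustrative assumptions, not part of the exam.

```python
# Cell order (an assumption of this sketch): (Y=1,X=1), (Y=0,X=1), (Y=1,X=0), (Y=0,X=0)

def risk_difference(a1, b1, a0, b0):
    """Risk among X=1 minus risk among X=0."""
    return a1 / (a1 + b1) - a0 / (a0 + b0)

def odds_ratio(a1, b1, a0, b0):
    """Cross-product (odds) ratio of the 2 x 2 table."""
    return (a1 * b0) / (b1 * a0)

# Counts copied from the table in question 3.
tables = {
    "Z=1": (40, 10, 25, 25),
    "Z=0": (20, 30, 5, 45),
}

# Collapsing over Z adds the stratum tables cell by cell.
crude = tuple(sum(cells) for cells in zip(*tables.values()))

for label, cells in {**tables, "crude": crude}.items():
    print(label, "RD =", round(risk_difference(*cells), 3),
          "OR =", round(odds_ratio(*cells), 3))
```

Summing the strata cell by cell reproduces the summary table given in the question, which is a quick check that the counts were entered correctly.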

6. Suppose in the DAG below all the variables are dichotomous and all the arrows represent positive but not perfect associations. Which of the following must be true?

[DAG figure not recoverable from the source; nodes shown: C, X, (Y), R, Y*]

A. The marginal X-Y* association will be attenuated relative to the marginal association of X with Y.
B. Controlling for C will make the X-Y* association less biased for the causal effect of X on Y than if C were not controlled.
C. If C is controlled, the X-Y* association will be attenuated relative to the causal effect of X on Y.
D. C is a valid instrumental variable for the effect of Y on Y*.

7. Regarding categorical analysis methods:
A. Methods that stratify on follow-up time are needed for person-count data from studies with substantial loss to follow-up.
B. Sparse-data methods are advisable when both expected and observed numbers of cases for each exposure category in some analysis strata are less than four.
C. Categorical methods are needed to avoid assumptions about the shape of the XY dose-response.
D. The method of choosing category boundaries is unimportant as long as the number of categories is at least five.

8. Regarding stratified analysis of epidemiologic data:
A. Controlling for more variables can lead to the data becoming sparse across strata, which may lead to association changes being misinterpreted as evidence of confounding.
B. One should use percentile categories of potential confounder variables to avoid unequal sizes of the analysis strata.
C. Bias produced by confounder selection using statistical testing can be reduced by raising the α level.
D. In order to reduce distortions caused by using the data to select variables for adjustment, one should base selection on changes in the point estimate.

9. Bayesian statistical analysis:
A. requires the use of specialized software, as most software packages are not equipped to incorporate priors.
B. requires posterior sampling.
C. requires data augmentation.
D. should compare priors with likelihoods before combining them.
E. requires complete prior specification for all unknown parameters.
F. None of the above.

10. Regarding bias analysis:
A. A formal bias analysis is an objective assessment of the degree of bias that could have occurred in a study.
B. Bias analysis should be done in all epidemiologic study reports.
C. Bias analysis is more important for large studies because random error tends to be smaller in large studies.
D. A bias analysis that uses prior distributions rather than specific values for the bias parameters could be interpreted as a semi-Bayesian bias analysis.

11. A study collected data on the use (X) of selective serotonin reuptake inhibitors (SSRI: X=1 if used, 0 if not) and a binary measure (Y) of subsequent depression improvement (Y=1 if depression improved, 0 otherwise) among a cohort of middle-aged men suffering from depression. It also had a measure of income Z as a potential confounder. Suppose you fit this logistic regression model to the data:

g[E(Y | X=x, Z=z)] = β_Y + β_YX X + β_YZ Z + β_YXZ XZ.

Which of the following statements is/are true?
A. The link function g[.] is the logit link.
B. The model implies that the odds of depression improvement among men with income of Z=2 using an SSRI is exp(β_Y + β_YX + 2β_YZ + 2β_YXZ).
C. The model implies that the log odds of depression improvement among men with an income of Z=1 who were not using any SSRI is β_YZ.
D. The inverse of the link function for the model is the antilog or exponential function (exp).
E. The model is saturated.
F. None of the above.
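As a review aid for question 11, the linear predictor of a logistic model with a product term can be evaluated for any combination of X and Z. The following is a minimal Python sketch; the coefficient values and function names are hypothetical and chosen only for illustration, since the exam gives no fitted estimates.

```python
import math

def log_odds(x, z, beta):
    """Linear predictor: beta_Y + beta_YX*x + beta_YZ*z + beta_YXZ*x*z."""
    return beta["y"] + beta["yx"] * x + beta["yz"] * z + beta["yxz"] * x * z

def expit(t):
    """Inverse of the logit function: maps log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical coefficients (not from the exam).
beta = {"y": -1.0, "yx": 0.5, "yz": 0.2, "yxz": 0.1}

for x in (0, 1):
    for z in (0, 1, 2):
        eta = log_odds(x, z, beta)
        print(f"X={x} Z={z}: log odds={eta:.2f}, "
              f"odds={math.exp(eta):.3f}, risk={expit(eta):.3f}")
```

Writing out the predictor for a specific (x, z) pair in this way shows which coefficients enter each statement about the odds or log odds.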

12. A closed cohort of 15,000 retired teachers aged at least 65 years was followed for 6 years to study the effect of statins on the occurrence of stroke. At the inception of the cohort, no teacher had ever used statins or had a history of heart disease or stroke. At baseline and every two years the teachers were evaluated for diagnosis of coronary heart disease (CHD), use of statins, and diagnosis of stroke since last evaluation. Suppose the causal diagram for this study could be drawn as follows where the evaluation times 0, 1 and 2 represent years 2, 4, and 6 after baseline:

[Causal diagram not recoverable from the source; nodes shown: CHD 0, CHD 1, CHD 2; Statins 0, Statins 1, Statins 2; Stroke; Not censored 0, Not censored 1, Not censored 2]

Which of the following statements is/are true in the analysis of the longitudinal data from this cohort?
A. Selection bias from the loss to follow-up (censoring) could be accounted for in the analysis provided there are no unmeasured confounders of the censoring process.
B. To estimate the cumulative effect of statins on stroke incidence without bias, the diagnosis of CHD at each time point must be adjusted for in a regression model.
C. Propensity score matching can be used to estimate the cumulative effect of statins use on stroke incidence.
D. Conventional regression analysis will yield biased results because post-baseline CHD is affected by statins and confounds the effect of subsequent statins use.
E. Standardization techniques can be used for the analysis of time-varying statins use.
F. None of the above.

13. For a case-control study with an unmeasured binary confounder U, you were given the following general equation for filling in the unobserved U-stratified 2 x 2 tables for the association between a binary exposure X and a binary outcome Y: For a cell count with Y=y and X=x in the crude (marginal) X-Y table, the proportion that should go into the U=1 stratum is expit(β_U + β_UY y + β_UX x + β_UYX yx).

Crude X-Y table
          X=1     X=0
Y=1      A_1+    A_0+
Y=0      B_1+    B_0+

              U=1                          U=0
          X=1     X=0                  X=1     X=0
Y=1      A_11    A_01         Y=1     A_10    A_00
Y=0      B_11    B_01         Y=0     B_10    B_00

Which of the statements is/are always true?
A. The cell count A_10 is given by A_1+ expit(β_UY + β_UX + β_UYX).
B. The expression expit(β_U + β_UY + β_UX + β_UYX) is the log odds of U when Y=1 and X=1.
C. exp(β_UYX) quantifies how much the UX odds ratio changes when moving from Y=0 to Y=1.
D. β_UYX = 0 implies we are assuming there is no biologic interaction.

14. Which of the following is/are true regarding analysis of selection bias?
A. There tend to be plenty of relevant data about bias parameters in analysis of selection bias.
B. Because the concepts of selection bias and confounding overlap, the same bias correction factor is used to address them.
C. Sensitivity analysis of selection bias sometimes simplifies to consideration of one bias factor.
D. There will be no selection bias if the probability of selection in cases and noncases at every exposure level is 1.

15. If X, Z are binary exposures, Y is a binary outcome and odds(Z=1 | X=x, Y=y) = exp(β_0 + β_X X + β_Y Y + β_XY XY), which of the following is/are true?
A. β_Y = 0 implies that there is no association between Z and Y among those with X=1.
B. β_X = 0 and β_XY = 0 together imply that X and Z are not associated given Y.
C. β_X = 0 implies no statistical interaction between X, Y and Z on the odds-ratio scale.
D. β_XY = 0 implies no biologic interaction between X and Z in producing the outcome Y.

16. In a study of the association between height (X) measured in centimeters and developing hypertrophic cardiomyopathy or chronic enlargement of the heart (Y),
A. the model ln[R(X=x)] = α_Y + α_YX x is a logistic risk model for developing hypertrophic cardiomyopathy when X=x.
B. ln[R(X=0)] = α_Y can be interpreted as the background risk of hypertrophic cardiomyopathy.
C. ln[odds(X=x)] = α_Y + α_YX x is a log-linear odds model for developing hypertrophic cardiomyopathy when X=x.
D. it is advisable to recenter X around its mean in the study population.
E. it is advisable to rescale X by transforming it into a Z-score.
F. logit[R(X=x)] ≈ ln[R(X=x)] when the risk of developing hypertrophic cardiomyopathy is very small when X=x.
G. None of the above.

17. Regarding regression models for a target population,
A. In the two logistic regression models Y = β_Y + β_YX X + β_YZ Z and X = γ_X + γ_XY Y + γ_XZ Z relating the same three binary variables X, Y and Z, the parameters β_YX and γ_XY are equal.
B. The rate model E(Y | X=x) = exp(α + βx) can never give negative rates.
C. Pr(Y=1 | set[sex=male]) − Pr(Y=1 | set[sex=female]) can be estimated when all confounding, selection bias, and misclassification are eliminated.
D. Model specification is a form of model fitting.
E. The model E(Y | X=x) = exp(−β_Y − β_YX x) differs from the model E(Y^-1 | X=x) = exp(β_Y + β_YX x).
F. None of the above.

18. Monte-Carlo Sensitivity Analysis:
A) Should correct biases in the reverse order that they occurred.
B) Will give similar results to a semi-Bayesian analysis when no identified parameter is given a prior.
C) Can incorporate random error in the corrections.
D) Treats every possible value of the bias parameters as equally probable.
E) None of the above.

19. For a binary (0,1) outcome Y with antecedent variables X and Z, the expression Pr(Y=1 | X=x, Z=z) represents
A) the probability that having X=x will cause Y=1 if Z=z.
B) the probability that having Z=z will cause Y=1 if X=x.
C) the probability of observing Y=1 if X=x and Z=z.
D) the mean of Y when X=x and Z=z.
E) None of the above.

20. Bootstrapping requires
A) drawing smaller subsamples from your study sample to see how your estimate changes.
B) a large sample size.
C) taking percentiles of the resampling distribution of estimates as confidence limits.
D) specification of a prior distribution.
E) None of the above.

21. If the true exposure X and its measurement X* are positively associated, nondifferential error with respect to the outcome Y
A) Always results in bias towards the null.
B) Can be reasonably assumed if the exposure measure was recorded before the outcome occurred.
C) Can be reasonably assumed if the exposure assignment is X* and is randomized, and intention-to-treat analysis is performed.
D) Absent other biases, allows a valid test of the null hypothesis that X does not affect Y.
E) None of the above.

22. Regarding model selection strategies, which of the following is/are true?
A) For a regressand together with a given set of regressor variables, there is a unique minimal model and a unique maximal model that are not conflicting with background information about relationships among the variables.
B) An expanding search process starts with a model form that is highly flexible.
C) A limitation of a purely contracting search process is that it may encounter sparse-data problems.
D) A combination of expanding and contracting processes, such as the stepwise automated selection algorithm, is the best strategy in model searching.
E) None of the above.

23. Which of the following is/are true about model checking?
A) A good fitting model must be a correct or approximately correct model.
B) Model diagnostics not only help to detect discrepancies between the data and the model but also indicate whether or not the model holds beyond the range of observed data.
C) The usefulness of model diagnostic statistics is not affected by sample size.
D) Comparing regression-model-based results with corresponding basic categorical-analysis results is helpful in understanding the extent to which the model-based results possibly do not reflect the data.
E) None of the above.

24. Which of the following is/are described by Gilovich as example(s) of the influence of people's expectations and prior beliefs on their evaluation of evidence?
A) Parents expect a child who excels in school one year to do as well or better the following year.
B) Clergymen doubted Galileo's claim that the earth was not the center of the solar system.
C) Football and hockey sport teams that wear black uniforms have been penalized more often than average.
D) Scientists are more likely to run additional experiments if the results of an initial study appear to refute a favored hypothesis.
E) None of the above.
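For review of the fill-in equation stated in question 13, the split of a crude cell count into its U=1 and U=0 parts can be sketched in Python as below. The crude count and the bias-parameter values are hypothetical, and the function names are assumptions of this sketch rather than anything given in the exam.

```python
import math

def expit(t):
    """Inverse logit: maps a log odds to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))

def split_cell(count, y, x, b_u, b_uy, b_ux, b_uyx):
    """Split a crude cell count (Y=y, X=x) into its U=1 and U=0 parts.

    The proportion sent to U=1 follows the equation quoted in question 13:
    expit(b_u + b_uy*y + b_ux*x + b_uyx*y*x).
    """
    p_u1 = expit(b_u + b_uy * y + b_ux * x + b_uyx * y * x)
    return count * p_u1, count * (1.0 - p_u1)

# Hypothetical crude count and bias parameters, for illustration only.
u1, u0 = split_cell(count=100, y=1, x=1, b_u=-1.0, b_uy=0.4, b_ux=0.7, b_uyx=0.0)
print(f"U=1 part: {u1:.1f}, U=0 part: {u0:.1f}, total: {u1 + u0:.1f}")
```

Whatever parameter values are chosen, the two parts add back to the crude cell count, because the U=0 share is one minus the U=1 share.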