The Exposure-Stratified Retrospective Study: Application to High-Incidence Diseases

Similar documents
Chapter 02. Basic Research Methodology

I. Introduction and Data Collection B. Sampling. 1. Bias. In this section Bias Random Sampling Sampling Error

Concepts of Clinical Trials

Sanjay P. Zodpey Clinical Epidemiology Unit, Department of Preventive and Social Medicine, Government Medical College, Nagpur, Maharashtra, India.

Selection Bias in the Assessment of Gene-Environment Interaction in Case-Control Studies

Department of International Health

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates

Population versus sampling procedures: implications from epidemiological studies

WELCOME! Lecture 11 Thommy Perlinger

A Brief Introduction to Bayesian Statistics

REPRODUCTIVE ENDOCRINOLOGY

Selection and Combination of Markers for Prediction

SAMPLING AND SCREENING PROBLEMS IN RHEUMATIC HEART DISEASE CASE. FINDING STUDY

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions.

Sample Size Considerations. Todd Alonzo, PhD

Statistics for Clinical Trials: Basics of Phase III Trial Design

Missing Data and Imputation

Unit 1 Exploring and Understanding Data

Flexible Matching in Case-Control Studies of Gene-Environment Interactions

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

Gender-Based Differential Item Performance in English Usage Items

Unequal Numbers of Judges per Subject

How to analyze correlated and longitudinal data?

Adjusted Crash Odds Ratio Estimates of Driver Behavior Errors: A Re-Analysis of the SHRP2 Naturalistic Driving Study Data

NPTEL Project. Econometric Modelling. Module 14: Heteroscedasticity Problem. Module 16: Heteroscedasticity Problem. Vinod Gupta School of Management

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Fundamental Clinical Trial Design

This is a repository copy of Practical guide to sample size calculations: non-inferiority and equivalence trials.

Case-control studies are retrospective investigations

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

JSM Survey Research Methods Section

Methodological aspects of non-inferiority and equivalence trials

Analysis of TB prevalence surveys

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Rational drug design as hypothesis formation

A NEW TRIAL DESIGN FULLY INTEGRATING BIOMARKER INFORMATION FOR THE EVALUATION OF TREATMENT-EFFECT MECHANISMS IN PERSONALISED MEDICINE

EC352 Econometric Methods: Week 07

Using Statistical Principles to Implement FDA Guidance on Cardiovascular Risk Assessment for Diabetes Drugs

Trial designs fully integrating biomarker information for the evaluation of treatment-effect mechanisms in stratified medicine

Two-sample Categorical data: Measuring association

A Methodological Issue in the Analysis of Second-Primary Cancer Incidence in Long-Term Survivors of Childhood Cancers

Psychology, 2010, 1: doi: /psych Published Online August 2010 (

Simple Linear Regression the model, estimation and testing

INTRODUCTION TO STATISTICS SORANA D. BOLBOACĂ

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Fixed-Effect Versus Random-Effects Models

A re-randomisation design for clinical trials

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

WRITTEN PRELIMINARY Ph.D. EXAMINATION. Department of Applied Economics. January 17, Consumer Behavior and Household Economics.

Studying the effect of change on change : a different viewpoint

EPIDEMIOLOGY. Training module

Checking the counterarguments confirms that publication bias contaminated studies relating social class and unethical behavior

26:010:557 / 26:620:557 Social Science Research Methods

Dichotomizing partial compliance and increased participant burden in factorial designs: the performance of four noncompliance methods

BAYESIAN ESTIMATORS OF THE LOCATION PARAMETER OF THE NORMAL DISTRIBUTION WITH UNKNOWN VARIANCE

4. Model evaluation & selection

# tablets AM probability PM probability

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Dr. Allen Back. Oct. 7, 2016

Comparing disease screening tests when true disease status is ascertained only for screen positives

American Journal of. Copyright 1999 by The Johns Hopkins University School of Hygiene and Public Hearth

Sampling for Success. Dr. Jim Mirabella President, Mirabella Research Services, Inc. Professor of Research & Statistics

The Impact of Continuity Violation on ANOVA and Alternative Methods

Evaluation of Models to Estimate Urinary Nitrogen and Expected Milk Urea Nitrogen 1

STATISTICS AND RESEARCH DESIGN

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl

Bayesian and Frequentist Approaches

Multiple Regression Analysis

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

Glossary. Ó 2010 John Wiley & Sons, Ltd

Sampling Uncertainty / Sample Size for Cost-Effectiveness Analysis

Comparison of the Null Distributions of

MEA DISCUSSION PAPERS

Biostatistics and Epidemiology Step 1 Sample Questions Set 1

A Method for Utilizing Bivariate Efficacy Outcome Measures to Screen Agents for Activity in 2-Stage Phase II Clinical Trials

Efficient Bayesian Sample Size Calculation for Designing a Clinical Trial with Multi-Cluster Outcome Data

A Critique of Two Methods for Assessing the Nutrient Adequacy of Diets

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*

Overview of Study Designs

A Comparison of Robust and Nonparametric Estimators Under the Simple Linear Regression Model

Edinburgh Research Explorer

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

SRI LANKA AUDITING STANDARD 530 AUDIT SAMPLING CONTENTS

IAASB Main Agenda (February 2007) Page Agenda Item PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED)

Clinical trial design issues and options for the study of rare diseases

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

MIDTERM EXAM. Total 150. Problem Points Grade. STAT 541 Introduction to Biostatistics. Name. Spring 2008 Ismor Fischer

This is a repository copy of Practical guide to sample size calculations: superiority trials.

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Estimating HIV incidence in the United States from HIV/AIDS surveillance data and biomarker HIV test results

Decomposition of the Genotypic Value

Critical Appraisal Series

Power of a Clinical Study

Challenges of Observational and Retrospective Studies

Computerized Mastery Testing

Biostatistics for Med Students. Lecture 1

Pre-Calculus Multiple Choice Questions - Chapter S4

Hypothesis testing: comparig means

Chapter 1: Exploring Data

Transcription:

The Exposure-Stratified Retrospective Study: Application to High-Incidence Diseases Peng T. Liu and Debra A. Street Division of Public Health and Biostatistics, CFSAN, FDA 5100 Paint Branch Pkwy, College Park, MD 20274 Abstract For measuring the degree of association between a disease and a risk factor, we have developed the exposure-stratified retrospective study that can serve as an alternative to the conventional case-control study. When the disease rate is greater than the exposure rate, this study design is superior in terms of the required sample size and the efficiency of the estimation. Key Words: Retrospective study, stratified sampling,, sample size, efficiency of estimate, odds ratio 1. Introduction The case-control study or retrospective study (Cornfield,1951) is a disease-stratified sampling. It divides the population into diseased and non-diseased strata, draws a proper number of samples from each stratum, surveys their exposure status and estimates the proportion of exposure for each group. In a rare-disease situation, the odds ratio estimated from this study is a good approximation of the relative risk. The retrospective approach is a backward approach to survey the respondent s past disease-exposure status. In addition to disease-stratified sampling, simple random sampling and exposure-stratified sampling are also available. However, these two designs are rarely discussed in the literature due to their typically low efficiency of estimation. Hence, disease-stratified sampling has become the sole design involving a retrospective approach. Recently, we reviewed the literature related to diabetes and obesity in order to identify possible associations with food consumption. Since we are dealing with high-incidence diseases, the rare disease assumption is clearly inappropriate and the odds ratio is no longer a good approximation of relative risk. Moreover, it turns out that the case-control study is no longer the most effective design under certain conditions. The objectives of this article are to: 1) develop procedures for estimating the sample size with fixed margins, 2) compare sample sizes and relative efficiencies between two different stratified sampling designs, and 3) define the inequality of sample size and inequality of design efficiency between two stratified designs and determine the feasible regions for each design. 2. Sample size with fixed margins The sample size for detecting a significant difference between two independent proportions is shown in the following equation (Snedecor, 1989; Fleiss, 1981; Schlesselman, 1974). n = {Z α/2 [2 p (1- p )] ½ + Z β [ p 1 (1- p 1 ) + p 2 (1- p 2 ) ] ½ } 2 / (p 1 p 2 ) 2 Where, p = (p 1 + p 2 )/2, Z α/2 = standardized normal value associated with two-tailed type-i error α. Z β = standardized normal value associated with one-tailed type-ii error β. 1897

In the case-control study, p 1 = P(E D ) represents the probability of exposure given a non-diseased case, and p 2 = P(E D ) represents the probability of exposure given a diseased case. (Note: the terms rate, proportion and probability are used synonymously). Usually, the exposure rate in the non-diseased group p 1 is assumed to be known or capable of being reasonably approximated, and the exposure rate in the exposed group p 2 depends on p 1 and r, where r is the minimum relative risk we desire to detect. (Schlesselman; 1974). In most situations, the disease rate P(D) and exposure rates P(E) in population are available but their association is unknown and to be detected by the proposed study. In this article, we utilize P(D) and P(E) directly in sample size estimation, avoid some improper assumptions such as P(D E ) = P(D) and P(E D ) = P(E), and improve the accuracy of sample size estimation. The developed procedures are illustrated by the following two examples. 2.1 Example 1 Suppose the prevalence rate of diabetes (disease) is 5.0% and the prevalence rate of obesity (risk factor) is 21%. What sample size is needed to be 90% sure of detecting a relative risk r = 2.0 at the 5% significance level, if we propose a case-control study or an exposure-stratified study? Solution: The exposure-disease status in a population can be expressed by a 2 x 2 table. The two marginal distributions are (0.21, 0.79), and (0.05, 0.095) respectively. To satisfy these two margin restrictions and to have a relative risk r = A(C+D)/C(A+B) = 2.0, simple algebraic operations yield the population distribution shown in Table 1. Table 1.The obesity-diabetes distribution under the hypothesis r = 2.0. Diabetes Non-Non-diabetes Obesity A= 0.0174 * B= 0.1926 0.2100 Non-obesity C= 0.0326 D= 0.7574 0.7900 0.0500 0.95000 1.0000 * If P(D) = δ and P(E) = ε and r = A (1-ε) / C ε (where C = δ A), then A = r δ ε / (1-ε + rε). 2.1.1 Sample size for the case-control study The case-control study divides the population into diabetes and non-diabetes strata. The exposure distribution for each stratum is shown in Table 2. Table 2. The distribution of obesity in the diabetes and non-diabetes groups Diabetes Non-diabetes Obesity 0.3480 0.2070 Non-obesity 0.6520 0.7973 1.00000 1.00000 The probability of obesity given a diabetes case, p 1 = P(O D) = 0.3480 and the probability of obesity given a non diabetes case, p 2 = P (O D ) = 0.2027. Substituting p 1 = 0.3480, p 2 = 0.2070, p = (p 1 + p 2 )/2 = 0.2775, Zα /2 = Z 0.05/2 = 1.960 and Z β = Z 0.10 = 1.282 into equation (1), yields n = 210 (per group). 2.1.2 Sample size for the exposure-stratified study The exposure-stratified study divides the population into obesity and non-obesity strata. The disease distribution for each group is shown in Table 3. Table 3. The distribution of diabetes in the obesity and non-obesity groups. Diabetes Non-diabetes Obesity 0.0826 0.9176 1.00000 Non-obesity 0.0413 0.9587 1.00000 _ 1898

The probability of diabetes given an obesity case, p 1 = P (D O) = 0.0826 and the probability of diabetes given a non obesity case, p 2 = P(D O ) = 0.0413. Substituting p 1 = 0.0826, p 2 = 0.0413, p = (p 1 + p 2 )/2 =0.0619, Zα /2 = Z 0.05/2 = 1.960 and Z β = Z 0.10 = 1.282 into equation (1) yields n = 714 (per group). Thus, in this example, the case-control study actually reduces the required sample size by 70.6% ((714-210)/714). In other words, the exposure-stratified study requires 3.39 times as many samples as the case-control study. 2.2. Example 2 Suppose the prevalence rate of obesity (disease) is 21% and the prevalence rate of vegetarianism (anti-risk factor) is 3%. What sample size is needed to be 90% sure of detecting an odds ratio o = 2.0 at the 5% significance level if we propose a case-control study and an exposure-stratified study? Solution: The prevalence rate of obesity is 21% and prevalence rate of vegetarianism is 3%. The marginal distributions are (0.21, 0.79) and (0.97, 0.03), respectively. To satisfy these two margins and to have an odds ratio o = AD/BC = 2.0, simple algebraic operations yield the population distribution shown in Table 4. Table 4.The population distribution under the hypothesis o = 2.0. Obesity Non-obesity Non-vegetarian A= 0.2064 B = 0.7636 0.9700 Vegetarian C = 0.0036 D= 0.0264 0.0300 0.2100 0.7900 1.0000 2.2.1 Sample size for the case-control study The case-control study divides the population into obesity and non-obesity strata. The exposure distribution for each stratum is shown in Table 5: Table 5. Sample distribution for obesity and non-obesity groups Obesity Non-obesity Non-vegetarian 0.9828 0.9665 Vegetarian 0.0172 0.0335 1.0000 1.0000 The probability of being vegetarian given an obesity case, p 1 = P(V O) = 0.0172 and the probability of being vegetarian given a non-obesity case, p 2 = P(V O ) = 0.0335. Substituting p 1 = 0.0172, p 2 = 0.0335, p = 0.0254, Zα/2 = Z0.025= 1.960, and Z β = Z 0.10 = 1.282 into equation (1) yield n = 2,170 (per group). 2.2.2 Sample size for the exposure-stratified study The exposure-stratified study divides the population into vegetarian and non-vegetarian strata. The disease distribution for each strata is shown in Table 6. Table 6. Obesity distribution for vegetarian and non-vegetarian groups Obesity Non-obesity Non-vegetarian 0.2127 0.7873 1.0000 Vegetarian 0.1200 0.8800 1.0000 The probability of obesity given a vegetarian, p 1 = P(O V ) = 0.1200 and the probability of obesity given a nonvegetarian, p 2 = P(O V ) = 0.2127. Substituting p 1 = 0.1200, p 2 = 0.2127, p = 0.1664, Z α/2 = Z 0.025 = 1.960, and Z β = Z 0.10 = 1.282 into equation (1) yield n = 337 (per group). 1899

Thus, in this worked example, the exposure-stratified study could reduce the required samples size by as much as 84.4% ((2170-337)/2,170). 3. Inequality of sample sizes between two stratified studies The above two examples are sufficient to illustrate the sample size properties for detecting the desired level of relative risk or odds ratio subject to the constraints of the arbitrarily chosen Type I and Type II error rates. Example 1 shows a situation where the case-control study requires a smaller sample size than the exposure-stratified study. By contrast, example 2 shows the situation where the case-control study requires a greater sample size. When the disease rate is equal to the exposure rate, the population distribution is characterized by a symmetric matrix. The exposure rate in the diseased group and in the non-diseased group derived from the disease-stratified study are the same as the corresponding disease rate in the exposed group and in the non-exposed group, respectively. Hence, the sample size for the two stratified designs becomes identical. In general, the inequality of sample size between two stratified designs may be stated as follows: If the disease rate is less than the exposure rate, then the sample size for the case-control study is less the exposurestratified study. If the disease rate is equal to the exposure rate, then the sample size for the case-control study is equal to the sample size for the exposure-stratified study. If the disease rate is greater than the exposure rate, then the sample size for the case-control study is greater than the sample size for the exposure stratified study. These inequalities could be used as a guideline for design selection. For example, suppose we want to study the degree of association between diabetes (with 5% prevalence rate) and a vegan diet (with 3% prevalence rate). As seen from our guideline, the exposure-stratified study requires a smaller sample size than the case-control study. 4. Inequality of efficiencies between two stratified studies The Pitman relative efficiency of estimation (Zacks, 1985) is expressed by the reciprocal ratio of two corresponding standard deviations. The odds ratios estimated from the three types of retrospective studies are asymptotically unbiased but with different efficiencies (i.e., variances). The relative efficiency of the odds ratio estimate can be simplified as the ratio of two corresponding k values. (where k = 1/a + 1/b + 1/c + 1/d, and a, b, c, and d reflect the sample distribution and are the entries in the matrix). In example 1, the relative efficiency of the odds ratio estimate as determined from the exposure stratified design compared to that for the case-control study is R.E. (ESS:CCS) = {(1/0.0826 + 1/0.9176 + 1/0.0413 + 1/0.9587)/ (1/0.3480 + 1/0.2070 + 1/0.6520 + 0.7973)} ½ = 0.54 (54%). In example 2, the corresponding result is R.E. (ESS:CCS) = {(1/0.2127 +1/0.7873+0.1210+1/0.8800)/ (1/0.9828 + 1/0.0172 + 1/0.9665 + 1/0.0355)} ½ = 6.06 (606%). Analogously to the results for the sample size, the inequality of two design efficiencies can be stated as follows: If the exposure rate is less than the disease rate, then the efficiency of the odds ratio estimate from the case-control study is less than the efficiency of the odds ratio estimate from the exposure-stratified study. If the exposure rate is equal to the disease rate, then the efficiency of the odds ratio estimate from the case-control study is equal to the efficiency of the odds ratio estimate from the case-control study. If the exposure rate is greater than the disease rate, then the efficiency of the odds ratio estimate from the case-control study control is greater than the efficiency of the odds ratio estimate from the exposure-stratified study. The inequality of sample size statements and the inequality of design efficiencies statements for the two designs are parallel and consistent with one another. 5. Discussion 1900

The procedures developed here which may be called reverse weighting procedures - call for i) decomposing either the disease rate in the population into disease rate in the exposed group and disease rate in the non-exposed group; or for ii)decomposing the exposure rate in the population into the exposure rate in the diseased group and the exposure rate in the non-diseased group, depending on the proposed design. These procedures could avoid certain improper assumptions, and improve the accuracy of estimation. This approach can also evaluate the functional relation between the odds ratio and relative risk, and examine the degree of assumption violation. In example 1, the sample size for detecting relative risk r = 2.0 is equivalent to detecting the odds ratio o = 2.1. The bias is 5% ( (229-210/210) x100% x). In example 2, the sample size for detecting the odds ratio o = 2.0 is equivalent to detecting a relative risk r = 1.75. The bias is 14.5% ((2.0-1.75)/1.75) x 100%). The sample size for detecting a desired relative risk in the case control study has been discussed by Schlesselman in 1974. His sample size table developed in the same year has been widely used. Since the relative risk cannot be estimated from a case-control study, some approximations should be adopted in his estimation. Therefore, his table turns out to be suited only for detecting a desired odds ratio instead of a relative risk. The discrepancy between the sample size for detecting the same value of odds ratio or relative risk could be greater than 20 % under certain conditions. Based on the inequality of odds ratio and relative risk, when the odds ratio is greater than 1, the odds ratio is greater than the relative risk (Liu and Street; 2004). To treat a relative risk as an odds ratio could substantially underestimate the true required sample size. References: Cornfield J. (1951). A method of estimating comparative rates from clinical data. Applications to cancer of lung, breast, and cervix. J. Natl. Cancer Inst. Vol. 11: 1269-1275. Fleiss, J.L. et al. (2003) Statistical Methods for Rates and Proportions, John Wiley & Sons, Inc. New York. Liu, P. T. and D. A Street (2004) The Relative Risk and Odds Ratio Inequalities and Conversions. The 2004 Proceeding of Joint Statistical Meeting, Statistics in Epidemiology Section, American Statistical Association. Schlesselman, J. J. (1974). Sample Size Requirements in Cohort and Case-Control Studies of Disease, Am. J. of Epidemiology, 99:381. Schlesselman, J. J. (1974). Tables of the Sample Size Requirement for Cohort and Case-Control Studies of Disease, NICHD, NIH, Bethesda, Md. Snedecor, G.W. and W.G.Cochran. (1989). Statistical Methods. (8 th edition), Iowa State Univ. Press. Ames, IA. Zacks, S. (1985). Pitman efficiency in Encyclopedia of Statistical Sciences, Vol. 6, S. Kotz & N.L. Johnson. Eds. Wiley, NewYork. pp: 731-735.. 1901