Sample Size Considerations. Todd Alonzo, PhD



Thanks to Nancy Obuchowski for the original version of this presentation.

Why do Sample Size Calculations?
1. To minimize the risk of drawing the wrong conclusion from your study
2. To plan for your study's needs
3. To determine if a study is feasible

Outline
- Four steps in the process of determining sample size
- Two example studies:
  - Measurement
  - Test hypothesis
- Four steps for each example

The process of sample size calculation takes place after you have specified your study's primary objective, e.g.: To estimate and compare the breast-level sensitivity and specificity of board-certified mammographers interpreting breast MRI vs. mammograms of high-risk women.

The four steps:
1. State statistical hypothesis and/or what you want to estimate
2. Plan the statistical analysis
3. Make assumptions about your study data
4. Calculate sample size

Two Examples
1. Study to estimate sensitivity of new modality.
   - Simple measurement with confidence interval
2. Study to compare ROC areas of 2 modalities.
   - Test hypothesis

Example 1: Study to estimate sensitivity of new modality
- Suppose you're planning a one-reader study
- Reader will score each patient using a 5-point scale
- Define positive test result as score >3
- Retrospective design

Estimate sensitivity and construct 95% CI
1. Estimate sensitivity as the proportion of diseased patients scored >3.
2. Retrospective design
Calculate N
Assumptions: ??

Conversation with statistician
Radiologist: I want to estimate sensitivity of my new modality.
Statistician: How sensitive do you expect it to be?
Radiologist: Maybe between 0.7 and 0.9.
Statistician: Ok. We should construct a CI. How precisely do you want to estimate sensitivity?
Radiologist: To within 0.10 would be ok.

Estimate sensitivity and construct 95% CI
1. Estimate sensitivity as the proportion of diseased patients scored >3.
2. Retrospective design
Calculate N
Assumptions:
1. Sensitivity: 0.7 to 0.9
2. CI precision of ±0.10

Sample Size Formula for Constructing CI
$$N = \frac{(z_{\alpha/2})^2 \, \mathrm{Var}}{L^2}$$
Var = variance
L = precision (half-width of the CI)

Some Useful z-values
- For 95% confidence level: $z_{\alpha/2} = 1.96$
- For 5% type I error rate: $z_{\alpha/2} = 1.96$
- For 80% power: $z_{\beta} = 0.84$
- For 90% power: $z_{\beta} = 1.28$

Example 1
Variability for a proportion: p(1-p)

p      Variability
0.5    0.25
0.6    0.24
0.7    0.21
0.8    0.16
0.9    0.09

Example 1
$$N = \frac{(1.96)^2 \, (0.7 \times 0.3)}{(0.10)^2} = 80.7$$
So we need 81 patients with disease.
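The Example 1 arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula on the slides; the function name `n_for_ci` is hypothetical and not part of the original presentation.

```python
import math

def n_for_ci(p, half_width, z=1.96):
    """Diseased subjects needed so that the CI for a proportion p
    has roughly the requested half-width (slide formula: z^2 * Var / L^2)."""
    variance = p * (1 - p)          # variability for a proportion
    return z ** 2 * variance / half_width ** 2

# Slide assumptions: sensitivity between 0.7 and 0.9, precision of +/-0.10.
# p = 0.7 has the largest variance in that range, so it gives the largest N.
raw = n_for_ci(0.7, 0.10)
print(round(raw, 1), math.ceil(raw))   # unrounded N, then N rounded up
```

Using the low end of the expected sensitivity range is the cautious choice here, because p(1-p) shrinks as p moves from 0.7 toward 0.9.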

Impact of Study Design on N:
- Retrospective Study: Identify the number of diseased subjects needed for the study based on the sample size calculation. Call this $N_{DIS} = 81$.
- Prospective Study: Must consider the prevalence rate. Say the prevalence rate is 10%. Then total N = $N_{DIS}$ / Prevalence = 80.7 / 0.10 = 807.

[Figure: Number of subjects required (y-axis, 0 to 3500) versus half-width of CI (x-axis, 0.05 to 0.2), with one curve for the retrospective design (Ndis) and one for the prospective design (Ntotal).]
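The relationship plotted in this figure can be tabulated with a short Python sketch. The sensitivity of 0.7 and prevalence of 10% are taken from Example 1, but whether the original figure used exactly these values is an assumption; the function names are hypothetical.

```python
import math

def n_diseased(p, half_width, z=1.96):
    """Unrounded number of diseased subjects (retrospective design)."""
    return z ** 2 * p * (1 - p) / half_width ** 2

def n_total_prospective(n_dis, prevalence):
    """Unrounded total N for a prospective design: N_dis / prevalence."""
    return n_dis / prevalence

# Assumed inputs: sensitivity 0.7 and 10% prevalence, as in Example 1.
for half_width in (0.05, 0.10, 0.15, 0.20):
    n_dis = n_diseased(0.7, half_width)
    n_tot = n_total_prospective(n_dis, 0.10)
    print(half_width, math.ceil(n_dis), math.ceil(n_tot))
```

Halving the half-width roughly quadruples both sample sizes, since the precision enters the formula squared.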

Example 2: Study to Compare ROC Areas of Two Modalities
- Suppose you want to compare a new modality with a standard modality
- Readers will assign a confidence score (0-100)
- Retrospective design

Testing Hypotheses: Alternative Hypothesis
- Superiority: One modality is better than the other
- Non-Inferiority: New modality is at least as good as the standard modality
- Equivalence: New modality has the same performance as the standard modality

Statistical Hypotheses ($H_0$: null, $H_A$: alternative)
Superiority
  $H_0$: $\mathrm{AUC}_{new} = \mathrm{AUC}_{old}$
  $H_A$: $\mathrm{AUC}_{new} \neq \mathrm{AUC}_{old}$
Non-Inferiority
  $H_0$: $\mathrm{AUC}_{new} + \delta < \mathrm{AUC}_{old}$
  $H_A$: $\mathrm{AUC}_{new} + \delta > \mathrm{AUC}_{old}$

Non-inferiority criterion
If the new test is easier to use than the old test (e.g. fewer complications, less expensive, less radiation, less time, fewer personnel), it doesn't have to be quite so accurate.
Example: with $\delta = 0.05$ and $\mathrm{AUC}_{old} = 0.80$, the new test is non-inferior if its AUC is above 0.75.

Statistical Hypotheses
Equivalence
  $H_0$: $(\mathrm{AUC}_{new} - \mathrm{AUC}_{old}) < \delta_L$ or $(\mathrm{AUC}_{new} - \mathrm{AUC}_{old}) > \delta_U$
  $H_A$: $\delta_L < (\mathrm{AUC}_{new} - \mathrm{AUC}_{old}) < \delta_U$

Equivalence criteria
The difference between the new and old test, $(\mathrm{AUC}_{new} - \mathrm{AUC}_{old})$, must be between the lower and the upper equivalence bounds, e.g. $\delta_L = -0.05$ and $\delta_U = +0.05$.

$H_0$: $\mathrm{AUC}_{new} + \delta < \mathrm{AUC}_{old}$
$H_A$: $\mathrm{AUC}_{new} + \delta > \mathrm{AUC}_{old}$
Design:
1. Retrospective
2. Equal # diseased & nondiseased subjects
3. Paired subjects
Calculate sample size for 1-reader study
Assumptions: ??

Conversation with statistician
Radiologist: I want to test if my new modality is non-inferior to the old modality.
Statistician: How accurate is the old test?
Radiologist: Maybe about AUC=0.8.
Statistician: How accurate is the new test?
Radiologist: I think it is about the same.
Statistician: Will the same subjects undergo the new and old test?
Radiologist: Yes. A paired subject design.
Statistician: Ok. I'll assume moderate correlation.

Accuracy of New Test
- If $\mathrm{AUC}_{new} \ggg \mathrm{AUC}_{old}$: we don't need a large N.
- If $\mathrm{AUC}_{new} < \mathrm{AUC}_{old}$: we need a much larger N.

$H_0$: $\mathrm{AUC}_{new} + \delta < \mathrm{AUC}_{old}$
$H_A$: $\mathrm{AUC}_{new} + \delta > \mathrm{AUC}_{old}$
Design:
1. Retrospective
2. Equal # diseased & nondiseased subjects
3. Paired subjects
Calculate sample size for 1-reader study
Assumptions:
1. Old test average AUC = 0.8
2. New test average AUC = 0.8
3. $\delta = 0.05$
4. $r_{subject} = 0.5$

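Sample Size Formula for One-Reader Study
$$N_{dis} = \frac{(z_{\alpha/2} + z_{\beta})^2 \, [2 \, \{\mathrm{Var}_{subject} \, (1 - r_{subject})\}]}{(\theta_{new} - \theta_{old} + \delta)^2}$$
Here $\theta_{new}$ and $\theta_{old}$ are the ROC areas of the new and old tests.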

Type I and II errors

                      | Truth: null is true (inferior)          | Truth: alternative is true (non-inferior)
Fail to reject null   | Correct conclusion                      | Incorrect conclusion: Type II (β) error
Reject null           | Incorrect conclusion: Type I (α) error  | Correct conclusion: POWER


What is the variability for AUC?
- Obtain an estimate from a pilot study, or
- Use $\mathrm{Var}_{subject} = \mathrm{AUC}\,(1 - \mathrm{AUC})$ (conservative)

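Sample Size Formula for One-Reader Study
$$N_{dis} = \frac{(z_{\alpha/2} + z_{\beta})^2 \, [2 \, \{\mathrm{Var}_{subject} \, (1 - r_{subject})\}]}{(\theta_{new} - \theta_{old} + \delta)^2}$$
With AUC = 0.8 for both tests, $\delta = 0.05$, $r_{subject} = 0.5$, 5% type I error and 80% power:
$$N_{dis} = \frac{(1.96 + 0.84)^2 \, [2 \, \{0.8 \times 0.2 \, (1 - 0.5)\}]}{(0.05)^2} \approx 500$$
With equal numbers of diseased and nondiseased subjects, that is about 1000 subjects in total.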
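As a rough check, the one-reader formula can be coded directly. The sketch below only transcribes the slide formula with the slide's assumptions; the function name `n_dis_one_reader` is hypothetical.

```python
def n_dis_one_reader(auc_new, auc_old, delta, var_subject, r_subject,
                     z_alpha=1.96, z_beta=0.84):
    """Diseased subjects for a one-reader non-inferiority study of two AUCs."""
    numerator = (z_alpha + z_beta) ** 2 * 2 * var_subject * (1 - r_subject)
    denominator = (auc_new - auc_old + delta) ** 2
    return numerator / denominator

# Slide assumptions: both AUCs 0.8, delta 0.05, moderate correlation 0.5,
# and the conservative variance AUC * (1 - AUC).
var_subject = 0.8 * (1 - 0.8)
n_paired = n_dis_one_reader(0.8, 0.8, 0.05, var_subject, r_subject=0.5)
n_unpaired = n_dis_one_reader(0.8, 0.8, 0.05, var_subject, r_subject=0.0)
print(round(n_paired), round(n_unpaired))
```

Setting `r_subject=0.0` shows the cost of an unpaired-subject design under the same assumptions: the required number doubles, because the (1 - r) factor no longer reduces the variance.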

Now Consider Multi-reader Design:
Do the same readers interpret images from both modalities?
- Yes: paired-reader design, $r_{reader} > 0$
- No: unpaired-reader design, $r_{reader} = 0$

$H_0$: $\mathrm{AUC}_{new} + \delta < \mathrm{AUC}_{old}$
$H_A$: $\mathrm{AUC}_{new} + \delta > \mathrm{AUC}_{old}$
Design:
1. Retrospective
2. Equal # diseased & nondiseased subjects
3. Paired subjects
4. Paired readers (**)
Calculate sample size for MRMC study
Assumptions:
1. Old test average AUC = 0.8
2. New test average AUC = 0.8
3. $\delta = 0.05$
4. $r_{subject} = 0.5$
(**) Additional assumptions needed

Conversation with statistician
Radiologist: N=1000 is too large; I want to perform a MRMC study.
Statistician: Ok. What might be the range of the readers' AUCs?
Radiologist: Their AUCs will vary quite a bit. Maybe between 0.7 and 0.9.
Statistician: Will the readers be paired?
Radiologist: Yes, the same readers will interpret images from both modalities.
Statistician: Do you think readers will perform similarly on the two tests?
Radiologist: Yes, the readers who perform best on the old test are likely to perform best on the new test.
Statistician: Ok. I'll assume high correlation.

$H_0$: $\mathrm{AUC}_{new} + \delta < \mathrm{AUC}_{old}$
$H_A$: $\mathrm{AUC}_{new} + \delta > \mathrm{AUC}_{old}$
Design:
1. Retrospective
2. Equal # diseased & nondiseased subjects
3. Paired subjects
4. Paired readers
Calculate sample size for MRMC study
Assumptions:
1. Old test average AUC = 0.8
2. New test average AUC = 0.8
3. $\delta = 0.05$
4. Range of AUC across readers: 0.7 to 0.9
5. $r_{subject} = 0.5$
6. $r_{reader} = 0.75$

Sample Size Formula for MRMC Study
$$\lambda = \frac{R \, (\theta_{new} - \theta_{old} + \delta)^2}{2 \, \{\mathrm{Var}_{reader} \, (1 - r_{reader}) + (\mathrm{Var}_{subject} / N_{DIS}) \, (1 - r_{subject})\}}$$
R = # readers

Sample Size for MRMC Study
λ is the noncentrality parameter of an F distribution. You can use a math package (e.g. MATLAB) to calculate power if you know λ, R, and the type I error.

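Sources of Variability in MRMC Studies
Subject variability:
- Obtain an estimate from a pilot study, or
- Use $\mathrm{Var}_{subject} = \mathrm{AUC}\,(1 - \mathrm{AUC})$ (conservative)
Reader variability:
- Obtain an estimate from a pilot study, or
- Estimate the range of AUCs across readers; then $\mathrm{Var}_{reader} = [(\mathrm{range})/4]^2$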
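The noncentrality parameter from the MRMC formula can be computed as in the sketch below. The slide assumptions (AUCs of 0.8, δ = 0.05, reader AUC range 0.7 to 0.9, correlations 0.5 and 0.75) come from the presentation, but the choice of 10 readers and 100 diseased subjects is purely hypothetical, as are the function names. Turning λ into power still requires a noncentral F distribution from a math package, which is not shown here.

```python
def var_reader_from_range(low, high):
    """Reader variance from the expected range of reader AUCs: (range / 4)^2."""
    return ((high - low) / 4) ** 2

def noncentrality(n_readers, n_dis, auc_new, auc_old, delta,
                  var_reader, r_reader, var_subject, r_subject):
    """Noncentrality parameter lambda from the MRMC slide formula."""
    effect = (auc_new - auc_old + delta) ** 2
    variability = 2 * (var_reader * (1 - r_reader)
                       + (var_subject / n_dis) * (1 - r_subject))
    return n_readers * effect / variability

var_reader = var_reader_from_range(0.7, 0.9)   # (0.2 / 4)^2
var_subject = 0.8 * (1 - 0.8)                  # conservative AUC * (1 - AUC)

# Hypothetical design: 10 readers, 100 diseased subjects.
lam = noncentrality(10, 100, 0.8, 0.8, 0.05,
                    var_reader, 0.75, var_subject, 0.5)
print(round(lam, 2))
```

Trying several combinations of readers and subjects in this way shows how adding readers can trade off against adding subjects for a given λ.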


Unpaired Designs
If you plan an unpaired design, then set correlations to zero:
- Unpaired readers: $r_{reader} = 0$
- Unpaired subjects: $r_{subject} = 0$
N = # subjects per test

If subjects are unpaired and not randomized to the two tests, then you must consider differences in patient mix between the two tests. Use statistical modeling to do this. This increases N.

Clustered data (common in imaging studies)
Clustered data = multiple observations / subject
Examples:
- left and right breast / subject
- six colon segments / subject
- unknown # lesions / subject

In data analysis and in sample size estimation, do not assume independence!

Sample size estimation with clustered data
- A conservative approach is to assume only one observation / subject.
- Alternatively, take clustered data into account:
  design effect = $1 + (s - 1) \times r_{cluster}$
  s = # observations / subject
  $r_{cluster}$ = correlation between observations in a cluster

Example with clustered data
Suppose readers will score each breast in a mammography study. Suppose that with one observation / subject you calculated that you need 100 subjects.

Calculate the design effect:
design effect = $1 + (s - 1) \times r_{cluster} = 1 + (2 - 1) \times 0.5 = 1.5$

If we needed 100 subjects with only one observation per subject, then 100 × 1.5 = 150 breasts are needed, so 75 subjects are needed (150 breasts).
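The clustered-data adjustment can be written out as a small Python sketch. It follows the arithmetic of the mammography example (two breasts per subject, within-subject correlation 0.5); the function name `design_effect` is hypothetical.

```python
import math

def design_effect(obs_per_subject, r_cluster):
    """Design effect = 1 + (s - 1) * r_cluster."""
    return 1 + (obs_per_subject - 1) * r_cluster

n_independent = 100        # subjects needed with one observation per subject
s = 2                      # two breasts per subject
deff = design_effect(s, r_cluster=0.5)

observations_needed = n_independent * deff            # breasts
subjects_needed = math.ceil(observations_needed / s)  # subjects
print(deff, observations_needed, subjects_needed)
```

With a correlation of 0 the design effect is 1 and each breast counts as an independent observation; with a correlation of 1 the design effect equals s and the second breast adds no information.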

1. Estimate and construct CI, or state statistical hypothesis:
   - superiority
   - non-inferiority
   - equivalence
2. Plan Statistical Analysis:
   - prospective or retrospective
   - paired or unpaired
   - clustered or 1 obs/subject
3. Assumptions:
   - performance estimates
   - variability
   - correlations
   - set type I and II error rates
4. Calculate sample size