Sample Size Considerations. Todd Alonzo, PhD

Sample Size Considerations Todd Alonzo, PhD 1

Thanks to Nancy Obuchowski for the original version of this presentation. 2

Why do Sample Size Calculations? 1. To minimize the risk of making the wrong conclusion from your study 2. To plan for your study s needs 3. To determine if a study is feasible 3

Outline 4steps in process to determining sample size Two example studies Measurement Test hypothesis Four steps for each example 4

The process of sample size calculations takes place afteryou have specified your study s primary objective. e.g. To estimate and compare the breast-level sensitivity and specificity of board-certified mammographersinterpreting breast MRI vs. mammograms of high-risk women. 5

State statistical hypothesis and/or what you want to estimate 6

State statistical hypothesis and/or what you want to estimate Plan the statistical analysis 7

State statistical hypothesis and/or what you want to estimate. Plan the statistical analysis. Make assumptions about your study data 8

State statistical hypothesis and/or what you want to estimate Plan the statistical analysis Calculate sample size Make assumptions about your study data 9

Two Examples 1. Study to estimate sensitivity of new modality. - Simple measurement with confidence interval 2. Study to compare ROC areas of 2modalities. - Test hypothesis 10

Example 1: Study to estimate sensitivity of new modality Suppose you re planning a one-reader study Reader will score each patient using 5-point scale Define positive test result as score >3 Retrospective design 11

Estimate sensitivity and construct 95% CI 1. Estimate sensitivity as the proportion of diseased patients scored >3. 2. Retrospective design Calculate N Assumptions?? 12

Conversation with statistician Radiologist: I want to estimate sensitivity of my new modality Statistician: How sensitive do you expect it to be? Maybe between 0.7 and 0.9 To within 0.10 would be ok. Ok. We should construct a CI. How precisely do you want to estimate sensitivity? 13

Estimate sensitivity and construct 95% CI 1. Estimate sensitivity as the proportion of diseased patients scored >3. 2. Retrospective design Calculate N Assumptions: 1. Sensitivity: 0.7 0.9 2. CI precision of +0.10 14

Sample Size Formula for Constructing CI N = [ (z α/2 ) 2 (Var) ] / (L) 2 Var = variance L = precision 15

Some Useful z-values For 95% confidence level z α/2 = 1.96 For 5% type I error rate z α/2 = 1.96 For 80% power z β = 0.84 For 90%power z β = 1.28 16

Example 1 Variability for a proportion: p (1-p) p Variability 0.5 0.25 0.6 0.24 0.7 0.21 0.8 0.16 0.9 0.09 17

Example 1 N = [ (1.96) 2 (0.7 0.3) ] / (0.10) 2 = 80.7 So we need 81 patients with disease. 18

Impact of Study Design on N: Retrospective Study: Identify # diseased subjects needed for study based on sample size calculation. Call this N DIS = 81 19

Impact of Study Design on N: Retrospective Study: Identify # diseased subjects needed for study based on sample size calculation. Call this N DIS = 81 Prospective Study: Must consider prevalence rate. Say prevalence rate is 10%. total N= N DIS / Prevalence = 80.7/0.10 = 807 20

Number of Subjects Required 3500 3000 2500 2000 1500 1000 Retrospective (Ndis) Prospective (Ntotal) 500 0 0.05 0.1 0.15 0.2 Half-Width of CI 21

Example 2: Study to Compare ROC Areas of Two Modalities Suppose you want to compare a new modality with a standard modality Readers will assign a confidence score (0-100) Retrospective design 22

Testing Hypotheses Superiority Non-Inferiority Equivalence 23

Alternative Hypothesis Superiority One modality is better than the other Non-Inferiority New modality is at least as good as the standard modality Equivalence New modality hassame performance as the standard modality 24

Statistical Hypotheses Superiority H o : null Non-Inferiority H o : H A : alternative H A : Equivalence H o : H A : 25

Statistical Hypotheses Superiority H o : AUC new = AUC old Non-Inferiority H A : AUC new AUC old Equivalence 26

Statistical Hypotheses Superiority H o : AUC new = AUC old H A : AUC new AUC old Non-Inferiority H o : AUC new + δ <AUC old Equivalence H A : AUC new + δ >AUC old 27

Non-inferiority criterion If the new test is easier to use than the old test (e.g. fewer complications, less expensive, less radiation, less time, less personnel), it doesn t have to be quite so accurate: _δ=0.05_ 0.75 AUC old =0.80 28

Statistical Hypotheses Superiority Non-Inferiority Equivalence H o : (AUC new -AUC old ) <δ L or (AUC new -AUC old ) > δ U H A : δ L < (AUC new -AUC old ) < δ U 29

Equivalence criteria The difference between the new and old test must be between the lower and the upper equivalence bounds: δ L = -0.05 δ U = +0.05 (AUC new -AUC old ) 30

H o : AUC new + δ<auc old H A : AUC new + δ> AUC old 1. Retrospective 2. Equal # diseased & nondiseased subjects 3. Paired subjects Calculate sample size for 1- reader study Assumptions?? 31

Conversation with statistician Radiologist: I want to test if my new modality is non-inferior to the old modality. Maybe about AUC=0.8. I think it is about the same. Yes. A paired subject design. Statistician: How accurate is the old test? How accurate is the new test? Will the same subjects undergo the new and old test? Ok. I ll assume moderate correlation. 32

Accuracy of New Test If AUC new >>> AUC old We don t need a large N... If AUC new <AUC old We need a much larger N 33

H o : AUC new + δ<auc old H A : AUC new + δ> AUC old 1. Retrospective 2. Equal # diseased & nondiseased subjects 3. Paired subjects Calculate sample size for 1- reader study 1. Old test average AUC = 0.8 2. New test average AUC =0.8 3. δ=0.05 4. r subjects =0.5 34

Sample Size Formula for One-Reader Study (z α/2 + z β ) 2 [2 {Var subject (1-r subject )}] N dis = (θ new -θ old + δ) 2 35

Type I and II errors Truth Null is truth (inferior) Alternative is truth (non-inferior) Fail to reject Null Correct conclusion Incorrect conclusion Reject Null Incorrect conclusion Correct conclusion 36

Type I and II errors Truth Null is truth Alternative is truth Fail to reject Null Correct conclusion TypeII (β) error Reject Null TypeI (α) error POWER 37

Some Useful z-values For 95% confidence level z α/2 = 1.96 For 5% type I error rate z α/2 = 1.96 For 80% power z β = 0.84 For 90%power z β = 1.28 38

What is the variability for AUC? Obtain an estimate from pilot study Var subject = AUC (1-AUC) (conservative) 39

Sample Size Formula for One-Reader Study (z α/2 + z β ) 2 [2 {Var subject (1-r subject )}] N dis = (θ new -θ old + δ) 2 (1.96 + 0.84) 2 [2 {0.8 0.2 (1-0.5)}] N dis = (0.05) 2 500 40

Now Consider Multi-reader Design: Same readers interpret images from both modalities? paired-reader design: r reader > 0 unpaired-reader design: r reader = 0 41

H o : AUC new + δ<auc old H A : AUC new + δ> AUC old 1. Retrospective 2. Equal # diseased & nondiseased subjects 3. Paired subjects 4. **Paired readers Calculate sample size for MRMC study 1. Old test average AUC = 0.8 2. New test average AUC =0.8 3. δ=0.05 4. r subjects =0.5 **Additional assumptions 42

Conversation with statistician Radiologist: N=1000 is too large; I want to perform a MRMC study. Their AUCs will vary quite a bit. Maybe between 0.7 and 0.9. Yes, the same readers will interpret images from both modalities. Yes, the readers who perform best on the old test are likely to perform best on the new test. Statistician: Ok. What might be the range of the readers AUCs? Will the readers be paired? Do you think readers will perform similarly on the two tests? Ok. I ll assume high correlation. 43

H o : AUC new + δ<auc old H A : AUC new + δ> AUC old 1. Retrospective 2. Equal # diseased & nondiseased subjects 3. Paired subjects 4. Paired readers Calculate sample size for MRMC study 1. Old test average AUC = 0.8 2. New test average AUC =0.8 3. δ= 0.05 4. Range of AUC across readers: 0.7 0.9 5. r subjects =0.5 6. r readers = 0.75 44

Sample Size Formula for MRMC Study [R (θ new -θ old + δ) 2 ] λ= [2 {Var reader (1-r reader ) + Var subject /N DIS (1-r subject )}] R = # readers 45

Sample Size for MRMC Study λis the noncentralityparameter from an F distribution. You can use a math package (e.g. MATLAB) to calculate power if you know λ, R, and the type I error. 46

Sources of Variability in MRMC Studies Obtain an estimate from pilot study Var subject = AUC (1-AUC) (conservative) Obtain an estimate from pilot study Estimate the range of AUCs across readers; then Var reader = [(range)/4] 2 47

Unpaired Designs If you plan an unpaired design, then set correlations to zero: Unpaired readers: r reader = 0 Unpaired subjects: r subject = 0 N = # subjects per test 49

Unpaired Designs If subjects are unpaired andnot randomized to two tests, then you must consider differences in patient mix between two tests. Use statistical modeling to do this. This increases N. 50

Clustered data (common in imaging studies) Clustered data = multiple observations/subject Examples: left and right breast / subject six colon segments / subject unknown# lesions / subject 51

Clustered data In data analysis and in sample size estimation, do not assume independence! 52

Sample size estimation with clustered data A conservative approach is to assume only one observation / subject. 53

Sample size estimation with clustered data Alternatively, take clustered data into account: design effect = 1 + (s-1) x r cluster s = # observations / subject r cluster = correlation between observations in cluster 54

Example with clustered data Suppose readers will score each breast in a mammography study. Suppose that with one observation / subject you calculated that you need 100 subjects. 55

Example with clustered data Calculate design effect = 1 + (s-1) x r cluster = 1 + (2-1) x 0.5 = 1.5 56

Example with clustered data design effect = 1.5 If we needed 100 subjects (if only one observation per subject), then: 100 x1.5= 150 breasts needed so, 75 subjects needed (150 breasts) 57

Estimate and construct CI State statistical hypothesis: - superiority - non-inferiority - equivalence Plan Statistical Analysis: - prospective or retrospective - paired or unpaired -clustered or 1 obs/subject Calculate sample size Assumptions: - performance estimates - variability - correlations -set type I and II error rates 58