NEED A SAMPLE SIZE? How to work with your friendly biostatistician!!! BERD Pizza & Pilots November 18, 2013 Emily Van Meter, PhD Assistant Professor Division of Cancer Biostatistics
Overview Why do we need a sample size calculation??? What are the essential factors needed to calculate sample size? Other Considerations Types of Hypotheses Types of Endpoints How many groups?? HOW CAN WE HELP YOU?!
Why do we need to calculate sample size?!?!?
Starting Point Starts with your questions and endpoints. Think of the data you will be collecting AND the analyses of those data. Requires knowledge of the clinical setting.
Reasons for Estimating N To achieve reasonable precision of the estimate of interest To ensure adequate statistical power for the hypothesis test Sample size should reflect what is needed for the PRIMARY aim and analysis of the study Needed to determine your budget
Main Question: Sample Size Question What sample size is required to ensure a power of 1 − β of detecting a clinically relevant difference Δ at significance level α?
Sample Size Question Study size should be enough to test your primary hypothesis... and should be enough to examine secondary hypotheses. BUT WHAT IS ENOUGH??
OK fine, you made your point: sample size calculations are important! What factors do I need to calculate my sample size??
Sample Size Parameters Remember the Greeks: α = alpha = Probability of a Type I Error; β = beta = Probability of a Type II Error; 1 − β = POWER; Δ = clinically relevant difference***; σ = standard deviation will be needed too
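To see how these Greek-letter pieces combine, here is a minimal sketch (not from the slides) of the standard normal-approximation formula for a two-sided, two-sample comparison of means with 1:1 allocation. The specific numbers (α = 0.05, power = 0.80, Δ = 5, σ = 10) are illustrative assumptions only:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power, delta, sigma):
    """Normal-approximation sample size per arm for a two-sided,
    two-sample comparison of means (equal variances, 1:1 allocation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for two-sided alpha
    z_beta = z(power)            # quantile corresponding to power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# alpha = 0.05, power = 0.80, detect a 5-point difference, SD = 10
print(n_per_group(0.05, 0.80, 5, 10))  # -> 63 per arm
```

Note how every one of the Greeks appears in the formula: shrink α, raise power, shrink Δ, or raise σ, and n goes up.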
Error Rates - Alpha A type I error occurs when a true null hypothesis is rejected In the superiority setting this is falsely claiming a difference In the non-inferiority setting this is falsely claiming a similarity This is called the significance level of a test
Choosing α carefully Because α is chosen by the investigator, it is under the investigator's control and is known. Thus when you reject H0, you know the probability of a Type I error. α is chosen a priori, usually set at α = 0.05. Sometimes see α = 0.10 or 0.15 in very exploratory phase II settings (not as common).
Error Rates - Beta A type II error occurs when we fail to reject a false null hypothesis. In the superiority setting this is falsely claiming no difference; in the non-inferiority setting this is falsely claiming a difference. β = P(fail to reject H0 | H0 false); 1 − β = Power. Usually see power at 80% - 90%... Never below 80%!!!
Type II Error and Power Why should we be concerned about power? The power of a test tells us how likely we are to reject the null hypothesis given that the alternative hypothesis is true If the power is too low, then we have little chance of rejecting the null even if the alternative is true What is the cause of low power? Invariably, it is an inadequate sample size
α and β and Statistical Considerations Because we have these two types of error and one is possible in any decision, we NEVER say that we have proved that H0 is true or that H0 is false!!! Proof implies that there is no possibility for error. Instead we say that the data support or fail to support the null hypothesis (i.e. reject or fail to reject H0, respectively). Statistical significance does not imply clinical significance.
Clinically Relevant Difference Minimum clinically important difference, MCID (Δ), in the outcome measure. Usually get some information from pilot data or other published results. Another tough piece of information to get! Purely clinically driven: what amount of treatment effect will change clinical practice?
MCID MCID is not necessarily what you observe in previous pilot studies. You may have observed a proportion of good outcome of 80% in the pilot study as compared to 30% in historical controls, but is that difference of 50% an MCID? MCID should be chosen such that it is reasonable to expect it from a particular treatment. You may choose an MCID of 25% but the pilot studies have shown only a 5-10% effect size; should you plan a study to detect a 25% effect size anyway?
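The MCID is the single biggest lever on sample size: with the usual normal-approximation formula, halving Δ roughly quadruples n. A quick sketch (illustrative numbers only; α = 0.05, power = 0.80, σ = 10 are assumptions, not values from the slides):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power, delta, sigma):
    """Normal-approximation per-arm n for a two-sided two-sample
    comparison of means (equal variances, 1:1 allocation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2
                * sigma ** 2 / delta ** 2)

# Halving the MCID roughly quadruples the required sample size:
for delta in (10, 5, 2.5):
    print(delta, n_per_group(0.05, 0.80, delta, 10))
# -> 10: 16 per arm, 5: 63 per arm, 2.5: 252 per arm
```

This is why choosing an over-optimistic MCID (e.g. the 25% vs. observed 5-10% above) matters so much: the study sized for the big effect is far too small to detect the realistic one.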
OK so I have the basics... now I'm done, right?! NEVER!
Other things to consider! Type of Hypothesis Superiority, Non-inferiority, equivalence, estimation Type of Endpoint!! Continuous, Binary, Ordinal, Time to Event Type of test 1-sided or 2-sided? Compliance Concerns Drop out rates, loss to follow up, and/or noncompliance (switching treatment arms?) Any interim looks for futility and/or superiority? Number of groups? (1, 2, 3 or more) If more than 1 how will they be allocated? 1:1? 2:1?
Sample Size depends on hypothesis! Hypothesis Testing Approach Superiority Is treatment A BETTER than B? Non-inferiority Is treatment A NOT WORSE than treatment B? Equivalence Is A EQUAL to B (within a reasonable range)? Confidence Interval Approach Estimation Just want to estimate our primary endpoint with a certain amount of precision
Type of Endpoints Continuous and Independent Continuous and NOT Independent Change in blood pressure from pre to post in same individual, repeated measurements Binary or Ordinal Outcomes Proportion of patients that have a successful outcome in your trial (must be clearly defined pre-study!!!) Time to Event Overall survival or progression; need to include accrual and follow-up times in sample size as well Each one uses a different formula to calculate the appropriate sample size!!! It's important to understand how your data will look!
Treatment Allocation Generally 1:1 (yields best power given total N) Practical: more subjects allocated to a new procedure may foster enthusiasm for the study among investigators or potential subjects, or you may want to gain more information on safety of the treatment arm (you already have this information for the control arm)
Ethical Treatment Allocation Possibly to minimize exposure by allocating fewer subjects to the riskier treatment arm, or to maximize exposure by allocating more subjects to a potentially better treatment arm. Some argue that this affects the clinical equipoise of the trial.
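The power cost of moving away from 1:1 can be made concrete. Below is a sketch using the standard normal-approximation formula for comparing two means with a k:1 allocation ratio; all the specific numbers (α = 0.05, power = 0.80, Δ = 5, σ = 10) are illustrative assumptions:

```python
from math import ceil
from statistics import NormalDist

def n_arms(alpha, power, delta, sigma, k=1):
    """Per-arm sample sizes (n1, n2) for a two-sided comparison of
    means with k:1 allocation, where k = n2 / n1."""
    z = NormalDist().inv_cdf
    core = (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    n1 = (1 + 1 / k) * core   # unequal allocation inflates the smaller arm
    return ceil(n1), ceil(k * n1)

print(n_arms(0.05, 0.80, 5, 10, k=1))  # 1:1 -> (63, 63), total 126
print(n_arms(0.05, 0.80, 5, 10, k=2))  # 2:1 -> (48, 95), total 143
```

Same α, power, Δ, and σ, but the 2:1 design needs a larger total N, which is the sense in which 1:1 "yields best power given total N."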
Missing Outcome Data Subject became lost to follow-up Subject withdrew consent Subject died No other reason should exist for missing outcome data!
Protocol Violations Subject became lost-to-follow-up Subject withdrew consent Subject had not met eligibility criteria Subject/investigator did not comply with treatment regimen Crossover in treatment allocation What about not obtaining Informed Consent? NEED TO CONSIDER THIS IN SAMPLE SIZE CALCULATIONS!
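Dropout and noncompliance shrink the evaluable sample, so in practice the calculated n is inflated up front. A minimal sketch of the usual adjustment (the 15% dropout rate and n = 63 per arm are illustrative assumptions, not values from the slides):

```python
from math import ceil

def inflate_for_dropout(n, dropout_rate):
    """Inflate a per-arm sample size so that, after an expected
    fraction of subjects drop out, the evaluable count still meets n."""
    return ceil(n / (1 - dropout_rate))

# Need 63 evaluable subjects per arm; expect 15% dropout
print(inflate_for_dropout(63, 0.15))  # -> enroll 75 per arm
```

This simple division only corrects the head count; it does not fix any bias that informative dropout or crossover can introduce, which is why those issues also belong in the design discussion.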
One-sided versus two-sided tests When is a One-Sided Test Acceptable? It is completely inconceivable that the results can go in the opposite direction from that hypothesized Only truly concerned with outcomes in one tail Often see one-sided test at alpha=0.025 (usually alpha/2)
This is so complicated! Are there some easy ways to understand how sample sizes change?
When keeping all other things the same THINGS THAT MAKE SAMPLE SIZE INCREASE!!! Decreasing alpha Increasing power 2-sided vs. 1-sided test Decreasing MCID (keep in mind that Δ could involve changing standard deviations and/or means, proportions, median survival times, etc.) Decreasing follow-up and/or accrual times
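The directions listed above can be checked numerically. A sketch using the same normal-approximation formula for two means (the baseline values α = 0.05, power = 0.80, Δ = 5, σ = 10 are illustrative assumptions):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power, delta, sigma, sides=2):
    """Normal-approximation per-arm n for a two-sample comparison of
    means; sides=1 gives the one-sided version."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / sides) + z(power)) ** 2
                * sigma ** 2 / delta ** 2)

base = n_per_group(0.05, 0.80, 5, 10)
print(base)                                      # baseline: 63
print(n_per_group(0.01, 0.80, 5, 10))            # smaller alpha -> larger n
print(n_per_group(0.05, 0.90, 5, 10))            # more power -> larger n
print(n_per_group(0.05, 0.80, 5, 10, sides=1))   # 1-sided -> smaller n
print(n_per_group(0.05, 0.80, 2.5, 10))          # smaller MCID -> larger n
```

Each change moves n exactly the way the slide says, holding everything else fixed.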
Culprit of Underpowered Studies One common mistake in RCT designs is guaranteeing adequate power, not at or above the threshold of clinical significance, but at or above the desired or hoped-for effect size or one based on very optimistic, underpowered, pilot studies. (Kraemer and Kupfer, 2006)
Software Paid Sample Size Programs: nQuery, PASS, EaST, SAS Can also calculate sample sizes in R Simon's 2 stage: http://linus.nci.nih.gov/~brb/ PS: Vandy Free Downloadable Program: http://biostat.mc.vanderbilt.edu/wiki/main/powerSampleSize CRAB Calculators: http://www.crab.org/resources/statisticaltools.aspx
HOW CAN WE HELP YOU?!?! If you get nothing else from this lecture, sample sizes can be somewhat complex and there are a ton of considerations before we can get the correct one for your study! Come talk to us EARLY!!!!! We are actually here to make your lives a bit easier (no promises though!)
What to bring to a meeting with us! Collaborate with entire research team early in the development process!!! What we need from you: Your proposed research question EASY! Primary and secondary outcomes We'll help you make it specific and quantifiable Groups / factors of interest Clinically meaningful detectable difference and variability of observations Can be found from pilot studies and literature Drives sample size needed! We tend to work better when well fed and happy! Just kidding!
Come talk with us! We can take your objectives and help you with sample size calculations Also assist with study design, analysis plans, etc BERD Core Drs. Richard Kryscio, Heather Bush, Richard Charnigo, Emily Van Meter, Arnold Stromberg, Catherine Starnes Website: http://ccts.uky.edu/berd/default.aspx Email contacts: Catherine Starnes (Catherine.starnes@uky.edu) Or Elodie Elayi (elodie.elayi@uky.edu)