Sample size for clinical trials

Similar documents
Sample size calculations: basic principles and common pitfalls

SAMPLE SIZE AND POWER

Sanjay P. Zodpey Clinical Epidemiology Unit, Department of Preventive and Social Medicine, Government Medical College, Nagpur, Maharashtra, India.

2017 OPTIONS FOR INDIVIDUAL MEASURES: REGISTRY ONLY. MEASURE TYPE: Outcome

Estimation. Preliminary: the Normal distribution

An introduction to power and sample size estimation

ICSS Safety Results NOT for PUBLICATION. June 2009 ICSS ICSS ICSS ICSS. International Carotid Stenting Study: Main Inclusion Criteria

Sample Size and Power. Objectives. Take Away Message. Laura Lee Johnson, Ph.D. Statistician

2018 OPTIONS FOR INDIVIDUAL MEASURES: REGISTRY ONLY. MEASURE TYPE: Outcome

How to Choose Between Carotid Stenting and Carotid Endarterectomy for Stroke Prevention

Sample size calculation a quick guide. Ronán Conroy

Review Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball

NEED A SAMPLE SIZE? How to work with your friendly biostatistician!!!

Sheila Barron Statistics Outreach Center 2/8/2011

CLINICAL TIMELINE EVA-3S CREST ICSS SPACE SAPPHIRE

Types of data and how they can be analysed

TCAR: TransCarotid Artery Revascularization Angela A. Kokkosis, MD, RPVI, FACS

Sample Size and Power

DESCRIPTION: Percent of asymptomatic patients undergoing CEA who are discharged to home no later than post-operative day #2

City, University of London Institutional Repository

Chapter 19. Confidence Intervals for Proportions. Copyright 2010 Pearson Education, Inc.

2016 PQRS OPTIONS FOR INDIVIDUAL MEASURES: REGISTRY ONLY

Understanding Uncertainty in School League Tables*

Assessing Agreement Between Methods Of Clinical Measurement

Lecture Notes Module 2

Hypothesis testing: comparig means

10.1 Estimating with Confidence. Chapter 10 Introduction to Inference

Rationales for an accurate sample size evaluation

The most important recommendations from the 2017 ESVS/ESC guideline on the management of carotid artery disease

MM105 Biostatistics, Clinical Methodology & Epidemiology

SAMPLE SIZE IN CLINICAL RESEARCH, THE NUMBER WE NEED

Carotid Artery Stenosis

How to design sample size? Minato Nakazawa, Ph.D. Dept. Public Health

CAROTID STENTING A 2009 UPDATE. Hoang Duong, MD Director of Interventional Neuroradiology Memorial Regional Hospital

BULgarian Carotid Artery Stenting versus Surgery Study (BULCASSS): Randomized single center trial

Planning Sample Size for Randomized Evaluations.

This is a repository copy of Practical guide to sample size calculations: non-inferiority and equivalence trials.

Things you need to know about the Normal Distribution. How to use your statistical calculator to calculate The mean The SD of a set of data points.

Chapter 19. Confidence Intervals for Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Disclosures. State of the Art Management of Carotid Stenosis. NIH funding for clinical trials Consultant for Scientia Vascular and Medtronic

I. Medical Decision-Making in Situations that Offer Multiple Alternatives

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

Analysis of Variance (ANOVA)

Comparisons within randomised groups

Statistical techniques to evaluate the agreement degree of medicine measurements

For the ICSS Investigators. 7 th Munich Vascular Conference Munich, 7 December 2017

This is a repository copy of Practical guide to sample size calculations: superiority trials.

Update on the only remaining Carotid Multicenter Randomised International Trial in the World:ACST-2

UNIT 1 Hypothesis testing and estimation

Data, frequencies, and distributions. Martin Bland. Types of data. Types of data. Clinical Biostatistics

Articles. Funding Medical Research Council, the Stroke Association, Sanofi-Synthélabo, European Union.

Carotid Endarterectomy vs. Carotid artery Stenting (Surgeon Perspective)

baseline comparisons in RCTs

Current Status and Perspectives of ACST-2, CREST-2, ECST-2 and ACTRIS. Richard Bulbulia Co-Principal Investigator ACST-2 University of Oxford

IAPT: Regression. Regression analyses

Bayesian and Frequentist Approaches

Special Topic Section

Methods for Determining Random Sample Size

03/30/2016 DISCLOSURES TO OPERATE OR NOT THAT IS THE QUESTION CAROTID INTERVENTION IS INDICATED FOR ASYMPTOMATIC CAROTID OCCLUSIVE DISEASE

Carotid Artery Stenting Versus

Determining the size of a vaccine trial

Sample Size Considerations. Todd Alonzo, PhD

Comparison of blood pressure measured at the arm, ankle and calf

Cross-over trials. Martin Bland. Cross-over trials. Cross-over trials. Professor of Health Statistics University of York

(a) 50% of the shows have a rating greater than: impossible to tell

Clincial Biostatistics. Regression

Surgical Treatment of Carotid Disease

Strategies for handling missing data in randomised trials

National Emphysema Treatment Trial (NETT) Consent for Randomization to Treatment

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS

The Pretest! Pretest! Pretest! Assignment (Example 2)

Supplementary appendix

New concepts for filter protection during CAS: double filtration. Alberto Cremonesi MD, FESC

Real-world data in pragmatic trials

USING STATCRUNCH TO CONSTRUCT CONFIDENCE INTERVALS and CALCULATE SAMPLE SIZE

Role of evidence from observational studies in the process of health care decision making

Further data analysis topics

Sampling. Ivana Kolčić, PhD

15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA

Carotid Artery Stenting

Statistical Power Sampling Design and sample Size Determination

Adaptive designs: When are they sensible?

e.com/watch?v=hz1f yhvojr4 e.com/watch?v=kmy xd6qeass

Suppose we tried to figure out the weights of everyone on campus. How could we do this? Weigh everyone. Is this practical? Possible? Accurate?

Chapter 8 Estimating with Confidence

The RCSI Sample size handbook

ICCA Carotid Angioplasty and Other Cerebrovascular Interventions. Preliminary Scientific Program as of September 2011

W e have previously described the disease impact

Vivek R. Deshmukh, MD Director, Cerebrovascular and Endovascular Neurosurgery Chairman, Department of Neurosurgery Providence Brain and Spine

Clinical Considerations of High Intensity Interval Training (HIIT)

Final Exam PS 217, Fall 2010

The Leeds Reliable Change Indicator

Carotid endarterectomy versus carotid angioplasty for stroke prevention: a systematic review and meta-analysis

Revised Cochrane risk of bias tool for randomized trials (RoB 2.0) Additional considerations for cross-over trials

Statistical analysis plan - The Oslo Study of Clonidine in Elderly Patients with Delirium; LUCID

10/21/2014. Considerations in the Selection of Research Participants. Reasons to think about participant selection

Welcome to our e-course on research methods for dietitians

Stat Wk 9: Hypothesis Tests and Analysis

BIOSTATISTICAL METHODS

Transcription:

British Standards Institution Study Day Sample size for clinical trials Martin Bland Prof. of Health Statistics University of York http://martinbland.co.uk/ Outcome variables for trials An outcome variable is one which we hope to change, predict or estimate in a trial. Examples: Systolic blood pressure in a hypertension trial Cæsarean section rate in an obstetric trial Survival time in a cancer trial Number of outcome variables Should I have few or many? Many outcome variables Cover all possibilities Less likely to miss something Risk of false positives Few outcome variables Easier and quick to check and analyse Cheaper Easier for research subjects Avoid multiple testing 1

Primary and secondary outcome variables Get round the problem by having one outcome variable on which the main conclusion stands or falls, the primary outcome variable. If we do not find an effect for this variable, the study has a negative result. Usually several secondary outcome variables, to answer secondary questions. The primary outcome variable must relate to the main aim of the study. Choose one and stick to it. How large a sample should I take? A significance test for comparing two means is more likely to detect a large difference between two populations than a small one. The probability that a test will produce a significant difference at a given significance level is called the power of the test. The power of a test is related to: the postulated difference in the population the standard error of the sample difference (which depends on the sample size) the significance level, usually. Relationship between: power of the test, P postulated difference in the population, standard error of the sample difference, SE(d) significance level, α If we know three of these we can calculate the fourth. Bland M. (2000) An Introduction to Medical Statistics. Oxford University Press. 2

Relationship between: power of the test, P postulated difference in the population, standard error of the sample difference, SE(d) significance level, α We choose SE(d) depends on sample size and variability depends on power and significance level only If we know three of these we can calculate the fourth. SE(d) depends on the particular sample size problem. P = power of the test, α = significance level. Values of f(α,p) for different P and α α P 0.05 0.01 ------------------------------- 0.50 3.8 6.6 0.70 6.2 9.6 0.80 7.9 11.7 0.90 10.5 14.9 0.95 15.2 20.4 0.99 18.4 24.0 Example: trial to reduce blood pressure. Decide a clinically important difference would be 10 mm Hg. Where does this come from? Clinical judgement as to what would be important focus group of clinicians. What the treatment might achieve pilot studies, other trials of similar treatments. Back calculation from what is feasible frowned on by referees. 3

Relationship between: power of the test, P postulated difference in the population, standard error of the sample difference, SE(d) significance level, α We choose SE(d) depends on sample size and variability depends on power and significance level only If we know three of these we can calculate the fourth. Comparison of two means Compare the means of two samples, sample sizes and, from populations with means and, with the variance of the measurements being. We have and so the equation becomes: Comparison of two means For equal sized groups, n 1 = n 2 = n, the equation becomes: 4

Example: trial of an intervention for depression in primary care Primary outcome: PHQ9 depression score after 4 months. PHQ9 scale from 0 to 27, high score = depressed. Decide a clinically important difference would be 2 points. Pilot study: standard deviation of PHQ9 after treatment = 7 Choose power P = 0.90 = 90%, α = 0.05 =5%. Choose power P = 0.90 = 90%, α = 0.05 = 5%: f(α,p) -------------------------------- α P 0.05 0.01 ------------------------------- 0.50 3.8 6.6 0.70 6.2 9.6 0.80 7.9 11.7 0.90 10.5 14.9 0.95 15.2 20.4 0.99 18.4 24.0 Detect difference μ 1 μ 2 = 2 points on PHQ9. Pilot study: σ = 7. P = 0.90 = 90%, α = 0.05 =5%, f(α,p) = 10.5 Hence we need 258 patients in each group. 5

That s the hard way. We can use: Software, Graphics, Tables, nquery Advisor Altman s nomogram Machin, et al. (1998) Statistical Tables for the Design of Clinical Studies, Second Edition That s the hard way. We can use: Software, Graphics, Tables, nquery Advisor Altman s nomogram Machin, et al. (1998) Statistical Tables for the Design of Clinical Studies, Second Edition nquery Advisor, http://www.statistical-solutions-software.com/ (commercial) That s the hard way. We can use: Software, Graphics, Tables, nquery Advisor Altman s nomogram Machin, et al. (1998) Statistical Tables for the Design of Clinical Studies, Second Edition Altman, D.G. (1991) Practical Statistics for Medical Research. Chapman and Hall, London. (nomogram) 6

That s the hard way. We can use: Software, Graphics, Tables, nquery Advisor Altman s nomogram Machin, et al. (1998) Statistical Tables for the Design of Clinical Studies, Second Edition Machin, D., Campbell, M.J., Fayers, P., Pinol, A. (1998) Statistical Tables for the Design of Clinical Studies, Second Edition. Blackwell, Oxford. Software A computer program, PS. PS: Power and Sample Size is a free program by William Dupont and Walton Plummer, Jr. There are many available on the World Wide Web. PS, http://biostat.mc.vanderbilt.edu/wiki/main/powersamplesize (free Windows program) Other sample size considerations: Eligible patients disappear like snow in August. Eligible patients may refuse. Clinicians may forget or refuse to enrol eligible patients. Patients may drop out. 7

Other sample size considerations: Eligible patients disappear like snow in August. Eligible patients may refuse. Clinicians may forget or refuse to enrol eligible patients. Patients may drop out. Allow for this by increasing the sample size and making sure that the new sample size is much smaller than the predicted number of eligible patients. Comparing two proportions: Proportions p 1 and p 2. SE(d) depends on p 1 and p 2. Several different variations on this formula. Different books, tables, or software may give slightly different results. Two equal groups: Example: Cæsarean section. P = 0.90, α = 0.05, so f(α,p) = 10.5. From clinical records we observe 24%. A reduction to 20% would be of clinical interest. We have p 1 = 0.24, p 2 = 0.20. 8

Example: Cæsarean section. P = 0.90, α = 0.05, p 1 = 0.24, p 2 = 0.20. So n = 2,247 in each group. Detecting small differences between proportions requires a very large sample size. Using PS... Expressing differences between two proportions p 1 = 24%, p 2 = 20%, so difference = 4. Are we looking for a reduction of 4% in Cæsarean sections? NO. A reduction of 4% would be 4% of 24% = 4 24/100 = 0.96, i.e. from 24% down to 23.04%. The reduction from 24% to 20% is 4 percentage points, NOT 4%. Power may be increased by adjustment for baseline and prognostic variables. Power may be reduced by cluster randomisation. If in doubt, consult a statistician. 9

Allowing for adjustment Power may be increased by adjustment for baseline and prognostic variables. We need to know the reduction in the standard deviation produced by the adjustment. Researchers often simply say: Power will be increased by adjustment for.... so that things will actually be be better than they estimate, but they cannot say by how much. Allowing for adjustment Proportion of variation explained by regression = r 2. Standard deviation after regression is σ (1 r 2 ). Example: We want to do a trial of a therapy programme for the management of depression. Measure depression using the PHQ9 scale, 0 to 27, high score = depression. From an existing trial we know that people identified with depression in primary care and given treatment as usual have baseline PHQ9 score with mean = 18 and SD = 5. After four months they had mean 13, SD = 7. We want to detect a difference in mean PHQ9 = 2 points. Allowing for adjustment Standard deviation after regression is σ (1 r 2 ). Example: We want to design a trial to detect a difference in mean PHQ9 = 2 points. Power = 0.90, significance level = 0.05, difference = 2, SD = 7: n = 258 per group. We also know r = 0.42. Standard deviation after regression = σ (1 r 2 ) = 7 (1 0.42 2 ) = 6.35. Power = 0.90, significance level = 0.05, difference = 2, SD = 7: n = 213 per group. 10

Allowing for adjustment If we have a good idea of the reduction in the variability that reduction will produce, we can use this to reduce the required sample size. The effect is not usually very great. For example, to halve the required sample size, we must have (1 r 2 ) = ½ r 2 = ¾ r = 0.87. 0.87 is a pretty big correlation coefficient. Confidence intervals Movement to present results of trials in the form of confidence intervals rather than P values. Motivated by the difficulties of interpreting significance tests, particularly when the result was not significant. Major journals changed their instructions to authors to say that confidence intervals would be the preferred or even required method of presentation. Endorsed by the wide acceptance of the Consort standard for the presentation of clinical trials. Gardner MJ and Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J 1986; 292: 746-50. Confidence intervals We ask researchers to design studies the results of which will be presented as confidence intervals, rather than significance tests. We should base our sample size calculations on confidence intervals, rather than significance tests. How do we do this? We need a formula for the confidence interval for the treatment difference in terms of the expected parameters and sample size. We must decide how precisely we want to estimate the treatment difference. Bland JM. (2009) The tyranny of power: is there a better way to calculate sample size? BMJ 2009; 339: b3985. 11

Confidence intervals For a large study, the 95% confidence interval will be 1.98 standard errors on either side of the observed difference. For a trial with equal sized samples, the 95% confidence interval for the difference between two means will be ±1.96σ (2/n) and for two proportions it will be ±1.96 (p 1 (1 p 1 )/n + p 2 (1 p 2 )/n) Confidence intervals For example, the International Carotid Stenting Study (ICSS) was designed to compare angioplasty and stenting with surgical vein transplantation for stenosis of carotid arteries, to reduce the risk of stroke. We did not anticipate that angioplasty would be superior to surgery in risk reduction, but that it would be similar in effect. Featherstone RL, Brown MM, Coward LJ. International Carotid Stenting Study: Protocol for a randomised clinical trial comparing carotid stenting with endarterectomy in symptomatic carotid artery stenosis. Cerebrovasc Dis 2004; 18: 69-74. Confidence intervals The sample size calculations for ICSS were based on the earlier CAVATAS study, which had the 3 year rate for death or ipsilateral stroke lasting more than 7 days = 14%. This was an equivalence trial, no difference was anticipated, p 1 = p 2 = 0.14. CAVATAS investigators. Endovascular versus surgical treatment in patients with carotid stenosis in the Carotid and Vertebral Artery TransluminalAngioplasty study (CAVATAS): a randomised trial. Lancet 2001; 357: 1729-37. 12

Confidence intervals For two proportions, width of the 95% confidence interval for the difference = ±1.96 (p 1 (1 p 1 )/n + p 2 (1 p 2 )/n) If we put p = 0.14, we can calculate this for different sample sizes: Sample size Width of 95% confidence interval 250 ±0.061 500 ±0.043 750 ±0.035 1000 ±0.030 We chose 750 in each group. Confidence intervals For two means, width of the 95% confidence interval for the difference = ±1.96σ (2/n). If we put n = 740, we can calculate this for the chosen sample size: ±1.96σ (2/750) = ±0.10σ. This was thought to be ample for cost data and any other continuous variables. 13