
Large simple randomized trials and therapeutic decisions

Adam La Caze

November 3, 2011

Contents

1 Background
2 Large simple randomized trials
  2.1 The argument for large simple trials
  2.2 The assumptions regarding Type S errors are unmotivated
  2.3 Accurate estimation of effects is important for therapeutic decisions
3 Alternative approaches
  3.1 Rothwell's rules
  3.2 Model-based approaches

1 Background

Large simple randomized trials

Salim Yusuf, Rory Collins and Richard Peto (1984) argue that large and simple randomized trials are the best way to determine small to modest benefits in important endpoints such as death.

The simple trials:
- Simple inclusion and exclusion criteria
- Simple intervention

- Focus on a single important endpoint, e.g. mortality
- Collect minimal prognostic data

Yusuf et al. argue that such trials are reliable and clinically relevant. Not all randomized trials are simple randomized trials, but there are numerous examples of large and simple trials; Yusuf et al. have conducted (and continue to conduct) many. (See, for instance, the Population Health Research Institute and the POISE study (2008).)

I focus on the arguments given for large simple randomized trials and the limitations of these arguments, especially in terms of therapeutic decisions. Much of my focus is negative, but I will sketch an alternative approach (after all, whatever the limitations of large simple randomized trials, we would be stuck with them if they provided the only feasible approach). I don't suggest that large simple trials are not clinically relevant; on the contrary, they can answer an important therapeutic question (a population question) quite well. Rather, I suggest (i) that large simple trials don't do as well on certain (individual) therapeutic decisions as Yusuf et al. argue, and (ii), somewhat more speculatively, that alternative methods can do better on these individual therapeutic decisions.

Examples

Example 1 (ISIS-2). ISIS-2 (1988) randomized 17,187 patients suffering acute myocardial infarction in 16 countries to treatment with either streptokinase, aspirin, streptokinase and aspirin, or placebo. ISIS-2 resolved important clinical controversies about the use of aspirin and thrombolytics in patients suffering an acute myocardial infarction. It changed practice and, when compared to what was then standard practice, it has saved lives. Peto et al. (1995, 25) quote a survey showing that routine use of aspirin in acute coronary care went from under 10% in 1987 to over 90% in 1989.

Example 2 (POISE). POISE (2008) randomized 8351 patients undergoing non-cardiac surgery to peri-operative metoprolol or placebo.
- Perioperative metoprolol reduced the primary endpoint (cardiovascular death, non-fatal myocardial infarction, non-fatal cardiac arrest): 5.8% c.f. 6.9%, p=0.0399
- But it increased total mortality: 3.1% c.f. 2.3%, p=0.0317; and stroke: 1.0% c.f. 0.5%, p=0.0053

Example 3 (POISE-2). POISE-2 is currently enrolling patients undergoing non-cardiac surgery to test the effects of aspirin (or placebo) and clonidine (or placebo) on mortality and non-fatal myocardial infarction.
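The trade-off in the POISE result (Example 2) is easier to see as absolute event counts. A back-of-envelope conversion of the reported percentages into events per 1000 patients treated (my reading, not the trial's own analysis) looks like this:

```python
def events_per_1000(rate_treat, rate_control):
    """Difference in events per 1000 patients treated with metoprolol
    (negative = fewer events on treatment)."""
    return round((rate_treat - rate_control) * 1000, 1)

# Percentages as reported in POISE (2008)
primary = events_per_1000(0.058, 0.069)  # composite CV endpoint
deaths = events_per_1000(0.031, 0.023)   # total mortality
strokes = events_per_1000(0.010, 0.005)  # stroke

# Per 1000 patients treated: roughly 11 fewer composite events, but
# roughly 8 extra deaths and 5 extra strokes. The overall direction of
# a single endpoint cannot settle the therapeutic decision.
```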

The function of randomized trials

1. Regulatory approval for the marketing of medicines: Are the benefits of the drug likely to outweigh the harms in a population of patients?
2. Inform therapeutic decisions: Are the benefits of the drug likely to outweigh the harms in an individual patient?

The subgroup problem

In the hope of individualising therapy, matching the characteristics of the individual to a relevant group of patients within the trial is tempting. But the statistical properties of subgroup analyses are often (very) poor.

Brookes et al. (2001) conducted a simulation study to quantify the risks of false-positive and false-negative results in subgroup analyses:
- When there was no overall effect from treatment, 7-26% of trials showed at least one subgroup analysis with a statistically significant result.
- When there was an overall effect from treatment, only one of the two subgroup analyses gave statistically significant results in 41-66% of trials.

There are statistical approaches to improve inferences about subgroup effects, but none resolve the problem: formal tests of subgroup interactions reduce the false positives, but most trials are underpowered to assess subgroup interactions.

Responses to the subgroup problem:

Alvan Feinstein (1984, p. 421) made the following comment in discussion of Yusuf et al. (1984):

The main problem, it seems to me, is again the question of whether we are evaluating two treatments or are we evaluating treatments for the care of patients? The different kinds of patient that are being lumped together into these heterogeneous pastiches under the name of the same disease or under the name of the same therapeutic agents may produce results with excellent statistical ability to compare two treatments, but will be relatively worthless when people try to use the consequences in practice. What is required is a degree of humility in the face of an issue for which there is no statistical or clinical solution. [...]

The development of randomised clinical trials since Mackenzie's time has provided a much sounder basis for making decisions about abstract patients and, if representative samples of patients are included in the trials, for deciding if the overall effect on population health of a treatment is beneficial or harmful. Randomised trials have not, however, answered the question of which individuals actually benefit from medical interventions. This, surely, is the key issue in clinical research for the next millennium. (Smith and Egger, 1998)

Rothwell (2007c, 142) comments on the logic of the argument that what matters is overall benefits and harms:

The need for reliable data on risks and benefits in subgroups and individuals is greatest for potentially harmful interventions, such as warfarin or carotid endarterectomy, which are of overall benefit but which kill or disable a significant proportion of patients. Yet, evidence-based guidelines usually recommend these treatments in all cases similar to those in the relevant RCTs. In considering this approach, it is useful to draw an analogy with the criminal justice system. Suppose that research showed that individuals charged by the police with certain crimes were usually guilty. Few would argue that they should therefore be sentenced without trial. Automatic sentencing would, on average, do more good than harm, with most criminals correctly convicted, but any avoidable miscarriages of justice are widely regarded as unacceptable. In contrast, relatively high rates of treatment-related death or disability ("miscarriages of treatment") are tolerated by the medical scientific community precisely on the basis that, on average, treatment will do more good than harm.
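The fragility of subgroup analyses that Brookes et al. (2001) quantify can be reproduced with a small Monte Carlo sketch. This is not their design: it simulates a null trial (no treatment effect anywhere) with one binary subgroup variable and tests the treatment effect separately within each subgroup, counting how often at least one within-subgroup test comes out "significant" by chance:

```python
import math
import random

def significant_difference(a, b, z_crit=1.96):
    """Two-sample z-test on means; True if |z| exceeds the 5% threshold."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return abs(ma - mb) / math.sqrt(va / na + vb / nb) > z_crit

random.seed(1)
n_sims, hits = 2000, 0
for _ in range(n_sims):
    # Null trial: outcome unrelated to treatment arm; random binary subgroup
    data = [(random.gauss(0, 1), random.random() < 0.5, arm)
            for arm in (0, 1) for _ in range(200)]
    found = False
    for sub in (False, True):
        treated = [y for y, s, arm in data if s == sub and arm == 1]
        control = [y for y, s, arm in data if s == sub and arm == 0]
        found = found or significant_difference(treated, control)
    hits += found

false_positive_rate = hits / n_sims  # roughly 1 - 0.95**2, i.e. about 10%
```

Even with no effect anywhere, roughly one trial in ten shows a "significant" subgroup result; with more subgroup variables the rate climbs toward the upper end of Brookes et al.'s 7-26% range.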
Model-based drug development

Model-based drug development uses mathematical models to account for and predict variation in pharmacological, pharmacokinetic and pharmacodynamic relationships over time:

  Dose --(PK)--> Exposure; Exposure --(PD)--> Biomarker response

Sheiner (1997) made the argument that a key source of inefficiency in drug development was the insufficient use of the right kind of methodological tools in the learning phases of drug development. Additional relationships that are modelled include: Target --> Drug activity, and Biomarker response --> Clinical outcome.
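The Dose --(PK)--> Exposure --(PD)--> Biomarker response chain can be sketched with the simplest standard components: a one-compartment oral PK model feeding an Emax pharmacodynamic model. All parameter values below are hypothetical, chosen only to make the chain concrete:

```python
import math

def concentration(t, dose=100.0, f=0.9, ka=1.2, ke=0.2, v=50.0):
    """One-compartment oral PK model: plasma concentration at time t (hours)
    after a single dose at t=0 (hypothetical parameters)."""
    return (f * dose * ka) / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def biomarker_response(c, e_max=40.0, ec50=1.0):
    """Emax pharmacodynamic model linking exposure to biomarker response."""
    return e_max * c / (ec50 + c)

# Chain the two models over the first 12 hours: dose -> exposure -> response
times = [0.5 * i for i in range(25)]
exposure = [concentration(t) for t in times]
response = [biomarker_response(c) for c in exposure]
```

The point of the chained models is that a dosing decision can be propagated through to a predicted biomarker response, rather than reduced to a single accept/reject hypothesis test.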

2 Large simple randomized trials

2.1 The argument for large simple trials

Outline of the argument in Yusuf et al. (1984):

1. Effects of an intervention on an important endpoint (e.g. death) are likely to be modest
2. Well-conducted large randomized trials are more reliable in testing the modest effects of an intervention than observational studies
3. To be feasible, large trials have to be simple
4. Large simple trials are clinically relevant

Simple = broad and practical enrolment; little prognostic data collected; focus on a single endpoint
Relevant = relevance of the overall effect c.f. subgroups

Well-conducted randomised trials have to be large to reliably test for modest effects:

This can be illustrated by a hypothetical trial that is actually quite inadequate... in which a 20 per cent reduction in mortality is supposed to be detected among 2000 patients (1000 treated and 1000 not).... Even if exactly this difference were observed, however, it would not be conventionally significant (P = 0.1).... (Yusuf et al., 1984, 412)

Hence, "reliable" is construed within the context of a frequentist hypothesis test.

Large simple trials are clinically relevant:

A key principle underlying the argument that clinical trials can be simple and yet provide medically relevant conclusions involves careful distinction between quantitative interactions and qualitative interactions. (Yusuf et al., 1984, 413)
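Yusuf et al.'s hypothetical trial quoted above can be checked directly. Assuming (for illustration; the baseline is not stated in the excerpt) a control mortality of 10%, a 20 per cent relative reduction means 80 vs 100 deaths per 1000 patients; a two-proportion z-test reproduces the non-significant result, and the same observed effect in a four-times-larger trial becomes clearly significant:

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for a two-proportion z-test (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # 2 * (1 - Phi(z))

p_small = two_proportion_p(100, 1000, 80, 1000)   # about 0.12: not significant
p_large = two_proportion_p(400, 4000, 320, 4000)  # same effect, 4x the patients
```

Exactly the same observed relative reduction crosses the conventional threshold only when the trial is large, which is the sense in which "reliable" requires "large" on the frequentist construal.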

Quantitative and qualitative interactions

- Quantitative interactions: different magnitude, same direction
- Qualitative interactions: different direction, i.e. benefit in one subgroup, harm in another

Andrew Gelman (see, for instance, Gelman and Tuerlinckx, 2000) calls the errors that can arise from undetected interactions Type M (magnitude) errors and Type S (sign) errors.

Unanticipated qualitative interactions (Type S errors):

...unanticipated qualitative interactions (whereby treatment is of substantial benefit among one recognizable category of patients in a trial and not among another) are probably extremely rare, even though in retrospective subgroup analysis they may seem extremely common. Our expectation is not that all qualitative interactions are unlikely, but merely that unanticipated qualitative interactions are unlikely... (Yusuf et al., 1984, 413)

Qualitative interactions are either (i) unanticipated and unlikely, or (ii) anticipated and incorporated into the specification of the trial.

Shorter Yusuf et al. 1984:

1. Conduct large simple trials because frequentist statistical approaches require them for identifying small to modest effect sizes
2. Individual therapeutic decisions can be based on the overall trial results because:
   (a) Frequentist statistical analyses of subgroup data are unreliable or infeasible
   (b) Unanticipated Type S errors are rare
   (c) Anticipated Type S errors are avoided in trial design
   (d) Type M errors are common but unimportant to decisions

Quotes representing this view:

The treatment that is appropriate for one patient may be inappropriate for another. Ideally, therefore, what is wanted is not only an answer to the question "Is this treatment helpful on average for a wide range of patients?", but also an answer to the question "For which recognisable categories of patient is this treatment helpful?" This ideal is, however, difficult to attain, for the direct use of clinical trial results in particular subgroups of patients is surprisingly unreliable. (Peto et al., 1995, p. 35)

There are two main remedies for this unavoidable conflict between the reliable subgroup-specific conclusions that doctors want and the unreliable findings that direct subgroup analyses can usually offer. But the extent to which these remedies are helpful in particular instances is one on which informed judgements differ. The first is to emphasise chiefly the overall results for particular outcomes as a guide (or at least a context for speculation) as to the qualitative results in various specific subgroups of patients, and to give proportionally less weight to the actual results in that subgroup than to extrapolation of the overall results. The second is to be influenced, in discussing the likely effects on mortality in specific subgroups, not only by the mortality in these subgroups, but also by the analyses of recurrence-free survival or some other surrogate outcome. (Peto et al., 1995, p. 35)

2.2 Problems with the argument for clinical relevance

Clinically important differences in response are the norm. Rothwell (2007c) provides numerous examples where clinically important differences in treatment effect arise:
- Heterogeneity related to risk (risk of treatment and risk without treatment)
- Heterogeneity related to pathophysiology
- Heterogeneity related to stage of disease and timing of intervention
- Heterogeneity related to comorbidity

Not all of these sources of heterogeneity are known to occur in specific cases and so cannot be anticipated. And even when a difference in treatment effect can be anticipated, it can't always be incorporated when specifying the trial. Examples:
- Risk: difference in absolute risk without treatment (hypertension and stroke)
- Pathophysiology: genetic variation (response to treatment); aspirin in cardiovascular disease (CRP)

- Timing: thrombolytics; lipid agents with differing LDL levels
- Comorbidity: thiazides and beta-blockers in hypertensive patients with diabetes

Type S errors are to be expected

Progress in clinical science continuously identifies groups of patients who are particularly benefited or harmed by a therapy.

Example 4 (ISIS-2). Since ISIS-2, much of the progress in the use of thrombolytic therapy has been in identifying groups of patients who respond differently based on their ECG (e.g. ST-segment depression) and area of infarct.

Therapeutic decisions depend on effect sizes

Therapeutic decisions in individuals require weighing likely benefits against potential risks: it is not just the overall direction of the effect, estimating the magnitude is critical for decisions. Effect sizes vary in different patient subgroups, hence the risk of Type M errors is high.

Example 5 (Antihypertensives). Table 1 compares two antihypertensive trials (MRC and STOP). MRC was conducted on relatively young (otherwise healthy) patients. STOP was conducted in elderly patients with multiple comorbidities.

Table 1: Rothwell (2007c, 141)

  Trial   RR (95% CI)         ARR (95% CI)
  MRC     0.55 (0.40-0.75)    0.12 (0.06-1.17)
  STOP    0.53 (0.33-0.86)    1.45 (0.45-2.45)
          Heterogeneity p=0.90    Heterogeneity p=0.009

Despite effects in the same direction, estimates of effect size separate two treatments that provide benefits overall. And estimates are critical when weighing up the risks and benefits. Yusuf et al. agree that quantitative interactions in subgroups are common (i.e. the possibility of Type M errors), but don't seem to recognise the importance of this point for therapeutic decision makers. The importance of accurately estimating treatment effects undermines the clinical relevance of large simple trials.
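The pattern in Table 1 (similar relative risks, very different absolute risk reductions) can be made concrete with a toy calculation. The baseline risks below are hypothetical stand-ins for a low-risk (MRC-like) and a high-risk (STOP-like) population:

```python
def absolute_effects(control_risk, rr):
    """Return the absolute risk reduction (ARR) and number needed to
    treat (NNT) implied by a control-group risk and a relative risk."""
    arr = control_risk * (1 - rr)
    return arr, 1 / arr

# The same relative risk (0.55) applied to two hypothetical baseline risks
arr_low, nnt_low = absolute_effects(0.02, 0.55)    # low-risk population
arr_high, nnt_high = absolute_effects(0.20, 0.55)  # high-risk population

# Identical RR, but the absolute benefit differs tenfold; against a fixed
# rate of treatment-related harm, the two decisions can come out differently.
```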

Errors based on subgroup interactions are to be expected

Errors are particularly difficult to identify when:
- The intervention is new
- Trials recruit heterogeneous patient populations (i.e. large simple trials)
- The endpoint has multiple causes, e.g. death: variation in effect sizes on non-fatal MI or bleeding rates may create Type S errors on mortality

Weaker Yusuf et al. 1984:

1. Conduct large simple trials because frequentist statistical approaches require them for identifying small to modest effect sizes
2. Individual therapeutic decisions can be based on the overall trial results when the following assumptions hold:
   (a) An unanticipated Type S error is considered unlikely
   (b) All anticipated Type S errors are avoided in trial design
   (c) Type M errors of a clinically important magnitude are considered unlikely

3 Alternative approaches

3.1 Rothwell's rules

Rothwell's rules (2007a)

Trial design
- Subgroup analyses should be defined a priori and limited to a small number focussing on the primary endpoint. Direction and magnitude should be predicted (and reported).
- Obtain expert clinical input in design
- Test with a formal subgroup interaction test

Analysis and reporting
- Adjust for multiple subgroup analyses
- Report as absolute and relative risks

- Check for comparability of prognostic factors in subgroups

Interpretation
- Ignore the statistical significance of the effect of treatment in individual subgroups
- Reproducibility is the best test for subgroup-treatment effects
- The false-negative rate for formal subgroup interaction tests is high (due to lack of power)

Rules for subgroup analysis (Rothwell, 2007a, Panel 11.2). Rothwell provides rules for approaching subgroup analyses in trial design, analysis and reporting, and interpretation. Essentially, the rules provide best practice given the subgroup problem and the use of Neyman-Pearson hypothesis testing. The rules focus solely on anticipated subgroup interactions, and provide advice for setting up trials to better identify subgroups that respond differently.

3.2 Model-based approaches

Model-based drug development

Key criticisms of traditional drug development approaches (i.e. hypothesis testing in early drug development):
1. Inefficient use of information (both prior information and data collected during early studies)
2. Provides a dichotomous answer when what is needed is an understanding of the relationships between dose and exposure (pharmacokinetic models) and between exposure and response

Model-based drug development incorporates a range of approaches, including modelling, simulation, and adaptive designs.

Figure 1: Sheiner (1997, 276)

Model-based drug development:

1. Build a model based on understanding of the target phenomena and the available data. A range of different models are used. Models may be placed in a hierarchy and make predictions about populations, groups, individuals or observations. They include population parameters (clearance, volume of distribution), experimentally controllable variables (dose), and independent variables (time), and often incorporate stochastic variability (within-subject variability, measurement error).
2. Model criticism using the data collected: test assumptions; select from a family of models.
3. Conduct analyses.

Extending model-based approaches

Large-scale simple (confirmatory) trials are open to the same criticisms made of traditional early-stage drug trials: inefficient use of information, and dichotomised results c.f. an understanding of variation. Model-based approaches can make some therapeutic decisions tractable: what key prognostic factors influence outcomes? And by how much?
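The build-criticise-analyse loop can be sketched in a few lines: simulate noisy concentration-response data from a hypothetical Emax model, then recover the parameters by least squares (a crude grid search stands in for formal nonlinear regression). Everything here, parameter values included, is illustrative:

```python
import random

def emax_model(c, e_max, ec50):
    """Emax concentration-response model."""
    return e_max * c / (ec50 + c)

# 1. Build: hypothetical data-generating model (Emax=30, EC50=2)
#    with stochastic measurement error
random.seed(7)
concs = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
observed = [emax_model(c, 30.0, 2.0) + random.gauss(0, 1.0) for c in concs]

# 2. Criticise/estimate: choose the parameters minimising squared error
def sse(params):
    e_max, ec50 = params
    return sum((y - emax_model(c, e_max, ec50)) ** 2
               for c, y in zip(concs, observed))

candidates = [(e, 0.5 * k) for e in range(10, 51) for k in range(1, 21)]
best_emax, best_ec50 = min(candidates, key=sse)

# 3. Analyse: the recovered parameters should sit near the generating values,
#    and the fitted curve gives a graded exposure-response relationship
#    rather than a single yes/no answer.
```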

Example 6 (Holford and Nutt (2011)). Standard descriptive analyses are unable to identify disease-modifying effects of treatments for Parkinson's disease: it is difficult to separate disease progression, symptomatic effects and disease-modifying effects. Holford and Nutt (2011) use a modelling approach to better assess disease-modifying effects from three large Parkinson's disease trials:

  S(t) = S(0) + (alpha + beta * C_e(t)) * t + E_max * C_e(t) / (EC_50 + C_e(t))

where S(t) is the disease state at time t; S(0) is the disease state at time 0 (without treatment); alpha is the rate of disease progression; beta represents the disease-modifying effect; C_e(t) is the plasma concentration of the drug at time t; E_max is the maximum pharmacological effect; and EC_50 is the plasma concentration of the drug producing half the maximum drug effect.

Figure 2: Holford and Nutt (2011)
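The model separates components that descriptive analyses conflate: natural progression (alpha), a disease-modifying effect (beta, which changes the slope of progression) and a symptomatic effect (the Emax term, which only offsets the disease state). A direct coding of the equation, with hypothetical parameter values and a constant drug concentration for simplicity, makes the separation visible:

```python
def disease_state(t, conc, s0=20.0, alpha=1.0, beta=-0.3,
                  e_max=-5.0, ec50=1.0):
    """Holford-Nutt style disease-state model (hypothetical parameters).
    Higher S = worse disease; conc is the drug concentration, held
    constant here for simplicity."""
    return s0 + (alpha + beta * conc) * t + e_max * conc / (ec50 + conc)

untreated = disease_state(10, 0.0)
symptomatic_only = disease_state(10, 1.0, beta=0.0)  # offset, same slope
disease_modifying = disease_state(10, 1.0)           # offset AND flatter slope

# After 10 time units the purely symptomatic drug improves on no treatment
# only by the fixed Emax offset, while the disease-modifying drug has also
# accumulated a slope benefit -- the distinction the model is built to detect.
```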

Questions/criticisms for the model-based approach:
- What is the respective role of theory and data in specifying and evaluating the model?
- What role (if any) can model-based approaches play in confirming efficacy/effectiveness?
- Large simple randomized trials versus large randomized trials

References

Brookes, S., Whitely, E., Peters, T., Mulheran, P. A., Egger, M., and Davey Smith, G. (2001). Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Health Technology Assessment, 5(33):1-58.

Feinstein, A. R. (1984). Why do we need some large, simple randomized trials? Discussion. In Yusuf et al. (1984), pages 421-422.

Gelman, A. and Tuerlinckx, F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics, 15(3):373-390.

ISIS-2 Collaborative Group (1988). Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. The Lancet, 332(8607):349-360.

Holford, N. H. G. and Nutt, J. G. (2011). Interpreting the results of Parkinson's disease clinical trials: Time for a change. Movement Disorders, 26(4):569-577.

Peto, R., Collins, R., and Gray, R. (1995). Large-scale randomized evidence: Large, simple trials and overviews of trials. Journal of Clinical Epidemiology, 48(1):23-40.

POISE Study Group (2008). Effects of extended-release metoprolol succinate in patients undergoing non-cardiac surgery (POISE trial): a randomised controlled trial. The Lancet, 371(9627):1839-1847.

Rothwell, P. M. (2007a). Reliable estimation and interpretation of the effects of treatment in subgroups. In Rothwell (2007b).

Rothwell, P. M., editor (2007b). Treating Individuals: From randomised trials to personalised medicine. Elsevier, Philadelphia.

Rothwell, P. M. (2007c). When should we expect clinically important differences in response to treatment? In Rothwell (2007b).

Sheiner, L. B. (1997). Learning versus confirming in clinical drug development. Clinical Pharmacology and Therapeutics, 61(3):275-291.

Smith, G. D. and Egger, M. (1998). Incommunicable knowledge? Interpreting and applying the results of clinical trials and meta-analyses. Journal of Clinical Epidemiology, 51(4):289-295.

Yusuf, S., Collins, R., and Peto, R. (1984). Why do we need some large, simple randomized trials? Statistics in Medicine, 3:409-420.

4 Additional examples of large simple trials

Examples

Example 7 (POISE). POISE (2008) randomized 8351 patients undergoing non-cardiac surgery to peri-operative metoprolol or placebo.
- Perioperative metoprolol reduced the primary endpoint (cardiovascular death, non-fatal myocardial infarction, non-fatal cardiac arrest): 5.8% c.f. 6.9%, p=0.0399
- But it increased total mortality: 3.1% c.f. 2.3%, p=0.0317; and stroke: 1.0% c.f. 0.5%, p=0.0053

Example 8 (POISE-2). POISE-2 is currently enrolling patients undergoing non-cardiac surgery to test the effects of aspirin (or placebo) and clonidine (or placebo) on mortality and non-fatal myocardial infarction.

5 Quotes regarding the subgroup problem

Alvan Feinstein (1984, p. 421) made the following comment in discussion of Yusuf et al. (1984):

The main problem, it seems to me, is again the question of whether we are evaluating two treatments or are we evaluating treatments for the care of patients? The different kinds of patient that are being lumped together into these heterogeneous pastiches under the name of the same disease or under the name of the same therapeutic agents may produce results with excellent statistical ability to compare two treatments, but will be relatively worthless when people try to use the consequences in practice. What is required is a degree of humility in the face of an issue for which there is no statistical or clinical solution. [...]

The development of randomised clinical trials since Mackenzie's time has provided a much sounder basis for making decisions about abstract patients and, if representative samples of patients are included in the trials, for deciding if the overall effect on population health of a treatment is beneficial or harmful. Randomised trials have not, however, answered the question of which individuals actually benefit from medical interventions. This, surely, is the key issue in clinical research for the next millennium. (Smith and Egger, 1998)

Rothwell (2007c, 142) comments on the logic of the argument that what matters is overall benefits and harms:

The need for reliable data on risks and benefits in subgroups and individuals is greatest for potentially harmful interventions, such as warfarin or carotid endarterectomy, which are of overall benefit but which kill or disable a significant proportion of patients. Yet, evidence-based guidelines usually recommend these treatments in all cases similar to those in the relevant RCTs. In considering this approach, it is useful to draw an analogy with the criminal justice system... Suppose that research showed that individuals charged by the police with certain crimes were usually guilty. Few would argue that they should therefore be sentenced without trial. Automatic sentencing would, on average, do more good than harm, with most criminals correctly convicted, but any avoidable miscarriages of justice are widely regarded as unacceptable. In contrast, relatively high rates of treatment-related death or disability ("miscarriages of treatment") are tolerated by the medical scientific community precisely on the basis that, on average, treatment will do more good than harm. (Rothwell, 2007c, 142)