Viewpoints on Setting Clinical Trial Futility Criteria


Vivian H. Shih, AstraZeneca LP
Paul Gallo, Novartis Pharmaceuticals
BASS XXI, November 3, 2014

Reference
Based on: Gallo P, Mao L, Shih VH (2014). Alternative views on setting clinical trial futility criteria. Journal of Biopharmaceutical Statistics, 24(5):976-993.

Stopping Trials for Lack of Effect
Futility: based on interim results, a trial seems unlikely to achieve its objectives.
Specific motivations for allowing the possibility of early stopping are situation-dependent, but generally obvious:
- Time
- Cost
- Ethics
- Resource reallocation

Typical Efficacy Scheme
[Figure: efficacy boundary on the z-score scale across the 1st, 2nd, 3rd, and final analyses; overall level = 2.5%]

Impose a Futility Boundary
[Figure: the same scheme with a futility boundary added; effective level < 2.5%]

Level is Decreased
Level < 2.5%, because outcomes like this can occur: a trajectory that crosses the futility boundary at an interim look yet would have ended up significant at the final analysis. Of course, power decreases also.
[Figure: z-score path across the 1st, 2nd, 3rd, and final analyses illustrating such an outcome]

Terminology: Aggressiveness
[Figure: futility boundaries on the z-score scale across the 1st, 2nd, 3rd, and final analyses; higher boundaries labeled "more aggressive", lower boundaries "less aggressive; more cautious"]

Assumptions
- Non-binding futility boundary, i.e., we don't modify the success criteria to buy back lost α; consistent with an understanding that futility is a soft decision (guidelines, not rules).
- We'll compare schemes in terms of power loss (another option: increase the sample size to regain lost power).
- No early stopping for efficacy.
- Notation: Δ = hypothesized design effect; d = point estimate; I = information time; z_I = the corresponding test statistic.

Tools for Addressing Futility
- Conditional power (CP) calculations: usually condition on the original study alternative, sometimes on other quantities (e.g., the point estimate).
- Predictive probability (PP): usually with a non-informative prior.
- Beta-spending functions: describe cumulative Type II error across the interim and final looks.
- Others (B-value, stochastic curtailment, rejecting HA).

Which Approach to Use?
Discussions of the relative merits of the different approaches often seem to focus on philosophical grounds, e.g.:
- the assumptions seemingly being made
- the degree to which quantities might be interpreted as chances of success (are they really?)
What's the real issue? Emerson et al (2005): operating characteristics.

Consultation Examples
Two actual proposals / consultations for futility criteria:
1. With 20% of the data available, the conditional power (assuming the original Δ) must be at least 5%.
2. At ⅔ information, the conditional power computed assuming that the observed effect is the true effect must be at least 70%.
More on these later...

Possible Scenarios

Interim decision     Trial outcome / true state of nature
                     Success       Failure
Stop for futility    Incorrect     Correct
Continue             Correct       Incorrect

Generally, we'd like small chances of outcomes on the diagonal, but of course decreasing one increases the other...

Striking a Balance
- We can't control error rates nearly as well as we typically do for an entire study.
- {Stopping when we should} versus {continuing when we should} are always in conflict.
- We should aim to strike an appropriate balance while limiting the chance of wrong decisions.
- Proposal: usually, the worse transgression is stopping a trial which would have been successful.

Relationship Between Criteria
At a given time point, a futility rule expressed on any particular scale can be transformed to any other.
For example, in a 2.5% level, 90% power trial with a single look at I = 50%, say we set a criterion of PP = 20%. The same rule can be expressed as:
- CP(Δ) = 62%
- CP(d) = 12%
- Beta spent = 6.7%
Question: is the scale on which we express a futility criterion really that important?

Interrelationships
With z_{1-α} the final-analysis critical value, θ = z_{1-α} + z_{1-β} the drift corresponding to the design effect Δ, and z_I the test statistic at information time I, the scales are related by:

PP = Φ( (z_I − z_{1-α}·√I) / √(1−I) )
CP(Δ) = Φ( (z_I·√I + θ·(1−I) − z_{1-α}) / √(1−I) )
CP(d) = Φ( (z_I/√I − z_{1-α}) / √(1−I) )

so that, for example,

CP(d) = Φ( Φ⁻¹(PP) / √I )
CP(Δ) = Φ( Φ⁻¹(PP)/√I + (θ − z_I/√I)·√(1−I) )
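As an illustration, here is a minimal Python sketch (using scipy; the function names are ours, not from the paper) that implements the three scales above for a one-sided 2.5% level, 90% power design and reproduces the example above (PP = 20% at I = 0.5 corresponding to roughly CP(Δ) = 62% and CP(d) = 12%):

    from scipy.stats import norm

    ALPHA, POWER = 0.025, 0.90                  # one-sided level and design power
    Z_CRIT = norm.ppf(1 - ALPHA)                # final-analysis critical value (~1.96)
    THETA = Z_CRIT + norm.ppf(POWER)            # drift of the final z under the design effect Delta

    def cp_design(z, info):
        # Conditional power, assuming the originally hypothesized effect going forward
        return norm.cdf((z * info**0.5 + THETA * (1 - info) - Z_CRIT) / (1 - info)**0.5)

    def cp_observed(z, info):
        # Conditional power, assuming the currently observed trend continues
        return norm.cdf((z / info**0.5 - Z_CRIT) / (1 - info)**0.5)

    def pred_prob(z, info):
        # Predictive probability of final success under a non-informative prior
        return norm.cdf((z - Z_CRIT * info**0.5) / (1 - info)**0.5)

    info = 0.5
    z = Z_CRIT * info**0.5 + norm.ppf(0.20) * (1 - info)**0.5   # interim z with PP = 20%
    print(round(cp_design(z, info), 2),    # ~0.62
          round(cp_observed(z, info), 2),  # ~0.12
          round(pred_prob(z, info), 2))    # 0.20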

90% Power for Δ, I = 0.5
(Z-score, d/Δ, CP(Δ), CP(d), and PP are alternative scales for expressing futility rule behavior.)

Z-score       d/Δ    CP(Δ)   CP(d)   PP     Power loss   Stop under H0
No stopping   -      -       -       -      0            0
0             0      32%     <1%     3%     0.2%         50%
0.25          0.11   41%     1%      5%     0.6%         60%
0.50          0.22   51%     4%      11%    1.3%         69%
0.75          0.33   61%     10%     18%    2.7%         77%
1.00          0.44   70%     22%     29%    5.1%         84%
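The power-loss and stop-under-H0 columns can be computed from the joint normality of the interim and final z-statistics (correlation √I). Below is a rough sketch, again assuming the one-sided 2.5% level, 90% power design used in the table; futility_look_ocs is an illustrative name of our own, not an established function. It reproduces, for example, the z = 0.5 row (power loss ≈ 1.3%, stopping probability under H0 ≈ 69%):

    import numpy as np
    from scipy.stats import norm, multivariate_normal

    def futility_look_ocs(z_bound, info, alpha=0.025, power=0.90):
        # Operating characteristics of a single non-binding futility look:
        # (power loss under the design effect, probability of stopping under H0)
        z_crit = norm.ppf(1 - alpha)
        theta = z_crit + norm.ppf(power)              # drift of the final z under Delta
        corr = np.sqrt(info)                          # corr(interim z, final z)
        mean_ha = [theta * np.sqrt(info), theta]      # means under the design effect
        cov = [[1.0, corr], [corr, 1.0]]
        p_stop_ha = norm.cdf(z_bound - mean_ha[0])    # P(stop at interim | Delta)
        p_stop_and_fail = multivariate_normal.cdf([z_bound, z_crit], mean=mean_ha, cov=cov)
        power_loss = p_stop_ha - p_stop_and_fail      # stopped, but would have succeeded
        p_stop_h0 = norm.cdf(z_bound)                 # interim z is standard normal under H0
        return power_loss, p_stop_h0

    print(futility_look_ocs(0.5, 0.5))                # roughly (0.013, 0.69)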

Aggressiveness / Caution
{We need not focus only on H0, HA; other definitions of "weak effect", "likely success", etc. could be considered and evaluated.}
How much risk of stopping when we shouldn't are we willing to pay to buy a desired amount of chance of stopping when we should?
Incorporate into a loss function?

How Aggressive?
What are the dimensions of savings of interest? e.g., $, resources, time, patients, etc.?
What factors affect the trade-offs?
- fixed vs. variable costs
- prior belief: how much faith? / evidence from related trials
- ethics: unknown safety risks for the experimental treatment
- upside: blockbuster, or "me too"?

When to Evaluate Futility?
Again, a conflict: stopping earlier yields potentially greater savings, but less ability to distinguish between scenarios which should / should not justify continuing.
Futility behavior improves with information in two ways:
- added precision from more data
- less data still to come that can overturn a poor trend
Previous example criterion (z = 0.5): at I = ½ we saw that the power loss was 1.3%; at I = ¼ it is 9.2%, as checked in the sketch below.
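Using the illustrative futility_look_ocs helper sketched above, the timing effect can be checked directly (numbers are approximate):

    print(futility_look_ocs(0.5, 0.25))   # power loss ~9% for the same z = 0.5 boundary at I = 1/4
    print(futility_look_ocs(0.5, 0.50))   # power loss ~1.3% at I = 1/2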

Multiple Futility Looks
Why not (i.e., in long-term studies)?
There are practical limitations (on both ends) to when looks should take place: too early or too late, there is no point.
The existence of a later look might impact the choice of criteria at a prior look, because a decision to continue does not commit to trial completion, but only to proceeding until a later point where the data are more mature.

Quantifying the Trade-offs
How to extend to multiple looks?
The cost of incorrect stopping: how about power loss across the whole scheme? Of course, there are different ways to achieve this; perhaps equal power loss at each analysis?
The benefit of correct stopping: ASN, the average sample size under H0.

Multiple-look Considerations
Ideally, we could describe a scheme simply. Now the scale matters: equal criteria across looks on one scale could be very unequal on another scale.
Example: say that at I = ½ we judge CP = 50% to be a sensible criterion. What if we also used the same rule at I = ¼ and ¾?
PP across the 3 looks: 1.3%, 10.0%, 23.0% (see the sketch below).
But is there any reason to expect that the same CP threshold behaves well at the other timepoints? Hint: it doesn't...
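This can be checked with the conversion functions sketched earlier (same assumed 2.5%/90% design): invert CP(Δ) = 50% for the interim z at each look and re-express it on the PP scale.

    for info in (0.25, 0.50, 0.75):
        z = (Z_CRIT - THETA * (1 - info)) / info**0.5   # interim z at which CP(Delta) = 50%
        print(info, round(pred_prob(z, info), 3))       # ~0.013, 0.100, 0.230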

Optimality
Optimal boundaries: for a given schedule of analyses and a specified amount of power loss, we can define boundaries that minimize ASN (optimization done by grid search).
In what follows, we'll assume 3 looks at I = 0.25, 0.50, 0.75, and describe various boundaries:
- equal CP
- equal CP(d)
- equal PP
- equal power loss
- optimal (as above)
A simulation sketch for scoring any such candidate boundary set follows this list.
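The paper's optimization uses a grid search; as a rough illustration of how a candidate boundary set can be scored on power loss and ASN, here is a small Monte Carlo sketch of our own construction (same assumed 2.5%/90% design; names such as simulate_scheme are illustrative). Wrapped in a grid search over boundary triples, it could approximate the optimal boundaries.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    ALPHA, POWER = 0.025, 0.90
    Z_CRIT = norm.ppf(1 - ALPHA)
    THETA = Z_CRIT + norm.ppf(POWER)
    LOOKS = np.array([0.25, 0.50, 0.75, 1.0])     # information fractions (last = final analysis)

    def simulate_scheme(bounds, theta, n_sim=200_000):
        # Simulate z-statistics at each look for drift `theta`, apply the (non-binding)
        # futility bounds at the interim looks, and return
        # (P(trial continues to the end and succeeds), E[information fraction at stopping]).
        dI = np.diff(np.concatenate(([0.0], LOOKS)))
        incr = rng.normal(theta * dI, np.sqrt(dI), size=(n_sim, len(LOOKS)))
        z = np.cumsum(incr, axis=1) / np.sqrt(LOOKS)
        above = np.cumprod(z[:, :3] > bounds, axis=1)      # still going after each interim look
        stop_info = np.where(above[:, 0] == 0, LOOKS[0],
                     np.where(above[:, 1] == 0, LOOKS[1],
                      np.where(above[:, 2] == 0, LOOKS[2], 1.0)))
        success = (above[:, 2] == 1) & (z[:, 3] > Z_CRIT)
        return success.mean(), stop_info.mean()

    bounds = np.array([-0.61, 0.09, 0.78])                 # e.g. the "equal PP" boundaries below
    power_with, _ = simulate_scheme(bounds, THETA)
    power_full = 1 - norm.cdf(Z_CRIT - THETA)              # power with no futility looks (0.90)
    _, asn_h0 = simulate_scheme(bounds, 0.0)
    print(round(power_full - power_with, 3), round(asn_h0, 2))   # roughly 1% power loss, ASN ~0.59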

ASN vs Power Loss
Equal PP boundaries at the 3 looks are quite close to optimal. Equal CP fares particularly poorly.

Comparing Boundaries: 1% Power Loss

1% Power Loss Boundaries

Boundary type      Common value   ASN     Futility boundary on z-scale
                                          1st look   2nd look   3rd look
Equal CP           0.347          0.636   -1.622     0.087      1.101
Equal CP(d)        0.0004         0.637   -0.472     -0.291     0.245
Equal PP           0.033          0.590   -0.612     0.086      0.780
Equal power loss   0.0033         0.595   -0.819     0.138      0.972
Optimal            -              0.585   -0.660     0.160      0.860

What do Good Boundaries Look Like?
Optimal boundaries for various amounts of power loss: [figure not reproduced in this transcription]

What do Good Boundaries Look Like? (continued)
Interim results should not be expected to predict well the final study results!
Personal viewpoint (for a power loss of 1-2%?): good boundaries
- early in a study, correspond to negative outcomes
- cross into positive territory somewhere towards the middle of the trial
- never correspond to highly favorable outcomes

Message
My experience: trial teams are encouraged by the knowledge that their study proceeded beyond a futility analysis, and then disappointed.
The proper interpretation of continuation beyond a futility evaluation is not that the trial is likely to succeed, but rather that it has a chance to succeed; or else we would stop too many trials that turn out to be successful.

Back to (Flawed) Consultation Examples
1. When 20% of the data are available, continue the trial as long as the conditional power (assuming the original Δ) is at least 5%.
This would correspond to z = -4.6: basically impossible to reach even under H0, and in fact a substantial signal of harm.

2. ⅔ of the way into the trial, continue the study only if the conditional chance of success, computed under the assumption that the observed effect is the true effect, is at least 70%.
As stated, this must correspond to an observed effect greater than the value that would be significant at the end of the trial.
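For a rough check of both consultation criteria with the conversion formulas sketched earlier (the actual trials' design parameters are not given, so a one-sided 2.5% level, 90% power design is assumed and the numbers are approximate):

    # Example 1: CP(Delta) >= 5% at I = 0.2 -- solve for the interim z this allows
    info = 0.2
    z1 = (Z_CRIT - THETA * (1 - info) + norm.ppf(0.05) * (1 - info)**0.5) / info**0.5
    print(round(z1, 1))     # about -4.7 under these assumptions (the talk quotes -4.6):
                            # essentially unreachable even under H0

    # Example 2: CP(d) >= 70% at I = 2/3 -- required interim z and implied observed effect
    info = 2 / 3
    z2 = (Z_CRIT + norm.ppf(0.70) * (1 - info)**0.5) * info**0.5
    print(round(z2, 2), round(z2 / info**0.5, 2))
    # z ~1.85; measured against the final-analysis standard error the observed effect is
    # ~2.26 > 1.96, i.e. already larger than needed for significance at the end of the trial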

Conclusion
- A futility scheme should be implemented with careful consideration of its motivation and objectives, and quantification of relative costs and trade-offs.
- Familiar expression scales can be a useful device for describing criteria, but are not a substitute for sound investigation of operating characteristics.
- Predictive probability seems to have some benefits in terms of easy description of a scheme which might have desirable properties.
- Sensible futility criteria often correspond to quite poor observed outcomes, and it is important that trial personnel understand this.