Noninferiority Clinical Trials Scott Evans Harvard University Brasilia, Brazil May 7, 2010 Special Obrigado Valdair Pinto, M.D. Interfarma
Outline Rationale and Examples Design Assumptions Constancy Assay Sensitivity Selecting the Active Control Biocreep Selecting the NI Margin Other issues and design alternatives Trial Conduct Trial Integrity and Quality Interim Data Monitoring Analyses ITT vs. Per Protocol Missing Data Switching between Noninferiority and Superiority Reporting Abbreviations NI = Noninferiority CI = Confidence Interval M = Noninferiority Margin SOC = Standard of Care
Single Arm Study Population of Patients Response SAMPLE Apply Intervention YES NO TIME Single Arm Study Limitations: Cannot control for natural history People may have improved anyway Placebo-effect (or Hawthorne effect) Patients/clinicians believe that they are getting better because of treatment; manifesting itself in better outcomes E.g., pain Miss a good result when you observe no change but patients would have gotten worse if left untreated.
Placebo Control Population of Patients Response SAMPLE Randomize to Treatment Investigational Treatment Placebo Control YES NO YES NO TIME P 1 = efficacy of new treatment p 2 = efficacy of control group B A Superiority: H O : p 1 -p 2 = 0 H A : p 1 -p 2 0 C Control Treatment Better 0 p 1 -p 2 New Treatment Better
Placebo Controlled Trial Limitation If an effective SOC treatment exists, then randomization to a placebo may not be ethical Alternative Consider using the SOC as an active control Show noninferiority (NI) to the active control and thus significant efficacy over placebo (indirectly) Retain a fraction of the active control effect Noninferiority (Active Control) Population of Patients Response SAMPLE Randomize to Treatment Investigational Treatment Active Control YES NO YES NO TIME
Framework Premise Active control has been shown to be effective (i.e., superior to placebo) in historical trials Objective Show experimental therapy is no worse or noninferior (in a practical sense) than active control Implication Underlying implication is that the experimental therapy is thus superior to placebo Methodology WARNING NI cannot be demonstrated with nonsignificant p-values from superiority tests High p-value similarity Scientific method Absence of evidence is not evidence of absence
Methodology: Traditional Approach Select NI Margin (M) Selected such that if between-group differences < M are considered acceptable or irrelevant Rule out important relevant (clinical and statistical) differences with reasonable confidence by showing that differences between experimental therapy and active control are < M The analyses consists of obtaining a (2-sided) CI for the difference between the experimental therapy and active control and noting if the CI estimate is < M Hypotheses H 0 : Between group differences > M H A : Between group differences < M (note: 1-sided alternative) Interpretation CLINICALLY INFERIOR A B CLINICALLY NONINFERIOR P 1 = efficacy of new therapy p 2 = efficacy of control group C Noninferiority: H O : p 1 -p 2 - Μ H A : p 1 -p 2 > - Μ D F E Superiority: H O : p 1 -p 2 = 0 H A : p 1 -p 2 0 STATISTICALLY INFERIOR STATISTICALLY SUPERIOR - Μ 0 Noninferiority Margin p 1 -p 2
Motivation Experimental therapy may be better in other ways Better toxicity profile Less expensive Less invasive or complicated (better adherence) Shorter treatment duration Different resistance profile If new therapy can be shown to be NI with respect to efficacy and has other advantages, then it will be a useful product Examples In HIV, less complicated or toxic regimens with similar efficacy to existing regimens are sought Registration of generics To show BID is NI to TID To show that capsule is NI to tablet To identify new treatment options in case resistance develops
Example: ACTG 116A Objective To show that DDI is NI to AZT Rational In 1989, AZT was the only approved ARV and had been shown better than placebo in reducing disease progression Placebo not ethical with approval of AZT More treatments needed considering development of resistance Endpoint: time to AIDS-defining event or death NI if an increase in the risk is not more than 60% I.e., UB of CI for HR (DDI vs. AZT) < 1.6 Example ACTG 116A DDI (500mg/day): HR=1.02 90% CI = (0.79, 1.33) DDI (750mg/day): HR=1.04 90% CI = (0.80, 1.34) DDI 500 mg 750 mg 1 Exact Equality Hazard Ratio 1.6 Range of practical NI
Example: FDA SGE Experience A randomized, double-blind, multicenter study comparing the efficacy and safety of Piperacillin/Tazobactam (PT, 4G/500MG) and Imipenem/Cilastatin (IC, 500MG/500MG) administered intravenously every six hours to treat nosocomial pneumonia in hospitalized patients Example: FDA SGE Experience Active Control: Imipenem/Cilastatin (IC) 60/99 cured New drug: Piperacillin/Tazobactam (PT) 67/98 cured NI margin = 20% Lower bound of 95% CI for the difference in response rates (PT-IC) is 0.066 (> 0.20) Was a margin of 20% too large? NI would be shown for a margin as small as 7% Result PT was noninferior to IC Approved by the FDA
Assumption: Assay Sensitivity Trial is able to detect differences between treatments if they exist Need a sensitive enough instrument to measure response and detect differences if they exist Otherwise, all interventions will display similar responses due to the insensitivity of the instrument Assumption: Constancy Premise: Active control superior to placebo in historical trial Constancy Assumption The effect of active control relative to placebo has not changed Otherwise may not be able to show retention of the effect of the active control over placebo May not be the case in the presence of resistance development or with differing trial conduct (e.g., administration of treatment, differences in populations or endpoints, etc.) Not verifiable in NI trial (without placebo) Implication: conduct the NI trial in the same manner as the trial that established the effect of the active control over placebo Can constancy every hold with changing medical practice?
Active Control Selection Should have established superiority over placebo Substantial magnitude of effect Estimated with precision within the setting in which the NI trial is being conducted Regulatory approval may not be sufficient Consider selecting the best available control Concerns for Poor Selection of an Active Control NI is not transitive If A is noninferior to B and B is noninferior to C, then it does NOT necessarily follow that A is noninferior to C Biocreep Can be problematic when a therapy shown to be noninferior is selected as the active control for the next generation of NI trials Problem for generics and biosimilars Diminished effectiveness due to resistance development Traditional use and public belief of effectiveness
MRSA: The Superbug CNN October 17, 2007 MRSA DMC Discussion ID clinicians insisted antimicrobials are "highly effective" in treating skin infections, but are unable to site evidence as support Then claim it is unethical to do any other trial design FDA Guidance on use of NI trials (released October 2007) NI study designs may be appropriate when there is adequate evidence of a defined effect size for the control treatment so that the proposed NI margin can be supported. For an NI study, having an adequately justified NI margin is essential to having an informative study. If NI studies are being considered, a comprehensive synthesis of the evidence that supports the effect size of the active control and the proposed NI margin should be assembled during the period of protocol development and provided to the FDA along with the protocol. We are asking sponsors to provide adequate evidence to support the proposed NI margin for any indication being studied using active-controlled studies designed to show NI. It is likely, however, that for some indications, such as acute bacterial sinusitis (ABS), acute bacterial exacerbation of chronic bronchitis (ABECB) and acute bacterial otitis media (ABOM), available data will not support the use of an NI design. We recommend that sponsors consider other study designs (e.g., superiority designs) to provide evidence of effectiveness in these three indications.
A5265: Treatment for OC in Africa Fluconazole unavailable (expensive) Nystatin is used as the SOC GV (an inexpensive topical agent) showed excellent in-vitro activity A NI trial of GV compared to Nystatin was proposed But published studies showing the superiority of nystatin to placebo could not be identified No nystatin effect to retain Options Run superiority study But may not be able to claim superiority to placebo upon conclusion Selection of the NI Margin Combination of statistical reasoning and clinical judgment Must be smaller than the effect size of active control over placebo to retain effect Conceptually Maximum treatment difference that is clinically irrelevant Largest treatment difference that is acceptable in order to gain other advantages of the experimental intervention Context dependent Pre-specification important to avoid manipulation Ideally chosen independent of considerations of study power But sample size is very sensitive to its selection, affecting feasibility Directly impacts study conclusions
Choosing the Noninferiority Margin One strategy: Fixed margin approach ( preserving a fraction of the effect ) E.g., set the margin to be half of the estimated effect that the active control had over placebo Note that this approach does not consider the fact that the estimate from historical data is measured with uncertainty STAR Trial NI evaluation of Raloxifene vs. Tamoxifen Primary endpoint: invasive breast cancer Raloxifene is test agent Tamoxifen is the `active control Tamoxifen vs. Placebo: NSABP P1 Trial Subset of Women 50 years old Favors Placebo Favors Tamoxifen RR = 2.12 (1.52-3.03) Interpretation: P increases the rate of invasive breast cancer incidence compared to Tam by 112% (CI: 52% to 203%) 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 Relative Risk for Invasive Br Ca: Placebo / Tamoxifen
Tamoxifen vs. Placebo: NSABP P1 Trial Subset of Women 50 years old Favors Placebo Favors Tamoxifen RR = 2.12 (1.52-3.03) NI margin: 50% of Active control effect retained. 56% increased risk on P 1.28 RR = 1.56 1.84 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 Relative Risk for Invasive Br Ca: Placebo / Tamoxifen Choosing the Noninferiority Margin Another strategy: two 95%-95% CI method Set the NI margin = lower bound of the 95% CI for the effect of the placebo relative to the active control in the placebo controlled trial Addresses the issue of the variability of the effect estimate This criterion is stringent and depends directly on the strength of the evidence in the historical trial
Tamoxifen vs. Placebo: NSABP P1 Trial Subset of Women 50 years old Favors Placebo Favors Tamoxifen RR = 2.12 (1.52-3.03) NI margin: Lower bound of the 95% CI estimate for the RR of Tamoxifen vs. placebo 1.28 RR = 1.52 1.84 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 Relative Risk for Invasive Br Ca: Placebo / Tamoxifen STAR Trial: Claiming NI Using preservation of effect (50%) method The upper bound of the 95% CI estimate for the relative risk needs to be less than 1.56 Using 95-95 method The upper bound of the 95% CI estimate for the relative risk needs to be less than 1.52
Examples of Poor Selection of NI Margin TARGET Trial Evaluated if tirofaban was NI to abciximab for coronary syndromes NI margin was a HR = 1.47 (half of the effect of abciximab in the EPISTENT trial Problem: agent with a HR of 1.47 would not have been considered therapeutically NI to abciximab SPORTIF Trials Ximelegatran compared to warfarin for stroke prevention in atrial fibrillation patients Warfarin event rates in prior trials were 2.3% and 1.2% (SPORTIF III & V) NI margin was selected an absolute 2% difference However this may not rule out a doubling of the event rate Other Design Considerations Participants, endpoints, and other important aspects of the trial should be similar to those used in the trials used to demonstrate the effectiveness of the active control over placebo Can evaluate constancy assumption with active control If endpoints have changed then how can we ensure that: 1. There is an effect to be retained, and that 2. It is in fact retained?
Alternative Design: 3-Arm Trial Consider a 3-arm trial with test intervention, active control, and a placebo control Scientifically very attractive High validity Within-trial validation of NI margin Direct comparison of test intervention vs. placebo Can evaluate constancy assumption Ethical? Consider using when the efficacy of the active control has changed, is small, is in doubt, or is volatile Alternative Design: Estimation-Based Based on principles of estimation rather than hypothesis testing Do not specify explicit hypotheses Avoid making the distinction between superiority and NI Specify the precision with which you want to estimate treatment differences (e.g., the maximum length of a CI for the treatment effect difference) Power the study to estimate the effects with this precision Interpret the CI as usual (ruling out effects in either direction as appropriate) May wish to pre-specify a NI margin for regulatory purposes Particularly useful when the acceptable NI margin is not universal Allows for country-specific interpretation of NI E.g., A5163 ARV+oral etoposide and ARV+BV vs. ARV+liposomal anthracycline for advanced KS Primary endpoint: death or KS progression at 1 year
New Approach: Breaking Down the Research Question Two distinct objectives of NI trials 1. Evaluate if new intervention is NI to active control 2. Evaluate if new intervention is superior to placebo (accounting for uncertainty of historical evidence) Can design trial to address both of these objectives simultaneously or either objective individually (Gau & Ware, SIM, 2007) Testing NI to Active Control May be of interest for assessing comparative effectiveness (i.e., common in government funded trials), for marketing purposes, or (in part) for regulatory support (?) Method Select M based on a conceptual notion of NI (rather than using historical data) Evaluate NI using CIs n the standard way
Testing for Superiority to Placebo Considered the criteria for regulatory approval Less concern with showing NI to active control Some suggest that thus trials should not be called noninferiority trials Reasoning Degree of required efficacy for approval should be independent of trial design Demonstrating any efficacy is sufficient (by law) Inconsistency with retention of effect strategy New intervention could look better than active control (point estimates) but not meet the preservation of effect condition Two trials with different active controls have different standards for success If new intervention is better than active control, should the active control be withdrawn? Testing for Superiority to Placebo Synthesis Method A direct comparison of new intervention to placebo is not possible Thus a comparison is indirect and epidemiological (not a randomized comparison) If constancy can be assumed, then an indirect comparison can be made using estimates from the current trial and the historical trial If constancy is a concern, then adjustments can be made by discounting the synthetic estimate (subjective without supporting data) NI is not considered; M is irrelevant
Issues with Synthesis Method Proper control of type I error is challenging Assumes no bias in historical point estimate Often not the case with publication bias Problematic when constancy assumption does not hold Cannot produce a fixed NI margin No consensus on role of across-trial error Too liberal? Should effectiveness of alternative therapies be a factor in deciding how effective a new therapy needs to be in order to obtain approval? Addressing Both Objectives Design Calculate a sample size for each objective and take the maximum to ensure power for both Analyses Since results from historical trial are known, the two hypothesis tests can be reduced to two tests involving the estimated effect of the new intervention relative to the active control using inequalities
When One Objective is Achieved Superiority to placebo but not NI to active control The new therapy may be useful for patients in which the active control is contraindicated or not available NI to active control but not superior to placebo May indicate lack of constancy or that evidence for active control effect is weak Sample Sizes Generally believed to be larger than most superiority trials but this really depends on assumptions Sample sizes increase with decreasing M Stratification can help Adjusted CIs are generally narrower than unadjusted CIs Consider the costs Weigh the costs of Type I (incorrectly claiming NI) and Type II errors (incorrectly failing to claim NI)
Sample Size For regulatory submissions Still only get α=0.025 on the one side Using 0.05 lowers the level of evidence for drawing conclusions vs. accepted practice in superiority trials Use 2-sided 95% CIs to evaluate NI Important to power for both ITT and per protocol analyses Trial Conduct High quality trial conduct is critical NI can be concluded from poorly conducted trials Poor adherence, missing data, inadvertent enrollment, misclassification, measurement error and treatment crossovers can make treatment appear more similar (i.e., effect toward NI) and thus undermine trial validity Careful planning with diligent patient follow-up and monitoring is important Minimize dropout and non-adherence
Interim Analyses Reasonable to stop for futility (i.e., not being able to show NI), superiority, inferiority, or ineffective active control Stopping for NI less likely Difficult to show statistically without a lot of data Even if NI is shown, you may want to continue to see if superiority can be shown No ethical reason to stop for NI (unless clear superiority is already demonstrated) Use repeated CIs (with group sequential error spending principles) and predicted intervals Evans et.al., DIJ, 2007 Interim Result: What Would You Recommend? CLINICALLY INFERIOR CLINICALLY NONINFERIOR A C Noninferiority: H O : p 1 -p 2 < - H A : p 1 -p 2 - B D F E STATISTICALLY INFERIOR STATISTICALLY SUPERIOR - 0 Noninferiority Margin p 1 -p 2
Analyses 2-sided CIs are generally used P-values are not appropriate Interpret the CI as being able to rule out effects Note that the NI margin plays a direct role in interpretation and thus must be justified Compare response rate, adherence, etc. of active control relative to historical trials which provided evidence of its efficacy If active control displays different efficacy than in prior trials vs. placebo, then the validity of the pre-defined NI margin may be suspect and the interpretation of treatment differences is challenging Check constancy assumptions ITT vs. Per Protocol ITT is not necessarily conservative Poor adherence, missing data, inadvertent enrollment, misclassification, measurement error and treatment crossovers may make treatments appear similar ITT and Per Protocol analyses should both be conducted Sensitivity analyses are important Need consistency of qualitative result (CPMP, 2000, points to consider)
Missing Data Assess direction of bias from imputation May wish to utilize a creative (biased) imputation method for missing data E.g., binary endpoint: impute success for the active control and failure for new intervention If you still show NI, then it is not because of missing data Very conservative Some have suggested imputation consistent with the null hypothesis of the NI trial E.g., continuous endpoint: impute reasonable expected value (i) for the active control and (i-m) for the new intervention E.g., binary data: impute expected proportion (p) for active control and (p-m) for new intervention Then analyze using analysis of means Switching from Noninferiority to Superiority Generally appropriate to calculate p-value for superiority after showing NI No multiplicity adjustment necessary Closed test procedure Use ITT (consistent with superiority evaluation) Define plans a priori when possible
Switching from Superiority to Noninferiority Generally not acceptable to go from failing to demonstrate superiority to NI Unless a NI margin was pre-specified Post-hoc definition of NI margin is difficult to justify Choice of NI margin needs to be independent of trial data Concerns Is the control group an appropriate control for a NI study? i.e., with demonstrated superiority to placebo Was the efficacy displayed by the control group similar to that shown in trials vs. placebo (constancy)? ITT and PP become equally important Trial quality must be high Poor adherence, drop-out, etc. could drive NI Switching from Superiority to Noninferiority Very difficult to justify but may be feasible if: NI margin was pre-specified (or can be justified) Must be based on external information and not chosen to fit the data ITT and PP show similar results The trial was of high quality with few drop-outs and good adherence The control group displayed similar efficacy to trials vs. placebo The trial was sensitive enough to detect effects
Changing the NI Margin Generally okay to decrease An increase should be justified from external data (independent of trial) Occasionally new external data may change views of what the NI margin should be Construed as manipulation if not well justified Equivalence Not too much more AND not too much less 2-sided Common in PK studies (bioequivalence of drug levels) Too little not efficacious Too much toxic
Reporting Historical problems with reporting Greene et.al. AIM, 2000 Reviewed reporting of 88 trials 67% inappropriately concluded NI based on nonsignificant superiority tests Only 23% pre-specified a NI margin Use CONSORT Statement Extension Piaggio et.al., JAMA, 2006 Summary NI trials are very complex and require careful design, conduct, analyses, and reporting Understanding the intricacies of NI is essential for clinicians, statisticians, regulators, and journal editors NI CANNOT be concluded from nonsignificant tests for superiority The NI margin must selected very carefully NI trials should not be conducted when the appropriate assumptions do not hold Consider superiority trials as an alternative when possible
Useful References General Guidance FDA Draft Guidance (March 2010). EMEA: CHMP: Guidance on Choice of NI Margin (2005). EMEA: CHMP: Points to Consider: Switching between Superiority and NI (2000). Discussion of Issues Evans SR, Chance, 22:3:53-58, 2009. D Agostino et.al., SIM, 2003. Fleming, SIM, 2007. Gau and Ware, SIM, 2007. Reporting Greene et.al., AIM, 2000. Piaggio et.al., JAMA, 2006. Brazil World Cup 2010