Generalized Evidence Synthesis in Comparative Effectiveness Research
Discussion Leaders: David J. Vanness, PhD; Ágnes Benedict, MA, MSc


GENERALIZED EVIDENCE SYNTHESIS IN COMPARATIVE EFFECTIVENESS RESEARCH: COULD THE EVIDENCE BASE BE BROADENED IN MIXED TREATMENT COMPARISONS?

Discussion Leaders:
David J. Vanness, PhD — Assistant Professor, Department of Population Health Sciences, University of Wisconsin School of Medicine and Public Health
Ágnes Benedict, MA, MSc — European Director, Economic Modeling and Simulations, United BioSource Corporation

Workshop Outline
- Background
- Section 1: Study Considerations in Evidence Synthesis in CER
- Section 2: Case Studies in Generalized Evidence Synthesis
- Section 3: A Simulated Teaching Example
- Discussion

Fishing expeditions?
Two approaches to evidence synthesis:
- Restrictive inclusion criteria (traditional meta-analysis)
  » RCTs only (possibly restricted further on quality)
  » Closely matched sample characteristics and interventions
  » Goal is to reduce heterogeneity and improve internal validity
- Expansive inclusion criteria (generalized evidence synthesis)
  » RCTs and observational studies
  » Wide tolerance for sample characteristics/interventions
  » Goal is to control for heterogeneity and improve generalizability
Which is most useful for CER/patient-centered outcomes research?

Needs for CER
- CER is pragmatic: the goal is to inform real-world decisions
- Heterogeneous clinical practice
  » Treatment switches, discontinuation
  » Wider variety of patient and clinician behaviors than in a controlled trial environment
- Heterogeneous treatment effects
  » Effect modifiers may imply definable subgroups
- Desire to use CER for personalized guidelines (see, e.g., Basu 2011)

Methods for CER
- Evidence generation
- Evidence synthesis
- Decision analysis

Meta-analysis methods:
- Classical frequentist meta-analyses
- Bayesian meta-analyses / Bayesian hierarchical models
- Meta-regressions with either of the above
Meta-analysis evidence base:
- Aggregate data
- Combination of IPD and aggregate study-level results

Generalized evidence synthesis
Combining RCTs and non-RCTs; various outcome types (e.g., hazard ratio and count data for survival outcomes, Wood et al. 2010); individual patient data with aggregate data (Lambert 2007); data on humans and other species, with adequate methods.

Evidence base considerations in generalized evidence synthesis
- Impact of heterogeneity: substantial — but it can increase external validity
- Biases
  » Observational studies: confounding and selection bias threaten internal validity
  » RCTs: publication bias and sponsorship bias threaten external validity

Examples of Generalized Evidence Synthesis
- Decision modeling
- HTA of sequential use of anti-TNF therapies in RA in patients who failed 1st-line anti-TNF therapy
- Bayesian hierarchical models of observational and RCT evidence

Comprehensive Decision Modeling
Decision analysis models already combine multiple sources of evidence (and are really structured evidence syntheses). So, if a meta-analysis is to be used together with a decision analysis for CER, is there a need to restrict the evidence base? Use meta-analysis to summarize the evidence base and build the results directly into decision analysis models using Bayesian methods (Cooper, Sutton, Abrams et al. 2004; Cooper, Sutton and Abrams 2002).

GES: HTA of sequential use of anti-TNF therapies in RA
Objective: to estimate ACR20, ACR50 and ACR70 responses on various treatments for the population of interest: patients with rheumatoid arthritis who failed 1st-line anti-TNF therapy.
Treatments considered: adalimumab, etanercept, infliximab, abatacept, rituximab.
Evidence base:
- RCTs in the anti-TNF failure population: REFLEX (rituximab), ATTAIN (abatacept), RADIATE (tocilizumab), GO-AFTER (golimumab) — the latter two not in the scope of the NICE HTA
- RCTs in the anti-TNF-naïve patient population for all comparators

GES: HTA of sequential use of anti-TNF therapies
Other evidence: 5 studies of various designs providing comparative (i.e., not only single-arm) evidence on ACR responses:
- ReAct: prospective, open-label, multicentre study of adalimumab; 6,610 patients, of whom 899 had a history of previous etanercept and/or infliximab therapy; 12 weeks
- Buch 2005 (n=207, 12 weeks): prospective study of consecutive patients treated with infliximab who qualified as nonresponders, staying on infliximab or switching to etanercept
- Wick 2005 (n=27, 24 weeks): retrospective analysis of patients in the STURE registry switching between anti-TNFs
- Nikas 2006 (n=24, 52 weeks): open-label, single-center comparative study of switchers from infliximab to adalimumab versus adalimumab-only controls
- Furst 2007 (n=27, 15 weeks): randomised, open-label clinical trial of 28 patients with an inadequate response to etanercept, receiving background methotrexate and randomised 1:1 to discontinue etanercept and receive infliximab, or to continue etanercept

GES: HTA of sequential use of anti-TNF therapies — manufacturers' submissions:
- Abbott: 29 RCTs + 5 non-RCT comparative studies; late RA, TNF-naïve and TNF-IR populations; # within/out of scope (per NICE): 2 / 34; Bayesian hierarchical model with meta-regression
- SP: 34 RCTs; early and late RA, TNF-naïve and TNF-IR populations; 2 / 32 within/out of scope; network meta-analysis, adjusted for disease duration
- Roche: 3 RCTs (TNF-IR population; 2 / 1 within/out of scope) and 18 RCTs (DMARD-IR population; 18 / 0); Bayesian indirect comparison
- BMS: 4 RCTs; TNF-IR population; 2 / 2 within/out of scope; Bayesian indirect comparison

GES: HTA of sequential use of anti-TNF therapies
Criticisms by NICE:
- NICE could not quantify the relative effect of a second TNF inhibitor in comparison with either conventional DMARDs or alternative biological DMARDs
- Use of data from populations beyond the scope of the appraisal [..] was inappropriate because of the variability of the studies from which the data were taken
- Exchangeability of relative treatment effects between the included studies could not be assumed, and thus the validity of the results [by manufacturers] was questionable
- The NICE CE model employed various arbitrary assumptions

[Figures: ACR20 and ACR50 response rates for DMARD and biologic arms in the TNF-IR population, as estimated by the indirect comparison (IC) and the meta-regression MTC (MRMTC) and as observed in GO-AFTER (wk 14/24; 50 mg and 100 mg), REFLEX (rituximab, wk 24) and ATTAIN (abatacept, wk 24). Biologic ACR20 estimates range from roughly 34% to 51% versus 15-20% for DMARDs; ACR50 from roughly 18% to 27% versus 4-6%.]

Examples of Published Generalized Evidence Syntheses (authors, year; disease area/treatment; type of studies; type of model; outcome; covariate or bias adjustment):
- Grines et al. 2008; AngioJet thrombectomy vs PCI; 2 RCTs + 9 non-RCTs; 3-level hierarchical model; odds ratio, short-term mortality rate; no adjustment
- McCarron et al. 2010; abdominal aortic aneurysms, EVAR vs OSR; 4 RCTs / 40 observational studies (75 with covariate imputation); 3-level hierarchical model + various; odds ratio, perioperative mortality; age, gender, CVD at study or study-arm level
- Prevost et al. 2000; breast cancer screening; 4 RCTs / 4 "observational" studies; 2- and 3-level hierarchical models; relative risk, breast cancer mortality; study-level age
- Sampath et al. 2007; loop diuretics; 5 RCTs / 8 non-RCTs; 3-level hierarchical model; risk ratio, mortality; age, % males, control-arm risk, quality score, study termination date
- Sutton & Abrams 2001; electronic fetal heart rate monitoring; 9 RCTs / 7 comparative cohorts / 10 before-after studies; 3-level hierarchical model + various; risk difference, perinatal mortality; publication year

GES: Sutton and Abrams 2001
Background: widespread use of electronic fetal heart rate monitoring (EFM) in the early 1970s in the UK coincided with a decrease in the overall perinatal mortality rate. Evidence-based reviews conclude that EFM has not been shown to reduce perinatal mortality — but that conclusion ignores the observational evidence (17 studies).
Objective: to include all evidence in a Bayesian three-level hierarchical model with a random effect for study type.
[Figures: results under the 2-level model and three variants of the 3-level Bayesian hierarchical model (Sutton & Abrams, 2001).]

McCarron et al. 2010: EVAR vs OSR
Applied a Bayesian hierarchical model to a subset of studies from a systematic review of endovascular (EVAR) and open surgical repair (OSR) in the treatment of abdominal aortic aneurysms (AAA). Examined odds ratios of peri-operative mortality. Tested covariates for aggregate values of patient characteristics (e.g., mean age) within studies, and two approaches for downweighting biased evidence.

Generalized Evidence Synthesis Findings
- The change in the estimate of mean effects (overall, and by study) with a 3-level model depends on the level of disparity between results by study type
- May result in a wider credible interval for the mean effect if results differ greatly by study type
- Priors for means and variances can be adjusted to give greater weight to RCTs, or to assert that observational studies will be more heterogeneous/biased

McCarron et al. 2010: AAA
- No standard method for the choice of priors
- The impact of priors varied across studies
- Priors for between-study heterogeneity appear influential
- No adjustment for publication bias or selection bias

Covariate Adjustment
- Limited by data availability
- Selection requires prior knowledge of key covariates
- Covariate adjustment needs to be done at the appropriate level: adjustment at the aggregate study level does not account for covariate imbalance within non-randomized studies
- Imputation of additional covariates may have a large impact

Illustration by Simulation
- Major goal: use a simulated environment to show the potential benefits of using a broad evidence base
- Secondary goal: walk through simple meta-regression MTC WinBUGS code
- Overall message: restricting synthesis to only evidence of high internal validity (i.e., RCTs) is no guarantee of suitability for CER when there are threats to generalizability
- This is meant to illustrate; it does not prove that including observational evidence is always a good idea

Context
- Monte Carlo methods are used to simulate the process of evidence generation
- Three hypothetical treatments: 1, 2 and 3
- Observable (C) and unobservable (U) individual characteristics affect outcomes differently for the 3 treatments
- Goal: compare the effectiveness (success rate) of all 3 treatments for a subgroup defined by C

Simulated Observational Studies
Our simulated observational studies suffer from endogenous treatment selection:
- Individual patients are more likely to select the treatment with the highest probability of success for them
- Selection probability depends on both observable (C) and unobservable (U) factors
- Both of these factors also affect outcomes — this creates treatment selection bias

Directed Acyclic Graph (DAG) (Pearl 1995; Greenland, Pearl and Robins 2002)
[DAG: C → T, C → Y, U → T, U → Y, and T → Y.]
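The endogenous selection mechanism just described can be sketched in a few lines of Python. This is a hypothetical illustration, assuming the probit success models and the proportional selection rule given later in the workshop; the function names are our own.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def success_probs(c, u):
    """Per-treatment success probabilities for a patient with observable
    C = c and unobservable U = u (probit models from the slides)."""
    return [phi(-0.4),                     # Treatment 1: unaffected by C and U
            phi(0.2 - 0.6 * c - 0.2 * u),  # Treatment 2
            phi(0.0 + 0.4 * c - 0.2 * u)]  # Treatment 3

def simulate_observational(n, rho=0.25, seed=0):
    """Draw n patients; each selects a treatment with probability
    proportional to his or her own success probability (endogenous
    selection on both C and U), then a binary outcome is realized."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        # U correlated with C (rho = 0.25), as in the slides
        u = rho * c + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        p = success_probs(c, u)
        t = rng.choices([1, 2, 3], weights=p)[0]   # biased self-selection
        y = 1 if rng.random() < p[t - 1] else 0    # realized outcome
        data.append((t, y))
    return data
```

Because patients with high C gravitate toward treatment 3 and patients with low C toward treatment 2, a naive comparison of raw success rates across treatment groups confounds the treatment effect with C and U, exactly as the DAG indicates.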

Two back-door paths (T ← C → Y and T ← U → Y) induce correlation between T and Y, confounding the treatment-effect pathway.

Population Distribution of Observable Confounder C
[Figure: density of C; lower values of C are better for T2, higher values better for T3, and C has no effect on T1.]
Target population: individuals with values of C near zero.

Distribution of Unobservable Confounder U
[Figure: density of U; U shifts success on T2 and T3 ("better for T2 and T3") and has no effect on T1.]

Simulated Randomized Controlled Trials
In our simulated RCTs, treatments are assigned randomly, so there is no treatment selection bias. We also assume each trial is double-blinded, with no attrition or other threats to internal validity. However, we simulate two threats to external validity (generalizability):
- Publication bias — statistically significant and/or favorable results are more likely to be published:
  » Significant negative: 0%
  » Insignificant negative: 25%
  » Significant positive: 100%
  » Insignificant positive: 50%
- Sponsorship bias — the propensity of sponsors to conduct trials with high assurance (i.e., where the sponsored intervention is statistically significantly better)
Note: C and U are correlated (ρ = 0.25), but trial sponsors do not know this.
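The publication filter above is easy to state in code (a sketch; the function name and interface are ours, not from the slides):

```python
import random

# Publication probabilities from the slide, keyed by the direction of the
# estimated effect and whether it reached statistical significance.
PUB_PROB = {
    ("negative", True):  0.00,  # significant negative: never published
    ("negative", False): 0.25,  # insignificant negative
    ("positive", True):  1.00,  # significant positive: always published
    ("positive", False): 0.50,  # insignificant positive
}

def is_published(effect, p_value, rng):
    """Return True if a trial with the given estimated effect and p-value
    survives the publication filter."""
    direction = "positive" if effect > 0 else "negative"
    significant = p_value < 0.05
    return rng.random() < PUB_PROB[(direction, significant)]
```

Feeding only published trials into a synthesis over-represents significant positive results — precisely the external-validity threat this simulation is designed to expose.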

Simulating Sponsorship Bias
Assume Treatment 1 is placebo and that Treatments 2 and 3 have sponsors who decide:
- whether their trial will be head-to-head (2 vs 3) or placebo-controlled (2 vs 1 or 3 vs 1)
- or whether to do any trial at all!
Assume the sponsors of treatments 2 and 3 have a limited research budget: each can afford 10 two-arm trials of size 50-150 per arm.

Targeting Trials to Subpopulations
Define a sample population with mean C = μC. Sponsors believe effectiveness is:
- T1: Pr(Success) = Φ(-0.4)
- T2: Pr(Success) = Φ(0.2 - 0.6 μC)
- T3: Pr(Success) = Φ(0 + 0.4 μC)
Note: at the desired population μC = 0, the actual rates are T1: 0.34, T2: 0.58, T3: 0.50, with odds ratios 2 versus 1 = 2.62, 3 versus 1 = 1.90, 3 versus 2 = 0.73.

Deciding Which RCTs Get Done
- Sponsor 2: given μC and a fixed (budget-determined) sample size, and assuming a 5% Type I error rate, calculate the power for trials of 2 vs 1 and 2 vs 3. Choose the trial type with the greatest power; choose no trial if power < 80%.
- Sponsor 3: do the analogous exercise for trials of 3 vs 1 and 3 vs 2.
- Repeat using different randomly generated values of μC until 10 trials have been conducted by each sponsor.

An Example
First we randomly draw a target sample mean μC (say, 0.4822). This could be achieved by restrictive trial inclusion criteria.
[Figure: sample distribution of C centered at μC = 0.4822 within the population distribution of C.]
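The "actual rates" and odds ratios quoted for μC = 0 follow directly from the probit success models (a quick check; phi and odds are our own helper names):

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def odds(p):
    return p / (1.0 - p)

# Success probabilities at the target population mean, mu_C = 0
p1 = phi(-0.4)              # T1 ≈ 0.34
p2 = phi(0.2 - 0.6 * 0.0)   # T2 ≈ 0.58
p3 = phi(0.0 + 0.4 * 0.0)   # T3 = 0.50

or21 = odds(p2) / odds(p1)  # ≈ 2.62
or31 = odds(p3) / odds(p1)  # ≈ 1.90
or32 = odds(p3) / odds(p2)  # ≈ 0.73
```

These three true odds ratios are the targets against which all the synthesis methods below are judged.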

Example (continued)
At μC = 0.4822:
- T1: Pr(Success) = Φ(-0.4) = 0.345
- T2: Pr(Success) = Φ(0.2 - 0.6 × 0.4822) = 0.464
- T3: Pr(Success) = Φ(0 + 0.4 × 0.4822) = 0.576
Suppose (perhaps due to budget constraints) the sponsors consider trials with a sample size of 150 patients per arm. From Sponsor 2's point of view, a 2 vs 3 trial would not be considered, and a 2 vs 1 trial would have low assurance (power of 55.9%); so no trial is done by Sponsor 2.

Example (continued)
But from Sponsor 3's point of view, in the subpopulation with μC = 0.4822:
- a 3 vs 1 trial has 98.2% power
- a 3 vs 2 trial has 49.0% power
So a simulated 3 vs 1 trial is done:
- Arm 1: actual μC = 0.405, 57/150 successes
- Arm 2: actual μC = 0.368, 84/150 successes
- 95% CI for the difference in proportions: (0.250-0.475, P < 0.001)
The favorable results are published with probability 1.

Simulating Observational Studies
- Conduct 3 observational studies with sample sizes of 500-2,000
- Samples are drawn from the population distribution of C and U; no publication or sponsorship bias
- Treatment selection is biased, with probability proportional to the actual probability of success for each patient i:
  » T1: Pr(Success) = Φ(-0.4)
  » T2: Pr(Success) = Φ(0.2 - 0.6 Ci - 0.2 Ui)
  » T3: Pr(Success) = Φ(0 + 0.4 Ci - 0.2 Ui)
- This implies patients and physicians have private knowledge of Ui and are more likely to choose the treatments with the best chance of success

Three Approaches to Evidence Synthesis
After 10 RCTs from each sponsor and 3 observational studies are complete, we synthesize the evidence using one of 3 approaches:
1. Full MTC meta-regression evidence synthesis
   - Use all 20 RCTs and 3 observational studies
   - Unconstrained baseline (weakest assumption)
   - Mixed-effect model for treatment effects
     » Random effect reflects between-trial variation due to unobserved factors
     » Fixed-effect adjustment of the treatment effect for trial-arm-level μC
     » Implies conditional exchangeability: studies are exchangeable conditional on arm-level μC
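The power figures in this example are consistent with a standard two-proportion z-test calculation (a sketch under that assumption; the slide's exact values of 55.9%, 98.2% and 49.0% may come from a slightly different power formula):

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_prop(pa, pb, n_per_arm, z_alpha=1.959964):
    """Approximate power of a two-sided two-proportion z-test at the 5%
    level, with pooled variance under H0 and n_per_arm patients per arm."""
    pbar = (pa + pb) / 2.0
    se0 = math.sqrt(2.0 * pbar * (1.0 - pbar) / n_per_arm)        # SE under H0
    se1 = math.sqrt((pa * (1 - pa) + pb * (1 - pb)) / n_per_arm)  # SE under H1
    return phi((abs(pa - pb) - z_alpha * se0) / se1)

mu_c = 0.4822
p1 = phi(-0.4)              # ≈ 0.345
p2 = phi(0.2 - 0.6 * mu_c)  # ≈ 0.464
p3 = phi(0.0 + 0.4 * mu_c)  # ≈ 0.576

pow21 = power_two_prop(p2, p1, 150)  # ≈ 0.56: too low, Sponsor 2 does no trial
pow31 = power_two_prop(p3, p1, 150)  # ≈ 0.98: Sponsor 3 runs 3 vs 1
pow32 = power_two_prop(p3, p2, 150)  # ≈ 0.49
```

The asymmetry is the point: which comparisons ever get trialed depends on where μC happens to fall, which is what generates sponsorship bias in the simulated evidence base.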

WinBUGS Code
The model below implements the unconstrained baseline, a random treatment effect, and a meta-regression interaction of the treatment effect with the observed arm-level covariate c[i]. (The transcript printed prec <- pow(sd,2); BUGS parameterizes the normal by precision, so pow(sd,-2) is used here, and priors for the meta-regression coefficients b[1] and b[2], apparently lost in transcription, are restored.)

    model{
      #Likelihood
      for(i in 1:N_ARMS){
        SUCCESS[i] ~ dbin(p[i], n_patients[i])
        logit(p[i]) <- mu[study[i]] + delta[i]*(1 - equals(treatment[i],1))
        delta[i] ~ dnorm(mu_delta[i], prec)  #Random treatment effect
        #Meta-regression: treatment-effect interaction with arm-level c[i]
        mu_delta[i] <- b[1]*c[i]*equals(treatment[i],2)
                       + b[2]*c[i]*equals(treatment[i],3)
                       + r[treatment[i]] - r[control[i]]
      }
      #Priors
      prec <- pow(sd,-2)    #Random-effect precision = 1/sd^2
      sd ~ dunif(0,10)      #Random-effect standard deviation
      for(j in 1:N_STUDIES){
        mu[j] ~ dnorm(0,0.001)  #Unconstrained study baselines
      }
      r[1] <- 0
      for(k in 2:N_TREATMENTS){
        r[k] ~ dnorm(0,0.001)  #Treatment effects on log-odds scale
      }
      for(m in 1:2){
        b[m] ~ dnorm(0,0.001)  #Meta-regression coefficients
      }
      #Prediction at C=0
      log(or21) <- r[2]
      log(or31) <- r[3]
      log(or32) <- r[3] - r[2]
    }

Three Approaches to Evidence Synthesis (continued)
2. MTC meta-regression
   - Same as the full MTC mixed-effect meta-regression (generalized evidence synthesis), except observational studies are excluded
3. Narrow-scope MTC
   - Exclude observational studies
   - Exclude RCTs with μC < -1 or μC > 1
   - No meta-regression to control for μC

Comparison to Traditional Pairwise Meta-Analysis
To connect to classical meta-analysis, consider the following results using the method of DerSimonian and Laird.

2 versus 1 (true OR = 2.62):
    Study   x.bar     OR (lower 95%, upper)
    2       0.102     2.67 (1.66, 4.30)
    5      -0.331     3.93 (2.40, 6.42)
    10     -0.575     5.61 (3.31, 9.50)
    21     -0.105     4.10 (2.84, 5.93)
    22     -0.115     2.93 (2.14, 4.01)
    23     -0.059     4.86 (3.40, 6.95)
    Summary OR = 3.83, 95% CI (3.07, 4.78)
    Test for heterogeneity: X^2(5) = 8.82 (p = 0.1163); estimated random-effects variance: 0.03

3 versus 1 (true OR = 1.90):
    Study   x.bar     OR (lower 95%, upper)
    14      0.799     3.48 (1.57, 7.74)
    16      0.731     3.56 (2.18, 5.83)
    19      0.396     1.77 (0.99, 3.18)
    20      0.168     2.84 (1.73, 4.67)
    21      0.189     2.73 (1.88, 3.95)
    22      0.170     1.94 (1.40, 2.69)
    23      0.264     3.09 (2.17, 4.40)
    Summary OR = 2.62, 95% CI (2.16, 3.18)
    Test for heterogeneity: X^2(6) = 7.95 (p = 0.2417); estimated random-effects variance: 0.02

3 versus 2 (true OR = 0.73):
    Study   x.bar     OR (lower 95%, upper)
    1      -1.327     0.06 (0.03, 0.13)
    3      -0.927     0.26 (0.14, 0.48)
    4      -0.978     0.14 (0.08, 0.27)
    6      -1.422     0.04 (0.02, 0.10)
    7      -1.410     0.04 (0.02, 0.09)
    8      -0.755     0.31 (0.18, 0.54)
    9      -1.579     0.03 (0.01, 0.08)
    11      0.936     4.22 (2.24, 7.96)
    12      1.325     4.75 (2.36, 9.58)
    13      0.818     2.71 (1.60, 4.57)
    15      1.013     4.38 (2.16, 8.87)
    17      1.146     5.20 (2.87, 9.44)
    18      1.184     4.17 (2.52, 6.92)
    Summary OR = 0.53, 95% CI (0.18, 1.52)
    Test for heterogeneity: X^2(12) = 410.97 (p ≈ 0); estimated random-effects variance: 3.67

Now, the 3 Bayesian Evidence Synthesis Approaches
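The printout for the 2-versus-1 comparison can be reproduced with a compact DerSimonian-Laird implementation, recovering each study's standard error from its reported 95% CI (a sketch; dersimonian_laird is our own function, not the original analysis code):

```python
import math

def dersimonian_laird(ors, cis, z=1.959964):
    """DerSimonian-Laird random-effects pooling of odds ratios on the
    log scale; per-study variances are recovered from the reported CIs."""
    y = [math.log(o) for o in ors]                       # log odds ratios
    v = [((math.log(hi) - math.log(lo)) / (2 * z)) ** 2  # variances from CIs
         for lo, hi in cis]
    w = [1.0 / vi for vi in v]                           # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) /
               (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))  # DL estimator
    wstar = [1.0 / (vi + tau2) for vi in v]              # random-effect weights
    mu = sum(wi * yi for wi, yi in zip(wstar, y)) / sum(wstar)
    se = 1.0 / math.sqrt(sum(wstar))
    ci = (math.exp(mu - z * se), math.exp(mu + z * se))
    return math.exp(mu), ci, q, tau2

# Study-level results for the 2-versus-1 comparison, as printed above
ors = [2.67, 3.93, 5.61, 4.10, 2.93, 4.86]
cis = [(1.66, 4.30), (2.40, 6.42), (3.31, 9.50),
       (2.84, 5.93), (2.14, 4.01), (3.40, 6.95)]
pooled, ci, q, tau2 = dersimonian_laird(ors, cis)
# pooled ≈ 3.83, ci ≈ (3.07, 4.78), q ≈ 8.8 on 5 df, tau2 ≈ 0.03
```

Note that the pooled estimate of 3.83 sits well above the true OR of 2.62: the pairwise method faithfully summarizes the published trials, but the published trials themselves are a biased sample.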

[Figures: posterior distributions for OR 2 versus 1 (true OR = 2.62), OR 3 versus 1 (true OR = 1.90) and OR 3 versus 2 (true OR = 0.73) under the three approaches.]
All three approaches contained the true odds ratio, but inference was tightest for the full MTC with all data and mixed-effect meta-regression.

Rinse and Repeat
Repeat the exercise a large number of times (say, 500) to investigate the following:
- Coverage: what percent of the time do the 95% credible intervals, as constructed, actually contain the true odds ratio for each method?
- Credible interval width: which method provides the most precise inference?
- Average prediction error: which method gets closest to the true odds ratios on average?
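These three operating characteristics can be computed from the stored results of the repeated runs along these lines (a sketch; evaluate and its arguments are our own names):

```python
import math

def evaluate(intervals, estimates, true_or):
    """Summarize one synthesis method over repeated simulations:
    coverage of its 95% intervals, their mean width, and the mean
    log-scale prediction error versus the true odds ratio."""
    k = len(estimates)
    coverage = sum(lo <= true_or <= hi for lo, hi in intervals) / k
    mean_width = sum(hi - lo for lo, hi in intervals) / k
    mean_error = sum(math.log(e / true_or) for e in estimates) / k
    return coverage, mean_width, mean_error
```

Comparing these three numbers across the full MTC, the RCT-only meta-regression and the narrow-scope MTC yields the kind of league table summarized on the following slides.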

Expected Coverage
The full MTC with observational data and meta-regression came closest to the planned 95% probability of containing the true odds ratio.

Expected 95% Credible Interval Width
The full MTC with observational data and meta-regression had the narrowest credible intervals.

Expected Average Prediction Error
The full MTC with observational data and meta-regression was the least biased.

How reliable are the 3 methods?
Expected average prediction error tells us whether we tend to get the answer right on average, but we also care about minimizing how far off we are likely to be in any given analysis. Looking at the distribution of average prediction error is somewhat analogous to the standard error of the mean.

Average Prediction Error Distribution
The full MTC with observational data and meta-regression had the tightest distribution of prediction errors.

Summary: Lessons Learned from This Simulation
- Automatically excluding observational evidence because of threats to internal validity may not necessarily improve inference given an entire body of evidence — and indeed can make it worse.
- Even if all evidence is internally valid, evidence synthesis may not provide good inference for CER if publication and sponsorship bias are present.
- As in all things, a rule of reason applies.

Discussion

Helpful References
- Basu A. Economics of individualization in comparative effectiveness research and a basis for a patient-centered health care. Journal of Health Economics 2011; in press.
- Cooper NJ, Sutton AJ, Abrams KR, Turner D, Wailoo A. Comprehensive decision analytical modelling in economic evaluation: a Bayesian approach. Health Economics 2004;13(3):203-226.
- Cooper NJ, Sutton AJ, Abrams KR. Decision analytical economic modelling within a Bayesian framework: application to prophylactic antibiotics use for caesarean section. Statistical Methods in Medical Research 2002;11:491-512.
- Cooper NJ, Sutton AJ, Morris D, Ades AE, Welton NJ. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Statistics in Medicine 2009;28(14):1861-1881.
- Nixon RM, Bansback N, Brennan A. Using mixed treatment comparisons and meta-regression to perform indirect comparisons to estimate the efficacy of biologic treatments in rheumatoid arthritis. Statistics in Medicine 2007;26(6):1237-1254.