Summarising and validating test accuracy results across multiple studies for use in clinical practice

Summarising and validating test accuracy results across multiple studies for use in clinical practice. Richard D. Riley, Professor of Biostatistics, Research Institute for Primary Care & Health Sciences. Thank you: Brian Willis, Thomas Debray, Kym Snell, Joie Ensor, Jon Deeks, Carl Moons, Julian Higgins, and others.

Objectives
- Consider whether meta-analysis results are actually helpful (applicable) to clinical practice
- Consider methods to examine test performance in multiple settings, going beyond average results
- Focus on probabilistic inferences and validation

Meta-analysis & heterogeneity
Meta-analysis should be producing clinically useful results. Usually there will be heterogeneity in a meta-analysis of test accuracy studies, e.g. differences across studies in:
- thresholds reported (we just heard about this!)
- methods of measurement
- reference standard
- population characteristics (case-mix variation)
- prevalence of disease
These cause the TRUE sensitivity and TRUE specificity to vary from setting to setting.

Meta-analysis & heterogeneity
Yet this does not stop us (me) from doing meta-analysis. Just use a random effects model!!! It accounts for unexplained between-study heterogeneity, gives a pooled result for sensitivity and specificity, and - most importantly (?) - another publication for the CV. But can the clinician actually use that pooled result? Is the summary sensitivity and specificity applicable to their population? Time for us to do better - but can we?

Going beyond the average? PART 1: Probabilistic inferences for performance in clinical settings

Example 1: Accuracy of ear temperature for fever?
11 studies identified that:
- All used > 38 degrees to define test positive
- All used rectal temperature as reference standard
- All used the FirstTemp ear thermometer
- All used an electronic rectal thermometer
Bivariate meta-analysis was used to combine the 2 by 2 tables, producing a summary sensitivity and specificity.
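The talk's analyses use the bivariate random-effects model. As a rough univariate sketch of the idea - hypothetical 2x2 counts, and DerSimonian-Laird pooling of logit-sensitivity standing in for the full bivariate likelihood - one might write:

```python
import math

# Hypothetical 2x2 counts per study: (true positives, false negatives)
studies = [(45, 25), (30, 18), (80, 8), (15, 20), (50, 20)]

# Logit sensitivity and its within-study variance per study
# (0.5 continuity correction keeps values finite for small counts)
ests, variances = [], []
for tp, fn in studies:
    p = (tp + 0.5) / (tp + fn + 1.0)
    ests.append(math.log(p / (1 - p)))
    variances.append(1.0 / (tp + 0.5) + 1.0 / (fn + 0.5))

# DerSimonian-Laird estimate of the between-study variance tau^2
w = [1.0 / v for v in variances]
fixed = sum(wi * e for wi, e in zip(w, ests)) / sum(w)
q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, ests))
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(studies) - 1)) / c)

# Random-effects pooled logit-sensitivity, back-transformed to a proportion
w_re = [1.0 / (v + tau2) for v in variances]
pooled_logit = sum(wi * e for wi, e in zip(w_re, ests)) / sum(w_re)
pooled_sens = 1.0 / (1.0 + math.exp(-pooled_logit))
print(f"tau^2 = {tau2:.3f}, pooled sensitivity = {pooled_sens:.3f}")
```

The same pooling would be repeated for logit-specificity; the bivariate model fits both jointly and also estimates their between-study correlation.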

Summary sensitivity = 65% (51% to 77%); summary specificity = 98% (96% to 99%). But these relate to the average performance across all populations - could sensitivity and specificity be different in particular settings? [SROC plot: study estimates, 95% confidence region, and summary point]

Prediction regions (intervals) indicate the potential true test performance in a single setting, and can be derived after a meta-analysis. In a Bayesian framework they allow predictive inferences & distributions: what is the probability that sensitivity and specificity are > 80% in a single setting? Here, the probability that sensitivity and specificity will both fall in that region (values > 80%) in a single setting = 0.18. [SROC plot: study estimates, 95% confidence region, summary point, and 95% prediction region]
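A predictive probability of this kind can be approximated by simulation: draw new-setting values of (logit sensitivity, logit specificity) from the bivariate normal predictive distribution and count how often both exceed 80%. The means, SDs, and correlation below are hypothetical stand-ins, not the fitted values behind the talk's 0.18:

```python
import math, random

random.seed(1)

# Hypothetical bivariate-model summaries on the logit scale
# (logit(0.65) ~= 0.62, logit(0.98) ~= 3.9), with between-study SDs
# and correlation chosen purely for illustration.
mu_sens, mu_spec = 0.62, 3.9
sd_sens, sd_spec, rho = 0.6, 0.5, -0.3

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Monte Carlo draws from the bivariate normal predictive distribution
hits, n_draws = 0, 200_000
for _ in range(n_draws):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    ls = mu_sens + sd_sens * z1
    lp = mu_spec + sd_spec * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    if inv_logit(ls) > 0.8 and inv_logit(lp) > 0.8:
        hits += 1

prob = hits / n_draws
print(f"P(sens > 0.8 and spec > 0.8 in a new setting) ~= {prob:.3f}")
```

In a fully Bayesian analysis the parameter uncertainty would also be propagated by drawing the means, SDs, and correlation from their posterior rather than fixing them.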

Example 2: Accuracy of PTH for hypocalcaemia?
5 studies identified that studied parathyroid hormone (PTH):
- Patients all had a thyroidectomy
- % change in PTH from before to after surgery
- Change > 65% indicates test positive
- Reference standard measured 48 hours later
An accurate PTH test may help to send people home earlier. Bivariate meta-analysis was used to combine the 2 by 2 tables, producing a summary sensitivity and specificity.

The probability that sensitivity and specificity will both fall in that region (values > 80%) in a single setting = 0.57. [SROC plot: study estimates and summary point]

Going beyond the average? PART 2: Tailoring PPV and NPV to clinical settings (with validation)

PPV and NPV
PPV: probability of disease given a positive test result. NPV: probability of non-disease given a negative test result. Clinicians are usually more interested in PPV and NPV. They need to combine sensitivity and specificity with prevalence to obtain them (e.g. using Bayes' theorem, or an equation). But what values of sensitivity, specificity and prevalence do they take from a meta-analysis? Are the derived PPV and NPV reliable?

OPTION A: Deriving PPV and NPV
Take the pooled sensitivity, specificity and prevalence from a meta-analysis of existing studies (though this makes no use of local data). Applying Bayes' theorem we obtain predictions using:
PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence))
NPV = (specificity × (1 − prevalence)) / (specificity × (1 − prevalence) + (1 − sensitivity) × prevalence)
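As a minimal sketch of these two Bayes' theorem calculations - using the ear-thermometer summary estimates from Example 1 and an illustrative 20% prevalence, not a value from the talk:

```python
# Positive/negative predictive value via Bayes' theorem from pooled
# sensitivity, specificity, and prevalence.
def ppv_npv(sens, spec, prev):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Ear-thermometer summary estimates (sens 65%, spec 98%) with an
# illustrative prevalence of 20%
ppv, npv = ppv_npv(0.65, 0.98, 0.20)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Varying `prev` in this call is exactly the tailoring step in Option B below: the sensitivity and specificity stay pooled, the prevalence becomes local.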

OPTION B: Deriving PPV and NPV
Take the pooled sensitivity & specificity from the meta-analysis and combine them with the known prevalence in the local setting. Applying Bayes' theorem we obtain predictions as in Option A, but with the local prevalence in place of the pooled prevalence.

OPTION C: Deriving PPV and NPV
Develop a meta-regression with study-level covariates, then predict PPV and NPV using the fitted model. E.g. fit a bivariate meta-analysis of PPV and NPV from existing studies, with prevalence as a covariate (Leeflang et al.), and obtain predictions from the fitted regression equations.

Are predicted PPV and NPV reliable?
As for any risk prediction equation, we must check performance. Here good calibration is essential: do the predicted PPV & NPV agree with the observed PPV & NPV?
PROBLEM: It is well known that developed models are over-fitted (over-optimistic) in the data they were developed in (see work by Harrell, Steyerberg, etc.)
PROPOSAL: Use internal-external cross-validation (Royston et al., Debray et al.)

Internal-external cross-validation (IECV): example with 3 studies. Each cycle produces estimates of predictive performance (calibration statistics such as O:E). Meta-analysis can then be used to summarise across all cycles.

Internal-external cross-validation (IECV) helps answer the question: if I use meta-analysis to develop a prediction equation (model) for deriving PPV and NPV in a new population, is it likely to perform well? If not, will a different strategy work better?
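A minimal sketch of the IECV loop, with hypothetical 2x2 counts and crude proportion-based pooling standing in for the bivariate random-effects model used in the talk:

```python
# Internal-external cross-validation (IECV) sketch with hypothetical
# 2x2 counts per study: (tp, fn, tn, fp). Each cycle holds out one study,
# pools accuracy in the rest, predicts NPV in the held-out study, and
# compares prediction with observation.
studies = [(40, 12, 130, 6), (25, 9, 90, 5), (55, 15, 160, 9), (30, 11, 110, 4)]

ratios = []  # observed:expected (O:E) NPV ratio, one per held-out study
for i, held_out in enumerate(studies):
    rest = studies[:i] + studies[i + 1:]
    # Pool sensitivity and specificity over the development studies
    sens = sum(s[0] for s in rest) / sum(s[0] + s[1] for s in rest)
    spec = sum(s[2] for s in rest) / sum(s[2] + s[3] for s in rest)
    # Option B: combine pooled accuracy with the held-out study's prevalence
    tp, fn, tn, fp = held_out
    prev = (tp + fn) / (tp + fn + tn + fp)
    expected_npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    observed_npv = tn / (tn + fn)
    ratios.append(observed_npv / expected_npv)
    print(f"study {i}: O:E = {ratios[-1]:.3f}")
```

In the talk's workflow these per-cycle O:E statistics would themselves be combined in a (Bayesian) meta-analysis to summarise calibration across settings.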

Revisit the PTH example - can we use options A or B to produce reliable PPV and NPV for particular clinical settings? (i) Apply internal-external cross-validation; (ii) then summarise performance.

OPTION A: Predict PPV and NPV from meta-analysis estimates

OPTION B: Tailor PPV and NPV using local prevalence. But there is large uncertainty due to the small number of studies: a probability of only 40% that the true O:E is between 0.9 and 1.1.

NPV is most important for the PTH test: we want to know for sure that a patient does not have hypocalcaemia, so we can send them home. Q: What is the potential true NPV given a predicted NPV from option B? Internal-external cross-validation followed by Bayesian meta-analysis of calibration performance tells us: for a predicted NPV of 0.95, there is a 95% probability that the true NPV is between 0.78 and 0.99 - acceptable error?

Risk prediction models
Our examples focused on test research, but the principles apply to risk prediction research in general, e.g. multivariable models for diagnosis or prognosis. These allow the inclusion of patient-level (& study-level) covariates. The aim is the same: we should want reliable model predictions in all clinical settings, not just on average.

Time to go beyond the average performance and validate (improve) predictions in each setting, subgroup, and population of interest.

External validation of a prognostic model in breast cancer patients - does it calibrate well upon validation? OPTION 1: No recalibration. OPTION 2: Recalibration of the baseline hazard. [Calibration plots by country used for external validation]

Summary
A challenge for us all:
- Are our meta-analysis results useful?
- Should we move away from summary results?
- Focus rather on probabilistic statements?
- Leave out studies to validate test performance?
Much related work:
- in particular, tailored meta-analysis of Willis and Hyde
- overfitting & recalibration of prediction models (Steyerberg et al.)
- IPD meta-analysis of risk prediction studies (Debray et al.)
Other clinical measures are highly relevant (e.g. net benefit: Vickers et al.)

Some references
- Willis BH, Hyde CJ. Estimating a test's accuracy using tailored meta-analysis - how setting-specific data may aid study selection. J Clin Epidemiol 2014;67(5):538-46.
- Riley RD, Ahmed I, Debray TP, et al. Summarising and validating test accuracy results across multiple studies for use in clinical practice. Stat Med 2015;34(13):2081-103.
- Leeflang MM, Bossuyt PM, Irwig L. Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis. J Clin Epidemiol 2009;62(1):5-12.
- Leeflang MM, Deeks JJ, Rutjes AW, et al. Bivariate meta-analysis of predictive values of diagnostic tests can be an alternative to bivariate meta-analysis of sensitivity and specificity. J Clin Epidemiol 2012;65(10):1088-97.
- Leeflang MM, Rutjes AW, Reitsma JB, et al. Variation of a test's sensitivity and specificity with disease prevalence. CMAJ 2013;185(11):E537-44.
- Debray TP, Moons KG, Ahmed I, et al. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med 2013;32(18):3158-8.
KEELE COURSE: Statistical methods for IPD meta-analysis, 6-7th December 2016