Meta-analysis of diagnostic accuracy studies. Mariska Leeflang (with thanks to Yemisi Takwoingi, Jon Deeks and Hans Reitsma)

Similar documents
Systematic Reviews and meta-analyses of Diagnostic Test Accuracy. Mariska Leeflang

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015

Introduction to Meta-analysis of Accuracy Data

SYSTEMATIC REVIEWS OF TEST ACCURACY STUDIES

Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy

Meta-analysis of diagnostic research. Karen R Steingart, MD, MPH Chennai, 15 December Overview

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

Empirical assessment of univariate and bivariate meta-analyses for comparing the accuracy of diagnostic tests

Meta-analysis of diagnostic test accuracy studies with multiple & missing thresholds

Mariska Leeflang Birmingham, 19 July 2016

Multivariate Mixed-Effects Meta-Analysis of Paired-Comparison Studies of Diagnostic Test Accuracy

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

Meta-analysis of Diagnostic Test Accuracy Studies

Meta-analyses evaluating diagnostic test accuracy

Multivariate Mixed-Effects Meta-Analysis of Paired-Comparison Studies of Diagnostic Test Accuracy

Fixed Effect Combining

MIDAS RETOUCH REGARDING DIAGNOSTIC ACCURACY META-ANALYSIS

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /

Meta-analysis using RevMan. Yemisi Takwoingi October 2015

Supplementary Online Content

METHODS FOR DETECTING CERVICAL CANCER

State of the Art in Clinical & Anatomic Pathology. Meta-analysis of Clinical Studies of Diagnostic Tests

Assessing variability in results in systematic reviews of diagnostic studies

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Biostatistics II

Systematic Reviews of Studies Quantifying the Accuracy of Diagnostic Tests and Markers

Unit 1 Exploring and Understanding Data

Bayesian meta-analysis of Papanicolaou smear accuracy

Live WebEx meeting agenda

Reporting and methods in systematic reviews of comparative accuracy

Overview. Goals of Interpretation. Methodology. Reasons to Read and Evaluate

Critical reading of diagnostic imaging studies. Lecture Goals. Constantine Gatsonis, PhD. Brown University

Assessment of methodological quality and QUADAS-2

EVIDENCE-BASED GUIDELINE DEVELOPMENT FOR DIAGNOSTIC QUESTIONS

Studies reporting ROC curves of diagnostic and prediction data can be incorporated into meta-analyses using corresponding odds ratios

Classical Psychophysical Methods (cont.)

The cross sectional study design. Population and pre-test. Probability (participants). Index test. Target condition. Reference Standard

IAPT: Regression. Regression analyses

Review. Imagine the following table being obtained as a random. Decision Test Diseased Not Diseased Positive TP FP Negative FN TN

the standard deviation (SD) is a measure of how much dispersion exists from the mean SD = square root (variance)

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Statistical methods for the meta-analysis of full ROC curves

How to Conduct a Meta-Analysis

11/24/2017. Do not imply a cause-and-effect relationship

Systematic reviews of prognostic studies 3 meta-analytical approaches in systematic reviews of prognostic studies

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

Atherosclerosis 220 (2012) Contents lists available at ScienceDirect. Atherosclerosis

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering

Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews

Systematic Reviews. Simon Gates 8 March 2007

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

Types of Data. Systematic Reviews: Data Synthesis Professor Jodie Dodd 4/12/2014. Acknowledgements: Emily Bain Australasian Cochrane Centre

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

Clincial Biostatistics. Regression

Critical Thinking A tour through the science of neuroscience

Cochrane Pregnancy and Childbirth Group Methodological Guidelines

CHAMP: CHecklist for the Appraisal of Moderators and Predictors

STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012

Statistics, Probability and Diagnostic Medicine

Systematic reviews of diagnostic test accuracy studies

Statistical methods for the meta-analysis of full ROC curves

Models for potentially biased evidence in meta-analysis using empirically based priors

Correlation and regression

Thresholds for statistical and clinical significance in systematic reviews with meta-analytic methods

BMI 541/699 Lecture 16

Zheng Yao Sr. Statistical Programmer

Weight Adjustment Methods using Multilevel Propensity Models and Random Forests

Clinical Epidemiology for the uninitiated

Study protocol v. 1.0 Systematic review of the Sequential Organ Failure Assessment score as a surrogate endpoint in randomized controlled trials

Clinical research in AKI Timing of initiation of dialysis in AKI

Diagnostic screening. Department of Statistics, University of South Carolina. Stat 506: Introduction to Experimental Design

Business Statistics Probability

Assessment of performance and decision curve analysis

Biostatistics for Med Students. Lecture 1

Choice of axis, tests for funnel plot asymmetry, and methods to adjust for publication bias

University of Wollongong. Research Online. Australian Health Services Research Institute

Longitudinal and Hierarchical Analytic Strategies for OAI Data

Numerical Integration of Bivariate Gaussian Distribution

Regina Louie, Syntex Research, Palo Alto, California Alex Y aroshinsky, Syntex Research, Palo Alto, California

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Novel Approaches and New Methods to Increase Case Detection by Microscopy

Development of restricted mean survival time difference in network metaanalysis based on data from MACNPC update.

GATE CAT Diagnostic Test Accuracy Studies

Sensitivity, specicity, ROC

Evidence Based Medicine

Profile Analysis. Intro and Assumptions Psy 524 Andrew Ainsworth

Thyroid nodules are extremely common and are observed

The recommended method for diagnosing sleep

Meta Analysis. David R Urbach MD MSc Outcomes Research Course December 4, 2014

Summarising and validating test accuracy results across multiple studies for use in clinical practice

Understandable Statistics

Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP I 5/2/2016

MicroRNAs are novel non-invasive diagnostic biomarkers for pulmonary embolism: a meta-analysis

CHAPTER ONE CORRELATION

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

18/11/2013. An Introduction to Meta-analysis. In this session: What is meta-analysis? Some Background Clinical Trials. What questions are addressed?

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

A Case Study: Two-sample categorical data

7/17/2013. Evaluation of Diagnostic Tests July 22, 2013 Introduction to Clinical Research: A Two week Intensive Course

Transcription:

Meta-analysis of diagnostic accuracy studies Mariska Leeflang (with thanks to Yemisi Takwoingi, Jon Deeks and Hans Reitsma) 1

Diagnostic Test Accuracy Reviews 1. Framing the question 2. Identification and selection of studies 3. Quality assessment 4. Data extraction 5. Data analysis 6. Interpretation of the results 2

Ultimate t goal of meta-analysis Robust conclusions with respect to the research question(s) 3

Meta-Analysis 1. Calculation of an overall summary (average) of high precision, coherent with all observed data 2. Typically a weighted average is used where more informative (larger) studies have more say 3. Assess the degree to which h the study results deviate from the overall summary 4. Investigate possible explanations for the deviations 4

The (meta-)analytic ti process 1. What analyses did you plan? a. Primary objective b. Subgroups, sensitivity analyses, etc. 2. What are the data at hand? a. Forest plots b. Raw ROC plots c. Variation in predefined covariates? 3. Is meta-analysis analysis appropriate? a. Sufficient clinical/methodological homogeneity b. Enough studies per review question 4. Meta-analysis 5

Summary of which h values? Sensitivity Specificity - Positive Predictive Value Negative Predictive Value Positive Likelihood Ratio Negative Likelihood Ratio Diagnostic Odds ratio ROC curves Disease (Ref. test) Pres. Abs. Index + TP FP Test - FN TN 6

Pooling sensitivity and specificity? it 7

Pooling sensitivity and specificity? it 8

Pooling Likelihood Ratios? 9

Pooling LRs? 10

Pooling odds ratios? 11

Let s focus on sensitivity and specificity Predictive values are directly depending on prevalence Pooling likelihood ratios may lead to misleading / impossible results Pooling odds ratios may be okay, but are difficult to interpret. From the pooled sensitivity and specificity, it is still possible to calculate LRs and PVs. 12

Descriptive Analysis Forest plots point estimate with 95% CI paired: sensitivity and specificity sideby side 13

14

Descriptive Analysis Forest plots point estimate with 95% CI paired: sensitivity and specificity it sideby side ROC plot pairs of sensitivity & specificity in ROC space bubble plot to show differences in precision 15

Plot in ROC Space 1.0 0.8 True positive ra ate 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate 16

Different Approaches Pooling separate estimates t Not recommended Summary ROC model Traditional approach, relative simple More complex models Bivariate random approach Hierarchical summary ROC approach 17

Threshold effects 1 6.8 Fetal fibronectin for predicting spontaneous birth Decreasing threshold increases sensitivity but decreases specificity Increasing threshold increases specificity but decreases sensitivity sensitivity.4. 0.2 1.8.6.4 specificity.2 0 18

Implicit and explicit threshold effects Explicit threshold: different thresholds are used for test positivity Implicit threshold: there is no or only one threshold, h but in some cases tests t are earlier regarded as positive than in other cases 19

Explicit threshold: (ROC) curve The ROC curve represents the relationship between the true positive rate (TPR) and the false positive rate (FPR) of the test at various thresholds used to distinguish disease cases from non-cases. Deeks, J. J BMJ 2001;323:157-162 20

Implicit it thresholdh ELISA for invasive aspergillosis; cutoff value 1.5 ODI. 21

Diagnostic odds ratios Ratio of the odds of positivity in the diseased to the odds of positivity in the non-diseased Diagnostic OR TP FP TN FN DOR sensitivity 1 sensitivity 1 specificity specificity LR LR ve ve 22

Diagnostic odds ratios Cervical Cancer (Biopsy) Present Absent HPV + 65 93 158 Test - 7 161 198 72 254 356 DOR 65 161 93 7 16 23

Diagnostic odds ratios Sensitivity Specificity 50% 60% 70% 80% 90% 95% 99% 50% 1 2 2 4 9 19 99 60% 2 2 4 6 14 29 149 70% 2 4 5 9 21 44 231 80% 4 6 9 16 36 76 396 90% 9 14 21 36 81 171 891 95% 19 29 44 76 171 361 1881 99% 99 149 231 396 891 1881 9801 24

Symmetrical ROC curves and diagnostic odds ratios 1.8 Sensitivity.4.6.2 uninformative test As DOR increases, the ROC curve moves closer to its ideal position near the upper-left corner. line of symmetry ROC curve is asymmetric when test accuracy varies 0with threshold 1.8.6.4 Specificity.2 0 DOR = 90 DOR = 6 DOR = 15 DOR = 3 25

Statistical modelling of ROC curves statisticians like straight lines with axes that are independent variables first calculate the logits of TPR and FPR and then graph the difference against their sum 26

Translating ROC space to D versus S 1.0 6 True positiv ve rate 0.8 0.6 0.4 0.2 D = log od dds ratio 5 4 3 2 1 0 0.0-1 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate -6-5 -4-3 -2-1 0 1 27 2 S

Moses-Littenberg SROC method 6 What do the axes mean? Difference in logits is the log of the DOR Sum of the logits is a marker of diagnostic threshold odds ratio D = log 5 4 3 2 1 0-1 -6-5 -4-3 -2-1 0 1 2 S 28

Moses-Littenberg SROC method Regression models can be used to fit the straight lines to model relationship between test accuracy and test threshold D = a + bs Outcome variable D is the difference in the logits Explanatory variable S is the sum of the logits Ordinary or weighted regression weighted by sample size or by inverse variance of the log of the DOR 29

Linear Regression 6 5 4 3 D 2 1 0-1 -6-5 -4-3 -2-1 0 1 2 S 30

Producing summary ROC curves Transform back to the ROC dimensions where a is the intercept, b is the slope when the ROC curve is symmetrical, b=0 and the equation is simpler 31

Linear Regression & Back Transformation 6 1.0 5 4 0.8 Q D 3 2 1 True positive rate 0.6 0.4 0 0.2-1 -6-5 -4-3 -2-1 0 1 2 S 0.0 0.0 0.2 0.4 0.6 0.832 1.0 False positive rate

Different situations ti What is the relationship between the underlying distribution and the ROC curve and the D versus S line? Let s have a look at different situations. 33

ROC curve and logit difference and sum plot: small difference, same spread re elative frequenc cy 0.1 008 0.08 non-diseased diseased 0.06 0.04 0.02 0 0 20 40 60 80 100 measurement ive rate (%age e) true positi 100 80 60 40 20 0 0 20 40 60 80 100 false positive rate (%age) logit TPR - log git FPR 10 6 2-40 -20 0 20 40-2 logit TPR + logit FPR 34

ROC curve and logit difference and sum plot: moderate difference, same spread rela ative frequenc cy 0.1 0.08 0.06 0.04 0.02 0 non-diseased diseased 0 20 40 60 80 100 measurement 100 ive rate true positi (%ag ge) 80 60 40 20 0 logit FPR logit TPR - l 8 4 0-30 -20-10 0 10 20 30 40-4 0 20 40 60 80 100 logit TPR + logit FPR false positive rate (%age) 35

ROC curve and logit difference and sum plot: large difference, same spread relati ive frequency 01 0.1 0.08 non-diseased diseased 0.06 0.04 0.02 0 0 20 40 60 80 100 measurement 100 true positive rate (%age) 80 60 40 20 0 0 20 40 60 80 100 logit FPR logit TPR - l 8 4 0-30 -20-10 0 10 20 30 40-4 logit TPR + logit FPR false positive rate (%age) 36

ROC curve and logit difference and sum plot: moderate difference, unequal spread 0.1 ive frequency relati 0.08 0.06 0.04 0.02 0 non-diseased diseased 0 20 40 60 80 100 measurement HIGH DOR tr rue positive ra ate (%age) 100 80 60 40 20 0 0 20 40 60 80 100 LOW DOR logit fpr logit tpr - l 10 8 6 4 2 0-30 -20-10 -2 0 10 20 30-4 -6 logit tpr + logit fpr false positive rate (%age) 37

SROC regression: another example 10 1.0 7 0.8 6 Sensitivity0.6 0.4 0.2 5 4 3 2 unweighted weighted 0.0 0.0 0.2 0.4 0.6 0.8 1.0 1 - Specificity 1-4 -3-2 -1 0 1 2 S Transformation linearizes relationship between accuracy and threshold so that linear regression can be used 38

PSV example cont. 7 1.0 6 0.8 5 4 3 unweighted weighted Sensitivity0.6 0.4 2 0.2 1-4 -3-2 -1 0 1 2 S 0.0 0.0 0.2 0.4 0.6 0.8 1.0 1 - Specificity inverse transformation The SROC curve is produced by using the estimates of a and b to compute the expected sensitivity (tpr) across a range of values for 1-specificity (fpr) 39

Problems with the Moses-Littenberg SROC method Poor estimation Tends to underestimate test accuracy due to zero-cell corrections and bias in weights Validity of significance tests Sampling variability in individual studies not properly taken into account P-values and confidence intervals erroneous Operating points knowing average sensitivity/specificity is important but cannot be obtained Sensitivity for a given specificity can be estimated 40

Advanced models HSROC and Bivariate methods Hierarchical / multi-levellevel allows for both within and between study variability, and within study correlations between diseased and non- diseased groups Logistic correctly models sampling uncertainty in the true positive proportion and the false positive proportion no zero cell adjustments needed Random effects allows for heterogeneity between studies Regression models used to investigate sources of heterogeneity 41

Parameterizations HSROC Mean lndor Variance lndor Mean threshold Variance threshold Shape of ROC Bivariate Mean logit sens Variance logit sens Mean logit spec Variance logit spec Correlation between sensitivity and specificity Other than the parameterization, the models are mathematically equivalent, see Harbord R, Deeks J et al. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 2006;1:1-21. 42

Hierarchical SROC model 1 accuracy shape threshold Sens sitivity.5 0 1.5 Specificity 0 43

Bivariate model Sens sitivity 1.5 correlation specificity sensitivity 0 1.5 Specificity 0 44

Outputs from the models HSROC Estimates underlying SROC curve, and the average operating point on the curve (mean DOR and mean threshold) Possible to estimate mean sensitivity, specificity and mean likelihood ratios, with standard errors obtained using the delta method Confidence and prediction ellipses estimable Bivariate Estimates the average operating point (mean sensitivity and specificity), confidence and prediction ellipses Possible to estimate mean likelihood ratios, with standard errors obtained using the delta method Underlying SROC curve estimable 45

Fitting the models HSROC Bivariatei Hierarchical model with non-linear regression, random effects and binomial error Original code in winbugs Easy to fit in PROC NLMIXED in SAS Hierarchical model with linear regression, random effects and binomial error Easy to fit in PROC NLMIXED in SAS, can be fitted in PROC MIXED Also in GLLAMM in STATA, MLWin 46

Syntax Proc NLMIXED - HSROC proc nlmixed data=diag di ; parms alpha=4 theta=0 beta=0 s2ua=1 s2ut=1; logitp = (theta + ut + (alpha + ua) * dis) * exp(-(beta)*dis); p = exp(logitp)/(1+exp(logitp)); model pos ~ binomial(n,p); random ua ut ~ normal([0, 0], [s2ua,0,s2ut]) subject=study; shape Disease indicator 47

Hierarchical SROC model 1 accuracy shape threshold Sens sitivity.5 0 1.5 Specificity 0 48

Syntax Proc NLMIXED - Bivariate proc nlmixed data=diag di ; parms msens=1 mspec=2 s2usens=0.2 s2uspec=0.6 cov=0; logitp = (msens + usens)*dis + (mspec + uspec)*nondis; p = exp(logitp)/(1+exp(logitp)); model pos ~ binomial(n,p); random usens uspec ~ normal([0, 0], [s2usens,cov,s2uspec]) subject=study; 49

Bivariate model Sens sitivity 1.5 correlation specificity sensitivity 0 1.5 Specificity 0 50

METADAS SAS macro developed to automate HSROC/bivariate analysis using PROC NLMIXED Can be used together th with Review Manager 5 (Cochrane review Software): Plot summary curve(s) Display summary point(s) Display 95% confidence and/or prediction regions for summary point(s) 51

Part 2 dealing with heterogeneity

The meta-analyst ss dream! 1,00 090 0,90 0,80 s e n s i t i v i t y 070 0,70 0,60 0,50 0,40 0,30 0,20 0,10 0,00 0,00 0,20 0,40 0,60 0,80 1,00 1-specificity 53

Realistic situation: vast heterogeneity 54

Echocardiography in Coronary Heart Disease 1.0 0.8 Se ensitivity 0.6 04 0.4 0.2 0.0 00 0.0 02 0.2 04 0.4 06 0.6 08 0.8 10 1.0 1-specificity 55

GLAL in Gram Negative Sepsis 1.0 0.8 Se ensitivity 0.6 04 0.4 0.2 0.0 00 0.0 02 0.2 04 0.4 06 0.6 08 0.8 10 1.0 1-specificity 56

F/T PSA in the Detection of Prostate cancer 1.0 0.8 Sensitivit ty 0.6 0.4 0.2 0.0 00 0.0 02 0.2 04 0.4 06 0.6 08 0.8 10 1.0 1-specificity 57

Dip-stick Testing for Urinary Tract Infection 1.0 0.8 ity Sensitiv 06 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 1-specificity 58

Sources of Variation I. Chance I. variation II. III. IV. Differences II. in threshold Bias III. Clinical IV. subgroups V. Unexplained V. variation 59

Sources of Variation: Chance Chance variability: Chance variability: sample size=40 sample size=100 1.0 1.0 0.8 0.8 Sensitivity 0.6 0.4 Sensitivity 0.6 0.4 0.2 0.2 0.0 00 0.0 1.0 0.8 0.6 0.4 0.2 0.0 Specificity 1.0 0.8 0.6 0.4 0.2 0.0 Specificity 60

Sources of Variation: Threshold 1.0 Sen nsitivity 0.8 0.6 0.4 Threshold: perfect negative correlation no chance variability 0.2 0.0 1.0 0.8 0.6 0.4 0.2 0.0 Specificity 61

Sources of Variation: Threshold Sen nsitivity 1.0 0.8 0.6 0.4 Threshold: h perfect negative correlation + chance variability ss=60 0.2 0.0 1.0 0.8 0.6 0.4 0.2 0.0 Specificity 62

Sources of Variation: Bias & Subgroup Sen nsitivity 1.0 0.8 0.6 0.4 Bias & Subgroup: sens & spec higher ss=60 no threshold 0.2 0.0 1.0 0.8 0.6 0.4 0.2 0.0 Specificity 63

Sources of Variation I. Chance variation II. III. IV. Differences in threshold Bias Subgroups V. Unexplained variation 64

Comparison Feature Older Model* Advanced models** Chance variability +/- + Threshold differences + + Subgroup + + Unexplained variation +/- + * Moses-Littenberg model ** Hierarchical and bivariate models 65

Exploring heterogeneity Summarise data per subgroup Subgroup analyses Meta-regression analysis Covariates Study characteristics (patients, index tests, reference standard, setting, disease stage, etc.) Methodological quality items (QUADAS items) 66

Subgroup analysis and metaregression Advanced models can easily incorporate studylevel covariates Different questions can be addressed: differences in summary points of sensitivity or specificity it differences in overall accuracy differences in threshold differences in shape of SROC curve 67

Limitations of meta-regression Validity of covariate information poor reporting on design features Population characteristics information missing or crudely available Lack of power small number of contrasting studies 68

Subgroup analyses 1.0 0.8 Subgroup 1: both sens & spec higher nsitivity Se 0.6 0.4 0.2 0.0 1.0 0.8 0.6 0.4 0.2 0.0 Specificity 69

Prospective vs. Retrospective studies 1.0 0.8 Sensitivity 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 1-specificity Data collection: Prosp Retro 70

This may look easy, but The following slides give the results of a study we did to incorporate the effects of quality into a meta-analysis. Leeflang et al. Impact of adjustment for quality on results of metaanalyses of diagnostic accuracy. Clin Chem. 2007;53:164-72. 71

Effects of high/low h/l Q? 1. Change in DOR 2. Change in consistency of DOR 3. Change in heterogeneity 72

Hypotheses Deficiencies in study quality have been associated with inflated estimates and with heterogeneity. Accounting for quality differences will therefore lead to less optimistic summary estimates. more homogenous results. 73

Challenge 3 Incorporation Strategies t 1. Ignoring (sometimes graphs are shown) pooling all studies, disregarding quality 2. Subgroup Analysis also: quality as criterion for inclusion also: stratification more than one subgroup also: sensitivity analysis 3. Regression analysis Stepwise multivariable regression analysis and Multivariable regression analysis with a fixed set of covariates 4. Weighted pooling not done 5. Sequential analysis highest quality lowest quality cumulative meta-analysis 74

Methods Quality assessment in 487 studies included d in 30 systematic ti reviews. QUADAS checklist used (Whiting et al. BMC Med Res Methodol, 2003) Two definitions for high-quality: 1. Evidence-based definition 2. Common practice definition Three methods for incorporation of quality: 1. Exclusion of low quality studies 2. Multivariable regression analysis with all items involved 3. Stepwise multivariable regression analysis (p>0.2) Comparison of DORs, 95% CI of DORs, and changes in a hypothetical decision. 75

Evidence-based definition 76

Common practice definition iti 77

Results Nonreporting of items was common, especially for blinding of index or reference test; time-interval between index test and reference test; and about inclusion of patients. Evidence-based definition: 72 high quality studies (15%); 12 reviews contained no high-quality studies. Common-practice definition: 70 high quality studies (14%); 9 reviews contained no high-quality studies. Fulfilling all 8 criteria: only 10 out of 487 studies were of high quality and only 1 meta-analysis out of 31 contained more than 3 high-quality studies 78

The Strategiest Ignoring quality: Pooling all studies Analyzing subgroups: Only pooling high-quality studies; high quality defined as fulfilling a certain subset of criteria. Stepwise QUADAS-items with a p-value <0.2 multivariable univariate are entered in a regression multivariable regression model analysis: Multivariable regression analysis with a set of covariates: A standard set of three QUADASitems was used as covariates in each meta-analysis analysis. 79

ID MA DOR 80

Conclusions? We found no evidence for our hypothesis that adjusting for quality leads to less optimistic and more homogenous results. Explanations: Poor reporting Small sample size (30 SRs, small studies) Opposite effects of quality items DOR in stead of sensitivity and specificity Relation quality estimates not straightforward Still, poor quality will affect the trustworthiness. Therefore, report quality of individual studies and overall quality. 81

Exercise What do the results of a meta- analysis mean? I have some Output from SAS and STATA and would like to invite you to have a look at them. 82

Bivariate or HSROC? What do the parameters mean? 83

84

Part 3 Test Comparisons

Differences between tests Diagnosis of lymph node metastasis in women with cervical cancer 2 imaging modalities: lymphangiography (LAG, n=17) CT (n=17) Published meta-analysis analysis JAMA 1997;278:1096-11011101 Modelled by adding covariate for test into the model statement, e t, and parameter a estimates for differences e in: Sensitivity and specificity for bivariate Log DOR, threshold and shape for HSROC 86

ROC plot of individual study results (L=lymphangiography C=CT) 1.0 Sens sitivity L L 08 0.8 L C L C C L C L C C L C L L 0.6 C L L 0.4 C L L L CC C 0.2 C C C L 0.0 C C 0.0 0.2 0.4 0.6 0.8 1.0 1-specificity 87

Summary ROC estimates t 1.0 CT itive rate True pos L L 08 0.8 L C 0.6 0.4 L L C CLL C L C L C L L C L C L L L L C L L CC C L LAG L 02 0.2 LC C C L L 0.0 CL C 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate 88

Average operating points and confidence ellipses 1.0 Sens sitivity L L LAG 08 0.8 L C 0.6 CT 0.4 L C C C L C L C L C L L C L L C L CC C L L 0.2 C C C L 0.0 C C 0.0 0.2 0.4 0.6 0.8 1.0 1-specificity 89

Difference between average operating points Imaging modality Sensitivity (95% CI) Specificity (95% CI) LAG 0.67 (0.57 to 0.76) 0.80 (0.73 to 0.85) CT 049(037to061) 0.49 (0.37 0.61) 092(088to095) 0.92 (0.88 0.95) P-value Lag vs. CT 0.023 0.0002 90

Summary points or SROC curves? Clinical interpretation Need to estimate performance at a threshold, using sensitivity, specificity or/and likelihood ratios Single threshold or mixed thresholds? Summary curve describes how test performance varies across thresholds. Studies do not need to report a common threshold to contribute. Summary point must relate to a particular threshold. Only studies reporting a common threshold can be combined. 91

Summary points or SROC curves? Comparing tests and subgroups Often wish to use as much data as possible if this means mixing thresholds SROC curves are needed d if still a common threshold either method appropriate Possible to assess impact of threshold as a covariate SROC curves allow identification of crossing lines A Cochrane review may include both an analysis of the SROC curves, and estimation of average threshold specific operating points 92

Comparative analyses Indirect comparisons Different tests used in different studies Potentially confounded d by other differences between the studies Direct comparisons Patients receive both tests or randomized to tests Differences in accuracy more attributable t bl to the tests t Few studies may be available and may not be representative 93

Example of pilot Cochrane Review Down Syndrome screening review Studies Participants 1st trimester - NT alone 10 79,412 1st trimester - NT and serology 22 222,171 2nd trimester - triple test (serology) 19 72,797 94

95

Indirect comparison NT alone Sensitivity: 72% (63%-79%) Specificity: 94% (91% -96%) DOR: 39 (26-60) NT with serology Sensitivity: 86% (82%-90%) Specificity: 95% (93%-96%) 96%) DOR: 110 (84-143) RDOR: 2.8 (1.7-4.6), p <0.0001 Triple test Sensitivity: 82% (76%-86%) Specificity: 83% (77%-87%) DOR: 21 (15-30) RDOR: 0.5 (0.3-0.9), p = 0.03 96

DIRECT COMPARISONS NT alone Sensitivity: 71% (59%-82%) Specificity: 95% (91%-98%) DOR: 41 (16-67) NT with serology Sensitivity: 85% (77%-93%) Specificity: 96% (93%-98%) DOR: 123 (40-206) Triple test No paired studies available 97

Indirect versus Direct comparisons NT alone Sensitivity: 72% (63%-79%) Specificity: 94% (91% -96%) DOR: 39 (26-60) NT alone Sensitivity: 71% (59%-82%) Specificity: 95% (91%-98%) 98%) DOR: 41 (16-67) NT with serology Sensitivity: 86% (82%-90%) Specificity: 95% (93%-96%) 96%) DOR: 110 (84-143) RDOR: 2.8 (1.7-4.6), p <0.0001 NT with serology Sensitivity: 85% (77%-93%) Specificity: 96% (93%-98%) DOR: 123 (40-206) 98

Part 4 Some other issues

Another approach Hypothesis testing is not common in diagnostic test accuracy research or in diagnostic meta-analyses. But you could test whether the studies you found or whether e the summary estimate falls within a certain target region. 100

Target region 100 Target region Tr ue positive rate (sen s) 80 60 40 20 0 0 20 40 60 80 100 False positive rate (1-spec) 101

100 Target region Tr ue positive rate (sen s) 80 60 40 20 0 0 20 40 60 80 100 False positive rate (1-spec) 102

Publication bias In systematic reviews of intervention studies, publication bias is an important form of bias To investigate publication bias in reviews, funnel plots are used. In diagnostic reviews, funnel plots are seriously misleading and alternatives have poor power. 103

Publication bias - background many studies are done without ethical review or study registration prospective registration is therefore not available diagnostic test accuracy studies do not test hypotheses, so there is no significance involved we have no clue whether publication bias exists for diagnostic accuracy studies and how the mechanisms behind it may work 104

Summary Part 1: meta-analysis introduction Part 2: heterogeneity Part 3: test comparisons Part 4: some other issues 105