Critical appraisal: Systematic Review & Meta-analysis
Atiporn Ingsathit, MD, PhD
Section for Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University

What is a review?
A review provides a summary of evidence to answer important practice and policy questions without readers having to spend the time and effort to summarize the evidence themselves.

Type of review
* Narrative review (conventional review)
  * Review article
  * Chapter from textbook
* Systematic review

Why do we need systematic reviews?

Problems of conventional review
* Broad clinical questions
* Unsystematic approach to collecting evidence
* Unsystematic approach to summarizing evidence
* Tend to be biased by the author's opinions
* Large volume of evidence
* Conflicting evidence

What is a systematic review?
A review of a particular subject undertaken in such a systematic way that the risk of bias is reduced. Systematic reviews have explicit, scientific, and comprehensive descriptions of their objectives and methods.
Hunink, Glasziou et al, 2001.

AIMS
* Systematic: to reduce bias
* Explicit (precisely and clearly expressed): to ensure reproducibility

[Figure: Systematic review]

What is a meta-analysis?
* The analysis of multiple studies, including statistical techniques for merging and contrasting results across studies.
* Synonyms: research synthesis, systematic overview, pooling, and scientific audit.
* Focuses on contrasting and combining results from different studies in the hope of identifying patterns among study results.
* Quantitative methods are applied only after a rigorous qualitative selection process.
Hunink, Glasziou et al, 2001.

Meta-analysis
* Estimates treatment effects
* Reduces the probability of false-negative results (increases the power of the test)
* Potentially leads to a more timely introduction of effective treatments

Process of conducting a systematic review and meta-analysis
1. Define the question: PICO
2. Conduct literature search
   Sources: databases, experts, funding agencies, pharmaceutical companies, hand-searching, references
3. Identify titles and abstracts
4. Apply inclusion and exclusion criteria
   Titles and abstracts -> full articles -> final eligible articles (agreement)
5. Create data abstraction
   Data abstraction, methodologic quality, agreement on validity
6. Conduct analysis
   * Determine method of generating pooled estimates
   * Pooled estimates (if appropriate)
   * Explore heterogeneity, conduct subgroup analysis
   * Explore publication bias

Users' guides for how to use review articles
Gordon Guyatt, Roman Jaeschke, Kameshwar Prasad, and Deborah J Cook. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 2008.

1. Assess the systematic review validity.
* Did the review explicitly address a sensible clinical question?
* Did the review include explicit and appropriate eligibility criteria?
* Was biased selection and reporting of studies unlikely?
* Was the search for relevant studies detailed and exhaustive?
* Were the primary studies of high methodologic quality?
* Were assessments of studies reproducible?

2. What are the results?
* Were the results similar from study to study?
* What are the overall results of the review?
* How precise were the results?

3. How can I apply the results to patient care?
* Were all patient-important outcomes considered?
* Are any postulated subgroup effects credible?
* What is the overall quality of the evidence?
* Are the benefits worth the costs and potential risks?

Validity criteria 1. Did the Review Explicitly Address a Sensible Clinical Question?
P: Lupus nephritis
I: Mycophenolate mofetil (MMF)
C: Cyclophosphamide (CYC)
O: Complete remission, partial remission, adverse events

Validity criteria 2. Did the review include explicit and appropriate eligibility criteria?
* Range of patients (older/younger, severity)
* Range of interventions (dose, route)
* Range of outcomes (short/long-term, surrogate/clinical)

Validity criteria 3. Was biased selection and reporting of studies unlikely?
Clear inclusion and exclusion criteria

Guides by topic
* Therapy: Were patients randomized? Was follow-up complete?
* Diagnosis: Was the patient sample representative of those with the disorder? Was the diagnosis verified using a gold standard, and independently?
* Harm: Did the investigators demonstrate similarity in all known determinants of outcome or adjust for differences in the analysis? Was follow-up sufficiently complete?
* Prognosis: Was there a representative sample of patients? Was follow-up sufficiently complete?

Study Search and Selection
One reviewer (NK) electronically searched the MEDLINE database using PubMed (National Library of Medicine, Bethesda, MD) (1951 to December 2009) and Ovid (Wolters Kluwer, New York, NY) (1966 to December 2009), and the Cochrane Central Register of Randomized Controlled Trials (CENTRAL; The Cochrane Library issue 4, 2009) (United States Cochrane Center, Baltimore, MD). Search terms used without language restriction were as follows: (mycophenolate mofetil or mycophenolate) and cyclophosphamide and (lupus nephritis or glomerulonephritis), limited to randomized controlled trial. Two reviewers (NK and AT) independently screened titles and abstracts.

Validity criteria 4. Was the Search for Relevant Studies Detailed and Exhaustive?
* Why should effort be exerted to search for published and unpublished articles?
* What articles tend to be published more: the ones with positive or negative results?
* If positive articles tend to be published more, how will this affect meta-analyses of treatment interventions?

Publication bias
* Positive studies are more likely
  * to be published
  * to be published in English
  * to be cited by other authors
  * to produce multiple publications
* Large studies are more likely to be published even if they have negative results
* Quality of study: lower methodologic quality shows larger effects
* Bias due to association between treatment effect and study size

Publication bias assessment
Using the Egger test on the 5 trials, we found borderline evidence of bias (coefficient = 2.03, SE = 0.64, p = 0.049) from small-study effects.
[Figure: Funnel plot for complete remission]
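
To make the Egger test concrete, here is a minimal sketch in Python of the regression behind it: each study's standardized effect (effect / SE) is regressed on its precision (1 / SE), and an intercept far from zero suggests small-study effects. The function name `egger_intercept` and the input values are hypothetical; they are not the 5 trials from the review, so the coefficient reported on the slide is not reproduced here.

```python
import math

def egger_intercept(effects, std_errors):
    """Egger regression: standardized effect (effect/SE) on precision (1/SE).

    Returns (intercept, SE of intercept, t statistic with k - 2 df).
    An intercept far from 0 suggests funnel-plot asymmetry.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("need at least 3 studies")
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance and the usual OLS standard error of the intercept
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (k - 2)
    se_int = math.sqrt(s2 * (1.0 / k + mx * mx / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t

# Hypothetical log effect sizes: the smaller studies (larger SE) show larger effects
b0, se0, t = egger_intercept([0.9, 0.6, 0.35, 0.3], [0.5, 0.25, 0.2, 0.1])
print(f"intercept = {b0:.2f}, SE = {se0:.2f}, t = {t:.2f}")
```

The p value would come from comparing t with a t distribution on k - 2 degrees of freedom; that step is left out to keep the sketch free of external libraries.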

Validity criteria 5. Were the Primary Studies of High Methodologic Quality?
* Methodologic quality
* PRISMA guidelines

Validity criteria 6. Were Assessments of Studies Reproducible?
* Having 2 or more people participate in each decision
* Good agreement

Data Extraction and Risk Assessment
Two reviewers (NK and AT) independently performed data extraction. We extracted trial characteristics (for example, study design, sample size, treatment dosage and duration, WHO classification, renal biopsy information) and definitions (complete remission and complete/partial remission).
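
The slide asks for good agreement between reviewers but does not name a statistic. As an assumption, the sketch below uses Cohen's kappa, one common chance-corrected measure of agreement, on hypothetical include/exclude decisions from two reviewers; the function name `cohens_kappa` and the data are illustrative only.

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater1)
    # Observed proportion of agreement
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    categories = set(rater1) | set(rater2)
    p_exp = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    if p_exp == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical screening decisions for 10 abstracts
reviewer_a = ["in", "in", "in", "in", "out", "out", "out", "out", "out", "out"]
reviewer_b = ["in", "in", "in", "out", "out", "out", "out", "out", "out", "in"]
print(round(cohens_kappa(reviewer_a, reviewer_b), 2))
```

Here the reviewers agree on 8 of 10 abstracts, but part of that agreement is expected by chance, which is why kappa is lower than the raw 0.8.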

Results

Results 1. Were the results similar from study to study?
Explore heterogeneity: what does heterogeneity mean?
* The results differ significantly between studies.
* The possibility of excess variability between the results of the different trials/studies is examined by the test of heterogeneity.

Explore heterogeneity: why?
* The studies might not have been conducted according to a common protocol.
* Variations in patient groups, clinical setting, concomitant care, and the methods of delivery of the intervention, or in the method of measurement of exposure for observational studies.

How do we detect heterogeneity?
1) Visual interpretation
2) Statistical tests (e.g. Cochran Q test, where p < 0.1 implies heterogeneity, or I² > 0.7)

[Figure: Visual interpretation]

Statistical tests
Statistical test (1): test of heterogeneity (yes/no), Cochran Q
* The null hypothesis of the test for heterogeneity is that the underlying effect is the same in each of the studies.
* A low P value means that random error is an unlikely explanation of the differences in results from study to study.
* A high P value increases our confidence that the underlying assumption of pooling holds true.

Statistical test (2): magnitude of heterogeneity, I² statistic
* Provides an estimate of the percentage of variability in results across studies that is likely due to true differences in treatment effect as opposed to chance.
* As the I² increases, we become progressively less comfortable with a single pooled estimate, and need to look for explanations of variability other than chance.
* I² < 0.25: small heterogeneity
* I² 0.25-0.5: moderate heterogeneity
* I² > 0.5: large heterogeneity

Plot study results: forest plot or metaview
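
As an illustration of the two statistics above, the following Python sketch computes Cochran's Q and I² from study effects and their standard errors using inverse-variance weights. The function names and the three input studies are hypothetical; the labels follow the cut-offs listed on the slide.

```python
def heterogeneity_stats(effects, std_errors):
    """Return (Q, degrees of freedom, I2 as a proportion) for a set of studies."""
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran Q: weighted squared deviations of each study from the pooled effect
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2 = (Q - df) / Q, truncated at 0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

def i2_label(i2):
    """Label I2 using the cut-offs from the slide (0.25 and 0.5)."""
    if i2 < 0.25:
        return "small"
    if i2 <= 0.5:
        return "moderate"
    return "large"

# Hypothetical log risk ratios from three studies with equal standard errors
q, df, i2 = heterogeneity_stats([0.1, 0.3, 0.5], [0.1, 0.1, 0.1])
print(f"Q = {q:.1f} on {df} df, I2 = {i2:.2f} ({i2_label(i2)})")
```

Under the null hypothesis, Q is compared with a chi-square distribution on k - 1 degrees of freedom to obtain the P value; that lookup is omitted here.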

What can authors do if there is heterogeneity?
1) Identify the source of heterogeneity
2) Try to group studies into homogeneous categories (sensitivity analysis)
3) No statistical combination (no meta-analysis)

Results 2. What are the overall results of the review?

Results 3. How precise were the results?
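
Precision is usually read from the confidence interval around the pooled estimate. The sketch below pools log risk ratios with a fixed-effect inverse-variance model and reports a 95% confidence interval on the risk ratio scale. The fixed-effect model is an assumption made for brevity (the slides do not say which pooling method was used), and the function name and inputs are hypothetical.

```python
import math

def pooled_risk_ratio(log_rrs, std_errors, z=1.96):
    """Fixed-effect inverse-variance pooled risk ratio with a 95% CI by default.

    log_rrs: natural log of each study's risk ratio
    std_errors: standard error of each log risk ratio
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled_log = sum(w * l for w, l in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    lower = math.exp(pooled_log - z * pooled_se)
    upper = math.exp(pooled_log + z * pooled_se)
    return math.exp(pooled_log), lower, upper

# Hypothetical studies: risk ratios of 1.10 and 1.30 with different precision
rr, lo, hi = pooled_risk_ratio([math.log(1.10), math.log(1.30)], [0.10, 0.20])
print(f"pooled RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A narrow interval that excludes 1 indicates a precise estimate of benefit or harm; a wide interval that crosses 1 leaves the direction of effect uncertain.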

[Figure: Confidence intervals; x-axis: risk ratio, 0.6 to 1.6]

3. How can I apply the results to patient care?
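
Number needed to treat (NNT)
Number of patients who need to be treated to prevent one more event
NNT = 1 / (R_c - R_t) = 1 / ARR
where R_c is the event risk in the control group, R_t is the event risk in the treatment group, and ARR is the absolute risk reduction.

Number needed to harm (NNH)
Number of patients who need to be treated to harm one more of them
NNH = 1 / (R_t - R_c)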
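
A short worked example of the two formulas, written as a Python sketch. The function names and the risks are hypothetical: with a control risk of 20% and a treatment risk of 15% for the outcome, ARR is 5% and NNT is about 20; with adverse-event risks of 12% on treatment and 10% on control, NNH is about 50.

```python
def number_needed_to_treat(risk_control, risk_treatment):
    """NNT = 1 / (R_c - R_t); defined only when treatment lowers the event risk."""
    arr = risk_control - risk_treatment  # absolute risk reduction
    if arr <= 0:
        raise ValueError("no risk reduction: NNT is not defined")
    return 1.0 / arr

def number_needed_to_harm(risk_treatment, risk_control):
    """NNH = 1 / (R_t - R_c); defined only when treatment raises the event risk."""
    ari = risk_treatment - risk_control  # absolute risk increase
    if ari <= 0:
        raise ValueError("no risk increase: NNH is not defined")
    return 1.0 / ari

print(round(number_needed_to_treat(0.20, 0.15), 1))  # outcome prevented by treatment
print(round(number_needed_to_harm(0.12, 0.10), 1))   # adverse event caused by treatment
```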

Network meta-analysis

Meta-analysis
* Traditional meta-analysis addresses the merits of one intervention vs. another.
* Drawback: it evaluates the effect of only 1 intervention vs. 1 comparator.
* It does not permit inferences about the relative effectiveness of several interventions.
* For many medical conditions, there is a selection of interventions that have most frequently been compared with placebo and only occasionally with one another.

Network Meta-analysis (NMA)
* Also called multiple or mixed treatment comparison meta-analysis.
* The NMA approach provides estimates of effect sizes for all possible pairwise comparisons, whether or not they have actually been compared head to head in RCTs.

Network Meta-analysis
* A network meta-analysis combines direct and indirect sources of evidence to estimate treatment effects.
* Direct evidence on the comparison of two particular treatments is obtained from studies that contain both treatments.
* Indirect evidence is obtained from studies that compare each of the two treatments with some common treatment only.
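
To show what indirect evidence means numerically, here is a sketch of the simplest case: treatments B and C have each been compared with a common comparator A, and the C vs. B effect is obtained by subtraction on the log scale. This is the adjusted indirect comparison often attributed to Bucher; the slides do not name a method, so treat the choice as an assumption. The function name and the numbers are hypothetical.

```python
import math

def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C vs. B through common comparator A.

    d_ab: effect of B vs. A (e.g. log risk ratio), se_ab its standard error
    d_ac: effect of C vs. A, se_ac its standard error
    Assumes the AB and AC trials are independent and similar enough to compare.
    """
    d_bc = d_ac - d_ab
    # Variances add, so the indirect estimate is less precise than either input
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc

# Hypothetical log risk ratios vs. placebo (A)
d_bc, se_bc = indirect_comparison(d_ab=0.2, se_ab=0.3, d_ac=0.5, se_ac=0.4)
print(f"C vs. B: {d_bc:.2f} (SE {se_bc:.2f})")
```

The larger standard error of the indirect estimate is one reason star-shaped networks, which rely on indirect comparisons only, give less confidence than networks with head-to-head trials.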

Considerations in NMA
1. Among trials available for pairwise comparisons, are the studies sufficiently homogeneous to combine for each intervention? (An assumption that is also necessary for a conventional meta-analysis)
2. Are the trials in the network sufficiently similar, with the exception of the intervention (eg, in important features, such as populations, design, or outcomes)?
3. Where direct and indirect evidence exist, are the findings sufficiently consistent to allow confident pooling of direct and indirect evidence together?

Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 3rd ed, 2015. Gordon Guyatt, Drummond Rennie, Maureen O. Meade, Deborah J. Cook. http://jamaevidence.mhmedical.com/book.aspx?bookid=847

I. How Serious Is the Risk of Bias?

1. Did the Meta-analysis Include Explicit and Appropriate Eligibility Criteria?
* PICO
* Broader eligibility criteria enhance generalizability of the results, but if participants are too dissimilar, heterogeneity results.
* Diversity of interventions may be excessive if authors pool results from different doses or even different agents in the same class, based on the assumption that effects are similar.
* Criteria may be too broad in their inclusion of different populations, different doses or different agents in the same class, or different outcomes to make comparisons across studies credible.

Research question
We therefore conducted a systematic review and network meta-analysis with the aim of comparing complete recovery rates at 3 and 6 months for corticosteroids, AVT (Acyclovir or Valacyclovir), or the combination of both for treatment of adult Bell's palsy.
P I C O

Eligibility criteria
Studies were included if they were RCTs and studied subjects aged 18 years or older with sufficient data. Non-English papers were excluded from the review.

2. Was Biased Selection and Reporting of Studies Unlikely?
* Include all interventions, because data on clearly suboptimal or abandoned interventions may still offer indirect evidence for other comparisons.
* Apply the search strategies from other systematic reviews only if authors have updated the search to include recently published trials.
* Some industry-initiated NMAs may choose to consider only a sponsored agent and its direct competitors.
* Omitting the optimal agent gives a fragmented picture of the evidence.
* Selection of NMA outcomes should not be data driven but based on importance for patients, and should consider outcomes of both benefit and harm.

Search strategy
One author (NP) located studies in MEDLINE (from 1966 to August 2010) and EMBASE (from 1950 to September 2010) using PubMed and Ovid search engines. Search terms used were as follows: (Bell's palsy or idiopathic facial palsy) and (antiviral agents or acyclovir or valacyclovir), limited to randomized controlled trials.

Selection of study
* Where eligible papers had insufficient information, corresponding authors were contacted by e-mail for additional information.
* The reference lists of the retrieved papers were also reviewed to identify relevant publications.
* Where there were multiple publications from the same study group, the most complete and recent results were used.

[Figure: Study selection]

Outcome
Complete recovery was defined as a score 2 on the House-Brackman Facial Recovery scale, 8 on the Facial Palsy Recovery Index, > 36 points on the Yanagihara score, or 100 on the Sunnybrook scale.

3. Did the Meta-analysis Address Possible Explanations of Between-Study Differences in Results?
* When clinical variability is present, conduct subgroup analyses or meta-regression to explain heterogeneity, so that the results more optimally fit the clinical setting and characteristics of the patient you are treating.
* Multiple control interventions (eg, placebo, no intervention, older standard of care): it is important to account for potential differences between control groups.
* Potential placebo effect

Plan for exploring heterogeneity

4. Did the Authors Rate the Confidence in Effect Estimates for Each Paired Comparison?
Ideally, for each paired comparison, authors will present:
* the pooled estimate for the direct comparison (if there is one) and its associated rating of confidence,
* the indirect comparison(s) that contributed to the pooled estimate from the NMA and its associated rating of confidence, and
* the NMA estimate and the associated rating of confidence.

We lose confidence in a comparison of treatments when:
* RCTs failed to protect against risk of bias through allocation concealment, blinding, and preventing loss to follow-up;
* Confidence intervals on pooled estimates are wide (imprecision);
* Results vary from study to study and we cannot explain the differences (inconsistency);
* The population, intervention, or outcome differs from that of primary interest (indirectness).

II. What Are the Results?

1. What Was the Amount of Evidence in the Treatment Network?
* Gauge from the number of trials, total sample size, and number of events for each treatment and comparison.
* Understanding the geometry of the network (nodes and links) will permit clinicians to examine the larger picture and see what is compared to what.
* The credible intervals around direct, indirect, and NMA estimates provide a helpful index.

[Figures: Result at 3 months; Result at >3 months]

2. Were the Results Similar From Study to Study?
* NMA, with larger numbers of patients and studies, allows a more powerful exploration of explanations of between-study differences.
* The search conducted by NMA authors for explanations for heterogeneity may be informative.
* NMA is vulnerable to unexplained differences in results from study to study.

3. Were the Results Consistent in Direct and Indirect Comparisons?
* Direct or indirect: which is most trustworthy?
* Requires assessing whether the direct and indirect estimates are consistent or discrepant.

Inconsistency
[Figure: network of treatments A, B, and C; three designs: AB, AC, ABC]
* When the direct and indirect sources of evidence within a network do not agree, this is known as inconsistency.
* Inconsistency in results in both the direct and indirect comparisons decreases confidence in estimates.
* Statistical methods exist for checking this type of inconsistency, typically called a test for incoherence.

Potential Reasons for Incoherence Between the Results of Direct and Indirect Comparisons
* Chance
* Genuine differences in results
  * Differences in enrolled participants, interventions, background management
* Bias in head-to-head (direct) comparisons
  * Publication bias
  * Selective reporting of outcomes and of analyses
  * Inflated effect size in trials stopped early
  * Limitations in allocation concealment, blinding, loss to follow-up, analysis as randomized
* Bias in indirect comparisons
  * Each of the biasing issues above

Test for incoherence
Discrepancy of treatment effects between direct and indirect meta-results was then assessed using the standardized normal method (Z), i.e. by dividing the difference by its standard error.
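
The Z method described above can be sketched in a few lines of Python. The sketch assumes the direct and indirect estimates are independent, so the standard error of their difference is the square root of the summed variances; the function name and input values are hypothetical.

```python
import math

def incoherence_z(direct, se_direct, indirect, se_indirect):
    """Z test for the difference between direct and indirect estimates.

    Returns (difference, z, two-sided p value). Estimates should be on an
    additive scale such as the log risk ratio.
    """
    diff = direct - indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # Two-sided p value from the standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p

# Hypothetical log risk ratios for the same comparison
diff, z, p = incoherence_z(direct=0.5, se_direct=0.3, indirect=0.1, se_indirect=0.4)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A large P value does not prove coherence; with few trials the test has low power, so the size of the difference matters as much as its significance.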

4. How Did Treatments Rank and How Confident Are We in the Ranking?
Besides presenting treatment effects, authors may also present the probability that each treatment is superior to all other treatments, allowing ranking of treatments.
Rankings may be misleading because of:
* Fragility in the rankings
* Differences among the ranks that may be too small to be important
* Other limitations in the studies (eg, risk of bias, inconsistency, indirectness)

5. Were the Results Robust to Sensitivity Assumptions and Potential Biases?
* Authors may assess the robustness of the study findings by applying sensitivity analyses that reveal how the results change if some criteria or assumptions change.
* Sensitivity analyses may include restricting the analyses to trials with low risk of bias only or examining different but related outcomes.

III. How Can I Apply the Results to Patient Care?

1. Were All Patient-Important Outcomes Considered?
* Many NMAs report only 1 or a few outcomes of interest
* Adverse events are infrequently assessed in meta-analyses and in NMAs
* More likely to include multiple outcomes and assessments of harms

2. Were All Potential Treatment Options Considered?
* Network meta-analyses may place restrictions on what treatments are examined.
* Need background knowledge review.

3. Are Any Postulated Subgroup Effects Credible?
* Criteria exist for determining the credibility of subgroup analyses.
* NMAs allow a greater number of RCTs to be evaluated and may offer more opportunities for subgroup analysis.

* Single common comparator (star network): only allows for indirect comparison; reduces confidence in effect
* Use of both direct and indirect evidence: increases confidence in estimates of interest
* Mixture of indirect links and closed loops, unbalanced shapes: high confidence for some comparisons, low confidence for others

Hierarchy of Evidence
* Systematic reviews
* Randomized controlled trials
* Cohort studies
* Case-control studies
* Cross-sectional studies
* Case reports

Take home messages
* Systematic review is secondary research. It focuses on a research question and tries to identify, appraise, select, and synthesize all high-quality research evidence relevant to that question.
* Meta-analysis is a statistical tool of a systematic review, broadly defined as a quantitative review and synthesis of the results of related but independent studies.
* NMA can provide extremely valuable information in choosing among multiple treatments offered for the same condition.
* It is important to determine the confidence one can place in the estimates of effect of the treatments considered and the extent to which that confidence differs across comparisons.