Grading Diagnostic Evidence-Based Statements and Recommendations


Grading Diagnostic Evidence-Based Statements and Recommendations
Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group
Schünemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Williams J, et al. GRADEing the quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ, submitted.

Acknowledgement
Jeff Andrews, associate professor x; David Atkins, chief medical officer a; Dana Best, assistant professor b; Peter A Briss, chief c; Martin Eccles, professor d; Yngve Falck-Ytter, associate director e; Signe Flottorp, researcher f; *Gordon H Guyatt, professor g; Robin T Harbour, quality and information director h; Margaret C Haugh, methodologist i; David Henry, professor j; Suzanne Hill, senior lecturer j; Roman Jaeschke, clinical professor k; Gillian Leng, guidelines programme director l; Alessandro Liberati, professor m; Nicola Magrini, director n; James Mason, professor d; Philippa Middleton, honorary research fellow o; Jacek Mrukowicz, executive director p; Dianne O'Connell, senior epidemiologist q; *Andrew D Oxman, director f; Bob Phillips, associate fellow r; *Holger J Schünemann, associate professor g,s; Tessa Tan-Torres Edejer, medical officer/scientist t; Helena Varonen, associate editor u; Gunn E Vist, researcher f; John W Williams Jr, associate professor v; Stephanie Zaza, project director w; Prof. Patrick M Bossuyt and Prof. Victor M Montori.
Affiliations: x) Vanderbilt University Medical Center, USA; a) Agency for Healthcare Research and Quality, USA; b) Children's National Medical Center, USA; c) Centers for Disease Control and Prevention, USA; d) University of Newcastle upon Tyne, UK; e) German Cochrane Centre, Germany; f) Norwegian Centre for Health Services, Norway; g) McMaster University, Canada; h) Scottish Intercollegiate Guidelines Network, UK; i) Fédération Nationale des Centres de Lutte Contre le Cancer, France; j) University of Newcastle, Australia; k) McMaster University, Canada; l) National Institute for Clinical Excellence, UK; m) Università di Modena e Reggio Emilia, Italy; n) Centro per la Valutazione della Efficacia della Assistenza Sanitaria, Italy; o) Australasian Cochrane Centre, Australia; p) Polish Institute for Evidence Based Medicine, Poland; q) The Cancer Council, Australia; r) Centre for Evidence-based Medicine, UK; s) University of Buffalo, USA; t) World Health Organisation, Switzerland; u) Finnish Medical Society Duodecim, Finland; v) Duke University Medical Center, USA; w) Centers for Disease Control and Prevention, USA

Recommendations about Diagnosis
Which tests should be recommended, and with what strength? Why grade recommendations? A systematic and explicit approach to making judgments about the quality of evidence and the strength of recommendations can help prevent errors, facilitate critical appraisal of these judgments, and improve communication of this information.

Grading Recommendations: Strong or Weak (For or Against)
Strong recommendations:
- strong methods; accurate test; impact on outcomes; few downsides of the testing strategy
- indicate a judgment that a majority of well-informed people will make the same choice (high confidence, low uncertainty)
- expect non-variant clinician and patient behavior: most patients should receive the intervention
- diminished role for clinical expertise: focus on implementation and barriers
- focused role of patient values and preferences: emphasis on compliance and barriers
- could be used as a performance/quality indicator
- decision aids not likely to be needed
- medical practice is expected not to vary much
Weak recommendations:
- weak methods; imprecise estimate or small effect; substantial downsides
- indicate a judgment that a majority of well-informed people will make the same choice, but a substantial minority will not (significant uncertainty)
- expect variability in clinician and patient actions
- clinical expertise important: focus on decision-making and implementation
- patient values and preferences important: focus on determining values and preferences relative to the decision
- decision aids likely to be useful in offering the intervention and helping patients make a decision
- could be used as a quality criterion
- medical practice is expected to vary to some degree

Grading the Evidence: Evaluating Diagnostic Studies
Evidence concepts: scientific results that approximate truth; size, accuracy, precision; reliability, reproducibility, appropriateness, bias; statistical descriptions; trade-offs, limiting factors, cost.
GRADE components:
- Quality (validity): the quality of evidence indicates the extent to which one can be confident that an estimate of effect is correct.
- Strength (benefit/risk, results): the strength of a recommendation indicates the extent to which one can be confident that adherence to the recommendation will do more good than harm.
- Implementation and application: will the results help me with my patient care? (relevance and prevalence)

Grading evidence process
1. Describe the question/problem (PICO)
2. Establish the outcomes critical or important to the decision, and the relative importance of those outcomes
3. Systematic review
4. Evidence profile / quality rating for each outcome
5. Overall quality of evidence
6. Benefit/risk/harm/cost balance
7. Overall strength of recommendation (GRADE)
8. Implementation and evaluation

Evaluating Diagnosis Studies: Validity (Quality)
Are the results valid? (grading quality)
- Was there an independent, blind comparison with a reference ("gold") standard?
- Did the patient sample include an appropriate spectrum of the sort of patients to whom the diagnostic test will be applied in clinical practice?
- Is there a standard method for doing the test? (reproducibility, reliability)

Judgments about Evidence Quality: Starting Line
- High: blinded testing of previously developed diagnostic criteria in a series of consecutive patients (with a universally applied reference "gold" standard), or a systematic review of such studies.
- Moderate: development of diagnostic criteria on the basis of consecutive patients (with a universally applied reference "gold" standard), or a systematic review of such studies.
- Low: study of nonconsecutive patients (no consistently applied reference "gold" standard), or a systematic review of such studies.
- Very Low: study with a poor reference standard, case-control study, expert opinion, or case report.
The quality of evidence indicates the extent to which one can be confident that an estimate of effect is correct.

Types of studies to evaluate a test or diagnostic strategy: outcome-based
Example: an RCT explored a diagnostic strategy guided by B-type natriuretic peptide (BNP), designed to provide a more accurate diagnosis of heart failure in patients presenting to the emergency department with acute dyspnea. The group randomized to receive BNP testing spent a shorter time in the hospital at lower cost, with no increase in mortality or morbidity. When diagnostic intervention studies - ideally RCTs, but also observational studies - comparing alternative diagnostic strategies with assessment of direct, patient-important outcomes are available, guideline panels can use the GRADE approach established for treatment questions.
Mueller C, Scholer A, Laule-Kilian K, Martina B, Schindler C, Buser P, et al. Use of B-type natriuretic peptide in the evaluation and management of acute dyspnea. N Engl J Med 2004;350(7):647-54.

Types of studies to evaluate a test or diagnostic strategy: accuracy-based
Diagnostic accuracy is a surrogate outcome for what we are really interested in, which is patient-important benefit and harm. Example: there is consistent evidence from well-designed studies of fewer false negative results with non-contrast helical CT than with IVP in the diagnosis of acute urolithiasis. However, the ureteral stones missed by IVP are smaller, and hence likely to pass more easily. Since randomized trials evaluating outcomes in patients treated for smaller stones are not available, the extent to which CT would reduce missed cases (false negatives) and yield important health benefits remains uncertain.
Deeks JJ. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ 2001;323(7305):157-62.
Worster A, Preyra I, Weaver B, Haines T. The accuracy of noncontrast helical computed tomography versus intravenous pyelography in the diagnosis of suspected acute urolithiasis: a meta-analysis. Ann Emerg Med 2002;40(3):280-6.
Worster A, Haines T. Does replacing intravenous pyelography with noncontrast helical computed tomography benefit patients with suspected acute urolithiasis? Can Assoc Radiol J 2002;53(3):144-8.

Judgments about Evidence Quality (High / Moderate / Low / Very Low)
The quality of evidence indicates the extent to which one can be confident that an estimate of effect is correct.
Moving down:
- serious limitations in study design or execution, or sparse data: serious flaws can lower by one level, fatal flaws by two levels
- consistency: important inconsistency can lower by one level
- directness of evidence: some uncertainty can lower by one level, major uncertainty by two levels
- selection bias or reporting bias: strong evidence of bias can lower by one level
- imprecise evidence (wide confidence intervals) can lower by one level
(adapted from Gordon Guyatt)
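As a rough illustration, the starting levels and the moving-down rules can be combined into a small scoring sketch. The level names follow the slides; the function name, arguments, and the way penalties are passed in are hypothetical conveniences, not part of GRADE itself.

```python
# Illustrative sketch of GRADE-style quality rating for diagnostic evidence.
# Level names and one/two-level penalties follow the slides; the API is invented.

LEVELS = ["Very Low", "Low", "Moderate", "High"]

def grade_quality(start="High", *, design_flaws=0, inconsistency=0,
                  indirectness=0, publication_bias=0, imprecision=0):
    """Lower the starting quality level by the summed penalties.

    Each argument is the number of levels subtracted for that domain,
    e.g. design_flaws=1 for serious limitations, 2 for fatal flaws.
    """
    score = LEVELS.index(start) - (design_flaws + inconsistency +
                                   indirectness + publication_bias + imprecision)
    return LEVELS[max(score, 0)]  # quality cannot drop below "Very Low"

# Accuracy studies start High; some uncertainty about directness drops one level.
print(grade_quality("High", indirectness=1))  # Moderate
```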

Judgments about Evidence Quality
The quality of evidence indicates the extent to which one can be confident that an estimate of effect is correct.
- High: further research is very unlikely to change our confidence in the estimate of diagnostic value.
- Moderate: further research is likely to have an important impact on our confidence in the estimate of diagnostic value and may change the estimate.
- Low: further research is very likely to have an important impact on our confidence in the estimate of diagnostic value and is likely to change the estimate.
- Very Low: any estimate of effect is very uncertain.
GRADE's four categories of quality of evidence imply a gradient of confidence in estimates of the effect of a diagnostic test or strategy on patient-important outcomes.

Coronary CT scanning versus invasive angiography: arriving at a bottom line for study quality
Evidence profile / quality for each outcome. Quality assessment of diagnostic accuracy studies. Example: should multislice spiral computed tomography, versus conventional coronary angiography, be used for diagnosis of coronary artery disease?
All outcomes:
- No of studies: 21 (1,570 patients)
- Design: accuracy studies (1)
- Limitations: no
- Directness: strong relation to patient-important outcomes
- Inconsistency: yes (2) (-1)
- Imprecise data: no
- Reporting bias: no (3)
- Quality: Moderate
(1) All patients were selected to undergo conventional coronary angiography and were therefore presenting with a high probability of coronary artery disease (median prevalence in the included studies: 63.5%; range 6.6% to 100%).
(2) There was significant heterogeneity of results (not investigated further by the authors of the review).
(3) The possibility of reporting bias exists, but it was not considered sufficient to downgrade.

Evaluating Diagnosis Studies: Strength (Results)
What are the results? (grading strength)
- What is the magnitude of benefit, and how reliable/precise are these results?
- What are the magnitudes of risk, burden, and cost, and how reliable/precise are these results?
- Do the benefits outweigh the risks/burdens/costs? Are there known trade-offs? Are there unknown possible trade-offs?
Result parameters: accuracy, likelihood ratios, confidence intervals.
- Are likelihood ratios for the test results presented, or are the data necessary for their calculation included?
- Statistics: pre-test probability (prevalence), likelihood ratios, sensitivity and specificity, predictive values.
- Are the results of the test useful?
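The result parameters named above all follow from a 2x2 table of index-test results against the reference standard. A minimal sketch, with invented counts and an illustrative function name:

```python
# Standard accuracy measures from a 2x2 diagnostic table.
# The tp/fp/fn/tn counts below are made up for illustration.

def accuracy_measures(tp, fp, fn, tn):
    sens = tp / (tp + fn)              # sensitivity: diseased patients detected
    spec = tn / (tn + fp)              # specificity: healthy patients cleared
    return {
        "sensitivity": sens,
        "specificity": spec,
        "LR+": sens / (1 - spec),      # positive likelihood ratio
        "LR-": (1 - sens) / spec,      # negative likelihood ratio
        "PPV": tp / (tp + fp),         # positive predictive value
        "NPV": tn / (tn + fn),         # negative predictive value
    }

m = accuracy_measures(tp=90, fp=20, fn=10, tn=80)
print(f"sens={m['sensitivity']:.2f} spec={m['specificity']:.2f} "
      f"LR+={m['LR+']:.1f} LR-={m['LR-']:.3f}")
```

Note that PPV and NPV, unlike the likelihood ratios, depend on the pre-test probability (prevalence) in the sampled population.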

Benefit/risk/harm/cost balance (Strength)
Benefits: absolute benefit increase x utility. Desirable effects: health benefits, less burden, savings.
Risks and harms: absolute risk increase x utility (value factor). Undesirable effects: harms, more burden, costs.
The strength of a recommendation indicates the extent to which one can be confident that adherence to the recommendation will do more good than harm.
The balance runs from net benefits through trade-offs to net harms:
- Net benefits: the test intervention does more good than harm.
- Trade-offs: there are important trade-offs between the benefits and harms.
- Uncertain trade-offs: it is not clear whether the test intervention does more good than harm.
- No net benefits: the test intervention does not do more good than harm.
- Net harms: the test intervention does more harm than good.
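A worked illustration of this balance, with invented effect sizes and value weights (neither the numbers nor the variable names come from GRADE):

```python
# Toy benefit/risk balance: absolute effects weighted by utilities.
abi = 0.05       # absolute benefit increase (5 more good outcomes per 100 tested)
ari = 0.01       # absolute risk increase (1 more harm per 100 tested)
u_benefit = 1.0  # value weight placed on the benefit
u_harm = 2.0     # harms are often weighted more heavily than benefits

net = abi * u_benefit - ari * u_harm
verdict = "net benefit" if net > 0 else "net harm" if net < 0 else "trade-offs"
print(f"net = {net:+.3f} -> {verdict}")  # net = +0.030 -> net benefit
```

Changing the utility weights can flip the verdict, which is why the slide stresses that strength reflects values as well as effect sizes.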

Decision Thresholds and Diagnostic Tests: Alternative Conceptualization
Tests or test strategies that move patients below the test threshold or above the treatment threshold (given that a treatment exists) will often lead to strong recommendations despite false negative and false positive results. On the other hand, tests or strategies that only marginally change the probability of disease, so that further testing is required, will usually lead to weak recommendations. The test properties that best correspond to results that move the patient's probability significantly up or down are the likelihood ratios.
Probability of diagnosis runs from 0% to 100%, crossed by a test threshold and a treatment threshold:
- below the test threshold: low probability; no further testing, no treatment (strong)
- between the test and treatment thresholds: further testing required (weak)
- above the treatment threshold: high probability; no further testing, treatment (strong)
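The threshold picture can be made concrete with the odds form of Bayes' theorem, which is exactly why likelihood ratios are the useful test property here. The threshold values and function names below are illustrative, not recommendations:

```python
# Post-test probability via the odds form of Bayes' theorem, compared
# against illustrative test and treatment thresholds.

def post_test_probability(pre_p, lr):
    odds = pre_p / (1 - pre_p)          # probability -> odds
    post_odds = odds * lr               # update with the likelihood ratio
    return post_odds / (1 + post_odds)  # odds -> probability

def decide(pre_p, lr, test_threshold=0.10, treat_threshold=0.80):
    p = post_test_probability(pre_p, lr)
    if p < test_threshold:
        return p, "below test threshold: no further testing, no treatment"
    if p > treat_threshold:
        return p, "above treatment threshold: treat"
    return p, "between thresholds: further testing required"

# A strongly positive result (LR+ = 10) moves a 30% pre-test probability
# to about 81%, crossing the assumed 80% treatment threshold.
p, action = decide(pre_p=0.30, lr=10)
print(f"post-test p = {p:.2f}: {action}")
```

A test with a likelihood ratio near 1 barely moves the probability, leaving most patients between the thresholds; that is the situation the slide links to weak recommendations.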

Judgments about Overall Grade of Recommendations
Judgments about the strength of a recommendation (strong or weak, for or against) require consideration of:
- all critical outcomes (critical to care and to the decision, not merely important)
- the quality of the evidence: the lowest quality of evidence for any critical outcome should provide the basis for grading
- the balance between benefits and harms: if information on harm is critical, it should be included even if uncertainty exists
- translation of the evidence into specific circumstances: evidence is global, application is local
- the certainty of the baseline risk
It is also important to consider costs (resource utilization) before making a recommendation.

Conclusions
The GRADE approach to grading the quality of evidence and strength of recommendations for diagnostic guidelines provides a comprehensive and transparent methodology for developing these recommendations. The GRADE Working Group presents an overview of the approach, already established for grading treatment recommendations. Publication of the diagnostic GRADE guidance in the BMJ is pending. Extensive application to diagnostic guidelines is likely to refine the approach, but the basic methodology and the considerations that follow from recognizing test results as surrogate markers are unlikely to change.

Citations
Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ 2004;328(7454):1490.
Schünemann HJ, Jaeschke R, Cook DJ, Bria WF, El-Solh AA, Ernst A, et al. An official ATS statement: grading the quality of evidence and strength of recommendations in ATS guidelines and recommendations. Am J Respir Crit Care Med 2006;174(5):605-14.
Schünemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Williams J, et al. GRADEing the quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ, submitted.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Ann Intern Med 2003;138(1):40-4. http://www.stardstatement.org/website%20stard/
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med 2003;138(1):W1-12. http://www.stardstatement.org/website%20stard/
Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ 2006;332(7549):1089-92.
Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282(11):1061-6.
Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM. Evidence of bias and variation in diagnostic accuracy studies. CMAJ 2006;174(4):469-76.
Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25. http://www.biomedcentral.com/1471-2288/3/25
Whiting PF, Weswood ME, Rutjes AW, Reitsma JB, Bossuyt PN, Kleijnen J. Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Med Res Methodol 2006;6:9. http://www.biomedcentral.com/1471-2288/6/9