Cochrane Update. Assessing evidence in public health: the added value of GRADE


Journal of Public Health Vol. 34, No. 4, pp. 631-635, doi:10.1093/pubmed/fds092

Belinda J. Burford 1,2, Eva Rehfuess 3, Holger J. Schünemann 4,5, Elie A. Akl 4,6, Elizabeth Waters 1,2, Rebecca Armstrong 1,2, Hilary Thomson 7, Jodie Doyle 1,2, Tahna Pettman 1,2

1 Jack Brockhoff Child Health and Wellbeing Program, Melbourne, Australia
2 Cochrane Public Health Group, McCaughey Centre, Melbourne School of Population Health, The University of Melbourne, Melbourne, Australia
3 Institute for Medical Informatics, Biometry and Epidemiology, University of Munich, Germany
4 Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada L8S 4K1
5 Department of Medicine, McMaster University, Hamilton, ON, Canada L8S 4K1
6 Department of Internal Medicine, American University of Beirut, Beirut, Lebanon
7 MRC/CSO Social and Public Health Sciences Unit, Glasgow G12 8RZ, UK

Address correspondence to Belinda J. Burford, E-mail: b.burford@unimelb.edu.au

Introduction

Concepts of public health vary globally, from countries and continents where public health focuses on preventive medicine and health-care systems, to areas where it is conceptualized as focusing on social and macro-level intergovernmental strategies, including health policies. Within this context, evidence emerges from specifically designed primary research, from convenient or pragmatic evaluations with rigorous methodologies, and from evaluations that capitalize on opportunities arising from politics, circumstance or resources. These challenges, which are not limited to public health, mean that formulating clear public health questions to ask of the evidence is crucial.
Evidence collected in the context of population or public health should, as in any health-care setting, be used to decide whether or not to implement new strategies, and whether to maintain or stop existing ones. However, assessing evidence in public health presents some challenges that require resolution. For instance, some public health strategies or interventions can be implemented over a short time frame (e.g. outdoor smoking laws), while others may require intergenerational programmes (e.g. obesity prevention). Another challenge for public health and health promotion interventions is understanding harms or unintended consequences. Because of their emphasis on health promotion and prevention, there is a risk that public health interventions are perceived to be free of harms. This may be because the harms associated with clinical interventions, such as pharmacological or surgical interventions, are much more obvious. However, we know that the implementation of any intervention or strategy carries a risk of harm, which needs to be anticipated and evaluated.

Within the spectrum of public health evidence, the quality of the evidence is an important component of the decision-making and knowledge-translation process. For evaluations that seek to establish effectiveness, this requires assessing whether our certainty in the outcome estimates reported across a group of studies is high enough to support decisions or recommendations. GRADE (Grading of Recommendations Assessment, Development and Evaluation) is the most widely endorsed framework for assessing the quality of a body of evidence and deriving the strength of a potential evidence-informed recommendation. It has an ever-strengthening presence in clinical decision-making (www.gradeworkinggroup.org), and considerable investment is currently being made in public health to assess its appropriateness and usefulness (1).

Belinda J. Burford, Research Fellow and Methods Advisor
Eva Rehfuess, Senior Scientist
Holger J. Schünemann, Professor and Chair
Elie A. Akl, Associate Professor of Medicine
Elizabeth Waters, Professor, Jack Brockhoff Chair of Child Public Health, and Coordinating Editor
Rebecca Armstrong, Senior Research Fellow, Knowledge Translation & Public Health Evidence, Editorial and Methods Advisor
Hilary Thomson, Senior Investigator Scientist
Jodie Doyle, Managing Editor
Tahna Pettman, Research Fellow, Evidence and Knowledge Translation Team

© The Author 2012. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved.

Assessing a body of evidence using GRADE

In GRADE, the quality of evidence is defined as the extent of our confidence that an estimate of effect is adequate to support a particular decision or recommendation. The assessment of quality of evidence considers study design, as well as five criteria that might decrease our confidence (i.e. risk of bias, imprecision, inconsistency of results across studies, publication bias, and indirectness, meaning a lack of comparability between the population, intervention and outcomes of interest and those in the available studies) and three criteria that might increase our confidence (i.e. a strong association, a dose-response gradient, and plausible residual confounding and bias that would oppose the observed effect). The assessment results in an overall GRADE rating of high (⊕⊕⊕⊕), moderate (⊕⊕⊕◯), low (⊕⊕◯◯) or very low (⊕◯◯◯). For more detail on this process, readers should refer to a series of papers in the Journal of Clinical Epidemiology (2).

Given that the evaluation of the quality domains may involve subjective judgements, the GRADE framework seeks to make these judgements and their rationale explicit, transparent and as reliable as possible. GRADE differs from other evidence rating systems in several important ways: it offers a standardized presentation of the summary of evidence with an emphasis on transparency; it assesses the quality of evidence at the outcome level rather than at the study level; and it considers factors other than study design alone, which creates the opportunity to rate the quality of evidence up or down.

Challenges with applying GRADE in public health reviews and potential solutions

All Cochrane intervention reviews are required to use GRADE, or at least to comment on the GRADE criteria in the discussion of findings (http://www.editorial-unit.cochrane.org/mecir).
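The rating logic described under "Assessing a body of evidence using GRADE" can be summarized as simple arithmetic over four levels. The sketch below is a minimal illustration under stated assumptions, not an official GRADE tool: it assumes the levels are coded 4 (high) to 1 (very low), that each downgrading domain is judged as 0 (no concern), 1 (serious) or 2 (very serious) levels, and that each upgrading criterion adds 0, 1 or 2 levels. The function name `grade_rating`, the domain keys and the numeric coding are all hypothetical conventions chosen for this example; in a real review each judgement is made and documented explicitly rather than computed mechanically.

```python
# Hypothetical numeric coding of the four GRADE levels (illustration only).
LEVELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}
SYMBOLS = {4: "⊕⊕⊕⊕", 3: "⊕⊕⊕◯", 2: "⊕⊕◯◯", 1: "⊕◯◯◯"}

# The five criteria that may decrease confidence (hypothetical key names).
DOWNGRADE_DOMAINS = (
    "risk_of_bias", "imprecision", "inconsistency",
    "publication_bias", "indirectness",
)
# The three criteria that may increase confidence (hypothetical key names).
UPGRADE_DOMAINS = ("large_effect", "dose_response", "opposing_confounding")


def grade_rating(randomized, downgrades=None, upgrades=None):
    """Return (level_name, symbol) for the evidence on one outcome.

    downgrades: dict mapping a downgrading domain to 0, 1 or 2 levels.
    upgrades: dict mapping an upgrading criterion to 0, 1 or 2 levels.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    for name in downgrades:
        if name not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown downgrading domain: {name}")
    for name in upgrades:
        if name not in UPGRADE_DOMAINS:
            raise ValueError(f"unknown upgrading criterion: {name}")

    # Starting point: high for randomized trials, low for other designs.
    level = 4 if randomized else 2
    level -= sum(downgrades.values())
    level += sum(upgrades.values())
    # The scale has only four levels, so keep the result within [1, 4].
    level = max(1, min(4, level))
    return LEVELS[level], SYMBOLS[level]


# Trials with serious imprecision end at moderate; observational studies
# with serious risk of bias and serious imprecision end at very low.
print(grade_rating(True, {"imprecision": 1}))
print(grade_rating(False, {"risk_of_bias": 1, "imprecision": 1}))
```

The clamp reflects that the scale has no level below "very low" or above "high". The reasons recorded for each step, not the number itself, are what the article argues is most useful to decision-makers.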
The Cochrane Public Health Group (CPHG) is working with review authors to address the challenges of applying the GRADE framework within reviews that typically involve complex, population-level interventions. There is some debate about the relevance or appropriateness of using the GRADE framework to assess the effectiveness of public health interventions (3-8). Here we briefly highlight some common challenges in applying GRADE within public health evidence reviews and discuss approaches for dealing with them. We explain why we believe it is useful and appropriate to consider the GRADE criteria in such reviews. This is not intended as a comprehensive, detailed discussion of all concerns; rather, it aims to encourage the use of GRADE and the documentation of specific challenges encountered, and to stimulate further debate on how to appropriately evaluate evidence in public health for decision-making.

If randomized controlled trials are not appropriate or possible, why does GRADE penalize for that?

The first step in the GRADE process is to identify whether the evidence in question was derived from randomized controlled trials or from other types of studies. This determines whether the starting point in GRADE is high (for randomized controlled trials) or low (for other types of studies). Many think it is unfair to penalize the quality of evidence in a context where randomization may not be feasible (e.g. implementation of jurisdiction-wide public health laws) or ethical (e.g. examining the links between exposure to a workplace hazard and health outcomes). Consequently, it could be argued that the best available evidence in such contexts (i.e. from observational studies) should be rated as high quality.
However, starting non-randomized studies at a lower GRADE rating should not be viewed as a penalty; rather, it acknowledges that without randomization we are less certain that the differences observed between the intervention group and the comparison group are in fact due to the intervention itself. The benefit of randomization in minimizing the risk of selection bias and confounding is commonly accepted and has been tested empirically (9). An alternative way to conceptualize GRADE is to think of all studies as starting out equally, with randomization being one factor that warrants upgrading. Many users find this framing reassuring, because randomization is then considered on a par with the other upgrading criteria.

Are all observational studies equal in GRADE?

Although the initial quality rating for all observational studies is ⊕⊕◯◯ (read "two plus", i.e. low), the GRADE process requires that further domains, including risk of bias (also called limitations in study design and execution), contribute to the final GRADE rating. In GRADE, only the final assessment of quality matters, that is, the confidence or certainty in effect estimates (including their direction) after evaluating all relevant domains. Therefore, although at first glance cross-sectional studies appear to receive the same level of certainty as more appropriate designs, a full evaluation using the GRADE criteria shows that GRADE makes no such assumption: the evaluation of risk of bias leads to a more specific assessment of the strengths and weaknesses of each design.
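The alternative framing described above, in which all studies start out equally and randomization counts as an upgrade, can be checked with simple arithmetic. This sketch is an illustration only and is not part of the GRADE guidance: it assumes levels are coded 4 (high) to 1 (very low), that the common starting point is low, and that randomization earns a two-level upgrade (the gap between the two conventional starting points). `net_change` is a hypothetical stand-in for the sum of all other upgrading and downgrading judgements. Under these assumptions the two framings always agree, and two observational studies that share a starting point can still end at different levels once risk of bias is judged.

```python
# Hypothetical coding for illustration: 4 = high, 3 = moderate, 2 = low, 1 = very low.
def clamp(level: int) -> int:
    """Keep a level on the four-point GRADE scale."""
    return max(1, min(4, level))


def conventional(randomized: bool, net_change: int) -> int:
    """Start high (4) for randomized trials and low (2) otherwise."""
    start = 4 if randomized else 2
    return clamp(start + net_change)


def equal_start(randomized: bool, net_change: int) -> int:
    """Assumed alternative: every study starts low (2); randomization adds two levels."""
    randomization_upgrade = 2 if randomized else 0
    return clamp(2 + randomization_upgrade + net_change)


# The two framings agree for every combination of design and other judgements.
for randomized in (True, False):
    for net_change in range(-4, 3):
        assert conventional(randomized, net_change) == equal_start(randomized, net_change)

# Two observational studies share the same starting point (low), but a
# hypothetical judgement of serious risk of bias (-1) lowers one of them.
cohort_level = conventional(randomized=False, net_change=0)
cross_sectional_level = conventional(randomized=False, net_change=-1)
print(cohort_level, cross_sectional_level)  # prints: 2 1
```

The equivalence holds only because randomization is credited with exactly the two levels that separate the conventional starting points; a different assumed credit would change the results.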

A more pertinent question is whether there are domains not currently considered by GRADE that could justify rating up the quality of evidence. This issue continues to be discussed by the GRADE Working Group.

Can there be too much heterogeneity to apply GRADE?

Heterogeneity is inevitable when considering evaluations of most public health interventions and programmes, and study heterogeneity often results in statistical heterogeneity. Study heterogeneity can result from differing populations, settings and contexts, from variability in the intervention itself, or from differences in comparison conditions or outcome measures. It can also result from varying methodological quality between studies. All of these differences, which can occur with any health-care question, are likely to affect the measured effectiveness of an intervention, and so must be documented in a systematic review. Such heterogeneity can lead authors to feel that some standard systematic review approaches, including GRADE, do not apply. However, an important aspect of the GRADE process is to assess inconsistency in the findings between studies included in a review and to consider the appropriate grouping of populations, settings, interventions, comparisons, outcomes and methodologies. If inconsistency is identified, its underlying reasons need to be explored, typically by conducting subgroup or sensitivity analyses. This raises the issue of appropriate and credible subgroup analyses, which is beyond the scope of this paper; interested readers should refer to references 10 and 11. Importantly, such potential differences must be considered in the initial stages of the review process, when deciding whether or not studies are sufficiently similar to be combined to answer a question of interest to decision-makers.
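Statistical heterogeneity is often quantified before its causes are explored through subgroup or sensitivity analyses. The article does not prescribe a particular statistic; the sketch below swaps in Cochran's Q and the I² statistic, a common choice in Cochrane reviews, computed with inverse-variance weights from per-study effect estimates (e.g. log risk ratios) and their standard errors. The function name and the example numbers are hypothetical, and I² is only one input to the judgement about rating down for inconsistency, alongside the overlap of confidence intervals and the direction of effects.

```python
def cochran_q_and_i2(estimates, std_errors):
    """Return Cochran's Q and I² (as a percentage) using inverse-variance weights.

    estimates: per-study effect estimates on an additive scale
               (e.g. log risk ratios or mean differences).
    std_errors: the matching standard errors.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    # Fixed-effect (inverse-variance) weighted mean, used as the reference point.
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Q: weighted sum of squared deviations from the weighted mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I²: share of the variability in excess of what chance (df) would produce,
    # truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical log risk ratios from three studies pointing in different directions.
q, i2 = cochran_q_and_i2([-0.5, -0.1, 0.3], [0.1, 0.1, 0.1])
print(round(q, 2), round(i2, 2))  # prints: 32.0 93.75
```

A high I² like this would prompt the exploration of underlying reasons described above, rather than automatically triggering a downgrade.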
It should not be assumed that narrow review questions are preferable simply because they reduce the likelihood of inconsistency. Decision-makers may instead want to know about the evidence base for a range of potential options for addressing a problem. Depending on the purpose of the systematic review (e.g. searching for a clear signal across different but related interventions), study heterogeneity may be expected to produce a large degree of statistical heterogeneity. In this scenario, downgrading for inconsistency may seem inevitable, but review authors should carefully consider whether it is merited, especially where subgroup analyses or meta-regression help explain some of the observed statistical heterogeneity.

If everything depends on context, is GRADE meaningful?

Public health interventions are often highly dependent on context, so different findings may be expected if a programme is repeated in another setting. As the GRADE rating articulates how likely it is that further research will change our confidence in the estimate of effects, applying GRADE in this scenario can seem less meaningful. However, where context is so crucial, it is arguably even more important to acknowledge this explicitly in any summary of the evidence about what an intervention or programme achieves. Otherwise, decision-makers may reasonably expect to see the same level of effectiveness in their own setting or community. GRADE provides a useful prompt to consider the important issue of context in public health. Where effects may be highly dependent on context, GRADE allows this to be incorporated into the process of assessing the evidence. The contextual factors that are especially critical in mediating the effects in the included studies can be documented, allowing readers to judge more appropriately whether similar interventions are likely to have similar results in their own context.
If there is no pooled effect size, can GRADE still be used?

In many reviews, there are outcomes for which it is not possible to provide a pooled effect size, perhaps because of substantial heterogeneity (across studies, populations, interventions and outcomes) or a lack of data due to insufficient evidence or reporting. Some argue that this renders important aspects of GRADE, such as imprecision and inconsistency, meaningless; it also makes the presentation of findings more challenging. While there are limits to what can be said about precision without an effect estimate, it is important to consider the intention behind examining precision when summarizing evidence of effectiveness. Even without a pooled effect size, if similar outcomes are summarized from a group of studies, there must be information about the effects observed in the individual studies. The question then becomes: how precise an estimate is obtainable from each study and, collectively, what does that look like? To answer this question, the individual effect sizes and confidence intervals observed in each study should be examined. Precision also depends on the amount of information, so a judgement is required as to whether there is sufficient information (in terms of numbers of participants and/or numbers of events) to give reliable estimates. A rough rule of thumb provided by the GRADE Working Group is at least 3000 participants and, for dichotomous outcomes, at least 300 events (12).

In terms of assessing consistency of findings, it is important to consider whether all studies provide a similar indication of effect or whether there are conflicting findings. Again, examining effect estimates and their confidence intervals will give an indication of this. If there are no estimates of effect from individual studies, it is important to ask what is being assessed from the studies. Perhaps it is whether or not there are positive outcomes, regardless of effect size. Perhaps the review is intended to map the current evidence base, similar to a scoping review. These questions are intrinsically linked to what conclusions can be drawn from the evidence. In any case, inconsistency and imprecision should be addressed in an effectiveness review. The lack of a pooled effect size, while making decisions about the level of inconsistency and imprecision more challenging, does not provide a reasonable justification for ignoring fundamental questions about what the evidence suggests. A structured approach to assessing the included studies, even in the absence of meta-analytical results, can also help identify the research agenda more clearly. Several summaries produced by the SUPPORT Collaboration (http://www.support-collaboration.org/) provide excellent examples of how to apply GRADE and present findings in the absence of pooled effect estimates.

What are the implications of the final GRADE rating?

It is reasonable to assume that much of the evidence in public health reviews may have a GRADE rating of ⊕⊕◯◯ or ⊕◯◯◯ (low or very low). One frequently expressed concern is whether this might be used as a reason for not valuing the available evidence base and, in turn, for not implementing potentially beneficial public health interventions.
In fact, this concern is not restricted to public health. The implications of quality ratings for users of the evidence should always be considered, regardless of the framework used to arrive at such ratings. What is needed is an understanding of what the GRADE rating means and how best to communicate it to decision-makers, together with a clear separation between assessing the evidence and developing recommendations. The GRADE rating refers to our certainty in an effect estimate or, if there is no effect estimate, our certainty in the summarized findings from the body of evidence. That is, how likely is it that the reported findings could be substantially different in the contexts in which the review findings are intended to be applied? The rating on its own does not provide particularly useful information for the reader; the reasons for rating down, however, can be useful. Review authors should also consider whether the GRADE rating accurately reflects their true confidence in the evidence, and what other types of research might increase their confidence. The GRADE Working Group welcomes examples where there is a mismatch between the GRADE rating and our confidence in a body of evidence. If the current GRADE terminology for each rating sounds too negative or judgemental in certain contexts, there is also the option of using different terminology or relying solely on symbols. Regardless, it is careful documentation of the decisions taken to arrive at different ratings that provides useful information for decision-makers. It is also important to acknowledge that recommending a particular intervention or strategy depends not only on the quality of the evidence supporting the decision, but also on a range of other factors, such as cost-effectiveness, acceptability and feasibility, among many others.
There is some criticism that GRADE is not as rigorous in assessing these other factors as it is in determining the quality of the evidence (13), and that the range of factors considered in moving from evidence to recommendations may need to be broadened. Efforts are underway in the DECIDE project (Developing and Evaluating Communication Strategies to Support Informed Decisions and Practice Based on Evidence) to develop a more systematic, transparent and comprehensive way to consider and document the decisions leading to a recommendation (http://www.decide-collaboration.eu/).

Conclusion

Decision-makers choosing between various strategies to address public health issues need to make an informed assessment of the benefits and potential harms of the alternatives and to ensure that limited resources are used wisely. The GRADE framework is widely endorsed internationally and provides a standardized approach for assessing the quality of a body of effectiveness evidence. It increases transparency and explicitness, and potentially reduces the influence of conflicts of interest on evidence assessment and interpretation. The World Health Organization, for example, requires all guidelines to be underpinned by evidence that has been assessed using the GRADE process. There are significant advantages in using the same framework to assess effectiveness evidence across both clinical and public health spheres. While it is important to acknowledge the challenges in assessing the effectiveness of more upstream, population-level approaches, these challenges are not sufficient to argue that the GRADE criteria are not relevant in public health. There is sufficient flexibility in applying the GRADE criteria, and upgrading and downgrading decisions should meaningfully reflect our confidence in the findings. Many more examples of the application of the GRADE criteria in public health are required to examine the issues carefully and to propose areas for improvement.

Acknowledgements

Eva Rehfuess acknowledges financial support from the Munich Center of Health Sciences. The CPHG acknowledges the support of the Victorian Health Promotion Foundation (VicHealth) and the National Health and Medical Research Council of Australia.

New Cochrane protocols and reviews of interest to health promotion and public health stakeholders from Issues 8-9, 2012 of The Cochrane Library (* denotes CPHG review/protocol)

Child health
- Interventions for tobacco use prevention in Indigenous youth
- Motivational interviewing for improving outcomes in youth living with HIV

Consumer and communication strategies
- E-mail for clinical communication between health-care professionals

Community health
- *Fortification of staple foods with vitamin A for preventing vitamin A deficiency (protocol: review underway)
- Interventions for preventing falls in older people living in the community (updated)
- *Slum upgrading strategies involving physical environment and infrastructure interventions and their effects on health and socio-economic outcomes (protocol: review underway)

Effective practice/health systems
- Integration of HIV/AIDS services with maternal, neonatal and child health, nutrition and family planning services
- Interventions to improve the use of systematic reviews in decision-making by health system managers, policy-makers and clinicians

Injury prevention
- Home safety education and provision of safety equipment for injury prevention (updated)

Pregnancy and childbirth
- Antenatal breastfeeding education for increasing breastfeeding duration (updated)
- Antenatal dietary advice and supplementation to increase energy and protein intake (updated)
- Optimal duration of exclusive breastfeeding (updated)

References

1. Akl EA, Kennedy C, Konda K et al. Using GRADE methodology for the development of public health guidelines for the prevention and treatment of HIV and other STIs among men who have sex with men and transgender people. BMC Public Health 2012;12:386. doi:10.1186/1471-2458-12-386.
2. Guyatt GH, Oxman AD, Schünemann HJ et al. GRADE guidelines: a new series of articles. J Clin Epidemiol 2011a;64:380-2.
3. Duclos P, Durrheim DN, Reingold A et al. Developing evidence-based immunisation recommendations and GRADE. Vaccine 2012. doi:10.1016/j.vaccine.2012.02.041.
4. Durrheim DN, Reingold A. Modifying the GRADE framework could benefit public health. J Epidemiol Community Health 2010;64:387.
5. ECDC. Evidence-based methodologies for public health. Stockholm: European Centre for Disease Prevention and Control, 2011.
6. Rehfuess EA, Bruce N, Prüss-Üstün A. GRADE for the advancement of public health. J Epidemiol Community Health 2011;65:559.
7. Schünemann H, Hill S, Guyatt G et al. The GRADE approach and Bradford Hill's criteria for causation. J Epidemiol Community Health 2010;65:392-5.
8. WHO. Guidance for the development of evidence-based vaccine-related recommendations. Continuous updates are available at http://www.who.int/immunization/sage/guidelines_development_recommendations.pdf (last accessed 25 June 2012).
9. Deeks JJ, Dinnes J, D'Amico R et al. Evaluating non-randomised intervention studies. Health Technol Assess 2003;7(27):1-173.
10. Sun X, Briel M, Walter SD et al. Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses. BMJ 2010;340:c117.
11. Sun X, Briel M, Busse JW et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ 2012;344:e1553.
12. Guyatt G, Oxman AD, Kunz R et al. GRADE guidelines 6. Rating the quality of evidence: imprecision. J Clin Epidemiol 2011b;64:1283-93.
13. Barbui C, Dua T, van Ommeren M et al. Challenges in developing evidence-based recommendations using the GRADE approach: the case of mental, neurological and substance use disorders. PLoS Med 2010;7(8):e1000322.