Risk of bias assessment for experimental and quasi-experimental designs based on statistical methods
Hugh Waddington & Jorge Garcia Hombrados

"The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable... It is important we study fragility in a much more systematic way."
Edward Leamer (1983), "Let's Take the Con Out of Econometrics"

Important elements of study quality
- Internal validity: asks about the causal properties
- External validity: asks about the generalizability of causal properties (or, more generally, of the study's findings)
- Construct validity: asks about the language of the measures
- Statistical conclusion validity: asks about the assumptions, estimations, and calculations of summary statistics

E.g. the systematic review by Gaarder et al., "CCTs and health: unpacking the causal chain", Journal of Development Effectiveness, 2010. Many applications to CCT programmes (programme: method and authors):
- Brazil (Bolsa Familia): PSM (Morris et al. 2004)
- Colombia (Familias en Accion): PSM, DID (Attanasio et al. 2005)
- Honduras (PRAF): RCT (Flores et al. 2004; Morris et al. 2004)
- Jamaica (PATH): RDD (Levy and Ohls 2007)
- Malawi (MDICP): RCT (Thornton 2008)
- Mexico (Progresa/Oportunidades): RCT (Gertler 2000, and others), also PSM, RDD, DID
- Nepal (SDIP): ITS (Powell-Jackson et al. 2009)
- Nicaragua (RPS): RCT (Barham, Maluccio and Flores 2008)
- Paraguay (Tekopora): PSM (Soares et al. 2008)
- Turkey (CCT): RDD (Ahmed et al. 2006)

Existing risk of bias tools
- Hundreds of risk of bias tools exist (Deeks et al. 2003)
- Mainly designed for assessing the validity of RCTs and epidemiological designs (controlled before-and-after designs, cohort designs, case-control designs)
- But they do not enable assessment of QEDs commonly used in social enquiry, such as RDDs and IV estimation

Prominent tools
A matrix of prominent risk of bias tools (AHRQ, Cochrane tool, CEBP, Down & Black, EPOC, EPPHP, NICE, SIGN 50, Wells, Valentine & Cooper DIAD) against the study designs each covers: RCTs, quasi-RCTs, natural experiments (RDD), instrumental variables, matching (e.g. PSM), difference-in-differences (panel data), CBA, cohort, and case-control designs.

These tools also:
- do not enable consistent classification across the full range of studies (experimental and quasi-experimental)
- can lead to overly simplistic and inappropriate classification of risk of bias across quasi-experiments
- are not sufficiently detailed to evaluate the execution of the methods of analysis, including complex statistical procedures
- often do not evaluate appropriate risk of bias criteria for social experiments

Theory and evidence on QEDs
- Overall, the evidence in international development suggests that high quality quasi-experimental studies based on statistical methods yield results comparable to experiments
- This seems to be particularly the case when evaluators have knowledge about the allocation rule, and so can model it, or when the allocation rule is exogenous (Cook, Shadish and Wong, 2008; Hansen et al. 2011)
- But theory and evidence also suggest that, if not appropriately implemented, these designs can yield misleading results (e.g. Glazerman et al., 2003)
- We should focus on identifying the circumstances under which these approaches yield accurate causal inference

So a tool like the Maryland Scientific Methods Scale (MSMS) is not appropriate. Source: Sherman et al. 1998

Internal validity of QEDs
Impact evaluation assignment is based on:
- randomised assignment (experimental studies)
- exogenous variation (natural experiments and RDDs)
- selection by planners and/or self-selection by participants

Risk of bias of quasi-experimental designs largely depends on:
- the validity of the technique to ensure group equivalence
- the implementation of the method
- other factors such as spillovers/contamination and Hawthorne effects

Principles for assessing RoB (Higgins et al. 2011)
- Do not use quality scales
- Focus on internal validity
- Assess risk of bias in results, not quality of reporting
- Assessment requires judgement
- Choose domains based on a combination of theoretical and empirical considerations
- Report outcome-specific risk of bias

And we would add:
- Assessment should be based on both the study design and the execution of the design and analysis
- Ideally, the tool will provide a common framework for evaluating risk of bias across different types of designs

RoB tool being developed
- Builds on the structure and RoB concept of existing tools, including CEBP, EPOC and Cochrane
- Addresses the statistical and conceptual assumptions underpinning the validity of quasi-experimental designs based on statistical methods
- Provides a common framework of 8 evaluation criteria for the assessment of risk of bias and confidence levels (internal validity) in impact evaluations using experimental and quasi-experimental designs based on statistical methods

Evaluation criteria

1. Mechanism of assignment / identification, and 2. Group equivalence in implementation of the method
Category of bias: selection bias; confounding
Relevant questions:
- For experimental designs: is the allocation mechanism appropriate to generate equivalent groups?
- Does the model of participation capture all relevant observable and unobservable differences in covariates between groups?
- Is the method of analysis adequately executed?
- Are the observable results of the counterfactual identification process convincing?
- Are all likely relevant confounders taken into account in the analysis?
- Is the estimation method sensitive to non-random attrition?

3. Hawthorne effects
Category of bias: motivation bias
Relevant question: are differences in outcomes across groups influenced by participant motivation as a result of programme implementation and/or monitoring?

4. Spill-overs and cross-overs
Category of bias: performance bias
Relevant question: is the programme influencing the outcome of individuals in the control group (including compensating investments for control groups)?

5. Selective methods of analysis
Category of bias: analysis reporting bias
Relevant questions: is the method of analysis or specification model used by the author selectively chosen? Is the analysis convincingly reported (and available for replication)?

6. Other sources of bias
Category of bias: other biases
Relevant question: are the results of the study subject to other threats to validity (e.g. placebo effects, courtesy bias, survey effects, inadequate ...)?

Full internal validity assessment will also take account of:

7. Confidence intervals and significance of the effect (Type I and Type II error)
- Is the study subject to a unit of analysis error not adequately accounted for?
- Is the study subject to heteroscedasticity not accounted for?
- Is the study not taking into account possible heterogeneity in effects?
- Is a lack of significant effects driven by a lack of power? (see the power sketch below)
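On the last question, a back-of-the-envelope calculation is often enough to flag an underpowered study. A minimal sketch using statsmodels' power routines; the effect size, power target and sample size are illustrative assumptions, not figures from the slides:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per arm needed to detect a standardized effect of 0.2
# with 80% power at the 5% significance level (two-sided t-test).
n_per_arm = analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
print("required n per arm:", round(n_per_arm))

# Conversely, the power a study with 150 observations per arm would have
# had for the same effect size: well below the conventional 80% target.
achieved = analysis.solve_power(effect_size=0.2, nobs1=150, alpha=0.05)
print("achieved power:", round(achieved, 2))
```

For programmes allocated at village or clinic level, the required sample would additionally need to be inflated by the design effect.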

Mechanism of assignment
There are other credible methods of identification than randomisation. However, some study designs (RDDs, longitudinal studies) and methods of analysis (IV) are better able to eliminate or account for unobservables than others (PSM, cross-sectional regression).
E.g. Miguel and Kremer (2004), "Worms"

Assumptions of assignment mechanisms
- Randomised control trial (RCT), regression discontinuity design (RDD), natural experiment (instrumental variables): assignment/participation rule is external to participants and effectively random (observables and unobservables balanced)
- Difference-in-differences (DID) regression: adjusts for time-invariant unobservables; assumes no time-varying unobservables
- Propensity score matching (PSM): assumes unobservables are correlated with observables
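To make the DID item concrete, a minimal worked form of the two-period estimator and the parallel-trends assumption it rests on (the notation is ours, not from the slides):

$$\hat{\tau}_{DID} = \left(\bar{Y}^{T}_{\text{post}} - \bar{Y}^{T}_{\text{pre}}\right) - \left(\bar{Y}^{C}_{\text{post}} - \bar{Y}^{C}_{\text{pre}}\right)$$

identifies the effect on the treated only if, absent the programme, treated and comparison outcomes would have moved in parallel:

$$E\left[Y^{0}_{\text{post}} - Y^{0}_{\text{pre}} \mid T = 1\right] = E\left[Y^{0}_{\text{post}} - Y^{0}_{\text{pre}} \mid T = 0\right]$$

Any time-invariant difference between the groups cancels in the double difference; a time-varying unobservable correlated with treatment does not, which is exactly the assumption flagged above.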

Group equivalence: execution of the method of analysis
Internal validity relies heavily on the execution of the method of analysis (e.g. group equivalence, efficiency of the participation equation, appropriateness of the instrumental variable). While a degree of qualitative judgement is required for all methods, this is particularly apparent in QEDs.

Threats to validity in execution
- RCTs: insufficient observations, non-random attrition. E.g. high attrition in Miguel & Kremer (2004), "Worms"
- DID: attrition; failure to control for time-varying covariates in the DID regression. E.g. farmer field schools (Orozco-Cirilo, 2008)
- PSM: lack of covariate balance (see the balance-check sketch below). E.g. covariate imbalance in DANIDA (2012)
- RDD: fuzzy discontinuity
- IV: exogeneity of the instrument. E.g. Duflo and Pande (2009), "Dams": instrument not exogenous
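For the PSM item, the standard diagnostic is a covariate balance check on the matched (or weighted) sample. A minimal sketch, assuming a pandas DataFrame with a binary treatment indicator and matching weights; all column names are illustrative, not taken from the studies cited:

```python
import numpy as np
import pandas as pd

def standardized_mean_difference(df: pd.DataFrame, covariate: str,
                                 treat_col: str = "treated",
                                 weight_col: str = "weight") -> float:
    """Weighted standardized mean difference between treated and comparison units.

    Absolute values above roughly 0.1 are commonly read as residual imbalance
    after matching (a rule of thumb, not a threshold from the slides).
    """
    t = df[df[treat_col] == 1]
    c = df[df[treat_col] == 0]
    mean_t = np.average(t[covariate], weights=t[weight_col])
    mean_c = np.average(c[covariate], weights=c[weight_col])
    # Pool the two groups' variances for the denominator.
    pooled_sd = np.sqrt((t[covariate].var() + c[covariate].var()) / 2)
    return (mean_t - mean_c) / pooled_sd

# Illustrative usage on a hypothetical matched sample:
# df = pd.read_csv("matched_sample.csv")
# for cov in ["age", "household_size", "baseline_income"]:
#     print(cov, round(standardized_mean_difference(df, cov), 3))
```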

Spillovers and other forms of contamination
- Designs that are not geographically clustered (when they should be), e.g. worms, mosquito nets, sanitation
- Differential contamination by compensatory interventions (which affect the outcome)
- Survey effects (measurement as treatment)
E.g. deworming trials (Taylor-Robinson et al. 2012); WASH and diarrhoea studies (Kremer et al. 2009)

Hawthorne & John Henry effects
- Motivation bias caused by implementation and evaluation monitoring can explain differences in outcomes in intervention studies.
- In contrast, expectation (placebo) effects are embodied within the causal mechanisms of many social interventions, so isolating them may be less relevant (and in many cases not possible anyway).
- E.g. interventional vs. observational data; intensive forms of data collection (cf. WASH and child diarrhoea IEs)

Selective methods of analysis
- The method of analysis used in the IE should be the best (likely least biased) given the available research design.
- We recognise the importance of the theory-based, exploratory research tradition in the social sciences. But statistical methods, when used inappropriately, can provide ample opportunity for biased, results-based research.
- E.g. Pitt and Khandker (1998) in Bangladesh: construction of outcomes
- Ricker-Gilbert (2008) in Bangladesh: uses a two-step Heckman model, but presents a non-significant IMR coefficient
- Carlberg (2012) in Ghana: no IV identifiers are significant, and no test is presented for exogeneity (see the IV sketch below)
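On the IV examples, a reviewer would look for evidence that the instrument is strong and plausibly excludable. A minimal manual two-stage least squares sketch on simulated data, reporting the first-stage F statistic as a weak-instrument check; the variable names and data-generating process are fabricated for illustration, and the second stage's standard errors are left uncorrected (a full analysis would use a dedicated IV estimator):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
d = 0.5 * z + u + rng.normal(size=n)      # endogenous treatment take-up
y = 1.0 * d + u + rng.normal(size=n)      # outcome; true effect is 1.0

# First stage: regress the endogenous treatment on the instrument.
# A low F statistic on the excluded instrument flags a weak instrument.
first = sm.OLS(d, sm.add_constant(z)).fit()
print("first-stage F:", round(first.fvalue, 1))

# Second stage: regress the outcome on the first-stage fitted values.
second = sm.OLS(y, sm.add_constant(first.fittedvalues)).fit()
print("2SLS point estimate:", round(second.params[1], 2))

# Naive OLS is biased upwards here because u drives both d and y.
naive = sm.OLS(y, sm.add_constant(d)).fit()
print("naive OLS estimate:", round(naive.params[1], 2))
```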

Precision of the effect
Misinterpretation of statistical significance (confidence intervals) due to:
- heterogeneity of effects (e.g. binomial distribution of the outcome)
- heteroscedasticity and lack of normality
- unit of analysis errors (allocation at village level, analysis at household level; see the clustering sketch below)
- lack of statistical power (small sample size)
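The unit-of-analysis point is usually handled by clustering standard errors at the level of allocation. A minimal sketch on simulated data with statsmodels; the village-level shock and all names are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_villages, hh_per_village = 40, 25

village = np.repeat(np.arange(n_villages), hh_per_village)
treated = np.repeat(rng.integers(0, 2, size=n_villages), hh_per_village)   # assignment by village
village_shock = np.repeat(rng.normal(scale=1.0, size=n_villages), hh_per_village)
y = 0.2 * treated + village_shock + rng.normal(size=n_villages * hh_per_village)

df = pd.DataFrame({"y": y, "treated": treated, "village": village})

# Household-level (naive) standard errors ignore the within-village correlation.
naive = smf.ols("y ~ treated", data=df).fit()

# Clustering at the level of allocation corrects the unit of analysis error.
clustered = smf.ols("y ~ treated", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["village"]}
)

print("naive SE:    ", round(naive.bse["treated"], 3))
print("clustered SE:", round(clustered.bse["treated"], 3))
```

With strong village-level shocks, the clustered standard error will typically be several times the naive one, which is why household-level inference on a village-allocated programme overstates precision.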

Options for summary risk of bias
- Reviewer judgement to determine the most important domains for classifying high and low risk of bias, preferably specified in the protocol
  - E.g. for highly subjective outcomes, blinding is more important
  - Some outcomes are at less risk of unobservable selection bias than others, e.g. time savings resulting from access to amenities (White 2008)
  - In other cases, unobservable selection bias is particularly problematic (e.g. microfinance, due to risk aversion)
- Independent double coding and inter-rater checks
- Examine sensitivity of results to risk of bias status

Thank you
Visit: www.3ieimpact.org
http://www.campbellcollaboration.org/international_development/index.php