Recent advances in non-experimental comparison group designs


Recent advances in non-experimental comparison group designs
Elizabeth Stuart
Johns Hopkins Bloomberg School of Public Health
Department of Mental Health; Department of Biostatistics; Department of Health Policy and Management
www.biostat.jhsph.edu/~estuart | estuart@jhsph.edu
Funding thanks to IES R305D150001
September 23, 2016
Elizabeth Stuart (JHSPH) Comparison groups September 23, 2016 1 / 27

Outline
1. Introduction: Why non-experimental designs?
2. When do comparison group designs work (i.e., give accurate effect estimates)?
3. Moving from art to science
4. Sensitivity analysis to unobserved confounding
5. Conclusions

1. Introduction: Why non-experimental designs?

Introduction and motivation
Randomized experiments are often seen as the gold standard for estimating causal effects (for good reason). But some important causal questions can only be answered using non-experimental studies:
- Interventions or risk factors it would be unethical to randomize (child maltreatment, drug use)
- Randomization not feasible because the intervention is widely available (books in the home, an online reading program)
- Can't wait that long to collect outcome data (long-term effects of Head Start)
- Worry that people who participate may not represent the target population (medical trials conducted only in academic medical centers, school-based studies conducted primarily in large districts)

The problem
Individuals who select one treatment, or who are exposed to some risk factor of interest, are likely different from those who don't: confounding. It is hard to separate out differences in outcomes due to these other confounders from differences due to the treatment of interest.

Some non-experimental design options
- Instrumental variables: requires finding an instrument that affects the treatment received but does not directly affect the outcome. Randomized encouragement designs are one (sometimes feasible) type; otherwise one has to hope for a naturally occurring instrument (e.g., charter school lotteries).
- Regression discontinuity: requires that treatment was administered in a way that used a discontinuity, e.g., students with scores below a threshold got the intervention.
- Interrupted time series: useful for interventions implemented at a particular point in time for a particular group (e.g., policy changes), with longitudinal measures before and after. A comparative interrupted time series is better than a simple ITS, and then many of the same issues discussed here still come up.

Comparison group designs as a feasible option
Comparison groups are often one of the most feasible designs.
- Main idea: given data on people who got some treatment of interest, find a comparison group of individuals who are similar but did not receive the treatment.
- Main strategy: try to ensure that the treatment and comparison groups are as similar as possible on a large set of baseline characteristics.
Traditionally researchers may have just used regression adjustment to control for any differences. However, this can lead to model dependence and concerns about model misspecification if the groups are quite different. So a large literature has built up that aims to use design to equate the groups before subsequent regression adjustment. Propensity scores are one key tool in this design, as they help create groups that look similar on a potentially large set of baseline characteristics. Big-picture common strategies: matching, weighting, subclassification.
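The matching strategy above can be sketched end to end in a few lines. This is a minimal illustration, not any particular package's algorithm: hypothetical toy data, a hand-rolled one-covariate logistic regression for the propensity score, and greedy 1:1 nearest-neighbor matching without replacement.

```python
# Minimal sketch of 1:1 propensity score matching (all data hypothetical).
import math
import random

random.seed(0)

# Toy data: covariate x drives both treatment take-up and outcomes,
# so the raw treated/comparison difference in x is confounded.
units = []
for _ in range(300):
    x = random.gauss(0, 1)
    treated = random.random() < 1 / (1 + math.exp(-(x - 1.0)))
    units.append({"x": x, "treated": treated})

raw_t = [u["x"] for u in units if u["treated"]]
raw_c = [u["x"] for u in units if not u["treated"]]
raw_gap = sum(raw_t) / len(raw_t) - sum(raw_c) / len(raw_c)

# Step 1: estimate the propensity score e(x) = P(treated | x) with a
# one-covariate logistic regression fit by plain gradient ascent.
b0 = b1 = 0.0
for _ in range(2000):
    g0 = g1 = 0.0
    for u in units:
        p = 1 / (1 + math.exp(-(b0 + b1 * u["x"])))
        err = (1.0 if u["treated"] else 0.0) - p
        g0 += err
        g1 += err * u["x"]
    b0 += 0.05 * g0 / len(units)
    b1 += 0.05 * g1 / len(units)
for u in units:
    u["ps"] = 1 / (1 + math.exp(-(b0 + b1 * u["x"])))

# Step 2: greedy 1:1 nearest-neighbor matching on the propensity
# score, without replacement.
pool = [u for u in units if not u["treated"]]
matches = []
for t in (u for u in units if u["treated"]):
    c = min(pool, key=lambda u: abs(u["ps"] - t["ps"]))
    pool.remove(c)
    matches.append((t, c))

# After matching, the comparison group should resemble the treated
# group on x much more closely than the raw groups did.
matched_gap = (sum(t["x"] for t, _ in matches) / len(matches)
               - sum(c["x"] for _, c in matches) / len(matches))
print(round(raw_gap, 2), round(matched_gap, 2))
```

In practice one would use an established implementation (e.g., the MatchIt R package cited in the matching literature referenced here), but the logic is the same: model treatment assignment, then equate the groups on the resulting score before any outcome analysis.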

Talk today
This talk will not provide a general history and introduction to these methods. Fundamentally, think of them as ways to make treatment and comparison groups look like they could have come from a randomized trial. (There is a large literature on the benefits of emulating a randomized trial.) The focus is on three particular recent advances in this field:
1. Evidence on when comparison group designs work
2. Moving these designs to more science than art
3. The importance of sensitivity analysis to unobserved confounders
The focus today is on point-in-time treatments; more complex settings (e.g., longitudinal treatments) require generalizations of these ideas, though the fundamentals stay the same.

2. When do comparison group designs work (i.e., give accurate effect estimates)?

When the key assumptions hold!
Key assumption: unconfounded treatment assignment.
- No unmeasured confounders: no unobserved differences between treatment and comparison groups, once we have balanced the groups on the observed characteristics. Also called "no hidden bias" or "ignorable treatment assignment."
- Also requires the assumption that everyone in the study had a non-zero chance of getting treatment or control ("common support"); e.g., explicitly exclude people not eligible for the treatment.
So how do we make these assumptions more plausible?
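The common-support requirement has a simple operational version: trim units whose propensity scores fall outside the region where both groups are represented. A minimal sketch, with hypothetical already-estimated scores:

```python
# Enforce common support by trimming to the overlap region of the
# two groups' propensity scores (scores here are hypothetical).
def common_support(treated_ps, control_ps):
    """Return (kept_treated, kept_control) after trimming to overlap."""
    lo = max(min(treated_ps), min(control_ps))
    hi = min(max(treated_ps), max(control_ps))
    return ([p for p in treated_ps if lo <= p <= hi],
            [p for p in control_ps if lo <= p <= hi])

t_kept, c_kept = common_support(
    [0.30, 0.55, 0.70, 0.95],   # treated; 0.95 has no control analogue
    [0.05, 0.25, 0.40, 0.60],   # control; 0.05 has no treated analogue
)
print(t_kept, c_kept)
```

Note the trade-off: trimming makes the comparison more credible but changes the population the estimate applies to.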

It's all about the covariates!
There is increasing evidence that what matters is which covariates are included, not exactly how the matching/equating is done. Including a large set of covariates, in particular those related to treatment assignment and outcomes, makes unconfoundedness more likely to be satisfied. Careful design is crucial: what are the important confounders, and do we (or can we) measure them?
See Steiner, Cook, Shadish, and Clark (2010, Psychological Methods); Cook, Shadish, and Wong (2008, JPAM); Steiner, Cook, Li, and Clark (2015, JREE).

[Figure 2 from Steiner et al. (2010)]

Lessons from this literature
- Think carefully about comparison group selection ("clever design").
- Use a large set of variables: not just demographics, but also, e.g., pretest measures of the outcome.
- Select the comparison group carefully (e.g., from the same geographic area), and measure variables in the same ways across treatment and comparison groups. (Using large national datasets is usually not as effective.)
- Have a good understanding of the treatment selection process (the importance of the assignment mechanism!).
- Have large sample sizes in the comparison groups: it is easier to get good balance.
- Do not adjust for, or match on, post-treatment variables.
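Whether a design achieved balance is typically checked covariate by covariate with standardized mean differences. A minimal sketch on hypothetical pretest scores (a common rule of thumb flags absolute SMDs above about 0.1):

```python
# Standardized mean difference for one covariate, using the pooled
# standard deviation of the two groups.
import math

def smd(treated, control):
    """Absolute standardized mean difference."""
    mt = sum(treated) / len(treated)
    mc = sum(control) / len(control)
    vt = sum((x - mt) ** 2 for x in treated) / (len(treated) - 1)
    vc = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    return abs(mt - mc) / math.sqrt((vt + vc) / 2)

# Hypothetical pretest scores: clearly imbalanced groups.
balance = smd([72, 80, 85, 90], [60, 65, 70, 75])
print(round(balance, 2))
```

Because the difference is scaled by the standard deviation rather than tested, the SMD does not shrink mechanically with sample size, which is why it is preferred over p-values for balance checking.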

3. Moving from art to science

More automated methods
Traditionally the use of methods such as propensity scores has involved a fair amount of art. The goal is to create groups that look similar on the observed covariates ("covariate balance"). To get there, analysts try different estimation techniques, equating methods, interaction terms, etc., and pick the approach that gives the best covariate balance. New methods aim to remove some of this iteration, in two ways:
- Get balance on the covariates themselves directly (not necessarily using the propensity score)
- Estimate and use the propensity score in an automated way to get good balance

Getting direct balance
Some methods aim to balance the covariates directly, not through the propensity score.
- Coarsened exact matching (CEM; King et al.; cem R package): http://gking.harvard.edu/cem. Essentially exact matching on coarsened (categorized) covariates, with a trade-off between the number of matches and the closeness of matching. It works well for easily categorized variables (like high school degree or not); less clear for truly continuous ones (like age).
- Mixed integer programming and fine balance (Zubizarreta, 2012; mipmatch R package). This is sort of like exact matching on a few variables, but instead of getting individual-level exact matches, it matches the distributions exactly across the matched samples.

[Table 2 from Zubizarreta et al. (2015)]

More automated propensity score methods
Other methods aim to automate the propensity score estimation itself, estimating the score in a way that optimizes balance.
- Most popular: the Covariate Balancing Propensity Score (CBPS; Imai and Ratkovic): http://imai.princeton.edu/research/cbps.html. It doesn't simply maximize the likelihood; it jointly imposes a balance constraint. The benefit is that it maintains the nice theoretical properties of the propensity score while more directly targeting balance.
- Genetic matching (Sekhon et al.) is another version of this.
One drawback: many of these methods optimize a particular balance measure and may not optimize others. More work is needed to determine the best balance measures to optimize.
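CBPS itself solves an estimating equation; as a simplified stand-in, the sketch below shows the weighting step it feeds into and the weighted-mean balance it targets: inverse-probability weights built from hypothetical, already-correct propensity scores, under which the weighted covariate means of the two groups coincide.

```python
# Inverse-probability weighting with a balance check (toy data).
def weighted_mean(xs, ws):
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

# (covariate x, propensity score e) pairs; e is assumed known and correct.
treated = [(0.0, 0.2)] + [(1.0, 0.8)] * 4
control = [(0.0, 0.2)] * 4 + [(1.0, 0.8)]

# ATE-style weights: 1/e for treated units, 1/(1-e) for control units.
wt = [1 / e for _, e in treated]
wc = [1 / (1 - e) for _, e in control]

raw_gap = (sum(x for x, _ in treated) / len(treated)
           - sum(x for x, _ in control) / len(control))
wtd_gap = (weighted_mean([x for x, _ in treated], wt)
           - weighted_mean([x for x, _ in control], wc))
print(round(raw_gap, 2), round(abs(wtd_gap), 2))
```

Here the raw gap in x is large, and weighting removes it entirely because the scores are exactly right. The point of CBPS is to make this happen even when the propensity score model would otherwise be misspecified, by building the balance condition into the estimation itself.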

4. Sensitivity analysis to unobserved confounding

What if we don't believe unconfoundedness?
Sensitivity analyses can be done to assess how sensitive results are to an unobserved confounder. They ask the question: how strongly related to treatment assignment and the outcome would such a factor have to be in order to change the study conclusions? The approach is based originally on an analysis by Cornfield showing that the association between smoking and lung cancer was most likely actually causal. The methods have since been extended by Rosenbaum, VanderWeele, and others.
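A minimal numeric sketch in the spirit of the VanderWeele-Arah approach, for a single binary unmeasured confounder U on the risk-ratio scale. All inputs are hypothetical: rr_uy is U's risk ratio with the outcome, and p1/p0 are U's prevalences among the treated and untreated.

```python
# Simple sensitivity analysis for one binary unmeasured confounder U,
# using the standard bias-factor formula on the risk-ratio scale.
def bias_factor(rr_uy, p1, p0):
    """Multiplicative bias in the observed risk ratio induced by U."""
    return (1 + (rr_uy - 1) * p1) / (1 + (rr_uy - 1) * p0)

observed_rr = 1.5   # hypothetical estimated treatment-outcome risk ratio

# How much of the observed association could a confounder of a given
# strength explain? Adjusted RR = observed RR / bias factor.
b = bias_factor(rr_uy=2.0, p1=0.6, p0=0.3)
adjusted_rr = observed_rr / b
print(round(b, 3), round(adjusted_rr, 3))
```

Scanning rr_uy, p1, and p0 over plausible ranges shows how strong U would have to be to push the adjusted estimate to the null, which is exactly the kind of statement reported in the example on the next slide.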

Example: Effects of psychosocial therapy on repeat suicide attempts
Erlangsen et al. (2014) used Danish registry data to estimate the effect of psychosocial therapy at suicide prevention centers on repeat suicide attempts. The concern was that there may be an unobserved variable related to both participation and outcomes. Sensitivity analysis can assess how strong such an unobserved variable would have to be to change the study conclusions; the authors used the approach of VanderWeele and Arah (see Liu et al., 2013). For one of the weaker effects (repeated self-harm after 20 years), a binary unobserved confounder with prevalence 0.5 would have to have a 1.8-fold association with participation in the program and a two-fold association with the outcome in order to explain the results.

5. Conclusions

Strengths and limitations of non-experimental comparison group designs
Strengths:
- Often feasible
- Relatively easy to describe and understand
- The idea of replicating a randomized trial: no use of outcome data in setting up the design
Limitations:
- Relies on having high-quality data: common measures across treatment and control groups, and important confounders measured
- It helps to have a good understanding of the treatment selection process, which is rare (an opportunity for combining qualitative and quantitative work?)

Conclusions
Many research questions require non-experimental designs, and when using non-experimental comparison group designs, clever design helps. General lessons:
- Measure as many confounders as possible; try to have an understanding of the treatment selection process
- Try to get as good covariate balance on the observed covariates as possible
- Assess sensitivity to the key assumption of no unmeasured confounders
There are also many complications and extensions: multiple treatment levels, time-varying treatments, missing outcomes, ...


And remember...
"With better data, fewer assumptions are needed." - Rubin (2005, p. 324)
"You can't fix by analysis what you bungled by design." - Light, Singer and Willett (1990, p. v)

References
Online class at JHSPH: 140.664 ("Causal Inference in Medicine and Public Health"). One-day short course on propensity scores in the JHSPH summer institute (in-person and online): http://www.jhsph.edu/departments/mental-health/summer-institute/courses.html

Erlangsen, A., ..., Stuart, E.A., et al. (2014). Short and long term effects of psychosocial therapy provided to persons after deliberate self-harm: a register-based, nationwide multicentre study using propensity score matching. Lancet Psychiatry.
Ho, D.E., Imai, K., King, G., and Stuart, E.A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15(3): 199-236. http://gking.harvard.edu/matchp.pdf
Rubin, D.B. (2001). Using propensity scores to help design observational studies: application to the tobacco litigation. Health Services & Outcomes Research Methodology 2: 169-188.
Stuart, E.A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science 25(1): 1-21.
Liu, W., Kuramoto, S.K., and Stuart, E.A. (2013). An introduction to sensitivity analysis for unobserved confounding in non-experimental prevention research. Prevention Science 14(6): 570-580. PMCID: 3800481.