State-of-the-art Strategies for Addressing Selection Bias When Comparing Two or More Treatment Groups. Beth Ann Griffin Daniel McCaffrey

Similar documents
Developing Public Health Regulations for Legal Marijuana

Manitoba Centre for Health Policy. Inverse Propensity Score Weights or IPTWs

Matched Cohort designs.

For More Information

Recent advances in non-experimental comparison group designs

Introduction to Observational Studies. Jane Pinelis

TRIPLL Webinar: Propensity score methods in chronic pain research

Rise of the Machines

16:35 17:20 Alexander Luedtke (Fred Hutchinson Cancer Research Center)

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Meta-Analysis. Zifei Liu. Biological and Agricultural Engineering

Evaluating health management programmes over time: application of propensity score-based weighting to longitudinal datajep_

Propensity Score Methods to Adjust for Bias in Observational Data SAS HEALTH USERS GROUP APRIL 6, 2018

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Chapter 17 Sensitivity Analysis and Model Validation

Overview of MET/CBT 5 Adoption

A Practical Guide to Getting Started with Propensity Scores

Module 14: Missing Data Concepts

Ec331: Research in Applied Economics Spring term, Panel Data: brief outlines

OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

Draft Proof - Do not copy, post, or distribute

Propensity score methods : a simulation and case study involving breast cancer patients.

Assessing the impact of unmeasured confounding: confounding functions for causal inference

Matching methods for causal inference: A review and a look forward

Quantitative Methods. Lonnie Berger. Research Training Policy Practice

Measuring Impact. Conceptual Issues in Program and Policy Evaluation. Daniel L. Millimet. Southern Methodist University.

Integrated care that is, the care that occurs when mental

THE USE OF NONPARAMETRIC PROPENSITY SCORE ESTIMATION WITH DATA OBTAINED USING A COMPLEX SAMPLING DESIGN

Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP I 5/2/2016

SAFETY, AND ENVIRONMENT

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods

Confounding by indication developments in matching, and instrumental variable methods. Richard Grieve London School of Hygiene and Tropical Medicine

Methods of Reducing Bias in Time Series Designs: A Within Study Comparison

Using Ensemble-Based Methods for Directly Estimating Causal Effects: An Investigation of Tree-Based G-Computation

Propensity scores: what, why and why not?

Introduction to Program Evaluation

Researchers often use observational data to estimate treatment effects

An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies

Introduction to Applied Research in Economics

Estimating the impact of community-level interventions: The SEARCH Trial and HIV Prevention in Sub-Saharan Africa

I. Introduction. Armin Falk IZA and University of Bonn April Falk: Behavioral Labor Economics: Psychology of Incentives 1/18

Discussion Meeting for MCP-Mod Qualification Opinion Request. Novartis 10 July 2013 EMA, London, UK

See Important Reminder at the end of this policy for important regulatory and legal information.

TREATMENT EFFECTIVENESS IN TECHNOLOGY APPRAISAL: May 2015

Instrumental Variables Estimation: An Introduction

Instrumental Variables I (cont.)

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Introduction to Applied Research in Economics

Use of the Quantitative-Methods Approach in Scientific Inquiry. Du Feng, Ph.D. Professor School of Nursing University of Nevada, Las Vegas

Small-area estimation of mental illness prevalence for schools

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Developing Adaptive Health Interventions

Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton

Evidence Based Practice in Behavioral Health: An Overview. 12 Steps of EBPs by Bonnie Malek, Marion County

ICPSR Causal Inference in the Social Sciences. Course Syllabus

See Important Reminder at the end of this policy for important regulatory and legal information.

Lecture II: Difference in Difference and Regression Discontinuity

Propensity scores and causal inference using machine learning methods

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Clinical Policy: Olanzapine Orally Disintegrating Tablet (Zyprexa Zydis) Reference Number: CP.PMN.29 Effective Date: Last Review Date: 02.

Clinical Policy: Lisdexamfetamine (Vyvanse) Reference Number: CP.PMN.121 Effective Date: Last Review Date: Line of Business: Medicaid

Assignment 4: True or Quasi-Experiment

A NEW TRIAL DESIGN FULLY INTEGRATING BIOMARKER INFORMATION FOR THE EVALUATION OF TREATMENT-EFFECT MECHANISMS IN PERSONALISED MEDICINE

What Are the True Benefits of School-Based Drug

Combining machine learning and matching techniques to improve causal inference in program evaluation

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Using Propensity Score Matching in Clinical Investigations: A Discussion and Illustration

Introduction to Applied Research in Economics Kamiljon T. Akramov, Ph.D. IFPRI, Washington, DC, USA

Unit 1 Exploring and Understanding Data

SUNGUR GUREL UNIVERSITY OF FLORIDA

DEPARTMENT OF EPIDEMIOLOGY 2. BASIC CONCEPTS

Understanding Confounding in Research Kantahyanee W. Murray and Anne Duggan. DOI: /pir

See Important Reminder at the end of this policy for important regulatory and legal information.

GLOSSARY OF GENERAL TERMS

Dichotomizing partial compliance and increased participant burden in factorial designs: the performance of four noncompliance methods

Methods for Addressing Selection Bias in Observational Studies

Clinical Policy: Clozapine orally disintegrating tablet (Fazaclo) Reference Number: CP.PMN.12 Effective Date: Last Review Date: 02.

A Common-Sense Framework for Assessing Information-Based Counterterrorist Programs

See Important Reminder at the end of this policy for important regulatory and legal information.

LAS SHRED THE. How to adjust your diet like a pro to reach single digit body fat

MS&E 226: Small Data

Complier Average Causal Effect (CACE)

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl

Introducing a SAS macro for doubly robust estimation

What is Multilevel Modelling Vs Fixed Effects. Will Cook Social Statistics

Research Design. Beyond Randomized Control Trials. Jody Worley, Ph.D. College of Arts & Sciences Human Relations

By: Mei-Jie Zhang, Ph.D.

Plans for Surveillance of Acute Myocardial Infarction in users of Oral Anti Diabetes Drugs

2013 ACTIVITIES GUIDE FOR GROUPS AGES 13 TO 16 DRUG AWARENESS WEEK ALCOHOL DRUGS GAMBLING

Recovery Measurement Pilot Study

How to use GoToWebinar

Propensity Score Methods with Multilevel Data. March 19, 2014

DISCLAIMER CONTRIBUTORS

Sensitivity Analysis in Observational Research: Introducing the E-value

Propensity score analysis with the latest SAS/STAT procedures PSMATCH and CAUSALTRT

For More Information

Cannabis Use Problems Identification Test (CUPIT) A measure of current and developing cannabis-related problems

Transcription:

State-of-the-art Strategies for Addressing Selection Bias When Comparing Two or More Treatment Groups Beth Ann Griffin Daniel McCaffrey 1

Acknowledgements This work has been generously supported by NIDA grant 1R01DA034065 Our colleagues Daniel Almirall Rajeev Ramchand Craig Martin Lisa Jaycox James Gazis Natalia Weil Andrew Morral Greg Ridgeway 2

Goals To increase your understanding of How to define and estimate causal effects How to use propensity scores when estimating causal effects To provide you with step-by-step instructions for using propensity score weights when you have: 2 groups (binary treatment) More than 2 groups (multiple treatments) 3

BACKGROUND ON CAUSAL EFFECTS ESTIMATION 4

Why causal effects? Causal effects are of great interest in many fields Interested in causal effects, not merely correlations Causal effects describe how individuals will change their behaviors (substance use, mental health, criminal involvement) in response to A new treatment A prevention program A new policy A new exposure 5

Motivating example Case study objective: To estimate the causal effect of MET/CBT5 versus usual care among adolescent clients Data from 2 SAMSHA CSAT discretionary grants MET/CBT5 Longitudinal, observational data MET/CBT5 at 37 sites from EAT Usual Care Longitudinal, observational data Usual care at 4 sites from ATM 6

Motivating example Case study objective: To estimate the causal effect of MET/CBT5 versus usual care among adolescent clients Data from 2 SAMSHA CSAT discretionary grants MET/CBT5 Usual Care 7

Various situations can benefit from use of causal inference methods Longitudinal observational studies (like our case study) Administrative databases with outcomes (i.e., in a health plan, who gets what treatment and what is their subsequent service use) Aggregating across studies to maximize power and look at moderation or mediation E.g., different RCTs done over time that compared A to B then A+C to B and you want to compare A to A +C 8

How do we get a causal effect estimate? Bad Good Pre-Treatment 9

How do we get a causal effect estimate? Bad Good Pre-Treatment when treated Post-Treatment 10

How do we get a causal effect estimate? Bad Good Pre-Treatment when treated False Effect 11

How do we get a causal effect estimate? Bad when untreated Good Pre-Treatment when treated 12

Causal effect = difference between two potential outcomes Bad when untreated Causal Effect Good Pre-Treatment when treated Post-Treatment 13

Causal effect of MET/CBT5 vs. Usual Care Bad when receive usual care Causal Effect Good Pre-Treatment when receive MET/CBT5 Post-Treatment 14

Causal effect of MET/CBT5 vs. Usual Care Bad when receive usual care Causal Effect Good Pre-Treatment when receive MET/CBT5 Post-Treatment Fundamental problem for causal inference: We only observe ONE of these potential outcomes/counterfactuals 15

Potential outcomes framework Two potential outcomes for each study participant: Outcome after receiving treatment = Y1 Outcome after receiving comparison condition = Y0 (can be either another treatment or no treatment control) Y1 and Y0 regardless of whether the individual received treatment (Z=1) or comparison condition (Z=0) exist for all individuals in the population Observe only one of these outcomes for each participant 16

Causal effect = difference between two potential outcomes Bad Good Pre-Treatment when untreated when treated Y 0 Y 1 Causal Effect = Y1 Y0 Post-Treatment 17

Primary types of causal effects Average treatment effect in the population (ATE) Answers the question: How effective is the treatment in the population? (if you have a control condition) What is the relative effectiveness of two treatments on average in the population? (if you have 2 treatments) E(Y1 Y0) Average treatment effect in the treated population (ATT) Answers the question: How would those who received treatment have done had they received the comparison condition? E(Y1 Y0 Z=1) 18

Primary types of causal effects: case study Average treatment effect in the population (ATE) Answers the question: How effective is the treatment in the population? (if you have a control condition) What is the relative effectiveness of MET/CBT5 versus usual care on average in the population? E(Y1 Y0) Average treatment effect in the treated population (ATT) Answers the question: How would those who had received usual care have done had they received MET/CBT5? E(Y1 Y0 Z=1) 19

Experiments vs. observational studies Random experiments = gold standard for estimating causal effects Randomization (if it works) makes groups being compared balanced on baseline characteristics Treatment assignment is unrelated to potential outcomes (strong ignorability) Randomization is not always feasible Observational studies provide another way to get at causal effects Treatment assignment is not controlled by the researcher Groups being compared are usually imbalanced Can use causal inference methods to try to replicate what a randomized study does 20

Biggest challenge to causal estimation is selection effects Selection occurs when the people getting the treatments being compared differ % Prior MH tx Behavioral Complexity Scale Crime Environment Scale Illegal Activities Scale 0 10 20 30 40 50 Substance Problems Scale Substance Frequency Scale MET/CBT5 Usual Care 21

Illustration: The potential impact of selection on our case study Mean substance frequency scale (SFS) at 12-months = 0.11 (~10 days of use per month) for usual care 0.07 (~ 6 days of use per month) for MET/CBT5 à Effect size (ES) difference of 0.41 22

Illustration: The potential impact of selection on our case study Mean substance frequency scale (SFS) at 12-months = 0.11 (~10 days of use per month) for usual care 0.07 (~ 6 days of use per month) for MET/CBT5 à Effect size (ES) difference of 0.41 I am going to report that usual care results in 0.41 standard deviations greater SFS than MET/CBT5 Is this a good idea? 23

Illustration: The potential impact of selection on our case study Substance Frequency Scale Usual Care MET/ CBT5 Observed Outcomes Pre-Treatment (Baseline) Observed Outcomes Post-Treatment (Follow-up) Post-treatment difference in outcomes=0.41 ES 24

Illustration: The potential impact of selection on our case study Mean substance frequency scale (SFS) at 12-months = 0.11 (~10 days of use per month) for usual care 0.07 (~ 6 days of use per month) for MET/CBT5 à Effect size (ES) difference of 0.41 I am going to report that usual care results in 0.41 standard deviations greater SFS than MET/CBT5 Is this a good idea? No!!! Groups might differ on pretreatment characteristics that influence SFS at 12-months such that even in the absence of the treatment the groups might differ on outcomes at 12-months 25

Different kids, different treatments Substance Frequency Scale Usual Care MET/ CBT5 Observed Outcomes Pre-Treatment (Baseline) Observed Outcomes Post-Treatment (Follow-up) Post-treatment difference in outcomes=0.41 ES 26

WANT: Same kids, different treatments Substance Frequency Scale MET/ CBT5 Observed Outcomes Pre-Treatment (Baseline) Unobserved counterfactual outcomes on usual care Post-Treatment (Follow-up) Causal Treatment Effect 27

Propensity scores help us deal with the challenge of selection The propensity score is an individual s probability of receiving treatment given pretreatment characteristics p(x)=pr (Z=1 X) Used to create balance between treatment and comparison conditions Balancing on p(x) can balance distribution of X between treatment and comparison groups 28

Propensity scores help us deal with the challenge of selection The propensity score is an individual s probability of receiving treatment given pretreatment characteristics p(x)=pr (Z=1 X) Used to create balance between treatment and comparison conditions Balancing on p(x) can balance distribution of X between treatment and comparison groups If strong ignorability holds, propensity score is all that is required to control for observed, pretreatment differences between groups Treatment assignment is strongly ignorable if (1) (Y 1,Y 0 ) Z X e.g, no unobserved confounders (2) 0 < p(x) < 1 e.g, overlap between groups 29

Various ways to use propensity scores to estimate causal effects 1 Propensity scores (probabilities).8.6.4.2 0 Treatment Comparison 30

Various ways to use propensity scores to estimate causal effects Stratification Propensity scores (probabilities) 1.8.6.4.2 0 Treatment Comparison 31

Various ways to use propensity scores to estimate causal effects Stratification Matching 1 Propensity scores (probabilities).8.6.4.2 0 Treatment Comparison 32

Various ways to use propensity scores to estimate causal effects Stratification Matching 1 Weighting 1.8 Propensity scores (probabilities).6.4.2 0 Treatment Comparison 33

Various ways to use propensity scores to estimate causal effects Stratification Matching Weighting Today, we will focus on how to use propensity score weights to correct for selection effects on observed pretreatment characteristics 1.8.6.4.2 Propensity scores (probabilities) 0 Treatment Comparison 34

How do we get good propensity score weights? Everything above has assumed that propensity scores are known Not so unless we have a randomized study Typically have to estimate the propensity scores from data Standard approach has been to use logistic regression model with treatment indicator as outcome Can be challenging Need to find the right combination higher order terms, interactions, variable selection, etc. 35

How do we get good propensity score weights? (cont.) Machine learning methods have been shown to perform better than logistic regression Achieve better balance between treatment and comparison group on pretreatment covariates Reduce bias in treatment effect estimates Produce more stable propensity score weights (thereby also improving precision) Field constantly coming up with new methods Covariate Balancing Propensity Score (CBPS; Imai and Ratkovic 2013) Super Learning (van der Laan 2014) High Dimensional Propensity Score (HDps) (Schneeweiss et al 2009) Entropy Balance (Hainmueller 2012) 36

Today we focus on using GBM to estimate propensity score weights GBM = generalized boosted regression models Data adaptive, nonparametric model Combines many piecewise-constant functions of the covariates to estimate unknown propensity score Can include nominal, discrete, and continuous covariates and covariates with missing values Can include large number of covariates Automatically selects which covariates to include and best functional form including interactions optimal iteration chosen as that which balances treatment and comparison group best 37

Use GBM via TWANG package TWANG = Toolkit for weighting and analysis of nonequivalent groups Uses GBM to estimate propensity score weights Provides various tools for assessing the quality of the propensity score weights Handles both binary and more than 2 treatment group settings Available in R, SAS and STATA 38

Assessing the quality of propensity score weights Primarily focuses on how well propensity score weights balance the two groups Various metrics available to assess balance Commonly used methods include: Standardized Effect Size (ES) differences between the two groups Kolmogorov-Smirnov (KS) statistic T-test p-values 39

Balance measures: ES The standardized effect size (ES) is the difference in mean of a pretreatment variable in the treated group versus the comparison group, divided by the standard deviation ( X 1 X 0 )/ σ In ATE analyses, σ is the pooled standard deviation In ATT analyses, σ is from the treated group Also called Standardized Mean Difference (SMD) Our group uses the rule of thumb that an absolute ES under 0.2 is indicative of good balance Some other authors have suggested that under 0.25 is OK 40

Balance measures: KS Another measure that we use is the Kolmogorov-Smirnof (KS) statistic KS statistic is the maximum vertical distance between the empirical cumulative distribution functions of two samples 41

KS example 1.0 0.8 ECDF 0.6 0.4 Treatment Group 0.2 0.0 Comparison Group -3-2 -1 0 1 2 3 x 42

KS example 1.0 0.8 ECDF 0.6 0.4 Treatment Group 0.2 Largest vertical difference is 0.35 0.0 Comparison Group -3-2 -1 0 1 2 3 x 43

Propensity score weights for different causal effects For ATE: weight treatment group by 1/ p ( X i ) weight comparison group by 1/(1 p ( X i )) For ATT: weight comparison group by p ( X i )/(1 p ( X i )) Both can provide unbiased estimates of the causal treatment effect in question assuming 1. covariates are balanced in treatment and control groups after weighting 2. there are no unobserved confounders and overlap (strong ignorability) 44

Final remarks Causal effects are important Biggest challenges = Can t estimate directly because don t observe all potential outcomes we need Selection: pretreatment differences between groups might bias our treatment effect estimates Propensity scores can help address these challenges Assuming no unobserved confounders When estimating propensity scores, inferences depend on how well they are estimated Use twang to get good estimates Go doubly robust 45

Please note this video is part of a three-part series. We encourage viewers to watch all three segments. 46

Please check out our website http://www.rand.org/statistics/twang.html 47

Please check out our website http://www.rand.org/statistics/twang.html Click on Stay Informed to keep up-to-date on tools available 48

Data acknowledgements The development of these slides was funded by the National Institute of Drug Abuse grant number 1R01DA01569 (PIs: Griffin/McCaffrey) and was supported by the Center for Substance Abuse Treatment (CSAT), Substance Abuse and Mental Health Services Administration (SAMHSA) contract #270-07-0191 using data provided by the following grantees: Adolescent Treatment Model (Study: ATM: CSAT/SAMHSA contracts #270-98-7047, #270-97-7011, #277-00-6500, #270-2003-00006 and grantees: TI-11894, TI-11874; TI-11892), the Effective Adolescent Treatment (Study: EAT; CSAT/SAMHSA contract #270-2003-00006 and grantees: TI-15413, TI-15433, TI-15447, TI-15461, TI-15467, TI-15475,TI-15478, TI-15479, TI-15481, TI-15483, TI-15486, TI-15511, TI-15514,TI-15545, TI-15562, TI-15670, TI-15671, TI-15672, TI-15674, TI-15678, TI-15682, TI-15686, TI-15415, TI-15421, TI-15438, TI-15446, TI-15458, TI-15466, TI-15469, TI-15485, TI-15489, TI-15524, TI-15527, TI-15577, TI-15584, TI-15586, TI-15677), and the Strengthening Communities- Youth (Study: SCY; CSAT/SAMHSA contracts #277-00-6500, #270-2003-00006 and grantees: TI-13305, TI-13308, TI-13313, TI-13322, TI-13323, TI-13344, TI-13345, TI-13354). The authors thank these grantees and their participants for agreeing to share their data to support these secondary analyses. The opinions about these data are those of the authors and do not reflect official positions of the government or individual grantees. 49

CHILDREN AND FAMILIES EDUCATION AND THE ARTS ENERGY AND ENVIRONMENT HEALTH AND HEALTH CARE INFRASTRUCTURE AND TRANSPORTATION INTERNATIONAL AFFAIRS LAW AND BUSINESS NATIONAL SECURITY POPULATION AND AGING PUBLIC SAFETY SCIENCE AND TECHNOLOGY TERRORISM AND HOMELAND SECURITY The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. This electronic document was made available from www.rand.org as a public service of the RAND Corporation. Support RAND Browse Reports & Bookstore Make a charitable contribution For More Information Visit RAND at www.rand.org Explore the RAND Corproration View document details Presentation RAND presentations may include briefings related to a body of RAND research, videos of congressional testimonies, and a multimedia presentation on a topic or RAND capability. All RAND presentations represent RAND s commitment to quality and objectivity. Limited Electronic Distribution Rights This document and trademark(s) contained herein are protected by law as indicated in a notice appearing later in this work. This electronic representation of RAND intellectual property is provided for non-commercial use only. Unauthorized posting of RAND electronic documents to a non-rand Web site is prohibited. RAND electronic documents are protected under copyright law. Permission is required from RAND to reproduce, or reuse in another form, any of our research documents for commercial use. For information on reprint and linking permissions, please see RAND Permissions.