Can Quasi Experiments Yield Causal Inferences? Sample. Intervention 2/20/2012. Matthew L. Maciejewski, PhD Durham VA HSR&D and Duke University

Similar documents
Overview of Perspectives on Causal Inference: Campbell and Rubin. Stephen G. West Arizona State University Freie Universität Berlin, Germany

Abstract Title Page Not included in page count. Title: Analyzing Empirical Evaluations of Non-experimental Methods in Field Settings

Working Paper: Designs of Empirical Evaluations of Non-Experimental Methods in Field Settings. Vivian C. Wong 1 & Peter M.

Can Nonrandomized Experiments Yield Accurate Answers? A Randomized Experiment Comparing Random and Nonrandom Assignments

Causal Validity Considerations for Including High Quality Non-Experimental Evidence in Systematic Reviews

Institute for Policy Research, Northwestern University, b Friedrich-Schiller-Universität, Jena, Germany. Online publication date: 11 December 2009

Research Design. Beyond Randomized Control Trials. Jody Worley, Ph.D. College of Arts & Sciences Human Relations

CHAPTER EIGHT EXPERIMENTAL RESEARCH: THE BASICS of BETWEEN GROUP DESIGNS

VALIDITY OF QUANTITATIVE RESEARCH

Lecture 4: Research Approaches

Recent advances in non-experimental comparison group designs

BIOSTATISTICAL METHODS

Instrumental Variables I (cont.)

Within Study Comparison Workshop. Evanston, Aug 2012

Regression Discontinuity Design

Impact Evaluation Toolbox

Topic #6. Quasi-experimental designs are research studies in which participants are selected for different conditions from pre-existing groups.

Methodological requirements for realworld cost-effectiveness assessment

Version No. 7 Date: July Please send comments or suggestions on this glossary to

investigate. educate. inform.

Methods of Reducing Bias in Time Series Designs: A Within Study Comparison

04/12/2014. Research Methods in Psychology. Chapter 6: Independent Groups Designs. What is your ideas? Testing

Getting ready for propensity score methods: Designing non-experimental studies and selecting comparison groups

CALIFORNIA STATE UNIVERSITY STANISLAUS DEPARTMENT OF SOCIOLOGY ASSESSMENT MODEL

QUASI-EXPERIMENTAL HEALTH SERVICE EVALUATION COMPASS 1 APRIL 2016

Formative and Impact Evaluation. Formative Evaluation. Impact Evaluation

WORKING PAPER SERIES

Which Comparison-Group ( Quasi-Experimental ) Study Designs Are Most Likely to Produce Valid Estimates of a Program s Impact?:

ON SELECTION BIAS MAGNITUDES

Chapter 11 Nonexperimental Quantitative Research Steps in Nonexperimental Research

Introduction to Observational Studies. Jane Pinelis

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Randomized Controlled Trials Shortcomings & Alternatives icbt Leiden Twitter: CAAL 1

26:010:557 / 26:620:557 Social Science Research Methods

Causal Methods for Observational Data Amanda Stevenson, University of Texas at Austin Population Research Center, Austin, TX

Quantitative Methods. Lonnie Berger. Research Training Policy Practice

11 questions to help you make sense of a case control study

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

The Logic of Causal Inference

The Effects of the Payne School Model on Student Achievement Submitted by Dr. Joseph A. Taylor

Comparison of External and Internal Validity Bias in Experimental and Quasi- Experimental Approaches Mark White, Ben Hansen, Tim Lycurgus, Brian Rowan

STATISTICAL CONCLUSION VALIDITY

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Define the population Determine appropriate sample size Choose a sampling design Choose an appropriate research design

Experimental Research. Types of Group Comparison Research. Types of Group Comparison Research. Stephen E. Brock, Ph.D.

Regression Discontinuity Analysis

Clinical research in AKI Timing of initiation of dialysis in AKI

Institute of Medical Epidemiology, Biostatistics, and Informatics, University of Halle-Wittenberg, Halle (Saale) 2

Multiple Act criterion:

Validity and Quantitative Research. What is Validity? What is Validity Cont. RCS /16/04

Anna Spurlock, Peter Cappers, Ling Jin, Annika Todd, LBNL Patrick Baylis, University of California at Berkeley

M6728. Goals. Depression Scores. Research Designs

Impact and adjustment of selection bias. in the assessment of measurement equivalence

QUASI EXPERIMENTAL DESIGN

Study Design STUDY DESIGN CASE SERIES AND CROSS-SECTIONAL STUDY DESIGN

PubH 7405: REGRESSION ANALYSIS. Propensity Score

Practical propensity score matching: a reply to Smith and Todd

INTERNAL VALIDITY, BIAS AND CONFOUNDING

Protocol Development: The Guiding Light of Any Clinical Study

Methods to control for confounding - Introduction & Overview - Nicolle M Gatto 18 February 2015

Public Policy & Evidence:

Experimental and Quasi-Experimental designs

Issues to Consider in the Design of Randomized Controlled Trials

TRIPLL Webinar: Propensity score methods in chronic pain research

Observational Study Designs. Review. Today. Measures of disease occurrence. Cohort Studies

Propensity Score Methods to Adjust for Bias in Observational Data SAS HEALTH USERS GROUP APRIL 6, 2018

Overview of the Logic and Language of Psychology Research

SIH: Trials and Research Endeavors. Tim Amrhein, MD Assistant Professor of Neuroradiology Duke University Medical

IN THE FACE OF THE FINANCIAL,

11/2/2017. Often, we cannot manipulate a variable of interest

Analysis methods for improved external validity

Educational Research. S.Shafiee. Expert PDF Trial

Overview of Clinical Study Design Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine US National

WWC STUDY REVIEW STANDARDS

Institute of Medical Epidemiology, Biostatistics, and Informatics, University of Halle-Wittenberg, Halle (Saale) 2

UNIT 1: Fundamentals of research design and variables

Basic Concepts in Research and DATA Analysis

How to use this appraisal tool: Three broad issues need to be considered when appraising a case control study:

1. INTRODUCTION. Lalonde estimates the impact of the National Supported Work (NSW) Demonstration, a labor

Lecture 9 Internal Validity

Class 1: Introduction, Causality, Self-selection Bias, Regression

Rise of the Machines

Experimental Design (7)

Improving Causal Claims in Observational Research: An Investigation of Propensity Score Methods in Applied Educational Research

Conducting Strong Quasi-experiments

MBACATÓLICA JAN/APRIL Marketing Research. Fernando S. Machado. Experimentation

CRITICAL APPRAISAL SKILLS PROGRAMME Making sense of evidence about clinical effectiveness. 11 questions to help you make sense of case control study

In this chapter we discuss validity issues for quantitative research and for qualitative research.

Causal Inference in Observational Settings

Econometric analysis and counterfactual studies in the context of IA practices

Pros. University of Chicago and NORC at the University of Chicago, USA, and IZA, Germany

Introductory: Coding

Propensity Score Analysis Shenyang Guo, Ph.D.

WWC Standards for Regression Discontinuity and Single Case Study Designs

Design of Experiments & Introduction to Research

10/21/2014. Considerations in the Selection of Research Participants. Reasons to think about participant selection

Why randomize? Rohini Pande Harvard University and J-PAL.

Information and Software Technology

Chapter 9 Experimental Research (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Transcription:

Can Quasi Experiments Yield Causal Inferences? Matthew L. Maciejewski, PhD Durham VA HSR&D and Duke University Sample Study 1 Study 2 Year Age Race SES Health status Intervention Study 1 Study 2 Intervention ist # of interactions Duration of each interaction Single topic or multiple topics Content 1

RCT considered Gold Standard of Benefit Design for Several Reasons Create balance in observed covariates Reduces number of competing hypotheses for variation in outcomes to one (treatment assignment) Control group outcome is a valid counterfactual (unbiased estimate of outcome for treatment group had they not been randomized to treatment) Treatment effect generalizes to entire sample Statistical result is causal effect of treatment on outcome Context for Perceived Inferiority of Quasi Experiments Prior comparisons of RCTs and non RCTs Experimental results rarely replicated Even when applying instrumental variables (IV) methods (LaLonde1986) RCTs typically compared to non identical samples and non identical outcomes in different data Conclusion has been that design (quasi experiment) is the cause of difference, not sample or outcomes Could outcomes be similar across designs if same sample & outcomes? Differences in Samples for RCTs and Quasi Experiments RCTs Conducted on highly selected populations Rarely pregnant women, highest risk people, oldest Quasi experiments Conducted on general populations Differences not necessarily due to randomization Could be entirely due to different samples included 2

LaLonde (1986) Job Results Estimator Wage Difference for Men Unadjusted RCT $886 Non RCT estimates from PSID & CPS SSA Unadjusted Low= $1637, High=$1714 Age adjusted Low= $1388, High=$195 Age, schooling, race Low= $1228, High=$1466 & pre period wage IV Low= $667, High=$889 Stukel 2007 JAMA: Mortality Impact of Cardiac Catheterization Model Risk Ratio (95% CI) Unadjusted survival 0.36 (0.36, 0.37) Multivariate adjustment 0.51 (0.50, 0.52) Simple PS Adjustment: Deciles + 0.52 (0.51, 0.53) Covariates Fancy PS Adjustment: Deciles + Covariates 0.52 (0.51, 0.53) Conclusion 1) Adjustment for covariates important in non RCT 2) Multivariate & PS regressions are same Stukel 2007 JAMA: Mortality Impact of Cardiac Catheterization Model Risk Ratio (95% CI) Unadjusted survival 0.36 (0.36, 0.37) Multivariate adjustment 0.51 (0.50, 0.52) Simple PS Adjustment: Deciles + 0.52 (0.51, 0.53) Covariates Fancy PS Adjustment: Deciles + Covariates 0.52 (0.51, 0.53) What to conclude? 1) Regression & PS results are both right? 2) Results are both wrong? 3

Stukel 2007 JAMA: Mortality Impact of Cardiac Catheterization Model Risk Ratio (95% CI) Unadjusted survival 0.36 (0.36, 0.37) Multivariate adjustment 0.51 (0.50, 0.52) Simple PS Adjustment: Deciles + 0.52 (0.51, 0.53) Covariates Fancy PS Adjustment: Deciles + Covariates 0.52 (0.51, 0.53) Instrumental Variables 0.84 (0.79, 0.90) RCT Results 0.79 0.92 What to conclude? 1) Regression & PS results are both right? 2) Results are both wrong? This is it. Re appraising the Value of Quasi experiments An under used design allows direct comparison of results from RCT & non RCT Within study comparison study Four arm study: 2 stage process Randomize to randomized treatment or self selected treatment Same treatments, controls, outcomes, timing Can compare two treatment effects! Difference btn treatment & control in RCT arm Difference btn treatment & control in non RCT arm Design of Within Study Comparison by Shadish (JASA 2008) Recruited Students Pretests then Randomly Assigned Randomized Experiment Nonrandomized Study Mathematics Vocabulary Mathematics Vocabulary 4

Details of Shadish (2008) Design Participants from one college Participants pretested on several covariates Chose math and vocabulary training because Easy to induce effect with item difficulty Mth Math phobias cause plausible selection bias All participants treated together (in same class) without knowledge of different conditions People randomized to math in same training class as people self selecting math Everyone post test on math & vocab outcomes Unadjusted Results: Vocabulary Effect on Vocabulary Outcome Vocab Math Mean Difference Unadjusted RCT 16.19 8.08 8.11 Absolute Bias Unadjusted Quasiexperiment 16.75 7.75 9.00 0.89 Conclusions 1. Effect of vocab training on vocab scores was larger (9 of 30 points) when participants could self select into vocabulary training 2. The 8.11 point effect in RCT was overestimated by 11% (0.89 points) in the quasi experiment Propensity Score Modeling Based on a priori model of selection process that informed prospective pre test assessments Extensive adjustment Math & vocabulary pretest scores, ACT, GPA, prior exposure to math courses, math anxiety, demographic Big 5 personality traits (extraversion, emotional stability, agreeableness, intellect, & conscientiousness) Extensive adjustment reduced bias a lot (59 96%) Limited adjustment (comparable to claims) Age, sex, race & marital status had reduced bias modestly (12 30%) 5

Bias Reduction Fairly Similar Across Different Propensity Score Methods ter Adjustment Percent Bias Remaining Aft 100% 80% 60% 40% 20% 0% Vocabulary Outcome Unadjusted Quasi- Experiment Propensity Score Stratification Mathematics Outcome Propensity Score ANCOVA Propensity Score Adjustment Method Propensity Score Weighting Implications of Shadish (2008) Sampling design produced non equivalent groups on observables Big overlap in baseline values in RCT & non RCT groups due to 1 st stage randomization made propensity scores more valid Extensive measurement of relatively simple selection process, though not homogeneous Propensity score matching may not be effective if selection process is complex (as in job training) Bottom line: Propensity score results from extensive adjustment matched RCT results Limitations of Shadish (2008) Short duration (15 minutes) Not costly to conduct Little incentive for non compliance Absence of non compliance with treatment assignment Short time between pretest & post test, and short time between treatment & posttest Change attributable to few things besides treatment Not generalizable to complex medical settings Longer duration, have significant non compliance and delay between treatment and outcomes assessment 6

Conditions Under Which Quasi Experiments Match RCT Results Similarity between groups in pre period values When geographically local, comparison groups may not differ on major observables b/c provider & site effects controlled (e.g., pts in same clinic) ACEI example (Hebert & Maciejewski) Rigorous conceptualization and measurement of selection process to support effective matching Pre period outcomes are particularly important Adjustment using off the shelf vars not enough Regression discontinuity Descriptive Statistics of Unmatched CA ACEI and Non CA ACEI Cohorts CA ACEI Cohort? Non CA ACEI Cohort? Standardized Differences Age 76.1 75.9 7.17 Female (%) 65% 64% 2.09 White Race (%) 76% 83% 17.41 Black Race (%) 9% 7% 7.38 Baseline AMI (%) 6.7% 4.1% 11.52 Elixhauser Score 5.69 (7.79) 4.66 (6.99) 44.54 Baseline Expenditures $8081(15210) $6180 (12798) 16.06 Baseline # Meds 6.7 (3.8) 6.4 (3.6) 26.03 Office visits 8.6 (8.3) 8.5 (9.0) 4.42 Careful Consideration of Selection Process Bias can be significantly reduced if three steps of confounder adjustment are done Identification of all relevant confounders from literature, theory, and experts Error free measurement Proper modeling Use of variables of convenience fails 1 st step, so unlikely to reduce bias fully Especially true in claims data? 7

Reconsider Value of Quasi Experiments for Causal Inference? Comparing good RCT to poor quasi experiment confounds design type and the quality of its implementation Logical fallacy This conclusion is ex post facto because we know RCT results in advance Rarely true; more often have to infer ala Stukel Quasi experiments satisfying three conditions more likely to generate valid causal estimates Questions? References Cook, Shadish & Wong, 2008. J Policy Analysis and Management, 27(4): 724 750 Diaz & Handa, 2006. J Human Resources, 41(2): 319 345. LaLonde RJ, 1986. Amer Ec Rev, 76(4): 604 18 Shadish, Clark & Steiner, 2008. JASA, 103(484): 1334 1356 http://steinhardt.nyu.edu/scmsadmin/uploads/ 002/477/Tom%20Cook FINAL.pdf 8

References & Resources 1. Cook, T. (Producer). (2008). When Experiments and Observational Studies give comparable Causal Estimates:. [Powerpoint presentation] Retrieved from http://steinhardt.nyu.edu/scmsadmin/uploads/002/477/tom%20cook FINAL.pdf 2. Diaz, J. J., & Handa, S. (2006). An Assessment of Propensity Score Matching as a Nonexperimental Impact Estimator. [Article]. Journal of Human Resources, 41(2), 319 345. 3. LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Programs with Experimental Data. The American Economic Review, 76(4), 604 620. 4. Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can Nonrandomized Experiments Yield Accurate Answers? A Randomized Experiment Comparing Random and Nonrandom Assignments. Journal of the American Statistical Association, 103(484), 1334 1344. 1344 doi: doi:10.1198/016214508000000733 1198/016214508000000733 5. Stukel, T. A., Fisher, E. S., Wennberg, D. E., Alter, D. A., Gottlieb, D. J., & Vermeulen, M. J. (2007). Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. [Research Support, N.I.H., Extramural Research Support, Non U.S. Gov't]. JAMA, 297(3), 278 285. doi: 10.1001/jama.297.3.278 9