Dylan Small Department of Statistics, Wharton School, University of Pennsylvania. Based on joint work with Paul Rosenbaum

Instrumental variables and their sensitivity to unobserved biases. Dylan Small, Department of Statistics, Wharton School, University of Pennsylvania. Based on joint work with Paul Rosenbaum.

Overview. Instrumental variable (IV) method: a method of controlling for unmeasured confounding. Example: effect of World War II military service on future earnings. Sensitivity of the IV method to unobserved biases. How the strength of an IV affects sensitivity to unobserved biases: implications for designing studies with IVs.

Example: WWII Veteran Status and Earnings. Does military service raise or lower earnings? Angrist and Krueger (1994) studied this question in the context of WWII military service and 1980 earnings, using the 5% public use sample of the US Census. Lower earnings? Military service in WWII interrupted education or career. Higher earnings? The labor market might favor veterans, and the GI Bill increased education.

This is association, not causation. WWII veterans (76% of men) earned on average $4,500 more in 1980 than non-veterans. But WWII veterans might not be comparable to non-veterans in terms of health or criminal behavior.

Addressing Confounding. Confounding variable: a variable that (i) is not comparable between the treatment and control groups and (ii) affects the outcome, e.g., health or criminal behavior. If all confounders are measured, they can be adjusted for by regression, propensity scores, or matching methods. But health and criminal behavior are not measured in the Census.

Unmeasured Confounding. [Diagram with nodes Veteran Status, Earnings, and Unobserved Confounders (health, criminal behavior, etc.). The graph is conditional on measured confounders (race, education up to 8 years, location of birth).]

Instrumental Variables Strategy. Y = outcome, W = treatment, Z = IV. [Diagram with nodes Z: Year of Birth, W: Veteran Status, Y: Earnings, and Unobserved Confounders (health etc.). The graph is conditional on measured confounders (race, education up to 8 years, location of birth).] Extract the variation in W that comes from Z, which is free of unobserved confounders, and use this variation to estimate the causal effect of W on Y. Key IV assumptions: (1) Z is independent of unobserved variables; (2) Z does not have a direct effect on the outcome.

IV Applications in Health Research

Outcome (Y) | Treatment (W) | IV (Z) | Reference
Mortality | More intensive vs. less intensive treatment for heart attack patients | Distance lived from cardiac care center | McClellan, McNeil and Newhouse (1994)
Mortality | Conventional vs. atypical antipsychotics | Prescribing physician's preference | Wang, Schneeweiss et al. (2005)
Mortality | Premature baby delivered at high level NICU vs. local hospital | Mother's differential distance between high level NICU and local hospital | Lorch, Baiocchi, Ahlberg and Small (2012)
Birth weight | Maternal smoking | State cigarette taxes | Evans and Ringel (1999)
Birth weight | Maternal smoking | Random assignment of free smoker's counseling | Permutt and Hebel (1989)
Heart attack | HDL cholesterol | Genes that affect HDL | Voight et al. (2012)

Prototype IV Design: Matched Pair Encouragement Design. Consider a matched pair design with I matched pairs (say, matched for measured confounders), in which one unit j in each pair i is encouraged to receive treatment (\(Z_{ij} = 1\)) and the other unit j' is not encouraged to receive treatment (\(Z_{ij'} = 0\)). In this context, the encouragement variable Z is said to be a valid instrumental variable (IV) if Z is effectively randomly assigned, \(P(Z_{i1} = 1, Z_{i2} = 0) = \frac{1}{2}\) and \(P(Z_{i1} = 0, Z_{i2} = 1) = \frac{1}{2}\) (i.e., Z is not related to any unmeasured confounders). Inference can be based on two-stage least squares or permutation inference. 95% CI for the effect of military service on earnings using birth year 1926 vs. 1928 as the IV: (-$1,445, -$500).
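To make the encouragement design concrete, here is a minimal sketch of the ratio (Wald) estimate, which is what two-stage least squares reduces to with a single binary instrument and no covariates. The simulated data, the function names, and every numeric setting (true effect of -1000, confounder scale, uptake rule) are illustrative assumptions, not the census data or the analysis from the talk.

```python
import random


def wald_iv_estimate(pairs):
    """Ratio (Wald) IV estimate from matched pairs.

    Each pair is ((w_enc, y_enc), (w_ctl, y_ctl)): treatment received and
    outcome for the encouraged unit, then for the non-encouraged unit.
    """
    n = len(pairs)
    # Mean encouraged-minus-control difference in outcome (numerator)
    diff_y = sum(enc[1] - ctl[1] for enc, ctl in pairs) / n
    # Mean encouraged-minus-control difference in treatment received (denominator)
    diff_w = sum(enc[0] - ctl[0] for enc, ctl in pairs) / n
    if diff_w == 0:
        raise ValueError("encouragement did not change treatment uptake")
    return diff_y / diff_w


def naive_difference(pairs):
    """Treated-minus-untreated mean outcome, ignoring the instrument."""
    units = [unit for pair in pairs for unit in pair]
    treated = [y for w, y in units if w == 1]
    untreated = [y for w, y in units if w == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def simulate_pairs(n_pairs, beta, seed=0):
    """Hypothetical data: an unobserved u drives both uptake and outcome."""
    rng = random.Random(seed)

    def unit(z):
        u = rng.gauss(0, 1)                 # unobserved confounder
        w = 1 if u + 2 * z > 1 else 0       # uptake depends on u and encouragement z
        y = beta * w + 2000 * u + rng.gauss(0, 1000)
        return (w, y)

    # Encouragement is assigned independently of u, so Z is a valid IV here
    return [(unit(1), unit(0)) for _ in range(n_pairs)]


pairs = simulate_pairs(20000, beta=-1000.0)
print("naive difference:", round(naive_difference(pairs)))
print("Wald IV estimate:", round(wald_iv_estimate(pairs)))
```

In this simulated setting the naive treated-versus-untreated comparison is pushed upward by the confounder, while the ratio estimate should land near the assumed effect, because encouragement is independent of u and affects the outcome only through treatment.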

A picture of the IV argument. We created matched triples: men matched on quarter of birth, race, age, education up to 8 years and location of birth. This figure provides reason to doubt that military service increases earnings by $4,500. From 1924 to 1926, the proportion of veterans stayed about constant and earnings stayed about the same. From 1926 to 1928, the proportion of veterans decreased by 50% but earnings increased, suggesting that military service decreases earnings.

Sensitivity Analysis. The IV method assumes that the IV (encouragement) is effectively randomly assigned: \(P(Z_{i1} = 1, Z_{i2} = 0) = \frac{1}{2}\) and \(P(Z_{i1} = 0, Z_{i2} = 1) = \frac{1}{2}\). There is often concern about whether this is true. In the WWII study, there are gradual long-term trends in apprenticeship, education, employment and nutrition that might bias comparisons of workers born two years apart. A sensitivity analysis asks how departures from random assignment of the IV of various magnitudes might alter a study's conclusion.

Model for Sensitivity Analysis. For subject ij, let \(\pi_{ij}\) denote the probability that ij is encouraged, \(\pi_{ij} = P(Z_{ij} = 1)\). Suppose that two subjects ij and ik may differ in their odds of being encouraged by at most a factor of \(\Gamma \ge 1\) because they differ in terms of an unobserved covariate, \(u_{ij} \ne u_{ik}\):

\[ \frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \quad \text{for all } i, j, k. \]

If \(\Gamma = 1\), the IV is randomly assigned. If \(\Gamma > 1\), then the distribution of treatment assignments is unknown, but the magnitude of the departure from random assignment is controlled by \(\Gamma\).
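A short worked step shows what this odds-ratio bound implies within one matched pair. It assumes the two units in a pair are encouraged independently and conditions on exactly one of them being encouraged; the symbol \(\omega_i\) is introduced here only for the derivation and is not from the slides.

```latex
\[
P(Z_{i1} = 1, Z_{i2} = 0 \mid Z_{i1} + Z_{i2} = 1)
  = \frac{\pi_{i1}(1 - \pi_{i2})}{\pi_{i1}(1 - \pi_{i2}) + \pi_{i2}(1 - \pi_{i1})}
  = \frac{\omega_i}{1 + \omega_i},
\qquad
\omega_i = \frac{\pi_{i1}(1 - \pi_{i2})}{\pi_{i2}(1 - \pi_{i1})}.
\]
Since $1/\Gamma \le \omega_i \le \Gamma$ and $\omega \mapsto \omega/(1 + \omega)$ is increasing,
\[
\frac{1}{1 + \Gamma}
  \le P(Z_{i1} = 1, Z_{i2} = 0 \mid Z_{i1} + Z_{i2} = 1)
  \le \frac{\Gamma}{1 + \Gamma}.
\]
```

At \(\Gamma = 1\) both bounds equal \(\frac{1}{2}\), which recovers random assignment of the encouragement; at \(\Gamma = 2\) the within-pair probability may lie anywhere between \(\frac{1}{3}\) and \(\frac{2}{3}\).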

Sensitivity Analysis for WWII Study. Upper bound on the one-sided significance level for the 1926 vs. 1928 IV:

Γ | H0 with null value 0 | H0 with null value 4,500
1 | 0.001 | 0.001
1.2 | 1.000 | 0.001
1.5 | 1.000 | 0.027
1.6 | 1.000 | 0.904
2.2 | 1.000 | 1.000
2.3 | 1.000 | 1.000

β = causal effect of military service on earnings.
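As an illustration of how an upper bound on a significance level can depend on \(\Gamma\), the sketch below uses the simplest matched-pair case, a sign test. This is a swapped-in textbook example, not the test statistic behind the table for the WWII study: under the null hypothesis and a bias of at most \(\Gamma\), each pair favors the encouraged unit with probability at most \(\Gamma/(1+\Gamma)\), so the count of such pairs is bounded by a binomial distribution. The function name and the example counts (70 of 100 pairs) are hypothetical, and ties are ignored.

```python
from math import comb


def sign_test_pvalue_upper_bound(t, n, gamma):
    """Upper bound on the one-sided sign-test p-value at bias level gamma.

    t: number of pairs in which the encouraged unit has the higher outcome
    n: number of pairs (ties assumed absent)
    gamma: sensitivity parameter, gamma >= 1 (gamma = 1 is random assignment)
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    if not 0 <= t <= n:
        raise ValueError("t must be between 0 and n")
    # Largest within-pair probability allowed by the sensitivity model
    p_max = gamma / (1 + gamma)
    # Upper tail of Binomial(n, p_max) at t
    return sum(
        comb(n, k) * p_max**k * (1 - p_max) ** (n - k) for k in range(t, n + 1)
    )


# Hypothetical data: the encouraged unit does better in 70 of 100 pairs
for gamma in (1.0, 1.2, 1.5, 2.0):
    bound = sign_test_pvalue_upper_bound(70, 100, gamma)
    print(f"gamma = {gamma}: p-value bound = {bound:.4f}")
```

The bound grows with \(\Gamma\), so a sensitivity analysis reports the largest \(\Gamma\) at which it still stays below the chosen significance level.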

Strength of IV. We'd like our study to be as insensitive to bias as possible, i.e., the finding is significant for as large a Γ as possible. How does the strength of an IV affect sensitivity to bias? An IV is strong if encouragement has a strong effect on treatment received; an IV is weak if encouragement has only a weak effect on treatment received.

Study | Strong IV | Weak IV
World War II Study | 1926 vs. 1928 | 1924 vs. 1926
Maternal Smoking Study | Random assignment of free counseling | State cigarette taxes

Effects of weak IVs: 1. Increased variance. 2. Increased sensitivity to bias.

Effect of Weak IVs I: Increased Variance. [Diagram with nodes Z, W, Y, and Unobserved Variables.] If Z is a weak IV, then the variance of the IV estimate will be higher because less variation in W can be extracted from Z. 95% CI for the effect of military service using the 1926 vs. 1928 IV: (-$1,445, -$500). 95% CI for the effect of military service using the 1924 vs. 1926 IV: (-$10,130, $10,750).
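The variance point can be checked with a small simulation. The sketch below compares the spread of the ratio (Wald) IV estimate across repeated hypothetical studies when encouragement shifts treatment uptake by 0.8 versus 0.1. All settings (uptake probabilities, unit-variance noise, effect of -1, 2000 pairs, 100 replications) are assumptions chosen for illustration and are unrelated to the census data.

```python
import random
import statistics


def simulate_wald_estimates(p_enc, p_ctl, n_pairs=2000, reps=100, beta=-1.0, seed=0):
    """Ratio IV estimates from repeated simulated matched-pair studies.

    p_enc, p_ctl: probability of taking treatment with / without encouragement
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sum_dy = 0.0
        sum_dw = 0
        for _ in range(n_pairs):
            w_enc = 1 if rng.random() < p_enc else 0
            w_ctl = 1 if rng.random() < p_ctl else 0
            y_enc = beta * w_enc + rng.gauss(0, 1)
            y_ctl = beta * w_ctl + rng.gauss(0, 1)
            sum_dy += y_enc - y_ctl
            sum_dw += w_enc - w_ctl
        if sum_dw != 0:  # skip the rare study with no net change in uptake
            estimates.append(sum_dy / sum_dw)
    return estimates


strong = simulate_wald_estimates(p_enc=0.9, p_ctl=0.1)    # strength 0.8
weak = simulate_wald_estimates(p_enc=0.55, p_ctl=0.45)    # strength 0.1

print("SD of estimate, strong IV:", round(statistics.stdev(strong), 3))
print("SD of estimate, weak IV:  ", round(statistics.stdev(weak), 3))
```

Because the estimate divides by the encouraged-minus-control difference in uptake, a small denominator inflates the noise in the numerator, which is the same pattern as the much wider interval for the 1924 vs. 1926 comparison.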

Effect of Weak IVs II: Increased Sensitivity to Bias. Power of a Sensitivity Analysis (Rosenbaum, 2004). Suppose Z were in fact a valid IV, so that \(P(Z_{i1} = 1, Z_{i2} = 0) = P(Z_{i1} = 0, Z_{i2} = 1) = \frac{1}{2}\), but we didn't know this and wanted to allow for some sensitivity to bias measured by \(\Gamma\). Suppose also that \(\beta - \beta_0\) (true causal effect minus null hypothesis causal effect) was large, so that \(H_0: \beta = \beta_0\) was substantially in error. We would like to be able to reject \(H_0: \beta = \beta_0\) when the bias could be up to some \(\Gamma\) (e.g., \(\Gamma = 1.5\)). Power of a sensitivity analysis at \(\Gamma\): the probability that we will reject \(H_0: \beta = \beta_0\) when allowing for bias up to \(\Gamma\), assuming that Z is a valid IV and a given value of \(\beta - \beta_0\).

Power of a sensitivity analysis. Effect size: \((\beta - \beta_0)/\sigma = 1\). Strength of IV is P(Treat | IV=1) - P(Treat | IV=0). Columns give the power for I pairs and its limit as I grows.

Strength of IV | Γ | I = 100 | I = 1,000 | I = 10,000 | I = 100,000 | I → ∞
1 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1 | 0.99 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1 | 0.12 | 0.73 | 1.00 | 1.00 | 1
1 | 1.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1.2 | 0.92 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1.2 | 0.03 | 0.03 | 0.04 | 0.10 | 1
1 | 2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 2 | 0.18 | 0.97 | 1.00 | 1.00 | 1
0.1 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0

When the IV is valid (\(\Gamma = 1\)), the power is of course greater for stronger IVs, but there is good power in all cases with a sample size of 10,000 pairs. Valid but weak IVs eventually get it right. But when \(\Gamma > 1\), the power can tend to 1 or 0 depending on the strength of the IV. Weak IVs are quite sensitive to small biases.

Practical Consequences. 1. Weak IVs that might have small bias are dangerous to use. Weak IVs are sensitive to quite small biases (\(\Gamma > 1\) yet \(\Gamma\) close to 1), even when the effect size \((\beta - \beta_0)/\sigma\) is quite large. Unless one is confident that a weak IV is perfectly valid (\(\Gamma = 1\)), its extreme sensitivity to small biases is likely to limit its usefulness to the study of enormous effects, \((\beta - \beta_0)/\sigma \gg 1\). 2. Strong IVs that might be moderately biased are useful. A strong IV may provide useful information even if moderate biases are plausible. Consider two studies, a small study with a strong IV and a large study with a weak IV, which would have the same power if both IVs are unbiased. When there is concern that the IVs might be biased, the small study with a strong IV has considerable advantages.

Potential IVs in Health Outcomes Research

Potential IV | Strength
Differential Distance to Nearest Provider of Treatment A vs. Treatment B | Weak or Strong
Geographic or Hospital Preference for Treatment A vs. B | Weak or Strong
Physician Preference for Treatment A vs. B | Weak or Strong
Calendar Time (one treatment may become more common over time) | Weak or Strong
Genetic Variants | Usually Weak
Timing of Admission to Hospital | Weak or Strong
Insurance Plan Coverage for Treatment A vs. B | Weak or Strong
Randomized Encouragement at Point of Care for Treatment A vs. B When No Clear Cut Choice | Potentially Strong

Reference for this talk: Small, D. and Rosenbaum, P. (2008). War and wages: the strength of instrumental variables and their sensitivity to unobserved biases. Journal of the American Statistical Association, 103, 924-933.