
Putting the Evidence in Evidence-Based Policy
Jeffrey Smith, University of Michigan
State of the Art Lecture, Canadian Economic Association Meetings, Ottawa, Ontario, June 2011

Gratitude
- Arthur Sweetman in particular
- Canadian economics more generally

Outline
- Evaluation policies
- Parameters of interest
- Types of evidence / rankings and comparisons
- General equilibrium effects
- Cost-benefit analysis
- Data quality
- Evaluation policy
- Alternatives to econometric evaluation
- Institutions
- Conclusions

Potential outcomes framework
- $Y_{0i}$ = untreated outcome for unit $i$
- $Y_{1i}$ = treated outcome for unit $i$
- $\Delta_i = Y_{1i} - Y_{0i}$ = impact for unit $i$
- Note that the impact varies across units! This is intuitive in most cases but often ignored in studies that focus on "the" effect of some policy.
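The evaluation problem behind this slide is that only one of the two potential outcomes is ever observed for each unit. A standard way to write this (the switching equation; notation added here, not taken from the slide):

```latex
% Observed outcome as a function of the potential outcomes,
% where D_i = 1 indicates treatment:
Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}
% The unit-level impact \Delta_i = Y_{1i} - Y_{0i} is therefore
% never observed directly: for each i we see either Y_{1i} or
% Y_{0i}, never both.
```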

Parameters of interest
Old standbys:
- Treatment on the treated: $E[Y_{1i} - Y_{0i} \mid D_i = 1]$
- Average treatment effect: $E[Y_{1i} - Y_{0i}]$
New kids on the block:
- Local average treatment effect (more later)
- Quantile treatment effects
- Variance of impacts
Different parameters correspond to different policy questions. See e.g. Djebbari and Smith (2008), Journal of Econometrics.
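A minimal simulation sketch (added here, not the lecture's) showing that these parameters genuinely differ when impacts are heterogeneous and units select on their gains; all numbers are illustrative:

```python
# Minimal sketch: with simulated potential outcomes we can compute the
# parameters directly and see that ATE and ATET diverge under selection
# on gains. All magnitudes below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)
y0 = 10 + x + rng.normal(size=n)           # untreated outcome Y_0i
delta = 1 + 0.5 * x                        # unit-level impact, varies with x
y1 = y0 + delta                            # treated outcome Y_1i

# Selection on gains: units with larger impacts are more likely to participate
d = (delta + rng.normal(size=n) > 1).astype(int)

ate = delta.mean()                         # E[Y_1 - Y_0]
atet = delta[d == 1].mean()                # E[Y_1 - Y_0 | D = 1]
# Median quantile treatment effect: difference of marginal quantiles,
# not the quantile of the unit-level differences
qte_50 = np.quantile(y1, 0.5) - np.quantile(y0, 0.5)

print(f"ATE      = {ate:.3f}")
print(f"ATET     = {atet:.3f}  (> ATE under selection on gains)")
print(f"QTE(0.5) = {qte_50:.3f}")
```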

Heterogeneous treatment effects
- Treatment effects may also vary with:
  - Features of the treated units
  - Features of the policy environment
  - Features of the economic environment
- Example: U.S. Job Training Partnership Act and barriers to work
- Example: active labor market policy over the cycle, as in Lechner and Wunsch (2009), Journal of Labor Economics
- This variation is important for understanding how treatments work and for targeting (as in SOMS), either formally or informally

Experiments
- Random assignment makes causal claims very compelling indeed!
- Example: IES intensive teacher mentoring experiment
- Example: IES abstinence-only sex education experiment
- Experiments are easy to explain to politicians
- See Heckman and Smith (1995) and HLS (1999)

Issues with experimental evaluations
- Experiments can be conducted well or poorly and can sometimes be manipulated by program staff
- Substitution and dropout change the parameter being estimated and complicate interpretation
- Experiments provide only partial equilibrium estimates

Recent developments in experiments
- Econometric advances with group-level random assignment
- Using experiments to test or calibrate structural models
  - Ex: Todd and Wolpin (2005) and Lise et al. (2011)
- Using experiments to estimate structural parameters
  - Back to the future example: US NIT experiments
  - Ex: Ludwig et al. (2011)

The bottom lines on experiments
- Burt Barnow: experiments are not a substitute for thinking!
- But do more experiments!
- Or, in the case of Canada: do some experiments!

Selection on observed variables
- Random assignment conditional on observed variables!
- Is the goddess this kind to evaluation researchers?
- Important implications for the choice of evaluation policy and, by extension, for data collection
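The "random assignment conditional on observed variables" idea is usually formalized as the conditional independence (unconfoundedness) assumption plus common support; stated here for reference rather than quoted from the slide:

```latex
% Conditional independence plus common support:
(Y_{0i}, Y_{1i}) \perp\!\!\!\perp D_i \mid X_i,
\qquad 0 < \Pr(D_i = 1 \mid X_i) < 1 .
% For the treatment-on-the-treated parameter, the weaker conditions
% Y_{0i} \perp D_i \mid X_i and \Pr(D_i = 1 \mid X_i) < 1 suffice.
```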

Key conceptual issue: what observed variables are required?
- Depends on policy type and on institutions
- Experiments are helpful here, as is thinking about the economics and reading the literature
- Defines an ongoing research strategy of collecting new types of variables suggested by theory and seeing if they matter
  - Ex: WIA / LEHD project by Holzer et al.
  - Ex: Lechner and Wunsch (2011)
- General lesson: lagged outcomes are really important
  - Ex: Dolton and Smith (2011)

Matching, weighting and regression
- Parametric linear regression
  - Angrist (1998): OLS does not estimate the ATET!
- Nearest neighbor (or pair) matching
  - Rules the statistics literature
  - Optimal full matching as in e.g. Hansen (2004)
- Kernel matching and related schemes
  - Estimates a non-parametric regression of $Y_0$ on $P(X)$
  - Can also be framed as a weighting estimator
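A minimal sketch of single-nearest-neighbor matching on an estimated propensity score; the logistic score model and the with-replacement matching rule are illustrative choices, not recommendations from the lecture:

```python
# Minimal sketch: 1-nearest-neighbor matching on the estimated propensity
# score, with replacement, for the ATET. All names and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def nn_match_atet(y, d, X):
    """ATET via 1-NN propensity score matching with replacement."""
    # Step 1: estimate the propensity score P(D=1|X)
    ps = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]

    y_t, ps_t = y[d == 1], ps[d == 1]      # treated units
    y_c, ps_c = y[d == 0], ps[d == 0]      # comparison pool

    # Step 2: for each treated unit, impute Y_0 with the outcome of the
    # comparison unit whose propensity score is closest
    idx = np.abs(ps_t[:, None] - ps_c[None, :]).argmin(axis=1)
    return np.mean(y_t - y_c[idx])

# Example on fake data with a homogeneous true effect of 2.0
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
d = (X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n) > 0).astype(int)
y = X.sum(axis=1) + 2.0 * d + rng.normal(size=n)
print(nn_match_atet(y, d, X))              # approximately 2
```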

Matching, weighting and regression (continued)
- Inverse propensity weighting
  - Horvitz and Thompson (1952), JASA
  - Attains the semi-parametric efficiency bound
  - Watch out for probabilities near zero and one
- Inverse probability tilting (odd name, nice estimator)
- Important recent Monte Carlo analyses
  - Busso, DiNardo and McCrary (2009a,b)
  - Huber, Lechner and Wunsch (2011)
- Justin McCrary: matching is an attempt to approximate what reweighting is doing directly
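A minimal inverse propensity weighting sketch in the Horvitz-Thompson spirit, including the trimming of estimated probabilities near zero and one that the slide warns about (the trimming cutoff and score model are placeholders):

```python
# Minimal sketch: inverse propensity weighting for the ATE, with trimming
# of estimated probabilities near zero and one as the slide cautions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(y, d, X, trim=0.01):
    ps = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]
    keep = (ps > trim) & (ps < 1 - trim)   # drop units with extreme scores
    y, d, ps = y[keep], d[keep], ps[keep]
    # Normalized (Hajek) weights, usually better behaved than raw
    # Horvitz-Thompson weights when some estimated scores are small
    w1 = d / ps
    w0 = (1 - d) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```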

Dynamic treatment assignment
- Going beyond Heckman and Robb (1985): what if treatment is offered in more than one period?
- Paradigmatic example: active labor market programs in Europe
- A no-treatment comparison group implicitly conditions on future outcomes
- Solution (?): estimate the impact of treatment now versus waiting, where waiting may imply future treatment
- Key paper: Sianesi (2004), Review of Economics and Statistics
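One way to write the "treatment now versus waiting" parameter, loosely following Sianesi's setup (a paraphrase added here, not the slide's notation):

```latex
% For units still untreated at the start of period t, compare joining
% the program in t with waiting, where waiting leaves open treatment
% in some later period:
\theta(t) = E\left[ Y_i(\text{join in } t) - Y_i(\text{wait at } t)
            \,\middle|\, \text{untreated before } t,\ \text{join in } t \right]
```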

Instrumental variables
- Looking for instruments
  - Sometimes instruments can be designed into the institutions
  - Random assignment is the ideal instrument
  - RD and D-in-D can be framed as IV estimators
- Key issue 1: instrument validity
  - False Folk Theorem of Empirical Economics #9
  - Ex: Buckles and Hungerman (2010) on quarter of birth

Instrumental variables (continued)
- Key issue 2: instruments identify Local Average Treatment Effects
- When is the LATE interesting?
  - Example: instruments based on the cost of participation
  - Example: instruments based on multiple births
  - Example: mandatory schooling laws
  - Example: Conley and Dupor (2011)
- A bad instrument may be worse than no instrument
- See the recent exchanges between Imbens, Heckman and Deaton
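With a binary instrument and a binary treatment, the LATE reduces to the Wald ratio; a minimal sketch on simulated data (all names and numbers are illustrative):

```python
# Minimal sketch: Wald estimator of the LATE with a binary instrument Z.
# LATE = (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]).
import numpy as np

def wald_late(y, d, z):
    itt_y = y[z == 1].mean() - y[z == 0].mean()   # reduced form
    itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage
    return itt_y / itt_d

# Fake data: the instrument shifts participation for compliers only
rng = np.random.default_rng(2)
n = 50_000
z = rng.integers(0, 2, size=n)
always = rng.random(n) < 0.2                       # always-takers
complier = rng.random(n) < 0.5                     # compliers
d = (always | (complier & (z == 1))).astype(int)
y = 1.5 * d + rng.normal(size=n)                   # homogeneous effect 1.5
print(wald_late(y, d, z))                          # approximately 1.5
```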

Longitudinal methods
- Includes diff-in-diff and more general panel methods
- Relies on a bias stability assumption, wherein selection into treatment depends only on time-invariant unobserved differences across units
- Can be designed in via staged rollout of treatments over multiple administrative units
  - Ex: Progresa evaluation in Mexico
  - Ex: Healthy Babies, Healthy Children in Ontario
  - But, does Canada need more provinces?
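A minimal two-group, two-period difference-in-differences sketch under bias stability (data and magnitudes are illustrative):

```python
# Minimal sketch: 2x2 difference-in-differences from group-by-period means.
# Under bias stability (parallel trends), the double difference removes
# time-invariant selection and common time shocks.
import numpy as np

def did(y, treated, post):
    """y: outcomes; treated, post: 0/1 numpy arrays."""
    m = lambda t, p: y[(treated == t) & (post == p)].mean()
    return (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

# Fake data: fixed group gap (2.0), common trend (1.0), true effect 0.5
rng = np.random.default_rng(3)
n = 20_000
treated = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
y = 2.0 * treated + 1.0 * post + 0.5 * treated * post + rng.normal(size=n)
print(did(y, treated, post))   # approximately 0.5
```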

Longitudinal methods (continued)
Issues with current applied econometric practice:
- Treatment timing depends on transitory shocks to outcomes
  - Example: Ashenfelter's Dip for ALMP participants
- Changes in treatment effects over time relative to policy initiation
  - Example: Card and Krueger minimum wage study
  - Example: tax reform difference-in-differences studies
- Anticipatory effects of treatment
- See Heckman's (1996) comment on Nada Eissa's (1996) paper in the book Empirical Foundations of Household Taxation

Regression discontinuity
- Design RD into the institutions (must be enforced!)
- Need population at the RD
- Only identifies the treatment effect at the discontinuity
- Beware of age and time discontinuities (think about the economics)
- Example: US WPRS, Black, Smith, Berger and Noel (2003), AER
- Example: Reading First (IES)
- Example: Healthy Babies, Healthy Children in Ontario
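A minimal sharp-RD sketch: local linear fits on each side of the cutoff within a fixed bandwidth, with a uniform kernel. The bandwidth and kernel here are placeholders; the bandwidth-selection work cited on the next slide is the serious treatment:

```python
# Minimal sketch: sharp RD via local linear regression on each side of
# the cutoff, uniform kernel, fixed bandwidth h (placeholder choices).
import numpy as np

def sharp_rd(y, r, cutoff=0.0, h=1.0):
    """Estimate the jump in E[Y | R = r] at the cutoff."""
    def fit_at_cutoff(mask):
        rr, yy = r[mask] - cutoff, y[mask]
        b = np.polyfit(rr, yy, 1)          # local linear fit
        return np.polyval(b, 0.0)          # predicted value at the cutoff
    left = (r >= cutoff - h) & (r < cutoff)
    right = (r >= cutoff) & (r <= cutoff + h)
    return fit_at_cutoff(right) - fit_at_cutoff(left)

# Fake data: treatment switches on when the running variable crosses 0
rng = np.random.default_rng(4)
n = 20_000
r = rng.uniform(-2, 2, size=n)
d = (r >= 0).astype(int)
y = 1.0 + 0.8 * r + 2.0 * d + rng.normal(size=n)   # true jump = 2
print(sharp_rd(y, r))                               # approximately 2
```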

Methods advances in RD
- Power calculations for RD designs
  - Schochet (2008), IES report
- Bandwidth selection
  - Imbens and Kalyanaraman (2009)
- Specification testing
  - McCrary (2008), Review of Economics and Statistics
- Discrete running variable
  - Lee and Card (2007), Journal of Econometrics

RD surveys
- Imbens and Lemieux (2008), Review of Economics and Statistics
- Lee and Lemieux (2010), Journal of Economic Literature

Comparisons of econometric evaluation methods
- Much of the literature asks the wrong question.
- What we should want: a mapping from parameter, data, and institutions to the choice of econometric method (or the choice to stop)
- This mapping is not degenerate! There is no magic bullet.
- Example: the NSW literature, e.g. LaLonde (1986), Heckman and Hotz (1989), Dehejia and Wahba (1999, 2002), Smith and Todd (2005a,b)
- Not about "does matching work?" but about whether the NSW data suffice for identification with various econometric estimators

Rankings of econometric evaluation methods
US IES What Works Clearinghouse:
- Three grades (plus omission from the clearinghouse)
- Only experiments and RD papers can get the top grade
- Thoughtful, clearly defined written evaluation criteria
- Multiple evaluators for every study, plus resolution
- Formal appeals process
- Lots of within variation as well as between variation
- How much of the overall variation in quality is explained by identification strategy is an interesting empirical question

General equilibrium effects
- SUTVA and partial equilibrium analysis
- Implications for data collection
- Identification strategy 1: geographically dispersed markets
  - Ex: Forslund and Krueger (1992)
  - Ex: Angelucci and De Giorgi (2008)
- Identification strategy 2: structural models
  - Example: Davidson and Woodbury (1992)
  - Example: Heckman, Lochner and Taber (1998)
  - Example: Lise, Seitz and Smith (2009)

Cost-benefit analysis
- Discount rates
- Excess burden
- How long do the impacts last?
  - Example: National Supported Work Demonstration
  - Example: National Job Corps Study
  - No easy answer; it would be nice to have more data points
- Incorporate all relevant outcomes
  - Example: crime in the National Job Corps Study
  - Example: fertility in Germany; Lechner and Wiehler (2007)
  - Some may be difficult to monetize
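The discounting piece of this slide is straightforward arithmetic once one assumes an impact path and a discount rate; a minimal sketch with invented placeholder numbers, not figures from the studies named above:

```python
# Minimal sketch: present value of a stream of annual earnings impacts
# that decays over time, compared to a one-time per-participant cost.
# Every number below is an illustrative placeholder.
def present_value(initial_impact, decay, discount_rate, years):
    return sum(
        initial_impact * (1 - decay) ** t / (1 + discount_rate) ** t
        for t in range(years)
    )

cost = 4_000.0                        # per-participant cost (illustrative)
pv = present_value(initial_impact=1_200.0, decay=0.15,
                   discount_rate=0.05, years=10)
print(f"PV of benefits = {pv:,.0f}, net benefit = {pv - cost:,.0f}")
# How long the impacts last (decay and horizon) drives the conclusion.
```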

Cost-benefit analysis (continued)
- Need good data on costs
  - Canada could do (much) better here
  - So could everyone else
  - Ex: Farrell's Ice Cream Parlour Restaurant
- Incorporate general equilibrium effects
  - Use the literature as a guide if no study-specific estimates are available
  - Heckman, LaLonde and Smith (1999)
- Some programs do not work: hence the "Nothing Works Clearinghouse"

Evaluation policy
Three broad approaches and their issues:
1. Do experiments
   - Expensive
   - Politically challenging
   - Not everything can be randomized
2. Rich administrative data (think Denmark or Germany)
   - Strategic reporting (IPEDS in the US)
   - Empty fields (education in Canada)
   - Imperfect linking
   - May still need to supplement with surveys
   - Privacy issues

Evaluation policy (continued)
3. Survey-based evaluations
   - Can link surveys to administrative data
   - Response rates falling on voluntary surveys
   - Political costs of mandatory surveys
   - Sampling getting harder over time
   - Response mode issues
   - Item non-response
Canada is implicitly pursuing the third strategy. A blunt discussion of the issue at this level would be valuable.

Alternatives to econometric program evaluation
- Charlatans
- Performance management
  - Heinrich, Heckman and Smith (2002), Heckman et al. (2011)
- Customer satisfaction / participant self-evaluations
  - Example: Teaching American History
  - Example: Smith, Whalley and Wilcox (2011)

Institutions
- Public use (or research use) data sets
- Peer review
  - HRSDC doing well here
- Incentives for consultants to publish in academic journals
- Seconding academics to government agencies (e.g. chief economist at US DOL)
- Research conferences for academics, policy people and consultants
- Independent (more or less) evaluation shops (e.g. IFAU)

Conclusions
- The truth is out there!
- There is no such thing as "the" effect of a policy
- There is no substitute for thinking about identification
- Worry about general equilibrium effects
- Get the cost-benefit analysis right; avoid reporting just one number
- No easy alternatives to econometric policy evaluation
- Institutions matter