Introduction to Program Evaluation

Nirav Mehta, Assistant Professor, Economics Department, University of Western Ontario
January 22, 2014

What is Program Evaluation?
Using statistics to determine the effect of a treatment on an outcome (or outcomes) of interest.
What is a treatment? It can be:
- a policy: introducing school choice into a public school district
- an individual decision: attending university for one year, finishing university, eating a burrito
Two ways of recovering the effect of a treatment:
- Experimental: randomization of treatment
- Observational: use observational data and a combination of statistical and behavioral assumptions

My perspective
I am currently working on projects in:
- the Economics of Education: how school choice affects student achievement; the effect of ability tracking on student achievement
- Health Economics: the design of optimal physician incentive schemes

What can we use program evaluation for?
Three types of analyses:
- Retrospective: How did introducing a school choice program affect student achievement?
- Prospective: How would introducing a school choice program that has already been implemented for Group A affect student achievement for students in Group B?
- Prospective: How would a school choice program that has never been implemented affect student achievement for students in Group A, Group B, or anyone else?
Using retrospective analyses to prospectively evaluate programs requires extrapolation (i.e., additional assumptions).

Leading example
There is a public school district with one public school. A new public school enters the district. How does attending the new school affect student achievement?

A little notation to fix ideas
- Individual i, time t
- Observed characteristics X_it (household income, parental education, ...)
- Unobserved characteristics ε_it (motivation, ability, waking up on the right/wrong side of the bed, ...)
- Treatment status D: D_it = 0 means i didn't have the treatment at time t; D_it = 1 means i had the treatment at time t
- Outcome Y is a function of individual characteristics and treatment status (e.g., score on a standardized test or probability of graduating high school): Y(X, ε, D)
- Treatment effect: Δ_it ≡ Y(X_it, ε_it, 1) − Y(X_it, ε_it, 0), a combination of behavioral responses and input changes

Treatment effects
There is in general a distribution of treatment effects. Put another way, there is no reason to expect that Δ_i = Δ for all people.
The effect of being in a new school reflects not only the potentially different characteristics of those students; it also incorporates behavioral responses that can affect a student's learning. For example, parents might help their child more or less when their child is in a particular school. These responses could also depend on student characteristics: the amount and efficacy of parental help with homework may depend on parental education.
Generalizing our findings to other students or another school requires us to make assumptions about these behavioral responses.

Leading example
First question: What is the treatment? We will focus on the students attending the new school for this talk. Note: we could also examine the effect on students attending the old school when the new school enters (a spillover effect of competition)!
Second question: Which summary of the treatment effect? Focus on the average for today.
Third question: Which students (the average for whom)?
- students attending the new school (TT: treatment on the treated)
- students attending the old school (TU: treatment on the untreated)
- all students who attended the old school last year (ATE: average treatment effect)

Interpreting averages
Most researchers focus on the average effect of a program on some subgroup of the population. Although convenient, this is almost never innocuous! A small, positive average treatment effect could be consistent with:
- a small improvement for most people;
- a very large, positive change for some people (e.g., the worst students learn how to read, or the best students get into their top-choice university);
- a very large, positive change for some and a large, negative change for other people!
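
To make this concrete, here is a minimal simulated sketch (all numbers invented for illustration): three effect distributions that share an average near +0.10 but tell very different stories.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Case 1: everyone improves a little.
uniform = np.full(n, 0.1)

# Case 2: ~5% improve a lot (e.g., the worst students learn to read);
# everyone else is unaffected.
concentrated = np.where(rng.random(n) < 0.05, 2.0, 0.0)

# Case 3: large gains for half, large losses for the other half.
mixed = rng.choice([1.0, -0.8], size=n)

for name, eff in [("uniform", uniform),
                  ("concentrated", concentrated),
                  ("mixed", mixed)]:
    print(f"{name:>12}: mean {eff.mean():+.2f}, "
          f"min {eff.min():+.2f}, max {eff.max():+.2f}")
```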

Why is program evaluation hard?
Look at student i, who attended the old public school in t−1 but then switched to the new public school in year t.

Year   D   Outcome
t−1    0   Y(X_{i,t−1}, ε_{i,t−1}, 0)
t      0   Y(X_it, ε_it, 0)   [counterfactual, not observed]
t      1   Y(X_it, ε_it, 1)   [observed]

Missing data problem: the object of interest is Δ_it ≡ Y(X_it, ε_it, 1) − Y(X_it, ε_it, 0), but Y(X_it, ε_it, 0) is a counterfactual outcome. We can't observe outcomes under both treatment conditions. Therefore, we need to find a valid comparison group.
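
A minimal potential-outcomes sketch of the missing data problem (simulated data; the data-generating numbers are hypothetical): every student has two potential outcomes, but the data reveal only the one corresponding to the realized treatment status.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Both potential outcomes exist for every student...
y0 = rng.normal(50, 10, n)           # Y(X, eps, 0): outcome at the old school
y1 = y0 + 5 + rng.normal(0, 2, n)    # Y(X, eps, 1): outcome at the new school
delta = y1 - y0                      # individual treatment effects

# ...but we observe only one of them.
d = rng.integers(0, 2, n)            # realized treatment status
y_obs = np.where(d == 1, y1, y0)

print("true effects    :", np.round(delta, 1))
print("treatment status:", d)
print("observed outcome:", np.round(y_obs, 1))
# For d = 1 we never see y0, and for d = 0 we never see y1:
# the counterfactual column of the table is always missing.
```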

Counterfactual outcomes
Our definition of the treatment effect, and the summary of the treatment effect we're interested in (i.e., the average for some group of students), provide criteria for a comparison group. We observe Y(X_it, ε_it, 1). We need to come up with Y(X_it, ε_it, 0).

Counterfactual outcomes
In the language of our model, someone with the same observable characteristics (X) and the same unobservable characteristics (ε) who did not participate in the treatment (D = 0) would suffice, given no further assumptions on Y.
What if treatment status is related to unobservable characteristics? For example, more motivated students may be more likely to enroll in a new, demanding school. Many methods try to make people comparable across these unobservable characteristics. More on this later.

Strategies for program evaluation
Trade-off: the more (or stronger) assumptions you make, the more you can extrapolate. Economists, and other social scientists, have used the following:
1. Randomized control trial
2. Cross-sectional comparisons
3. Fixed effects
4. Fixed effects and common time trend (difference in differences)
5. Multivariate regression
6. Matching (e.g., on propensity scores)
7. Regression discontinuity design
8. Instrumental variables
9. Structural estimation

1: Randomized control trial
Say we randomly assigned attendance at the new school amongst all students at the old school. This is like having two subgroups of students with the same distribution of (X, ε) but different treatment statuses D. We can recover the average Δ for those students by taking the average difference in outcomes between the two groups!
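
A simulated sketch of why this works (hypothetical numbers): because the coin flip is independent of motivation, the difference in group means recovers the average effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

motivation = rng.normal(0, 1, n)             # unobserved epsilon
y0 = 50 + 5 * motivation + rng.normal(0, 3, n)
y1 = y0 + 4                                  # true effect: 4 for everyone

d = rng.integers(0, 2, n)                    # random assignment: D independent of epsilon
y = np.where(d == 1, y1, y0)

ate_hat = y[d == 1].mean() - y[d == 0].mean()
print(f"difference in means: {ate_hat:.2f}  (truth: 4.00)")
```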

Interpreting results from randomized control trials
People like to say that RCTs are the gold standard for evaluating programs. The monetary gold standard is an obsolete relic that people like talking about all the time. So, I agree.
If we are unwilling to assume that all students would be affected in exactly the same manner, we have to make more assumptions to make use of findings from RCTs.

Problems with RCTs
- They are retrospective.
- They are expensive. I don't have enough grant money or time to conduct RCTs every time I want to study something new.
- It's hard to generalize findings from one RCT to another if treatment effects are heterogeneous. We need further structure to understand the results: what is it about the new school that resulted in those amazing outcomes, and is it replicable?
- Similar to generalizability: will the next new school be exactly the same? This could be a HUGE deal. Education interventions found effective through RCTs often don't scale up.
If we put further assumptions on Y, we start down the path of using observational (think: survey or administrative) data.

Methods using observational data

2: Cross-section
Let's invoke the commonly used additively separable framework:
Y_it = X_it β + Δ_it D_it + ε_it
and assume that the effect of treatment is constant:
Y_it = X_it β + Δ D_it + ε_it
The inferential problem is that the random variable D_it may be correlated with ε_it. If highly motivated students (large, positive ε) were also the ones who switched to the new school, we might overstate the effect of attending the new school. Therefore, comparing people who received the treatment with those who did not may bias our estimate of Δ.
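
A simulation of this selection problem (invented numbers): motivated students both score higher and are more likely to switch, so the naive comparison of means loads the motivation gap onto the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

motivation = rng.normal(0, 1, n)                           # epsilon, unobserved
d = (motivation + rng.normal(0, 1, n) > 0).astype(float)   # motivated students select in
y = 50 + 2 * d + 5 * motivation + rng.normal(0, 3, n)      # true Delta = 2

naive = y[d == 1].mean() - y[d == 0].mean()
print(f"naive cross-section estimate: {naive:.2f}  (truth: 2.00)")
# Prints roughly 7.6: switchers are more motivated on average,
# and that gap is wrongly attributed to the new school.
```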

3: Fixed effects
Assumption: what if motivation were constant over time?
Y_it = X_it β + Δ D_it + α_i + η_t + μ_it, where ε_it = α_i + η_t + μ_it
We could then difference the outcome equation within each student, over time, and run a regression on the differenced data. The year where there's a switch will identify Δ. If a student switched because they were even more motivated in year t, we'd have a problem!
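
A sketch of the within-student differencing, under the slide's assumption that motivation α_i is constant over time (simulated data): differencing two years removes α_i, so regressing ΔY on ΔD recovers Δ even though switching depends on motivation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

alpha = rng.normal(0, 5, n)                            # time-invariant motivation
d1 = np.zeros(n)                                       # nobody treated in year t-1
d2 = (alpha + rng.normal(0, 5, n) > 0).astype(float)   # motivated students switch in year t

y1 = 50 + 2 * d1 + alpha + rng.normal(0, 3, n)         # true Delta = 2
y2 = 50 + 2 * d2 + alpha + rng.normal(0, 3, n)

dy, dd = y2 - y1, d2 - d1                              # alpha cancels in the difference
X = np.column_stack([np.ones(n), dd])
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)          # OLS on differenced data
print(f"fixed-effects estimate: {beta[1]:.2f}  (truth: 2.00)")
```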

4: Difference in differences
What if the new school was also introduced when there was a common, unobserved shock? Say the new school entered because the district is in turmoil, which lowers achievement.
Take two students in the same district, one of whom had the treatment and one who did not. Take the difference between their differences! This gets rid of both α and η. Run a regression on the differenced data ("diff in diff").
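
A diff-in-diff sketch under the common-trend assumption (simulated, hypothetical numbers): the district-wide turmoil shock hits switchers and stayers alike, so differencing the two groups' changes isolates Δ.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000

alpha = rng.normal(0, 5, n)                            # student fixed effect
d = (alpha + rng.normal(0, 5, n) > 0).astype(float)    # who switches in year t
shock = -3.0                                           # common year-t turmoil, hits everyone

y_pre = 50 + alpha + rng.normal(0, 3, n)
y_post = 50 + 2 * d + alpha + shock + rng.normal(0, 3, n)  # true Delta = 2

change = y_post - y_pre
did = change[d == 1].mean() - change[d == 0].mean()
print(f"diff-in-diff estimate: {did:.2f}  (truth: 2.00)")
# The single difference for switchers alone would give about 2 + shock = -1.
```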

Condition on observables: 5. Regression and 6. Matching
Include as many variables as you can in the linear regression, and hope you capture the offending terms in ε. This is the same in principle as matching: basically, find people with X as similar as possible. Regression is just making an assumption about how the X enter the outcome equation.
Both of these can look like data mining: some consider it beneath social scientists, and it raises statistical issues as well.
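
A nearest-neighbor matching sketch on a single observable X (simulated data; with many observables, a propensity score would collapse them to one dimension): each treated student is compared with the untreated student whose X is closest, which works only if selection runs through X alone.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2_000

x = rng.normal(0, 1, n)                            # observed characteristic
d = (x + rng.normal(0, 1, n) > 0)                  # selection on X (plus independent noise)
y = 50 + 2 * d + 4 * x + rng.normal(0, 1, n)       # true Delta = 2

x_u, y_u = x[~d], y[~d]                            # untreated pool
# Match each treated student to the untreated student with the closest X.
idx = np.abs(x[d][:, None] - x_u[None, :]).argmin(axis=1)
tt_hat = (y[d] - y_u[idx]).mean()
print(f"matching estimate of TT: {tt_hat:.2f}  (truth: 2.00)")
```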

Local methods: Thinking inside the box
The inferential problem was that we didn't know whether the distribution of ε was the same for students who attend the new school and students who attend the old school after the new school has entered.
Sometimes, policymakers design programs that are assigned to people on only one side of a cutoff. If we can see the variable used in determining group membership, we can form a local comparison group.

7: Regression discontinuity design
[Figure: two panels, the outcome plotted against the running variable x (0 to 100), and the treatment-effect function Δ(x) ranging from about 0.30 to 0.50.]

7: Regression discontinuity design
[Figure: the outcome and the treatment effect at the cutoff, each plotted against the running variable.]
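
A sketch of the idea behind the figures (simulated data, invented parameters): treatment switches on at a cutoff in the running variable, and the jump in the outcome there, estimated by local linear fits on each side, is the treatment effect at the cutoff.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

x = rng.uniform(0, 100, n)                      # running variable
d = (x >= 50).astype(float)                     # treated iff above the cutoff
y = 0.01 * x + 0.4 * d + rng.normal(0, 0.1, n)  # true jump at the cutoff: 0.4

def fit_at_cutoff(mask):
    """Local linear fit; returns the predicted outcome at x = 50."""
    X = np.column_stack([np.ones(mask.sum()), x[mask] - 50])
    beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
    return beta[0]

h = 10                                          # bandwidth around the cutoff
left = (x >= 50 - h) & (x < 50)
right = (x >= 50) & (x <= 50 + h)

jump = fit_at_cutoff(right) - fit_at_cutoff(left)
print(f"RD estimate of the jump: {jump:.2f}  (truth: 0.40)")
```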

8: Instrumental variables
A similar idea underlies instrumental variables: find something that toggles treatment status without otherwise affecting the outcome.
Like RD, instrumental variables tell us about treatment effects for only a subgroup of the population: those whose treatment status is affected by the instrument.
More generally, we can model the selection process and use a control function to solve the inferential problem. And while we're modeling selection, why not just go all the way?
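
An IV sketch (simulated; the "offer" instrument and all numbers are invented): a randomized offer shifts attendance without otherwise affecting scores, and the Wald ratio (the offer's effect on outcomes divided by its effect on attendance) recovers the effect for the students the instrument moves.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20_000

motivation = rng.normal(0, 1, n)                        # unobserved epsilon
z = rng.integers(0, 2, n)                               # randomized offer: the instrument
# Attendance responds to both the offer and motivation (selection).
d = (0.8 * z + motivation + rng.normal(0, 1, n) > 0.5).astype(float)
y = 50 + 2 * d + 5 * motivation + rng.normal(0, 3, n)   # true Delta = 2

reduced_form = y[z == 1].mean() - y[z == 0].mean()      # effect of offer on scores
first_stage = d[z == 1].mean() - d[z == 0].mean()       # effect of offer on attendance
print(f"IV (Wald) estimate: {reduced_form / first_stage:.2f}  (truth: 2.00)")
# The naive comparison y[d==1] vs y[d==0] would be badly biased upward here.
```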

9: Structural estimation
Specify outcomes as the result of optimization problems. In the leading example, write down a student's utility from attending the old school and the new one, in terms of:
- the outcome of interest, which may depend on other choices like effort;
- other factors, like the distance between the two schools.
We then use data to estimate the parameters of the economic model we developed, parameters we can assume to be policy-invariant. For this we typically use different assumptions than other methods; they are commonly grounded in theory (mine in economic theory). We can use the estimated model parameters to extrapolate to situations that haven't yet happened: ex ante policy evaluation.
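
A stylized structural sketch of the leading example, under assumed functional forms (a logit school-choice model with invented parameters): estimate the utility parameters from observed enrollment decisions, then simulate a policy that has never happened.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5_000

dist = rng.uniform(0, 10, n)                  # distance to the new school
b0, b1 = 2.0, 0.5                             # true utility parameters
u = b0 - b1 * dist + rng.logistic(0, 1, n)    # utility of new school minus old
choose_new = (u > 0).astype(float)            # observed enrollment decisions

def loglik(c0, c1):
    p = 1 / (1 + np.exp(-(c0 - c1 * dist)))   # logit choice probability
    return np.sum(choose_new * np.log(p) + (1 - choose_new) * np.log(1 - p))

# Crude maximum likelihood via grid search (a real application would use an optimizer).
grid = [(c0, c1) for c0 in np.linspace(1, 3, 41) for c1 in np.linspace(0.1, 1, 46)]
b0_hat, b1_hat = max(grid, key=lambda c: loglik(*c))
print(f"estimated (b0, b1): ({b0_hat:.2f}, {b1_hat:.2f})  truth: (2.00, 0.50)")

# Ex ante policy evaluation: predicted enrollment share if every distance were halved.
share_now = np.mean(1 / (1 + np.exp(-(b0_hat - b1_hat * dist))))
share_cf = np.mean(1 / (1 + np.exp(-(b0_hat - b1_hat * dist / 2))))
print(f"new-school share: {share_now:.2f} now, {share_cf:.2f} if distances halved")
```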

Conclusion
We talked about some commonly used methods to evaluate the effects of programs. Takeaways:
1. There almost always exists a set of assumptions under which a statistical model returns an estimate of the treatment effect.
2. How plausible are those assumptions? We need to go beyond statistics.
3. All methods for program evaluation involve assumptions!
4. Interpretation of Δ: it's a combination of agent input choices and equilibrium responses. It may not be policy-invariant!
5. It's imperative to understand the implications of the mathematical models we use before we run them.

Suggested readings
See Petra Todd's lecture notes for a more formal treatment: http://athena.sas.upenn.edu/petra/econ712.htm
World Bank book on impact evaluation: http://www-wds.worldbank.org/external/default/wdscontentserver/wdsp/IB/2009/12/10/000333037_20091210014322/Rendered/PDF/520990PUB0EPI1101Official0Use0Only1.pdf