Introduction to Program Evaluation
Nirav Mehta, Assistant Professor, Economics Department, University of Western Ontario
January 22, 2014
What is Program Evaluation?
Using statistics to determine the effect of a treatment on an outcome (or outcomes) of interest.
What is a treatment? It can be:
- a policy: introducing school choice into a public school district
- an individual decision: attending university for one year, finishing university, eating a burrito
Two ways of recovering the effect of a treatment:
- Experimental: randomization of treatment
- Observational: use observational data and a combination of statistical and behavioral assumptions
My perspective
I am currently working on projects in:
- the Economics of Education: how school choice affects student achievement; the effect of ability tracking on student achievement
- Health Economics: the design of optimal physician incentive schemes
What can we use program evaluation for?
Three types of analyses:
- Retrospective: How did introducing a school choice program affect student achievement?
- Prospective: How would introducing a school choice program that has already been implemented on Group A affect student achievement for students in Group B?
- Prospective: How would a school choice program, which has never been implemented, affect student achievement for students in Group A or Group B or anyone else?
Using retrospective analyses to prospectively evaluate programs requires extrapolation (i.e., additional assumptions).
Leading example
There is a public school district with one public school. A new public school enters the district. How does attending the new school affect student achievement?
A little notation to fix ideas
- Individual i, time t
- Observed characteristics X_it (household income, parental education, ...)
- Unobserved characteristics ε_it (motivation, ability, waking up on the right/wrong side of the bed, ...)
- Treatment status D: D_it = 0 means i didn't have the treatment at time t; D_it = 1 means i had the treatment at time t
- Outcome Y is a function of individual characteristics and treatment status (e.g., score on a standardized test or probability of graduating high school): Y(X, ε, D)
- Treatment effect: Δ_it = Y(X_it, ε_it, 1) − Y(X_it, ε_it, 0), a combination of behavioral responses and input changes
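The potential-outcomes notation above can be sketched in code. The functional form and all coefficients below (50, 0.1, 5, the effect of 8) are illustrative assumptions, not taken from the slides:

```python
# Toy potential-outcomes model. All coefficients are invented for illustration.
def outcome(x, eps, d):
    """Y(X, eps, D): e.g. a test score given observed characteristics x
    (say, household income), unobservables eps (motivation), and
    treatment status d (1 = attends the new school)."""
    base = 50 + 0.1 * x + 5 * eps
    return base + 8 if d == 1 else base

def treatment_effect(x, eps):
    """Delta_it = Y(X_it, eps_it, 1) - Y(X_it, eps_it, 0)."""
    return outcome(x, eps, 1) - outcome(x, eps, 0)

print(treatment_effect(100, 1.0))  # 8: constant by construction in this toy model
```

In this toy model the effect happens to be the same for everyone; the next slide explains why that is generally not the case.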
Treatment effects
There is in general a distribution of treatment effects. Put another way, there's no reason to expect that Δ_i = Δ for all people.
The effect of being in a new school reflects not only the potentially different characteristics of those students; it also incorporates behavioral responses that can affect a student's learning. For example, parents might help their child more or less when their child is in a particular school. These responses could also depend on student characteristics: the amount and efficacy of parental help at home may depend on parental education.
Generalizing our findings to other students or another school requires us to make assumptions about these behavioral responses.
Leading example
First question: What is the treatment? We will focus on attending the new school for this talk. Note: we could also look at the effect of the new school's entry on students attending the old school (a spillover effect of competition)!
Second question: Which summary of the treatment effect? Focus on the average for today.
Third question: Which students? (the average for whom?)
- students attending the new school (TT: treatment on the treated)
- students attending the old school (TU: treatment on the untreated)
- all students who attended the old school last year? (ATE: average treatment effect)
Interpreting averages
Most researchers focus on the average effect of a program on some subgroup of the population. Although convenient, this is almost never innocuous! A small, positive average treatment effect could be consistent with:
- a small improvement for most people
- a very large, positive change for some people (e.g., the worst students learn how to read; the best students get into their super top choice university)
- a very large, positive change for some and a large, negative change for some other people!
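A tiny numeric illustration of this point (all values invented): two very different distributions of individual effects can share the same small, positive average.

```python
# Two hypothetical distributions of individual treatment effects with the
# SAME small, positive average. The numbers are made up for illustration.
uniform_gains = [1, 1, 1, 1, 1]        # everyone improves a little
mixed_effects = [10, 10, 5, -10, -10]  # big winners AND big losers

def mean(effects):
    return sum(effects) / len(effects)

print(mean(uniform_gains), mean(mixed_effects))  # both 1.0
```

Reporting only the average of 1.0 hides that the second program badly hurt two of the five people.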
Why is program evaluation hard?
Look at student i, who attended the old public school in year t−1 but then switched to the new public school in year t.

Year | D | Outcome
t−1  | 0 | Y(X_{i,t−1}, ε_{i,t−1}, 0)  (observed)
t    | 1 | Y(X_it, ε_it, 1)  (observed)
t    | 0 | Y(X_it, ε_it, 0)  (not observed)

Missing data problem: the object of interest is Δ_it = Y(X_it, ε_it, 1) − Y(X_it, ε_it, 0), but Y(X_it, ε_it, 0) is a counterfactual outcome. We can't observe outcomes under both treatment conditions. Therefore, we need to find a valid comparison group.
Counterfactual outcomes
Our definition of the treatment effect, and the summary of the treatment effect we're interested in (i.e., the average for some group of students), provide criteria for a comparison group. We observe Y(X_it, ε_it, 1). We need to come up with Y(X_it, ε_it, 0).
Counterfactual outcomes
In the language of our model, someone with the same observable characteristics (X) and the same unobservable characteristics (ε) who did not participate in the treatment (D = 0) would suffice, given no further assumptions on Y.
What if treatment status were related to unobservable characteristics, e.g., more motivated students being more likely to enroll in a new, demanding school? Many methods try to make people comparable across these unobservable characteristics. More on this later.
Strategies for program evaluation
Trade-off: The more (or stronger) assumptions you make, the more you can extrapolate. Economists, and other social scientists, have used the following:
1. Randomized control trial
2. Cross-sectional comparisons
3. Fixed effects
4. Fixed effects and common time trend (difference in differences)
5. Multivariate regression
6. Matching (e.g., on propensity scores)
7. Regression discontinuity design
8. Instrumental variables
9. Structural estimation
1: Randomized control trial
Say we randomly assigned attendance at the new school amongst all students at the old school. This is like having two subgroups of students with the same distribution of (X, ε) but different treatment statuses D. We can recover the average Δ for those students by taking the average difference in outcomes between the two groups!
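The logic of the slide can be sketched with a toy population. The outcome equation and the constant effect of 8 are my assumptions; the point is only that when both arms have the same distribution of types, the difference in means recovers the effect:

```python
# Toy RCT: a population of student "types" (eps values). Randomization gives
# both arms the same distribution of types. Outcome equation is invented.
def outcome(eps, d):
    return 50 + 5 * eps + (8 if d else 0)  # true effect is 8

types = [0, 1, 2, 3]  # each type equally represented in both arms

treated = [outcome(eps, 1) for eps in types]
control = [outcome(eps, 0) for eps in types]

ate_hat = sum(treated) / len(treated) - sum(control) / len(control)
print(ate_hat)  # 8.0: difference in means recovers the effect
```

The balance of eps across arms is exactly what randomization buys; the slides on observational methods below are all about what to do when we don't have it.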
Interpreting results from randomized control trials
People like to say that RCTs are the "gold standard" for evaluating programs. The monetary gold standard is an obsolete relic that people like talking about all the time. So, I agree.
If we are unwilling to assume that all students would be affected in exactly the same manner, we have to make more assumptions to make use of findings from RCTs.
Problems with RCTs
- They are retrospective.
- They are expensive. I don't have enough grant money or time to conduct RCTs every time I want to study something new.
- It's hard to generalize findings from one RCT to another if treatment effects are heterogeneous. We need further structure to understand the results. What is it about the new school that resulted in those amazing outcomes, and is it replicable?
- Similar to generalizability: will the next new school be exactly the same? This could be a HUGE deal. Education interventions found effective through RCTs often don't scale up.
If we put further assumptions on Y, we start down the path of using observational (think: survey or administrative) data.
Methods using observational data
2: Cross-section
Let's invoke the commonly used additively separable framework:
Y_it = X_it β + Δ_it D_it + ε_it
and assume that the effect of treatment is constant:
Y_it = X_it β + Δ D_it + ε_it
The inferential problem is that the random variable D_it may be correlated with ε_it. If highly motivated students (large, positive ε) were also the ones who switched to the new school, we might overstate the effect of attending the new school. Therefore, comparing people who received the treatment with those who did not may bias our estimate of Δ.
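Here is a deterministic sketch of that selection bias. Suppose (my assumption, for illustration) that only the high-motivation students switch schools; the naive cross-sectional comparison then bundles motivation into the estimate:

```python
# Selection on unobservables: only high-motivation students (eps = 2) switch
# to the new school; eps = 0 students stay. All numbers are invented.
def outcome(eps, d):
    return 50 + 5 * eps + (8 if d else 0)  # true effect is 8

treated_mean = outcome(2, 1)  # switchers: motivated AND treated
control_mean = outcome(0, 0)  # stayers: unmotivated, untreated

naive_estimate = treated_mean - control_mean
print(naive_estimate)  # 18: the true effect of 8 plus 10 points of selection bias
```

The comparison attributes the motivation gap (5 × 2 = 10 points) to the school, overstating Δ by more than a factor of two.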
3: Fixed effects
Assumption: What if motivation were constant over time?
Y_it = X_it β + Δ D_it + α_i + η_t + µ_it,  where ε_it = α_i + η_t + µ_it
We could then difference the outcome equation within each student, over time, and run a regression on the differenced data. The year where there's a switch identifies Δ. If a student switched because they were even more motivated in year t, we'd have a problem!
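The within-student differencing can be sketched in two lines of arithmetic (numbers invented; η_t and µ_it set to zero to isolate the fixed effect):

```python
# First-differencing out a student fixed effect alpha_i. Numbers are invented;
# the common time shock eta_t and noise mu_it are set to zero for clarity.
alpha_i = 7        # time-constant motivation: biases a cross-section,
true_effect = 8    # but is identical in both of this student's years

y_before = 50 + alpha_i + 0 * true_effect  # year t-1: old school (D = 0)
y_after  = 50 + alpha_i + 1 * true_effect  # year t: new school (D = 1)

# The within-student difference cancels alpha_i and identifies the effect:
print(y_after - y_before)  # 8
```

Note that alpha_i appears in both years and so drops out of the difference; this is exactly why the method fails if motivation changes in the switching year.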
4: Difference in differences
What if the new school was introduced when there was also a common, unobserved shock? Say the new school entered because the district is in turmoil, which lowers achievement.
Take two students, in the same district, one of whom had the treatment and the other who did not. Take the difference between their differences! This gets rid of both α and η. Run a regression on the differenced data ("diff in diff").
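The two-student, two-period arithmetic looks like this (all numbers invented): the stayer's change measures the common shock, and subtracting it from the switcher's change isolates the effect.

```python
# Difference-in-differences with two students in the same district.
# A common shock (district turmoil) hits both; all numbers are invented.
shock, true_effect = -3, 8

switcher = {"before": 60, "after": 60 + shock + true_effect}  # treated in year t
stayer   = {"before": 55, "after": 55 + shock}                # never treated

did = (switcher["after"] - switcher["before"]) - (stayer["after"] - stayer["before"])
print(did)  # 8: both the fixed effects and the common shock difference out
```

The students' different baselines (60 vs. 55, their α's) and the shared shock η both cancel; only Δ survives.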
Condition on observables: 5. Regression and 6. Matching
Include as many variables as you can in the linear regression and hope you capture the offending terms in ε. This is the same in principle as matching: basically, find people with X as similar as possible. The regression is just making an assumption about how the X enter the outcome equation.
Both of these can look like data mining, which some consider beneath social scientists; there are statistical issues as well.
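A minimal nearest-neighbor matching sketch on a single observable (the data and variable choice are invented; real applications match on many X's or on propensity scores):

```python
# Nearest-neighbor matching on one observable x (say, household income).
# Data are invented; this shows the idea, not a production matcher.
treated  = [(30, 70), (50, 74)]            # (x, outcome) for new-school students
controls = [(28, 62), (49, 66), (80, 75)]  # (x, outcome) for old-school students

def match_effect(t_x, t_y, controls):
    # compare each treated unit with the control whose x is closest
    c_x, c_y = min(controls, key=lambda c: abs(c[0] - t_x))
    return t_y - c_y

effects = [match_effect(x, y, controls) for x, y in treated]
att_hat = sum(effects) / len(effects)
print(att_hat)  # 8.0: average of (70 - 62) and (74 - 66)
```

This estimates the effect on the treated (TT), and it is only credible if ε is unrelated to treatment once we condition on x, which is exactly the assumption the slide warns about.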
Local methods: Thinking inside the box
The inferential problem was that we didn't know whether the distribution of ε was the same for students who attend the new school and those who attend the old school after the new school has entered. Sometimes, policymakers design programs that are assigned to people on only one side of a cutoff. If we can see the variable used in calculating group membership, we can form a local comparison group.
7: Regression discontinuity design
[Figures: the outcome plotted against the running variable x, alongside the implied treatment effect Δ(x). Outcomes jump discontinuously at the cutoff, and the size of the jump estimates the treatment effect there.]
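The figures' logic can be sketched numerically. The cutoff, slope, and jump below are invented; the point is that comparing outcomes in narrow windows on either side of the cutoff approximately recovers the jump:

```python
# RDD sketch: treatment switches on at x >= 50; outcomes follow a smooth
# trend plus a jump at the cutoff. Cutoff, slope, and jump are invented.
cutoff, jump, slope = 50, 0.40, 0.004

def y(x):
    return 0.30 + slope * x + (jump if x >= cutoff else 0.0)

below = [y(x) for x in (48, 49)]  # narrow window just below the cutoff
above = [y(x) for x in (50, 51)]  # narrow window just above

rd_estimate = sum(above) / len(above) - sum(below) / len(below)
print(rd_estimate)  # about 0.408: the 0.40 jump plus a small trend bias
```

The leftover 0.008 comes from the smooth trend across the two-unit windows; real RDD estimators fit the trend on each side (local regressions) to remove it. And like IV below, the estimate is local: it tells us about students near the cutoff.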
8: Instrumental variables
A similar idea underlies instrumental variables: find something that toggles treatment status without otherwise affecting the outcome. Similarly to RDD, instrumental variables tell us about treatment effects for only a subgroup of the population: those whose treatment status is affected by the instrument.
More generally, we can model the selection process and use a control function to solve the inferential problem. While we're modeling selection, why not just go all the way?
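A minimal Wald (simple IV) sketch. The instrument, the "lottery offer" framing, and all numbers are invented; note how the estimate is driven entirely by the compliers, the subgroup the slide mentions:

```python
# Wald/IV sketch: an instrument z (say, a lottery offer) moves "compliers"
# into the new school, while never-takers ignore it. All numbers invented.
# Each person is (z, d, y); the true effect for compliers is 8.
people = [
    (1, 1, 68),  # complier with offer: treated, y = 60 + 8
    (1, 0, 55),  # never-taker with offer: untreated
    (0, 0, 60),  # complier without offer: untreated
    (0, 0, 55),  # never-taker without offer: untreated
]

def mean(vals):
    return sum(vals) / len(vals)

y1 = mean([y for z, d, y in people if z == 1])  # E[Y | z = 1]
y0 = mean([y for z, d, y in people if z == 0])  # E[Y | z = 0]
d1 = mean([d for z, d, y in people if z == 1])  # E[D | z = 1]
d0 = mean([d for z, d, y in people if z == 0])  # E[D | z = 0]

wald = (y1 - y0) / (d1 - d0)
print(wald)  # 8.0: the compliers' effect, not necessarily anyone else's
```

Dividing the outcome gap by the first stage (the share whose treatment the instrument moved, here 0.5) rescales the "intention-to-treat" difference into the compliers' effect.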
9: Structural estimation
Specify outcomes as the result of optimization problems. In the leading example, write down a student's utility from attending the old school and the new one, in terms of:
- the outcome of interest, which may depend on other choices, like effort
- other factors, like the distance between the two schools
We then use data to estimate parameters of the developed economic model, parameters that we can assume to be policy-invariant. For this we typically use different assumptions than the other methods, assumptions commonly grounded in theory (mine in economic theory). We can use the estimated model parameters to extrapolate to situations that haven't yet happened: ex ante policy evaluation.
Conclusion
We talked about some commonly used methods to evaluate the effects of programs. Takeaways:
1. There almost always exists a set of assumptions under which a statistical model returns an estimate of the treatment effect.
2. How plausible are those assumptions? We need to go beyond statistics.
3. All methods for program evaluation involve assumptions!
4. Interpretation of Δ: it's a combination of agent input choices and equilibrium responses. It may not be policy-invariant!
5. It's imperative to understand the implications of the mathematical models we use before we run them.
Suggested readings
See Petra Todd's lecture notes for a more formal treatment: http://athena.sas.upenn.edu/petra/econ712.htm
World Bank book on impact evaluation: http://www-wds.worldbank.org/external/default/wdscontentserver/wdsp/IB/2009/12/10/000333037_20091210014322/Rendered/PDF/520990PUB0EPI1101Official0Use0Only1.pdf