
Econometric & other impact assessment approaches to policy analysis, Part 1

Notes developed by Nicole Mason, September 2012, drawing on the references listed below and class notes from EC 850, Development Economics (taught by Songqin Jin, Spring 2010, Michigan State University Department of Economics).

The problem of causality in policy analysis
Internal vs. external validity

Key questions when starting an econometric project (Angrist & Pischke, 2009):

1. What is the causal relationship that I want to estimate? E.g., what is the causal effect of a change in policy x on outcome y?
2. If I could design the ideal experiment to capture this causal effect, what would that experiment look like? What factors would I hold constant and what factors would I vary? This often involves random assignment of treatment. It is useful to think about even if the experiment would be impossible to carry out in practice.
3. What is my identification strategy? I.e., how can I use observational data to approximate the ideal experiment?
4. What type of statistical inference will I use? E.g., population of interest, sample, type of standard errors.

The potential outcomes framework for binary treatments

Let
$y_{1i}$ be the outcome for individual $i$ with treatment ($D_i = 1$)
$y_{0i}$ be the outcome for individual $i$ without treatment ($D_i = 0$)

Then the causal effect of treatment, or the treatment effect, for individual $i$ is $y_{1i} - y_{0i}$.

Note that our framework is with vs. without, not before vs. after.

The challenge is that we only observe people in one state (treated or untreated). They can't be in both states at the same time. We only observe $y_i$, where:

$$y_i = \begin{cases} y_{1i} & \text{if } D_i = 1 \\ y_{0i} & \text{if } D_i = 0 \end{cases} = y_{1i} D_i + (1 - D_i) y_{0i} = (y_{1i} - y_{0i}) D_i + y_{0i}$$

Effects are likely to differ across individuals/units of analysis. It's the expected value of the effect (i.e., the mean effect) that is of interest for policy analysis, etc.:

$$E(y_{1i} - y_{0i}) \qquad \text{Average treatment effect (ATE)}$$
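The observed-outcome equation and the ATE above can be illustrated with a small simulation. This is a minimal sketch in Python (NumPy), not part of the original notes: the sample size, the distributions, and the mean effect of 0.5 are hypothetical values chosen only for illustration. In real data only one potential outcome per individual is observed, so the last step is possible here only because the simulation generates both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical potential outcomes (illustrative numbers, not from the notes).
y0 = rng.normal(loc=2.0, scale=1.0, size=n)       # outcome without treatment
effect = rng.normal(loc=0.5, scale=0.3, size=n)   # effects differ across individuals
y1 = y0 + effect                                  # outcome with treatment

# Treatment indicator D_i, assigned here by a fair coin flip (an assumption).
d = rng.integers(0, 2, size=n)

# Observed outcome: each individual reveals only one potential outcome.
y_switch = y1 * d + (1 - d) * y0    # y_i = y_1i * D_i + (1 - D_i) * y_0i
y_obs = (y1 - y0) * d + y0          # y_i = (y_1i - y_0i) * D_i + y_0i

# ATE = E(y_1i - y_0i): the mean of the individual treatment effects.
ate = np.mean(y1 - y0)

print("two forms of the observed outcome agree:", np.allclose(y_switch, y_obs))
print("simulated ATE:", ate)
```

The two expressions for the observed outcome are algebraically identical, so they agree for every individual regardless of how treatment is assigned.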

What are we measuring if we compare the outcomes for the treated to the untreated? Is this the causal effect we want (i.e., the effect of treatment on the outcome)?

Eq. 1:

$$E(y_i \mid D_i = 1) - E(y_i \mid D_i = 0) = \underbrace{E(y_{1i} \mid D_i = 1) - E(y_{0i} \mid D_i = 1)}_{\text{Average treatment effect on the treated (ATT)}} + \underbrace{E(y_{0i} \mid D_i = 1) - E(y_{0i} \mid D_i = 0)}_{\text{Selection bias}}$$

Eq. 1 follows because $E(y_i \mid D_i = 1) = E(y_{1i} \mid D_i = 1)$ and $E(y_i \mid D_i = 0) = E(y_{0i} \mid D_i = 0)$; adding and subtracting the unobserved counterfactual $E(y_{0i} \mid D_i = 1)$ gives the two terms on the right-hand side.

Can also write the ATT as:

$$ATT = E(y_{1i} \mid D_i = 1) - E(y_{0i} \mid D_i = 1) = E(y_{1i} - y_{0i} \mid D_i = 1)$$

If there is no selection bias, then the comparison gives us the ATT. The ATT is of interest, but note that it is not the same as the ATE (except in special cases).

The selection bias term is the difference in average no-treatment outcomes ($y_{0i}$) between individuals who are treated and those who are not treated. If there is selection bias (or self-selection), an estimate of the ATE based on comparing the average outcomes of the treated to the untreated will be misleading (biased). The goal is to minimize/eliminate selection bias.

Self-selection: In most cases, individuals at least partly determine whether they receive treatment, and their decisions may be related to the benefits of treatment, $y_{1i} - y_{0i}$ (Wooldridge, 2003: 606).

EX) Causal effect of going to the hospital on health: If we compare the average health outcomes of those who go to the hospital to the average health outcomes of those who do not, we are likely to have a selection bias problem. People who are sick are more likely than healthy people to go to the hospital, and sick people have worse health outcomes to start out with. Selection bias is an issue.

Example from Zambia and/or the agricultural sector?

Example: Effect of providing fertilizer to farmers on maize yields

The intervention: provide fertilizer to select farmers in a poor region of the country (region A) in 2011. The objective is to raise maize yields.
o Program targets poor areas
o Farmers have to enroll at the local extension office to receive the fertilizer
o No fertilizer program before 2011

We have data on maize yields for both 2009 (before the program) and 2011 (year of the program) for farmers in region A and in another region that didn't get the fertilizer (region B).

We observe that farmers who were provided fertilizer have lower yields in 2011 than in 2009. Does this mean that the program did not work?

o There was a national drought in 2011, so everyone's yields were lower in 2011 than in 2009 (failure of the reflexive, i.e., before & after, comparison).
o We compare farmers in the program region (region A) to those in the non-program region (region B) and find that treatment farmers in region A have a larger decline in yields than farmers in region B. Did the program have a negative impact? Not necessarily:
    o Farmers in region B have better quality soil (unobservable)
    o Farmers in region B have more irrigation, which is key in a drought year (observable)

What if we compare treatment farmers in region A with their neighbors?
o We think the treated farmers' and comparison farmers' soils are approximately the same.
o Suppose we observe that the treatment farmers' yields decline by less than the comparison farmers' yields. Did the program work? Not necessarily. Suppose that the farmers who went to register with the program have more farming ability, and so could manage the drought better than their neighbors, and the fertilizer was irrelevant (individual ability is unobservable).
o Suppose we observe no difference between the two groups. Did the program not work? Not necessarily. Rain could have caused the fertilizer to run off onto the neighbors' fields (spillover/contamination).

Random assignment of treatment solves the selection bias problem (& ATE = ATT)

If treatment is randomly assigned across the population, then treatment status ($D_i$) is independent of the potential outcomes ($y_{0i}$, $y_{1i}$).

Implications:
1. ATE = ATT; in other words, $ATT = E(y_{1i} - y_{0i} \mid D_i = 1) = E(y_{1i} - y_{0i}) = ATE$
2. There is no selection bias, so we can easily estimate ATE = ATT using Eq. 1 above (i.e., compare the average outcomes of the treated to the untreated)

With randomization, the average initial (pre-treatment) characteristics of individuals should not be statistically different between the treated and untreated groups.
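The decomposition in Eq. 1 and the effect of random assignment can be checked numerically. The sketch below (Python with NumPy) is an illustration under assumed values, not a result from the notes: the "ability" variable, the coefficients, and the enrollment rule are hypothetical, loosely mirroring the fertilizer example in which higher-ability farmers are more likely to enroll. It compares a self-selected treatment with a coin-flip assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
ability = rng.normal(size=n)                               # unobserved farming ability
y0 = 2.0 + 0.8 * ability + rng.normal(scale=0.5, size=n)   # outcome without treatment
y1 = y0 + 0.3 + 0.2 * ability                              # outcome with treatment
ate = np.mean(y1 - y0)                                     # known only in a simulation


def decompose(d):
    """Return (naive difference in means, ATT, selection bias) for assignment d."""
    treated = d == 1
    y = y1 * d + (1 - d) * y0                              # observed outcome
    naive = y[treated].mean() - y[~treated].mean()         # left-hand side of Eq. 1
    att = np.mean(y1[treated] - y0[treated])               # E(y1 - y0 | D = 1)
    selection_bias = y0[treated].mean() - y0[~treated].mean()
    return naive, att, selection_bias


# Self-selection: higher-ability farmers are more likely to enroll.
d_self = (ability + rng.normal(size=n) > 0).astype(int)
# Random assignment: a coin flip, independent of the potential outcomes.
d_rand = rng.integers(0, 2, size=n)

naive_self, att_self, sb_self = decompose(d_self)
naive_rand, att_rand, sb_rand = decompose(d_rand)

print("ATE:", ate)
print("self-selected:", naive_self, "=", att_self, "+", sb_self)
print("randomized:   ", naive_rand, "=", att_rand, "+", sb_rand)
```

Under self-selection the naive comparison equals the ATT plus a positive selection bias term, and the ATT itself exceeds the ATE because the assumed effect rises with ability. Under random assignment the selection bias term is close to zero and the naive comparison, the ATT, and the ATE roughly coincide.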

Regression analysis of experimental data

If randomization was successful, then we can estimate the ATE using the regression:

$$y_i = \alpha + \beta D_i + \varepsilon_i$$

where the coefficient $\beta$ on the treatment indicator is the difference in mean outcomes between the treated and the untreated. We may get more precise estimates (smaller standard errors) if we include other (exogenous) control variables, $X_i$:

$$y_i = \alpha + \beta D_i + X_i \gamma + \varepsilon_i$$

If we have observational data (rather than experimental data/randomized treatment), we can, in some cases, use regression analysis to approximate the ideal experiment. We will spend much of the rest of the course looking at this issue.

Randomization: great if you can get it but not always an option

Randomization may not be an option in certain cases because it:
1. Is too costly
2. Is unethical
3. Takes too long to get results
4. Is otherwise infeasible

When randomization is not an option, our challenge is to find a natural experiment/quasi-experiment that comes as close as possible to the experimental ideal. We need to be able to change the variable of interest, holding other key variables constant.

Natural experiment (or quasi-experiment): A situation where the economic environment, sometimes summarized by an explanatory variable, exogenously changes, perhaps inadvertently, due to a policy or institutional change (Wooldridge, 2002: 799).
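Returning to the two regressions for experimental data at the start of this section, the sketch below (Python with NumPy) illustrates both points on simulated data: the coefficient on D_i in the regression without controls equals the difference in means, and adding an exogenous control that explains the outcome shrinks the standard error. The data-generating process, the true effect of 0.5, and the helper `ols` are hypothetical and introduced only for illustration; the standard errors are the classical homoskedastic ones, which is an assumption rather than a recommendation from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical experiment (illustrative numbers, not from the notes).
x = rng.normal(size=n)                  # exogenous pre-treatment control
d = rng.integers(0, 2, size=n)          # randomly assigned treatment
y = 1.0 + 0.5 * d + 2.0 * x + rng.normal(size=n)


def ols(y, X):
    """Return OLS coefficients and classical (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


ones = np.ones(n)
b_short, se_short = ols(y, np.column_stack([ones, d]))      # y = a + b*D + e
b_long, se_long = ols(y, np.column_stack([ones, d, x]))     # y = a + b*D + g*x + e

diff_means = y[d == 1].mean() - y[d == 0].mean()

print("difference in means:", diff_means)
print("beta without controls:", b_short[1], "se:", se_short[1])
print("beta with control:    ", b_long[1], "se:", se_long[1])
```

Because treatment is randomized, both regressions target the same effect; the control helps only by absorbing outcome variation that would otherwise sit in the error term.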

Research Methodology Spectrum (Roe & Just, 2009 p. 1267)

Lab experiment | Field experiment | Natural experiment | Field/market data

Lab experiment: the researcher purposefully imposes one or more exogenous changes (i.e., treatments) on a randomly chosen subset of subjects while holding all other elements of the context identical for a control group.

Field experiment: the researcher manipulates a naturally occurring context to induce relevant exogenous variation. Compared to laboratory experiments, in which the researcher has control over nearly all aspects of the economic and institutional context, field experiments allow for less researcher control because much of the context is independent of the researcher's effort.

Natural experiment: As with field data, a researcher cannot manipulate the stimulus or influence the data generation process. Rather, the researcher takes advantage of a change in context or setting that occurs for some subjects due to natural causes or social change beyond the researcher's and subjects' influence. A natural experiment generates a control group (i.e., a group of similar individuals who avoid the natural treatment) that can be compared to the treatment group.

Field/market data: naturally occurring or uncontrolled data in which the researcher observes market or field behavior that transpires regardless of the researcher's existence and largely independent of the researcher's activity.

Validity: External, Internal, & Ecological (Roe & Just, 2009 p. 1266-7)

Validity: whether a particular conclusion or inference represents a good approximation to the true conclusion or inference (i.e., whether our methods of research and subsequent observations provide an adequate reflection of the truth).

Internal validity: the ability of a researcher to argue that observed correlations are causal.

External validity: the ability to generalize the relationships found in a study to other persons, times, and settings.

Ecological validity: a study has ecological validity to the extent that the context in which subjects cast decisions is similar to the context of interest.

Source: Roe & Just (2009)

Threats to internal validity (Roe & Just, 2009 p. 1268)
1. Lack of temporal clarity
2. Systematic differences in treatment groups
3. Concurrent third elements that confound the outcome
4. Maturation over time/structural change

Threats to external validity (Roe & Just, 2009 p. 1268-9)
1. Potential interactions with elements and context not found in the study's setting
2. Limited variation within stimulus or response
3. Systematic differences between the groups studied and the groups to which the result will be applied

Tradeoffs
o Threats to internal validity are greatest for field/market data and least for lab experiment data
o Threats to external validity are least for field/market data and greatest for lab experiment data
o Field experiments & natural experiments are a middle ground but not a panacea
o Ideally, use multiple research approaches

References

Angrist, J.D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton: Princeton University Press. (Ch. 1: Questions about Questions; Ch. 2: The Experimental Ideal)

Roe, B.E., and D.R. Just. 2009. Internal and External Validity in Economics Research: Tradeoffs between Experiments, Field Experiments, Natural Experiments, and Field Data. American Journal of Agricultural Economics 91: 1266-1271.

Wooldridge, J.M. 2002. Introductory Econometrics: A Modern Approach, 2nd ed. Cincinnati, OH: South-Western College Publishers.

Wooldridge, J.M. 2003. Econometric Analysis of Cross Section and Panel Data, 1st ed. Cambridge, MA: MIT Press.