Experimental Methods. Policy Track

East Asia Regional Impact Evaluation Workshop, Seoul, South Korea. Nazmul Chaudhury, World Bank

Reference: Spanish and French versions also available; Portuguese soon. http://runningres.com Free download at www.worldbank.org/ieinpractice

Impact Evaluation Road Map
Logical Framework: how the program works in theory
Measuring Impact: methods to identify impact
Operational Plan: resources

Our Objective
Estimate the causal effect (impact) of program (P) on outcome (Y).
Impact = Y_T - Y_C
P = program or treatment
Y = indicator, measure of success
Y_T = outcome with the program
Y_C = outcome without the program (control)

Example: CCT Progresa
National anti-poverty program in Mexico.
Cash transfers conditional on school and health care attendance.
Targeting: eligibility based on a poverty index.
Timing: started in 1997; phased roll-out, 5 million beneficiaries by 2004.

Research Question: What is the impact of a conditional cash transfer (P) on household consumption (Y)?

Challenge: No counterfactual
Impact = Y_T - Y_C
We do not observe what would have happened to the beneficiary households if they had not received any cash transfer (the counterfactual).

CLONE: PERFECT COUNTERFACTUAL

Perfect Experiment: First, identify the target beneficiaries,

Perfect Experiment - Clones: and then clone the target beneficiaries.

Perfect Experiment - Clones: Give the cash transfer to one set of the clones,

Perfect Experiment - Clones: and compare their consumption some time later. Because the people who received the cash transfer are exactly the same as those who did not receive it, we can truly attribute the difference to the program.
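The clone thought experiment can be written down as a small simulation. This is only a sketch: the effect size `TRUE_EFFECT` and the consumption distribution are hypothetical values chosen for illustration, not Progresa figures. With clones we would see both outcomes for every household, so the impact is simply the average of Y_T - Y_C.

```python
import random

random.seed(0)

TRUE_EFFECT = 30.0  # hypothetical effect of the transfer (assumption)

# With clones, every household is observed twice:
# once without the program (y_c) and once with it (y_t).
households = []
for _ in range(1000):
    y_c = random.gauss(240, 40)  # consumption without the transfer (assumed distribution)
    y_t = y_c + TRUE_EFFECT      # consumption of the clone who receives the transfer
    households.append((y_t, y_c))

# Impact = Y_T - Y_C, averaged over all households.
impact = sum(y_t - y_c for y_t, y_c in households) / len(households)
print(round(impact, 2))
```

In real data only one of `y_t` and `y_c` is ever observed for a given household, which is why the clone experiment cannot be run.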

BAD COUNTERFACTUALS: Before & After; Comparing those enrolled with those not enrolled

Case 1: Before & After
(1) Observe only beneficiaries.
(2) Two observations in time: consumption in 1997 and consumption in 1998.
[Figure: consumption (Y) over time; B = 233 at t=1997, A = 268 at t=1998.]
Estimate of impact = A - B = $35

Case 1: Before & After
                                                            Consumption (Y)
Consumption after program start (treatment)                 268.7
Consumption before program start (control, counterfactual)  233.4
Estimate of impact                                          35.3***
Estimated impact on consumption (Y)
  Linear regression                                         35.27**
  Multivariate linear regression                            34.28**
Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).

Case 1: Problem: we don't know what would have happened without the program.
Economic boom: real impact = A - C, so A - B overestimates the impact.
Economic recession: real impact = A - D, so A - B underestimates the impact.
[Figure: consumption (Y) over time, T=0 to T=1; observed B = 233 and A = 268, so α = $35; the counterfactual points C? and D? are not observed.]
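The boom scenario can be illustrated with simulated data. This is a minimal sketch, not Progresa data: the program effect (`TRUE_EFFECT`), the economy-wide change that would have happened anyway (`TIME_TREND`), and the noise levels are all hypothetical. The before-after difference picks up both the effect and the trend.

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 30.0  # hypothetical program effect (assumption)
TIME_TREND = 6.0    # hypothetical growth that happens with or without the program

# Consumption of beneficiaries before the program starts.
before = [random.gauss(233, 40) for _ in range(5000)]

# Same households one year later: trend + program effect + some noise.
after = [y + TIME_TREND + TRUE_EFFECT + random.gauss(0, 5) for y in before]

before_after = statistics.mean(after) - statistics.mean(before)
bias = before_after - TRUE_EFFECT

print(round(before_after, 1), round(bias, 1))
```

Setting `TIME_TREND` to a negative number reproduces the recession case, where the same comparison understates the effect.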

Case 2: Some people enroll, others do not.
Ineligibles (non-poor)
Eligibles (poor = target population): enrolled or not enrolled

Case 2: Some people enroll, others do not.
Treatment group: enrolled
Control group: not enrolled

Case 2: Problem of Selection Bias. What if those who chose not to enroll are different?

Case 2: Problem of Selection Bias. And what if we cannot observe (control for) these differences?

Case 2: Problem of Selection Bias. And what if these differences influence outcomes? Are the factors that determine enrollment correlated with consumption?

Case 2: Some people enroll, others do not.
                                                          Consumption (Y)
Consumption with program (enrolled, treatment)            268
Consumption without program (not enrolled, control)       290
Estimate of impact                                        -22**
Estimated impact on consumption (Y)
  Linear regression                                       -22**
  Multivariate linear regression                          -4.15
Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
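One way a comparison of enrolled and non-enrolled households can come out negative even when the program helps is sketched below. Everything here is an assumption made for illustration: an unobserved `wealth` trait raises consumption and lowers the chance of enrolling, and the program adds a hypothetical `TRUE_EFFECT` to the consumption of those who enroll.

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 30.0  # hypothetical positive program effect (assumption)

enrolled, not_enrolled = [], []
for _ in range(20000):
    wealth = random.gauss(0, 1)  # unobserved by the evaluator
    base = 250 + 40 * wealth + random.gauss(0, 20)  # consumption without the program
    # Poorer households are more likely to choose to enroll.
    enrolls = wealth + random.gauss(0, 1) < 0
    if enrolls:
        enrolled.append(base + TRUE_EFFECT)
    else:
        not_enrolled.append(base)

# Naive estimate: enrolled minus not enrolled.
naive = statistics.mean(enrolled) - statistics.mean(not_enrolled)
print(round(naive, 1))
```

The naive estimate mixes the program effect with the pre-existing gap between the two groups, so its sign and size say little about the program itself.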

Progresa Policy Recommendation?
Impact on consumption (Y)
Case 1: Before & After
  Linear regression                 35.27**
  Multivariate linear regression    34.28**
Case 2: Enrolled & Not Enrolled
  Linear regression                 -22**
  Multivariate linear regression    -4.15
Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).

Keep in Mind
Before-after comparison. Problem: other factors that matter also change over time.
Comparing those who enroll with those who don't. Problem: selection bias. The enrolled may be different, and we don't observe these differences.
Both comparison groups may lead to biased estimates of the program impact.

RANDOMIZATION: GOOD COUNTERFACTUAL

Randomization: creating similar groups. With a large sample, the two groups have very similar characteristics ON AVERAGE.
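A short simulation shows what "similar on average" means in practice. It is a sketch under stated assumptions: the unobserved `wealth` trait, the consumption model, and `TRUE_EFFECT` are hypothetical. Because a coin flip decides who is treated, the two groups end up with nearly the same average wealth, and the difference in average consumption lands close to the true effect.

```python
import random
import statistics

random.seed(3)

TRUE_EFFECT = 30.0  # hypothetical program effect (assumption)

treat_wealth, comp_wealth = [], []
treat_y, comp_y = [], []
for _ in range(20000):
    wealth = random.gauss(0, 1)  # unobserved characteristic
    base = 250 + 40 * wealth + random.gauss(0, 20)  # consumption without the program
    if random.random() < 0.5:  # random assignment: a fair coin flip
        treat_wealth.append(wealth)
        treat_y.append(base + TRUE_EFFECT)
    else:
        comp_wealth.append(wealth)
        comp_y.append(base)

# Balance on the unobserved trait, and the resulting impact estimate.
wealth_gap = statistics.mean(treat_wealth) - statistics.mean(comp_wealth)
estimate = statistics.mean(treat_y) - statistics.mean(comp_y)
print(round(wealth_gap, 3), round(estimate, 1))
```

With a much smaller sample the two groups can differ noticeably by chance, which is why the sample size matters.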

External and Internal Validity
Random sample from the target population: external validity.
Random assignment: internal validity.

Random Sample is NOT the Same as Random Assignment! Drawing a random sample from two groups (participants and non-participants) does not make them comparable.

Case 3: Randomized Assignment in Progresa. How do we know we have good clones of the treatment group? In the absence of Progresa, the treatment and comparison groups should be identical. Let's compare their characteristics at baseline (t=0).

Case 3: Balanced characteristics at baseline (randomized assignment)
Characteristic                        Treatment   Comparison   T-stat
Consumption ($ monthly per capita)    233.4       233.47       -0.39
Head's age (years)                    41.6        42.3         -1.2
Spouse's age (years)                  36.8        36.8         -0.38
Head's education (years)              2.9         2.8          2.16**
Spouse's education (years)            2.7         2.6          0.006
Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).

Case 3: Balanced characteristics at baseline (randomized assignment)
Characteristic                  Treatment   Comparison   T-stat
Head is female=1                0.07        0.07         -0.66
Indigenous=1                    0.42        0.42         -0.21
Number of household members     5.7         5.7          1.21
Bathroom=1                      0.57        0.56         1.04
Hectares of land                1.67        1.71         -1.35
Distance to hospital (km)       109         106          1.02
Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
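A balance check of this kind compares group means for each baseline characteristic and reports a t-statistic for the difference. The sketch below uses a Welch two-sample t-statistic, which is an assumption on my part: the slides do not say which test was used. The ages are simulated, not Progresa data, and `t_stat` is a hypothetical helper name.

```python
import math
import random
import statistics


def t_stat(a, b):
    """Welch two-sample t-statistic for the difference in means (a minus b)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


random.seed(4)

# Simulated baseline ages of household heads (hypothetical values).
ages = [random.gauss(42, 12) for _ in range(4000)]

# Random assignment: shuffle, then split into two equal groups.
random.shuffle(ages)
treatment, comparison = ages[:2000], ages[2000:]

t = t_stat(treatment, comparison)
print(round(statistics.mean(treatment), 1),
      round(statistics.mean(comparison), 1),
      round(t, 2))
```

Under random assignment most characteristics should show small t-statistics, although with many characteristics tested an occasional large value can appear by chance.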

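Randomization creates the correct counterfactual.
[Figure: consumption (Y) over time, t=1997 to t=1998; 233 at baseline; A = 268 and B = 239 at follow-up; the gap between A and B is the true impact.]
Measure of impact = A - B = $29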

Case 3: Randomized Assignment
                                   Treatment group              Counterfactual                  Impact
                                   (randomized to treatment)    (randomized to comparison)
Baseline (t=0) consumption (Y)     233.47                       233.40                          0.07
Follow-up (t=1) consumption (Y)    268.75                       239.5                           29.25**
Estimated impact on consumption (Y)
  Linear regression                 29.25**
  Multivariate linear regression    29.75**
Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
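The arithmetic behind the Case 3 figures is a difference in group means. The sketch below recomputes it from the reported means, then shows on a tiny set of hypothetical household values (invented so that the group means match the follow-up figures) that a simple regression of consumption on a treatment dummy returns the same number.

```python
# Group means as reported for Case 3.
baseline = {"treatment": 233.47, "comparison": 233.40}
follow_up = {"treatment": 268.75, "comparison": 239.5}

baseline_gap = baseline["treatment"] - baseline["comparison"]
impact = follow_up["treatment"] - follow_up["comparison"]
print(round(baseline_gap, 2), round(impact, 2))  # 0.07 29.25

# Hypothetical microdata (assumption): d = 1 if treated, y = consumption.
d = [1, 1, 0, 0]
y = [260.0, 277.5, 230.0, 249.0]

d_bar = sum(d) / len(d)
y_bar = sum(y) / len(y)

# OLS slope of y on d: cov(d, y) / var(d).
slope = (sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
         / sum((di - d_bar) ** 2 for di in d))
print(round(slope, 2))  # 29.25
```

The multivariate estimate differs slightly because it also adjusts for baseline characteristics, which this sketch does not attempt to reproduce.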

Progresa Policy Recommendation?
Impact of Progresa on consumption (Y)
Case 1: Before & After
  Multivariate linear regression    34.28**
Case 2: Enrolled & Not Enrolled
  Linear regression                 -22**
  Multivariate linear regression    -4.15
Case 3: Randomized Assignment
  Multivariate linear regression    29.75**
Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).

RANDOMIZATION IN PRACTICE: PROGRESA EXAMPLE

Program Operates at Community Level

Community-level Randomization: Randomly assign communities to receive the program. 506 communities: 320 treatment, 186 control.
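Assigning whole communities rather than individual households could be done with a simple random draw. This is a sketch only: the community IDs, the seed, and the `assignment` helper are hypothetical, and the slides do not describe the actual procedure used. Only the counts (506 communities, 320 treatment, 186 control) come from the slide.

```python
import random

random.seed(5)  # fixed seed so the draw can be reproduced and audited

N_COMMUNITIES = 506
N_TREATMENT = 320

# Hypothetical community identifiers 1..506.
community_ids = list(range(1, N_COMMUNITIES + 1))

# Draw the treatment communities without replacement; the rest are controls.
treatment = set(random.sample(community_ids, N_TREATMENT))
control = [c for c in community_ids if c not in treatment]


def assignment(community_id):
    """Every household inherits the status of the community it lives in."""
    return "treatment" if community_id in treatment else "control"


print(len(treatment), len(control))  # 320 186
```

Because the community is the unit of assignment, all households in a community share the same status.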

Program was randomly phased in.
[Figure: timeline showing before, during, and after the evaluation period.]

Keep in Mind: Randomized Assignment. Randomized assignment, with large enough samples, produces two statistically equivalent groups. We have identified the perfect clone!

Pop Quiz!

What is the problem when comparing those who enroll with those who choose not to?
A. Those not enrolled could be very different from those enrolled, in ways that might be correlated with the outcome of interest.
B. Those not enrolled will also want to receive the program.
C. Those not enrolled could be very different from those enrolled, but in ways that are NOT correlated with the outcome of interest.

What is the biggest problem of doing a before/after comparison?
A. Sample size might not be large enough.
B. Any difference in outcome we observe before-after could be due to other factors that changed.
C. You might not find the people you surveyed in the beginning.

What is a counterfactual?
A. An argument.
B. It is what would have happened if the program did not occur.
C. A type of vaccination.
D. Contrary to the facts.

Why can randomization create the correct counterfactual?
A. It is fair because everyone has an equal chance of receiving the program.
B. It clones each person.
C. Those who receive the program are, on average, the same as those who did not receive the program.