Impact Evaluation Toolbox


Impact Evaluation Toolbox. Gautam Rao, University of California, Berkeley. Presentation credit: Temina Madon

Impact Evaluation 1) The final outcomes we care about - Identify and measure them

Measuring Impacts: INPUTS → OUTPUTS → OUTCOMES → IMPACTS (the difficulty of showing causality increases as you move from inputs to impacts)

Measuring Impacts, HIV awareness campaign example: INPUTS (HIV awareness campaign) → OUTPUTS (no. of street plays, users targeted at govt. clinics) → OUTCOMES / IMPACTS (knowledge about HIV, sexual behavior, incidence). The difficulty of showing causality increases as you move up the chain.

Impact Evaluation 1) The final outcomes we care about - identify and measure them 2) True causal effect of the intervention - Counterfactual: what would have happened in the absence of the intervention? - Compare measured outcomes with the counterfactual → causal effect

Toolbox for Impact Evaluation
Non- or Quasi-Experimental:
1) Before vs. After
2) With / Without Program
3) Difference-in-Differences
4) Discontinuity Methods
5) Multivariate Regression
6) Instrumental Variables
Experimental Method (Gold Standard):
7) Randomized Evaluation

Naïve Comparisons 1) Before-After the program 2) With-Without the program

An Example: Millennium Village Project (MVP). Intensive package intervention to spark development in rural Africa, 2005 to 2010 (and ongoing). Non-randomly selected sites; poorer villages targeted.

Before vs After

With vs. Without Program: comparison villages vs. MVP villages

With a little more data

Before-After Comparison (malnutrition). Compare before and after the intervention: A = assumed outcome with no intervention; B = observed outcome; B - A = estimated impact. [Figure: malnutrition before (2002) and after (2004).] We observe that villagers provided with training experience an increase in malnutrition from 2002 to 2004.

With-Without Comparison. Now compare villagers in the program region A to those in another region B. We find that our program villagers have a larger increase in malnutrition than those in region B. Did the program have a negative impact? Not necessarily (remember program placement!). Region B is closer to the food aid distribution point, so villagers can access food easily (observable). Region A is less collectivized, so villagers are less likely to share food with their malnourished neighbors (unobservable).

Problems with Simple Comparisons Before-After Problem: Many things change over time, not just the project. Omitted Variables. With-Without Problem: Comparing oranges with apples. Why did the enrollees decide to enroll? Selection bias.
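A minimal simulation can make both problems concrete. The sketch below is illustrative only: all numbers, variable names, and the size of the confounders are invented. It builds in a region-wide time trend (e.g. a drought) and selection of poorer villages into the program, then shows that both naive estimators miss the true effect while the comparison against the counterfactual recovers it.

```python
# Illustrative simulation of the two naive comparisons (hypothetical numbers).
import numpy as np

rng = np.random.default_rng(0)
n = 5000                        # villages per group

# Baseline malnutrition rates; program villages are poorer (selection bias).
base_treated = 0.30 + rng.normal(0, 0.02, n)
base_control = 0.20 + rng.normal(0, 0.02, n)

time_trend = 0.05               # drought raises malnutrition everywhere
true_effect = -0.04             # the program lowers malnutrition by 4 points

after_treated = base_treated + time_trend + true_effect
after_control = base_control + time_trend

before_after = after_treated.mean() - base_treated.mean()   # biased by the time trend
with_without = after_treated.mean() - after_control.mean()  # biased by selection
counterfactual = base_treated.mean() + time_trend           # what would have happened anyway
true_impact = after_treated.mean() - counterfactual

print(f"Before-after estimate : {before_after:+.3f}")   # ~ +0.01 (program looks harmful)
print(f"With-without estimate : {with_without:+.3f}")   # ~ +0.06 (program looks harmful)
print(f"True impact           : {true_impact:+.3f}")    # ~ -0.04
```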

Another example of Selection Bias Impact of Health Insurance Can we just compare those with and without health insurance? Who purchases insurance? What happens if we compare health of those who sign up, with those who don t?

Multivariate Regression. Change in outcome in the treatment group, controlling for observable characteristics. Requires theorizing about which observable characteristics may affect the outcome of interest besides the programme. Issues: how many characteristics can be accounted for? (omitted variable bias); requires a large sample if many factors are to be controlled for.

Multivariate Regression (malnutrition example). Compare before and after the intervention: A = assumed outcome with no intervention; B = observed outcome; B - A = estimated impact. Control for other factors: C = actual outcome with no intervention (control); B - C = true impact. [Figure: malnutrition before (2002) and after (2004).]
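In practice this is usually an OLS regression of the outcome on a treatment indicator plus the observable characteristics. The sketch below is a hedged illustration, not the slide's actual data: the variable names (malnutrition, treated, rainfall, dist_to_aid) and coefficients are invented, and statsmodels is assumed to be available.

```python
# Sketch: program impact estimated by regression with observable controls (fake data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "rainfall": rng.normal(800, 100, n),      # observable factors we can control for
    "dist_to_aid": rng.uniform(1, 20, n),
})
df["malnutrition"] = (
    0.4 - 0.03 * df["treated"]                # 'true' effect built into the fake data
    - 0.0001 * df["rainfall"] + 0.005 * df["dist_to_aid"]
    + rng.normal(0, 0.05, n)
)

# Controlling for observables; anything unobserved that drives both participation
# and malnutrition would still bias the coefficient on 'treated'.
model = smf.ols("malnutrition ~ treated + rainfall + dist_to_aid", data=df).fit()
print(model.params["treated"])                # close to -0.03 here, since nothing is omitted
```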

To Identify Causal Links we need to know: the change in outcomes for the treatment group; what would have happened in the absence of the treatment (the counterfactual). At baseline, the comparison group must be identical (in observable and unobservable dimensions) to those receiving the program.

Creating the Counterfactual Women aged 14-25 years

What we can observe. Women aged 14-25 years. Screened by: income, education level, ethnicity, marital status, employment.

What we can't observe. Women aged 14-25 years. What we can't observe: risk tolerance, entrepreneurialism, generosity, respect for authority.

Selection based on Observables With Program Without Program

What can happen (unobservables) With Program Without Program

Randomization With Program Without Program

Randomization With Program Without Program

Methods & Study Designs. Methods Toolbox: Randomization, Regression Discontinuity, Difference-in-Differences, Matching, Instrumental Variables. Study Designs: Clustering, Phased Roll-Out, Selective Promotion, Variation in Treatments.

Methods Toolbox: Randomization. Use a lottery to give all people an equal chance of being in the control or treatment group. With a large enough sample, this guarantees that all factors/characteristics will be equal between groups, on average. The only difference between the two groups is the intervention itself. Gold standard.
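A lottery is easy to implement in code. The sketch below is illustrative (the sample frame, covariate, and sample size are invented): it assigns treatment with a random draw and then checks balance on a baseline covariate, which is what "equal between groups, on average" means in practice.

```python
# Sketch: random assignment plus a simple balance check (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1000
baseline_income = rng.lognormal(mean=8, sigma=0.5, size=n)   # pre-program covariate

# The lottery: every unit gets the same 50% chance of treatment.
treatment = rng.permutation(np.repeat([0, 1], n // 2))

# Balance check: with a large enough sample, group means should be close
# and the t-test should usually not reject equality.
t_stat, p_val = stats.ttest_ind(baseline_income[treatment == 1],
                                baseline_income[treatment == 0])
print(f"Mean income (T): {baseline_income[treatment == 1].mean():.1f}")
print(f"Mean income (C): {baseline_income[treatment == 0].mean():.1f}")
print(f"Balance t-test p-value: {p_val:.2f}")
```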

Ensuring Validity. [Diagram] National population → random sample → sample of national population (EXTERNAL VALIDITY); random assignment within the sample (INTERNAL VALIDITY).

Ensuring Validity. Internal validity: have we estimated a causal effect? Have we identified a valid counterfactual? External validity: is our estimate of the impact generalizable? E.g. will IE results from Kerala generalize to Punjab?

Methods Toolbox: Regression discontinuity. When you can't randomize, use a transparent and observable criterion for who is offered the program. Examples: age, income eligibility, test scores (scholarship), political borders. Assumption: there is a discontinuity in program participation at the threshold, but the threshold does not otherwise affect the outcomes of interest.

Methods Toolbox: Regression discontinuity. [Figures: outcome plotted against the eligibility criterion, with a discontinuity in program participation at the poor / non-poor threshold.]

Methods Toolbox: Regression discontinuity. Example: South Africa pensions program. Share receiving the pension, by age: 55-59 (women 16%, men 5%); 60-64 (women 77%, men 22%); 65+ (60%). What happens to children living in these households? Girls living with pension-eligible women vs. almost-eligible: 0.6 s.d. heavier for age. Boys living with pension-eligible women vs. almost-eligible: 0.3 s.d. heavier for age. With pension-eligible men vs. almost-eligible: no difference.

Methods Toolbox: Regression discontinuity. Limitation: poor generalizability, since it only measures the impact for those near the threshold. Must have a well-enforced eligibility rule.
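One common way to estimate an RD effect is a local linear regression on either side of the cutoff, restricted to a bandwidth around the threshold. The sketch below is illustrative only: the income cutoff, bandwidth, outcome, and the size of the jump are invented, and it is not taken from the pension study above.

```python
# Sketch: sharp regression discontinuity via local linear regression (fake data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 5000
income = rng.uniform(0, 2000, n)                  # running variable
cutoff = 1000                                     # program offered below this income
eligible = (income < cutoff).astype(int)
outcome = 50 + 0.01 * income + 5.0 * eligible + rng.normal(0, 3, n)  # true jump = 5

df = pd.DataFrame({"income": income, "eligible": eligible, "outcome": outcome})
df["dist"] = df["income"] - cutoff                # center the running variable at the cutoff

bandwidth = 200                                   # only compare units near the threshold
local = df[df["dist"].abs() < bandwidth]

# Separate slopes on each side; the coefficient on 'eligible' is the jump at the cutoff.
rd = smf.ols("outcome ~ eligible + dist + eligible:dist", data=local).fit()
print(rd.params["eligible"])                      # should be close to 5
```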

Methods Toolbox: Difference in differences. A two-way (pre/post) comparison of observed changes in outcomes (Before-After) for a sample of participants and non-participants (With-Without).

Methods Toolbox: Difference in differences. Big assumption: in the absence of the program, participants and non-participants would have experienced the same changes (parallel trends). Robustness checks: compare the two groups across several periods prior to the intervention; compare variables unrelated to the program pre/post. Can be combined with randomization, regression discontinuity, or matching methods.
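The standard way to compute a difference-in-differences estimate is a regression with a treatment-group dummy, a post-period dummy, and their interaction; the interaction coefficient is the DiD impact. A minimal sketch on invented two-period panel data (variable names and effect sizes are assumptions, not from the slides):

```python
# Sketch: difference-in-differences as an interaction regression (fake data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "unit": np.tile(np.arange(n), 2),
    "post": np.repeat([0, 1], n),                 # before / after period
    "treat": np.tile(rng.integers(0, 2, n), 2),   # participant vs. non-participant
})
# Parallel trends built in: both groups drift by +2; the treated also gain +1.5 after.
df["y"] = (5 + 1.0 * df["treat"] + 2.0 * df["post"]
           + 1.5 * df["treat"] * df["post"] + rng.normal(0, 1, 2 * n))

did = smf.ols("y ~ treat + post + treat:post", data=df).fit()
print(did.params["treat:post"])                   # DiD estimate, close to 1.5
```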

Methods Toolbox: Time Series Analysis. Also called interrupted time series or trend-break analysis. Collect data on a continuous variable, across numerous consecutive periods (high frequency), to detect changes over time correlated with the intervention. Challenges: the expected pattern should be well defined upfront; often need to filter out noise. Variant: panel data analysis.

Methods Toolbox Time Series Analysis
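Interrupted time series is often estimated with a segmented regression: the outcome on time, a post-intervention dummy, and time since the intervention, so that a level shift and/or slope change at the intervention date is the estimated effect. The sketch below uses invented monthly data and effect sizes.

```python
# Sketch: interrupted time series via segmented regression (fake monthly data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
months = np.arange(48)
intervention_month = 24
post = (months >= intervention_month).astype(int)
time_since = np.where(post == 1, months - intervention_month, 0)

# Underlying trend, plus a level drop of 8 and a slope change of -0.5 after month 24.
y = 100 + 0.8 * months - 8 * post - 0.5 * time_since + rng.normal(0, 2, months.size)

df = pd.DataFrame({"y": y, "t": months, "post": post, "t_since": time_since})
its = smf.ols("y ~ t + post + t_since", data=df).fit()
print(its.params[["post", "t_since"]])   # level shift and slope change at the break
```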

Methods Toolbox: Matching. Pair each program participant with one or more non-participants, based on observable characteristics. Big assumption required: in the absence of the program, participants and non-participants would have been the same (no unobservable traits influence participation). Combine with difference in differences where possible.

Methods Toolbox: Matching. Propensity score assigned: score(x_i) = Pr(Z_i = 1 | X_i = x_i), where Z = treatment assignment and x = observed covariates. Requires treatment assignment Z to be independent of potential outcomes conditional on X. Works best when x is related to outcomes and to selection (Z).

Methods Toolbox Matching

Methods Toolbox: Matching. Often need large data sets for baseline and follow-up: household data; environmental and ecological data (context). Problems with ex-post matching: more bias with covariates of convenience (e.g. age, marital status, race, sex). Match prospectively if possible!
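A bare-bones version of propensity score matching: estimate Pr(Z = 1 | X) with a logit, then pair each participant with the nearest non-participant on the estimated score. The sketch below is illustrative only (invented covariates and effect size) and skips caliper rules, common-support trimming, and proper standard errors.

```python
# Sketch: nearest-neighbor propensity score matching (illustrative data only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 3000
x = rng.normal(0, 1, (n, 2))                          # observed covariates
p_true = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
z = rng.binomial(1, p_true)                           # selection on observables only
y = 1.0 * z + x[:, 0] + 0.5 * x[:, 1] + rng.normal(0, 1, n)   # true effect = 1.0

# Step 1: propensity score = Pr(Z = 1 | X), estimated by logit.
pscore = sm.Logit(z, sm.add_constant(x)).fit(disp=0).predict()

# Step 2: match each treated unit to the control with the closest score.
treated_idx = np.flatnonzero(z == 1)
control_idx = np.flatnonzero(z == 0)
matches = control_idx[
    np.abs(pscore[treated_idx][:, None] - pscore[control_idx][None, :]).argmin(axis=1)
]

att = (y[treated_idx] - y[matches]).mean()
print(f"Matched estimate of the treatment effect: {att:.2f}")   # roughly 1.0
```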

Methods Toolbox: Instrumental Variables. Example: we want to learn the effect of access to health clinics on maternal mortality. Find a variable (an instrument) that affects access to health clinics but does not affect maternal mortality directly, e.g. changes in the distance from home to the nearest clinic.
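Instrumental variables are usually estimated with two-stage least squares. To avoid relying on any particular IV package, the sketch below runs the two stages by hand with OLS on invented data that mimics the clinic example (all numbers and names are assumptions); note that standard errors printed from the second stage this way are not the correct 2SLS standard errors.

```python
# Sketch: two-stage least squares by hand (illustrative data, not a real study).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(21)
n = 5000
u = rng.normal(0, 1, n)                        # unobserved confounder (e.g. wealth)
distance = rng.uniform(1, 30, n)               # instrument: distance to nearest clinic
access = (3 - 0.1 * distance + 0.5 * u + rng.normal(0, 1, n) > 0).astype(int)
mortality = 10 - 2.0 * access - 1.0 * u + rng.normal(0, 1, n)   # true effect of access = -2

df = pd.DataFrame({"mortality": mortality, "access": access, "distance": distance})

# Naive OLS is biased because u drives both access and mortality.
print(smf.ols("mortality ~ access", data=df).fit().params["access"])        # pulled away from -2

# Stage 1: predict access from the instrument; Stage 2: regress the outcome on the prediction.
df["access_hat"] = smf.ols("access ~ distance", data=df).fit().fittedvalues
print(smf.ols("mortality ~ access_hat", data=df).fit().params["access_hat"])  # moves back toward -2
```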

Overview of Study Designs. Study designs: Clusters; Phased Roll-Out (not everyone has access to the intervention at the same time, i.e. supply variation); Selective Promotion (the program is available to everyone: universal access or already rolled out); Variation in Treatments.

Study Designs: Cluster vs. Individual. Cluster when spillovers or practical constraints prevent individual randomization, but it is easier to get a big enough sample if we randomize individuals. [Figure: individual randomization vs. group randomization.]

Issues with Clusters. Assignment at a higher level is sometimes necessary: political constraints on differential treatment within a community; practical constraints (confusing for one person to implement different versions); spillover effects may require higher-level randomization. Many groups are required because of within-community ("intraclass") correlation, and group assignment may also require many observations (measurements) per group.
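The sample size penalty from clustering is commonly summarized by the design effect, DEFF = 1 + (m - 1) × ICC, where m is the number of observations per cluster and ICC is the intraclass correlation. A quick illustrative calculation with made-up numbers (the required individual-level sample, cluster size, and ICC are assumptions):

```python
# Sketch: how intraclass correlation inflates the required sample size (made-up numbers).
def design_effect(obs_per_cluster: int, icc: float) -> float:
    """DEFF = 1 + (m - 1) * ICC."""
    return 1 + (obs_per_cluster - 1) * icc

n_individual = 800      # sample size needed under individual randomization (hypothetical)
m, icc = 40, 0.05       # 40 children measured per school, modest within-school correlation

deff = design_effect(m, icc)
n_clustered = n_individual * deff
print(f"Design effect: {deff:.2f}")                                          # 2.95
print(f"Equivalent clustered sample: {n_clustered:.0f} ({n_clustered / m:.0f} schools)")
```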

Example: Deworming. School-based deworming program in Kenya: treatment of individuals affects the untreated (spillovers) by interrupting fecal-oral transmission. Cluster at the school rather than the individual level. Outcome measure: school attendance. Want to find the cost-effectiveness of the intervention even if not all children are treated. Findings: 25% decline in absenteeism.

Possible Units for Assignment: individual, household, clinic, hospital, village, women's association, youth group, school.

Unit of Intervention vs. Analysis 20 Intervention Villages 20 Comparison Villages

Unit of Analysis. [Diagram] Target population → random sample → evaluation sample → random assignment at the unit of intervention/randomization → treatment and control groups → unit of analysis (random sample again?).

Study Designs Phased Roll-out Use when a simple lottery is impossible (because no one is excluded) but you have control over timing.

Phased Roll-Out with Clusters. The program is introduced to one additional cluster in each period, and every cluster is eventually treated:

Period:     A  B        C        D
Cluster 1:  -  Program  Program  Program
Cluster 2:  -  -        Program  Program
Cluster 3:  -  -        -        Program

Study Designs: Randomized Encouragement. Use when the program is already available. Some people are likely to join the program if pushed; others are unlikely to join regardless. Randomize who gets an extra push (encouragement) to participate in the program.

Evaluation Toolbox Variation in Treatment

Advantages of experiments: clear and precise causal impact. Relative to other methods: much easier to analyze; can be cheaper (smaller sample sizes); easier to explain; more convincing to policymakers; methodologically uncontroversial.

When to think impact evaluation? EARLY! Plan evaluation into your program design and roll-out phases. When you introduce a CHANGE into the program: a new targeting strategy; a variation on a theme; creating an eligibility criterion for the program. Scale-up is a good time for impact evaluation, since the sample size is larger.

When is prospective impact evaluation impossible? Treatment was selectively assigned and announced, with no possibility for expansion. The program is over (retrospective). There is already universal take-up. The program is national and non-excludable, e.g. freedom of the press or exchange rate policy (though sometimes some components can be randomized). The sample size is too small to make it worth it.