ECONOMIC EVALUATION IN DEVELOPMENT

ECONOMIC EVALUATION IN DEVELOPMENT Autumn 2015 Michael King 1

A plan to end poverty: "I have identified the specific investments that are needed [to end poverty]; found ways to plan and implement them; [and] shown that they can be affordable." Jeffrey Sachs, The End of Poverty. 2

How much money? "After $2.3 trillion over 5 decades, why are the desperate needs of the world's poor still so tragically unmet? Isn't it finally time for an end to the impunity of foreign aid?" Bill Easterly, The White Man's Burden. 3

The different types of evaluation. Types of evaluation, a framework: Program Evaluation; Monitoring & Evaluation; Impact Evaluation; Randomized Evaluation. 4

Different levels of program evaluation 1. Needs Assessment 2. Program Theory Assessment 3. Process evaluation 4. Impact evaluation [Our focus] 5. Cost-benefit/Cost-effectiveness analysis 6

Why focus on impact evaluation? There is surprisingly little hard evidence on what works. With better evidence we can do more with a given budget, and if people knew money was going to programs that worked, it could help increase funds. Instead of asking "do aid/development programs work?", we should be asking: which work best, why, and when? How can we scale up what works? 7

What makes a good evaluation? It asks the right questions (for accountability and for lesson learning) and answers those questions in an unbiased and definitive way. To do that you need a model: a logical framework/theory of change. Who is the target? What are their needs? What is the program seeking to change? What is the precise program, or part of the program, being evaluated? 8

EXAMPLE 1: WATER AND SANITATION PROJECT (log frame). Each row gives: Objectives (what you want to achieve); Indicators (how to measure change); Means of verification (where and how to get information); Assumptions (what else to be aware of).

Goal: Reduce death and illness related to water- and sanitation-related diseases in the targeted communities. Indicators: G1, % reduction in water- and sanitation-related diseases among the target population; G2, % of children under 36 months with diarrhoea in the last two weeks. Verification: Ministry of Health / WHO statistics; records from village clinics.

Outcome 1: Improved access to and use of sustainable sources of safe water in target communities. Indicators: 1a, % of people in the target communities using a minimum of 25L of safe water per day; 1b, % of targeted households with access to a functional water source; 1c, % of water points managed by local WatSan committees; 1d, # of hours spent by women fetching water daily. Verification: 1a, b, d, household survey; 1c, key informant interviews with WatSan committee members. Assumptions: civil war/hostilities do not return; improved access to clinical health facilities.

Output 1.1: Community water points constructed or rehabilitated. Indicators: 1.1a, # of water points constructed to national standard (140); 1.1b, % of water handpumps rehabilitated to national standard (35). Verification: community facility inspection field report. Assumptions: low rainfall does not limit overall water supply.

Output 1.2: Community management of water points is improved. Indicators: 1.2a, # of communities with a WatSan committee established; 1.2b, # of WatSan committees with technicians trained to perform basic maintenance on water points; 1.2c, % of WatSan committees collecting adequate charges to maintain the water points. Verification: 1.2a, household survey; key informant interviews with WatSan committee members. Assumptions: no major disputes or conflicts within the community.

Outcome 2: Improved access to and use of sustainable sanitation facilities among targeted communities. Indicators: 2a, % of people in the target communities using latrines on a daily basis; 2b, % of targeted households with access to functional latrines meeting the national standard; 2c, % of latrines managed by local WatSan committees. Verification: 2a, b, household survey; 2c, key informant interviews with WatSan committee members. Assumptions: civil war/hostilities do not return.

Output 2.1: Sanitation facilities constructed. Indicators: 2.1a, # of fully functioning household latrines constructed (3,500). Verification: community facility inspection field report. Assumptions: flooding or other environmental problems do not affect sanitation facilities.

(Slides 9 and 10 show the same log frame; slide 10 adds braces mapping its rows to impact evaluation and process evaluation.) 9-10

How to measure impact? Ask what would have happened in the absence of the program. Take the difference between what happened (with the program)... and what would have happened (without the program) = the IMPACT of the program. 11
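
In the standard potential-outcomes shorthand (convenient notation, not used on the slide itself):

```latex
\text{Impact}_i = Y_i(1) - Y_i(0)
```

Only one of the two terms is ever observed for any individual i, so every evaluation method is, at bottom, a strategy for constructing the missing counterfactual term.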

Counterfactual. The counterfactual is often constructed by selecting a group not affected by the program. Randomized: use random assignment of the program to create a control group that mimics the counterfactual. Non-randomized: argue that a certain excluded group mimics the counterfactual. 12

Methodologies in impact evaluation (sophistication increases as we move from non-experimental toward experimental methods): 1. Experimental: randomized evaluations. 2. Quasi-experimental: regression discontinuity design; instrumental variables. 3. Non-experimental: matching; regression; difference-in-differences; pre vs. post. 13

Cost-benefit analysis. 1. Needs assessment gives you the metric for measuring impact. 2. Process evaluation gives you the costs of all the inputs. 3. Impact evaluation gives you the quantified benefits. 4. Identifying alternatives allows for comparative cost-benefit analysis. 14
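
Steps 2 and 3 combine into the usual cost-effectiveness ratio (a standard formula, implied rather than written out on the slide):

```latex
\text{cost-effectiveness} = \frac{\text{total program cost (step 2)}}{\text{total quantified impact (step 3)}}
```

expressed, for example, as dollars per additional year of schooling; putting alternatives on this common scale is what makes the comparison in step 4 possible.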

When to do a randomized evaluation? When there is an important question you want/need to know the answer to; when the program is representative, not gold-plated; and when there are the time, expertise, and money to do it right. 16

When NOT to do an R.E.: when the program is premature and still requires considerable tinkering to work well; when the project is on too small a scale to randomize into two representative groups; when a positive impact has already been proven using rigorous methodology and resources are sufficient to cover everyone; or when the program has already begun and you are not expanding elsewhere. 17

Drawing the chain of causality. We want to draw the chain of causality: how effective is the intervention? We also want to answer: why is it effective? Defining and measuring intermediate outcomes will enrich our understanding of the program, reinforce our conclusions, and make it easier to draw general lessons. Inputs → intermediary outcomes → primary outcome. 18

The basics of R.E. Start with the simple case: take a sample of program applicants and randomly assign them to either the treatment group, which is offered the treatment, or the control group, which is not allowed to receive the treatment (during the evaluation period). 19

Key advantage of experiments: because members of the two groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the program rather than to other factors. 20
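
In notation (standard, not on the slide): randomization makes treatment status T independent of the potential outcomes, so the simple difference in group means identifies the average treatment effect:

```latex
E[Y \mid T=1] - E[Y \mid T=0]
  = E[Y(1) \mid T=1] - E[Y(0) \mid T=0]
  = E[Y(1)] - E[Y(0)]
  = \text{ATE}
```

where the second equality is exactly the step that fails for non-randomized comparisons.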

Example 1: De-worming in Western Kenya. About 30,000 children in 75 primary schools in rural Kenya were treated en masse in schools with drugs for hookworm, whipworm, roundworm, and schistosomiasis (bilharzia). The program was randomly phased in across schools over several years, with randomisation by school and individual. Key impacts: the incidence of moderate-to-heavy infections fell by 25 percentage points; school absenteeism fell by 25 percent, with the largest gains among the youngest pupils; school participation in the area increased by at least 0.14 years of schooling per treated child; there was no evidence that de-worming increased test scores. By far the most cost-effective method for improving school participation. 21

Example 2: Teacher absenteeism. Chaudhury et al. (2006) report the results of almost 70,000 surprise visits to a representative sample of primary schools and clinics across 9 poor countries. Teachers were absent 19 percent of the time on average, and health care workers 35 percent of the time. In general, the poorer the country, the higher the absence rate. On an average day, 27 percent of teachers are not at work in Uganda. 22

Project to Combat Absenteeism. Re-orienting the pay structure: ordinarily, teachers were paid a salary of Rs. 1,000 (about $22) per month for 21 days of teaching. In the treatment schools, teachers received a guaranteed base pay of Rs. 500 and were rewarded with Rs. 50 for each valid day taught; under the new system, monthly pay ranged from Rs. 500 to Rs. 1,300. Verification by camera: a student takes a picture of the teacher and the other students at the beginning and end of the day, and the teacher counts as present if the two photos are at least 5 hours apart and a minimum number of students is present. Cameras were collected a few days before the end of each pay period. 23
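
The incentive scheme fits in a few lines of code (a sketch, not the researchers' implementation: the Rs. 1,300 cap is inferred from the stated pay range, and the minimum student count is a placeholder):

```python
def valid_day(hours_between_photos: float, students_present: int,
              min_students: int = 8) -> bool:
    """A day counts as valid if the two photos are at least 5 hours apart
    and enough students appear in them; the threshold of 8 students is a
    placeholder, as the slide does not give the actual number."""
    return hours_between_photos >= 5 and students_present >= min_students


def monthly_pay(valid_days: int) -> int:
    """Treatment-school pay: Rs. 500 guaranteed base plus Rs. 50 per valid
    day taught, capped at Rs. 1,300 (the cap is inferred from the slide's
    stated Rs. 500-1,300 range)."""
    return min(500 + 50 * valid_days, 1300)


print(monthly_pay(12))  # a teacher with 12 valid days earns Rs. 1,100
```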

Results: Project to Combat Absenteeism 24

Eight key steps in conducting an experiment 1. Design the study carefully 2. Randomly assign people to treatment or control 3. Collect baseline data 4. Verify that assignment looks random 5. Monitor process so that integrity of experiment is not compromised 25

Eight key steps in conducting an experiment 6. Collect follow up data for both the treatment and control groups. 7. Estimate program impacts by comparing mean outcomes of treatment group vs. mean outcomes of control group. 8. Assess whether program impacts are statistically significant and practically significant. 26

Detailed Example: Balsakhi Program 27

Balsakhi Program: Background. Implemented by Pratham, an NGO from India, the program provided tutors (balsakhis) to help at-risk children with school work. Previously, teachers decided which children would get the balsakhi. Suppose we evaluated the balsakhi program using a randomized experiment: what would this entail? How would we do it? 28

Methods to estimate impacts. Let's look at different ways of estimating the impacts using the data from the schools that got a balsakhi: 1. Pre-post (before vs. after). 2. Simple difference. 3. Difference-in-differences. 4. Other non-experimental methods. 5. Randomized experiment. 29

1 Pre-post (before vs. after). Look at the average change in test scores over the school year for the balsakhi children. Average post-test score for children with a balsakhi: 51.22. Average pre-test score for children with a balsakhi: 24.80. Difference: 26.42. Can this difference (26.42) be interpreted as the impact of the balsakhi program? 30

2 Simple difference. Compare the test scores of children that got a balsakhi with those of children that did not. Average score for children with a balsakhi: 51.22. Average score for children without a balsakhi: 56.27. Difference: -5.05. Can this difference (-5.05) be interpreted as the impact of the balsakhi program? 31

3 Difference-in-differences. Compare the gains in test scores of children who got a balsakhi with the gains of children who did not. Children with a balsakhi: pre-test 24.80, post-test 51.22, gain 26.42. Children without a balsakhi: pre-test 36.67, post-test 56.27, gain 19.60. Difference-in-differences: 26.42 - 19.60 = 6.82. Can 6.82 be interpreted as the impact of the balsakhi program? 32
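
The arithmetic behind estimates 1-3 can be reproduced directly from the four means above (a minimal sketch; the variable names are mine):

```python
# The four means reported on the slides.
pre_treat, post_treat = 24.80, 51.22   # children with a balsakhi
pre_comp, post_comp = 36.67, 56.27     # children without a balsakhi

pre_post = post_treat - pre_treat                                 # 26.42
simple_diff = post_treat - post_comp                              # -5.05
diff_in_diff = (post_treat - pre_treat) - (post_comp - pre_comp)  # 6.82

print(f"Pre-post:                  {pre_post:.2f}")
print(f"Simple difference:         {simple_diff:.2f}")
print(f"Difference-in-differences: {diff_in_diff:.2f}")
```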

4 Other methods. There are more sophisticated non-experimental methods to estimate program impacts: regression, matching, instrumental variables, and regression discontinuity. These methods rely on being able to mimic the counterfactual under certain assumptions. Problem: the assumptions are not testable. 33

5 Randomized experiment. Suppose we evaluated the balsakhi program using a randomized experiment. What would be the advantage of using this method to evaluate the impact of the balsakhi program? 34

Impact of Balsakhi: Summary
Method | Impact estimate
(1) Pre-post | 26.42*
(2) Simple difference | -5.05*
(3) Difference-in-differences | 6.82*
(4) Regression | 1.92
(5) Randomized experiment | 5.87*
(*: statistically significant at the 5% level)
Bottom line: which method we use matters! 35

Case study: Learn to Read evaluations. 36

How to randomise? Randomizing at the individual level, or randomizing at the group level (a cluster randomized trial). Which level to randomize, and how to choose? Consider the nature of the treatment: how is the intervention administered? What is the catchment area of each unit of intervention? How wide is the potential impact? Generally, it is best to randomize at the level at which the treatment is administered. 37

Constraint 1: political/ethical. Certain things are clearly not ethical to randomize. Lotteries are one solution: a simple lottery is useful when there is a resource constraint and no a priori reason to discriminate among recipients. Lotteries are simple, common, and transparent: winners are randomly chosen from the applicant pool, participants know the winners and losers, and the process is perceived as fair. 38

Constraint 2: resources. Most programs have limited resources, resulting in more eligible recipients than the resources can serve. Resource constraints are often an evaluator's best friend: programs may otherwise distinguish recipients by arbitrary criteria. 39

Constraint 3: contamination (spillovers/crossovers). Remember the counterfactual! If the control group differs from the true counterfactual, our results can be biased. This can occur due to spillovers or crossovers. 40

Constraint 4: logistics. Research designs need to recognize logistical constraints. Example: de-worming treatment by health workers. Suppose administering de-worming drugs was one of many responsibilities of a health worker, and suppose the health worker served members of both treatment and control groups. It might be difficult to train them to follow different procedures for different groups, and to keep track of what to give to whom. 41

Constraint 5: fairness. Consider the fairness of randomizing at the child level within classes, at the class level within schools, or at the community level. 42

Constraint 6: sample size. Sometimes the program is only large enough to serve a handful of communities; this is primarily an issue of statistical power. 43

Nice approaches: 1. Phase-in: takes advantage of expansion. 2. Randomization near the threshold. 3. Encouragement design: randomize encouragement to receive the treatment. 44

Comparison of designs.
Basic lottery. Most useful when: the program is oversubscribed. Advantages: familiar, easy to understand, easy to implement, and can be implemented in public. Disadvantages: the control group may not cooperate; differential attrition.
Phase-in. Most useful when: the program is expanding over time and everyone must receive the treatment eventually. Advantages: easy to understand; the constraint is easy to explain; the control group complies because they expect to benefit later. Disadvantages: anticipation of treatment may affect short-run behavior; difficult to measure long-term impact.
Encouragement. Most useful when: the program has to be open to all comers and take-up is low but can easily be improved with an incentive. Advantages: can randomize at the individual level even when the program is not administered at that level. Disadvantages: measures the impact only for those who respond to the incentive; needs a large enough inducement to improve take-up; the encouragement itself may have a direct effect. 45

More nice approaches: 4. Multiple treatment groups: perfectly possible, but a control group is still required. 5. Cross-cutting treatments: useful for testing different components of the treatment in different combinations. 6. Varying levels of treatment: different dosage levels. 46

Mechanics of randomization. You need a sample frame. Then: pull names out of a hat/bucket; use the random number generator in a spreadsheet program to order observations randomly; or use Stata program code. (A sketch of the same logic follows.) 47
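
For illustration, the same mechanic in Python rather than a spreadsheet or Stata (a sketch assuming a CSV sample frame with one row per unit; the file names are hypothetical):

```python
import pandas as pd

# Load the sample frame, shuffle it with a fixed seed so the assignment
# is reproducible, and assign the first half to treatment.
frame = pd.read_csv("sample_frame.csv")
frame = frame.sample(frac=1, random_state=42).reset_index(drop=True)
frame["treatment"] = (frame.index < len(frame) // 2).astype(int)
frame.to_csv("assignments.csv", index=False)
```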

Points to consider: attrition, partial compliance, and spillovers. Threats to external validity: treatment group behavior changes (Hawthorne effect); comparison group behavior changes (John Henry effect). Generalizability of results: can it be nationally replicated? Is the sample representative? Was the implementation special? 48

Estimation. We wish to learn about the population, but we only see the sample; the sample average is our estimate of the population average. 49

We want estimates to be right on average. Which sampling strategy will give us a more accurate estimate? 50

Estimation. When we do estimation, the sample size allows us to say something about the variability of our estimate, but it does not ensure that our estimate will be close to the truth on average. Only randomization ensures accuracy; we then control precision with sample size. 51
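
A quick simulation illustrates the distinction (the population and the biased sampling rule below are assumptions for illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=50, scale=15, size=100_000)  # true mean = 50
top_half = np.sort(population)[50_000:]                  # a biased frame

for n in (10, 100, 10_000):
    random_means = [rng.choice(population, n).mean() for _ in range(500)]
    biased_means = [rng.choice(top_half, n).mean() for _ in range(500)]
    print(f"n={n:>6}: random samples average {np.mean(random_means):.1f}, "
          f"biased samples average {np.mean(biased_means):.1f}")

# Random sampling is right on average at every n, and larger n only
# tightens the spread; the biased frame stays near the top-half mean
# (about 62) no matter how large n gets.
```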

Accuracy versus precision. [Figure: four panels of estimates scattered around the truth, contrasting estimators that are accurate and/or precise.] 52

Measuring Significance: Scientific Method Does the scientific method apply to social science? The scientific method involves: 1) proposing a hypothesis 2) designing experimental studies to test the hypothesis How do we test hypotheses? 53

Hypothesis testing: a legal analogy. In criminal law, most institutions follow the rule: innocent until proven guilty. The prosecutor wants to prove their hypothesis that the accused person is guilty, and the burden is on the prosecutor to show guilt. The jury or judge starts with the null hypothesis that the accused person is innocent. 54

Hypothesis testing. In program evaluation, instead of the presumption of innocence, the rule is the presumption of insignificance. The policymaker's hypothesis is that the program improves learning; evaluators approach experiments using the hypothesis that there is zero impact of this program. We then test this null hypothesis (H0). The burden of proof is on the program: it must show a statistically significant impact. 55

Hypothesis testing. If our measurements show a difference between the treatment and control group, our first assumption is that, in truth, there is no impact (our H0 is still true); there is some margin of error due to sampling, and the difference is solely the result of chance (random sampling error). Still assuming H0 is true, we then use statistics to calculate how likely it is that this difference is in fact due to random chance. 56
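
A standard way to compute that probability is a two-sample t-test; the sketch below uses simulated scores, not the actual balsakhi data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(56, 20, size=300)  # simulated treatment-group scores
control = rng.normal(50, 20, size=300)    # simulated control-group scores

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below 0.05 means a difference this large would arise from
# random sampling error less than 5% of the time if H0 were true.
```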

Hypothesis testing: conclusions. If it is very unlikely (less than a 5% probability) that the difference is solely due to chance, we reject our null hypothesis and may now say: our program has a statistically significant impact. 57

Hypothesis testing: conclusions. Are we now 100 percent certain there is an impact? No, we may be only 95% confident, and we accept that if we use that 5% threshold, this conclusion may be wrong 5% of the time. That is the price we're willing to pay, since we can never be 100% certain: because we can never see the counterfactual, we must use random sampling and random assignment, and rely on statistical probabilities. 58

Back to the balsakhi example: baseline test score data. This was the distribution of test scores at baseline. The test was out of 100. Some students did really well; most, not so well. Many actually scored zero. 59

Endline test scores Now, look at the improvement. Very few scored zero, and many scored much closer to the 40-point range... 60

Post-test: control & treatment. [Figure: post-test score distributions for the control and treatment groups.] 61

Average difference: 6 points. What's the probability that the 6-point difference is due to chance? (Testing statistical significance.) 62

Significance level (5%). [Figure: the distribution of the difference under the null, centred at 0, against the observed difference of 6, with the 5% critical region marked; N=2.] 63

Q: How many children would we need to randomly sample to detect that the difference between the two groups is statistically significantly different from zero? 64

That probability depends on sample size (here: N=2). [Figure: null distribution vs. observed difference of 6.] 65

Significance: sample size = 8. [Figure: the null distribution narrows; N=8.] 66

Significance: sample size = 18. [Figure: the null distribution narrows further; N=18.] 67

Significance: sample size = 100. [Figure: null distribution vs. observed difference of 6.] 68

Significance: sample size = 6,000. [Figure: with N=6,000 the null distribution is very tight around 0, and the observed difference of 6 lies far beyond the critical region.] 69
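
The mechanics behind this sequence of figures is the standard error of a difference in means (a textbook formula, not printed on the slides):

```latex
SE\!\left(\bar{Y}_T - \bar{Y}_C\right)
  = \sqrt{\frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}}
```

Each variance is divided by its group's sample size, so quadrupling the sample roughly halves the width of the null distribution; that is why a 6-point gap that is unremarkable at N = 2 lies far beyond the critical region at N = 6,000.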

Hypothesis testing: conclusions. What if the probability is greater than 5%? We can't reject our null hypothesis. Are we 100 percent certain there is no impact? No, it just didn't meet the statistical threshold to conclude otherwise. Perhaps there is indeed no impact; or perhaps there is an impact but not enough sample to detect it most of the time; or we got a very unlucky sample this time. How do we reduce this error? 70

Hypothesis testing: conclusions. When we use a 95% confidence interval, how frequently will we detect effective programs? That is statistical power. 71

Hypothesis testing: 95% confidence.

                    You conclude: effective         You conclude: no effect
Truth: effective    correct (power)                 Type II error (low power)
Truth: no effect    Type I error (5% of the time)   correct

Power: how frequently will we detect effective programs? 72

Power: main ingredients. Variance: the noisier the outcome is to start with, the harder it is to measure effects. Effect size to be detected: the finer (more precise) the effect size we want to detect, the larger the sample we need; what is the smallest effect size with practical/policy significance? Sample size: the more children we sample, the more likely we are to obtain the true difference. (A sketch of the resulting sample-size arithmetic follows.) 73
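
The trade-off among these ingredients can be made concrete with a standard power calculation (illustrative numbers, not figures from the course):

```python
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()

# The effect size here is standardized (detectable difference divided by
# the outcome's standard deviation), so variance enters the denominator.
for effect_size in (0.1, 0.2, 0.5):
    n = solver.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"standardized effect {effect_size}: "
          f"about {n:.0f} children per group for 80% power")

# Halving the detectable effect size roughly quadruples the required sample.
```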

TIME example projects: Digitising Savings Groups: Impact Evaluation of Savings Group e-Recording Technology in Western Kenya. Migration, Remittances, and Information. NOURISH: Nutrition and Treatment Outcome. Development of a Ugandan-Irish HIV/Nutrition Research Cluster. Powering Education: the impact of solar lamps on educational attainment in rural Kenya. Repayment flexibility, contract choice and investment decisions among Indian microfinance borrowers. So fresh and so clean: urban community engagement to keep streets trash-free and improve the sustainability of drainage infrastructure in Senegal. 74