
Why randomize? Rohini Pande Harvard University and J-PAL www.povertyactionlab.org

Agenda I. Background on Program Evaluation II. What is a randomized experiment? III. Advantages and Limitations of Experiments IV. How wrong can you go: Vote 2002 campaign V. Conclusions

What is Program or Impact Evaluation? Program evaluation is a set of techniques used to determine whether a treatment or intervention works. The treatment could be chosen by an individual or delivered by a social program. To determine the causal effect of the treatment we need knowledge of the counterfactual, that is, what would have happened in the absence of the treatment. The true counterfactual is not observable: observational studies, or associations, will not estimate causal effects. The key goal of all program/impact evaluation methods is to construct or mimic the counterfactual.

Program Evaluation: The Problem of Causal Inference What would have happened absent the program? How do we compare a world in which a program happens to a hypothetical world in which it doesn't? The program either happens or it doesn't; we cannot create a world in which the program never happened. So how do we measure this counterfactual? We cannot do so exactly, but we can identify groups that are similar and measure their outcomes.
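In the potential-outcomes notation standard for this argument (a formalization added here for clarity, not spelled out on the slide), each unit i has two potential outcomes, of which only one is ever observed:

\[
\tau_i = Y_i(1) - Y_i(0),
\]

where \(Y_i(1)\) is unit i's outcome with the program, \(Y_i(0)\) its outcome without it, and \(\tau_i\) the causal effect for that unit. Because each unit is observed in only one of the two states, \(\tau_i\) can never be computed directly; this is exactly the counterfactual problem described above.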

Counterfactuals We compare a group that is given the program to a group that is not given the program. Terminology: Treated (T): group affected by the program. Control (C): group unaffected by the program. Every program evaluation establishes a control group. Randomized: use random assignment of the program to create a control group that mimics the counterfactual. Non-randomized: argue that a certain excluded group mimics the counterfactual.

What is a good counterfactual? One where the only difference between the Treatment and Control groups is the program: in the absence of the program, Treatment and Control units would have behaved the same. Key question: are there other, unobservable factors that might affect the treated units but not the control units? The bias that such factors introduce is called selection bias, and selection bias is rampant in observational studies.

Intuition Observational studies are contaminated with selection bias of unknown magnitude and direction. The selection bias is a consequence of the control and treatment groups differing in ways that we do not observe. Answer from observational study = treatment effect + selection bias. If selection bias is positive, an observational study overstates the treatment effect; if negative, it understates the effect.
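Written in the same potential-outcomes notation, the slide's verbal equation is the standard decomposition of the naive treated-versus-untreated comparison (a textbook identity, sketched here rather than taken from the slides), where \(T_i\) indicates whether unit i received the treatment:

\[
E[Y_i(1)\mid T_i=1]-E[Y_i(0)\mid T_i=0]
= \underbrace{E[Y_i(1)-Y_i(0)\mid T_i=1]}_{\text{treatment effect on the treated}}
+ \underbrace{E[Y_i(0)\mid T_i=1]-E[Y_i(0)\mid T_i=0]}_{\text{selection bias}}.
\]

The left-hand side is the answer from the observational study. The selection-bias term is the difference in untreated outcomes between the two groups: positive when those who end up treated would have done better anyway, negative in the opposite case.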

Intuition Observational studies are often used to understand the effect of exercise on health, of taking extra classes on SAT scores, and of video games on exam performance. All are contaminated with selection bias. The key objective of program evaluation is finding or creating situations where the selection-bias term is non-existent or very small. Good impact studies are those with small or zero selection bias. This happens when Treatment and Control groups differ only in their exposure to the program; if they would differ even in the absence of the program, then we're not estimating a causal effect.

Types of Impact Evaluation Methods (1) Randomized Evaluations: use a coin flip (or an equivalent random draw) to assign subjects to Treatment and Control groups. Give the treatment to the Treatment group; the Control group provides the counterfactual. Also known as: random assignment studies, randomized field trials, social experiments, randomized controlled experiments.

Types of Impact Evaluation Methods (cont.) (2) Non-Experimental or Quasi-Experimental Methods. Simple difference: compare outcomes of the treatment group and a control group that was not exposed to the program for 'exogenous' reasons. Differences-in-differences: compare changes over time between the treatment group and the control group (see the formula sketched below). Statistical matching: identify a control group based on observables. Instrumental variables: predict treatment as a function of variables that do not directly affect the outcome of interest.
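As one concrete illustration from this list, the differences-in-differences estimator compares changes over time rather than levels (notation assumed here for illustration, with \(\bar{Y}\) denoting a group mean before or after the program):

\[
\hat{\tau}_{\text{DiD}}
=\left(\bar{Y}_{T,\text{post}}-\bar{Y}_{T,\text{pre}}\right)
-\left(\bar{Y}_{C,\text{post}}-\bar{Y}_{C,\text{pre}}\right).
\]

This removes any fixed pre-existing gap between the groups, but it still rests on the untestable assumption that the two groups would have followed parallel trends in the absence of the program.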

II WHAT IS A RANDOMIZED EVALUATION? www.povertyactionlab.org

The basics Start with the simple case: take a sample of program applicants and randomly assign them to either the Treatment Group, which receives the treatment, or the Control Group, which is not allowed to receive the treatment (during the evaluation period).

Random Assignment Suppose 50% of a group of individuals are randomly assigned to ('treated with') a program, without regard to their characteristics. If the randomization is successful, individuals assigned to the treatment and control groups differ only in their exposure to the treatment. This implies that the distributions of both observable and unobservable characteristics in the treatment and control groups are statistically identical. Had no individual been exposed to the program, the average outcomes of the treatment and control groups would have been the same, and there would be no selection bias!
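A minimal sketch of this step in Python (the variable names, sample size, and 50/50 split are illustrative assumptions, not details from the slides):

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2002)  # fixed seed so the assignment is reproducible

# Hypothetical sample of program applicants
sample = pd.DataFrame({"applicant_id": range(1000)})

# Assign exactly half to treatment, ignoring every applicant characteristic
sample["treatment"] = rng.permutation(np.repeat([1, 0], len(sample) // 2))

Because the draw ignores all characteristics, observable or not, any baseline variable should be balanced across the two groups in expectation, which is exactly the property described above.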

Basic Setup of a Randomized Evaluation [Flow diagram, based on Orr (1999): Target Population → Potential Participants → Evaluation Sample → Random Assignment → Treatment Group (Participants and No-Shows) and Control Group]

Some variations on the basics Assigning to multiple treatment groups. Assigning to units other than individuals or households: health centers, schools, local governments, villages. Important factors: what is the decision-making unit, and at what level can data be collected?

Key Steps in conducting an experiment 1. Design the study carefully: what is the problem? What is the key question to answer? What are the potential policy levers to address the problem? 2. Collect baseline data and randomly assign people to treatment or control. 3. Verify that the assignment looks random. 4. Monitor the process so that the integrity of the experiment is not compromised.

Key Steps in conducting an experiment (cont) 5. Collect follow-up data for both the treatment and control groups 6. Estimate program impacts by comparing mean outcomes of treatment group vs. mean outcomes of control group 7. Assess whether program impacts are statistically significant and practically significant
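Steps 6 and 7 reduce to a difference in means and a test of that difference; a minimal sketch, assuming a hypothetical follow-up DataFrame named follow_up with a treatment indicator and an outcome column:

from scipy import stats

treated = follow_up.loc[follow_up["treatment"] == 1, "outcome"]
control = follow_up.loc[follow_up["treatment"] == 0, "outcome"]

impact = treated.mean() - control.mean()              # step 6: estimated program impact
t_stat, p_value = stats.ttest_ind(treated, control)   # step 7: statistical significance

Practical significance is a separate judgment: whether an impact of this size is large enough to matter for policy, whatever the p-value says.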

III - ADVANTAGES AND LIMITATIONS OF RANDOMIZED EVALUATIONS www.povertyactionlab.org

Key advantage of experiments Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the treatment rather than to other factors

Validity A tool to assess the credibility of a study. Internal validity relates to the ability to draw causal inferences, i.e. can we attribute our impact estimates to the program and not to something else? External validity relates to the ability to generalize to other settings of interest, i.e. can we generalize our impact estimates from this program to other populations, time periods, countries, etc.? Random assignment versus random sampling.

Other advantages of experiments Relative to results from non-experimental studies, results from experiments are: Less subject to methodological debates Easier to convey More likely to be convincing to program funders and/or policymakers

Limitations of experiments Despite the great methodological advantage of experiments, they are also potentially subject to threats to their validity. For example: internal validity (e.g. Hawthorne effects, survey non-response, no-shows, crossovers, duration bias, etc.) and external validity (e.g. are the results generalizable to the population of interest?). It is important to realize that some of these threats also affect the validity of non-experimental studies.

Other limitations of experiments Experiments typically measure the impact of the offer to participate in the program. Depending on the design, it may be possible to understand the mechanism underlying the intervention. Costs (though these need to be weighed against the costs of getting the wrong answer). Estimates are partial equilibrium.

Other limitations of experiments Ethical issues. Note that some public policies are implemented in a randomized fashion due to lack of resources, for ethical reasons, or to remove discretion from the system. In such cases researchers can evaluate the policy immediately, e.g. the random assignment of which Indian villages have leader positions reserved for women. Researchers can also exploit pilots or phase-ins driven by budgetary constraints.

A nuanced view of Randomization Randomization offers the best, gold-standard answer for any treatment that we want to offer on a universal, compulsory, or mandatory basis: vaccinations, access to education up to a certain level, medical treatment for a certain condition, universal health insurance. Randomization is less useful in situations where the effect of the treatment varies across individuals and where individuals will be allowed to self-select into treatment: attending business school, elective surgery.

IV HOW WRONG CAN YOU GO: THE VOTE 2002 CAMPAIGN www.povertyactionlab.org

Case 1: The Vote 2002 campaign. An intervention designed to increase voter turnout in 2002. Phone calls were made to ~60,000 individuals; only ~35,000 individuals were reached. Key question: did the campaign have a positive effect (i.e. an impact) on voter turnout? Five methods were used to estimate the impact.

Methods 1-3 Based on comparing reached vs. not reached. Method 1: simple difference in voter turnout, (voter turnout)_reached − (voter turnout)_not reached. Method 2: multiple regression, controlling for some differences between the two groups. Method 3: Method 2, but also controlling for differences in past voting behavior between the two groups.
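A sketch of how Methods 1-3 could be computed with statsmodels (the DataFrame df and its column names are assumptions for illustration; the original analysis may differ in its exact specification):

import statsmodels.formula.api as smf

# Method 1: simple difference = regression of turnout on a "reached" dummy
m1 = smf.ols("voted_2002 ~ reached", data=df).fit()

# Method 2: add controls for observed differences between the two groups
m2 = smf.ols("voted_2002 ~ reached + female + newly_registered + iowa", data=df).fit()

# Method 3: also control for past voting behavior
m3 = smf.ols("voted_2002 ~ reached + female + newly_registered + iowa"
             " + voted_2000 + voted_1998", data=df).fit()

In each case the coefficient on reached is the estimated impact reported on the next slide.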

Impact Estimates using Methods 1-3
Method 1: 10.8 pp *
Method 2:  6.1 pp *
Method 3:  4.5 pp *
(pp = percentage points; * = statistically significant at the 5% level)

Methods 1-3 Is any of these impact estimates likely to be the true impact of the Vote 2002 campaign?

Reached vs. Not Reached
                 Reached   Not Reached   Difference
Female           56.2%     53.8%          2.4 pp*
Newly Regist.     7.3%      9.6%         -2.3 pp*
From Iowa        54.7%     46.7%          8.0 pp*
Voted in 2000    71.7%     63.3%          8.3 pp*
Voted in 1998    46.6%     37.6%          9.0 pp*
(pp = percentage points; * = statistically significant at the 5% level)

Method 4: Matching Similar data are available on 2,000,000 other individuals. Select as a comparison group a subset of these 2,000,000 individuals that resembles the reached group as much as possible (statistical procedure: matching). To estimate the impact, compare voter turnout between the reached group and the comparison group.
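A minimal sketch of nearest-neighbor matching on observed covariates (scikit-learn used purely for illustration; the DataFrame df and covariate names are assumptions, and the original study's matching procedure may differ):

from sklearn.neighbors import NearestNeighbors

covariates = ["female", "newly_registered", "iowa", "voted_2000", "voted_1998"]
reached = df[df["reached"] == 1]
pool = df[df["reached"] == 0]

# For each reached individual, find the most similar not-reached individual
nn = NearestNeighbors(n_neighbors=1).fit(pool[covariates])
_, match_idx = nn.kneighbors(reached[covariates])
comparison = pool.iloc[match_idx.ravel()]

# Matching estimate: turnout gap between the reached and their matches
impact = reached["voted_2002"].mean() - comparison["voted_2002"].mean()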

An illustration of matching Source: Arceneaux, Gerber, and Green (2004)

Impact Estimates using Matching
Matching on 4 covariates:   3.7 pp *
Matching on 6 covariates:   3.0 pp *
Matching on all covariates: 2.8 pp *
(pp = percentage points; * = statistically significant at the 5% level)

Method 4: Matching Is this impact estimate likely to be the true impact of the Vote 2002 campaign? Key: These two groups should be equivalent in terms of the observable characteristics that were used to do the matching. But what about unobservable characteristics?

Method 5: Randomized Experiment It turns out the 60,000 were randomly chosen from a population of 2,060,000 individuals. Hence, treatment was randomly assigned, creating two groups: Treatment group (the 60,000 who were called) and Control group (the 2,000,000 who were not called). To estimate the impact, compare voter turnout between the treatment and control groups, applying a statistical adjustment (sketched below) to address the fact that not everyone in the treatment group was reached.
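One standard version of that adjustment (a sketch of the usual approach, not necessarily the exact procedure used in the original analysis) scales the intent-to-treat effect, the raw difference between the randomly assigned groups, by the share of the treatment group actually reached:

\[
\hat{\tau}_{\text{reached}}
=\frac{\bar{Y}_{\text{treatment}}-\bar{Y}_{\text{control}}}{\text{share of treatment group reached}},
\]

here dividing by roughly 35,000/60,000. Without this correction, including the never-reached in the treatment mean would dilute the estimated effect of actually receiving a call.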

Method 5: Randomized Experiment Impact estimate: 0.4 pp, not statistically significant Is this impact estimate likely to be the true impact of the Vote 2002 campaign? Key: Treatment and control groups should be equivalent in terms of both their observable and unobservable characteristics Hence, any difference in outcomes can be attributed to the Vote 2002 campaign

Summary Table
Method                                    Estimated Impact
1  Simple difference                      10.8 pp *
2  Multiple regression                     6.1 pp *
3  Multiple regression with panel data     4.5 pp *
4  Matching                                2.8 pp *
5  Randomized experiment                   0.4 pp
(pp = percentage points; * = statistically significant at the 5% level)

V CONCLUSIONS

Conclusions Good public policies require knowledge of causal effects. Causal effects can be estimated only if we have good counterfactuals. In the absence of good counterfactuals, the analysis is contaminated with selection bias, so be suspicious of causal claims that come from observational studies. Randomization offers a way of generating good counterfactuals.

Conclusions If properly designed and conducted, social experiments provide the most credible assessment of the impact of a program. Results from social experiments are easy to understand and much less subject to methodological quibbles. Credibility + ease => more likely to convince policymakers and funders of the effectiveness (or lack thereof) of a program.

Conclusions (cont.) However, these advantages are present only if social experiments are well designed and properly conducted. We must assess the validity of experiments in the same way we assess the validity of any other study, and we must be aware of the limitations of experiments.

THE END