Methods of Randomization
Lupe Bedoya
Development Impact Evaluation Field Coordinator Training
Washington, DC, April 22-25, 2013


Content
1. Important Concepts
2. What vs. Why
3. Some Practical Issues
4. Select Randomization Methods


Important Concepts
Outcomes: what we observe, measure, and want to affect.
Counterfactual outcomes: the potential outcomes that would have taken place if the individual had not been exposed to the program.
Impact: the change in outcomes caused by the intervention.
Use of the word "cause": when we say "more schooling causes higher earnings," we mean that a person with more schooling has higher earnings relative to what that same person would have earned with less schooling.

The Evaluation Problem
It is basically a missing data problem: we do not observe the counterfactual outcomes for the same people. If we use an inappropriate counterfactual, we get biased estimates of the impact.
Impact estimate: \hat{D}_i = Y_i^t - Y_j^c, which decomposes as \hat{D}_i = D_i + x_{i,j}, where D_i is the true impact and x_{i,j} is the matching error from comparing individuals i and j, i.e., selection bias.
Example: individuals with more ability tend to study more years, so an impact estimate that compares those who study more (treatment) with those who study less (comparison) will carry a significant matching error, i.e., selection bias.
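To make the selection-bias decomposition concrete, here is a minimal simulation sketch (all numbers are illustrative assumptions, not from the slides): ability raises both schooling and earnings, so the naive comparison of the more- and less-schooled overstates the true impact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
ability = rng.normal(size=n)

# Self-selection: higher-ability individuals are more likely to study more.
more_schooling = (ability + rng.normal(size=n)) > 0

true_impact = 2.0                       # D_i: assumed causal effect of schooling
earnings = 10 + true_impact * more_schooling + 3 * ability + rng.normal(size=n)

# Naive comparison of those who studied more vs. less.
naive = earnings[more_schooling].mean() - earnings[~more_schooling].mean()
print(f"true impact   : {true_impact:.2f}")
print(f"naive estimate: {naive:.2f}   # inflated by the matching error x_ij")
```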

The Statistical Solution
Randomization balances out the selection problem; it does not eliminate it.
Definition: the purposeful manipulation of a social program or policy, randomizing groups into treatment and control status.
E[\hat{D}_i] = E[D_i] if and only if E[x_{i,j}] = 0.
Independence assumption: the two groups will, on average, have all the same characteristics (observable and unobservable); the only difference is the treatment.
If the randomization is not well implemented, we are back to selection problems.
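A companion sketch of why randomization restores E[x_{i,j}] = 0: when assignment is drawn independently of ability, the two groups are balanced on average and the simple difference in means recovers the true impact (again, the numbers are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
ability = rng.normal(size=n)

treat = rng.random(n) < 0.5             # random assignment, independent of ability
true_impact = 2.0
earnings = 10 + true_impact * treat + 3 * ability + rng.normal(size=n)

estimate = earnings[treat].mean() - earnings[~treat].mean()
balance = ability[treat].mean() - ability[~treat].mean()
print(f"difference in mean ability (should be ~0): {balance:.3f}")
print(f"impact estimate (should be ~2.0)         : {estimate:.3f}")
```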

In a nutshell
We observe outcomes of policies and programs. We want to say that these policies or programs caused outcomes to change, and we define these changes as impact. We do not observe counterfactual outcomes, so we create a counterfactual. We must convincingly solve the selection problem, and we need to make sure the randomization and the intervention are implemented as planned.

Content
1. Important Concepts
2. What vs. Why
3. Some Practical Issues
4. Select Randomization Methods

Example: a pencil sharpener, with the mechanism hidden from the researcher.

Two approaches: What vs. How/Why
Approach A (What): come up with a treatment that might affect pencils (windows). Run an RCT. Conclusion: windows affect pencils.
Approach B (How/Why): build an economic theory about an economic mechanism (smoke gives opossums an incentive to leave home). Find a testable implication (opening windows affects pencils) and run an RCT. Conclusion: we cannot reject that smoke incentivizes opossums.
Implication for policy: creating targeted incentives is a solution if we want to affect pencil sharpening (and, of course, this logic applies to many real problems).

We need to focus on the most interesting questions: why does financial literacy affect growth? Through which mechanism (how) does it affect growth, if at all? These questions affect the generalizability of the model. That is why DIME focuses on the How/Why: treatment variations to understand mechanisms, test hypotheses, and contribute to scale-up and to learning for other settings.

Content
1. Important Concepts
2. What vs. Why
3. Some Practical Issues
4. Select Randomization Methods

What potential problems do randomized designs face?
- Ethical issues: denying services. Randomization can only be used when not everyone has a right to the program.
- Program/operational concerns: adequate participant flow, contamination, not being able to randomize at the correct unit of analysis.
- External validity.

Where do ethical concerns arise?
- Voluntary programs that can enroll all applicants.
- Mandatory programs that can enroll all eligibles.
- Entitlement programs.
In these cases the control group will be made worse off.

...and when can they be addressed?
- Voluntary programs: there is often far more interest than the program can actually serve, and programs often simply enroll first comers.
- Mandatory programs: capacity may be limited relative to eligibility, or the program is a demonstration.
- Entitlement programs: difficult to justify; one possibility is compensation for the control group, another is an encouragement design.

Need to define subgroups prior to random assignment
- Subgroups defined prior to random assignment (exogenous), e.g., demographics, baseline behaviors, or baseline outcome values, can generate unbiased subgroup estimates.
- Subgroups defined by events or actions after random assignment (endogenous), e.g., program dosage or stayers vs. leavers, are problematic.

Illustration 1: Mandatory program (testing program variants)
The eligible population (e.g., all children in public high schools) is randomly assigned to Treatment 1, Treatment 2, or Control.
Implications: the design evaluates treatment variations (it cannot evaluate the "What"), and the impact can be extrapolated to the entire eligible population.

Illustration 2: Voluntary program
From the eligible population (e.g., the unemployed), outreach produces applicants (e.g., unemployed people who apply to job training programs) and non-applicants; applicants are randomly assigned to Treatment or Control.
Implications: the design could evaluate the "What", but the impact can only be extrapolated to applicants within the eligible population.

Implications for internal validity as well as generalizability
Internal validity: does the design get the causal effect right in the sample you are studying (e.g., the children in public high schools who volunteer to participate in the experiment)?
External validity: is the result generalizable to the entire eligible population and to other populations (e.g., all children in public high schools)?

Content
1. Important Concepts
2. What vs. Why
3. Some Practical Issues
4. Select Randomization Methods
   - Simple Random Design
   - Stratified Random Design
   - Clustered Random Design

Simple Random Design
Eligible population: all students. Random assignment at the student level to Treatment or Control.
- Unit of randomization = unit of analysis.
- Random sample from the universe (with or without replacement).
- Results can be extrapolated to the eligible population.
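A minimal sketch of how such an assignment might be implemented, assuming a hypothetical roster of 1,000 students and a 50/50 split:

```python
import pandas as pd

students = pd.DataFrame({"student_id": range(1, 1001)})

# Shuffle the roster with a fixed seed, then assign the first half to treatment.
shuffled = students.sample(frac=1, random_state=42).reset_index(drop=True)
shuffled["treatment"] = (shuffled.index < len(shuffled) // 2).astype(int)

print(shuffled["treatment"].value_counts())   # 500 treatment, 500 control
```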

Stratified Random Design: oversampling a stratum
Example: 1,500 eligible students (400 women, 1,100 men). Within each stratum, 300 students are randomly assigned (150 treatment, 150 control), giving 300 treatment and 300 control overall.
Why?
- We think the impact of the program differs across groups (i.e., strata).
- We may be able to estimate the impacts by stratum more precisely if we randomize at that level. For instance, if we are interested in the impact of a program on women, who are under-represented in the eligible population (e.g., women in male-dominated careers), we need to oversample women.
Implication: we need to use weights to estimate the overall impact, because women are oversampled.
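A sketch of how the stratified assignment above could be implemented and weighted, using the slide's 400 women / 1,100 men example; the column names, seed, and weighting step are illustrative assumptions.

```python
import pandas as pd

pop = pd.DataFrame({
    "id": range(1500),
    "stratum": ["women"] * 400 + ["men"] * 1100,
})

def draw_and_assign(g, n_sample=300, seed=7):
    """Sample n_sample units from the stratum and split them 50/50 T/C."""
    g = g.sample(n=n_sample, random_state=seed).reset_index(drop=True)
    g["treatment"] = (g.index < n_sample // 2).astype(int)
    return g

sample = pop.groupby("stratum", group_keys=False).apply(draw_and_assign)

# Women are oversampled, so the overall impact needs weights:
# weight = stratum share of the population / stratum share of the sample.
pop_share = pop["stratum"].value_counts(normalize=True)
samp_share = sample["stratum"].value_counts(normalize=True)
sample["weight"] = sample["stratum"].map(pop_share / samp_share)

print(sample.groupby(["stratum", "treatment"]).size())   # 150 per cell
print(sample.groupby("stratum")["weight"].first())
```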

Cluster Random Design
Eligible population: 900 villages, the unit of randomization. Random assignment yields 450 treatment villages and 450 control villages, containing 3,000 households in each arm. The household is the unit of analysis: outcomes are measured at this level.
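A sketch of village-level assignment with household-level analysis, following the slide's counts (900 villages, roughly 3,000 households per arm); the data layout and the household draw are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2024)

# Randomize at the village level: 450 treatment, 450 control.
villages = pd.DataFrame({"village_id": range(900)})
villages["treatment"] = rng.permutation([1] * 450 + [0] * 450)

# Households (unit of analysis) inherit their village's assignment.
households = pd.DataFrame({
    "household_id": range(6000),
    "village_id": rng.integers(0, 900, size=6000),
})
households = households.merge(villages, on="village_id", how="left")

print(villages["treatment"].value_counts())     # 450 / 450 villages
print(households["treatment"].value_counts())   # roughly 3,000 households per arm
```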

Cluster Random Design
- The unit of randomization (e.g., a village) differs from the unit of analysis (e.g., the household): primary and secondary sampling units.
- Outcomes (and treatment effects) may be correlated within a village, which usually decreases precision.
- In the worst-case scenario, units within a cluster are identical and you are effectively left with only the number of clusters.

- Individuals within a cluster tend to be more similar to one another than individuals selected at random.
- Intra-cluster correlation strongly affects precision, so more sample size is needed. We get more information about the impact on the whole population by randomizing at the individual level than at the cluster level.
- It is easy to end up unbalanced.
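One way to quantify this loss of precision is the standard design effect, DEFF = 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the intra-cluster correlation. A small sketch using the slide's village and household counts (the ICC value is an assumed illustration):

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Standard design effect for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

n_households, n_villages = 6000, 900
m = n_households / n_villages          # average households per village (from the slide)
icc = 0.05                             # intra-cluster correlation, assumed for illustration

print(f"design effect            : {design_effect(m, icc):.2f}")
print(f"effective sample size    : {n_households / design_effect(m, icc):.0f}")
# Worst case (ICC = 1): the effective sample collapses to the number of clusters.
print(f"effective n when ICC = 1 : {n_households / design_effect(m, 1.0):.0f}")
```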

What can go wrong? Common problems (they can be serious!):
- Potentially underpowered situations: small samples, clustered samples, high-variance outcomes.
- High non-participation or low dosage: a serious concern in voluntary programs.
- Control group crossover and other contamination.
- Sample attrition, especially differential attrition by treatment status: the control group can be harder to retain and locate.
- Potential selection problems.
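As a quick check against the underpowered-design risk listed above, here is a sketch of a sample-size calculation with statsmodels; the 0.2 standard-deviation minimum detectable effect is an assumed illustration, and clustered designs would further inflate the requirement by the design effect computed earlier.

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per arm for 80% power, 5% significance, MDE of 0.2 SD (assumed).
n_per_arm = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"needed per arm for an MDE of 0.2 SD: {n_per_arm:.0f}")
```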

Critical when conducting experiments
- Design experiments that help answer how/why things work (e.g., test predictions of theories), especially if your purpose is to claim applicability beyond the specific context of your experiment.
- Recognize the limitations of your particular experiment (e.g., what you are estimating and what population your results apply to).
- Guarantee that the randomization is well implemented and that the intervention goes as planned; failures here are a very frequent cause of low-quality impact evaluations, and field coordinators are crucial for this.
- Make the appropriate and feasible corrections (e.g., standard errors, selection problems).