Regression Discontinuity Design


Units are assigned to conditions based on a cutoff score on a measured covariate. For example, employees who exceed a cutoff for absenteeism could receive treatment, while those below it do not. The effect is measured as the discontinuity between the treatment and control regression lines at the cutoff (it is not the difference in group means). Here is the theory behind RDD in graphs using simulated data:

Imagine an ordinary scatterplot between, say, pretest and posttest. Not surprisingly, pretest and posttest are positively correlated. [Notice that this graph could come from an RDD or from a randomized experiment (RE); more on this later.]

Now imagine what would happen if everyone whose pretest score fell on the treatment side of a cutoff at 50 received a treatment designed to improve posttest scores. How would the scatterplot change? The answer is: it depends on whether the treatment is effective or not. This example uses a pretest as the assignment variable, but other variables can serve as well:

The Assignment Variable

The assignment variable (AV) can be any variable: a measure of need or merit, but also the order in which people walk in the door, or the last four digits of your SSN. AVs yield more power the more highly correlated they are with the outcome (just like any covariate). Now back to our example of how the treatment affects the scatterplot.

If the treatment had no effect, the scatterplot would not change at all (except for adding the cutoff line at 50).

If the treatment worked, however, the posttest score of everyone in the treatment group would increase (move up vertically; let us temporarily assume a constant treatment effect for all cases), but the control dots would not move.

The size of the discontinuity between regression lines is the size of the effect. In this simple case, it is hard to come up with a plausible alternative explanation for the effect, especially at the cutoff, where people are so similar. (But if another program uses exactly the same cutoff, as in income eligibility, then ...)

But where should we measure the size of the discontinuity? At the cutoff: because (1) that is where we usually have the most observations, and (2) that is where units are most similar to each other.
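The idea that the effect is the gap between the two regression lines at the cutoff can be sketched in Python. This is a minimal illustration on simulated data, not the analysis behind the slides: the sample size, slope, noise level, the 5-point effect, and the rule that units at or above 50 are treated are all assumptions chosen for the example.

```python
import numpy as np

CUTOFF = 50.0  # assumed cutoff on the pretest (assignment variable)


def simulate_rdd(n=2000, effect=5.0, seed=0):
    """Simulate a sharp RDD: units at or above the cutoff are treated (assumption)."""
    rng = np.random.default_rng(seed)
    pretest = rng.normal(50.0, 10.0, size=n)
    treated = (pretest >= CUTOFF).astype(float)
    # Posttest rises with pretest; treated cases shift up by a constant effect.
    posttest = 10.0 + 0.8 * pretest + effect * treated + rng.normal(0.0, 5.0, size=n)
    return pretest, treated, posttest


def discontinuity_at_cutoff(pretest, treated, posttest):
    """Gap between treatment and control regression lines at the cutoff."""
    centered = pretest - CUTOFF  # centering makes the intercepts refer to the cutoff
    X = np.column_stack([np.ones_like(centered), treated, centered, treated * centered])
    coef, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    return coef[1]  # coefficient on the treatment dummy


pretest, treated, posttest = simulate_rdd()
estimate = discontinuity_at_cutoff(pretest, treated, posttest)
mean_difference = posttest[treated == 1].mean() - posttest[treated == 0].mean()
print(f"discontinuity at cutoff: {estimate:.2f}")
print(f"raw group mean difference: {mean_difference:.2f}")
```

With these settings the discontinuity estimate should land near the simulated effect, while the raw difference in group means comes out much larger, because treated cases also start with higher pretest scores.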

But hypothetically, we could measure the effect (the discontinuity) anywhere on the pretest continuum. To do so, we would (1) project a hypothetical counterfactual control group regression line into the treatment space, and then (2) measure the discontinuity between the observed treatment group and the counterfactual control. But:

(1) Such extrapolations of hypothetical counterfactuals are risky (more later). E.g., suppose the real counterfactual should have been nonlinear. (2) And we have relatively fewer observations the further we project from the cutoff, so our estimate may be less precise. So we prefer to measure the effect at the cutoff.
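Both risks can be illustrated with a small sketch. It assumes a hypothetical counterfactual with mild curvature, fits a straight line to control cases observed only below the cutoff, and then projects that line into the treatment region. The curve, the grid of control scores, and the target points are invented for the example.

```python
import numpy as np

CUTOFF = 50.0


def true_counterfactual(pretest):
    """Hypothetical untreated outcome with mild curvature (an assumption)."""
    return 20.0 + 0.5 * pretest + 0.02 * (pretest - CUTOFF) ** 2


# Control cases are observed only below the cutoff (assumed assignment rule).
control_pretest = np.linspace(20.0, CUTOFF, 200, endpoint=False)
control_posttest = true_counterfactual(control_pretest)  # no noise, to isolate the bias

# Fit a straight line to the control group.
X = np.column_stack([np.ones_like(control_pretest), control_pretest])
coef, *_ = np.linalg.lstsq(X, control_posttest, rcond=None)
xtx_inv = np.linalg.inv(X.T @ X)

targets = np.array([50.0, 60.0, 70.0])
bias = []
variance_factor = []
for x0 in targets:
    row = np.array([1.0, x0])
    projected = row @ coef
    # (1) Error from assuming a straight line when the truth is curved.
    bias.append(abs(true_counterfactual(x0) - projected))
    # (2) Sampling variance of the projected line is proportional to this quantity.
    variance_factor.append(row @ xtx_inv @ row)

for x0, b, v in zip(targets, bias, variance_factor):
    print(f"pretest {x0:.0f}: projection error {b:.2f}, variance factor {v:.4f}")
```

In this setup both the projection error and the variance factor grow as the target point moves away from the cutoff, which is the reason for preferring the estimate at the cutoff.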

RDD Effect = RE Effect

A discontinuity between regression lines is an unfamiliar definition of an effect. In an RE, we usually think of the effect as the difference in group means. Not so for RDD. But the effect in an RE can also be conceptualized as a regression discontinuity.

If X = treatment and O = control, and if the treatment had no effect, then the scatterplot between pretest and posttest would not change (just as in RDD). Notice that in the RE, unlike RDD, we have X's above and below 50, and O's above and below 50; there is no assignment cutoff.

If the treatment is effective in the RE, then all the X's would move up but the O's would not. The RE effect can be thought of as a weighted average of the discontinuity between the X regression line and the O regression line at all points along the pretest axis. Notice also that we don't have to project a counterfactual regression line, because we have a treatment (T) and a control (C) regression line at all points along the pretest continuum.
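The randomized-experiment version of the same picture can be sketched as follows. The data are simulated under assumed values (constant 5-point effect, coin-flip assignment), and the point is only that the familiar difference in means and the vertical gap between the two regression lines estimate the same quantity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
effect = 5.0  # assumed constant treatment effect

pretest = rng.normal(50.0, 10.0, size=n)
# Random assignment: treatment is unrelated to the pretest, so X's and O's
# appear on both sides of 50.
treated = rng.integers(0, 2, size=n).astype(float)
posttest = 10.0 + 0.8 * pretest + effect * treated + rng.normal(0.0, 5.0, size=n)

# Familiar RE estimate: difference in group means.
mean_difference = posttest[treated == 1].mean() - posttest[treated == 0].mean()

# The same effect seen as the vertical gap between two parallel regression lines.
X = np.column_stack([np.ones(n), treated, pretest])
coef, *_ = np.linalg.lstsq(X, posttest, rcond=None)
line_gap = coef[1]

print(f"difference in means: {mean_difference:.2f}")
print(f"gap between regression lines: {line_gap:.2f}")
```

Because a treatment line and a control line are both observed across the whole pretest range, no counterfactual line has to be projected here.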

Two Real Examples

[Figure: The Effects of Medicaid on Physician Visits: A Regression Discontinuity Design. Vertical axis: visits with physician per person per year (4.0 to 4.9). Horizontal axis: income level (Under $3,000; $3,000-$4,999; $5,000-$6,999; $7,000-$9,999; $10,000-$14,999; $15,000 or more).]

[Figure: Outcome (SOPS) on the vertical axis; assignment variable (SOPS) on the horizontal axis. The AV is a risk-of-psychosis scale, the Scale of Prodromal Symptoms (SOPS). Plot labels: Control Group; Treatment Group; FACT.]

Ways of Thinking about RD

One way is to think of it as similar (but not identical) to a randomized experiment at the cutoff. Another is to think of it as a completely known assignment process, again like a randomized experiment.

Advantages

When properly implemented and analyzed, RD yields an unbiased estimate of the treatment effect (see Rubin, 1977). Units are assigned to treatment based on their need for treatment (or how much they merit a reward), consistent with the way in which many policies are implemented.

Disadvantages

Statistical precision is considerably less than in a randomized experiment of the same size (2.5x less precise; 3-5x as many cases are needed). Careful attention to power is crucial. Effects are unbiased only if the functional form of the relationship between the assignment variable and the outcome variable is correctly modeled, including nonlinear relationships and interactions. Fuzzy discontinuity: failure to adhere to the cutoff.

Improving Power

Place the cutoff at the mean. Use all the standard methods to improve power (e.g., add covariates). Combine randomized and nonrandomized designs.
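The precision penalty, and the advice to place the cutoff at the mean, can be explored with a short sketch. It compares the sampling-variance multiplier for the treatment coefficient in a simple parallel-lines regression under three assignment schemes. The normal assignment variable, the 50/50 randomized allocation, and the off-center cutoff of 60 are assumptions made for illustration.

```python
import numpy as np


def variance_factor(treated, pretest):
    """Multiplier on the error variance for the treatment coefficient in
    posttest ~ 1 + treated + pretest (parallel regression lines)."""
    X = np.column_stack([np.ones_like(pretest), treated, pretest])
    return np.linalg.inv(X.T @ X)[1, 1]


rng = np.random.default_rng(2)
n = 20000
pretest = rng.normal(50.0, 10.0, size=n)  # assumed normal assignment variable

randomized = rng.integers(0, 2, size=n).astype(float)  # RE with 50/50 allocation
rdd_at_mean = (pretest >= 50.0).astype(float)          # cutoff at the mean
rdd_off_mean = (pretest >= 60.0).astype(float)         # cutoff one SD above the mean

re_factor = variance_factor(randomized, pretest)
ratio_at_mean = variance_factor(rdd_at_mean, pretest) / re_factor
ratio_off_mean = variance_factor(rdd_off_mean, pretest) / re_factor

print(f"variance ratio, RDD (cutoff at mean) vs RE: {ratio_at_mean:.2f}")
print(f"variance ratio, RDD (cutoff at 60) vs RE:   {ratio_off_mean:.2f}")
```

Under these assumptions the ratio for a cutoff at the mean works out to about 1/(1 - 2/pi), roughly 2.75, which is in the same range as the penalty quoted above, and moving the cutoff away from the mean makes the ratio larger. Other distributions or models would give different numbers.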

Nonlinearities in RDD

In a regression discontinuity design, we measure the size of the effect as the size of the discontinuity in regression lines at the cutoff:

The size of the discontinuity at the cutoff is the size of the effect.

Threat to Statistical Conclusion Validity (SCV): Nonlinearities in Functional Form

Anything that affects the size of that discontinuity other than treatment is a possible threat to validity. In the example we just used, we assumed that the relationship between the assignment variable and the outcome was linear; our regression lines are straight lines. But suppose the functional form is not linear? The two things most likely to cause this problem are nonlinear relationships between the assignment variable and the outcome, and interactions between the assignment variable and treatment.

Here we see a discontinuity between the regression lines at the cutoff, which would lead us to conclude that the treatment worked. But this conclusion would be wrong because we modeled these data with a linear model when the underlying relationship was nonlinear.

If we superimpose a nonlinear regression line (in this case, a cubic function, X^3) onto the data, a line that seems to match the curves in the data pretty well, we see no discontinuity at the cutoff anymore, and correctly conclude that the treatment had no effect.
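A small simulation shows how this can happen. The data below are generated from an assumed S-shaped cubic relationship with no treatment effect at all, then analyzed once with straight lines and once with a cubic. The coefficients, noise level, and the rule that cases at or above the cutoff are treated are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
CUTOFF = 50.0

pretest = rng.uniform(20.0, 80.0, size=n)
z = (pretest - CUTOFF) / 10.0  # centered and rescaled assignment variable
treated = (z >= 0).astype(float)  # assumed rule: at or above the cutoff is treated

# S-shaped (cubic) relationship and NO treatment effect anywhere.
posttest = 50.0 + 6.0 * z - 0.2 * z**3 + rng.normal(0.0, 1.0, size=n)


def jump(design_columns):
    """Coefficient on the treatment dummy for a given set of covariate columns."""
    X = np.column_stack([np.ones(n), treated] + design_columns)
    coef, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    return coef[1]


linear_jump = jump([z])             # misspecified: straight lines
cubic_jump = jump([z, z**2, z**3])  # matches the simulated functional form

print(f"discontinuity under linear model: {linear_jump:.2f}")
print(f"discontinuity under cubic model:  {cubic_jump:.2f}")
```

The linear model reports a clearly nonzero discontinuity even though none was simulated, while the cubic model's estimate stays close to zero.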

Functional Form: Interactions

Sometimes the treatment works better for some people than for others. For example, it is common to find that more advantaged children (higher SES, higher pretest achievement scores) benefit more from treatment than do less advantaged children. If this interaction (between the assignment variable and treatment) is not modeled correctly, a false discontinuity will appear:

Here we see a discontinuity that suggests a treatment effect. However, these data are again modeled incorrectly, with a linear model that contains no interaction terms, producing an artifactual discontinuity at the cutoff.

If we superimpose the regression lines that would have been obtained had an interaction term been included, we would find no discontinuity at the cutoff.

The interpretation of this example is important to understand. The title of the graph says "false treatment main effect." However, the treatment did have an interaction effect: treatment helped children with higher scores on the assignment variable more than children with lower scores on the assignment variable.
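The false main effect can be reproduced in a sketch. The simulated treatment changes only the slope above the cutoff (an interaction) and adds no jump at the cutoff itself. All numbers are assumptions, and the cutoff is deliberately placed off-center in the pretest range, because with data symmetric around the cutoff the artifact can cancel out in a parallel-lines fit.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
CUTOFF = 50.0

# Cutoff deliberately off-center in the simulated pretest range.
pretest = rng.uniform(10.0, 70.0, size=n)
centered = pretest - CUTOFF
treated = (centered >= 0).astype(float)  # assumed rule: at or above the cutoff

# Treatment steepens the slope above the cutoff but adds no jump at the cutoff:
# an interaction effect without a main effect.
posttest = 20.0 + 0.5 * centered + 0.5 * treated * centered + rng.normal(0.0, 2.0, size=n)

# Model 1: parallel lines, no interaction term (misspecified).
X_parallel = np.column_stack([np.ones(n), treated, centered])
parallel_coef, *_ = np.linalg.lstsq(X_parallel, posttest, rcond=None)

# Model 2: adds the treatment-by-assignment-variable interaction.
X_interaction = np.column_stack([np.ones(n), treated, centered, treated * centered])
interaction_coef, *_ = np.linalg.lstsq(X_interaction, posttest, rcond=None)

false_jump = parallel_coef[1]
true_jump = interaction_coef[1]
slope_difference = interaction_coef[3]

print(f"discontinuity without interaction term: {false_jump:.2f}")
print(f"discontinuity with interaction term:    {true_jump:.2f}")
print(f"estimated slope difference:             {slope_difference:.2f}")
```

Centering the assignment variable at the cutoff matters here: it makes the treatment coefficient in the interaction model refer to the gap at the cutoff rather than at a pretest score of zero.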

Here we see an example where the treatment had both a main effect and an interaction effect, correctly modeled.

Manipulation of the AV: The Irish School Leavers Examination

An exam is given to determine who continues in school and who leaves, the decision being made using a cutoff. The exam is graded by graders who are aware of the cutoff and the consequences. Graders showed a marked reluctance to assign exam scores just below the cutoff point:

[Figure: frequency distribution of examination scores, from lower to higher scores, with the cutoff marked.]

We would expect that examination scores would be normally distributed, but due to grader reluctance to assign scores just below the cutoff, the distribution is not normal. This can lead to nonlinearities between assignment and outcome.
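One crude way to look for this kind of manipulation is to compare how many scores fall just below the cutoff with how many fall just above it. The sketch below does this on simulated scores. It is not the Irish data and not a formal density test, and the 3-point window and the 70% nudging rate are invented for the example.

```python
import numpy as np

CUTOFF = 50.0
WINDOW = 3.0  # assumed width of the band compared on each side of the cutoff


def near_cutoff_ratio(scores):
    """Count just below the cutoff divided by count just above it."""
    below = np.sum((scores >= CUTOFF - WINDOW) & (scores < CUTOFF))
    above = np.sum((scores >= CUTOFF) & (scores < CUTOFF + WINDOW))
    return below / above


rng = np.random.default_rng(5)
honest_scores = rng.normal(50.0, 10.0, size=5000)

# Hypothetical grader behavior: 70% of scores that fall just below the cutoff
# are nudged up to the cutoff itself.
graded_scores = honest_scores.copy()
just_below = (graded_scores >= CUTOFF - WINDOW) & (graded_scores < CUTOFF)
nudged = just_below & (rng.random(graded_scores.size) < 0.7)
graded_scores[nudged] = CUTOFF

honest_ratio = near_cutoff_ratio(honest_scores)
graded_ratio = near_cutoff_ratio(graded_scores)

print(f"below/above ratio, unmanipulated scores: {honest_ratio:.2f}")
print(f"below/above ratio, manipulated scores:   {graded_ratio:.2f}")
```

A ratio far below 1 in a narrow band around the cutoff is a warning sign that the assignment variable may have been manipulated, and it is worth plotting the full histogram before trusting the design.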

A Related Problem: Crossover

In a randomized experiment, sometimes people fail to receive the assigned condition. In RDD, this also happens: failure to adhere to the cutoff. Solutions are the same as in an RE, e.g., intent-to-treat (ITT) analysis and instrumental variables (IV).

How Problematic is Misassignment?

Just as a slightly degraded randomized experiment may still produce better results than many quasi-experiments, so also a slightly misassigned RD design may produce better results than many other quasi-experiments.

Rules of thumb: The fewer the participants who are misassigned, the less the problem. The narrower the range of misassignment around the cutoff, the less the problem.

Coping with Fuzzy RD: Intent-to-Treat

Monitor both assignment and treatment received so that you know whether violations of the cutoff occur. Intent-to-treat: in a randomized experiment, it is standard to analyze the data according to how people were assigned, not according to which treatment they actually received. This preserves the internal validity of the design, but it answers the question "what is the effect of assignment to treatment?", which is different from "what is the effect of treatment?" One can apply this same logic (ITT) to RDD as well. Just as with a randomized experiment, the more crossovers there are, the worse the problem.

Coping with Fuzzy RD: Analysis

Sensitivity analysis: run the analysis both including and excluding crossovers. If the results are similar, confidence increases. A number of authors suggest that if crossovers are no more than 5% of cases, they can probably be excluded with little problem. It may be worth trying some of the methods used in randomized experiments to cope with this problem, though we know of no research on this: instrumental variables, propensity scores, and selection bias models.
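The intent-to-treat and sensitivity analyses can be sketched on simulated data. Here a small share of cases near the cutoff end up in the opposite condition, and the discontinuity is estimated twice by assigned condition: once with everyone and once with crossovers dropped. The crossover mechanism, the 8% rate within 5 points of the cutoff, and the 5-point effect are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
CUTOFF = 50.0
effect = 5.0  # assumed effect of treatment actually received

pretest = rng.normal(50.0, 10.0, size=n)
centered = pretest - CUTOFF
assigned = (centered >= 0).astype(float)  # what the cutoff rule dictates

# Hypothetical misassignment: 8% of cases within 5 points of the cutoff
# end up in the opposite condition.
crossover = (np.abs(centered) < 5.0) & (rng.random(n) < 0.08)
received = np.where(crossover, 1.0 - assigned, assigned)

posttest = 10.0 + 0.8 * pretest + effect * received + rng.normal(0.0, 5.0, size=n)


def jump_by_assignment(keep):
    """Discontinuity at the cutoff, analyzed by assigned condition."""
    X = np.column_stack([np.ones(keep.sum()), assigned[keep], centered[keep]])
    coef, *_ = np.linalg.lstsq(X, posttest[keep], rcond=None)
    return coef[1]


crossover_rate = crossover.mean()
itt_estimate = jump_by_assignment(np.ones(n, dtype=bool))  # everyone, as assigned
excluding_estimate = jump_by_assignment(~crossover)        # crossovers dropped

print(f"crossover rate: {crossover_rate:.1%}")
print(f"ITT estimate (all cases): {itt_estimate:.2f}")
print(f"estimate excluding crossovers: {excluding_estimate:.2f}")
```

The ITT estimate is expected to be somewhat smaller than the estimate with crossovers excluded, since it measures the effect of assignment rather than of treatment received. If the two are close, the crossovers are probably not driving the conclusion.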

Summary

RD is the only quasi-experimental design that yields an unbiased estimate when properly implemented and analyzed. RD can be used with both archival data and original data. But power and statistical analysis require careful attention.