Regression Discontinuity Analysis

Similar documents
Chapter 10 Quasi-Experimental and Single-Case Designs

REPEATED MEASURES DESIGNS

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Unit 1 Exploring and Understanding Data

Throwing the Baby Out With the Bathwater? The What Works Clearinghouse Criteria for Group Equivalence* A NIFDI White Paper.

Overview of Perspectives on Causal Inference: Campbell and Rubin. Stephen G. West Arizona State University Freie Universität Berlin, Germany

Chapter 1: Explaining Behavior

11/24/2017. Do not imply a cause-and-effect relationship

26:010:557 / 26:620:557 Social Science Research Methods

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

(CORRELATIONAL DESIGN AND COMPARATIVE DESIGN)

Business Statistics Probability

Regression Discontinuity Design

On the purpose of testing:

The Regression-Discontinuity Design

1.4 - Linear Regression and MS Excel

HPS301 Exam Notes- Contents

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Correlation and Regression

Abstract Title Page Not included in page count. Title: Analyzing Empirical Evaluations of Non-experimental Methods in Field Settings

WELCOME! Lecture 11 Thommy Perlinger

Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy

Chapter 11 Nonexperimental Quantitative Research Steps in Nonexperimental Research

BOOTSTRAPPING CONFIDENCE LEVELS FOR HYPOTHESES ABOUT QUADRATIC (U-SHAPED) REGRESSION MODELS

2013/4/28. Experimental Research

Impact Evaluation Methods: Why Randomize? Meghan Mahoney Policy Manager, J-PAL Global

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

Propensity Score Analysis Shenyang Guo, Ph.D.

Causal Validity Considerations for Including High Quality Non-Experimental Evidence in Systematic Reviews

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

The Pretest! Pretest! Pretest! Assignment (Example 2)

The following are questions that students had difficulty with on the first three exams.

9 research designs likely for PSYC 2100

Statistical Methods and Reasoning for the Clinical Sciences

Comparing Pre-Post Change Across Groups: Guidelines for Choosing between Difference Scores, ANCOVA, and Residual Change Scores

Chapter 11. Experimental Design: One-Way Independent Samples Design

Recent advances in non-experimental comparison group designs

Should Causality Be Defined in Terms of Experiments? University of Connecticut University of Colorado

investigate. educate. inform.

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

IAPT: Regression. Regression analyses

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Experimental Design. Dewayne E Perry ENS C Empirical Studies in Software Engineering Lecture 8

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

TRANSLATING RESEARCH INTO ACTION. Why randomize? Dan Levy. Harvard Kennedy School

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Chapter Three Research Methodology

EXPERIMENTAL RESEARCH DESIGNS

Classical Psychophysical Methods (cont.)

Statistics and Probability

CHILD HEALTH AND DEVELOPMENT STUDY

Chapter 3: Examining Relationships

baseline comparisons in RCTs

Chapter 9 Experimental Research (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.

CHAPTER ONE CORRELATION

Still important ideas

Statistics for Psychology

Audio: In this lecture we are going to address psychology as a science. Slide #2

Formative and Impact Evaluation. Formative Evaluation. Impact Evaluation

Problem Set 5 ECN 140 Econometrics Professor Oscar Jorda. DUE: June 6, Name

Moderation in management research: What, why, when and how. Jeremy F. Dawson. University of Sheffield, United Kingdom

Assignment 4: True or Quasi-Experiment

Chapter 1 Introduction to Educational Research

Author's response to reviews

Choosing an Approach for a Quantitative Dissertation: Strategies for Various Variable Types

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

EVALUATING PROGRAM EFFECTIVENESS USING THE REGRESSION POINT DISPLACEMENT DESIGN

Reminders/Comments. Thanks for the quick feedback I ll try to put HW up on Saturday and I ll you

Topic #6. Quasi-experimental designs are research studies in which participants are selected for different conditions from pre-existing groups.

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY

RESEARCH METHODS. Winfred, research methods,

Chapter 20: Test Administration and Interpretation

STATISTICS AND RESEARCH DESIGN

Methods for Addressing Selection Bias in Observational Studies

Lesson 11 Correlations

Why do Psychologists Perform Research?

Instrumental Variables Estimation: An Introduction

Problem 1) Match the terms to their definitions. Every term is used exactly once. (In the real midterm, there are fewer terms).

Measuring and Assessing Study Quality

Chapter 6 Topic 6B Test Bias and Other Controversies. The Question of Test Bias

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

An Empirical Study on Causal Relationships between Perceived Enjoyment and Perceived Ease of Use

Risk Aversion in Games of Chance

Experimental Methods in Health Research A. Niroshan Siriwardena

Chapter Eight: Multivariate Analysis

Research in Physical Medicine and Rehabilitation

RESEARCH METHODS. Winfred, research methods, ; rv ; rv

12/30/2017. PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2

CHAPTER EIGHT EXPERIMENTAL RESEARCH: THE BASICS of BETWEEN GROUP DESIGNS

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump?

Multiple Regression Models

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Experimental Research. Types of Group Comparison Research. Types of Group Comparison Research. Stephen E. Brock, Ph.D.

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Transcription:

Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income students has the desired effects on student success and dropout rates. A third hopes to assess the effectiveness of special support programs for promising high school athletes. These studies fit the definition of a regression discontinuity design, whereby participants who satisfy a chosen criterion are assigned to a certain treatment and some outcome variable is measured later. Regression discontinuity analysis is a statistical tool that allows researchers to examine the effectiveness of the treatment in such studies. Regression discontinuity analysis is used for studies in which participants are assigned to treatment conditions based on a known assignment rule rather than randomly being assigned to conditions. Researchers or practitioners define an a priori cutoff point (Z0) for participants scores on an assignment variable (Z). Participants below the cutoff point receive the treatment, whereas those above the cutoff point do not (or vice-versa). Participants are thus divided into groups defined by a dichotomous treatment variable (X). At a later point, the researchers measure the relevant outcome variable (Y). The goal of the regression discontinuity analysis is to determine whether the treatment has the desired effect on the outcome variable. The standard model To make these ideas more concrete, the following example will run through this text. In this hypothetical study, the assignment variable is a student s score on a standardized test taken in tenth grade, the treatment is whether the student is enrolled in a standardized test prep class, and the focal outcome measure is "self-efficacy," the student s

belief in her ability to improve her standardized test performance. The school provides the test prep class to students scoring in the lowest 30% on the tenth grade test. The data from this hypothetical example are displayed in Figure 1. One key to understanding this type of analysis is noting that over and above the effect of the treatment variable, there is a relationship between the assignment variable and the outcome variable. In the given example, even if the test prep class is effective, it is quite possible that the students who attended the class will have lower self-efficacy scores on average than those who did not, simply because they started out at a lower level at the outset of the study. The question is whether the students who receive the class will have higher self-efficacy than would be predicted based on their tenth grade test scores. A regression discontinuity design is analyzed as follows: the outcome variable (Y, self-efficacy) is regressed on the treatment variable (X, attending the test prep class or not) and the assignment variable (Z, tenth grade test score). One thus obtains the following regression equation: Y = b0 + b1x + b2z + e (1) If the coefficient b1 is statistically significant, the data suggest that the treatment has an effect on the outcome variable. On the graph, this treatment effect will manifest as a vertical discrepancy between the two parallel regression lines. In the given example, the self-efficacy scores of the students in the test prep class were higher than would be expected based on their tenth grade test scores. Because the treatment is dichotomous, the treatment effect is exactly equal to the coefficient b1. Students who attended the test prep class had self-efficacy scores 7.5 points higher as a result of taking the class. Curvilinear relationships

Aside from other core model assumptions (discussed elsewhere in this volume), linearity is exceptionally important to estimating the treatment effect without bias. If the relationship between the assignment and outcome variables is not linear, the b1 coefficient will not represent the treatment effect accurately. For example, imagine a dataset in which there is no treatment effect and a curvilinear relationship between the assignment variable and the outcome variable (see Figure 2a). The data could be accurately described with the following model: Y = b0 + b1z + b2z 2 + e (2) The treatment has no effect here, so the researchers should find a coefficient of zero if they add treatment as a third predictor to this model. When the researchers ignore this curvilinear relationship and analyze the data with the model described in equation 1, it is possible to obtain a significant treatment effect (see Figure 2b). However, this effect is due entirely to the fact that straight regression lines are being fit to a curved data pattern. When theory and prior studies suggest that there is a curvilinear relationship between assignment variable and outcome variable, it is advised to add a quadratic term to the model described in equation 1. The full model would then be: Y = b0 + b1x + b2z + b3z 2 + e (3) Like before, the coefficient b1 represents the treatment effect, over and above the (linear and quadratic) effect of the assignment variable. Interactions between treatment and outcome In certain cases, the treatment s effectiveness depends on individuals' scores on the assignment variable. Two cases are common: (a) individuals with scores on the assignment variable close to the cut-off point benefit less from the treatment (see Figure 3a), and (b)

individuals with scores on the assignment variable close to the cut-off point benefit more from the treatment (see Figure 3b). Both cases are problematic for the classic regression discontinuity model, which forces the two regression lines representing the model predictions to be parallel. The model is thus misspecified. In addition, certain observations may have large residuals, which decreases the statistical power to detect a treatment effect. The solution is to estimate an interactive model in which the Y-Z relationship is allowed to vary between the treated and the untreated groups. This can be achieved with the following model Y = b0 + b1x + b2z + b3xz + e (4) If the coefficient b3 is statistically significant, the data suggest that the relationship between the assignment variable and the outcome variable is not the same in the two groups. Like in every interactive model, b1 represents the treatment effect for a participant with a score of zero on the assignment variable. If the assignment variable in this example were included in its raw form (i.e., uncentered), the coefficient b1 would estimate the treatment effect for a student with a score of zero on the tenth grade test. Clearly, this coefficient, and its associated F- and p-values, would be rather meaningless. To address this issue, many texts on the Regression Discontinuity Analysis suggest centering the assignment variable around the cutoff point by subtracting the value of the cutoff point (here: 71) from every student's score on the assignment variable. Now, b1 represents the treatment effect for a student with a tenth grade test score of 71. Note that this effect is not the average treatment effect, making this a suboptimal approach. This approach will lead researchers to underestimate the average treatment effect when individuals whose scores on the assignment variable are close to the cutoff point benefit

comparatively less from the treatment (Figure 3a), and overestimate the average treatment effect when individuals whose scores on the assignment variable are close to the cutoff point benefit comparatively more from the treatment (Figure 3b). A better data-analytic strategy is to center the assignment variable around the average score in the treatment group (here: 65). With this form of centering, the coefficient b1 will represent the treatment effect for the typical person within the treatment group, accurately reflecting the average treatment effect. Regardless of the type of centering that is done with the assignment variable, the coefficient b3 indicates whether the relationship between the assignment and outcome variables is different in the treatment and no treatment conditions. In practice, it is virtually impossible to distinguish the case in which there is a curvilinear relationship between assignment variable and outcome variable and no treatment effect (Figure 2a) and the case in which there is a linear relationship between assignment variable and outcome variable, an average treatment effect, and an assignment variable by treatment interaction caused by the fact that the treatment is less effective for participants with scores close to the cutoff point (Figure 3a). Both the polynomial model (2) and the interactive model (4) will fit the data quite well. Depending on the spread of the scores on the assignment variable, the researchers may be able to demonstrate that there is no evidence for curvilinearity among individuals in the untreated group. They may attempt to demonstrate that the interactive model fits the data better: showing that it has a smaller sum of squared errors, fewer outliers, and violates fewer model assumptions, or by conducting a log-likelihood test showing that the observed results are more likely under the interactive hypothesis than under the curvilinear hypothesis (Glover and Dixon, 2004).

But researchers should be aware that the two interpretations are hard to distinguish empirically in a given dataset, and ultimately they have to use theoretical arguments and refer to prior studies if they end up favoring one interpretation over the other. For example, in the hypothetical example, it makes little sense that individuals with very low scores on the tenth grade test would score more highly on the self-efficacy measure if the test prep class had no effect. The researchers could argue that the interactive model is more logical than the quadratic model. Statistical and practical considerations The researchers might imagine that students scores will also be affected by factors like parents educational level or motivation to attend college. Covariates like these can simply be added to the regression equation. If a covariate coefficient is significant, it indicates that there is relationship between the covariate and the outcome over and above the effect of the treatment and the assignment variable. If the researchers are interested in exploring the effects of the treatment on individuals who are not part of the focal treatment group, they may choose to use a probabilistic assignment rule to decide who receives the treatment. This method contains elements of both a known assignment rule and random assignment: a certain proportion of participants on either side of the cutoff are given the treatment. For example, a researcher may decide that half of the students who scored in the bottom 40% on the test and one sixth of those in the upper 60% are randomly chosen for the test prep class. Note that it is impossible to use a dichotomous variable as the assignment variable, because such a variable will be perfectly confounded with the treatment variable: For

example, in a study in which gender is the assignment variable, it would be impossible to say whether the observed differences are driven by treatment or gender. The conclusion validity of studies with a regression discontinuity design is lower than that of randomized experiments, and a much larger sample size is required to achieve the same level of statistical power for two reasons. First, finding an effect of the treatment on the outcome variable over and above the effect of the assignment variable is difficult given that the treatment and assignment variables are, by definition, highly correlated. When the effects of multiple correlated predictors are estimated, the standard errors of regression coefficients are large, resulting in lower statistical power. Second, in practice, the treatment and comparison groups tend to be very different in size (e.g., people with an IQ over 150, families in the bottom 10% of household income). This imbalance between groups also decreases the power of the analysis. The conclusions that can be drawn from results of studies with a regression discontinuity design are limited in scope because any treatment effects can only be assumed to hold for the treatment group, and often not all alternative explanations can be ruled out. Including relevant covariates can help reveal the true effect of a treatment on the outcome of interest: if the relationship between the test prep class and self-efficacy persists when controlling for parents education level, the researchers can have more confidence that the treatment is having the observed effect. A number of ethical and practical concerns make research utilizing regression discontinuity designs rare or challenging. First, universal application of an assignment rule is difficult and, in some cases, unethical. By setting a cutoff point, a researcher ultimately decides who is deserving of a given treatment. Perhaps a student who scored just above the

cutoff point for receiving the test prep class will lose motivation to go to college. Furthermore, when a set cutoff point is made public, it begins to lose its meaning (e.g., people lying on their taxes to qualify for government programs). As a result, the scores on the assignment variable may contain a lot of error, hampering one s ability to reach accurate conclusions. Finally, in many situations where a regression discontinuity design is being used, a randomized experiment would be more effective, depending on the questions the researchers are interested in answering. If the school is hoping to assess whether the test prep class should be made mandatory for all students, a randomized experiment would make more sense than a regression discontinuity design, because conclusions from such a study can be assumed to hold for all individuals in the population of interest. The regression discontinuity design belongs to the family of so-called quasiexperimental designs. Other designs in this family are the "nonequivalent control group design" and the "interrupted time-series design". Like other quasi-experimental designs, the regression discontinuity design has less internal validity and less conclusion validity than a randomized experiment. However, it allows us to draw causal conclusions with greater confidence than a post-only correlational design or a simple pretest-posttest design. And yet, some researchers doing field research employ these latter designs when random assignment is not feasible. They should be aware that better alternatives, such as the regression discontinuity design, are available to them. Mitchell R Campbell and Markus Brauer See also Correlation (or covariance), Multicollinearity, Power Analysis, Quasi-experimental Design

Further readings Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Boston, MA: Houghton Mifflin. Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11(5), 791-806. Judd, C.M., & Kenny, D.A. (1981). Estimating the effects of social intervention. Cambridge, UK: Cambridge University Press. Shadish, W.R., & Cook, T.D. (2009). The renaissance of field experimentation in evaluating interventions. Annual Review of Psychology, 60, 607-629.