Regression Discontinuity Design

Units are assigned to conditions based on a cutoff score on a measured covariate. For example, employees who exceed a cutoff for absenteeism could receive treatment, while those below it do not. The effect is measured as the discontinuity between the treatment and control regression lines at the cutoff (it is not the group mean difference). Here is the theory behind RDD, shown in graphs using simulated data:

Imagine an ordinary scatterplot between, say, pretest and posttest. Not surprisingly, pretest and posttest are positively correlated. [Notice that this graph could come from an RDD or from a randomized experiment (RE); more on this later.]

Now imagine what would happen if everyone on one side of a pretest cutoff score of 50 received a treatment designed to improve posttest scores. How would the scatterplot change? The answer: it depends on whether the treatment is effective. This example uses a pretest as the assignment variable, but other variables can serve as well:

The Assignment Variable

The assignment variable (AV) can be any variable: a measure of need or merit, but also the order in which people walk in the door, or the last four digits of your SSN. AVs yield more power the more highly correlated they are with the outcome (just like any covariate). Now back to our example of how the treatment affects the scatterplot.

If the treatment had no effect, the scatterplot would not change at all (except for adding the cutoff line at 50).

If the treatment worked, however, the posttest score of everyone in the treatment group would increase (move up vertically; let us temporarily assume a constant treatment effect for all cases), but the control dots would not move.

The size of the discontinuity between regression lines is the size of the effect. In this simple case, it is hard to come up with a plausible alternative explanation for the effect, especially at the cutoff, where people are so similar. (But if another program uses exactly the same cutoff, as in income eligibility, then ...)

But where should we measure the size of the discontinuity? At the cutoff: because (1) that is where we usually have the most observations, and (2) that is where units are most similar to each other.
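The idea of measuring the effect as the gap between regression lines at the cutoff can be sketched with simulated data. This is a minimal illustration, not the analysis behind the slides: the sample size, slope, noise level, the constant effect of 5 points, and the choice that cases at or above the cutoff are treated are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 2000
cutoff = 50.0
true_effect = 5.0  # hypothetical constant treatment effect

# Simulated pretest, used as the assignment variable.
pretest = rng.normal(50.0, 10.0, size=n)
# Assumption for this sketch: cases at or above the cutoff are treated.
treated = (pretest >= cutoff).astype(float)

# Posttest is correlated with pretest; treated cases move up by true_effect.
noise = rng.normal(0.0, 5.0, size=n)
posttest = 10.0 + 0.8 * pretest + true_effect * treated + noise

# Center the assignment variable at the cutoff so that the coefficient on
# `treated` is the vertical gap between the two lines at the cutoff.
centered = pretest - cutoff
X = np.column_stack([np.ones(n), centered, treated])
coef, *_ = np.linalg.lstsq(X, posttest, rcond=None)
discontinuity = coef[2]

# Difference in group means, for contrast with the discontinuity.
mean_difference = posttest[treated == 1].mean() - posttest[treated == 0].mean()

print(f"discontinuity at cutoff:   {discontinuity:.2f}")
print(f"difference in group means: {mean_difference:.2f}")
```

In this setup the discontinuity should land near the simulated effect, while the difference in group means is much larger because the treated cases also started with higher pretest scores.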

Hypothetically, we could measure the effect (the discontinuity) anywhere on the pretest continuum. To do so, we would (1) project a hypothetical counterfactual control group regression line into the treatment space, and then (2) measure the discontinuity between the observed treatment group and the counterfactual control. But:

(1) Such extrapolations of hypothetical counterfactuals are risky (more later). E.g., suppose the real counterfactual should have been nonlinear. (2) And we have relatively fewer observations the further we project from the cutoff, so our estimate may be less precise. So we prefer to measure the effect at the cutoff.

RDD Effect = RE Effect

A discontinuity between regression lines is an unfamiliar definition of an effect. In an RE, we usually think of the effect as the difference in group means. Not so for RDD. But the effect in an RE can also be conceptualized as a regression discontinuity.

If X = treatment and O = control, and if the treatment had no effect, then the scatterplot between pretest and posttest would not change (just as in RDD). Notice that in the RE, unlike RDD, we have Xs above and below 50, and Os above and below 50; there is no assignment cutoff.

If the treatment is effective in the RE, then all the Xs would move up but the Os would not. The RE effect can be thought of as a weighted average of the discontinuity between the X regression line and the O regression line at all points along the pretest axis. Notice also that we don't have to project a counterfactual regression line, because we have a treatment (T) and a control (C) regression line at all points along the pretest continuum.
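The equivalence between the two views of an RE effect can be checked on simulated data. The sketch below assumes random assignment by coin flip, a constant effect of 5 points, and parallel regression lines; all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 2000
true_effect = 5.0  # hypothetical constant treatment effect

pretest = rng.normal(50.0, 10.0, size=n)
# Random assignment: treated and control cases appear at every pretest level.
treated = rng.integers(0, 2, size=n).astype(float)
noise = rng.normal(0.0, 5.0, size=n)
posttest = 10.0 + 0.8 * pretest + true_effect * treated + noise

# Familiar RE estimate: difference in group means.
mean_difference = posttest[treated == 1].mean() - posttest[treated == 0].mean()

# Same effect viewed as the vertical gap between the treatment and control
# regression lines (parallel lines, so the gap is the same at every pretest).
X = np.column_stack([np.ones(n), pretest, treated])
coef, *_ = np.linalg.lstsq(X, posttest, rcond=None)
line_gap = coef[2]

print(f"difference in group means:    {mean_difference:.2f}")
print(f"gap between regression lines: {line_gap:.2f}")
```

Both numbers estimate the same simulated effect here; the regression version is typically the more precise of the two because the pretest absorbs outcome variance.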

Two Real Examples

[Figure: The Effects of Medicaid on Physician Visits: A Regression Discontinuity Design. Vertical axis: visits with physician per person per year (4.0 to 4.9). Horizontal axis: income level (under $3,000; $3,000-$4,999; $5,000-$6,999; $7,000-$9,999; $10,000-$14,999; $15,000 or more).]

[Figure: Outcome SOPS on the vertical axis; assignment variable SOPS on the horizontal axis. The AV is a risk-of-psychosis scale, the Scale of Prodromal Symptoms (SOPS). Plotted groups: Control Group and Treatment Group (FACT).]

Ways of thinking about RD

One can think of RD as similar (but not identical) to a randomized experiment at the cutoff. One can also think of it as a completely known assignment process, again like a randomized experiment.

Advantages

When properly implemented and analyzed, RD yields an unbiased estimate of the treatment effect (see Rubin, 1977). Units are assigned to treatment based on their need for treatment (or how much they merit a reward), consistent with the way in which many policies are implemented.

Disadvantages

Statistical precision is considerably less than in a randomized experiment of the same size (2.5x less precise; 3-5x as many cases are needed), so careful attention to power is crucial. Effects are unbiased only if the functional form of the relationship between the assignment variable and the outcome variable is correctly modeled, including nonlinear relationships and interactions. Fuzzy discontinuity: failure to adhere to the cutoff.

Improving Power

Place the cutoff at the mean. Use all the standard methods to improve power (e.g., add covariates). Combine randomized and nonrandomized designs.
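Why placing the cutoff at the mean helps can be illustrated with a short calculation. The sketch below rests on assumptions not stated in the slides: a standard normal assignment variable x, a linear model y = b0 + b1*x + d*T + e with no interaction, and treatment T = 1 at or above the cutoff. Under those assumptions the sampling variance of the estimated discontinuity d is proportional to 1 / (Var(T) - Cov(T, x)^2), and the hypothetical helper `relative_variance` evaluates that factor for a given cutoff.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def relative_variance(cutoff_z: float) -> float:
    """Variance factor for the discontinuity estimate (smaller is better).

    Assumes x ~ N(0, 1), T = 1 when x >= cutoff_z, and a linear model with
    no interaction. The factor multiplies sigma^2 / n.
    """
    p_treated = 1.0 - std_normal_cdf(cutoff_z)
    var_t = p_treated * (1.0 - p_treated)  # variance of the treatment dummy
    cov_tx = std_normal_pdf(cutoff_z)      # Cov(T, x) for a standard normal x
    return 1.0 / (var_t - cov_tx ** 2)


for z in (0.0, 0.5, 1.0, 2.0):
    print(f"cutoff at {z:+.1f} SD: variance factor {relative_variance(z):.1f}")
```

Under these assumptions the factor is smallest when the cutoff sits at the mean and grows as the cutoff moves into either tail, which is one way to see the advice above.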

Nonlinearities in RDD

In a regression discontinuity design, we measure the size of the effect as the size of the discontinuity in regression lines at the cutoff:

The size of the discontinuity at the cutoff is the size of the effect.

Threat to SCV: Nonlinearities in Functional Form

Anything that affects the size of that discontinuity other than treatment is a possible threat to validity. In the example we just used, we assumed that the relationship between the assignment variable and the outcome was linear; our regression lines are straight lines. But suppose the functional form is not linear? The two things most likely to cause this problem are nonlinear relationships between the assignment variable and the outcome, and interactions between the assignment variable and treatment.

Here we see a discontinuity between the regression lines at the cutoff, which would lead us to conclude that the treatment worked. But this conclusion would be wrong because we modeled these data with a linear model when the underlying relationship was nonlinear.

If we superimpose a nonlinear regression line onto the data (in this case, a cubic function, X^3), a line that seems to match the curves in the data pretty well, we see no discontinuity at the cutoff anymore, and correctly conclude that the treatment had no effect.
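A small simulation can reproduce this kind of artifact. The sketch below is not the data behind the slides: it assumes an assignment variable already centered at a cutoff of 0, a purely cubic relationship with the outcome, and no treatment effect at all. The hypothetical helper `treatment_coef` fits a model with the given assignment-variable terms plus a treatment dummy and returns the estimated jump at the cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 2000
# Assignment variable, already centered so that the cutoff is at 0.
x = rng.uniform(-1.0, 1.0, size=n)
treated = (x >= 0.0).astype(float)

# Cubic relationship between assignment variable and outcome; the
# treatment has NO effect in this simulation.
y = 10.0 * x**3 + rng.normal(0.0, 1.0, size=n)


def treatment_coef(columns):
    """Fit intercept + given columns + treatment dummy; return the jump."""
    X = np.column_stack([np.ones(n)] + columns + [treated])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]


linear_gap = treatment_coef([x])              # misspecified: straight lines
cubic_gap = treatment_coef([x, x**2, x**3])   # matches the true curve

print(f"jump at cutoff, linear model: {linear_gap:.2f}")
print(f"jump at cutoff, cubic model:  {cubic_gap:.2f}")
```

With these settings the linear model should report a sizable jump even though none was simulated, while the cubic model's estimate should sit close to zero.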

Functional Form: Interactions

Sometimes the treatment works better for some people than for others. For example, it is common to find that more advantaged children (higher SES, higher pretest achievement scores) benefit more from treatment than do less advantaged children. If this interaction (between the assignment variable and treatment) is not modeled correctly, a false discontinuity will appear:

Here we see a discontinuity that suggests a treatment effect. However, these data are again modeled incorrectly, with a linear model that contains no interaction terms, producing an artifactual discontinuity at the cutoff.

If we superimpose the regression lines that would have been obtained had an interaction term been included, we would find no discontinuity at the cutoff.

The interpretation of this example is important to understand. The title of the graph says "false treatment main effect." However, the treatment did have an interaction effect: treatment helped children with higher scores on the assignment variable more than children with lower scores on the assignment variable.
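The false main effect can also be reproduced in a simulation. The sketch below uses hypothetical numbers: a uniform assignment variable from 0 to 100, a cutoff at 75 with cases at or above it treated, no jump at the cutoff, and a steeper slope among treated cases. The cutoff is deliberately placed off center, because in this particular setup a cutoff at the exact midpoint of a uniform assignment variable would not produce the artifact.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 4000
cutoff = 75.0
av = rng.uniform(0.0, 100.0, size=n)    # assignment variable
centered = av - cutoff
treated = (av >= cutoff).astype(float)  # assumed direction of assignment

# True model: NO jump at the cutoff, but treatment steepens the slope,
# so higher-scoring treated cases benefit more (a pure interaction).
noise = rng.normal(0.0, 5.0, size=n)
y = 60.0 + 0.5 * centered + 0.5 * treated * centered + noise


def jump_at_cutoff(include_interaction):
    """Return the coefficient on the treatment dummy at the cutoff."""
    cols = [np.ones(n), centered, treated]
    if include_interaction:
        cols.append(treated * centered)
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[2]


gap_without_interaction = jump_at_cutoff(False)
gap_with_interaction = jump_at_cutoff(True)

print(f"jump, no interaction term:   {gap_without_interaction:.2f}")
print(f"jump, with interaction term: {gap_with_interaction:.2f}")
```

Leaving out the interaction term forces a single slope on both groups, and the misfit shows up as an apparent jump at the cutoff; including the term should bring the estimated jump back toward zero.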

Here we see an example where the treatment had both a main effect and an interaction effect, correctly modeled.

Manipulation of the AV: The Irish School Leavers Examination

An exam is given to determine who continues in school and who leaves, the decision being made using a cutoff. The exam is graded by graders who are aware of the cutoff and the consequences. Graders showed a marked reluctance to assign exam scores just below the cutoff point:

[Figure: frequency distribution of examination scores, from lower scores to higher scores, with the cutoff marked.]

We would expect examination scores to be normally distributed, but because graders are reluctant to assign scores just below the cutoff, the distribution is not normal. This can lead to nonlinearities between assignment and outcome.
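One informal way to look for this kind of manipulation is to compare how many scores fall just below and just at or above the cutoff. The sketch below simulates the pattern rather than using the actual examination data: the score distribution, the cutoff of 40, the 3-point window, and the assumption that graders bump 80% of near-miss scores up to the cutoff are all hypothetical. It is a descriptive check, not a formal density test.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 5000
cutoff = 40.0

# Scores as they would be without any grader reluctance.
honest = np.round(rng.normal(50.0, 15.0, size=n))

# Hypothetical grader behavior: 80% of scores within 3 points below the
# cutoff are bumped up to exactly the cutoff.
near_miss = (honest >= cutoff - 3) & (honest < cutoff)
reluctant = rng.random(n) < 0.8
graded = np.where(near_miss & reluctant, cutoff, honest)


def counts_around_cutoff(scores, width=3):
    """Count scores in the bands just below and just at/above the cutoff."""
    below = int(np.sum((scores >= cutoff - width) & (scores < cutoff)))
    above = int(np.sum((scores >= cutoff) & (scores < cutoff + width)))
    return below, above


honest_below, honest_above = counts_around_cutoff(honest)
graded_below, graded_above = counts_around_cutoff(graded)

print(f"honest scores: {honest_below} just below, {honest_above} just above")
print(f"graded scores: {graded_below} just below, {graded_above} just above")
```

A dip just below the cutoff paired with a pile-up just above it, relative to what a smooth distribution would give, is the signature described above.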

A Related Problem: Crossover

In a randomized experiment, people sometimes fail to receive the assigned condition. In RDD this also happens, as failure to adhere to the cutoff. Solutions are the same as in an RE, e.g., intent-to-treat (ITT) analysis and instrumental variables (IV).

How Problematic is Misassignment?

Just as a slightly degraded randomized experiment may still produce better results than many quasi-experiments, so a slightly misassigned RD design may produce better results than many other quasi-experiments. Rules of thumb: the fewer the participants who are misassigned, the less the problem; and the narrower the range of misassignment around the cutoff, the less the problem.

Coping with Fuzzy RD: Intent-to-Treat

Monitor both assignment and treatment received so that you know whether violations of the cutoff occur. Intent-to-treat (ITT): in a randomized experiment, it is standard to analyze the data according to how people were assigned, not according to which treatment they actually received. This preserves the internal validity of the design, but it answers the question "what is the effect of assignment to treatment," which is different from "what is the effect of treatment." One can apply this same logic (ITT) to RDD as well. Just as with a randomized experiment, the more crossovers there are, the worse the problem.

Coping with Fuzzy RD: Analysis

Sensitivity analysis: run the analysis both including and excluding crossovers; if the results are similar, confidence increases. A number of authors suggest that if crossovers are no more than 5% of cases, they can probably be excluded with little problem. It may be worth trying some of the methods used in randomized experiments to cope with this problem, though we know of no research on this: instrumental variables, propensity scores, and selection bias models.
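To make the ITT and instrumental-variable ideas concrete, here is a rough sketch on simulated data. Everything in it is assumed for illustration: the cutoff of 50, a 10% rate of assigned cases who do not receive treatment (at random, with no control cases crossing over), and an effect of 5 points for those actually treated. The instrumental-variable step uses the simplest ratio form (jump in outcome divided by jump in treatment received), borrowed from practice in randomized experiments rather than taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 4000
cutoff = 50.0
true_effect = 5.0  # hypothetical effect of actually receiving treatment

av = rng.uniform(0.0, 100.0, size=n)  # assignment variable
centered = av - cutoff
assigned = (av >= cutoff).astype(float)

# Fuzzy adherence: 10% of assigned cases, chosen at random, do not
# receive treatment. No control cases cross over in this sketch.
no_show = rng.random(n) < 0.10
received = assigned * (~no_show)

y = 20.0 + 0.6 * av + true_effect * received + rng.normal(0.0, 2.0, size=n)


def jump_at_cutoff(outcome):
    """Jump in `outcome` at the cutoff, analyzed by ASSIGNED condition."""
    X = np.column_stack([np.ones(n), centered, assigned])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[2]


itt = jump_at_cutoff(y)                  # effect of assignment to treatment
uptake_jump = jump_at_cutoff(received)   # jump in treatment actually received
iv_estimate = itt / uptake_jump          # ratio (instrumental-variable) form

print(f"ITT estimate:             {itt:.2f}")
print(f"jump in treatment uptake: {uptake_jump:.2f}")
print(f"IV (ratio) estimate:      {iv_estimate:.2f}")
```

In this setup the ITT estimate is diluted by the no-shows, and dividing by the jump in uptake rescales it toward the effect of treatment received. That rescaling leans on the no-shows being unrelated to outcomes, which real data may not satisfy.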

Summary

RD is the only quasi-experimental design that yields an unbiased estimate when properly implemented and analyzed. RD can be used with both archival data and original data. But power and statistical analysis require careful attention.