Next: An Improved Method for Identifying Impacts in Regression Discontinuity Designs


Mark C. Long (corresponding author), University of Washington, Seattle, WA
Jordan Rooklyn, University of Washington

Working Paper, August 31, 2016

Abstract: This paper develops and advocates for a data-driven algorithm that simultaneously selects the polynomial specification and bandwidth combination that minimizes the predicted mean squared error at the threshold of a discontinuity. It achieves this selection by evaluating the combinations of specification and bandwidth that perform best in estimating the next point in the observed sequence on each side of the discontinuity. We illustrate this method by applying it to data with a simulated treatment effect to show its efficacy for regression discontinuity designs, and we reexamine the results of papers in the literature.

Keywords: Program evaluation; regression discontinuity

Acknowledgments: Brian Dillon, Dan Goldhaber, Jon Smith, Jake Vigdor, Ted Westling, and University of Washington seminar audience members provided helpful feedback on the paper. This research was partially supported by a grant from the U.S. Department of Education's Institute of Education Sciences (R305A140380).

1. Introduction and Literature Review

Regression discontinuity (RD) designs have become a very popular method for identifying the local average treatment effect of a program. In many policy contexts, estimating treatment effects via social experiments is not feasible due to either cost or ethical considerations. Furthermore, in many contexts, allocating a treatment on the basis of some score (often a score that reflects the individual's worthiness of receiving the treatment) seems natural. RD holds the promise of having some of the advantages of random treatment allocation (assuming that being just above or just below the threshold score for receiving the treatment is effectively random) without the adverse complications of full-blown randomized experiments.

However, RD designs present a challenge for researchers: how to identify the predicted value of the outcome (Y) as the score (X) approaches the threshold (T) from both the left and the right hand side of that threshold. A number of guides to standard practice have been written during the past ten years; the highly cited guide by Lee and Lemieux (2010) provides the following guidance: [1]

"When the analyst chooses a parametric functional form (say, a low-order polynomial) that is incorrect, the resulting estimator will, in general, be biased. When the analyst uses a nonparametric procedure such as local linear regression (essentially running a regression using only data points close to the cutoff), there will also be bias. ... Our main suggestion in estimation is to not rely on one particular method or specification" (p. 284).

To illustrate this point, Lee and Lemieux reanalyze the data from Lee (2008), who evaluated the impact of party incumbency on the probability that the incumbent party will retain the district's seat in the next election for the U.S. House of Representatives. In this analysis, X is defined as the Democratic vote share in year t minus the vote share of the Democrats' strongest opponent (virtually always a Republican) (Lee, 2008, p. 686). Lee and Lemieux estimate the treatment effect by using polynomials ranging from order zero (i.e., the average of prior values) up to a 6th order polynomial, with the same order polynomial estimated for both sides of the discontinuity and with bandwidths ranging from 1% to 100% (i.e., using all of the data).

[1] For other discussions of standard methods, see Imbens and Lemieux (2008), DiNardo and Lee (2010), Jacob et al. (2012), and Van Der Klaauw (2013).

For each bandwidth, they identify the optimal order of the polynomial by selecting the one with the lowest value of the Akaike information criterion (AIC). And they identify an optimal bandwidth by choosing "the value of h that minimizes the mean square of the difference between the predicted and actual value of Y" (p. 321). As shown in Table 2 of their paper, using the optimal bandwidth, which is roughly 5%, and the optimal order of the polynomial for this bandwidth (quadratic), the estimated effect of incumbency on the Democratic Party's vote share in year t+1 is (s.e. = 0.029). While this model selection procedure has the nice feature of selecting the specification and bandwidth optimally, it has two limitations: (1) it suggests that a particular order of the polynomial and bandwidth be used on both sides of the discontinuity, and (2) the AIC evaluates the fit of the polynomial at all values of X and does not attempt to evaluate the fit of the polynomial as X approaches the threshold, which is more appropriate for RD treatment effect estimation.

Gelman and Imbens (2014) argue against using high order polynomial regressions to estimate treatment effects in an RD context and instead recommend that researchers "control for local linear or quadratic polynomials or other smooth functions" (p. 2). We focus here on their second critique: "Results based on high order polynomial regressions are sensitive to the order of the polynomial. Moreover, we do not have good methods for choosing that order in a way that is optimal for the objective of a good estimator for the causal effect of interest. Often researchers choose the order by optimizing some global goodness of fit measure, but that is not closely related to the research objective of causal inference" (p. 2). The goal of our paper is to provide an optimal method for choosing the polynomial order (as well as the bandwidth) that Gelman and Imbens (2014) note is currently lacking in the literature.

Gelman and Zelizer (2015) illustrate the challenges that could come from using a higher-order polynomial by critiquing a prominent paper by Chen, Ebenstein, Greenstone, and Li (2013), described in greater detail below, which examines the effect of an air pollution policy on life expectancy. Gelman and Zelizer note:

"[Chen et al.'s] cubic adjustment gave an estimated effect of 5.5 years with standard error 2.4. A linear adjustment gave an estimate of 1.6 years with standard error 1.7. The large, statistically significant estimated treatment effect at the discontinuity depends on the functional form employed. ... the headline claim, and its statistical significance, is highly dependent on a model choice that may have a data-analytic purpose, but which has no particular scientific basis" (pp. 3-4).

Gelman and Zelizer conclude that "we are not recommending global linear adjustments as an alternative. In some settings a linear relationship can make sense. What we are warning against is the appealing but misguided view that users can correct for arbitrary dependence on the forcing variable by simply including several polynomial terms in a regression" (p. 6). In the case study in Section 3.3 of this paper, we re-examine the Chen et al. results using our method. We show that Gelman and Zelizer's concerns are well founded; our method shows that the estimated effect of pollution on life expectancy is much smaller.

In addition to finding the most appropriate form for the specification, researchers also face the challenge of deciding whether to estimate the selected specification over the whole range of X (that is, a global estimate of Y = f_0(X) and Y = f_1(X), where f_0(·) and f_1(·) reflect the function on the left and right sides of the threshold) or to estimate the selected specification over a narrower range of X near T, a local approach. Imbens and Kalyanaraman (2012) argue for using a local approach and develop a technique for finding the optimal bandwidth. The Imbens and Kalyanaraman bandwidth selection method is devised for the estimation of separate local linear regressions on each side of the threshold. They note that ad hoc approaches for bandwidth choice, such as "standard plug-in and cross-validation methods[,] are typically based on objective functions which take into account the performance of the estimator of the regression function over the entire support" and do not yield optimal bandwidths for the problem at hand (p. 934). Their method, in contrast, finds the bandwidth that minimizes mean squared error at the threshold. Imbens and Kalyanaraman caution that their method, which we henceforth label IK, "gives a convenient starting point and benchmark for doing a sensitivity analysis regarding bandwidth choice" (p. 940), and thus they remind the user to examine the results using other bandwidths.

While the IK method greatly helps researchers by providing a data-generated method for choosing the optimal bandwidth, it does so by assuming that the researcher is using a local linear regression on both sides of the threshold. This can introduce substantial bias if (1) a linear regression is the incorrect functional form and (2) the treatment changes the relationship between Y and X. Our method, thus, simultaneously selects the optimal polynomial order and the optimal bandwidth for each side of the discontinuity. We achieve this result by evaluating the performance of various combinations of order and bandwidth, with performance measured as mean squared error in predicting the observed values of Y as X approaches the threshold (from either side); estimating the mean squared error at the threshold as a weighted average of prior mean squared errors, with greater weight on mean squared errors close to the threshold; and identifying the specification/bandwidth combination that has the lowest predicted mean squared error at the threshold. We show that our method does modestly better than the IK method when applied to real data with a simulated treatment effect. We then apply our method to data from two prominent papers (Lee (2008) and Chen et al. (2013)) and document the extent to which our method produces different results.

2. Method

The goal of RD studies is to estimate the local average treatment effect, defined as the expected change in the outcome for those whose score is at the threshold: \tau = E[Y_i(1) - Y_i(0) | X_i = T], where Y_i(0) is the value of Y if observation i is untreated and Y_i(1) is the value of Y if the treatment is received. Assume that treatment occurs when X_i ≥ T. [2] Assume that there is a smooth and continuous relationship between Y and X in the range X < T and that this relationship can be expressed as Y = f_0(X). Likewise, assume that there is a smooth and continuous relationship between Y and X in the range X ≥ T and that this relationship can be expressed as Y = f_1(X).

[2] Note that our method is designed for sharp regression discontinuities, where treatment is received by all those who are on one side of a threshold and not received by anyone on the other side of the threshold. In fuzzy contexts, where there is a discontinuity in the probability of receiving treatment at the threshold, one can obtain estimates of the local effect of the treatment on the treated by computing the ratio of the discontinuity in the outcome at the threshold and the discontinuity in the probability of receiving treatment at the threshold. When applied in the context of fuzzy RDs, our method will identify the intent-to-treat estimate for those at the threshold, but will not yield an estimate of the local average treatment on the treated effect.

Assuming that the only discontinuity in the relationship between Y and X at X = T is due to the impact of the treatment, the estimand, \tau, is defined as the difference of the two estimated functions evaluated at the threshold: \hat{\tau} = \hat{f}_1(T) - \hat{f}_0(T). Define the mean squared prediction error at the threshold, MSPE(T), as follows: MSPE(T) = E[(\hat{\tau} - \tau)^2]. Our goal is to select the bandwidths (h_0 and h_1) and orders of the polynomials (p_0 and p_1) for estimating f_0 and f_1 such that MSPE(T) is minimized [3]:

(1)  argmin_{h_0, h_1, p_0, p_1} MSPE(T)
     = argmin_{h_0, h_1, p_0, p_1} E[((\hat{f}_1(T) - \hat{f}_0(T)) - (f_1(T) - f_0(T)))^2]
     = argmin_{h_0, h_1, p_0, p_1} { E[(\hat{f}_1(T) - f_1(T))^2] + E[(\hat{f}_0(T) - f_0(T))^2]
                                      - 2 E[(\hat{f}_1(T) - f_1(T)) (\hat{f}_0(T) - f_0(T))] }

To this point, the minimization problem is unconstrained and standard. Imbens and Kalyanaraman (2012) add the following constraints to this problem: h_0 = h_1 = h and p_0 = p_1 = 1. That is, they assume linear relationships between Y and X in the ranges T - h ≤ X < T and T ≤ X ≤ T + h, with the treatment effect, \tau, being identified as the jump between those two linear functions at X = T.

[3] Note that choosing a higher bandwidth allows for more data to be used in estimating f(·), which reduces the variance of the estimated parameters. But a larger bandwidth increases the chance that f(·) is not constant and smooth within the range in which it is estimated. A higher polynomial order can improve the fit of the function f(·) to the observed distribution of X and Y, and thus lowers the bias. But a higher polynomial order leads to increased variance of the prediction, particularly in the tails of the distribution (e.g., at X = T). By minimizing MSPE(T) through the choice of these parameters, we balance between our desires for low bias and low variance.

We take a different approach, which involves a different set of simplifying assumptions. First, unlike IK, our approach allows the treatment to more flexibly change the functional relationship between Y and X, as we do not assume linear functions on either side of the discontinuity. Our method has f_0(·) estimated solely on data where X < T, and f_1(·) estimated solely on data where X ≥ T. This approach is akin to the common practice in RD studies of estimating one regression which fully interacts the polynomial terms with an indicator for treatment. [4] Second, we simplify the minimization problem considerably by dropping the last term of Equation 1 (i.e., -2 E[(\hat{f}_1(T) - f_1(T)) (\hat{f}_0(T) - f_0(T))]). Here is our justification for doing so. Suppose that for a given choice of p_0 and h_0, the prediction error on the left side of the threshold is positive (i.e., \hat{f}_0(T) - f_0(T) > 0). One could attempt to select p_1 and h_1 such that the prediction error on the right side of the threshold is also positive (i.e., \hat{f}_1(T) - f_1(T) > 0) and equal to the bias on the left so as to cancel it. In fact, one could carry this further and select p_1 and h_1 such that the error on the right side of the threshold is as positive as possible, thus making the last term as negative as possible (a point Imbens and Kalyanaraman note as well). However, doing so comes at a penalty of increasing the square of the prediction error on the right side (i.e., E[(\hat{f}_1(T) - f_1(T))^2]) and thereby results in a higher MSPE(T). Thus, there is little to be gained by selecting p_1 and h_1 on the basis of the last term in Equation 1. If we can ignore this term, we substantially simplify the task by breaking it into two separate problems:

(2)  argmin_{h_0, p_0} E[(\hat{f}_0(T) - f_0(T))^2]   and   argmin_{h_1, p_1} E[(\hat{f}_1(T) - f_1(T))^2]

The advantage of our approach is that we can directly evaluate how different choices of polynomial order and bandwidth perform in predicting observed outcomes before one reaches the threshold, and pick the values that have demonstrated strong performance in terms of their mean squared prediction errors for observed values. Our key insight is that by focusing on data from one side of the threshold only, we can use that observed data to calculate a series of MSPEs and then predict the MSPE at the threshold on that side as a weighted average of the observed MSPEs (along with a confidence interval around that weighted average).

[4] Note that in such models f_0(·) and f_1(·) are in effect estimated solely based on data from their respective sides of the threshold, as the same coefficients could be obtained by separate polynomial regressions on each side of the threshold. Put differently, no information from the right hand side is being used to estimate the coefficients on the left hand side and vice-versa.
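The equivalence claimed in footnote [4] is easy to check numerically. The sketch below (our illustration, using arbitrary simulated data rather than any data from the paper) verifies that a single regression that fully interacts the polynomial terms with a treatment indicator returns the same coefficients as two separate side-specific polynomial regressions:

```python
# Illustrative check (ours, not from the paper): a fully interacted polynomial regression
# reproduces the coefficients of two separate side-specific regressions.
# The simulated data and the threshold T below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
T = 0.0
x = rng.uniform(-1, 1, 500)
d = (x >= T).astype(float)                                  # treatment indicator
y = 1.0 + 2.0 * x - 0.5 * x**2 + 3.0 * d + rng.normal(0, 0.1, 500)

p = 2                                                       # quadratic on each side
poly = np.column_stack([x**k for k in range(p + 1)])        # columns: 1, x, x^2

# (a) one regression fully interacting the polynomial with the treatment indicator
X_full = np.column_stack([poly * (1 - d)[:, None], poly * d[:, None]])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# (b) separate polynomial regressions on each side of the threshold
beta_left, *_ = np.linalg.lstsq(poly[d == 0], y[d == 0], rcond=None)
beta_right, *_ = np.linalg.lstsq(poly[d == 1], y[d == 1], rcond=None)

print(np.allclose(beta_full, np.concatenate([beta_left, beta_right])))  # True
```

Because each interacted column is zero off its own side of the threshold, the normal equations decouple, which is why the two approaches coincide.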

We recognize, however, that if the treatment does not affect the functional relationship between Y and X (e.g., if f_0(·) and f_1(·) have the same slopes), then our method would be inefficient (but unbiased), as one would gain power to estimate the common slope parameters of f_0(·) and f_1(·) by using data on both sides of the threshold.

Index the observed distinct values of X on one side of the threshold as j = 1 to J, moving toward the threshold. Define MSPE_j as equal to (Y_j - \hat{Y}_j)^2, where \hat{Y}_j is the prediction from a polynomial of order p that is estimated over the interval from X_{j-n} to X_{j-1} using the observed distributions of X and Y in this interval, n reflects the number of prior observations that are used to estimate the polynomial, and Y_j is the observed value of Y when X = X_j. Note that this formula uses an adaptive bandwidth that is a function of n (i.e., the width of the interval spanned by the n prior observations) to accommodate areas where the data are thin.

Suppose that we estimated the MSPE at the threshold as a straight average of these calculated values of MSPE_j, and then selected the parameters p and n that minimized this straight average. One disadvantage of doing so would be that it would ignore variance across the values of MSPE_j and would not consider the number of observations of MSPE_j used to compute this average. [5] Less confidence should be placed in estimates of the threshold MSPE that rely on fewer or more variable observations of MSPE_j. Thus, rather than select the parameters p and n that minimize the average, we select parameters p and n that minimize the upper bound of an 80% confidence interval around the average (i.e., such that there is only a 10% chance that the true, unknown, mean value of the broader distribution from which our observations are drawn is greater than this upper bound). [6] A second disadvantage of a straight average is that it places equal weight on the calculated values of MSPE_j regardless of how far X_j is from the threshold. So as to place more weight on the calculated values of MSPE_j for which X_j is close to T, we estimate the MSPE at the threshold as a weighted average of the calculated values of MSPE_j:

(3)  \widehat{MSPE}(T) = \sum_j w_j MSPE_j / \sum_j w_j,

where w_j is a kernel function (defined below).

[5] The number of observations of MSPE_j declines by 1 for each unit increase in either p or n.
[6] As with all confidence intervals, the choice of 80% is arbitrary. Different values can be set by the user of our Stata program for executing this method (Long and Rooklyn, 2016).

We then find the parameters that solve argmin_{p, n} [ \widehat{MSPE}(T) + z · se(\widehat{MSPE}(T)) ], where se(\widehat{MSPE}(T)) is the estimated standard error of \widehat{MSPE}(T) and z is the critical value corresponding to the chosen confidence level (80% in our applications), so that the objective is the upper bound of the confidence interval around the weighted average. To find these parameters, we compute MSPE_j for all combinations of p and n subject to the following constraints: p and n are integers; max(n_min, p + 1) ≤ n ≤ min(n_max, j - 1), where n_min and n_max are the minimum and maximum number of prior observations the researcher is willing to allow to be used in computing \hat{Y}_j; and p_min ≤ p ≤ p_max, where p_min and p_max are the minimum and maximum polynomial orders the researcher is willing to consider, with p ≥ 0, and where, when p = 0, \hat{Y}_j is defined as the average of the n prior values of Y. We select the combination of p and n (among those that are considered) that minimizes this upper bound. [7]

In our empirical investigations below, we use an exponential kernel, defined as follows:

(4)  w_j = b^{u_j},

where b is the base weight and u_j rises linearly from 0 at the MSPE farthest from the threshold to 1 at the MSPE closest to the threshold. We alternately explore base weights equal to 1, 10^3, 10^6, and 10^10. When b = 1, w(·) is the uniform kernel, which gives uniform weight to each value of MSPE_j when estimating \widehat{MSPE}(T). When b = 10^3 (10^6) [10^10], while all MSPE_j's get some positive weight, 50% (75%) [90%] of the weight is placed on the last 10% of MSPE_j's that are closest to the threshold. That is, a higher value of b gives more emphasis to MSPE_j's closer to the threshold than further away.

We repeat this process to estimate the parameters that solve the analogous problem on the other side of the discontinuity, with the only difference being that we index the observed distinct values of X from the extreme right moving in towards T.

[7] Note that if using a linear specification with only the last two data points (or any polynomial of order p using the last p + 1 data points), there will be no variance in the estimate of Y at the threshold. If this occurs for both sides of the discontinuity, there would be no variance to the estimate of the jump at the threshold. Such a lack of variance of the difference at the discontinuity would disallow hypothesis testing (or conclude that there is an infinite t-statistic). As this result is unsatisfactory in most contexts, the reader may want to disallow such specification/bandwidth combinations.
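To make the base weights concrete, the following sketch (ours; the index-based normalization of u_j is one natural implementation of the kernel described above, not necessarily the exact one used in the Stata program) computes the share of total weight that falls on the 10% of MSPEs closest to the threshold:

```python
# Sketch of the exponential kernel weights (our reconstruction of Equation 4):
# w_j = b ** u_j, with u_j running from 0 (farthest MSPE) to 1 (MSPE nearest the threshold).
import numpy as np

def kernel_weights(num_mspes, base_weight):
    u = np.linspace(0.0, 1.0, num_mspes)
    return base_weight ** u

J = 200
for b in (1, 1e3, 1e6, 1e10):
    w = kernel_weights(J, b)
    share_last_10pct = w[int(0.9 * J):].sum() / w.sum()
    print(f"base weight {b:>12g}: weight share on the last 10% = {share_last_10pct:.2f}")
# prints roughly 0.10, 0.50, 0.75, and 0.90, matching the description in the text
```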

To illustrate our method, suppose we had a series of six data points with (X, Y) coordinates (1,12), (2,15), (3,16), (4,13), (5,10), and (6,7), and we would like to use this information to estimate the next value of Y (when X = 7). These six points are shown in Panel A of Figure 1. Our task is to find the specification that generally performs well in predicting the next value of Y, and more specifically, as discussed above, has a low MSPE for X = 7. The argument for imposing a limited bandwidth, and not using all of the data points to predict the next value of Y, is a presumption that there has been a change in the underlying relationship between Y and X; for example, a discrete jump in the value of Y (perhaps unrelated to X), or a change in the function defining the relationship Y = f(X). If such a change occurred, then limiting the bandwidth would (ideally) constrain the analysis to the range in which f(X) is steady. In the example discussed above, there does appear to be a change in f(X), as the function appears to become linear after X = 3. Of course, this apparent change could be a mirage, and the underlying relationship could in fact be quadratic with no change. If there is no change in the relationship between Y and X, then one would generally want to use all available data points to best estimate f(X). Our method for adjudicating between these specification and bandwidth choices is to compare all possibilities based on the predicted MSPE at the threshold (and the upper bound of its confidence interval). Panels B through F of Figure 1 show the performance of possible candidate estimators. The corresponding Table 1 illustrates our method, where Panel A gives the predicted values based on polynomial orders in the range 0 to 2, and Panel B gives the calculation of each MSPE_j for the feasible combinations of p and n. Note that since the last four observations happen to be on a line (i.e., (3,16), (4,13), (5,10), and (6,7)), the linear specification using two prior data points has no error in predicting the values of Y when X equals 5 or 6, and the same is true for either the linear or quadratic specification using three prior values for predicting the value of Y when X equals 6.

[Insert Figure 1]
[Insert Table 1]

Panel C of Table 1 shows the weighted averages using various kernels. A linear specification using two prior data points has the lowest weighted average MSPE using all four base weights, as indicated by the bolded numbers. [8]

This result is not surprising given the perfect linear relation of Y and X for the last four data points. As one can see, as the base weight increases, the weighted average approaches the value of the last MSPE in the series. There is clearly a trade-off involved here. With greater weight placed on the last MSPEs in the series, one gets less bias in the estimate of the MSPE at the threshold, as less weight is placed on MSPEs far away from the threshold. However, relying solely on the last MSPE (i.e., MSPE_J) could invite error: a particular specification might accidentally produce a near-perfect prediction for the last values of Y before the threshold, and thus have a lower MSPE_J, but incorrectly predict the unknown value of Y at the threshold. Panel D of Table 1 presents the upper bound of the 80% confidence interval around the weighted average. Note that the linear specification using two prior data points has the lowest upper bound for three of the four base weights (the exception being the uniform weight). Since high base weights produce wider confidence intervals, as they increase the sample standard deviation of the weighted average, using this upper bound of the confidence interval helps avoid "unhappy accidents" that could occur when using only MSPE_J. When we apply our method to simulated data, we find that the performance is relatively insensitive to the base weight, although we favor b = 10^3 given its strong performance documented below.

Our Stata program for executing this method (Long and Rooklyn, 2016) allows the user to (a) select the minimum number of MSPE_j's that must be included in the analysis (at least 2), excluding from consideration combinations of bandwidth and polynomial order that result in few observations of MSPE_j, and thus avoid "unhappy accidents"; (b) select the minimum and maximum order of the polynomial that the user is willing to consider; (c) select the minimum number of observations the researcher is willing to allow to be used to estimate the next observation; and (d) select the desired confidence interval for \widehat{MSPE}(T). For the rest of the paper (excluding Section 3.3), we set the minimum number of MSPEs to five, the minimum and maximum polynomial orders to zero and five, the minimum number of observations to five, and the confidence interval to 80%.

[8] If there are ties for the lowest value, which did not occur in Table 1, we select the specification with the lowest order polynomial (and ties for a given specification are adjudicated by selecting the smaller bandwidth). We make these choices given the preference in the literature for narrower bandwidths and lower-order polynomials.
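The selection logic of Table 1 can be sketched in a few lines of code. The example below (our illustration, not the Long and Rooklyn (2016) Stata program) computes MSPE_j for each feasible order/bandwidth combination on the six example points, forms the kernel-weighted average, and picks the minimizer; for brevity it omits the confidence-interval upper bound used in the final selection rule:

```python
# Sketch of the Table 1 selection step (ours, not the Long and Rooklyn Stata program).
import numpy as np

xs = np.array([1.0, 2, 3, 4, 5, 6])
ys = np.array([12.0, 15, 16, 13, 10, 7])
base_weight = 1e3

def predict_next(x_prior, y_prior, x_next, order):
    # an order-0 "polynomial" is simply the mean of the prior values
    if order == 0:
        return y_prior.mean()
    coefs = np.polyfit(x_prior, y_prior, order)
    return np.polyval(coefs, x_next)

weighted_mspe = {}
for p in range(0, 3):                                   # polynomial orders 0, 1, 2
    for n in range(max(2, p + 1), len(xs) - 1):         # prior points; at least 2 MSPEs required
        errors, positions = [], []
        for j in range(n, len(xs)):                     # predict Y at xs[j] from the n prior points
            pred = predict_next(xs[j - n:j], ys[j - n:j], xs[j], p)
            errors.append((ys[j] - pred) ** 2)
            positions.append(j)
        u = (np.array(positions) - positions[0]) / max(positions[-1] - positions[0], 1)
        w = base_weight ** u                            # exponential kernel: more weight near the threshold
        weighted_mspe[(p, n)] = np.average(errors, weights=w)

best = min(weighted_mspe, key=weighted_mspe.get)
print("selected (order, prior points):", best)          # the linear, two-prior-point rule wins here
```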

In the next section, we illustrate the method by applying it to simulated data and use the method to re-evaluate examples from the existing literature.

3. Case Studies That Illustrate the Method

3.1 Case Study 1: Method Applied to Jacob et al. (2012) with a Simulated Treatment

Jacob, Zhu, Somers, and Bloom (2012) provide a primer on how to use RD methods. They illustrate contemporary methods using a real data set with a simulated treatment effect, described as follows: "The simulated data set is constructed using actual student test scores on a seventh-grade math assessment. From the full data set, we selected two waves of student test scores and used those two test scores as the basis for the simulated data set. One test score (the pretest) was used as the rating variable and the other (the posttest) was used as the outcome. We picked the median of the pretest (= 215) as the cut-point (so that we would have a balanced ratio between the treatment and control units) and added a treatment effect of 10 scale score points to the posttest score of everyone whose pretest score fell below the median" (pp. 7-8).

We utilize these data provided by Jacob et al. to illustrate the efficacy of our method. Since the test scores are given in integers, and since the number of students located at each value of the pretest score differs, we add a frequency weight to the regressions in constructing our predicted values, and the weight for computing the weighted average MSPE becomes w_j · n_j, where n_j is the number of observations that have that value of X. In the first panel of Table 2, we estimate the simulated treatment effect (which should be -10 by construction) with the threshold at 215. Our method selects a linear specification using 23 data points for the left hand side and a quadratic specification with 33 data points for the right hand side (these selections are not sensitive to the base weight). Compared to the IK method, which selects a bandwidth of 6.3 for both sides, our method selected a much larger bandwidth. [9]

[9] To estimate these IK bandwidths and resulting treatment effect estimates, we use the rd command for Stata that was developed by Nichols (2011) and use local linear regressions (using the triangular "edge" kernel weights) within the selected bandwidth. We also find nearly identical results using the rdob program for Stata written by Fuji, Imbens, and Kalyanaraman (2009).
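As a small aside before turning to the results, the frequency weighting described above can be sketched as follows (our illustration; the score, mean-outcome, and count arrays are placeholders rather than the actual Jacob et al. data):

```python
# Sketch of frequency weighting when X takes repeated integer values (our illustration).
# scores, mean_posttest, and counts stand in for data collapsed to one row per score.
import numpy as np
import statsmodels.api as sm

scores = np.array([201.0, 202, 203, 204, 205])          # distinct pretest scores (placeholders)
mean_posttest = np.array([210.0, 212, 211, 215, 214])   # mean outcome at each score (placeholders)
counts = np.array([14, 9, 22, 17, 11])                  # number of students at each score (placeholders)

order = 1
X = np.column_stack([scores ** k for k in range(order + 1)])

# WLS with frequency weights reproduces the coefficient estimates an OLS fit
# would give on the un-collapsed student-level data
fit = sm.WLS(mean_posttest, X, weights=counts).fit()
pred_at_next_score = fit.params @ np.array([206.0 ** k for k in range(order + 1)])
print(pred_at_next_score)

# when averaging the squared prediction errors, the kernel weight at each score
# is multiplied by the number of observations at that score (w_j * n_j)
```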

Our method outperforms IK with a slightly better estimate of the treatment effect (-9.36 versus ) and smaller standard errors (0.73 versus 1.27). The much smaller standard error provides our method more power than IK to correctly identify smaller treatment effects.

[Insert Table 2]

The second and third panels of Table 2 reset the threshold for the simulated effect to 205 and 225, which are respectively at the 19th and 77th percentiles of the distribution of X. With the threshold at 205, our model produces estimates of the simulated treatment effect in the range of to with base weights of 1 to 10^6, and with a base weight of 10^10. Regardless of the base weight, our method selects a quadratic specification using the first 47 observations on the right side of the discontinuity. In contrast, the IK method uses a bandwidth of only 7.3 on both sides of the discontinuity and yields an inferior estimate of the treatment effect (-8.25) with a higher standard error. Our method and the IK method produce comparable estimates of the treatment effect when the threshold is set at 225 ( to for our method versus for IK), yet our method again has smaller standard errors due to more precision in the estimates of the regression line. Figure 3 illustrates our preferred specifications and bandwidths for these three thresholds using 10^3 as the base weight.

[Insert Figure 3]

The next analysis, which is shown in Table 3, evaluates how our method performs when there is a zero simulated treatment effect. We restore the Jacob et al. data to have no effect and then estimate placebo treatment effects with the threshold set at 200, 205, ..., 230. We are testing whether our method generates false positives: apparent evidence of a treatment effect when there is no treatment. Our model yields estimated treatment effects that are generally small and lie in the range of to . The bad news is that 2 of the 7 estimates are significant at the 10% level (1 at the 5% level). Thus, a researcher who uses our method would be more likely to incorrectly claim a small estimated treatment effect to be significant. The IK method does better at not finding significant placebo effects in the Jacob et al. (2012) data (none of the IK estimates are significant). However, the IK estimates have a broader range of to . Thus the researcher using the IK method would be more inclined to incorrectly conclude that the treatment had a sizable effect even when the policy had no effect.

The mean absolute error for this set of estimates is 0.76 using our method versus 0.97 using the IK method. The only reason that our method is more likely to incorrectly find significant effects is our lower standard errors, which lie in the range of 0.68 to 1.10, versus the IK standard errors, which lie in the range of 1.22 to . Thus, we conclude that our higher rate of incorrectly finding significant effects is not a bug but a feature. The researcher who uses our method and finds an insignificant effect can argue that it is a well-estimated zero, while that advantage is less likely to be present using IK.

[Insert Table 3]

To further investigate the efficacy of our method and to compare it to IK's method, we augment the Jacob et al. (2012) data by adding a cubic function of the pretest score to the outcome. This cubic augmentation increases the posttest by up to a local maximum of 7.7 points at a pretest score of 206, then declines to a local minimum of at 239, and then curves upward again. We then estimate simulated treatment effects of 10 points for those below various thresholds, alternately set at 200, 205, ..., 230. This simulated treatment effect, added to an underlying cubic relation between the pretest and the augmented posttest, should be harder to identify using the IK method, as it relies on an assumption of local linear relations. We furthermore evaluate our method relative to IK where the augmentation of the posttest only occurs on the left or right side of the specification. Note that since a treatment could have heterogeneous effects, and thus larger or smaller effects away from the threshold, it is possible for the treatment to not only have a level effect at the threshold, but also to alter the relationship between the outcome (Y) and the score (X). [10] Our method should have a better ability to handle such cases, and thus to derive a better estimate of the local effect at the threshold.

The results are shown in Table 4 and the corresponding graphical representations are shown in Figure 4. In Panel A of Table 4, we show the results with the cubic augmentation applied to both sides of the threshold. Across the seven estimations, our method produces an average absolute error of 0.94, which is a 7% improvement on the average absolute error found using the IK method. In Panels B and C of Table 4, we show the results with the cubic augmentation applied to the left and right sides of the threshold, respectively.

[10] When we add the augmentation to the left hand side only, we level-shift the right hand side up or down so that there is a simulated effect of -10 points at the threshold, and vice-versa.

Our method is particularly advantageous when the augmentation is applied to the right side: for these estimations, our method produces an average absolute error that is 30% lower than the average absolute error using the IK method. As shown in Figure 4, the principal advantage of our method is the adaptability of the bandwidth and curvature given the available evidence on each side of the threshold.

[Insert Table 4]
[Insert Figure 4]

Having now (hopefully) established the utility of our method, in the next two sections we apply the method to two prominent papers in the RD literature.

3.2 Case Study 2: Method Applied to Data from Lee (2008)

Our second case study applies our method to re-estimate findings in Lee (2008), discussed in Section 1. First, we re-examine the result shown in Lee's Figure 2a. Y is an indicator variable that equals 1 if the Democratic Party won the election in that district in year t+1. The key identifying assumption is that there is a modest random component to the final vote share (e.g., rain on Election Day) that cannot be fully controlled by the candidates and that, effectively, "whether the Democrats win in a closely contested election is...determined as if by a flip of a coin" (p. 684). Lee's data come from U.S. Congressional election returns from 1946 to 1998 (see Lee (2008) for a full description of the data). [11]

The Lee data present a practical challenge for our method. They contain 4,341 and 5,332 distinct values of X on the left and right sides of the discontinuity. Using every possible number of prior values of X to predict Y at all distinct values of X, while possible, requires substantial computer processing time. To reduce our processing time, we compute the average value of X and Y within 200 bins on each side of the discontinuity, with each bin having a width of 0.5% (since X ranges from -100% to +100%, with the discontinuity at 0%). Binning the data in this way has the disadvantage of throwing out some information (i.e., the upward or downward sloping relationship between X and Y within the bin); yet, for most practical applications this information loss is minor if the bins are kept narrow.

[11] We obtained these data on January 2, 2015.
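A minimal sketch of the binning step described above (ours; x and y stand in for the margin-of-victory and outcome arrays):

```python
# Sketch of the 0.5%-wide binning described above (our illustration).
# x is the margin of victory in percent (-100 to 100) and y is the outcome; both are placeholders.
import numpy as np

def bin_one_side(x, y, lo, hi, width=0.5):
    edges = np.arange(lo, hi + width, width)
    idx = np.digitize(x, edges) - 1                     # bin index for each observation
    bin_x, bin_y, bin_n = [], [], []
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():                                # skip bins with no elections in them
            bin_x.append(x[in_bin].mean())
            bin_y.append(y[in_bin].mean())
            bin_n.append(int(in_bin.sum()))
    return np.array(bin_x), np.array(bin_y), np.array(bin_n)

# each side of the discontinuity is binned separately, e.g.:
# xl, yl, nl = bin_one_side(x[x < 0], y[x < 0], -100.0, 0.0)
# xr, yr, nr = bin_one_side(x[x >= 0], y[x >= 0], 0.0, 100.0)
```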

To estimate the treatment effect, Lee applies "a logit with a 4th order polynomial in the margin of victory, separately, for the winners and the losers" (Lee, 2001, p. 14), using all of the data on both sides of the discontinuity. Given that our binning results in fractional values that lie in the interval from 0% to 100%, we use a generalized linear model with a logit link function, as recommended by Papke and Wooldridge (1996) for modeling proportions. [12] We find that a specification that is linear and uses less than half of the data points is best for both the left and the right sides (64 and 28 values on the right and left respectively, with the corresponding bandwidth range for the assignment variable being -32.0% to 13.0%). [13] We estimate that the Democratic Party has a 15.3% chance of winning the next election if it was barely below 50% in the prior election, and a 57.7% chance of winning the next election if it is just to the right of the discontinuity. Figure 5 shows the estimated curves. Our estimate of the treatment effect (i.e., of barely winning the prior election), 42.3% (s.e. = 3.5%), is smaller than Lee's estimate, which is found in Lee (2001): 45.0% (s.e. = 3.1%).

[Insert Figure 5]

Next, we re-examine the result shown in Lee's Figure 4a, where Y is now defined as the Democratic Party's vote share in year t+1. Lee (2008) used a 4th order polynomial in X for each side of the discontinuity and concluded that the impact of incumbency on vote share was 0.077 (s.e. = 0.011). That is, being the incumbent raised the expected vote share in the next election by 7.7 percentage points. Applying our method (as shown in Figure 6), we find that the best specification/bandwidth choice uses a quadratic specification based on the last 171 observations on the left hand side and a 5th order polynomial based on the 188 observations to the right of the discontinuity (with the corresponding bandwidth range for the assignment variable being -94.8% to 93.7%). Our estimated treatment effect is smaller than Lee's and has a smaller standard error: (s.e. = 0.003).

[Insert Figure 6]

Lee's (2008) study was also reexamined by Lee and Lemieux (2010) and Imbens and Kalyanaraman (2011).

[12] See also Baum (2008).
[13] After binning the data, we end up with 145 distinct values of X on the left side, as some bins have no data (i.e., no elections in which the Democratic vote share in year t minus the strongest opponent's share fell in that range).
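For concreteness, the fractional-logit estimation described above can be sketched as follows (our illustration, not the paper's Stata code; the binned arrays are placeholders, and weighting the GLM by bin counts is our assumption):

```python
# Sketch of a fractional-logit fit to binned win proportions (ours, not the paper's Stata code).
# bin_x, bin_prop, and bin_n are placeholders for the binned data on one side of the cutoff;
# weighting by bin counts is our assumption.
import numpy as np
import statsmodels.api as sm

def predict_at_threshold(bin_x, bin_prop, bin_n, order, threshold=0.0):
    X = np.column_stack([bin_x ** k for k in range(order + 1)])
    model = sm.GLM(bin_prop, X, family=sm.families.Binomial(), freq_weights=bin_n)
    fit = model.fit()
    x_at_threshold = np.array([[threshold ** k for k in range(order + 1)]])
    return float(fit.predict(x_at_threshold)[0])        # predicted win probability at the cutoff

# the estimated discontinuity is the difference between the side-specific predictions:
# effect = predict_at_threshold(xr, pr, nr, order_right) - predict_at_threshold(xl, pl, nl, order_left)
```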

We noted in Section 1 that, according to Lee and Lemieux's analysis, the optimal bandwidth/specification resulted in a larger estimate of the effect of incumbency (0.100) and a larger standard error (0.029). Scanning across their Table 2, the smallest estimated effect that they found was ; thus, our estimate is not outside of the range of their estimates. Nonetheless, our estimate is smaller than what would be selected using Lee and Lemieux's two-step method for selecting the optimal bandwidth and then the optimal specification for that bandwidth. Imbens and Kalyanaraman found that the optimal bandwidth for a linear specification on both sides was 0.29, and using this bandwidth/specification produced an estimate of the treatment effect of (s.e. = 0.008). Again, their preferred estimate is somewhat larger than the estimate found using our method and has a higher standard error. [14]

3.3 Case Study 3: Method Applied to Data from Chen, Ebenstein, Greenstone, & Li (2013)

Our final case study is a replication of a prominent paper by Chen et al. (2013) that alarmingly concludes that an arbitrary Chinese policy that greatly increases total suspended particulates (TSPs) air pollution is causing the 500 million residents of Northern China to lose more than 2.5 billion life years of life expectancy. This policy established free coal to aid winter heating of homes north of the Huai River and Qinling Mountain range. Chen et al. used the distance from this boundary as the assignment variable, with the treatment discontinuity being the border itself. As shown in the first column of our Figure 7 (which reprints their Figures 2 and 3), Chen et al. estimate that being north of the boundary significantly raises TSP by 248 points and significantly lowers life expectancy by 5.04 years. These estimates are also shown in Panel A of Table 5.

[Insert Figure 7]
[Insert Table 5]

We have attempted to replicate these results. Unfortunately, the primary data are proprietary and not easy to obtain; permission for their use can only be granted by the Chinese Center for Disease Control. [15]

[14] Note, however, that when we apply the Stata programs written by Fuji, Imbens, and Kalyanaraman (2009) and Nichols (2011) that produce treatment estimates using the Imbens and Kalyanaraman (2011) method, we find that the optimal bandwidth for a linear specification on both sides was 0.11, and using this bandwidth/specification produced an estimate of the treatment effect of (s.e. = 0.002), which is quite similar to our estimate.

Rather than use the underlying primary data, we treat the data shown in their Figures 2 and 3 as if they were the actual data. To do so, we have manually measured the X and Y coordinates of each data point in these figures, as well as the diameter of each circle (where the circle's area is proportional to the population of the localities represented in the bin). [16] The middle column of Figure 7 and Panel B of Table 5 present our replication applying their specification (a global cubic polynomial in latitude with a treatment jump at the discontinuity) to these data. We obtain similar results, although the magnitudes are smaller and less significant; our replication of their specification produces estimates that being north of the boundary raises TSP by 178 points (p-value 0.069) and insignificantly lowers life expectancy by 3.94 years (p-value 0.389). Comparing the first and second columns of Figure 7, note that the shapes of the estimated polynomial specifications are generally similar, with the modest discrepancies showing that there is a bit of information lost by binning the data.

In Panel C of Table 5, we apply our method to estimate these treatment effects. [17] We find significant effects on TSP, with TSP rising significantly by 146 points (using IK's method, TSP is found to rise significantly by 197 points). Thus, Chen et al.'s conclusion that TSP rises significantly appears to be reasonable and robust to alternative specifications. However, as shown in the second column in Panel D of Table 5, the estimated treatment impact on life expectancy is much smaller; we estimate that being north of the boundary significantly lowers life expectancy by 0.40 years, which is roughly one-tenth the effect size we estimated using their global cubic polynomial specification. The fragility of these results should not be surprising given a visual inspection of the scatterplot, which does not reveal a clear pattern to the naked eye. In fact, for the right hand side of the threshold for life expectancy, we find that a simple averaging of the 8 data points to the left of the threshold gives the best prediction at the threshold. We agree with Gelman and Zelizer's (2015) critique that the result "indicates to us that neither the linear nor the cubic nor any other polynomial model is appropriate here. Instead, there are other variables not included in the model which distinguish the circles in the graph" (p. 4).

[15] Personal communication with Michael Greenstone, March 16.
[16] We have taken two separate measurements for each figure and use the average of these two measurements for the X and Y coordinates and the median of our four measurements of the diameter of each circle.
[17] Given that there are a small number of observations of X and Y on each side of the discontinuity, we placed no constraint on the minimum number of observations or the minimum number of MSPEs that are required to be included. We considered polynomials of order 0 to 5.
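The digitization arithmetic described above amounts to a linear pixel-to-data mapping plus an area-proportional weight for each circle. A small sketch (ours; every number shown is hypothetical):

```python
# Sketch of converting measurements taken from a published figure into data (our illustration).
# The axis reference points and measured values below are hypothetical placeholders.
import numpy as np

def pixel_to_data(pixels, pixel_ref, data_ref):
    # linear map fixed by two reference points with known pixel and data coordinates
    (p0, p1), (d0, d1) = pixel_ref, data_ref
    return d0 + (np.asarray(pixels) - p0) * (d1 - d0) / (p1 - p0)

# hypothetical calibration: pixel 100 marks latitude -10, pixel 900 marks latitude +15
x_data = pixel_to_data([120.0, 340.0, 610.0], pixel_ref=(100.0, 900.0), data_ref=(-10.0, 15.0))

# circle area is proportional to the population represented by a bin, so relative
# regression weights are proportional to the squared measured diameters
diameters = np.array([12.0, 30.0, 21.0])
weights = diameters ** 2 / (diameters ** 2).sum()
print(x_data, weights)
```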

4. Conclusion

While regression discontinuity design has over a 50-year history of use for estimating treatment impacts (going back to Thistlethwaite and Campbell (1960)), the appropriate method for selecting the specification and bandwidth to implement the estimation has yet to be settled. This paper's contribution is the provision of a method for optimally and simultaneously selecting a bandwidth and polynomial order for both sides of a discontinuity. We identify the combination that minimizes the estimated mean squared prediction error at the threshold of a discontinuity. Our paper builds on Imbens and Kalyanaraman (2012), but differs from their approach, which solves for the optimal bandwidth assuming that a linear specification will be used on both sides of the discontinuity. Our insight is that one can use the information on each side of the discontinuity to see which bandwidth/polynomial-order combinations do well in predicting the next data point as one moves closer and closer to the discontinuity. We apply our method to reexamine several notable papers in the literature. While some of these papers' results are shown to be robust, others are shown to be more fragile, suggesting the importance of using optimal methods for specification and bandwidth selection.

References

Baum, C.F. (2008). Modeling proportions. Stata Journal 8.
Chen, Y., Ebenstein, A., Greenstone, M., and Li, H. (2013). Evidence on the impact of sustained exposure to air pollution on life expectancy from China's Huai River policy. Proceedings of the National Academy of Sciences 110.
DiNardo, J., and Lee, D. (2010). Program evaluation and research designs. In Ashenfelter and Card (eds.), Handbook of Labor Economics, Vol. 4.
Fuji, D., Imbens, G., and Kalyanaraman, K. (2009). Notes for Matlab and Stata regression discontinuity software. Software downloaded on July 2, 2015.
Gelman, A., and Imbens, G. (2014). Why high-order polynomials should not be used in regression discontinuity designs. National Bureau of Economic Research, Working Paper 20405.
Gelman, A., and Zelizer, A. (2015). Evidence on the deleterious impact of sustained use of polynomial regression on causal inference. Research & Politics 2(1), 1-7.
Imbens, G., and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies 79.
Imbens, G., and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics 142.
Jacob, R., Zhu, P., Somers, M., and Bloom, H. (2012). A practical guide to regression discontinuity. MDRC.
Lee, D.S. (2001). The electoral advantage to incumbency and voters' valuation of politicians' experience: A regression discontinuity analysis of elections to the U.S. House. National Bureau of Economic Research, Working Paper.
Lee, D.S. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics 142.
Lee, D.S., and Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature 48.
Long, M.C., and Rooklyn, J. (2016). Next: A Stata program for regression discontinuity. University of Washington.
Nichols, A. (2011). rd 2.0: Revised Stata module for regression discontinuity estimation.
Papke, L.E., and Wooldridge, J. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11.
Thistlethwaite, D., and Campbell, D. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology 51(6).
Van Der Klaauw, W. (2008). Regression-discontinuity analysis: A survey of recent developments in economics. Labour 22.

Figure 1: Predicting the next value after six observed data points
Panel A: Data available to predict the next Y. Panel B: Predicting Y given X = 2 using the prior value of X. Panel C: Predicting Y given X = 3 using the prior two values of X. Panel D: Predicting Y given X = 4 using the prior three values of X. Panel E: Predicting Y given X = 5 using the prior four values of X. Panel F: Predicting Y given X = 6 using the prior five values of X. (Each panel plots Y against X.)

Table 1: Computing Mean Squared Prediction Error (MSPE) and Selecting the Optimal Specification and Bandwidth
Columns: combinations of polynomial order (p) and number of prior data points (n).
Panel A: Prediction of Y at each X.
Panel B: Squared prediction error at each X.
Panel C: Predicted value of MSPE given X = threshold (i.e., weighted average of MSPEs), for base weights of 1 (uniform), 10^3, 10^6, and 10^10.
Panel D: Upper bound of the 80% confidence interval around the predicted value of MSPE given X = threshold, for the same base weights.

Table 2: Estimating a Simulated Treatment Effect of -10 with Jacob et al. (2012) Data
Columns: thresholds of 215, 205, and 225, each estimated with base weights of 1, 10^3, 10^6, and 10^10.
Rows: for the left side of the threshold, the optimal specification (linear at thresholds 215 and 205; cubic at 225), the optimal number of prior observations, and the total number of prior observations; for the right side of the threshold, the optimal specification (quadratic at thresholds 215 and 205; linear at 225), the optimal number of prior observations, and the total number of prior observations; our estimate of the treatment effect and its standard error (standard errors of 0.73 at threshold 215, 0.93 to 1.38 at threshold 205, and 0.98 to 1.07 at threshold 225); and, using Imbens and Kalyanaraman's (2012) optimal bandwidth for a linear specification, the bandwidth, estimate, and standard error (standard errors of 1.27, 1.50, and 1.44 at the three thresholds, respectively).

Figure 2: Selection of Specification and Bandwidth Using Data from Jacob et al. (2012) with Simulated Treatment Effect of -10 at Various Thresholds
Simulated threshold = 205: estimated treatment effect (s.e. = 0.24). Simulated threshold = 215: estimated treatment effect (s.e. = 0.18). Simulated threshold = 225: estimated treatment effect (s.e. = 0.14).


More information

STAT 200. Guided Exercise 4

STAT 200. Guided Exercise 4 STAT 200 Guided Exercise 4 1. Let s Revisit this Problem. Fill in the table again. Diagnostic tests are not infallible. We often express a fale positive and a false negative with any test. There are further

More information

(a) 50% of the shows have a rating greater than: impossible to tell

(a) 50% of the shows have a rating greater than: impossible to tell q 1. Here is a histogram of the Distribution of grades on a quiz. How many students took the quiz? What percentage of students scored below a 60 on the quiz? (Assume left-hand endpoints are included in

More information

1 Online Appendix for Rise and Shine: The Effect of School Start Times on Academic Performance from Childhood through Puberty

1 Online Appendix for Rise and Shine: The Effect of School Start Times on Academic Performance from Childhood through Puberty 1 Online Appendix for Rise and Shine: The Effect of School Start Times on Academic Performance from Childhood through Puberty 1.1 Robustness checks for mover definition Our identifying variation comes

More information

Chapter 11. Experimental Design: One-Way Independent Samples Design

Chapter 11. Experimental Design: One-Way Independent Samples Design 11-1 Chapter 11. Experimental Design: One-Way Independent Samples Design Advantages and Limitations Comparing Two Groups Comparing t Test to ANOVA Independent Samples t Test Independent Samples ANOVA Comparing

More information

Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work?

Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work? Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work? Nazila Alinaghi W. Robert Reed Department of Economics and Finance, University of Canterbury Abstract: This study uses

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

Examining Relationships Least-squares regression. Sections 2.3

Examining Relationships Least-squares regression. Sections 2.3 Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Combining the regression discontinuity design and propensity score-based weighting to improve causal inference in program evaluationjep_

Combining the regression discontinuity design and propensity score-based weighting to improve causal inference in program evaluationjep_ Journal of Evaluation in Clinical Practice ISSN 1365-2753 Combining the regression discontinuity design and propensity score-based weighting to improve causal inference in program evaluationjep_1768 317..325

More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations

More information

Is Knowing Half the Battle? The Case of Health Screenings

Is Knowing Half the Battle? The Case of Health Screenings Is Knowing Half the Battle? The Case of Health Screenings Hyuncheol Kim, Wilfredo Lim Columbia University May 2012 Abstract This paper provides empirical evidence on both outcomes and potential mechanisms

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences. SPRING GROVE AREA SCHOOL DISTRICT PLANNED COURSE OVERVIEW Course Title: Basic Introductory Statistics Grade Level(s): 11-12 Units of Credit: 1 Classification: Elective Length of Course: 30 cycles Periods

More information

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA The uncertain nature of property casualty loss reserves Property Casualty loss reserves are inherently uncertain.

More information

Lecture (chapter 1): Introduction

Lecture (chapter 1): Introduction Lecture (chapter 1): Introduction Ernesto F. L. Amaral January 17, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015. Statistics: A Tool for Social Research. Stamford:

More information

Methods of Reducing Bias in Time Series Designs: A Within Study Comparison

Methods of Reducing Bias in Time Series Designs: A Within Study Comparison Methods of Reducing Bias in Time Series Designs: A Within Study Comparison Kylie Anglin, University of Virginia Kate Miller-Bains, University of Virginia Vivian Wong, University of Virginia Coady Wing,

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

Regression Discontinuity Design

Regression Discontinuity Design Regression Discontinuity Design Regression Discontinuity Design Units are assigned to conditions based on a cutoff score on a measured covariate, For example, employees who exceed a cutoff for absenteeism

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

(a) 50% of the shows have a rating greater than: impossible to tell

(a) 50% of the shows have a rating greater than: impossible to tell KEY 1. Here is a histogram of the Distribution of grades on a quiz. How many students took the quiz? 15 What percentage of students scored below a 60 on the quiz? (Assume left-hand endpoints are included

More information

How is ethics like logistic regression? Ethics decisions, like statistical inferences, are informative only if they re not too easy or too hard 1

How is ethics like logistic regression? Ethics decisions, like statistical inferences, are informative only if they re not too easy or too hard 1 How is ethics like logistic regression? Ethics decisions, like statistical inferences, are informative only if they re not too easy or too hard 1 Andrew Gelman and David Madigan 2 16 Jan 2015 Consider

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Cross-Lagged Panel Analysis

Cross-Lagged Panel Analysis Cross-Lagged Panel Analysis Michael W. Kearney Cross-lagged panel analysis is an analytical strategy used to describe reciprocal relationships, or directional influences, between variables over time. Cross-lagged

More information

Simultaneous Equation and Instrumental Variable Models for Sexiness and Power/Status

Simultaneous Equation and Instrumental Variable Models for Sexiness and Power/Status Simultaneous Equation and Instrumental Variable Models for Seiness and Power/Status We would like ideally to determine whether power is indeed sey, or whether seiness is powerful. We here describe the

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary Statistics and Results This file contains supplementary statistical information and a discussion of the interpretation of the belief effect on the basis of additional data. We also present

More information

AP Statistics Practice Test Ch. 3 and Previous

AP Statistics Practice Test Ch. 3 and Previous AP Statistics Practice Test Ch. 3 and Previous Name Date Use the following to answer questions 1 and 2: A researcher measures the height (in feet) and volume of usable lumber (in cubic feet) of 32 cherry

More information

Alan S. Gerber Donald P. Green Yale University. January 4, 2003

Alan S. Gerber Donald P. Green Yale University. January 4, 2003 Technical Note on the Conditions Under Which It is Efficient to Discard Observations Assigned to Multiple Treatments in an Experiment Using a Factorial Design Alan S. Gerber Donald P. Green Yale University

More information

Establishing Causality Convincingly: Some Neat Tricks

Establishing Causality Convincingly: Some Neat Tricks Establishing Causality Convincingly: Some Neat Tricks Establishing Causality In the last set of notes, I discussed how causality can be difficult to establish in a straightforward OLS context If assumptions

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES 4 Chapter 2 CHAPTER 2. MEASURING AND DESCRIBING VARIABLES 1. A. Age: name/interval; military dictatorship: value/nominal; strongly oppose: value/ ordinal; election year: name/interval; 62 percent: value/interval;

More information

Differential Item Functioning

Differential Item Functioning Differential Item Functioning Lecture #11 ICPSR Item Response Theory Workshop Lecture #11: 1of 62 Lecture Overview Detection of Differential Item Functioning (DIF) Distinguish Bias from DIF Test vs. Item

More information

The Prevalence of HIV in Botswana

The Prevalence of HIV in Botswana The Prevalence of HIV in Botswana James Levinsohn Yale University and NBER Justin McCrary University of California, Berkeley and NBER January 6, 2010 Abstract This paper implements five methods to correct

More information

Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA

Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA Journal of Statistical Science and Application (014) 85-9 D DAV I D PUBLISHING Practical Aspects for Designing Statistically Optimal Experiments Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis,

More information

Measuring the User Experience

Measuring the User Experience Measuring the User Experience Collecting, Analyzing, and Presenting Usability Metrics Chapter 2 Background Tom Tullis and Bill Albert Morgan Kaufmann, 2008 ISBN 978-0123735584 Introduction Purpose Provide

More information

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY 2. Evaluation Model 2 Evaluation Models To understand the strengths and weaknesses of evaluation, one must keep in mind its fundamental purpose: to inform those who make decisions. The inferences drawn

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Early Release from Prison and Recidivism: A Regression Discontinuity Approach *

Early Release from Prison and Recidivism: A Regression Discontinuity Approach * Early Release from Prison and Recidivism: A Regression Discontinuity Approach * Olivier Marie Department of Economics, Royal Holloway University of London and Centre for Economic Performance, London School

More information

Risk Aversion in Games of Chance

Risk Aversion in Games of Chance Risk Aversion in Games of Chance Imagine the following scenario: Someone asks you to play a game and you are given $5,000 to begin. A ball is drawn from a bin containing 39 balls each numbered 1-39 and

More information

4 Diagnostic Tests and Measures of Agreement

4 Diagnostic Tests and Measures of Agreement 4 Diagnostic Tests and Measures of Agreement Diagnostic tests may be used for diagnosis of disease or for screening purposes. Some tests are more effective than others, so we need to be able to measure

More information

Model Evaluation using Grouped or Individual Data. Andrew L. Cohen. University of Massachusetts, Amherst. Adam N. Sanborn and Richard M.

Model Evaluation using Grouped or Individual Data. Andrew L. Cohen. University of Massachusetts, Amherst. Adam N. Sanborn and Richard M. Model Evaluation: R306 1 Running head: Model Evaluation Model Evaluation using Grouped or Individual Data Andrew L. Cohen University of Massachusetts, Amherst Adam N. Sanborn and Richard M. Shiffrin Indiana

More information

Lecture II: Difference in Difference and Regression Discontinuity

Lecture II: Difference in Difference and Regression Discontinuity Review Lecture II: Difference in Difference and Regression Discontinuity it From Lecture I Causality is difficult to Show from cross sectional observational studies What caused what? X caused Y, Y caused

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Propensity Score Matching with Limited Overlap. Abstract

Propensity Score Matching with Limited Overlap. Abstract Propensity Score Matching with Limited Overlap Onur Baser Thomson-Medstat Abstract In this article, we have demostrated the application of two newly proposed estimators which accounts for lack of overlap

More information

Instrumental Variables Estimation: An Introduction

Instrumental Variables Estimation: An Introduction Instrumental Variables Estimation: An Introduction Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA The Problem The Problem Suppose you wish to

More information

SAMPLING AND SAMPLE SIZE

SAMPLING AND SAMPLE SIZE SAMPLING AND SAMPLE SIZE Andrew Zeitlin Georgetown University and IGC Rwanda With slides from Ben Olken and the World Bank s Development Impact Evaluation Initiative 2 Review We want to learn how a program

More information

Regensburger DISKUSSIONSBEITRÄGE zur Wirtschaftswissenschaft

Regensburger DISKUSSIONSBEITRÄGE zur Wirtschaftswissenschaft Regensburger DISKUSSIONSBEITRÄGE zur Wirtschaftswissenschaft Downward Nominal Rigidity in US Wage Data from the PSID - An Application of the Kernel-Location Approach Christoph Knoppik Department of Economics,

More information

Further Properties of the Priority Rule

Further Properties of the Priority Rule Further Properties of the Priority Rule Michael Strevens Draft of July 2003 Abstract In Strevens (2003), I showed that science s priority system for distributing credit promotes an allocation of labor

More information

Lecture Notes Module 2

Lecture Notes Module 2 Lecture Notes Module 2 Two-group Experimental Designs The goal of most research is to assess a possible causal relation between the response variable and another variable called the independent variable.

More information

PROBABILITY Page 1 of So far we have been concerned about describing characteristics of a distribution.

PROBABILITY Page 1 of So far we have been concerned about describing characteristics of a distribution. PROBABILITY Page 1 of 9 I. Probability 1. So far we have been concerned about describing characteristics of a distribution. That is, frequency distribution, percentile ranking, measures of central tendency,

More information

Review+Practice. May 30, 2012

Review+Practice. May 30, 2012 Review+Practice May 30, 2012 Final: Tuesday June 5 8:30-10:20 Venue: Sections AA and AB (EEB 125), sections AC and AD (EEB 105), sections AE and AF (SIG 134) Format: Short answer. Bring: calculator, BRAINS

More information

EXPERIMENTAL RESEARCH DESIGNS

EXPERIMENTAL RESEARCH DESIGNS ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study

More information

Isolating causality between gender and corruption: An IV approach Correa-Martínez, Wendy; Jetter, Michael

Isolating causality between gender and corruption: An IV approach Correa-Martínez, Wendy; Jetter, Michael No. 16-07 2016 Isolating causality between gender and corruption: An IV approach Correa-Martínez, Wendy; Jetter, Michael Isolating causality between gender and corruption: An IV approach 1 Wendy Correa

More information

EXERCISE: HOW TO DO POWER CALCULATIONS IN OPTIMAL DESIGN SOFTWARE

EXERCISE: HOW TO DO POWER CALCULATIONS IN OPTIMAL DESIGN SOFTWARE ...... EXERCISE: HOW TO DO POWER CALCULATIONS IN OPTIMAL DESIGN SOFTWARE TABLE OF CONTENTS 73TKey Vocabulary37T... 1 73TIntroduction37T... 73TUsing the Optimal Design Software37T... 73TEstimating Sample

More information

Discrimination Weighting on a Multiple Choice Exam

Discrimination Weighting on a Multiple Choice Exam Proceedings of the Iowa Academy of Science Volume 75 Annual Issue Article 44 1968 Discrimination Weighting on a Multiple Choice Exam Timothy J. Gannon Loras College Thomas Sannito Loras College Copyright

More information

Placebo and Belief Effects: Optimal Design for Randomized Trials

Placebo and Belief Effects: Optimal Design for Randomized Trials Placebo and Belief Effects: Optimal Design for Randomized Trials Scott Ogawa & Ken Onishi 2 Department of Economics Northwestern University Abstract The mere possibility of receiving a placebo during a

More information

Quasi-experimental analysis Notes for "Structural modelling".

Quasi-experimental analysis Notes for Structural modelling. Quasi-experimental analysis Notes for "Structural modelling". Martin Browning Department of Economics, University of Oxford Revised, February 3 2012 1 Quasi-experimental analysis. 1.1 Modelling using quasi-experiments.

More information

10. LINEAR REGRESSION AND CORRELATION

10. LINEAR REGRESSION AND CORRELATION 1 10. LINEAR REGRESSION AND CORRELATION The contingency table describes an association between two nominal (categorical) variables (e.g., use of supplemental oxygen and mountaineer survival ). We have

More information

Christopher Cairns and Elizabeth Plantan. October 9, 2016

Christopher Cairns and Elizabeth Plantan. October 9, 2016 Online appendices to Why autocrats sometimes relax online censorship of sensitive issues: A case study of microblog discussion of air pollution in China Christopher Cairns and Elizabeth Plantan October

More information

Selection and Combination of Markers for Prediction

Selection and Combination of Markers for Prediction Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve

More information

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1 Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by Mario F. Triola 1.1-1 Chapter 1 Introduction to Statistics 1-1 Review and Preview 1-2 Statistical Thinking 1-3

More information