1. INTRODUCTION. Lalonde estimates the impact of the National Supported Work (NSW) Demonstration, a labor

Size: px
Start display at page:

Download "1. INTRODUCTION. Lalonde estimates the impact of the National Supported Work (NSW) Demonstration, a labor"

Transcription

1

2

3 1. INTRODUCTION This paper discusses the estimation of treatment effects in observational studies. This issue, which is of great practical importance because randomized experiments cannot always be implemented, has been addressed previously by Lalonde (1986), whose data we use in this paper. Lalonde estimates the impact of the National Supported Work (NSW) Demonstration, a labor training program, on post-intervention income levels, using data from a randomized evaluation of the program. He then examines the extent to which non-experimental estimators can replicate the unbiased experimental estimate of the treatment impact, when applied to a composite data set of experimental treatment units and non-experimental comparison units. He concludes that standard non-experimental estimators, such as regression, fixed-effect, and latent-variable-selection models, are either inaccurate (relative to the experimental benchmark), or sensitive to the specification used in the regression. Lalonde s results have been influential in renewing the debate on experimental versus non-experimental evaluations (see Manski and Garfinkel 1992) and in spurring a search for alternative estimators and specification tests (e.g., Heckman and Hotz 1989; and Manski, Sandefur, McLanahan, and Powers 1992). In this paper, we apply propensity score methods (Rosenbaum and Rubin 1983) to Lalonde s data set. Propensity score methods focus on the comparability of the treatment and nonexperimental comparison groups in terms of pre-intervention variables. Controlling for differences in pre-intervention variables is difficult when the treatment and comparison groups are dissimilar and when there are many pre-intervention variables. The propensity score (the probability of assignment to treatment, conditional on covariates) summarizes the pre-intervention variables. We can easily control for differences between the treatment and non-experimental comparison groups through the estimated propensity score, a single variable on the unit interval. Using propensity score methods, we are able to replicate the experimental treatment effect for a range of specifications and estimators. 1

4 The assumption underlying the method is that assignment to treatment depends only on observable pre-intervention variables (called the ignorable treatment assignment assumption or selection on observables; see Rubin 1974, 1977, 1978; Heckman and Robb 1985; or Holland 1986). Though this is a strong assumption, we demonstrate that propensity score methods are an informative starting point, because they quickly reveal the extent to which the treatment and comparison groups overlap in terms of pre-intervention variables. The paper is organized as follows. Section 2 reviews Lalonde s data and replicates his results. Section 3 identifies the treatment effect under the potential outcomes causal model, and discusses estimation strategies for the treatment effect. In Section 4, we apply our methods to Lalonde s data set, and in Section 5, we discuss the sensitivity of the results to the methodology. Section 6 concludes the paper. 2. LALONDE S RESULTS 2.1 The Data The National Supported Work (NSW) Demonstration (see Manpower Demonstration Research Corporation 1983) was a federally-funded program implemented in the mid-1970s, with the objective of providing work experience for a period of 12 to 18 months to individuals who had faced economic and social problems prior to enrollment in the program. Those randomly selected to join the program participated in various types of work, ranging from operating a restaurant to construction work. Information on pre-intervention variables (pre-intervention earnings as well as education, age, ethnicity, and marital status) was obtained from initial surveys and Social Security Administration records. In this paper we focus on the male participants, since estimates for this group were the most sensitive to functional-form specification, as indicated in Lalonde (1986). Both the treatment and control groups participated in follow-up interviews at specific intervals. The outcome variable of interest is post-intervention (1978) earnings. Unlike typical clinical trials, the eligible candidates did not join the NSW program immediately, but were randomized in over a period of 51 months between March 1975 and June This introduced 2

5 what the administrators of the program have referred to as the cohort phenomenon (MDRC 1983, p. 48): individuals who joined early in the program had different characteristics than those who entered later. Lalonde limits his sample to those assigned between January 1976 and July 1977 in order to achieve homogeneity within the treatment and control groups, reducing the sample to 297 treated observations and 425 control observations for male participants. His sample is limited to one year of pre-intervention earnings data (1975). However, several years of pre-intervention earnings are viewed as important in determining the effect of job training programs (Angrist 1990, 1998; Ashenfelter 1978; Ashenfelter and Card 1985; and Card and Sullivan 1988). Thus, we further limit ourselves to a subset of this data in order to obtain data on earnings in Our subset, also defined using the month of assignment, includes 185 treated and 260 control observations. Since month of assignment is a pre-treatment variable, this selection does not affect the properties of the experimentally randomized data set: the treatment and control groups still have the same distribution of pre-intervention variables, so that a difference in means remains an unbiased estimate of the average treatment impact. We present the pre-intervention characteristics of the original sample and of our subset in the first four rows of Table 1. The distribution of pre-intervention variables is very similar across the treatment and control groups for both samples (none of the differences is statistically significant), but our subset differs somewhat from Lalonde s original sample, especially in terms of 1975 earnings. Our propensity score results will be based on our subset of the data, using two years of pre-intervention earnings. In order to render our results comparable to Lalonde s, we replicate his analysis on our subset (both with and without the additional year of pre-intervention earnings data), and show that his basic conclusions remain unchanged. As well, in Section 5, we discuss the sensitivity of our propensity score results to dropping the additional earnings data. 3

6 2.2 Lalonde s Results Lalonde estimates linear regression, fixed-effect, and latent variable selection models of the treatment impact. Since our analysis focuses on the importance of pre-intervention variables, we consider primarily the first of these. Non-experimental estimates of the treatment effect are based on the two distinct comparison groups used by Lalonde (1986), the Panel Study of Income Dynamics (PSID-1) and Westat s Matched Current Population Survey-Social Security Administration File (CPS-1). Lalonde also considers subsets of these two comparison groups, PSID2-3 and CPS2-3. Table 1 presents the pre-intervention characteristics of the comparison groups. It is evident that both PSID-1 and CPS-1 differ dramatically from the treatment group, especially in terms of age, marital status, ethnicity, and pre-intervention earnings (all the mean differences are statistically significant). In order to bridge the gap between the treatment and the comparison groups in terms of pre-intervention characteristics, Lalonde extracts subsets from PSID-1 and CPS-1 (PSID-2 and -3, and CPS-2 and -3) which resemble the treatment group in terms of single pre-intervention characteristics (such as age or employment status; see Table 1, notes). But as the table indicates, the subsets still remain substantially different from the treatment group (the mean differences in age, ethnicity, marital status, and earnings are smaller, but remain statistically significant). Table 2 (Panel A) replicates Lalonde s results using his original data and nonexperimental comparison groups (the results are identical to those presented in his paper, with the exceptions noted in the footnote of Table 2). Table 2 (Panel B) applies Lalonde s estimators to our reduced experimental sample and the same comparison units. Comparing the two panels, we note that the treatment effect, as estimated from the randomized experiment, is higher in Panel B ($1,794 compared with $886). This is due to the cohort phenomenon --individuals with a later month of assignment seem to have benefitted more from the program. Otherwise, the results are qualitatively similar. The simple difference in means, reported in column (1), yields negative treatment effects for the CPS and PSID comparison groups in both panels (except PSID-3). The 4

7 fixed-effect type differencing estimator in column (3) fares somewhat better, although many estimates are still negative or deteriorate when we control for covariates in both panels. The estimates in column (5) are closest to the experimental estimate, consistently closer than those in column (2) which do not control for earnings in The treatment effect is underestimated by about $1,000 for the CPS comparison groups and $1,500 for the PSID groups. Lalonde s conclusion from Panel A, which also holds for our version in Panel B, is that there is no consistent estimate robust to the specification of the regression or the choice of comparison group. The inclusion of earnings in 1974 as an additional variable in the regressions in Table 2 (Panel C) does not alter Lalonde s basic message, although the estimates improve when compared with Panel B. In columns (1) to (3), many estimates are still negative, but less so than in Panel B. In columns (4) and (5), the estimates are also closer to the experimental benchmark, off by about $1,000 for PSID1-3 and CPS1-2 and by $400 for CPS-3. Overall, the best results in Table 2 are for CPS-3, Panel C. This raises a number of issues. The strategy of considering subsets of the comparison group more comparable to the treatment group certainly seems to improve matters, provided that we observe the key pre-intervention variables. But Lalonde creates these subsets in an informal manner, based on one or two pre-intervention variables. Table 1 reveals that significant differences remain between the comparison groups and the treatment group. A more systematic means of creating such subsets should improve the estimates from both the CPS and PSID. We undertake this in Sections 3 and 4 with propensity score methods. 3. IDENTIFYING AND ESTIMATING THE AVERAGE TREATMENT EFFECT 3.1 Identification Let Y i1 represent the value of the outcome when unit i is subject to regime 1 (called treatment), and Y i0 the value of the outcome when unit i is exposed to regime 0 (called control). Only one of Y i0 or Y i1 can be observed for any unit, since we can not observe the same unit under both treatment and control. Let T i be a treatment indicator (=1 if exposed to treatment, =0 otherwise). Then 5

8 the observed outcome for unit i is Y i = T i Y i1 + (1 T i )Y i0. The treatment effect for unit i is τ = Y Y. i i1 i0 In an observational study, the treatment and comparison groups are often drawn from different populations. In our application the group exposed to the treatment is drawn from the population of interest (welfare recipients eligible for the program). The comparison group is drawn from a different population (in our application both the CPS and PSID are more representative of the general US population). The treatment effect we are trying to identify is therefore the treatment effect for the treated population: τ = E( Y T = 1) E( Y T 1). T = 1 i1 i i0 i = This cannot be estimated directly since Y i0 is not observed for the treated units. Assuming selection on observables (Rubin 1974, 1977), namely {Y i1, Y i0 T i } X i (using Dawid s notation, is independence), we obtain: ( ij i, i = 1 ) = ( Yij X i, Ti = 0) EY X T E E( Y X T j) =,, i i i = for j=0,1. Conditional on the observables, X i, there is no systematic pre-treatment difference between the groups assigned to treatment and control. This allows us to identify the treatment effect for the treated: { E( Y X, T = 1) E( Y X, T = 0) T 1} τ 1 = E =, (1) T = i i i i i i i where the outer expectation is over the distribution of X i T i =1, the distribution of pre-intervention variables in the treated population. One method for estimating the treatment effect stems from (1): estimating EY ( i Xi, Ti = 1 ) and E( Yi X i, Ti = 0) as two non-parametric equations. This estimation strategy becomes difficult, however, if the covariates, X i, are high dimensional. The propensity score theorem provides an intermediate step: Proposition 1 (Rosenbaum and Rubin 1983): Let p(x i ) be the probability of unit i having been assigned to treatment, defined as p(x i ) Pr(T i =1 X i )=E(T i X i ), where 0<p(X i )<1, i. Then: {( Yi 1, Yi0 ) Ti} Xi { } ( Yi 1, Yi0 ) Ti p( Xi). 6

9 Corollary: { E( Y T = 1, p( X )) E( Y T = 0, p( X )) T 1} τ 1 = E =, (2) T = i i i i i i i where the outer expectation is over the distribution of p(x i ) T i =1. One intuition for the propensity score is that, whereas in equation (1) we are trying to condition on X (intuitively, to find observations with similar covariates), in equation (2) we are trying to condition just on the propensity score, because the proposition implies that observations with the same propensity score have the same distribution of the full vector of covariates X. 3.2 The Estimation Strategy Estimation is in two steps. First, we estimate the propensity score for the sample of experimental treatment and non-experimental comparison units. We use the logistic model, but other standard models yield similar results. An issue is what functional form of the pre-intervention variables to include in the logit. We rely on the following proposition: Proposition 2 (Rosenbaum and Rubin 1983): X i Ti p( X i ). Proposition 2 asserts that, conditional on the propensity score, the covariates are independent of assignment to treatment, so that, for observations with the same propensity score, the distribution of covariates should be the same across the treatment and comparison groups. Conditioning on the propensity score, each individual has the same probability of assignment to treatment, as in a randomized experiment. We use this proposition to assess estimates of the propensity score. For any given specification (we start by introducing the covariates linearly), we group observations into strata defined on the estimated propensity score and check whether we succeed in balancing the covariates within each stratum. We use tests for the statistical significance of differences in the distribution 7

10 of covariates, focusing on first and second moments. If there are no significant differences between the two groups, then we accept the specification. If there are significant differences, we add higher-order terms and interactions of the covariates until this condition is satisfied. Section 5 shows that the results are not sensitive to the selection of higher order and interaction variables. In the second step, given the estimated propensity score, we need to estimate a univariate non-parametric regression EYT ( j, p( X) ) i i i =, for j=0,1. We focus on simple methods for obtaining a flexible functional form, stratification and matching, but in principle one could use any one of the standard array of non-parametric techniques (e.g., see Härdle 1990). With stratification, observations are sorted from lowest to highest estimated propensity score. The comparison units with an estimated propensity score less than the minimum (or greater than the maximum) estimated propensity score for treated units are discarded. The strata, defined on the estimated propensity score, are chosen so that the covariates within each stratum are balanced across the treatment and comparison units (we know such strata exist from step one). Based on equation (2), within each stratum we take a difference in means of the outcome between the treatment and comparison groups, and weight these by the number of treated observations in each stratum. We also consider matching on the propensity score. Each treatment unit is matched with replacement to the comparison unit with the closest propensity score; the unmatched comparison units are discarded (see Dehejia and Wahba 1997 for more details; also Rubin 1979, and Heckman, Ichimura, and Todd 1997). There are a number of reasons to prefer this two-step approach rather than estimating equation (1) directly. First, tackling equation (1) directly with a non-parametric regression would encounter the curse of dimensionality as a problem in many data sets, including ours, which have a large number of covariates. This is also true for estimating the propensity score using nonparametric techniques. Hence, we use a parametric model for the propensity score. This is preferable to applying a parametric model to equation (1) directly because, as we will see, the results are less sensitive to the logit specification than regression models, such as those in Table 2 (and because there is a simple criterion for determining which interactions to add to the specification). 8

11 Finally, depending on the estimator one adopts (e.g., stratification), an extremely precise estimate of the propensity score is not even needed, since the process of validating the propensity score produces at least one partition structure which balances pre-intervention covariates across the treatment and comparison groups within each stratum, which (by equation (1)) is all that is needed for an unbiased estimate of the treatment impact. 4. RESULTS USING THE PROPENSITY SCORE Using the method outlined in the previous section, we estimate the propensity score for each comparison group separately. Figure 1 presents a histogram of the estimated propensity scores for the treatment and PSID-1 comparison units, and Figure 2 for CPS-1 comparison units. In Figure 2, we discard 12,611 (out of a total of 15,992) CPS units whose estimated propensity score is less than the minimum for the treatment units. Even then, the first bin (from ) contains 2,969 of the remaining comparison units and only 26 treatment units. This provides a snapshot of the fact that the comparison group, although very large, contains relatively few units comparable to the treatment group. A similar pattern is seen in the first bin of Figure 1, but an important difference is that in Figure 1 there is limited overlap in the estimated propensity score between the treatment and PSID groups: there are 98 (more than half the total number of) treated units with an estimated propensity score in excess of 0.8, and only 7 comparison units. Instead, for the CPS, although the treatment units outnumber the comparisons for higher values of the estimated propensity scores, for most bins there are at least a few comparison units. We use stratification and matching on the propensity score to group the treatment units with the small number of comparison units that are comparable (namely, those comparison units whose estimated propensity scores are greater than the minimum -- or less than the maximum -- propensity score for treatment units). The treatment effect is estimated by summing the withinstratum difference in means between the treatment and comparison observations (of earnings in 1978), where the sum is weighted by the number of treated observations within each stratum (Table 3, column (4)). An alternative is a within-block regression, again taking a weighted sum 9

12 over the strata (Table 3, column (5)). When the covariates are well balanced, such a regression should have little effect, but it can help to eliminate the remaining within-block differences. Likewise for matching, we can estimate a simple difference in means between the treatment and matched comparison group for earnings in 1978 (column (7)), and also perform a regression of 1978 earnings on covariates (column (8)). Table 3 presents the results. For the PSID sample, the stratification estimate is $1,608 and the matching estimate is $1,691, which should be compared against the benchmark randomizedexperiment estimate of $1,794. The estimates from a difference in means, or regression control on the full sample, are -$15,205 and $731. The propensity score estimators yield more accurate estimates simply using a difference in means because only those comparison units similar to the treatment group have been used. In columns (5) and (8) controlling for covariates has little impact on the stratification and matching estimates. Likewise for the CPS, the propensity-scorebased estimates from the CPS -- $1,713 and $1, are much closer to the experimental benchmark than estimates from the full comparison sample -- -$8,498 and $972. Another set of estimates to consider is from the subsets of the PSID and CPS. In Table 2, the estimates tend to improve when applied to narrower subsets. However, as noted above, the estimates still range from -$8,498 to $1,326. In Table 3, the estimates do not improve for the subsets, although the range of fluctuation is much narrower, from $587 to $2,321. Tables 1 and 4 shed light on this. Table 1 presents the pre-intervention characteristics of the various comparison groups. We note that the subsets PSID-2 and -3, and CPS-2 and -3, though more closely resembling the treatment group, are still considerably different along a number of important dimensions, including ethnicity, marital status, and especially earnings. Table 4 presents the characteristics of the matched subsamples from the comparison groups. The characteristics of the matched subsets of CPS-1 and PSID-1 closely correspond to the treatment group; none of the differences are statistically significant. But as we create the subsets of the comparison groups, the quality of the matches declines, most dramatically for the PSID, with PSID-2 and -3 earnings now increasing 10

13 from 1974 to 1975, whereas for the treatment group they decline. The training literature has identified the dip in earnings as an important characteristic of participants in training programs (see Ashenfelter 1974, 1978). The CPS sub-samples retain the dip, but for the matched subset of CPS-3 earnings in 1974 are significantly higher than for the treatment group. This illustrates one of the important features of propensity score methods, namely that the creation of subsamples from the non-experimental comparison group is neither necessary nor desirable, because subsamples created based on single pre-intervention characteristics may dispose of comparison units which nonetheless are good overall comparisons with treatment units. The propensity score sorts out which comparison units are most relevant considering all of the pre-intervention characteristics, not just one characteristic at a time. Column (3) in Table 3 gives an important insight into how the estimators in columns (4) to (8) succeed in estimating the treatment effect accurately. In column (3) we regress the outcome (earnings in 1978) on a quadratic function of the estimated propensity score and a treatment indicator. The estimates are comparable to those in column (2), where we regress the outcome on all pre-intervention characteristics. This again demonstrates the ability of the propensity score to summarize all pre-intervention variables. The estimators in columns (4) to (8) differ from column (3) in two respects. First, their functional form is more flexible than a low-order polynomial in the estimated propensity score. Second, rather than requiring a constant additive treatment effect, they allow the treatment effect to vary within each stratum (for stratification) or for each individual (for matching). Finally, it must be noted that even though the estimates presented in Table 3 are closer to the experimental benchmark than those presented in Table 2, with the exception of the adjusted matching estimator, their standard errors are higher: in Table 3, column (5), the standard errors are 1,152 and 1,581 for the CPS and PSID, compared with 550 and 886 in Table 2, column (5). This is because the propensity score estimators use fewer observations. When stratifying on the propensity score, we discard irrelevant controls, and so the strata may contain as few as seven 11

14 treated observations. However, the standard errors for the adjusted matching estimator (751 and 809) are similar to those in Table 2. By summarizing all of the covariates in a single number, the propensity score method allows us to focus on the comparability of the comparison group to the treatment group. Hence, it allows us to address the issues of functional form and treatment effect heterogeneity much more easily. 5. SENSITIVITY ANALYSIS 5.1 Sensitivity to the Specification of the Propensity Score How sensitive are the estimates presented to the specification of the estimated propensity score? For the stratification estimator, as was suggested in Section 3, the exact specification of the estimated propensity score is not important as long as, within each stratum, the pre-intervention characteristics are balanced across the treatment and comparison groups. Since this was the basis of the specification search suggested in Section 3, either one can find a specification that balances pre-intervention characteristics, or one must conclude the treatment and comparison groups are irreconcilably different. The upper half of Table 5 demonstrates that the estimates of the treatment impact are not particularly sensitive to the specification used. Specifications 1 and 4 are the same as those in Table 3 (hence, they balance the pre-intervention characteristics). In specifications 2 to 3 and 5 to 6, we drop the squares and cubes of the covariates, and then interactions and dummy variables. In specifications 3 and 6, the logits then simply use the covariates linearly. These estimates are worse than those in Table 3, ranging from $835 to $1,774. But compared with the range of estimates from Table 2, these remain concentrated. Furthermore, we are unable to find a partition structure for the alternative specifications such that the pre-intervention characteristics are balanced within each stratum. There is a well-defined criterion to reject these alternative specifications. Indeed, the specification search begins with a linear specification, and adds higher-order and interaction terms until within-stratum balance is achieved. 12

15 5.2 Sensitivity to Selection on Observables One important assumption underlying propensity score methods is that all of the variables that affect assignment to treatment and are correlated with the potential outcomes, Y i1 and Y i0, are observed. This assumption led us to restrict Lalonde s data to the subset for which two (rather than one) years of pre-intervention earnings data is available. In Table 5 (Panel B), we consider how our estimators would fare in the absence of two years of pre-intervention earnings data by reestimating the treatment impact without making use of earnings in For PSID-1, the stratification estimators yield less reliable estimates than in Table 3, ranging from -$1,023 to $1,727 as compared with $1,473 to $1,691, although the matching estimator is more robust. In contrast, even though the estimates from the CPS are farther from the experimental benchmark than those in Table 3 ($861 to $1941 compared with $1,582 to $1,774), they are still more concentrated around the experimental estimates than the regression estimates in Panel B of Table 2. This illustrates that the results are sensitive to the set of pre-intervention variables used. For training programs, a sufficiently lengthy pre-intervention earnings history clearly is important. Table 5 also demonstrates the value of using multiple comparison groups. Even if we did not know the experimental estimate, in looking at Table 5 we would be concerned that the variables that we observe (assuming that earnings in 1974 are not observed) do not control fully for the differences between the treatment and comparison groups, because of variation in the estimates between the CPS and PSID. If all relevant variables are observed, then the estimates from both groups should be similar (as they are in Table 3). When an experimental benchmark is not available, multiple comparison groups are valuable because they can suggest the existence of important unobservables (see Rosenbaum 1987, which develops this idea in more detail). 13

16 6. CONCLUSION This paper demonstrates how to estimate the treatment impact in an observational study using propensity score methods. These methods are assessed using Lalonde s influential re-creation of a non-experimental setting. Our results show that the estimates of the training effect are close to the benchmark experimental estimate, and are robust to the specification of the comparison group and the functional form used to estimate the propensity score. A researcher using our method would arrive at estimates of the treatment impact ranging from $1,473 to $1,774, very close to the benchmark unbiased estimate from the experiment of $1,794. Furthermore, our methods succeed for a transparent reason: they use only the subset of the comparison group that is comparable to the treatment group, and discard the complement. Although Lalonde attempts to follow this strategy in his construction of other comparison groups, his method relies on an informal selection among the pre-intervention variables. Our application illustrates that even among a large set of potential comparison units, very few may be relevant. But it also illustrates that even a few comparison units can be enough to estimate the treatment impact. The methods we suggest are not relevant in all situations: there may be important unobservable covariates, for which the propensity score method cannot account. But rather than giving up, or relying on assumptions about the unobserved variables, propensity score methods may offer both a diagnostic on the quality of the comparison group and a means to estimate the treatment impact. 14

17 References Angrist, J. (1990), Lifetime Earnings and the Vietnam Draft Lottery: Evidence from Social Security Administrative Records, American Economic Review, 80, (1998), Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants, Econometrica, 66, Ashenfelter, O. (1974), The Effect of Manpower Training on Earnings: Preliminary Results, in Proceedings of the Twenty-Seventh Annual Winter Meetings of the Industrial Relations Research Association, (eds.) J. Stern and B. Dennis, Madison: Industrial Relations Research Association (1978), Estimating the Effects of Training Programs on Earnings, Review of Economics and Statistics, 60, and D. Card (1985), Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs, Review of Economics and Statistics, 67, Card, David, and Daniel Sullivan (1988), Measuring the Effect of Subsidized Training Programs on Movements in and out of Employment, Econometrica, 56, Dehejia, Rajeev H., and Sadek Wahba (1997), Matching Methods for Estimating Causal Effects in Non-Experimental Studies, University of Toronto, unpublished manuscript. Härdle, Wolfgang (1990), Applied Nonparametric Regression, Econometric Society Monographs, Cambridge: Cambridge University Press. Heckman, James, and Richard Robb (1985), Alternative Methods for Evaluating the Impact of Interventions, in Longitudinal Analysis of Labor Market Data, Econometric Society Monograph No. 10, (eds.) James Heckman and Burton Singer, Cambridge: Cambridge University Press and Joseph Hotz (1989), Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training, Journal of the American Statistical Association, 84, , Hidehiko Ichimura, and Petra Todd (1997), Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme, Review of Economic Studies, 64, Holland, Paul W. (1986), Statistics and Causal Inference, Journal of the American Statistical Association, 81,

18 Lalonde, Robert (1986), Evaluating the Econometric Evaluations of Training Programs, American Economic Review, 76, Manpower Demonstration Research Corporation (1983), Summary and Findings of the National Supported Work Demonstration, Cambridge: Ballinger. Manski, Charles F., and Irwin Garfinkel (1992), Introduction, in Evaluating Welfare and Training Programs, (eds.) Charles Manski and Irwin Garfinkel, Cambridge: Harvard University Press , G. Sandefur, S. McLanahan, and D. Powers (1992), Alternative Estimates of the Effect of Family Structure During Adolescence on High School Graduation, Journal of the American Statistical Association, 87, Rosenbaum, P. (1987), The Role of a Second Control Group in an Observational Study, Statistical Science, 2(3), and D. Rubin (1983), The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika, 70, and (1984), Reducing Bias in Observational Studies Using the Subclassification on the Propensity Score, Journal of the American Statistical Association, 79, Rubin, Donald (1974), Estimating Causal Effects of Treatments in Randomized and Non-Randomized Studies, Journal of Educational Psychology, 66, (1977), Assignment to a Treatment Group on the Basis of a Covariate, Journal of Educational Statistics, 2, (1978), Bayesian Inference for Causal Effects: The Role of Randomization, The Annals of Statistics, 6, (1979), Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observation Studies, Journal of the American Statistical Association, 74,

19 Control Sample Table 1. Sample Means of Characteristics for NSW and Comparison Samples Sample Characteristics No of Age Educ Black Hisp Nodegree Married RE74 Obs. US$ RE75 US$ NSW/Lalonde: a Treated ,571 Control ,672 RE74 subset: b Treated ,096 1,532 Control ,107 1,267 Comparison groups: c PSID-1 2, ,429 19,063 PSID ,027 7,569 PSID ,566 2,611 CPS-1 15, ,016 13,650 CPS-2 2, ,728 7,397 CPS ,619 2,467 NOTES: Data Legend: Age=age in years; Educ=number of years of schooling; Black=1 if black, 0 otherwise; Hisp=1 is Hispanic, 0 otherwise; Nodegree=1 if no high school degree, 0 otherwise; Married=1 if married, 0 otherwise; REx=earnings in calendar year 19x; Ux=1 if unemployed in 19x, 0 otherwise. a NSW sample as constructed by Lalonde (1986). b The subset of the Lalonde sample for which RE74 is available. c Definition of Comparison Groups (Lalonde 1986): PSID-1: All male household heads less than 55 years old who did not classify themselves as retired in PSID-2: Selects from PSID-1 all men who were not working when surveyed in the spring of PSID-3: Selects from PSID-2 all men who were not working in CPS-1: All CPS males less than 55 years of age. CPS-2: Selects from CPS-1 all males who were not working when surveyed in March CPS-3: Selects from CPS-2 all the unemployed males in 1976 whose income in 1975 was below the poverty level. PSID1-3 and CPS-1 are identical to those used in Lalonde. CPS 2-3 are similar to those used in Lalonde, but Lalonde s original subset could not be re-created.

20 Comparison Group Table 2. Lalonde s Earnings Comparisons and Estimated Training Effects for the NSW Male Participants Using Comparison Groups from the PSID and the CPS-SSA a A. Lalonde s original sample B. RE74 Subsample (results do not use RE74) C. RE74 Subsample (results use RE74) NSW Treatment Earnings Less Comparison Group Earnings, 1978 b Unadjusted Adjusted c Unrestricted Difference in Differences: Quasi-Difference in Earnings Growth: Unadjusted d Adjusted e Controlling for All Variables f NSW Treatment Earnings Less Comparison Group Earnings, 1978 b Unadjusted Adjusted c Unrestricted Difference in Differences: Quasi-Difference in Earnings Growth: Unadjusted d Adjusted e Controlling for All Variables f NSW Treatment Earnings Less Comparison Group Earnings, 1978 b Unadjusted Adjusted c Unrestricted Difference in Differences: Quasi-Difference in Earnings Growth: Unadjusted d Adjusted e (1) (2) (3) (4) (5) (1) (2) (3) (4) (5) (1) (2) (3) (4) (5) Controlling for All Variables f NSW 886 (472) 798 (472) 879 (467) 802 (468) 820 (468) 1,794 (633) 1,672 (637) 1,750 (632) 1,631 (637) 1,612 (639) 1,794 (633) 1,688 (636) 1,750 (632) 1,672 (638) 1,655 (640) PSID-1-15,578 (913) PSID-2-4,020 (781) PSID (760) -8,067 (990) -3,482 (935) -509 (967) -2,380 (680) -1,364 (729) 629 (757) -2,119 (746) -1,694 (878) -552 (967) -1,844 (762) -1,876 (885) -576 (968) -15,205 (1155) -3,647 (960) 1,070 (900) -7,741 (1175) -2,810 (1082) 35 (1101) -582 (841) 721 (886) 1,370 (897) -265 (881) 298 (1004) 243 (1101) 186 (901) 111 (1032) 298 (1105) -15,205 (1155) -3,647 (960) 1,070 (900) -879 (931) 94 (1042) 821 (1100) -582 (841) 721 (886) 1,370 (897) 218 (866) 907 (1004) 822 (1101) 731 (886) 683 (1028) 825 (1104) CPS-1-8,870 (562) -4,416 (577) -1,543 (426) -1,102 (450) -987 (452) -8,498 (712) -4,417 (714) -78 (537) 525 (557) 709 (560) -8,498 (712) -8 (572) -78 (537) 739 (547) 972 (550) CPS-2-4,195 (533) -2,341 (620) -1,649 (459) -1,129 (551) -1,149 (551) -3,822 (671) -2,208 (746) -263 (574) 371 (662) 305 (666) -3,822 (671) 615 (672) -263 (574) 879 (654) 790 (658) CPS-3-1,008 (539) -1 (681) -1,204 (532) -263 (677) -234 (675) -635 (657) 375 (821) -91 (641) 844 (808) 875 (810) -635 (657) 1,270 (798) -91 (641) 1,326 (796) 1,326 (798) NOTES: Panel A replicates Lalonde (1986), Table 5. The estimates for columns (1) to (4) for NSW, PSID1-3, and CPS-1 are identical to Lalonde s. CPS 2-3 are similar, but not identical, because we could not exactly re-create his subset. Column (5) differs because the data file we obtained did not contain all of the covariates used in column (10) of Lalonde (1986), Table 5. a Estimated effect of training on RE78. Standard errors are in parentheses. The estimates are in 1982 dollars. b Based on the experimental data, an unbiased estimate of the impact of training is presented in column (4), $1,794. c The exogenous variables used in the regressions-adjusted equations are age, age squared, years of schooling, high school dropout status, and race (and RE74 in Panel C). d Compares RE78 across the treatment and comparison group, controlling for RE75. e The same as (d), but controls for the additional variables listed under (c). f Controls for all pre-treatment covariates.

21 Table 3. Estimated Training Effects for the NSW Male Participants Using Comparison Groups from PSID and CPS-SSA NSW Earnings Less NSW Treatment Earnings Less Comparison Group Earnings, Comparison Group Conditional On The Estimated Propensity Score Earnings Quadratic in Score b Stratifying on the Score Matching on the Score (1) (2) (3) (4) (5) (6) (7) (8) Unadjusted Adjusted a Un-adjusted Adjusted Obs. g Unadjusted Adjusted NSW 1,794 (633) 1,672 (638) PSID-1 c -15,205 (1154) 731 (886) 294 (1389) 1,608 (1571) 1,494 (1581) 1,255 1,691 (2209) 1,473 (809) PSID-2 d -3,647 (959) 683 (1028) 496 (1193) 2,220 (1768) 2,235 (1793) 389 1,455 (2303) 1,480 (808) PSID-3 d 1,069 (899) 825 (1104) 647 (1383) 2,321 (1994) 1,870 (2002) 247 2,120 (2335) 1,549 (826) CPS-1 e -8,498 (712) 972 (550) 1117 (747) 1,713 (1115) 1,774 (1152) 4,117 1,582 (1069) 1,616 (751) CPS-2 e -3,822 (670) 790 (658) 505 (847) 1,543 (1461) 1,622 (1346) ,788 (1205) 1,563 (753) CPS-3 e -635 (657) 1,326 (798) 556 (951) 1,252 (1617) 2,219 (2082) (1496) 662 (776) NOTES: a Least Squares Regression: RE78 on a constant, expstat, age, age 2, educ, nodegree, black, hisp, RE74, RE75. b Least squares regression of RE78 on a quadratic on the estimated propensity score and a treatment indicator, for observations used under stratification; see note (g). c Logit: Prob (expstat=1)=f(age, age 2, educ, educ 2, married, nodegree, black, hisp, RE74, RE75, RE 74 2, RE75 2, u74*black) (Expstat=1 if the unit was subject to treatment, =0 otherwise). d Logit: Prob (expstat=1)=f(age, age 2, educ, educ 2, nodegree, married, black, hisp, RE74, RE 74 2, RE75, RE75 2, u74, u75) e Logit: Prob (expstat=1)=f(age, age 2, educ, educ 2, nodegree, married, black, hisp, RE74, RE75, u74, u75, educ*re74,age 3 ) f Weighted Least Squares: treatment observations weighted as 1, and control observations weighted by the number of times they are matched to a treatment observation (same covariates as (a)). g Number of observations refers to the actual number of comparison and treatment units used for (3) to (5), namely, all treatment units and those comparison units whose estimated propensity score is greater than the minimum, and less than the maximum, estimated propensity score for the treatment group.

22 Matched Samples Table 4. Sample Means of Characteristics for Matched Control Samples Sample Characteristics No. of Age School Black Hisp Nodegree Married RE74 Obs. US$ RE75 US$ NSW ,096 1,532 MPSID ,794 1,126 MPSID ,599 2,225 MPSID ,386 1,863 MCPS ,110 1,396 MCPS ,758 1,204 MCPS ,709 1,587 NOTES: MPSID 1-3 and MCPS 1-3 are the subsamples of PSID 1-3 and CPS 1-3 that are matched to the treatment group.

23 Comparison Group Table 5. Sensitivity of Estimated Training Effects to Specification of the Propensity Score NSW Earnings Less Comparison NSW Treatment Earnings Less Comparison Group Earnings Conditional on the Group Earnings Estimated Propensity Score Quadratic in Score c Stratifying on the Score Matching on the Score (1) (2) (3) (4) (5) (6) (7) (8) Unadjusted Adjusted a Unadjusted Adjusted a Obs. d Unadjusted Adjusted b NSW 1,794 (633) PSID-1: Spec. 1 PSID-1: Spec. 2 PSID-1: Spec. 3 CPS-1: Spec. 4 CPS-1: Spec. 5 CPS-1: Spec. 6 PSID-1: Spec. 7 PSID-2: Spec. 8 PSID-3: Spec. 8 CPS-1: Spec. 9 CPS-2: Spec. 9 CPS-3: Spec. 9-15,205 (1154) -15,205 (1154) -15,205 (1154) -8,498 (712) -8,498 (712) -8,498 (712) -15,205 (1154) -3,647 (959) 1,069 (899) -8,498 (712) -3,822 (670) -635 (657) 1,672 (638) 218 (866) 105 (863) 105 (863) 738 (547) 684 (546) 684 (546) -265 (880) 297 (1004) 243 (1100) 525 (557) 371 (662) 844 (807) A. Dropping higher-order terms 294 1,608 1,254 (1389) (1571) (1616) 539 1,524 1,775 (1344) (1527) (1538) 1,185 1,237 1,155 (1233) (1144) (1280) 1,117 1,713 1,774 (747) (1115) (1152) 1,248 1,452 1,454 (731) (632) (2713) 1,241 1,299 1,095 (671) (547) (925) B. Dropping RE ,023 (1279) (1410) (1493) (1154) (1472) (1495) 1, (1261) (1449) (1493) 1,181 1,234 1,347 (698) (695) (683) 482 1,473 1,588 (731) (1313) (1309) 722 1,348 1,262 (942) (1601) (1600) 1,255 1,691 (2209) 1,533 2,281 (1732) 1, (1720) 4,117 1,582 (1069) 6, (1007) 6,017 1,103 (877) 1,284 1,727 (1447) (1848) (1508) 4,558 1,402 (1067) 1,222 1,941 (1500) 504 1,097 (1366) 1,054 (831) 2,291 (796) 855 (906) 1,616 (751) 904 (769) 1,471 (787) 1,340 (845) 276 (902) 11 (938) 861 (786) 1,668 (755) 1,120 (783) NOTES: Spec. 1: Same as Table 3, note c. Spec. 2: Spec. 1 without higher powers. Spec. 3: Spec. 2 without higher-order terms. Spec. 4: Same as Table 3, note e. Spec. 5: Spec. 4 without higher powers. Spec. 6: Spec. 5 without higher-order terms. Spec. 7: Same as Table 3, note c, with RE74 removed. Spec. 8: Same as Table 3, note d, with RE74 removed. Spec. 9: Same as Table 3, note e, with RE74 removed. a Least Squares Regression: RE78 on a constant, expstat, age, educ, nodegree, black, hisp, RE74, RE75. b Weighted Least Squares: treatment observations weighted as 1, and control observations weighted by the number of times they are matched to a treatment observation (same covariates as (a)). c Least squares regression of RE78 on a quadratic on the estimated propensity score and a treatment indicator, for observations used under stratification; see note (d). d Number of observations refers to the actual number of comparison and treatment units used for (3) to (5), namely, all treatment units and those comparison units whose estimated propensity score is greater than the minimum, and less than the maximum, estimated propensity score for the treatment group.

24 150 Figure 1: Histogram of Estimated Propensity Score, NSW and PSID Solid=treated,Dashed=control Estimated p(xi), 1333 controls discarded, first bin contains Figure 2: Histogram of Estimated Propensity Score, NSW and CPS Solid=treated,Dashed=control Estimated p(xi), controls discarded, first bin contains 2969

Practical propensity score matching: a reply to Smith and Todd

Practical propensity score matching: a reply to Smith and Todd Journal of Econometrics 125 (2005) 355 364 www.elsevier.com/locate/econbase Practical propensity score matching: a reply to Smith and Todd Rajeev Dehejia a,b, * a Department of Economics and SIPA, Columbia

More information

EMPIRICAL STRATEGIES IN LABOUR ECONOMICS

EMPIRICAL STRATEGIES IN LABOUR ECONOMICS EMPIRICAL STRATEGIES IN LABOUR ECONOMICS University of Minho J. Angrist NIPE Summer School June 2009 This course covers core econometric ideas and widely used empirical modeling strategies. The main theoretical

More information

Pros. University of Chicago and NORC at the University of Chicago, USA, and IZA, Germany

Pros. University of Chicago and NORC at the University of Chicago, USA, and IZA, Germany Dan A. Black University of Chicago and NORC at the University of Chicago, USA, and IZA, Germany Matching as a regression estimator Matching avoids making assumptions about the functional form of the regression

More information

Measuring Impact. Program and Policy Evaluation with Observational Data. Daniel L. Millimet. Southern Methodist University.

Measuring Impact. Program and Policy Evaluation with Observational Data. Daniel L. Millimet. Southern Methodist University. Measuring mpact Program and Policy Evaluation with Observational Data Daniel L. Millimet Southern Methodist University 23 May 2013 DL Millimet (SMU) Observational Data May 2013 1 / 23 ntroduction Measuring

More information

Empirical Strategies

Empirical Strategies Empirical Strategies Joshua Angrist BGPE March 2012 These lectures cover many of the empirical modeling strategies discussed in Mostly Harmless Econometrics (MHE). The main theoretical ideas are illustrated

More information

Class 1: Introduction, Causality, Self-selection Bias, Regression

Class 1: Introduction, Causality, Self-selection Bias, Regression Class 1: Introduction, Causality, Self-selection Bias, Regression Ricardo A Pasquini April 2011 Ricardo A Pasquini () April 2011 1 / 23 Introduction I Angrist s what should be the FAQs of a researcher:

More information

Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton

Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton Effects of propensity score overlap on the estimates of treatment effects Yating Zheng & Laura Stapleton Introduction Recent years have seen remarkable development in estimating average treatment effects

More information

1. Introduction Consider a government contemplating the implementation of a training (or other social assistance) program. The decision to implement t

1. Introduction Consider a government contemplating the implementation of a training (or other social assistance) program. The decision to implement t 1. Introduction Consider a government contemplating the implementation of a training (or other social assistance) program. The decision to implement the program depends on the assessment of its likely

More information

Propensity Score Matching with Limited Overlap. Abstract

Propensity Score Matching with Limited Overlap. Abstract Propensity Score Matching with Limited Overlap Onur Baser Thomson-Medstat Abstract In this article, we have demostrated the application of two newly proposed estimators which accounts for lack of overlap

More information

The Prevalence of HIV in Botswana

The Prevalence of HIV in Botswana The Prevalence of HIV in Botswana James Levinsohn Yale University and NBER Justin McCrary University of California, Berkeley and NBER January 6, 2010 Abstract This paper implements five methods to correct

More information

Predicting the efficacy of future training programs using past experiences at other locations

Predicting the efficacy of future training programs using past experiences at other locations Journal of Econometrics ] (]]]]) ]]] ]]] www.elsevier.com/locate/econbase Predicting the efficacy of future training programs using past experiences at other locations V. Joseph Hotz a, *, Guido W. Imbens

More information

Introduction to Observational Studies. Jane Pinelis

Introduction to Observational Studies. Jane Pinelis Introduction to Observational Studies Jane Pinelis 22 March 2018 Outline Motivating example Observational studies vs. randomized experiments Observational studies: basics Some adjustment strategies Matching

More information

ICPSR Causal Inference in the Social Sciences. Course Syllabus

ICPSR Causal Inference in the Social Sciences. Course Syllabus ICPSR 2012 Causal Inference in the Social Sciences Course Syllabus Instructors: Dominik Hangartner London School of Economics Marco Steenbergen University of Zurich Teaching Fellow: Ben Wilson London School

More information

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research 2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy

More information

Methods for Addressing Selection Bias in Observational Studies

Methods for Addressing Selection Bias in Observational Studies Methods for Addressing Selection Bias in Observational Studies Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA What is Selection Bias? In the regression

More information

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Impact and adjustment of selection bias. in the assessment of measurement equivalence Impact and adjustment of selection bias in the assessment of measurement equivalence Thomas Klausch, Joop Hox,& Barry Schouten Working Paper, Utrecht, December 2012 Corresponding author: Thomas Klausch,

More information

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros Marno Verbeek Erasmus University, the Netherlands Using linear regression to establish empirical relationships Linear regression is a powerful tool for estimating the relationship between one variable

More information

PubH 7405: REGRESSION ANALYSIS. Propensity Score

PubH 7405: REGRESSION ANALYSIS. Propensity Score PubH 7405: REGRESSION ANALYSIS Propensity Score INTRODUCTION: There is a growing interest in using observational (or nonrandomized) studies to estimate the effects of treatments on outcomes. In observational

More information

Causal Validity Considerations for Including High Quality Non-Experimental Evidence in Systematic Reviews

Causal Validity Considerations for Including High Quality Non-Experimental Evidence in Systematic Reviews Non-Experimental Evidence in Systematic Reviews OPRE REPORT #2018-63 DEKE, MATHEMATICA POLICY RESEARCH JUNE 2018 OVERVIEW Federally funded systematic reviews of research evidence play a central role in

More information

BIOSTATISTICAL METHODS

BIOSTATISTICAL METHODS BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH PROPENSITY SCORE Confounding Definition: A situation in which the effect or association between an exposure (a predictor or risk factor) and

More information

Comparing Experimental and Matching Methods using a Large-Scale Field Experiment on Voter Mobilization

Comparing Experimental and Matching Methods using a Large-Scale Field Experiment on Voter Mobilization Comparing Experimental and Matching Methods using a Large-Scale Field Experiment on Voter Mobilization Kevin Arceneaux Alan S. Gerber Donald P. Green Yale University Institution for Social and Policy Studies

More information

Propensity Score. Overview:

Propensity Score. Overview: Propensity Score Overview: What do we use a propensity score for? How do we construct the propensity score? How do we implement propensity score es

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Propensity Score Methods for Causal Inference with the PSMATCH Procedure Paper SAS332-2017 Propensity Score Methods for Causal Inference with the PSMATCH Procedure Yang Yuan, Yiu-Fai Yung, and Maura Stokes, SAS Institute Inc. Abstract In a randomized study, subjects are randomly

More information

TESTING FOR COVARIATE BALANCE IN COMPARATIVE STUDIES

TESTING FOR COVARIATE BALANCE IN COMPARATIVE STUDIES TESTING FOR COVARIATE BALANCE IN COMPARATIVE STUDIES by Yevgeniya N. Kleyman A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) in The

More information

The role of self-reporting bias in health, mental health and labor force participation: a descriptive analysis

The role of self-reporting bias in health, mental health and labor force participation: a descriptive analysis Empir Econ DOI 10.1007/s00181-010-0434-z The role of self-reporting bias in health, mental health and labor force participation: a descriptive analysis Justin Leroux John A. Rizzo Robin Sickles Received:

More information

Confounding by indication developments in matching, and instrumental variable methods. Richard Grieve London School of Hygiene and Tropical Medicine

Confounding by indication developments in matching, and instrumental variable methods. Richard Grieve London School of Hygiene and Tropical Medicine Confounding by indication developments in matching, and instrumental variable methods Richard Grieve London School of Hygiene and Tropical Medicine 1 Outline 1. Causal inference and confounding 2. Genetic

More information

Complier Average Causal Effect (CACE)

Complier Average Causal Effect (CACE) Complier Average Causal Effect (CACE) Booil Jo Stanford University Methodological Advancement Meeting Innovative Directions in Estimating Impact Office of Planning, Research & Evaluation Administration

More information

EC352 Econometric Methods: Week 07

EC352 Econometric Methods: Week 07 EC352 Econometric Methods: Week 07 Gordon Kemp Department of Economics, University of Essex 1 / 25 Outline Panel Data (continued) Random Eects Estimation and Clustering Dynamic Models Validity & Threats

More information

PharmaSUG Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching

PharmaSUG Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching PharmaSUG 207 - Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching Aran Canes, Cigna Corporation ABSTRACT Coarsened Exact

More information

Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies

Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies Arun Advani and Tymon Sªoczy«ski 13 November 2013 Background When interested in small-sample properties of estimators,

More information

Econometric analysis and counterfactual studies in the context of IA practices

Econometric analysis and counterfactual studies in the context of IA practices Econometric analysis and counterfactual studies in the context of IA practices Giulia Santangelo http://crie.jrc.ec.europa.eu Centre for Research on Impact Evaluation DG EMPL - DG JRC CRIE Centre for Research

More information

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha attrition: When data are missing because we are unable to measure the outcomes of some of the

More information

EXAMINING THE EDUCATION GRADIENT IN CHRONIC ILLNESS

EXAMINING THE EDUCATION GRADIENT IN CHRONIC ILLNESS EXAMINING THE EDUCATION GRADIENT IN CHRONIC ILLNESS PINKA CHATTERJI, HEESOO JOO, AND KAJAL LAHIRI Department of Economics, University at Albany: SUNY February 6, 2012 This research was supported by the

More information

This content downloaded from on Mon, 2 Sep :55:20 PM All use subject to JSTOR Terms and Conditions

This content downloaded from on Mon, 2 Sep :55:20 PM All use subject to JSTOR Terms and Conditions Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs Author(s): Rajeev H. Dehejia and Sadek Wahba Source: Journal of the American Statistical Association, Vol. 94,

More information

Arturo Gonzalez Public Policy Institute of California & IZA 500 Washington St, Suite 800, San Francisco,

Arturo Gonzalez Public Policy Institute of California & IZA 500 Washington St, Suite 800, San Francisco, WORKING PAPER #521 PRINCETON UNIVERSITY INDUSTRIAL RELATIONS SECTION JUNE 2007 Estimating the Effects of Length of Exposure to a Training Program: The Case of Job Corps 1 Alfonso Flores-Lagunes University

More information

Can Quasi Experiments Yield Causal Inferences? Sample. Intervention 2/20/2012. Matthew L. Maciejewski, PhD Durham VA HSR&D and Duke University

Can Quasi Experiments Yield Causal Inferences? Sample. Intervention 2/20/2012. Matthew L. Maciejewski, PhD Durham VA HSR&D and Duke University Can Quasi Experiments Yield Causal Inferences? Matthew L. Maciejewski, PhD Durham VA HSR&D and Duke University Sample Study 1 Study 2 Year Age Race SES Health status Intervention Study 1 Study 2 Intervention

More information

Jake Bowers Wednesdays, 2-4pm 6648 Haven Hall ( ) CPS Phone is

Jake Bowers Wednesdays, 2-4pm 6648 Haven Hall ( ) CPS Phone is Political Science 688 Applied Bayesian and Robust Statistical Methods in Political Research Winter 2005 http://www.umich.edu/ jwbowers/ps688.html Class in 7603 Haven Hall 10-12 Friday Instructor: Office

More information

Version No. 7 Date: July Please send comments or suggestions on this glossary to

Version No. 7 Date: July Please send comments or suggestions on this glossary to Impact Evaluation Glossary Version No. 7 Date: July 2012 Please send comments or suggestions on this glossary to 3ie@3ieimpact.org. Recommended citation: 3ie (2012) 3ie impact evaluation glossary. International

More information

Estimating average treatment effects from observational data using teffects

Estimating average treatment effects from observational data using teffects Estimating average treatment effects from observational data using teffects David M. Drukker Director of Econometrics Stata 2013 Nordic and Baltic Stata Users Group meeting Karolinska Institutet September

More information

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods Dean Eckles Department of Communication Stanford University dean@deaneckles.com Abstract

More information

PREVENTIVE HEALTH BEHAVIORS AMONG THE ELDERLY

PREVENTIVE HEALTH BEHAVIORS AMONG THE ELDERLY PREVENTIVE HEALTH BEHAVIORS AMONG THE ELDERLY by Padmaja Ayyagari Department of Economics Date: Duke University Approved: Frank A. Sloan, Supervisor Alessandro Tarozzi Curtis R. Taylor Ahmed Khwaja Dissertation

More information

AP Statistics Exam Review: Strand 2: Sampling and Experimentation Date:

AP Statistics Exam Review: Strand 2: Sampling and Experimentation Date: AP Statistics NAME: Exam Review: Strand 2: Sampling and Experimentation Date: Block: II. Sampling and Experimentation: Planning and conducting a study (10%-15%) Data must be collected according to a well-developed

More information

WORKING PAPER SERIES

WORKING PAPER SERIES WORKING PAPER SERIES Evaluating School Vouchers: Evidence from a Within-Study Kaitlin P. Anderson Patrick J. Wolf, Ph.D. April 3, 2017 EDRE Working Paper 2017-10 The University of Arkansas, Department

More information

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY Lingqi Tang 1, Thomas R. Belin 2, and Juwon Song 2 1 Center for Health Services Research,

More information

P E R S P E C T I V E S

P E R S P E C T I V E S PHOENIX CENTER FOR ADVANCED LEGAL & ECONOMIC PUBLIC POLICY STUDIES Revisiting Internet Use and Depression Among the Elderly George S. Ford, PhD June 7, 2013 Introduction Four years ago in a paper entitled

More information

Working Paper: Designs of Empirical Evaluations of Non-Experimental Methods in Field Settings. Vivian C. Wong 1 & Peter M.

Working Paper: Designs of Empirical Evaluations of Non-Experimental Methods in Field Settings. Vivian C. Wong 1 & Peter M. EdPolicyWorks Working Paper: Designs of Empirical Evaluations of Non-Experimental Methods in Field Settings Vivian C. Wong 1 & Peter M. Steiner 2 Over the last three decades, a research design has emerged

More information

Causal Inference Course Syllabus

Causal Inference Course Syllabus ICPSR Summer Program Causal Inference Course Syllabus Instructors: Dominik Hangartner Marco R. Steenbergen Summer 2011 Contact Information: Marco Steenbergen, Department of Social Sciences, University

More information

Propensity Score Analysis: Its rationale & potential for applied social/behavioral research. Bob Pruzek University at Albany

Propensity Score Analysis: Its rationale & potential for applied social/behavioral research. Bob Pruzek University at Albany Propensity Score Analysis: Its rationale & potential for applied social/behavioral research Bob Pruzek University at Albany Aims: First, to introduce key ideas that underpin propensity score (PS) methodology

More information

The Limits of Inference Without Theory

The Limits of Inference Without Theory The Limits of Inference Without Theory Kenneth I. Wolpin University of Pennsylvania Koopmans Memorial Lecture (2) Cowles Foundation Yale University November 3, 2010 Introduction Fuller utilization of the

More information

Recent advances in non-experimental comparison group designs

Recent advances in non-experimental comparison group designs Recent advances in non-experimental comparison group designs Elizabeth Stuart Johns Hopkins Bloomberg School of Public Health Department of Mental Health Department of Biostatistics Department of Health

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Impact Evaluation Toolbox

Impact Evaluation Toolbox Impact Evaluation Toolbox Gautam Rao University of California, Berkeley * ** Presentation credit: Temina Madon Impact Evaluation 1) The final outcomes we care about - Identify and measure them Measuring

More information

Discussion. Ralf T. Münnich Variance Estimation in the Presence of Nonresponse

Discussion. Ralf T. Münnich Variance Estimation in the Presence of Nonresponse Journal of Official Statistics, Vol. 23, No. 4, 2007, pp. 455 461 Discussion Ralf T. Münnich 1 1. Variance Estimation in the Presence of Nonresponse Professor Bjørnstad addresses a new approach to an extremely

More information

Comparing Experimental and Matching Methods Using a Large-Scale Voter Mobilization Experiment

Comparing Experimental and Matching Methods Using a Large-Scale Voter Mobilization Experiment Political Analysis Advance Access published September 16, 2005 doi:10.1093/pan/mpj001 Comparing Experimental and Matching Methods Using a Large-Scale Voter Mobilization Experiment Kevin Arceneaux Department

More information

Chapter 14: More Powerful Statistical Methods

Chapter 14: More Powerful Statistical Methods Chapter 14: More Powerful Statistical Methods Most questions will be on correlation and regression analysis, but I would like you to know just basically what cluster analysis, factor analysis, and conjoint

More information

Using ASPES (Analysis of Symmetrically- Predicted Endogenous Subgroups) to understand variation in program impacts. Laura R. Peck.

Using ASPES (Analysis of Symmetrically- Predicted Endogenous Subgroups) to understand variation in program impacts. Laura R. Peck. Using ASPES (Analysis of Symmetrically- Predicted Endogenous Subgroups) to understand variation in program impacts Presented by: Laura R. Peck OPRE Methods Meeting on What Works Washington, DC September

More information

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH Why Randomize? This case study is based on Training Disadvantaged Youth in Latin America: Evidence from a Randomized Trial by Orazio Attanasio,

More information

Randomization as a Tool for Development Economists. Esther Duflo Sendhil Mullainathan BREAD-BIRS Summer school

Randomization as a Tool for Development Economists. Esther Duflo Sendhil Mullainathan BREAD-BIRS Summer school Randomization as a Tool for Development Economists Esther Duflo Sendhil Mullainathan BREAD-BIRS Summer school Randomization as one solution Suppose you could do a Randomized evaluation of the microcredit

More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Identifying Endogenous Peer Effects in the Spread of Obesity. Abstract

Identifying Endogenous Peer Effects in the Spread of Obesity. Abstract Identifying Endogenous Peer Effects in the Spread of Obesity Timothy J. Halliday 1 Sally Kwak 2 University of Hawaii- Manoa October 2007 Abstract Recent research in the New England Journal of Medicine

More information

Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump?

Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump? The Economic and Social Review, Vol. 38, No. 2, Summer/Autumn, 2007, pp. 259 274 Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump? DAVID MADDEN University College Dublin Abstract:

More information

Does AIDS Treatment Stimulate Negative Behavioral Response? A Field Experiment in South Africa

Does AIDS Treatment Stimulate Negative Behavioral Response? A Field Experiment in South Africa Does AIDS Treatment Stimulate Negative Behavioral Response? A Field Experiment in South Africa Plamen Nikolov Harvard University February 19, 2011 Plamen Nikolov (Harvard University) Behavioral Response

More information

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates JOINT EU/OECD WORKSHOP ON RECENT DEVELOPMENTS IN BUSINESS AND CONSUMER SURVEYS Methodological session II: Task Force & UN Handbook on conduct of surveys response rates, weighting and accuracy UN Handbook

More information

Exploring the Impact of Missing Data in Multiple Regression

Exploring the Impact of Missing Data in Multiple Regression Exploring the Impact of Missing Data in Multiple Regression Michael G Kenward London School of Hygiene and Tropical Medicine 28th May 2015 1. Introduction In this note we are concerned with the conduct

More information

Sequential nonparametric regression multiple imputations. Irina Bondarenko and Trivellore Raghunathan

Sequential nonparametric regression multiple imputations. Irina Bondarenko and Trivellore Raghunathan Sequential nonparametric regression multiple imputations Irina Bondarenko and Trivellore Raghunathan Department of Biostatistics, University of Michigan Ann Arbor, MI 48105 Abstract Multiple imputation,

More information

THE GOOD, THE BAD, & THE UGLY: WHAT WE KNOW TODAY ABOUT LCA WITH DISTAL OUTCOMES. Bethany C. Bray, Ph.D.

THE GOOD, THE BAD, & THE UGLY: WHAT WE KNOW TODAY ABOUT LCA WITH DISTAL OUTCOMES. Bethany C. Bray, Ph.D. THE GOOD, THE BAD, & THE UGLY: WHAT WE KNOW TODAY ABOUT LCA WITH DISTAL OUTCOMES Bethany C. Bray, Ph.D. bcbray@psu.edu WHAT ARE WE HERE TO TALK ABOUT TODAY? Behavioral scientists increasingly are using

More information

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,

More information

Propensity scores: what, why and why not?

Propensity scores: what, why and why not? Propensity scores: what, why and why not? Rhian Daniel, Cardiff University @statnav Joint workshop S3RI & Wessex Institute University of Southampton, 22nd March 2018 Rhian Daniel @statnav/propensity scores:

More information

Chapter 13 Estimating the Modified Odds Ratio

Chapter 13 Estimating the Modified Odds Ratio Chapter 13 Estimating the Modified Odds Ratio Modified odds ratio vis-à-vis modified mean difference To a large extent, this chapter replicates the content of Chapter 10 (Estimating the modified mean difference),

More information

The Impact of Relative Standards on the Propensity to Disclose. Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX

The Impact of Relative Standards on the Propensity to Disclose. Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX The Impact of Relative Standards on the Propensity to Disclose Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX 2 Web Appendix A: Panel data estimation approach As noted in the main

More information

Empirical Methods in Economics. The Evaluation Problem

Empirical Methods in Economics. The Evaluation Problem Empirical Methods in Economics Liyousew G. Borga February 17, 2016 The Evaluation Problem Liyou Borga Empirical Methods in Economics February 17, 2016 1 / 74 What if... : Answering Counterfactual Questions

More information

A Potential Outcomes View of Value-Added Assessment in Education

A Potential Outcomes View of Value-Added Assessment in Education A Potential Outcomes View of Value-Added Assessment in Education Donald B. Rubin, Elizabeth A. Stuart, and Elaine L. Zanutto Invited discussion to appear in Journal of Educational and Behavioral Statistics

More information

Econometric Game 2012: infants birthweight?

Econometric Game 2012: infants birthweight? Econometric Game 2012: How does maternal smoking during pregnancy affect infants birthweight? Case A April 18, 2012 1 Introduction Low birthweight is associated with adverse health related and economic

More information

Early Release from Prison and Recidivism: A Regression Discontinuity Approach *

Early Release from Prison and Recidivism: A Regression Discontinuity Approach * Early Release from Prison and Recidivism: A Regression Discontinuity Approach * Olivier Marie Department of Economics, Royal Holloway University of London and Centre for Economic Performance, London School

More information

Fit to play but goalless: Labour market outcomes in a cohort of public sector ART patients in Free State province, South Africa

Fit to play but goalless: Labour market outcomes in a cohort of public sector ART patients in Free State province, South Africa Fit to play but goalless: Labour market outcomes in a cohort of public sector ART patients in Free State province, South Africa Frikkie Booysen Department of Economics / Centre for Health Systems Research

More information

Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work?

Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work? Meta-Analysis and Publication Bias: How Well Does the FAT-PET-PEESE Procedure Work? Nazila Alinaghi W. Robert Reed Department of Economics and Finance, University of Canterbury Abstract: This study uses

More information

Does Male Education Affect Fertility? Evidence from Mali

Does Male Education Affect Fertility? Evidence from Mali Does Male Education Affect Fertility? Evidence from Mali Raphael Godefroy (University of Montreal) Joshua Lewis (University of Montreal) April 6, 2018 Abstract This paper studies how school access affects

More information

The Effects of Maternal Alcohol Use and Smoking on Children s Mental Health: Evidence from the National Longitudinal Survey of Children and Youth

The Effects of Maternal Alcohol Use and Smoking on Children s Mental Health: Evidence from the National Longitudinal Survey of Children and Youth 1 The Effects of Maternal Alcohol Use and Smoking on Children s Mental Health: Evidence from the National Longitudinal Survey of Children and Youth Madeleine Benjamin, MA Policy Research, Economics and

More information

TRACER STUDIES ASSESSMENTS AND EVALUATIONS

TRACER STUDIES ASSESSMENTS AND EVALUATIONS TRACER STUDIES ASSESSMENTS AND EVALUATIONS 1 INTRODUCTION This note introduces the reader to tracer studies. For the Let s Work initiative, tracer studies are proposed to track and record or evaluate the

More information

For general queries, contact

For general queries, contact Much of the work in Bayesian econometrics has focused on showing the value of Bayesian methods for parametric models (see, for example, Geweke (2005), Koop (2003), Li and Tobias (2011), and Rossi, Allenby,

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Missing Data and Imputation

Missing Data and Imputation Missing Data and Imputation Barnali Das NAACCR Webinar May 2016 Outline Basic concepts Missing data mechanisms Methods used to handle missing data 1 What are missing data? General term: data we intended

More information

Continuum of a Declining Trend in Correct Self- Perception of Body Weight among American Adults

Continuum of a Declining Trend in Correct Self- Perception of Body Weight among American Adults Georgia Southern University Digital Commons@Georgia Southern Georgia Southern University Research Symposium Apr 16th, 4:00 PM - 5:00 PM Continuum of a Declining Trend in Correct Self- Perception of Body

More information

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA The uncertain nature of property casualty loss reserves Property Casualty loss reserves are inherently uncertain.

More information

ECON Microeconomics III

ECON Microeconomics III ECON 7130 - Microeconomics III Spring 2016 Notes for Lecture #5 Today: Difference-in-Differences (DD) Estimators Difference-in-Difference-in-Differences (DDD) Estimators (Triple Difference) Difference-in-Difference

More information

Identifying Mechanisms behind Policy Interventions via Causal Mediation Analysis

Identifying Mechanisms behind Policy Interventions via Causal Mediation Analysis Identifying Mechanisms behind Policy Interventions via Causal Mediation Analysis December 20, 2013 Abstract Causal analysis in program evaluation has largely focused on the assessment of policy effectiveness.

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Ec331: Research in Applied Economics Spring term, Panel Data: brief outlines

Ec331: Research in Applied Economics Spring term, Panel Data: brief outlines Ec331: Research in Applied Economics Spring term, 2014 Panel Data: brief outlines Remaining structure Final Presentations (5%) Fridays, 9-10 in H3.45. 15 mins, 8 slides maximum Wk.6 Labour Supply - Wilfred

More information

Rajeev Dehejia* Experimental and Non-Experimental Methods in Development Economics: A Porous Dialectic

Rajeev Dehejia* Experimental and Non-Experimental Methods in Development Economics: A Porous Dialectic JGD 2015; 6(1): 47 69 Rajeev Dehejia* Experimental and Non-Experimental Methods in Development Economics: A Porous Dialectic Open Access Abstract: This paper surveys six widely-used non-experimental methods

More information

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1 Welch et al. BMC Medical Research Methodology (2018) 18:89 https://doi.org/10.1186/s12874-018-0548-0 RESEARCH ARTICLE Open Access Does pattern mixture modelling reduce bias due to informative attrition

More information

Group Work Instructions

Group Work Instructions Evaluating Social Programs Executive Education at the Abdul Latif Jameel Poverty Action Lab June 18, 2007 June 22, 2007 Group Work Instructions You will be assigned to groups of 5-6 people. We will do

More information

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER Introduction, 639. Factor analysis, 639. Discriminant analysis, 644. INTRODUCTION

More information

Analysis of TB prevalence surveys

Analysis of TB prevalence surveys Workshop and training course on TB prevalence surveys with a focus on field operations Analysis of TB prevalence surveys Day 8 Thursday, 4 August 2011 Phnom Penh Babis Sismanidis with acknowledgements

More information

Causal Methods for Observational Data Amanda Stevenson, University of Texas at Austin Population Research Center, Austin, TX

Causal Methods for Observational Data Amanda Stevenson, University of Texas at Austin Population Research Center, Austin, TX Causal Methods for Observational Data Amanda Stevenson, University of Texas at Austin Population Research Center, Austin, TX ABSTRACT Comparative effectiveness research often uses non-experimental observational

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 10: Introduction to inference (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 17 What is inference? 2 / 17 Where did our data come from? Recall our sample is: Y, the vector

More information

I INTRODUCTION. Paper presented at the Fourteenth Annual Conference of the Irish Economic Association.

I INTRODUCTION. Paper presented at the Fourteenth Annual Conference of the Irish Economic Association. The Economic and NATURAL Social Review, EXPERIMENTS Vol. 31, No. AND 4, October, PROPENSITY 2000, SCORES pp. 283-308 283 Evaluating State Programmes: Natural Experiments and Propensity Scores* DENIS CONNIFFE

More information

Do Your Online Friends Make You Pay? A Randomized Field Experiment on Peer Influence in Online Social Networks Online Appendix

Do Your Online Friends Make You Pay? A Randomized Field Experiment on Peer Influence in Online Social Networks Online Appendix Forthcoming in Management Science 2014 Do Your Online Friends Make You Pay? A Randomized Field Experiment on Peer Influence in Online Social Networks Online Appendix Ravi Bapna University of Minnesota,

More information

1 Online Appendix for Rise and Shine: The Effect of School Start Times on Academic Performance from Childhood through Puberty

1 Online Appendix for Rise and Shine: The Effect of School Start Times on Academic Performance from Childhood through Puberty 1 Online Appendix for Rise and Shine: The Effect of School Start Times on Academic Performance from Childhood through Puberty 1.1 Robustness checks for mover definition Our identifying variation comes

More information