A Guide to Quasi-Experimental Designs


Western Kentucky University
From the SelectedWorks of Matt Bogard, Fall 2013

A Guide to Quasi-Experimental Designs
Matt Bogard, Western Kentucky University

Available at: https://works.bepress.com/matt_bogard/24/

Abstract

Linear regression is a very powerful empirical tool that allows for controlled comparisons of treatment effects across groups. However, omitted variable bias, selection bias, and issues related to unobserved heterogeneity and endogeneity can bias standard regression results. Quasi-experimental designs, including propensity score methods, instrumental variables, regression discontinuity, and difference-in-difference estimators, offer an inferentially rigorous alternative for program evaluation. In this guide, I begin with an introduction to the potential outcomes framework for rigorously characterizing selection bias and follow with discussions of quasi-experimental methods that may be useful to practitioners involved in program evaluation.

Introduction

Linear regression is a very powerful empirical tool that allows for controlled comparisons of treatment effects across groups. However, omitted variable bias, selection bias, and issues related to unobserved heterogeneity and endogeneity can bias standard regression results. Quasi-experimental (QE) designs provide an inferentially rigorous approach to causal inference. As discussed in Cellini (2008), approaches such as difference-in-differences are becoming quite common; indeed, these approaches have begun to replace basic multivariate regression as the standard in program evaluation. The discussion below introduces concepts related to selection bias and unobserved heterogeneity, and several quasi-experimental approaches that can be used to address these issues.

The Randomized Controlled Experiment

In the classic randomized controlled experiment (RCE), subjects are randomly assigned to a treatment and control group in a careful manner that ensures that subjects in each group are identical in all respects except for the treatment assignment. In an RCE, we assume that any difference in the observed outcome of interest is due to the treatment effect, because all other factors have been accounted for in the experimental design. In this case, correlation between treatment and outcome, or observed differences in outcomes between treatment and control groups, implies a causal treatment effect.
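To make the logic of randomization concrete, here is a minimal simulation sketch in Python (all data and parameter values are hypothetical, chosen only for illustration). Because assignment is independent of the potential outcomes, a simple difference in observed group means recovers the true treatment effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Potential outcomes: baseline y0 plus a constant true treatment effect of 2.0
y0 = rng.normal(10, 2, n)
y1 = y0 + 2.0

# Random assignment: treatment is independent of the potential outcomes
d = rng.integers(0, 2, n)
y_obs = np.where(d == 1, y1, y0)

# Under randomization the observed difference in means recovers the true effect
print(y_obs[d == 1].mean() - y_obs[d == 0].mean())  # approximately 2.0
```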

Selection Bias and the Rubin Causal Model and Potential Outcomes Framework

The problem of selection bias is best characterized within the Rubin causal model or potential outcomes framework (Angrist and Pischke, 2009; Rubin, 1974; Imbens and Wooldridge, 2009; Klaiber and Smith, 2009). Suppose Y_i is the measured outcome of interest. This can be written in terms of potential outcomes as:

Y_i = { y_{1i} if d_i = 1; y_{0i} if d_i = 0 }   (1)
    = y_{0i} + (y_{1i} - y_{0i}) d_i             (2)

where:
d_i = choice, selection, or treatment indicator
y_{0i} = baseline potential outcome
y_{1i} = potential treatment outcome

The causal effect of interest is y_{1i} - y_{0i}, a comparison of both potential outcomes for a single individual. Reality forces us to compare outcomes for different individuals (those treated vs. untreated). What we actually measure is E[Y_i | d_i = 1] - E[Y_i | d_i = 0], the observed effect or observed difference between means for treated vs. untreated groups. The problem of non-random treatment assignment, or selection bias, can be characterized as follows:

E[Y_i | d_i = 1] - E[Y_i | d_i = 0] = E[y_{1i} - y_{0i}] + {E[y_{0i} | d_i = 1] - E[y_{0i} | d_i = 0]}   (3)¹

The observed effect or difference is equal to the population average treatment effect (ATE), E[y_{1i} - y_{0i}], plus the bracketed term, which characterizes selection bias. If the baseline potential outcomes y_{0i} of those who select treatment (d_i = 1) differ from the baseline potential outcomes y_{0i} of those who do not select treatment (d_i = 0), then the term {E[y_{0i} | d_i = 1] - E[y_{0i} | d_i = 0]} could have a positive or negative value, creating selection bias. When we calculate the observed difference between treated and untreated groups, selection bias becomes confounded with the actual treatment effect E[y_{1i} - y_{0i}]. Note that if the baseline potential outcomes of the treated and control groups were the same, then the selection bias term would equal zero, and the observed difference would represent the population average treatment effect. This is the result we would get from an ideal randomized controlled experiment. Selection bias can overpower the actual treatment effect and lead the naïve researcher to conclude (based on the observed effect E[Y_i | d_i = 1] - E[Y_i | d_i = 0]) that the intervention or treatment was ineffectual, or lead them to under- or overestimate the true treatment effect, depending on the direction of the bias.
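The decomposition in equation (3) can be demonstrated with a small extension of the earlier simulation (again, all numbers are hypothetical): if units with higher baseline potential outcomes are more likely to opt into treatment, the observed difference equals the ATE plus the selection bias term exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Baseline potential outcome y0; constant true treatment effect of 2.0
y0 = rng.normal(10, 2, n)
y1 = y0 + 2.0

# Self-selection: units with high baseline outcomes are more likely to opt in,
# so E[y0 | d = 1] > E[y0 | d = 0]
d = (y0 + rng.normal(0, 2, n) > 10).astype(int)
y_obs = np.where(d == 1, y1, y0)

observed_diff = y_obs[d == 1].mean() - y_obs[d == 0].mean()
ate = (y1 - y0).mean()                                   # 2.0 by construction
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

# Equation (3): observed difference = ATE + selection bias
print(observed_diff, ate + selection_bias)  # the two quantities match exactly
```

Here the observed difference (roughly 4.3) overstates the true effect of 2.0 by exactly the selection bias term.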

In many cases, applied research in the social sciences and education involves non-experimental or observational data and is plagued by issues related to selection bias. Quasi-experimental methods such as propensity score matching offer a methodological approach to this problem.

Propensity Score Matching

As explained above, selection bias can overpower the actual treatment effect and lead the naïve researcher to conclude that the intervention or treatment was ineffectual, or to under- or overestimate the true treatment effect, depending on the direction of the bias. According to the conditional independence assumption (CIA) (Rubin, 1973; Angrist and Pischke, 2009; Rosenbaum and Rubin, 1983; Angrist and Hahn, 2004), conditioning on covariates may remove selection bias, giving us the estimate of the treatment effect we need:

E[Y_i | x_i, d_i = 1] - E[Y_i | x_i, d_i = 0] = E[y_{1i} - y_{0i} | x_i], or (y_{1i}, y_{0i}) ⊥ d_i | x_i   (4)

The last term implies that treatment assignment (d_i) and response (y_{1i}, y_{0i}) are conditionally independent given covariates x_i. This conclusion provides the justification and motivation for utilizing matched comparisons to estimate treatment effects. Matched estimates of treatment effects are achieved by comparing units with similar covariate values and computing a weighted average of the within-stratum differences based on the distribution of covariates:

Σ_x {E[Y_i | X_i = x, d_i = 1] - E[Y_i | X_i = x, d_i = 0]} P(X_i = x) = E[y_{1i} - y_{0i}] = ATE   (5)

Matched comparisons imply balanced comparisons, creating a situation similar to a randomized experiment where all subjects are essentially the same except for the treatment (Thoemmes and Kim, 2011). As Angrist and Pischke (2009) demonstrate, regression can also be utilized as a type of variance-based weighted matching estimator, where:

Y = β_0 + β_1 d + β_2 X + e, and β_1 = E[Y_i | x_i, d_i = 1] - E[Y_i | x_i, d_i = 0]   (6)
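A sketch of the covariate-matching logic in equation (5), using a single discrete confounder (the data-generating values are hypothetical): the naive comparison is confounded, while the stratum-weighted comparison recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# One discrete confounder x that drives both selection and the baseline outcome
x = rng.integers(0, 3, n)                       # x in {0, 1, 2}
p_treat = np.array([0.2, 0.5, 0.8])[x]          # selection depends on x
d = (rng.random(n) < p_treat).astype(int)
y = 5 + 3 * x + 2 * d + rng.normal(0, 1, n)     # true treatment effect = 2

# Naive comparison is confounded by x
print("naive:", y[d == 1].mean() - y[d == 0].mean())        # well above 2

# Equation (5): within-stratum differences weighted by P(X = x)
ate = sum(
    (y[(x == v) & (d == 1)].mean() - y[(x == v) & (d == 0)].mean()) * np.mean(x == v)
    for v in range(3)
)
print("stratified ATE:", ate)                               # approximately 2.0
```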

Matching on covariates can be complicated and cumbersome. An alternative is to implement matching based on an estimate of the probability of receiving treatment or selection. This probability is referred to as a propensity score. Given estimates of the propensity or probability of receiving treatment, comparisons can then be made between observations matched on propensity scores. This is in effect a two-stage process, requiring first the specification and estimation of a model used to derive the propensity scores, and then some implementation of matched comparisons made on those scores. Rosenbaum and Rubin (1983) state that if the CIA holds, then matching or conditioning on propensity scores (denoted p(x_i)) will also eliminate selection bias; i.e., treatment assignment (d_i) and response (y_{1i}, y_{0i}) are conditionally independent given propensity scores p(x_i):

(y_{1i}, y_{0i}) ⊥ d_i | x_i  ⇒  (y_{1i}, y_{0i}) ⊥ d_i | p(x_i)   (7)

In fact, propensity score matching can provide a more asymptotically efficient estimator of treatment effects than covariate matching (Angrist and Hahn, 2004). So the idea is to first generate propensity scores by specifying a model that predicts the probability of receiving treatment given covariates x_i:

p(x_i) = P(d_i = 1 | x_i)   (8)

There are many possible functional forms for estimating propensity scores. Logit and probit models with the binary treatment indicator as the dependent variable are commonly used. Hirano et al. (2003) find that an efficient estimator can be achieved by weighting by a non-parametrically estimated propensity score. Millimet and Tchernis (2009) find evidence that more flexible and over-specified estimators perform better in propensity score applications. A comparative study of propensity score estimators using logistic regression, support vector machines, decision trees, and boosting algorithms can be found in Westreich et al. (2010).

Matching is accomplished by identifying individuals in the control group with propensity scores similar to those in the treated group. Types of matching algorithms include 1:1 and nearest-neighbor methods. Differences between matched cases are calculated and then combined to estimate an average treatment effect. Another method that implements matching based on propensity scores is stratified comparison. In this case, treatment and control groups are stratified or divided into groups, categories, or bins of propensity scores. Then comparisons are made across strata and combined to estimate an average treatment effect. Matched comparisons based on propensity score strata are discussed in Rosenbaum and Rubin (1984). This method can remove up to 90% of the bias due to factors related to selection using as few as five strata (Rosenbaum and Rubin, 1984).
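The two-stage process can be sketched as follows (simulated data; the model and matching rule are deliberately simplified): a logit model estimates the scores per equation (8), and each treated unit is then matched, with replacement, to the control unit with the nearest score. Because we average outcome differences over treated units, this particular estimator targets the treatment effect on the treated (ATT), which here equals the ATE by construction.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2_000

# Continuous confounder x; selection and outcome both depend on it
x = rng.normal(0, 1, n)
d = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)
y = 5 + 3 * x + 2 * d + rng.normal(0, 1, n)     # true treatment effect = 2

# Stage 1: estimate propensity scores with a logit model, equation (8)
ps = sm.Logit(d, sm.add_constant(x)).fit(disp=0).predict()

# Stage 2: 1:1 nearest-neighbor matching of treated units to controls on the score
treated, controls = np.where(d == 1)[0], np.where(d == 0)[0]
nearest = np.abs(ps[controls][None, :] - ps[treated][:, None]).argmin(axis=1)
matches = controls[nearest]

print("matched estimate:", (y[treated] - y[matches]).mean())  # approximately 2.0
```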

Inverse Probability of Treatment Weighted Regression

An alternative to direct matching or matching on propensity scores involves the use of the inverse of propensity scores in a weighted regression framework (Horvitz and Thompson, 1952), known as inverse probability of treatment weighted (IPTW) regression, where:

E[d_i Y_i / p(x_i)] = E[y_{1i}] and E[(1 - d_i) Y_i / (1 - p(x_i))] = E[y_{0i}] (Hirano and Imbens, 2001)   (9)

IPTW regression (with weights specified as above) specifically estimates the average treatment effect (ATE) (Austin, 2011):

ATE = E[y_{1i} - y_{0i}]   (10)

Inverse probability of treatment weighting uses weights derived from the propensity scores to create a pseudo-population such that the distribution of covariates in the population is independent of treatment assignment (Austin, 2011).
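A minimal IPTW sketch under the same kind of simulated setup (hypothetical values): the weighted means in equation (9) and an equivalent weighted regression both recover the ATE.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 20_000

x = rng.normal(0, 1, n)
d = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)
y = 5 + 3 * x + 2 * d + rng.normal(0, 1, n)     # true ATE = 2

# Propensity scores from a logit model
ps = sm.Logit(d, sm.add_constant(x)).fit(disp=0).predict()

# Equation (9): weighted means of the observed outcomes
ey1 = np.mean(d * y / ps)
ey0 = np.mean((1 - d) * y / (1 - ps))
print("IPTW ATE:", ey1 - ey0)                                # approximately 2.0

# Equivalent weighted regression: weights 1/ps for treated, 1/(1-ps) for controls
w = np.where(d == 1, 1 / ps, 1 / (1 - ps))
fit = sm.WLS(y, sm.add_constant(d), weights=w).fit()
print("weighted regression:", fit.params[1])                 # approximately 2.0
```

In practice the weights are usually inspected, and sometimes trimmed or stabilized, since propensity scores near 0 or 1 produce very large weights and unstable estimates.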

Unobserved Heterogeneity and Endogeneity

Let's suppose we estimate the following:

Y = β_0 + β_1 D + e   (11)

When we estimate a regression such as (11) above and leave out an important variable such as A, our estimate of β_1 can become biased and inconsistent. In fact, to the extent that D and A are correlated, D becomes correlated with the error term, violating a basic assumption of regression. The omitted information in A is referred to in econometrics as heterogeneity. Heterogeneity is simply variation across individual units of observation, and since we cannot observe the heterogeneity related to A, we have unobserved heterogeneity. Correlation between an explanatory variable and the error term is referred to as endogeneity. So in econometrics, when we have an omitted variable (as is often the case with causal inference and selection bias), we say we have endogeneity caused by unobserved heterogeneity.

What happens to our estimate of β_1? We know from basic econometrics that our estimate of β_1 is:

b = COV(Y, D)/VAR(D)   (12)

Substituting Y = β_0 + β_1 D + e into (12) we get:

b = COV(β_0 + β_1 D + e, D)/VAR(D)
  = COV(β_0, D)/VAR(D) + COV(β_1 D, D)/VAR(D) + COV(e, D)/VAR(D)   (13)
  = β_1 VAR(D)/VAR(D) + COV(e, D)/VAR(D)                           (14)
  = β_1 + COV(e, D)/VAR(D)                                         (15)

We can see from (15) that if we leave out a variable in (11), i.e. we have unobserved heterogeneity, then the correlation that results between D and the error term will not be zero, and our estimate of β_1 will be biased by the term COV(e, D)/VAR(D). If (11) were correctly specified, the term COV(e, D)/VAR(D) would drop out and we would get an unbiased estimate of β_1.
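The result in (15) is easy to verify by simulation (all parameter values hypothetical): generate an omitted variable A that is correlated with D, and compare the long and short regressions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 50_000

# Omitted variable A, correlated with treatment D
a = rng.normal(0, 1, n)
d = (a + rng.normal(0, 1, n) > 0).astype(int)
y = 1.0 + 0.5 * d + 0.8 * a + rng.normal(0, 1, n)   # true beta_1 = 0.5

# Correctly specified regression recovers beta_1
long_fit = sm.OLS(y, sm.add_constant(np.column_stack([d, a]))).fit()
print("with A:", long_fit.params[1])                 # approximately 0.5

# Omitting A: the estimate is beta_1 + COV(e, D)/VAR(D), per equation (15)
short_fit = sm.OLS(y, sm.add_constant(d)).fit()
print("without A:", short_fit.params[1])             # well above 0.5

# The bias term computed directly (e is the error of the short regression)
e = y - 1.0 - 0.5 * d
print("beta_1 + bias:", 0.5 + np.cov(e, d)[0, 1] / np.var(d, ddof=1))
```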

Instrumental Variables

Suppose an institution has a summer camp designed to prepare high school students for their first year of college, and we want to assess the impact of the camp on first-year retention. The model might be specified as follows:

Y = β_0 + β_1 CAMP + β_2 X + e   (16)

where:
Y = binary first-year retention indicator
CAMP = an indicator for camp attendance
X = a vector of controls

For simplicity, suppose we work with:

Y = β_0 + β_1 CAMP + e   (17)

The causal effect of interest, or the treatment effect of CAMP, is the regression estimate β_1 above. But what if CAMP attendance is voluntary? If attendance is voluntary, then it could be that students who choose to attend also have a high propensity to succeed or be retained, due to unmeasured factors (social capital, innate ability, ambition, etc.) not captured even with controls like test scores or other measures of ability. In that case β_1 could overstate the actual impact of CAMP on retention. If we knew about a variable that captures the omitted factors that may be related to both the choice of attending the CAMP and a greater tendency to be retained (call it INDEX), we would include it and estimate the following:

Y = β_0 + β_1 CAMP + β_2 INDEX + e   (18)

Omitted variable bias in equation (17) would cause us to mis-estimate the effect of CAMP. One way to characterize the selection bias problem is through the potential outcomes framework discussed before, but this time let's characterize the problem in terms of the regression specification above. By omitting INDEX, information about INDEX is getting sucked up into the error term. When this happens, to the extent that INDEX is correlated with CAMP, CAMP becomes correlated with the error term. This correlation with the error term is a violation of the classical regression assumptions and leads to biased estimates of β_1. In more technical terms than "getting sucked up into the error term," we would frame this in the context of our previous discussion of unobserved heterogeneity and endogeneity.

So the question becomes: how do we tease out the true effect of CAMP when we have omitted INDEX? Techniques using what are referred to as instrumental variables help us do this. Suppose we find a variable, call it Z, that tends to be correlated with our variable of interest, CAMP. But we also notice (or argue) that Z tends to be unrelated to all of those omitted factors, like innate ability and ambition, that comprise the variable INDEX that we wish we had. The technique of instrumental variables looks at changes in a variable like Z, relates them to changes in our variable of interest CAMP, and then relates those changes to the outcome of interest, retention. Since Z is unrelated to INDEX, the changes in CAMP that are related to Z are likely to be less correlated with INDEX (and hence less correlated with the error term). A less technical way to think about this is that we are taking Z and going through CAMP to get to Y, bringing with us only those aspects of CAMP that are unrelated to INDEX. Z is like a filter that picks up only the variation in CAMP that we are interested in (what we might call quasi-experimental variation) and filters out the noise picked up from not including or controlling for INDEX. Z is technically related to Y only through CAMP:

Z → CAMP → Y   (19)

If we can do this, then our estimate of the effect of CAMP on Y will be unbiased by the omitted effects of INDEX. So how do we do this in practice? We can do this through a series of regressions. To relate changes in Z to changes in CAMP we estimate:

CAMP = β_0 + β_1 Z + e   (20)

Notice that in (20), β_1 only picks up the common variation between Z and CAMP and leaves all of the variation in CAMP related to INDEX in the residual term (it is the residual, not β_1, that is related to INDEX, because we are arguing that Z and INDEX are uncorrelated). You can think of this as the filtering process.

Then, to relate changes in Z to changes in our target Y, we estimate:

Y = β_0 + β_2 Z + e   (21)

Our instrumental variable estimator then becomes:

β_IV = β_2 / β_1 = [COV(Y, Z)/VAR(Z)] / [COV(CAMP, Z)/VAR(Z)] = COV(Y, Z)/COV(CAMP, Z)   (22)

The last term in (22) shows that β_IV represents the proportion of the total variation in CAMP that is related to Z that is also related to Y; or, the total proportion of variation in CAMP unrelated to INDEX that is related to Y; or, the total proportion of quasi-experimental variation in CAMP related to Y. Regardless of how we characterize β_IV, we can see that it teases out only that variation in CAMP that is unrelated to INDEX and relates it to Y, giving us an estimate of the treatment effect of CAMP that is less biased than a standard regression like (17). We can also derive β_IV by substitution via two-stage least squares:

CAMP_est = β_0 + β_1 Z + e   (23)
Y = β_0 + β_IV CAMP_est + e   (24)

As discussed above, the first regression gets only the variation in CAMP related to Z and leaves all of the variation in CAMP related to INDEX in the residual term. As Angrist and Pischke (2009) explain, the second stage then retains only the quasi-experimental variation in CAMP generated by the instrument Z (because in the second regression we are using the estimate of CAMP derived from (23) rather than CAMP itself).² As discussed in their book Mostly Harmless Econometrics, most IV estimates are derived using packages like SAS, Stata, or R rather than by explicit implementation of the methods illustrated above. Caution should be used to derive the correct standard errors, which are not the ones you will get in the intermediate results from any of the regressions depicted above.
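A sketch of both the ratio form (22) and the two-stage procedure (23)-(24), using simulated data for the hypothetical CAMP example (Z is constructed to be correlated with CAMP but independent of INDEX; the retention outcome is treated as continuous for simplicity):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 50_000

index = rng.normal(0, 1, n)                    # unobserved INDEX
z = rng.normal(0, 1, n)                        # instrument: independent of INDEX
camp = (z + index + rng.normal(0, 1, n) > 0).astype(int)
y = 0.5 * camp + 0.8 * index + rng.normal(0, 1, n)   # true CAMP effect = 0.5

# Biased OLS, as in (17): CAMP is correlated with the omitted INDEX
print("OLS:", sm.OLS(y, sm.add_constant(camp)).fit().params[1])

# Ratio form, equation (22): COV(Y, Z) / COV(CAMP, Z)
print("IV ratio:", np.cov(y, z)[0, 1] / np.cov(camp, z)[0, 1])

# Two-stage least squares, equations (23)-(24)
camp_est = sm.OLS(camp, sm.add_constant(z)).fit().predict()
second = sm.OLS(y, sm.add_constant(camp_est)).fit()
print("2SLS:", second.params[1])               # matches the IV ratio, ~0.5
# Note: the second stage's reported standard errors are NOT the correct 2SLS
# standard errors, exactly as cautioned above; dedicated IV routines adjust them.
```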

Difference-In-Difference Estimators

Difference-in-difference (DD) estimators assume that, in the absence of treatment, the difference between the treatment group (A) and the control group (B) would remain constant or "fixed" over time. DD estimators are a special type of fixed effects estimator.

(A - B) = the difference in group averages pre-treatment, the "normal" difference between groups
(A' - B') = the total difference in group averages post-treatment = the normal difference + the treatment effect
(A' - B') - (A - B) = the treatment effect

We compare the normal difference in group averages pre-treatment to the difference in group averages post-treatment; the larger the difference post-treatment, the larger the treatment effect. This can also be represented in a regression context with interactions, where t is a time indicator for pre vs. post treatment and d is an indicator for treatment and control groups. At t = 0 there are no treatments, so those terms equal 0. The parameter β_3 on the interaction term is our difference-in-difference estimator, as shown below:

Y = β_0 + β_1 d + β_2 t + β_3 d*t + e   (25)

Group                 Pre (t = 0)   Post (t = 1)              Difference
Treatment A (d = 1)   β_0 + β_1     β_0 + β_1 + β_2 + β_3     β_2 + β_3
Control B (d = 0)     β_0           β_0 + β_2                 β_2
Difference-in-difference: β_3
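A quick simulated check (hypothetical parameter values) that the coefficient on the interaction in (25) reproduces the difference of the four group means:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 40_000

d = rng.integers(0, 2, n)          # 1 = treatment group
t = rng.integers(0, 2, n)          # 1 = post period
# Group gap of 1.0, common time trend of 1.5, treatment effect of 2.0
y = 3 + 1.0 * d + 1.5 * t + 2.0 * d * t + rng.normal(0, 1, n)

# Regression (25): the coefficient on d*t is the DD estimator (beta_3)
X = sm.add_constant(np.column_stack([d, t, d * t]))
print("beta_3:", sm.OLS(y, X).fit().params[3])                # approximately 2.0

# The same number from the four group means
dd = (y[(d == 1) & (t == 1)].mean() - y[(d == 1) & (t == 0)].mean()) \
   - (y[(d == 0) & (t == 1)].mean() - y[(d == 0) & (t == 0)].mean())
print("difference-in-difference:", dd)
```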

Regression Discontinuity Designs

Suppose a policy or intervention is implemented, or a treatment is applied, based on arbitrary values of some observed covariate X relative to a cutoff value X_0. If there is some positive relationship between X and the outcome, wouldn't subjects with X > X_0, to whom the treatment is applied, be more likely to exhibit higher levels of the outcome variable Y anyway? Is it valid to make comparisons of observed outcomes (Y) between groups with differing values of X? One solution would be to implement matched comparisons between groups with similar values of covariates. Regression discontinuity (RD) designs allow us to compare differences between groups in the neighborhood of the cutoff value X_0, giving us unbiased estimates of treatment effects. Treatment effects can be characterized by a change in intercept or main effect at the discontinuity. Within the neighborhood of the cutoff, treatment assignment is equivalent to random assignment (Lee and Lemieux, 2010). More complicated functional forms may be estimated:

Y = β_0 + β_1 D + f(X) + e   (26)

where D indicates treatment (X > X_0) and f(X) may be a pth-order polynomial. Comparisons of outcomes in the neighborhood of X_0 provide estimates of the treatment effect based on E[Y | X] (Angrist and Pischke, 2009). Even more complicated methods, including local linear regression, may be implemented, as well as combined main and interaction effects.
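A sketch of a sharp RD estimate on simulated data (the cutoff, slopes, and jump are all hypothetical): fitting (26) with a linear f(X), allowing different slopes on each side of the cutoff, and comparing it to a simple local comparison of means near X_0.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 20_000

x = rng.uniform(-1, 1, n)             # running variable, cutoff X0 = 0
d = (x > 0).astype(int)               # sharp assignment rule
y = 2 + 1.5 * x + 2.0 * d + rng.normal(0, 0.5, n)   # jump of 2.0 at the cutoff

# Equation (26) with f(X) linear, allowing a different slope on each side
X = sm.add_constant(np.column_stack([d, x, d * x]))
print("RD estimate:", sm.OLS(y, X).fit().params[1])           # approximately 2.0

# Naive local comparison in a narrow window around X0: close to 2.0, with a
# small bias from the slope that shrinks as the window narrows
w = np.abs(x) < 0.1
print("local means:", y[w & (d == 1)].mean() - y[w & (d == 0)].mean())
```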

Lee and Lemieux (2010) provide a very good introduction to RD designs and make two very important points about their usefulness as a QE method: the RD "design can be interpreted as a weighted average treatment effect across all individuals," and the design can be viewed as generating local random assignment in the neighborhood of the cutoff, which distinguishes it from simply applying instruments to address selection. Among quasi-experimental designs, RD designs may be regarded as coming the closest to the ideal of a randomized controlled experiment.

Sharp vs. Fuzzy RD

With a sharp RD design, subjects are assigned to treatment and control groups strictly according to the value of the observed covariate relative to the cutoff X_0. When assignment does not strictly follow the cutoff (referred to as non-compliance or incomplete compliance in some settings), it may be the case that we find subjects with values of X near the cutoff in both treatment and control groups. As van der Klaauw (2002) explains, this could be a case where assignment is based on the observable values of X in addition to other unobservable factors. RD in this context is a case of both selection on observables and unobservables. As explained in Angrist and Pischke (2009), the discontinuity serves as an instrument for treatment status, and fuzzy RD can be understood in the instrumental variables context.

Conclusion

Multivariable regression can be a powerful empirical tool for estimating treatment effects of interventions. However, issues related to omitted variable bias, selection bias, and unobserved heterogeneity and endogeneity can bias standard regression results. Quasi-experimental designs, including propensity score methods, instrumental variables, regression discontinuity, and difference-in-difference estimators, offer a more rigorous alternative for program evaluation.

Notes

1. A variation on equation (3) can be written as:

E[Y_i | d_i = 1] - E[Y_i | d_i = 0] = E[y_{1i} - y_{0i} | d_i = 1] + {E[y_{0i} | d_i = 1] - E[y_{0i} | d_i = 0]}

The expression E[y_{1i} - y_{0i} | d_i = 1] represents the average treatment effect on the treated (ATT) vs. the average treatment effect (ATE), E[y_{1i} - y_{0i}], depicted earlier. For more detailed discussion of ATE vs. ATT in the context of quasi-experimental designs, see Austin (2011), Angrist and Pischke (2009), and Lanehart et al. (2012).

2. We can also see that instrumental variables correct for omitted variable bias in the following way. Starting from

β_IV = COV(Y, Z)/COV(D, Z)

and substituting Y = β_0 + β_1 D + e (where D is our treatment indicator as in (11)), we get:

β_IV = COV(β_0 + β_1 D + e, Z)/COV(D, Z)
     = COV(β_0, Z)/COV(D, Z) + COV(β_1 D, Z)/COV(D, Z) + COV(e, Z)/COV(D, Z)
     = β_1 COV(D, Z)/COV(D, Z) + COV(e, Z)/COV(D, Z)
     = β_1 + COV(e, Z)/COV(D, Z)

By construction COV(e, Z) = 0, and we get an unbiased estimate of β_1.

References

Angrist, J. D., & Hahn, J. (2004). When to control for covariates? Panel-asymptotic results for estimates of treatment effects. Review of Economics and Statistics, 86, 58-72.

Angrist, J. D., Imbens, G. W., & Rubin, D. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434), 444-455.

Angrist, J. D., & Pischke, J. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Austin, P. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399-424.

Baumer, P. Regression discontinuity. Southern Methodist University. http://faculty.smu.edu/kyler/courses/7312/presentations/baumer/baumer_rd.pdf

Caruana, R., & Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), 161-168.

Cleary, P. D., & Angel, R. (1984). The analysis of relationships involving dichotomous dependent variables. Journal of Health and Social Behavior, 25(3), 334-348.

Crump, R. K., Hotz, J., Imbens, G. W., & Mitnik, O. A. (2006). Moving the goalposts: Addressing limited overlap in the estimation of average treatment effects by changing the estimand. Working Paper 33. National Bureau of Economic Research.

D'Agostino, R. B. (1971). A second look at analysis of variance on dichotomous data. Journal of Educational Measurement, 8(4), 327-333.

Dey, E. L., & Astin, A. W. (1993). Statistical alternatives for studying college student retention: A comparative analysis of logit, probit, and linear regression. Research in Higher Education, 34(5).

Evans, W. N. (2008). Difference in difference models. Course notes, ECON 47950: Methods for Inferring Causal Relationships in Economics. University of Notre Dame, Spring.

Harder, V. S., Stuart, E. A., & Anthony, J. C. (2010). Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychological Methods, 15, 234-249.

Hirano, K., & Imbens, G. W. (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services & Outcomes Research Methodology, 2, 259-278.

Hirano, K., Imbens, G. W., & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4), 1161-1189.

Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260).

Imbens, G. W., & Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615-635.

Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5-86.

Kang, J., & Schafer, J. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4), 523-539.

Klaiber, H. A., & Smith, V. K. (2009). Evaluating Rubin's causal model for measuring the capitalization of environmental amenities. NBER Working Paper No. 14957. National Bureau of Economic Research.

Lanehart, R. E., de Gil, P. R., Kim, E. S., Bellara, A. P., Kromrey, J. D., & Lee, R. S. (2012). Propensity score analysis and assessment of propensity score approaches using SAS procedures. Paper 314-2012, SAS Global Forum 2012 Proceedings. Cary, NC: SAS Institute.

Lee, D. S., & Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48, 281-355.

Lunney, G. H. (1970). Using analysis of variance with a dichotomous dependent variable: An empirical study. Journal of Educational Measurement, 7(4), 263-269.

Maciejewski, M. L., & Brookhart, M. A. (2011). Propensity score workshop. Retrieved January 19, 2013, from http://ahrqplexnet.sharepointspace.com/webinars/ps_webinar_followup.pdf

Millimet, D. L., & Tchernis, R. (2009). On the specification of propensity scores, with applications to the analysis of trade policies. Journal of Business & Economic Statistics, 27(3).

Moss, B. G., & Yeaton, W. H. (2006). Shaping policies related to developmental education: An evaluation using the regression-discontinuity design. Educational Evaluation and Policy Analysis, 28(3), 215-229.

Pike, G. R., Hansen, M. J., & Lin, C. Using instrumental variables to account for selection effects in research on first-year programs. Research in Higher Education, 52(2), 194-214.

Pischke, J. (2012). Probit better than LPM? Retrieved January 19, 2013, from http://www.mostlyharmlesseconometrics.com/2012/07/probit-better-than-lpm/

Robins, J. M., Hernan, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550-560.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.

Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387), 516-524.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688-701.

Rubin, D. B. (1973). Matching to remove bias in observational studies. Biometrics, 29, 159-183.

Stuart, E. (2011). Propensity score methods for estimating causal effects: The why, when, and how. Johns Hopkins Bloomberg School of Public Health, Department of Mental Health and Department of Biostatistics. Retrieved January 19, 2013, from www.biostat.jhsph.edu/estuart

Thoemmes, F. J., & Kim, E. S. (2011). A systematic review of propensity score methods in the social sciences. Multivariate Behavioral Research, 46(1), 90-118.

Program evaluation and the difference-in-difference estimator. Course notes, Education Policy and Program Evaluation, Vanderbilt University, October 4, 2008.

van der Klaauw, W. (2002). Estimating the effect of financial aid offers on college enrollment: A regression-discontinuity approach. International Economic Review, 43(4), 1249-1287.

Westreich, D., Lessler, J., & Funk, M. J. (2010). Propensity score estimation: Machine learning and classification methods as alternatives to logistic regression. Journal of Clinical Epidemiology, 63(8), 826-833.