Inference with Difference-in-Differences Revisited

M. Brewer, T. F. Crossley and R. Joyce, Journal of Econometric Methods, 2018
Presented by Federico Curci, February 22nd, 2018
Brewer, Crossley and Joyce, Inference D-D, Applied Reading Group

This paper
- What we know: inference in D-D can be misleading because of serial correlation in the errors.
- What has been proposed: cluster-robust standard errors can deal with error correlation, but this is not possible with a small number of groups.
- What they propose: combining feasible GLS with cluster-robust inference solves this problem even with a small number of groups.

Presentation
- Inference problem in D-D: Bertrand et al. (2004)
- Cluster-based solutions: Cameron and Miller (2015)
- Brewer et al. (2018) solution

Previously on...

Bertrand et al. (2004)
Standard D-D model:
$$y_{igt} = \alpha_g + \delta_t + \beta T_{gt} + \gamma w_{igt} + v_{igt} \quad (1)$$
- Standard errors might be inconsistent (even if the estimator is unbiased).
- $v_{igt}$ may not be iid within group: $v_{igt} = \epsilon_{gt} + u_{igt}$
- ...because of serial correlation:
  - DD estimation usually relies on fairly long time series.
  - The most commonly used dependent variables in DD are serially correlated.
  - The treatment variable itself changes very little within a state over time.
- ...and cross-sectional correlation:
  - DD studies use micro data but estimate the effect of a treatment that varies only at the group level.

Bertrand et al. (2004)
Wrong inference in terms of both size and power
- Size: probability of a type I error
  - Probability of rejecting the null hypothesis when the null hypothesis is true
  - Probability of asserting something that is absent
- Power: one minus the probability of a type II error
  - A type II error is failing to reject the null hypothesis when the null hypothesis is false
  - Power is the ability of a test to detect an effect, if the effect actually exists

To solve cross-sectional correlation
First-stage aggregation:
- Using micro data, regress $y_{igt}$ on $w_{igt}$ and take the mean residual within each group-time cell ($\hat{Y}_{gt}$).
- Regress $\hat{Y}_{gt}$ on fixed effects and treatment.
We still have a problem of serial correlation.
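The two aggregation steps can be sketched as follows. This is a minimal illustration using NumPy, not the authors' code: the function names `cell_mean_residuals` and `dd_on_cells` are hypothetical, the first stage uses a single covariate plus an intercept, and the panel is assumed to contain every group-time cell.

```python
import numpy as np

def cell_mean_residuals(y, w, group, time):
    """Step 1: regress y on an intercept and w with micro data,
    then average the residuals within each (group, time) cell."""
    X = np.column_stack([np.ones(len(y)), w])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    cells = sorted(set(zip(group.tolist(), time.tolist())))
    means = np.array([resid[(group == g) & (time == t)].mean() for g, t in cells])
    return cells, means

def dd_on_cells(cells, means, treated):
    """Step 2: regress cell means on group dummies, time dummies and the
    treatment indicator; return the treatment coefficient."""
    groups = sorted({g for g, _ in cells})
    times = sorted({t for _, t in cells})
    rows = []
    for g, t in cells:
        g_dum = [float(g == h) for h in groups[1:]]   # first group is the baseline
        t_dum = [float(t == s) for s in times[1:]]    # first period is the baseline
        rows.append([1.0] + g_dum + t_dum + [float(treated(g, t))])
    coef, *_ = np.linalg.lstsq(np.array(rows), means, rcond=None)
    return coef[-1]
```

Here `treated` is any function of `(g, t)` returning whether the cell is under treatment. The second-stage regression has one observation per group-time cell, so it removes the within-cell correlation but, as the slide notes, not the serial correlation across periods.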

Bertrand et al. (2004)
Assess the extent of the serial correlation problem in DD
- Examine how DD performs on placebo laws: treated states and year of passage are chosen at random.
- Since the laws are fictitious, a significant effect at the 5 percent level should be found roughly 5 percent of the time.
- Use Monte Carlo simulations.

Assess magnitude of serial correlation problem
- Use women's wages from the Current Population Survey
  - Years 1979-1999
  - Women between 25 and 50 with positive earnings
  - 50 × 21 = 1,050 state-year cells
- Randomly generate laws that affect some states and not others
  - Draw a year at random from a uniform distribution between 1985 and 1995
  - Select half the states at random
  - Create the treatment dummy
- Estimate $y_{igt} = \alpha_g + \delta_t + \beta T_{gt} + \gamma w_{igt} + v_{igt}$
  - Generate the estimate $\hat\beta$ and its standard error
- Repeat this exercise a large number of times, each time drawing new laws at random
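The placebo-law exercise can be mimicked on synthetic data. The sketch below is not the Bertrand et al. (2004) code and does not use the CPS: it assumes state-year outcomes that follow an AR(1) process with a hypothetical coefficient `rho`, no true treatment effect, and conventional (iid) OLS standard errors. The function names are illustrative.

```python
import numpy as np

def double_demean(a):
    """Remove group (row) and time (column) means from a G x T array.
    In a balanced panel this partials out group and time fixed effects."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def placebo_rejection_rate(G=50, T=21, rho=0.8, n_sims=200, seed=0):
    """Share of placebo laws found 'significant' at the 5% level when
    inference wrongly assumes iid errors."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # AR(1) outcome for each group; the true treatment effect is zero
        shocks = rng.standard_normal((G, T))
        y = np.empty((G, T))
        y[:, 0] = shocks[:, 0] / np.sqrt(1.0 - rho**2)
        for t in range(1, T):
            y[:, t] = rho * y[:, t - 1] + shocks[:, t]
        # Placebo law: half the groups, one start period drawn uniformly
        # from periods 6..16 (1985..1995 if period 0 is 1979)
        treated = rng.permutation(G) < G // 2
        start = rng.integers(6, 17)
        D = np.zeros((G, T))
        D[treated, start:] = 1.0
        # Two-way fixed effects OLS via double demeaning
        y_t, D_t = double_demean(y), double_demean(D)
        beta = (D_t * y_t).sum() / (D_t**2).sum()
        resid = y_t - beta * D_t
        s2 = (resid**2).sum() / (G * T - G - T)   # G + T estimated parameters
        se = np.sqrt(s2 / (D_t**2).sum())
        rejections += abs(beta / se) > 1.96
    return rejections / n_sims
```

Comparing `placebo_rejection_rate(rho=0.0)` with `placebo_rejection_rate(rho=0.8)` illustrates the point of the exercise: with serially uncorrelated errors the rejection rate should sit near the nominal 5 percent, while with persistent errors it should be far above it.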

Bertrand et al. (2004): results
- 200 independent draws of placebo laws
- The null of no effect is rejected 67.5 percent of the time
- Power: the null of no effect is rejected against the alternative of a 2 percent effect 66 times


Cameron and Miller (2015)
Cluster-robust standard errors can deal with error correlation
- Cluster-robust variance estimator (CRSE):
  $$\hat V_{clu}[\hat\beta] = (X'X)^{-1} \left( \sum_{g=1}^{G} X_g' \hat v_g \hat v_g' X_g \right) (X'X)^{-1}$$
- It is consistent, and Wald statistics based on it are asymptotically normal, as $G \to \infty$.
- Wald t-statistic: $w = (\hat\beta - \beta_0) / s_{\hat\beta}$
  - If $G \to \infty$, then $w \sim N(0,1)$ under $H_0: \beta = \beta_0$
  - Finite $G$: the distribution of $w$ is unknown. It is common to use $w \sim T(G-1)$
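The sandwich formula translates fairly directly into code. The following is a minimal NumPy sketch, not a reference implementation: the function name `cluster_robust_se` and the `small_sample` flag are illustrative, and the optional scaling factor $G(N-1)/((G-1)(N-K))$ is the common finite-sample adjustment applied to the variance.

```python
import numpy as np

def cluster_robust_se(X, y, cluster, small_sample=True):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)            # "bread"
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    labels = np.unique(cluster)
    G = len(labels)
    meat = np.zeros((K, K))
    for g in labels:
        idx = cluster == g
        score = X[idx].T @ resid[idx]           # X_g' v_g
        meat += np.outer(score, score)          # X_g' v_g v_g' X_g
    V = XtX_inv @ meat @ XtX_inv
    if small_sample:
        V *= G * (N - 1) / ((G - 1) * (N - K))  # finite-sample scaling
    return beta, np.sqrt(np.diag(V))
```

When every observation is its own cluster and `small_sample=False`, the result reduces to the usual heteroskedasticity-robust (White) standard errors, which is a convenient sanity check.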

Cameron and Miller (2015)
Problem with few clusters
- Despite reasonable precision in estimating $\beta$, $\hat V_{clu}(\hat\beta)$ can be downward-biased.
- "Overfitting": estimated residuals are smaller than the true errors.
- The small-G bias is larger when the distribution of regressors is skewed (MacKinnon and Webb, 2017).
  - Imbalance between the number of treated and control groups

Cameron and Miller (2015)
- Solution 1: Bias-corrected CRSE
  $$\tilde v_g = \sqrt{\frac{G(N-1)}{(G-1)(N-K)}}\; \hat v_g$$
  - Reduces but does not eliminate over-rejection when there are few clusters
- Solution 2: Wild cluster bootstrap
  - Idea: estimate the main model imposing the null hypothesis that you wish to test, to give an estimate $\hat\beta_{H_0}$
  - Example: to test the statistical significance of one OLS regressor, regress $y_{ig}$ on all components of $x_{ig}$ except the regressor whose coefficient is zero under the null
  - Obtain residuals $\tilde v_{ig} = y_{ig} - x_{ig}'\hat\beta_{H_0}$
  - Less trivial to implement and computationally more intensive


Brewer et al. (2018)
Solution: combination of feasible GLS with cluster-robust inference
- Tests of correct size can be obtained with standard statistical software: vce(cluster clustervar)
  - Even with few groups
- The real problem is power
  - Gains in power can be achieved using feasible GLS, even with a small number of groups
- Combining feasible GLS and cluster-robust inference can also control test size

Structure of the paper
- Replicate the Monte Carlo simulations of Bertrand et al. (2004)
  - CRSE and wild bootstrap have low size distortion
  - CRSE and wild bootstrap have low power to detect real effects
- Increasing power with feasible GLS

Rejection rates when the null is true
- Assuming iid errors, the null is rejected more than 40% of the time
- Cluster-robust SEs solve the serial correlation problem when G is large
- Cluster-robust SEs are problematic when G is small
- Bias-corrected cluster-robust SEs have low size distortion, even with a small number of groups
- Results are similar for the wild cluster bootstrap

Rejection rates when the null is true
- Results are robust to:
  - A binary dependent variable
  - A wide range of error processes
- Bias-corrected CRSE are not robust to:
  - A large imbalance between treatment and control groups
  - This differs from the bootstrap


Power to detect real effects
- Low power for both methods: a 2% effect is detected only 40% of the time
- Power is higher for larger effects
- The power problem is worse with fewer groups, even for large true effects


Feasible GLS
A way to increase efficiency by exploiting knowledge of the serial correlation:
$$y = X\beta + v, \quad Cov[v \mid X] = \Omega, \quad \hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} y$$
- Assume an AR(k) process for the group-time shocks
- Estimate the model by OLS
- Estimate the k parameters using the OLS residuals
- Apply the GLS transformation
- Estimate the model on the transformed variables

Feasible GLS: problems
- The estimates of the k lag parameters are inconsistent; Hansen's bias correction may not work for small G
- Misspecification of the error process
- These problems do not affect consistency, and FGLS is (likely) still more efficient than the OLS estimator, but they affect test size
- It is possible to use cluster-robust inference to improve test size: plug the FGLS residuals into the CRSE formula
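The FGLS steps and the cluster-robust correction can be sketched together for the simplest case, k = 1. This is an illustrative NumPy sketch under stated assumptions, not the procedure used in the paper: it assumes an AR(1) process, drops the first period of each group when quasi-differencing, and applies no bias correction to the estimated AR coefficient. The function name `fgls_ar1` is hypothetical.

```python
import numpy as np

def fgls_ar1(y, X, group):
    """FGLS under an AR(1) assumption, with cluster-robust standard errors
    computed from the FGLS residuals. Rows must be sorted by group and,
    within each group, by time."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    group = np.asarray(group)
    # 1. Estimate the model by OLS and keep the residuals
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    # 2. Estimate the AR(1) coefficient from within-group lagged residuals
    same = group[1:] == group[:-1]
    rho = (e[1:][same] @ e[:-1][same]) / (e[:-1][same] @ e[:-1][same])
    # 3. GLS transformation: quasi-difference within each group
    y_s = (y[1:] - rho * y[:-1])[same]
    X_s = (X[1:] - rho * X[:-1])[same]
    g_s = group[1:][same]
    # 4. Estimate the model on the transformed variables
    XtX_inv = np.linalg.inv(X_s.T @ X_s)
    b = XtX_inv @ X_s.T @ y_s
    # 5. Plug the FGLS residuals into the cluster-robust formula
    u = y_s - X_s @ b
    meat = np.zeros_like(XtX_inv)
    for g in np.unique(g_s):
        score = X_s[g_s == g].T @ u[g_s == g]
        meat += np.outer(score, score)
    se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, se, rho
```

Because the standard errors come from the sandwich formula rather than from the assumed AR(1) structure, a misspecified error process costs efficiency but need not distort test size as much, which is the motivation given on the slide.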

Increasing power with feasible GLS
- FGLS improves power relative to OLS, even with small G
- FGLS combined with bias-corrected CRSE improves size, even with small G

Increasing power with feasible GLS
Results are not robust to:
- The parametric assumption about serial correlation (FGLS has a power advantage if the process is not MA)
- The number of time periods (FGLS has a power advantage when T is large)

What we learn from this paper
- Over-rejection of the null of no effect is not a problem
  - Both with simple bias-corrected CRSE and with the bootstrap
  - Even with a small number of groups
  - Bias-corrected CRSE: vce(cluster clustervar)
- We knew this already from Cameron and Miller (2015)
- Bias-corrected CRSE are not robust to treatment-control imbalance, and they reduce (but do not eliminate) over-rejection

What we learn from this paper
- The problem is detecting real effects
  - Combining FGLS and CRSE makes it possible to address power with correct test size
  - Even with a small number of groups
- FGLS + CRSE is not robust to a small number of time periods T or to the parametric assumption about serial correlation, and it improves but does not solve the power problem

Solution 2: Wild cluster bootstrap, cont'd
Implementation:
1. Obtain the b-th resample:
   1. Obtain $\tilde u_{ig} = y_{ig} - x_{ig}'\hat\beta_{H_0}$
   2. Randomly assign cluster g the weight $d_g = -1$ with probability 0.5 and $d_g = 1$ with probability 0.5
   3. Generate new pseudo-residuals $u^*_{ig} = d_g \tilde u_{ig}$ and new outcome variables $y^*_{ig} = x_{ig}'\hat\beta_{H_0} + u^*_{ig}$
2. Compute the OLS estimate $\hat\beta^*_b$
3. Calculate the Wald test statistic $w^*_b = (\hat\beta^*_b - \hat\beta) / s_{\hat\beta^*_b}$
Essentially:
- Each resample replaces $y_g$ with either $y^*_g = X_g\hat\beta_{H_0} + \tilde u_g$ or $y^*_g = X_g\hat\beta_{H_0} - \tilde u_g$
- This yields at most $2^G$ unique values of $w^*_1, \dots, w^*_B$
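The resampling steps above can be sketched for a test of $H_0: \beta_k = 0$. This is a hedged illustration in NumPy, not Cameron and Miller's code: the function names are hypothetical, the cluster-robust standard errors carry no finite-sample scaling, and, as a choice made in this sketch, each bootstrap statistic is centered at the null value of zero because the bootstrap samples are generated with the null imposed.

```python
import numpy as np

def _ols_cluster(X, y, cluster):
    """OLS coefficients and cluster-robust standard errors (no scaling)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        score = X[cluster == g].T @ resid[cluster == g]
        meat += np.outer(score, score)
    return beta, np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

def wild_cluster_bootstrap_p(X, y, cluster, k, B=999, seed=0):
    """Bootstrap p-value for H0: beta_k = 0 using Rademacher weights."""
    rng = np.random.default_rng(seed)
    beta, se = _ols_cluster(X, y, cluster)
    w = beta[k] / se[k]                          # original Wald statistic
    # Impose the null: drop regressor k and re-estimate
    X0 = np.delete(X, k, axis=1)
    b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    fitted0 = X0 @ b0
    u0 = y - fitted0                             # restricted residuals
    labels, inv = np.unique(cluster, return_inverse=True)
    w_star = np.empty(B)
    for b in range(B):
        d = rng.choice([-1.0, 1.0], size=len(labels))   # one weight per cluster
        y_star = fitted0 + d[inv] * u0                  # new outcome variable
        bb, ss = _ols_cluster(X, y_star, cluster)
        w_star[b] = bb[k] / ss[k]
    # Two-sided p-value: share of bootstrap statistics at least as extreme
    return float(np.mean(np.abs(w_star) >= abs(w)))
```

Each bootstrap replication re-estimates the full model and its cluster-robust standard error, which is why the slides describe the method as computationally more intensive than a plug-in correction.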

[Figure: Cameron and Miller (2015), Monte Carlo results]

Size robustness to error specification
- Simulate state-time shocks, changing the degree of serial correlation and non-normality.
- Assume an AR(1) process:
  $$\epsilon_{gt} = \rho\,\epsilon_{g,t-1} + \sqrt{\frac{0.004\,(1 - 0.4^2)\,(d-2)}{d}}\; w_{gt} \quad (2)$$
  $$\epsilon_{g1} = \ldots\; w_{g1} \quad (3)$$
  [the scaling constant in (3) is garbled in this transcript]
[Figure: size robustness to error specification]

[Figure: size robustness to imbalance between groups]

Power: MDE
Minimum Detectable Effect (MDE): the smallest effect that would lead to a rejection of the null hypothesis of no effect with a given probability.
$$MDE_x = \widehat{se}_{clu}(\hat\beta)\,\left[\, c_u - p^t_{1-x} \,\right] \quad (4)$$
where $x$ is the power, $c_u$ is the upper critical value of $t_{G-1}$, and $p^t_{1-x}$ is the (1-x)th percentile of $t_{G-1}$ under the null of no effect.
[Figure: power, MDE]
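As a rough worked illustration of the MDE formula, consider hypothetical numbers that are not taken from the paper: G = 50 groups, a two-sided 5% test, power x = 0.8, and an assumed cluster-robust standard error of 0.01. The upper critical value of $t_{49}$ is approximately 2.01 and its 20th percentile is approximately -0.85, so:

```latex
\mathrm{MDE}_{0.8}
  = \widehat{se}_{clu}(\hat\beta)\,\bigl[\, c_u - p^t_{0.2} \,\bigr]
  \approx 0.01 \times \bigl[\, 2.01 - (-0.85) \,\bigr]
  = 0.0286
```

Under these assumed values, the test would detect with 80% probability only effects of about 0.029 or larger, in the units of the outcome. The MDE scales one-for-one with the standard error, which is why a more efficient estimator such as FGLS lowers it.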

[Figure: power, MDE with FGLS]

[Figure: power robustness to error specification]

[Figure: power robustness to number of time periods]