EC352 Econometric Methods: Week 07

Gordon Kemp, Department of Economics, University of Essex

Outline
Panel Data (continued): Random Effects, Estimation and Clustering, Dynamic Models
Validity & Threats to Validity: Types of Validity, Internal Validity, External Validity, Robustness and Sensitivity Checks

Random Effects
Suppose we are interested in the effect of schooling on wages and we have collected panel data on a sample of adults, including data on wages, education, gender, experience, age, job tenure and measures of ability. As usual with panel data, we might worry that there are still many individual-specific factors that we are not able to measure but that influence wages.

If we adopt an FE or an FD approach then we usually cannot conclude much about the returns to schooling, since the schooling variable is time invariant for most adults: indeed, often it is time invariant for all the individuals in such a sample because of the way it is defined. But if the individual-specific effects were not correlated with the observable regressors, then OLS on the original equation, with the individual-specific effects included in the error term, would be consistent (include time dummies if desired).

When the individual-specific effects are uncorrelated with the regressors, we refer to the effects as random effects. There are several approaches we can adopt:
1. Pooled OLS estimation of the original equation (with the individual-specific effects included in the error term) using adjusted standard errors.
2. GLS estimation of the original equation (with the individual-specific effects included in the error term).
3. OLS estimation of the original equation but only using the individual means of the variables.

OLS with Clustered Standard Errors
The adjustment that we need to make to the standard errors for Pooled OLS estimation is called clustering. For any individual, we treat the conditional variances and covariances over time of their error terms, given the regressors, as being entirely individual specific: we allow any pattern. However, we assume that the conditional covariances between an error term of one individual and an error term of another individual are all zero. This is a generalization of the usual heteroskedasticity-robust standard errors.
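The clustering adjustment can be illustrated with a minimal sketch for a regression of y on a constant and a single regressor x. The function name `slope_with_se` and the toy data are hypothetical, and no small-sample correction is applied, so this is an illustration of the idea and not the formula used by any particular package. Residual-times-regressor scores are summed within each individual before being squared, which is what allows any pattern of correlation over time within an individual.

```python
from collections import defaultdict
from math import sqrt

def slope_with_se(x, y, groups):
    """Pooled OLS slope of y on a constant and x, with a cluster-robust
    standard error (sandwich form, no small-sample correction)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]                 # demeaned regressor
    sxx = sum(v * v for v in xd)
    slope = sum(v * yi for v, yi in zip(xd, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sum the scores (demeaned x times residual) within each cluster ...
    score = defaultdict(float)
    for g, v, e in zip(groups, xd, resid):
        score[g] += v * e
    # ... then square the cluster sums to form the "meat" of the sandwich.
    meat = sum(s * s for s in score.values())
    return slope, sqrt(meat) / sxx

# Hypothetical data: four individuals, each observed twice with identical values.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
ids = [0, 1, 2, 3]

b_once, se_once = slope_with_se(x, y, ids)                        # one obs each
b_dup, se_cluster = slope_with_se(x + x, y + y, ids + ids)        # clustered by individual
_, se_no_cluster = slope_with_se(x + x, y + y, list(range(8)))    # each obs its own cluster
print(b_once, se_once, se_cluster, se_no_cluster)
```

When every observation is its own cluster the formula reduces to the usual heteroskedasticity-robust standard error. In the duplicated data the clustered standard error matches the one from the original four observations, whereas ignoring the clustering makes the estimate look more precise than it is.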

Random Effects (RE) Estimator
If we are willing to assume that:
- the individual-specific effects are iid conditional on the regressors with variance $\sigma_a^2$;
- the original error terms are iid conditional on the regressors with variance $\sigma_u^2$; and
- the individual-specific effects and the original error terms are independent of each other conditional on the regressors,
then we can implement GLS provided that we can estimate $\sigma_a^2/\sigma_u^2$. This turns out to be feasible, and the resulting GLS estimator is what is usually called the Random Effects (RE) estimator. Under these assumptions, RE is more efficient than either Pooled OLS or FE, but these are strong assumptions.
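A common textbook way of writing this GLS estimator, sketched here for a balanced panel with T observations per individual, is as OLS on quasi-demeaned data. The symbol $\theta$ and the tilde notation are introduced only for this illustration and do not appear in the slides.

```latex
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_a^2}}
       = 1 - \frac{1}{\sqrt{1 + T\,\sigma_a^2/\sigma_u^2}},
\qquad
\tilde{y}_{it} = y_{it} - \theta\,\bar{y}_i,
\qquad
\tilde{x}_{it} = x_{it} - \theta\,\bar{x}_i .
```

RE then amounts to regressing $\tilde{y}_{it}$ on $(1-\theta)$ and $\tilde{x}_{it}$. Note that $\theta$ depends on the variances only through the ratio $\sigma_a^2/\sigma_u^2$, that $\theta = 0$ gives Pooled OLS, and that $\theta \to 1$ approaches the FE (within) transformation.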

Between Groups (BG) Estimator
The third approach for handling random effects is OLS estimation of the original equation but only using the individual means of the variables: the so-called Between Groups (BG) estimator. This is obtained by regressing $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$ on a constant and $\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}$.
One motivation for using the BG estimator is that it complements the FE estimator, which is sometimes called the Within Groups (WG) estimator:
- FE identifies coefficients from the changes over time in variables for each individual;
- BG identifies coefficients from differences across individuals in the time averages of variables.
Thus these two estimators use somewhat different bits of the information that is available.

We can show that the RE and Pooled OLS estimators are combinations of the FE and BG estimators. If we make the same assumptions as we did for the RE estimator, then we can show that the FE and BG estimators are asymptotically independent of each other. Note that since the BG estimator consists of a single cross-section regression, it is easy to compute heteroskedasticity-robust standard errors.
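The relationship between the pooled, between and within estimators can be checked numerically. The sketch below uses a hypothetical three-person panel with a single regressor, constructed as y = x + a_i with a_i equal to twice the individual's mean of x, so that the individual effect is deliberately correlated with the regressor. The function names are illustrative only.

```python
def slope(xs, ys):
    """OLS slope from regressing ys on a constant and xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def panel_slopes(panel):
    """Return (pooled, between, within) slopes for {id: [(x, y), ...]}."""
    pooled_x, pooled_y = [], []
    mean_x, mean_y = [], []
    within_x, within_y = [], []
    for obs in panel.values():
        xs = [x for x, _ in obs]
        ys = [y for _, y in obs]
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        pooled_x += xs
        pooled_y += ys
        mean_x.append(x_bar)                         # BG uses individual means
        mean_y.append(y_bar)
        within_x += [x - x_bar for x in xs]          # FE uses deviations from means
        within_y += [y - y_bar for y in ys]
    return (slope(pooled_x, pooled_y),
            slope(mean_x, mean_y),
            slope(within_x, within_y))

# Hypothetical data: y = 1 * x + a_i, with a_i = 2 * (individual mean of x).
panel = {
    "A": [(0.0, 2.0), (2.0, 4.0)],
    "B": [(2.0, 8.0), (4.0, 10.0)],
    "C": [(4.0, 14.0), (6.0, 16.0)],
}
pooled, between, within = panel_slopes(panel)
print(pooled, between, within)
```

In this example the within slope recovers the coefficient of 1 used to build the data, the between slope is 3 because it also picks up the correlated individual effect, and the pooled slope lies between the two as a weighted average of them.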

Random vs. Fixed Effects
With fixed effects we cannot estimate the effect of time-invariant variables (such as gender, etc.). Random effects supposes that there is no correlation between the unobserved individual effect and the observed independent variables of interest. In general, we should use RE if this condition is satisfied. Otherwise we should use FE.
How do we know whether this condition is satisfied? Durbin-Wu-Hausman test:
- If there is no such correlation, then the RE and FE estimators are both consistent, in which case the difference between them should converge to zero.
- However, if such correlation is present, then the RE estimator is inconsistent while the FE estimator remains consistent, in which case the difference between them should not converge to zero.
Hence, the test consists of rejecting the null when the difference between the RE and FE estimators is sufficiently large.
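"Sufficiently large" is usually made precise with the standard Hausman statistic. The following is a sketch in notation that is not taken from the slides: $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ denote the estimated coefficients on the $k$ time-varying regressors, and $\widehat{V}$ denotes an estimated variance matrix.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[ \widehat{V}(\hat{\beta}_{FE}) - \widehat{V}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\overset{a}{\sim}\; \chi^2_k \quad \text{under the null.}
```

This form relies on RE being efficient under the null, so that the variance of the difference equals the difference of the variances. Only coefficients on time-varying regressors can be compared, since FE does not identify the others.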

Dynamic Models
Since panel data consist of a cross-section of time series, it is natural to consider the possibility of dynamics when dealing with panel data. Dynamics that occur via serial correlation in the disturbances are non-problematic provided that the regressors are strictly exogenous for the disturbances. In particular:
- the FD and FE estimators will remain consistent;
- the pooled OLS and BG estimators will be consistent if the unobserved effects are uncorrelated with the regressors; and
- the RE estimator will be consistent if the unobserved effects are uncorrelated with the regressors, but will not be efficient.

However, dynamics that occur via lagged dependent variables are much more problematic. For example, suppose that:
$y_{it} = \delta + \beta x_{it} + \phi y_{it-1} + a_i + \varepsilon_{it}$,
where $\varepsilon_{it}$ is uncorrelated with $x_{js}$ for all $i$, $j$, $s$ and $t$. First-differencing gives:
$\Delta y_{it} = \beta \Delta x_{it} + \phi \Delta y_{it-1} + \Delta\varepsilon_{it}$,
but then $\Delta y_{it-1}$ and $\Delta\varepsilon_{it}$ are typically correlated, since $\varepsilon_{it-1}$ influences $y_{it-1}$ and hence affects $\Delta y_{it-1}$, but also appears directly in $\Delta\varepsilon_{it}$. Hence, in general, the FD estimator will be inconsistent. Much the same problem afflicts the FE estimator.
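The correlation can be made explicit under two additional assumptions that are not stated in the slides and are made here only for illustration: $\varepsilon_{it}$ is serially uncorrelated with variance $\sigma_\varepsilon^2$, and $\varepsilon_{it}$ is uncorrelated with $a_i$ and with all earlier values of $y$.

```latex
\begin{aligned}
\operatorname{Cov}(\Delta y_{it-1}, \Delta\varepsilon_{it})
  &= \operatorname{Cov}(y_{it-1} - y_{it-2},\; \varepsilon_{it} - \varepsilon_{it-1}) \\
  &= -\operatorname{Cov}(y_{it-1}, \varepsilon_{it-1}) \\
  &= -\sigma_\varepsilon^2 \neq 0 .
\end{aligned}
```

The other three covariance terms vanish because $\varepsilon_{it}$ is uncorrelated with $y_{it-1}$ and $y_{it-2}$, and $\varepsilon_{it-1}$ is uncorrelated with $y_{it-2}$. The surviving term is nonzero because $\varepsilon_{it-1}$ enters $y_{it-1}$ with a coefficient of one.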

In addition, since $a_i$ directly influences $y_{it-1}$, $(a_i + \varepsilon_{it})$ will typically be correlated with $y_{it-1}$ and hence, in general, the pooled OLS and RE estimators will be inconsistent (though if $\mathrm{Var}[a_i] = 0$ then this is not a problem). Similarly, $(a_i + \bar{\varepsilon}_i)$ will typically be correlated with $\bar{y}_{i,-1}$ (the average for individual $i$ of the $y_{it-1}$'s) and hence, in general, the BG estimator will be inconsistent (though if $\mathrm{Var}[a_i] = 0$ then this is not a problem).
Thus, in general, none of the panel data estimators we have considered will work when the model contains both a lagged dependent variable and unobserved individual-specific effects. The standard methods for handling this situation rely on the use of instrumental variables methods (more about this later in the module).

Types of Validity
Internal Validity: An empirical analysis is internally valid if the inferences from the sample are valid for the population and setting generating the sample.
External Validity: An empirical analysis is externally valid if its inferences and conclusions can be generalized from the particular population and setting generating the sample to other populations and settings.

Internal Validity
Internal validity has two main components:
- The estimator of the parameters and causal effects of interest should be unbiased and consistent. Sometimes there are no unbiased estimators, in which case the estimator should at least be consistent.
- Tests of hypotheses about the parameters of interest should have the desired significance level, and confidence intervals should have the desired confidence level. Usually, for tests and confidence intervals to be valid we need standard errors to be consistent.

Threats to Internal Validity
Possible threats to the internal validity of an empirical study depend on both:
- the nature of the models and methods used; and
- the nature of the population from which the sample was drawn and the way in which it was drawn.

Example 1. Omitted variable bias is frequently a threat to the internal validity of least squares estimates in a regression. Methods for trying to deal with omitted variable bias in regressions include:
- Adding the omitted variable if observed, or adding a proxy for the omitted variable, or allowing for the omitted variable via a dummy variable.
- Using instrumental variables (more on this later in the module).
- Collecting panel data and then using first differences or fixed effects methods to eliminate omitted variable bias due to individual factors that don't change over time.
- Using randomized trials to avoid having omitted variable bias in the first place.

Solutions
Example 2. Conditional heteroskedasticity of error terms is frequently a threat to the internal validity of least squares standard errors in a regression. Methods for trying to deal with conditional heteroskedasticity of error terms in regressions include:
- Using heteroskedasticity-robust standard errors.
- Modeling the heteroskedasticity and using generalized least squares methods.

Use of Tests
Tests are useful tools for detecting whether certain threats to internal validity are present.
Example 3. In panel data models, the presence of unobserved individual-specific effects that are correlated with the regressors will render the pooled OLS, RE and BG estimators inconsistent: try using a Hausman test to detect whether such effects are present.
Example 4. Serial correlation in the error terms of an ARDL model will render OLS inconsistent: try using a Breusch-Godfrey test for the presence of serial correlation in the errors.

External Validity
If the population being studied and the population of interest are different, then there is always the possibility of threats to external validity. For example, lots of studies in experimental economics use data from surveys of students or experiments in which the participants are students. Do the results from such studies carry over, for example, to the general adult population?
Even when the population being studied and the population of interest are the same, differences in settings can generate threats to external validity. For example, in studying the effects of anti-drinking advertising campaigns on binge drinking, the results from one university might not generalize to another if the legal penalties differed.

Robustness Checks
"Robustness checks" is a somewhat catch-all term that usually refers to estimating and testing additional relationships, aside from the primary relationship of interest, in order to see if one can eliminate various threats to validity. What robustness checks one performs therefore depends on what threats to validity seem to be of concern and what additional sources of data one has available: hence they tend to be very study specific.

Varieties of Robustness Checks
These include (among others):
- altering the set of regressors and/or instruments;
- altering the model's functional form;
- using subsets of the dataset;
- changing the dependent variable;
- running analyses on separate data sets;
- placebo regressions.

Placebo Regressions
One of the assumptions needed for differences-in-differences estimation to be valid is that the trend that would have occurred for the treatment group in the absence of treatment is the same as the trend for the control group.
Suppose we are interested in the impact of reducing class sizes on the academic performance of school pupils. In addition, suppose that a particular town with two state schools had received some funding for educational improvement and decided it would allocate the funds to one of the schools (North) to reduce class sizes by hiring additional teaching staff, but not to the other school (South). Then suppose we had data on the academic performance of the graduating classes of the two schools both for the year after the funding was given and for the year before the funding was given.

Ashenfelter Dip
We could then run a differences-in-differences estimation. Question: how should we interpret a positive estimate?
We might worry that the North school had received the funding precisely because its graduating class had performed badly (i.e., had dipped) the year before the funding was given. If so, then since some of the poor performance could be the effect of pure chance, we might expect a bit of an improvement in the performance at the North school from that year to the year in which funding was given for reasons that had nothing to do with changes in class sizes resulting from the funding.
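The concern can be written out in a stylized way. All notation here is hypothetical and introduced only for this sketch: $\mu_N$ and $\mu_S$ are the schools' usual performance levels, $g$ is a common trend, $\tau$ is the true effect of the funding, and $d < 0$ is a purely transitory dip at North in the year before funding.

```latex
\begin{aligned}
\bar{y}_{N,\text{before}} &= \mu_N + d, &
\bar{y}_{N,\text{after}}  &= \mu_N + g + \tau, \\
\bar{y}_{S,\text{before}} &= \mu_S, &
\bar{y}_{S,\text{after}}  &= \mu_S + g, \\
\widehat{\text{DiD}}
  &= (\bar{y}_{N,\text{after}} - \bar{y}_{N,\text{before}})
   - (\bar{y}_{S,\text{after}} - \bar{y}_{S,\text{before}})
   = \tau - d .
\end{aligned}
```

Since $d < 0$, the estimate overstates $\tau$: a positive estimate could arise even if $\tau = 0$.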

Placebo Regression
If we had data on the academic performance of the graduating classes of the two schools three years before the funding was given, then we could run a second diff-in-diff estimation examining the changes from three years before funding was given to one year before it was given: a placebo regression.
If we saw evidence suggesting that this second diff-in-diff estimate was significantly different from zero, then:
- this cannot be the result of changing class sizes due to the funding, because that hadn't yet happened;
- it would be consistent with the North school having a different trend over time in academic performance as compared to the South school;
- a negative estimate would be consistent with a temporary dip in the performance at the North school prior to the funding due to a randomly poor performance.
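As a minimal sketch of the two calculations, the code below computes the main and placebo diff-in-diff point estimates from school-level average scores. The scores are entirely hypothetical, chosen so that North dips in the year before funding, and the function name `diff_in_diff` is illustrative.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change for the treated group minus change for the control group."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical average scores, keyed by year relative to the funding
# (-3 = three years before, -1 = one year before, 1 = one year after).
north = {-3: 60.0, -1: 55.0, 1: 61.0}   # funded school, dips at -1
south = {-3: 62.0, -1: 62.0, 1: 63.0}   # comparison school

# Main estimate: year before funding versus year after funding.
main_estimate = diff_in_diff(north[-1], north[1], south[-1], south[1])

# Placebo estimate: both periods precede the funding, so class sizes
# cannot explain any difference found here.
placebo_estimate = diff_in_diff(north[-3], north[-1], south[-3], south[-1])

print(main_estimate, placebo_estimate)
```

With these made-up numbers the main estimate is positive and the placebo estimate is negative by the same amount, which is the pattern a temporary dip would produce. Judging whether a placebo estimate is significantly different from zero would also require standard errors, which this sketch does not compute.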