EC352 Econometric Methods: Week 07

Gordon Kemp, Department of Economics, University of Essex

Outline

Panel Data (continued)
  Random Effects Estimation and Clustering
  Dynamic Models
Validity & Threats to Validity
  Types of Validity
  Internal Validity
  External Validity
Robustness and Sensitivity Checks

Random Effects

Suppose we are interested in the effect of schooling on wages and we have collected panel data on a sample of adults, including data on wages, education, gender, experience, age, job tenure and measures of ability. As usual with panel data, we might worry that there are still many individual-specific factors which we are not able to measure but which influence wages.

If we adopt an FE or an FD approach then we usually cannot conclude much about the returns to schooling, since the schooling variable is time invariant for most adults: indeed, it is often time invariant for all the individuals in such a sample because of the way it is defined.

But if the individual-specific effects were not correlated with the observable regressors, then OLS on the original equation, with the individual-specific effects included in the error term, would be consistent (include time dummies if desired).

When the individual-specific effects are uncorrelated with the regressors, we refer to the effects as random effects. There are several approaches we can adopt:

1. Pooled OLS estimation of the original equation (with the individual-specific effects included in the error term) using adjusted standard errors.
2. GLS estimation of the original equation (with the individual-specific effects included in the error term).
3. OLS estimation of the original equation but using only the individual means of the variables.

OLS with Clustered Standard Errors

The adjustment that we need to make to the standard errors for Pooled OLS estimation is called clustering. For any individual, we treat the conditional variances and covariances over time of their error terms, given the regressors, as being entirely individual specific: we allow any pattern. However, we assume that the conditional covariances between an error term for one individual and an error term for another individual are all zero. This is a generalization of the usual heteroskedasticity-robust standard errors.
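
The clustering adjustment described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pooled_ols_clustered` and its arguments are hypothetical, and no small-sample correction is applied (statistical packages usually apply one, so their reported standard errors will differ slightly).

```python
import numpy as np

def pooled_ols_clustered(y, X, groups):
    """Pooled OLS with cluster-robust (sandwich) standard errors.

    y: (n,) outcomes; X: (n, k) regressors including a constant;
    groups: (n,) identifier of the individual each row belongs to.
    Assumption of this sketch: no small-sample correction.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        # Sum x_it * e_it within the individual, so any pattern of
        # within-individual error covariance is allowed.
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```

If every observation is placed in its own cluster, the formula collapses to the usual heteroskedasticity-robust (HC0) variance, which is the sense in which clustering generalizes it.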

Random Effects (RE) Estimator

If we are willing to assume that:

- the individual-specific effects are iid conditional on the regressors with variance $\sigma_a^2$;
- the original error terms are iid conditional on the regressors with variance $\sigma_u^2$; and
- the individual-specific effects and the original error terms are independent of each other conditional on the regressors,

then we can implement GLS provided that we can estimate $\sigma_a^2/\sigma_u^2$. This turns out to be feasible, and the resulting GLS estimator is what is usually called the Random Effects (RE) estimator.

Under these assumptions, RE is more efficient than either Pooled OLS or FE, but these are strong assumptions.
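
To see why only the ratio $\sigma_a^2/\sigma_u^2$ matters, it may help to write the GLS estimator as OLS on quasi-demeaned data. The block below is a sketch of that standard textbook result. The notation is assumed for illustration: the model is written as $y_{it} = \delta + \beta x_{it} + a_i + u_{it}$, and the panel is balanced with $T$ periods per individual.

```latex
% Notation assumed for this sketch: y_{it} = \delta + \beta x_{it} + a_i + u_{it},
% balanced panel with T periods per individual, v_{it} = a_i + u_{it}.
\[
  y_{it} - \theta \bar{y}_i
  = (1-\theta)\,\delta + \beta\,(x_{it} - \theta \bar{x}_i) + (v_{it} - \theta \bar{v}_i),
\]
\[
  \theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_a^2}}
         = 1 - \frac{1}{\sqrt{1 + T\,\sigma_a^2/\sigma_u^2}} .
\]
```

When $\sigma_a^2 = 0$ we get $\theta = 0$ and RE reduces to Pooled OLS; as $T\sigma_a^2/\sigma_u^2$ grows, $\theta$ approaches 1 and RE approaches FE.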

Between Groups (BG) Estimator

The third approach for handling random effects is OLS estimation of the original equation using only the individual means of the variables: the so-called Between Groups (BG) estimator. This is obtained by regressing $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$ on a constant and $\bar{x}_i = T^{-1} \sum_{t=1}^{T} x_{it}$.

One motivation for using the BG estimator is that it complements the FE estimator, which is sometimes called the Within Groups (WG) estimator:

- FE identifies coefficients from the changes over time in variables for each individual;
- BG identifies coefficients from differences across individuals in the time averages of variables.

Thus these two estimators use somewhat different parts of the information that is available.
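
The contrast between the two sources of variation can be made concrete with a small NumPy sketch for a single regressor. The function name `between_and_within` is hypothetical and the code is illustrative only.

```python
import numpy as np

def between_and_within(y, x, ids):
    """Return (BG slope, FE/WG slope) for a single regressor x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    ids = np.asarray(ids)
    _, inv = np.unique(ids, return_inverse=True)
    counts = np.bincount(inv)
    ybar = np.bincount(inv, weights=y) / counts   # individual means of y
    xbar = np.bincount(inv, weights=x) / counts   # individual means of x
    # BG: OLS of ybar_i on a constant and xbar_i (one row per individual).
    xb = xbar - xbar.mean()
    beta_bg = (xb @ (ybar - ybar.mean())) / (xb @ xb)
    # FE (WG): OLS on deviations from each individual's own mean.
    xw = x - xbar[inv]
    yw = y - ybar[inv]
    beta_fe = (xw @ yw) / (xw @ xw)
    return beta_bg, beta_fe
```

For example, with `ids = [0, 0, 1, 1]`, `x = [0, 1, 2, 3]` and `y = [0, 2, 14, 16]` (so that y rises by 2 per unit of x within each individual, but individual 1 also has a higher level), the BG slope is 7 and the FE slope is 2. The gap arises because in this made-up data the individual effect is correlated with the regressor.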

We can show that the RE and Pooled OLS estimators are combinations of the FE and BG estimators.

If we make the same assumptions as we did for the RE estimator, then we can show that the FE and BG estimators are asymptotically independent of each other.

Note that since the BG estimator consists of a single cross-section regression, it is easy to compute heteroskedasticity-robust standard errors for it.

Random vs. Fixed Effects

With fixed effects we cannot estimate the effect of time-invariant variables (such as gender, etc.). Random effects supposes that there is no correlation between the unobserved individual effect and the observed independent variables of interest. In general, we should use RE if this condition is satisfied. Otherwise we should use FE.

How do we know whether this condition is satisfied? Durbin-Wu-Hausman test:

- If there is no such correlation, then the RE and FE estimators are both consistent, in which case the difference between them should converge to zero.
- However, if such correlation is present, then the RE estimator is inconsistent while the FE estimator remains consistent, in which case the difference between them should not converge to zero.

Hence, the test consists of rejecting the null when the difference between the RE and FE estimators is sufficiently large.
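
One common way to make "sufficiently large" operational is the quadratic form below. It relies on the RE assumptions holding under the null, so that the variance of the difference is the FE variance minus the RE variance. The function name `hausman_statistic` and the numbers in the usage note are hypothetical, and only coefficients on time-varying regressors can be compared, since FE does not estimate the others.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, V_fe, V_re):
    """Hausman statistic H = d' (V_fe - V_re)^{-1} d with d = b_fe - b_re.

    Returns (H, degrees of freedom). Under the null, H is asymptotically
    chi-squared with as many degrees of freedom as coefficients compared.
    Assumption of this sketch: V_fe - V_re is positive definite, which can
    fail in finite samples.
    """
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    H = float(d @ np.linalg.solve(V, d))
    return H, d.size
```

With made-up inputs `b_fe = 0.10`, `b_re = 0.16`, `V_fe = 0.0009` and `V_re = 0.0005`, the statistic is 0.0036 / 0.0004 = 9 with one degree of freedom, which exceeds the 5% chi-squared critical value of 3.84, so the null of no correlation would be rejected.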

Dynamic Models

Since panel data consist of a cross-section of time series, it is natural to consider the possibility of dynamics when dealing with panel data.

Dynamics which occur via serial correlation in the disturbances are not problematic for consistency provided that the regressors are strictly exogenous with respect to the disturbances. In particular:

- the FD and FE estimators will remain consistent;
- the pooled OLS and BG estimators will be consistent if the unobserved effects are uncorrelated with the regressors; and
- the RE estimator will be consistent if the unobserved effects are uncorrelated with the regressors, but will not be efficient.

However, dynamics that occur via lagged dependent variables are much more problematic. For example, suppose that:

$y_{it} = \delta + \beta x_{it} + \phi y_{it-1} + a_i + \varepsilon_{it}$,

where $\varepsilon_{it}$ is uncorrelated with $x_{js}$ for all $i$, $j$, $s$ and $t$. First-differencing gives:

$\Delta y_{it} = \beta \Delta x_{it} + \phi \Delta y_{it-1} + \Delta \varepsilon_{it}$,

but then $\Delta y_{it-1}$ and $\Delta \varepsilon_{it}$ are typically correlated, since $\varepsilon_{it-1}$ influences $y_{it-1}$ and hence affects $\Delta y_{it-1}$, but also appears directly in $\Delta \varepsilon_{it}$. Hence, in general, the FD estimator will be inconsistent. Much the same problem afflicts the FE estimator.
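
The correlation can be computed explicitly in a simple special case. The derivation below is a sketch under extra assumptions that are not stated on the slide: $\varepsilon_{it}$ is serially uncorrelated with constant variance $\sigma_\varepsilon^2$, and is uncorrelated with $a_i$, with all the $x_{js}$, and with earlier values of $y$.

```latex
% Extra assumptions for this sketch (not stated on the slide): \varepsilon_{it} is
% serially uncorrelated with constant variance \sigma_\varepsilon^2 and is
% uncorrelated with a_i, with all x_{js}, and with y_{is} for s < t.
\begin{align*}
\operatorname{Cov}(\Delta y_{it-1}, \Delta\varepsilon_{it})
 &= \operatorname{Cov}(y_{it-1} - y_{it-2},\; \varepsilon_{it} - \varepsilon_{it-1}) \\
 &= -\operatorname{Cov}(y_{it-1}, \varepsilon_{it-1}) \\
 &= -\sigma_\varepsilon^2 \neq 0 .
\end{align*}
```

The second line uses the fact that $y_{it-2}$ is determined before either $\varepsilon_{it-1}$ or $\varepsilon_{it}$ is realized, and that $y_{it-1}$ is determined before $\varepsilon_{it}$. The covariance does not shrink as the number of individuals grows, which is why the inconsistency does not go away in large cross-sections.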

In addition, since $a_i$ directly influences $y_{it-1}$, the composite error $(a_i + \varepsilon_{it})$ will typically be correlated with $y_{it-1}$, and hence, in general, the pooled OLS and RE estimators will be inconsistent (though if $\mathrm{Var}[a_i] = 0$ then this is not a problem).

Similarly, $(a_i + \bar{\varepsilon}_i)$ will typically be correlated with $\bar{y}_{i,-1}$ (the average for individual $i$ of the $y_{it-1}$'s), and hence, in general, the BG estimator will be inconsistent (though if $\mathrm{Var}[a_i] = 0$ then this is not a problem).

Thus, in general, none of the panel data estimators we have considered will work when the model contains both a lagged dependent variable and unobserved individual-specific effects. The standard methods for handling this situation rely on instrumental variables methods (more about this later in the module).

Types of Validity

Internal Validity: An empirical analysis is internally valid if the inferences from the sample are valid for the population and setting generating the sample.

External Validity: An empirical analysis is externally valid if its inferences and conclusions can be generalized from the particular population and setting generating the sample to other populations and settings.

Internal Validity

Internal validity has two main components:

- The estimator of the parameters and causal effects of interest should be unbiased and consistent. Sometimes there are no unbiased estimators, in which case the estimator should at least be consistent.
- Tests of hypotheses about the parameters of interest should have the desired significance level, and confidence intervals should have the desired confidence level. Usually, for tests and confidence intervals to be valid, we need the standard errors to be consistent.

Threats to Internal Validity

Possible threats to the internal validity of an empirical study depend on both:

- the nature of the models and methods used; and
- the nature of the population from which the sample was drawn and the way in which it was drawn.

Example 1. Omitted variable bias is frequently a threat to the internal validity of least squares estimates in a regression.

Methods for trying to deal with omitted variable bias in regressions include:

- Adding the omitted variable if it is observed, adding a proxy for the omitted variable, or allowing for the omitted variable via a dummy variable.
- Using instrumental variables (more on this later in the module).
- Collecting panel data and then using first differences or fixed effects methods to eliminate omitted variable bias due to individual factors that don't change over time.

We might also try to use randomized trials to avoid having omitted variable bias in the first place.

Solutions

Example 2. Conditional heteroskedasticity of error terms is frequently a threat to the internal validity of least squares standard errors in a regression.

Methods for trying to deal with conditional heteroskedasticity of error terms in regressions include:

- Using heteroskedasticity-robust standard errors.
- Modeling the heteroskedasticity and using generalized least squares methods.
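
As a rough illustration of the first remedy, the sketch below computes OLS together with both the conventional standard errors and the simplest heteroskedasticity-robust variant (often labelled HC0). The function name `ols_with_se` is hypothetical; packaged routines typically offer additional small-sample variants.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients with conventional and HC0 robust standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Conventional: assumes a single error variance for every observation.
    se_conventional = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    # HC0: lets each observation contribute its own squared residual.
    meat = (X * e[:, None] ** 2).T @ X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_conventional, se_robust
```

The coefficient estimates are identical in both cases; only the standard errors, and therefore the tests and confidence intervals, change.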

Use of Tests

Tests are useful tools for detecting whether certain threats to internal validity are present.

Example 3. In panel data models, the presence of unobserved individual-specific effects that are correlated with the regressors will render the pooled OLS, RE and BG estimators inconsistent; try using a Hausman test to detect whether such correlation is present.

Example 4. Serial correlation in the error terms of an ARDL model will render OLS inconsistent; try using a Breusch-Godfrey test for the presence of serial correlation in the errors.

External Validity

If the population being studied and the population of interest are different, then there is always the possibility of threats to external validity. For example, many studies in experimental economics use data from surveys of students or from experiments in which the participants are students. Do the results from such studies carry over, for example, to the general adult population?

Even when the population being studied and the population of interest are the same, differences in settings can generate threats to external validity. For example, in studying the effects of anti-drinking advertising campaigns on binge drinking, the results from one university might not generalize to another if the legal penalties differed.

Robustness Checks

"Robustness checks" is a somewhat catch-all term that usually refers to estimating and testing additional relationships, aside from the primary relationship of interest, in order to see if one can eliminate various threats to validity.

Which robustness checks one performs therefore depends on which threats to validity seem to be of concern and what additional sources of data one has available: hence they tend to be very study specific.

Varieties of Robustness Checks

These include (among others):

- altering the set of regressors and/or instruments;
- altering the model's functional form;
- using subsets of the dataset;
- changing the dependent variable;
- running analyses on separate data sets;
- placebo regressions.

Placebo Regressions

One of the assumptions needed for differences-in-differences estimation to be valid is that the trend that would have occurred for the treatment group in the absence of treatment is the same as the trend for the control group.

Suppose we are interested in the impact of reducing class sizes on the academic performance of school pupils. In addition, suppose that a particular town with two state schools had received some funding for educational improvement and decided to allocate the funds to one of the schools (North) to reduce class sizes by hiring additional teaching staff, but not to the other school (South).

Then suppose we had data on the academic performance of the graduating classes of the two schools, both for the year after the funding was given and for the year before the funding was given.

Ashenfelter Dip

We could then run a differences-in-differences estimation. Question: how should we interpret a positive estimate?

We might worry that the North school had received the funding precisely because its graduating class had performed badly (i.e., had dipped) the year before the funding was given.

If so, then since some of the poor performance could be the effect of pure chance, we might expect a bit of an improvement in performance at the North school from that year to the year after the funding was given, for reasons that had nothing to do with changes in class sizes resulting from the funding.
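
The worry can be written out in a stylized model. Everything in the block below is an assumption made for illustration: school performance is the sum of a school effect $\mu_s$, a year effect $\lambda_t$, a treatment effect $\tau$ and a transitory shock $v_{st}$ that is independent across years.

```latex
% Illustrative model (an assumption of this sketch): s \in \{N, S\} indexes schools,
% t \in \{-1, +1\} the year before and after funding, D_{st} = 1 only for North at t = +1.
\[
  y_{st} = \mu_s + \lambda_t + \tau D_{st} + v_{st},
\]
\[
  \hat{\tau}_{DiD} = (y_{N,+1} - y_{N,-1}) - (y_{S,+1} - y_{S,-1})
                   = \tau + (v_{N,+1} - v_{N,-1}) - (v_{S,+1} - v_{S,-1}).
\]
% If North is funded because v_{N,-1} was unusually low, while the other shocks
% have mean zero given that selection, then
\[
  E[\hat{\tau}_{DiD} \mid \text{North funded}]
  = \tau - E[v_{N,-1} \mid \text{North funded}] > \tau .
\]
```

Under these assumptions a positive estimate can appear even when $\tau = 0$, purely because the unusually bad pre-funding year is not expected to repeat.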

Placebo Regression

If we had data on the academic performance of the graduating classes of the two schools three years before the funding was given, then we could run a second diff-in-diff estimation examining the changes from three years before funding was given to one year before it was given: a placebo regression.

If we saw evidence suggesting that this second diff-in-diff estimate was significantly different from zero, then:

- this cannot be the result of changing class sizes due to the funding, because that hadn't yet happened;
- it would be consistent with the North school having a different trend over time in academic performance as compared to the South school;
- a negative estimate would be consistent with a temporary dip in performance at the North school prior to the funding due to a randomly poor performance.
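
With only school-by-year averages, both the main and the placebo diff-in-diff estimates are simple arithmetic. The scores below are invented purely for illustration, and the helper `did` is hypothetical. Years are measured relative to the funding decision.

```python
def did(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences from four group means."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical mean exam scores, keyed by year relative to funding.
north = {-3: 62.0, -1: 55.0, 1: 63.0}   # treated school
south = {-3: 60.0, -1: 60.0, 1: 61.0}   # control school

# Main estimate: change from one year before to one year after funding.
main = did(north[-1], north[1], south[-1], south[1])        # (63-55) - (61-60)

# Placebo estimate: change from three years before to one year before funding,
# a window in which class sizes had not yet changed.
placebo = did(north[-3], north[-1], south[-3], south[-1])   # (55-62) - (60-60)

print(main, placebo)
```

In this made-up example the main estimate is 7 and the placebo estimate is -7: the whole apparent gain mirrors a dip at North just before funding, so it would be hard to attribute it to smaller classes. A real application would also need standard errors to judge whether the placebo estimate is significantly different from zero.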