Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

University of Groningen

Latent instrumental variables
Ebbes, P.

Document version: Publisher's PDF, also known as Version of record
Publication date: 2004

Chapter 1 Introduction

In this thesis we propose a new method to estimate regression coefficients in linear regression models where regressor-error correlations are likely to be present. This method, the Latent Instrumental Variables (LIV) method, utilizes a discrete latent variable model that accounts for dependencies between regressors and the error term. As a result, observed exogenous instrumental variables are not required. In the following chapters we introduce and illustrate the LIV method on both simulated data and empirical applications. We show that the LIV method has desirable properties over existing methods, such as ordinary regression and instrumental variables methods, when regressor-error dependencies are present. Each chapter is more or less self-contained and based on articles. In the following we present the scope and outline of the thesis.

The starting point of this research is the simple linear regression model given by

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \tag{1.1}$$

where $y_i$ is the dependent variable, $x_i$ the explanatory variable (regressor), and $\varepsilon_i$ is the error term or disturbance with mean zero and variance $\sigma^2$, all independent. The regression parameters $\beta_0$ and $\beta_1$ are the objects of inference. We focus on a situation where the regressor is random and possibly correlated with the disturbance [1], in which case it is not exogenous but endogenous. Regressor-error correlations may be the result of several causes and arise in a wide variety of models, e.g. when relevant explanatory variables are omitted, when the dependent variable influences the explanatory variable (simultaneity), when the sampling process is non-random (self-selection), or when the explanatory variable is measured with error.

The standard inferential methods are invalid if regressor-error dependencies exist. For instance, the ordinary least squares estimator for the regression parameters $(\beta_0, \beta_1)$ suffers from inconsistency, in which case the true effect of the explanatory variable on the dependent variable is systematically over- or underestimated, leading to false conclusions and erroneous decision making. The instrumental variables (IV) methods were developed to overcome these problems and have a long history in econometrics (Bowden and Turkington, 1984, Greene, 2000, or Judge et al., 1985). Instruments $z$ are variables that mimic the endogenous regressor $x$ as well as possible, but are uncorrelated [2] with the error term $\varepsilon$. Once valid instruments are available, the regression parameters can be consistently estimated via, for instance, two-stage least squares techniques.

However, finding exogenous instruments is hard work, and empirical researchers are often confronted with weak instruments. An instrument is weak when it only weakly correlates with the endogenous regressors. If instruments are weak and/or not exogenous, the standard instrumental variables estimation and inferential procedures are inaccurate and can produce results that are potentially worse than simply ignoring the endogeneity problem and relying on biased ordinary least squares. Hence, small biases in ordinary least squares estimates can become large biases when invalid instruments are used (Stock, Wright, and Yogo, 2002, or Hahn and Hausman, 2003) [3].
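The inconsistency of ordinary least squares and the remedy offered by a valid instrument can be illustrated with a small simulation of model (1.1). The following is a minimal sketch, not part of the thesis: the parameter values, the bivariate normal errors, the linear first stage `x = pi * z + v`, and the function names `simulate`, `ols` and `tsls` are all assumptions chosen for illustration.

```python
import numpy as np

def simulate(n, beta0=1.0, beta1=2.0, rho=0.5, pi=1.0, seed=0):
    """Draw (y, x, z) from model (1.1) with an endogenous regressor.

    All parameter values are hypothetical. rho is the correlation between
    the regression error eps and the first-stage error v; pi controls how
    strongly the instrument z drives x.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                          # instrument, independent of eps
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    x = pi * z + v                                  # x is correlated with eps through v
    y = beta0 + beta1 * x + eps
    return y, x, z

def ols(y, x):
    """Least squares fit of y on a constant and x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def tsls(y, x, z):
    """Two-stage least squares with a single instrument z."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first stage: project x on z
    return ols(y, x_hat)                               # second stage: regress y on fitted x

y, x, z = simulate(100_000)
b_ols = ols(y, x)
b_iv = tsls(y, x, z)
print("OLS slope:", b_ols[1])
print("2SLS slope:", b_iv[1])
```

Under these assumed values, the OLS slope converges to $\beta_1 + \mathrm{Cov}(x, \varepsilon)/\mathrm{Var}(x) = 2 + 0.5/2 = 2.25$ rather than to the true value 2, whereas the two-stage least squares slope converges to 2. Lowering `pi` towards zero makes the instrument weak, and the two-stage estimate then becomes erratic from sample to sample.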
[1] At least in the weak sense that $\operatorname{plim} \frac{1}{n} \sum_i x_i \varepsilon_i \neq 0$, or that $E(x_i \varepsilon_i) \neq 0$, implying $E(\varepsilon_i \mid x_i) \neq 0$; see e.g. White (2001) or Ferguson (1996).
[2] The instrument is said to be exogenous.
[3] This was already observed by Sargan in the 1950s, see e.g. Arellano (2002).

Besides the problems of potentially weak and/or endogenous instruments, these variables may simply not be available to a researcher, whereas collecting them is time consuming and expensive. The main purpose of this research is to develop a new method, the latent instrumental variables (LIV) method, that does not require observed instrumental variables at hand. As such, the difficult task of finding instruments and the inferential issues in the presence of bad quality instruments are circumvented. In fact, the optimal LIV instruments are estimated as a by-product from the available data.

The above discussion on the problems surrounding instrumental variables estimation is considered in greater detail in chapter 2. The literature review presented in this chapter covers most of the recent studies on weak instruments and contains several references to empirical research (labor economics, marketing, industrial economics) that aims at solving regressor-error dependencies. Furthermore, we point out a few alternative approaches to instrumental variables estimation that may be useful in solving regressor-error dependencies. This overview of the literature is a selection of issues that motivates the development of the latent instrumental variables (LIV) method. We conclude chapter 2 by highlighting the relevance and contribution of this research.

In chapter 3 we introduce the latent instrumental variable (LIV) model. It solves regressor-error correlations in linear models by postulating that the instrumental variable is discrete and latent. As a by-product, the method allows for testing for endogeneity without requiring access to observable instruments. Our simulation results show that the LIV method yields consistent estimates for the model parameters without having observable instrumental variables at hand. These results are superior to OLS estimates, which are biased when the regressors are not exogenous. The proposed test statistic to test for exogeneity is shown to have reasonable power throughout a wide range of settings. Furthermore, we prove identifiability of all model parameters.
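The idea of a discrete, latent instrument can be made concrete by simulating data in which the mean of the regressor depends on an unobserved category. The sketch below is not the LIV estimator, which the thesis develops in chapter 3; it only illustrates, under assumed values, that OLS is biased in such data and that the category indicator would act as a valid instrument if it were observed. The two-category setup, the category means, the error covariance and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

beta0, beta1 = 1.0, 2.0                  # hypothetical true regression parameters
group_means = np.array([0.0, 3.0])       # hypothetical mean of x in each latent category
probs = np.array([0.5, 0.5])             # hypothetical category probabilities

# Latent category: unobserved in practice, generated here only for the simulation.
g = rng.choice(len(group_means), size=n, p=probs).astype(float)

# Errors of the y-equation (eps) and x-equation (v) are correlated: x is endogenous.
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
eps, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

x = group_means[g.astype(int)] + v
y = beta0 + beta1 * x + eps

# OLS slope: sample Cov(x, y) / Var(x); inconsistent because Cov(x, eps) != 0.
slope_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# "Oracle" IV slope using the latent category as if it were observed:
# Cov(g, y) / Cov(g, x). This is infeasible with real data.
slope_oracle_iv = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]

print("OLS slope:", slope_ols)
print("Oracle IV slope:", slope_oracle_iv)
```

With these assumed values, $\mathrm{Var}(x) = 3.25$ and $\mathrm{Cov}(x, \varepsilon) = 0.6$, so the OLS slope settles near $2 + 0.6/3.25 \approx 2.18$, while the oracle IV slope settles near the true value 2. The LIV method aims at the same target without observing `g`.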
We apply the LIV method to an empirical measurement error application where a laboratory dummy instrumental variable is available. We show that the predicted LIV dummy instrument is identical to this observed laboratory instrument. Hence, the LIV estimate for the regression parameter, without using the observed instrument, is identical to the classical IV estimate that does require the existence of an observed instrument. We conclude that our instrument-free approach can be successfully used to estimate regression parameters in the presence of regressor-error correlations, and to test for this dependency without the necessity of first finding valid instruments.

The method proposed in chapter 3 is extended in chapter 4 to more general settings. We extend the model to a situation where several exogenous regressors are available. Furthermore, we allow for the possibility that observed instrumental variables are available. Using techniques similar to those for the simpler LIV model, we prove that all model parameters can be identified. Importantly, from this proof it follows that the general LIV model is still identified, even when possible observed instruments have no or very small effects on the endogenous regressor. In such a case, the classical IV model is unidentified or weakly identified, respectively. This identifiability result suggests a straightforward approach to examine instrument weakness, one that is based on existing testing principles. Furthermore, using similar reasoning, it suggests a straightforward test of instrument exogeneity (validity). To the best of our knowledge, such tests to independently investigate instrument exogeneity and weakness for each instrument have not appeared in the literature before. We illustrate both tests by means of a simulation example and show that the proposed tests have reasonable power under a variety of settings.

Besides, we propose several diagnostics to complete an LIV analysis. We propose several statistics to choose the number of categories of the discrete LIV instrument. Furthermore, we examine the robustness of the LIV estimates towards misspecification of the likelihood equation and suggest how to examine residuals. We adapt standard methods from regression models to detect outliers and influential observations.
The proposed LIV model, tests, and diagnostics are applied in chapter 5. We examine the effect of education on income, where the variable education is potentially endogenous due to omitted ability or other causes. We review part of the schooling literature and discuss the problems associated with classical instrumental variables estimation. As will become clear, the classical IV method has produced a less than satisfactory solution in estimating the return to education. Importantly, researchers who use different sets of instruments arrive at different conclusions in terms of size and magnitude of the bias found in the OLS estimate for the return to education.

We examine three empirical datasets. In all three applications, we find an upward bias in the OLS estimates of approximately 7%. Our conclusions agree closely with recent results obtained in studies with twins that find an upward bias in OLS of about 10% (Card, 1999). Diagnostic evaluations demonstrate that the LIV method provides a satisfactory fit of the data. We also find that for each of the three datasets the classical IV estimates for the return to education point to biases in OLS that are not consistent in terms of size and magnitude. The proposed diagnostics and tests to examine the validity of available observed instruments indicate that in two of the three datasets the instruments used are potentially weak and/or endogenous. Our conclusion is that LIV estimates are preferable to the classical OLS and IV estimates in understanding the effects of education on income.

In chapter 6 we consider endogeneity problems in multilevel models, i.e. when data have a hierarchical structure. As before, the explanatory variables are assumed to be independent of the random components at various levels. However, in many applications this is an unrealistic assumption. When the same cross-section units are observed over time, for instance, or when data on siblings or twins are available, multilevel models may in fact be used to solve regressor-error correlations at a lower level. In this chapter we show that much care is required in relying on these methods in actual applications. First, we review methods that can be used to test for different types of random effects regressor dependencies.
Secondly, we present results from Monte Carlo studies designed to investigate the performance of these methods, and, finally, we discuss estimation methods that can be used when some, but not all, of the random effects regressor independence assumptions are violated. Because current methods are limited in various ways, we will also present a list of open problems and suggest solutions for some of them. As we will show, the issue of regressor random effects independence has received some attention in the econometrics literature, but this important work has had little impact on current research practices in the social and behavioral sciences.

In chapter 7 we take parts of the results of chapter 6 a step further and develop sophisticated nonparametric Bayesian methods (Dey, Müller and Sinha, 1998) to solve regressor-error dependencies in multilevel models at various stages of the model. This method solves some of the problems addressed in chapter 6 and is a generalization of the standard LIV model in the sense that we do not impose restrictions (discreteness) on the distribution of the instruments. In fact, we let the data determine the best distribution. This is an important advantage as it does not require an a priori specification of the right number of categories of the unobserved discrete instrument. Because we take full advantage of Bayesian estimation methods, the proposed model can readily be adapted and extended to more general and more complex model structures. Furthermore, insight into small-sample properties of the estimation results is more easily obtained and inference does not rely on asymptotic results. This chapter is still work in progress and the results are preliminary, yet promising. We illustrate the potential usefulness of this approach to regressor-error dependencies and suggest steps for further research.

In chapter 8 we present a discussion of the proposed LIV method and the results found. Furthermore, we present future research directions.