Generalized bivariate count data regression models


Economics Letters 68 (2000) 31–36
www.elsevier.com/locate/econbase

Shiferaw Gurmu (a, *), John Elder (b)
(a) Department of Economics, University Plaza, Georgia State University, Atlanta, GA 30303, USA
(b) Department of Economics, Munroe Hall, Middlebury College, Middlebury, VT 05753, USA

Received 16 June 1999; accepted 1 October 1999

*Corresponding author. Tel.: +1-404-651-1907; fax: +1-404-651-4985. E-mail address: sgurmu@gsu.edu (S. Gurmu).
0165-1765/00/$ see front matter © 2000 Elsevier Science S.A. All rights reserved. PII: S0165-1765(00)005-1

Abstract

This paper proposes a flexible bivariate count data regression model that nests the bivariate negative binomial regression. An application to the demand for health services is given. © 2000 Elsevier Science S.A. All rights reserved.

Keywords: Correlation; Censoring; Unobserved heterogeneity; Bivariate Negbin

JEL classification: C35; I10

1. Introduction

Event counts such as the number of doctor consultations and prescription drug utilization are likely to be jointly dependent. Another example of bivariate counts is the number of voluntary and involuntary job changes. Applications of these models have largely been confined to the bivariate Poisson regression model. This benchmark model imposes the restriction that the conditional mean of each count variable equals the conditional variance. For the common case of overdispersed counts, the bivariate negative binomial model is potentially useful. But this model has mainly been used in the context of panel data (Footnote 1).

Footnote 1: Gurmu and Trivedi (1994) and Cameron and Trivedi (1998) provide an overview of standard bivariate count models.

This paper develops a more general flexible bivariate count regression model based on first-order series expansion of the unknown density of the unobserved heterogeneity component. We also extend the approach to estimation of censored bivariate regression models. We apply the method to bivariate models of the demand for health services. The dependent variables are the number of visits to a doctor and the number of visits to non-doctor health professionals. The proposed bivariate model nests the bivariate negative binomial model as a special case. In the empirical application, the proposed generalized model dominates existing bivariate models using various criteria for comparing models.

2. The bivariate model

Consider two jointly distributed random variables, \(Y_1\) and \(Y_2\), each denoting event counts. For observation \(i\) (\(i = 1, 2, \ldots, N\)), we observe \(\{y_{ji}, x_{ji}\}_{j=1}^{2}\), where \(x_{ji}\) is a \((k_j \times 1)\) vector of covariates. The mean parameter associated with \(y_{ji}\) can be parameterized as \(\theta_{ji} = \exp(x_{ji}'\beta_j)\), \(j = 1, 2\), where \(\beta_j\) is a \((k_j \times 1)\) vector of unknown parameters. If there is evidence that the dependent count variables \(y_1\) and \(y_2\) are correlated, joint estimation of the underlying equations is desirable.

The most widely known model is the bivariate Poisson regression model, which can be generated by convolutions of Poisson random variables (Kocherlakota and Kocherlakota, 1992). The model restricts the mean to be equal to the variance for each of the respective marginal distributions: \(E(y_{ji} \mid x_{ji}) = \mathrm{var}(y_{ji} \mid x_{ji})\) for \(j = 1, 2\).

As in the univariate case, bivariate count models can be generalized to allow for overdispersion. We consider a bivariate model with unobserved heterogeneity. Let \((y_{ji} \mid x_{ji}, \nu_i) \sim \mathrm{Poisson}(\theta_{ji}\nu_i)\), where \(\nu_i\) is the unobserved heterogeneity component with density \(g(\nu_i)\). The mixture bivariate density takes the form:

\[
f(y_{1i}, y_{2i} \mid x_i) = \int \left[ \prod_{j=1}^{2} \frac{\exp(-\theta_{ji}\nu_i)\,(\theta_{ji}\nu_i)^{y_{ji}}}{\Gamma(y_{ji}+1)} \right] g(\nu_i)\, d\nu_i
= \left[ \prod_{j=1}^{2} \frac{\theta_{ji}^{\,y_{ji}}}{\Gamma(y_{ji}+1)} \right] E_{\nu}\!\left[ \exp(-\theta_{.i}\nu_i)\, \nu_i^{\,y_{.i}} \right] \tag{1}
\]

where \(\theta_{.i} = \theta_{1i} + \theta_{2i}\), \(y_{.i} = y_{1i} + y_{2i}\), and \(E_{\nu}[\cdot]\) denotes expectation taken with respect to the density of \(\nu_i\).

The proposed estimation approach is based on first-degree polynomial expansion of the distribution of unobserved heterogeneity, where the leading term in the expansion is the gamma density with parameters \(\alpha\) and \(\lambda\).
The proposed flexible density is:

\[
g(\nu_i) = \frac{\lambda^{\alpha}\, \nu_i^{\alpha-1}\, e^{-\lambda\nu_i}}{\Gamma(\alpha)\,(1+\rho^2)} \left[ 1 + \rho P_1(\nu_i) \right]^2 \tag{2}
\]

where \(P_1(\nu_i) = \alpha^{1/2} - \lambda\nu_i/\alpha^{1/2}\) is the first-order polynomial with unit variance, and \(\rho\) is an unknown parameter. The polynomial is squared to ensure non-negativity of the density in (2) (Footnote 2).

Footnote 2: For computational simplicity, this paper focuses on only first-order expansion. Analogous to recent techniques developed for univariate count models (Gurmu et al., 1999), higher-order polynomials can, in principle, be included.

After some algebra, the mixture bivariate density (1) based on specification (2) can be derived as:

\[
f(y_{1i}, y_{2i} \mid x_i) = \left[ \prod_{j=1}^{2} \frac{\theta_{ji}^{\,y_{ji}}}{\Gamma(y_{ji}+1)} \right] \frac{\Gamma(y_{.i}+\alpha)}{\Gamma(\alpha)}\, \lambda^{-y_{.i}} \left( 1 + \frac{\theta_{.i}}{\lambda} \right)^{-(\alpha+y_{.i})} \Psi_i \tag{3}
\]

where:

\[
\Psi_i = \frac{1}{1+\rho^2} \left[ 1 + 2\rho\sqrt{\alpha}\,(1-\eta_1) + \rho^2 \alpha\,(1 - 2\eta_1 + \eta_1\eta_2) \right] \tag{4}
\]

with \(\eta_1 = \dfrac{\alpha + y_{.i}}{\alpha} \left( 1 + \dfrac{\theta_{.i}}{\lambda} \right)^{-1}\) and \(\eta_2 = \dfrac{\alpha + 1 + y_{.i}}{\alpha} \left( 1 + \dfrac{\theta_{.i}}{\lambda} \right)^{-1}\).

The log-likelihood can now be constructed readily. As in the univariate mixture Poisson models, we set the mean of the unobserved heterogeneity to unity. This implies the restriction:

\[
\lambda = \frac{1}{1+\rho^2} \left[ \alpha - 2\rho\sqrt{\alpha} + \rho^2(\alpha+2) \right] \tag{5}
\]

The model given in (3) will be referred to as the generalized bivariate negative binomial (GBIVARNB). Thus, the log-likelihood function for the GBIVARNB regression model is \(\mathcal{L}(\varphi) = \sum_{i=1}^{N} \ln\big(f(y_{1i}, y_{2i} \mid x_i)\big)\), where \(f(\cdot)\) is given in (3) subject to the restriction (5). Here \(\varphi\) consists of the unknown parameters \(\beta_1\), \(\beta_2\), \(\alpha\) and \(\rho\).

The model just given can be generalized to truncated and censored jointly dependent count regression models. Censored samples may result when high counts are not observed, or may be imposed by survey design; see, for example, Okoruwa et al. (1988). Thus right censoring is the most common form in the analysis of univariate count models. Suppose that the bivariate counts are right censored at \(r = (r_1, r_2)\) so that \(y_{ji} = 1, 2, 3, \ldots, r_j\) for \(j = 1, 2\). Letting \(f(y_{1i}, y_{2i}; \varphi)\) denote the complete bivariate density (3), the log-likelihood for the right-censored bivariate count model is:

\[
\mathcal{L}_c(\varphi) = \sum_{i=1}^{N} \left\{ d_i \ln\left[ f(y_{1i}, y_{2i}; \varphi) \right] + (1 - d_i) \ln\left[ 1 - \sum_{l=0}^{r_1-1} \sum_{m=0}^{r_2-1} f(y_{1i} = l,\, y_{2i} = m;\, \varphi) \right] \right\} \tag{6}
\]

where \(d_i = 1\) if \(y_i\) falls in the uncensored region, and \(d_i = 0\) otherwise.

The proposed approach provides flexible forms for the variances of the \(y_{ji}\) values and the correlation between the dependent variables (Footnote 3). The correlation coefficient for model (3) is:

\[
\mathrm{Corr}(y_{1i}, y_{2i}) = \frac{\Omega\,\theta_{1i}\theta_{2i} - \theta_{1i}\theta_{2i}}{\sqrt{\left( \Omega\,\theta_{1i}^2 - \theta_{1i}^2 + \theta_{1i} \right)\left( \Omega\,\theta_{2i}^2 - \theta_{2i}^2 + \theta_{2i} \right)}} \tag{7}
\]

where \(\Omega = \dfrac{\alpha+1}{\lambda^2(1+\rho^2)} \left[ \alpha - 4\rho\sqrt{\alpha} + \rho^2(\alpha+6) \right]\).

Footnote 3: As in bivariate Poisson and negative binomial models, this correlation can also be shown to be non-negative.

Another advantage of the proposed regression model is that it nests the bivariate negative binomial (BIVARNB) model when \(\rho = 0\). Note that, since \(\rho = 0\) implies that \(\Psi_i = 1\), setting \(\Psi_i = 1\) and \(\lambda = \alpha\) in (3) gives the BIVARNB model. The restriction \(\rho = 0\) can be used to test for the adequacy of the BIVARNB model using, for example, a score test. As a rough guide, the significance of the t-ratio corresponding to \(\rho\) is indicative of the inadequacy of the BIVARNB regression. In the application section, we also use the Akaike Information Criterion (AIC) to compare the estimated models. A model with a minimum AIC value is to be preferred.
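The pieces of Section 2 can be put together numerically. The following is a minimal Python sketch, not the authors' GAUSS program, that evaluates the log of density (3) with the correction factor from (4) and the restriction (5), together with the correlation (7). The function names and the parameter values in the usage lines are illustrative assumptions, not estimates from the paper.

```python
import math


def lam_from_restriction(alpha, rho):
    # Restriction (5): choose lambda so the heterogeneity term has mean one.
    return (alpha - 2.0 * rho * math.sqrt(alpha)
            + rho ** 2 * (alpha + 2.0)) / (1.0 + rho ** 2)


def gbivarnb_logpmf(y1, y2, theta1, theta2, alpha, rho):
    # Log of density (3); theta_j = exp(x_j' beta_j) is supplied by the caller.
    lam = lam_from_restriction(alpha, rho)
    y_sum, theta_sum = y1 + y2, theta1 + theta2
    shrink = 1.0 / (1.0 + theta_sum / lam)
    eta1 = (alpha + y_sum) / alpha * shrink
    eta2 = (alpha + 1.0 + y_sum) / alpha * shrink
    # Correction factor Psi_i from (4); it equals one when rho = 0.
    psi = (1.0 + 2.0 * rho * math.sqrt(alpha) * (1.0 - eta1)
           + rho ** 2 * alpha * (1.0 - 2.0 * eta1 + eta1 * eta2)) / (1.0 + rho ** 2)
    return (y1 * math.log(theta1) - math.lgamma(y1 + 1.0)
            + y2 * math.log(theta2) - math.lgamma(y2 + 1.0)
            + math.lgamma(y_sum + alpha) - math.lgamma(alpha)
            - y_sum * math.log(lam)
            - (alpha + y_sum) * math.log1p(theta_sum / lam)
            + math.log(psi))


def gbivarnb_corr(theta1, theta2, alpha, rho):
    # Correlation (7); Omega is the second moment of the heterogeneity term,
    # so Omega - 1 is its variance under the unit-mean restriction (5).
    lam = lam_from_restriction(alpha, rho)
    omega = ((alpha + 1.0) / (lam ** 2 * (1.0 + rho ** 2))
             * (alpha - 4.0 * rho * math.sqrt(alpha) + rho ** 2 * (alpha + 6.0)))
    num = (omega - 1.0) * theta1 * theta2
    den = math.sqrt(((omega - 1.0) * theta1 ** 2 + theta1)
                    * ((omega - 1.0) * theta2 ** 2 + theta2))
    return num / den


# Illustrative values only (assumed for the demonstration).
theta1, theta2, alpha, rho = 0.3, 0.2, 1.5, 0.6
total = sum(math.exp(gbivarnb_logpmf(a, b, theta1, theta2, alpha, rho))
            for a in range(80) for b in range(80))
print("probability mass on the 80 x 80 grid:", total)
print("implied correlation:", gbivarnb_corr(theta1, theta2, alpha, rho))
```

Summing the log-density over observations gives the log-likelihood described above; setting `rho = 0.0` reduces the function to the BIVARNB log-density, which is a convenient check on any implementation.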

3. An application

The data analyzed here were originally employed by Cameron et al. (1988) in their analysis of various measures of health-care utilization using a sample of 5190 single-person households from the 1977–1978 Australian Health Survey. The data are obtained from the Journal of Applied Econometrics 1997 Data Archive. Here we model two possibly correlated dependent variables: (1) the number of consultations with a doctor during the 2-week period prior to the survey (Doctorcon); and (2) the number of consultations with non-doctor health professionals (chemist, optician, physiotherapist, social worker, district community nurse, chiropodist or chiropractor) during the past 4 weeks (Nondoccon). These utilization measures have two interesting features: overdispersion and a very high proportion of non-users. The mean and the standard deviation of doctor visits are 0.302 and 0.798. The corresponding values for health professionals are 0.215 and 0.965. The frequencies of zero visits in the Doctorcon and Nondoccon samples are 80% and 91%, respectively.

The explanatory variables are as follows: (1) socio-economic variables: a dummy variable for whether or not the patient is female (Sex), age in years divided by 100 (Age), age-squared (Agesq), and annual income in ten-thousands of dollars (Income); and (2) insurance and health status variables: an indicator variable for private insurance coverage (Levyplus), free government insurance cover due to low income (Freepoor), free government coverage due to old age, disability or veteran status (Freerepa), default government Medibank insurance coverage paid for by income levy (Levy), number of illnesses in the past 2 weeks (Illness), number of days of reduced activity in the past 2 weeks due to illness or injury (Actdays), general health questionnaire score using Goldberg's method with a high score indicating bad health (Hscore), an indicator variable for a chronic condition not limiting activity (Chcond1), and an indicator variable for a chronic condition limiting activity (Chcond2). See Cameron et al. (1988) for summary statistics of the covariates.

An important consideration is whether the two health utilization variables are independent or not. Following a general framework for testing independence in various models summarized in Cameron and Trivedi (1998), p-values of tests of independence for Doctorcon and Nondoccon are computed. For example, p-values of tests of zero correlation of order one in baseline Poisson and negative binomial models are all less than 2%. There is overwhelming evidence that Doctorcon and Nondoccon are dependent counts, and hence joint estimation is desirable.

The bivariate Poisson model seems to be inadequate for joint estimation of overdispersed count data. Table 1 presents the maximum likelihood estimates of the BIVARNB and the generalized BIVARNB. The table reveals that the GBIVARNB dominates the BIVARNB model in terms of both the maximized value of the log-likelihood function and the AIC.
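The AIC comparison can be reproduced from the maximized log-likelihoods. The sketch below assumes the standard definition AIC = -2 ln L + 2k, takes the maximized log-likelihoods to be -5806.3 (BIVARNB) and -5772.0 (GBIVARNB), and assumes parameter counts of 27 and 28 (13 coefficients per equation plus log(α), and additionally ρ in the generalized model). These inputs are assumptions chosen to be consistent with the AIC values of 11666.6 and 11600.0 reported in Table 1.

```python
def aic(log_likelihood, n_params):
    # Akaike Information Criterion: the model with the smaller value is preferred.
    return -2.0 * log_likelihood + 2.0 * n_params


# Assumed inputs: (maximized log-likelihood, number of free parameters).
models = {
    "BIVARNB": (-5806.3, 27),
    "GBIVARNB": (-5772.0, 28),
}

aic_values = {name: aic(loglik, k) for name, (loglik, k) in models.items()}
preferred = min(aic_values, key=aic_values.get)

for name, value in aic_values.items():
    print(f"{name}: AIC = {value:.1f}")
print("preferred model:", preferred)
```

Because the two models differ by a single parameter, the generalized model is preferred whenever its log-likelihood exceeds that of the BIVARNB by more than one.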
The GBIVARNB estimates show that recent health status measures (Illness, Actdays) and one of the measures of long-term health status (Hscore) are important determinants of both doctor and non-doctor health professional visits. All of these are significant at 5%. The positive coefficient on the health status insurance variable Levyplus indicates that, relative to the default government Medibank insurance coverage, private insurance is associated with higher use of health services. Further, non-doctor consultation is responsive to Sex, Agesq, Chcond1 and Chcond2. The preferred model predicts a convex relationship between non-doctor health care utilization and age, with a minimum number of visits reached at about age 35. Both health utilization measures are unresponsive to changes in income.

Apart from log-likelihood and AIC values, there are also some differences in the results from the GBIVARNB and the BIVARNB models. In particular, there are differences in the statistical significance of some variables such as Sex, Levyplus and Chcond2 in the Doctorcon equation.

Table 1
Estimates from bivariate models

             BIVARNB                                 Generalized BIVARNB
             Doctorcon           Nondoccon           Doctorcon           Nondoccon
Variable     Est.    |t-ratio|   Est.    |t-ratio|   Est.    |t-ratio|   Est.    |t-ratio|
Constant     .309    17.0        .58     4.83        .514    1.96        .753    6.0
Sex          0.176   1.98        0.341   .46         0.160   1.01        0.38    .55
Age          0.065   1.09        4.101   1.60        0.875   1.18        .961    1.6
Agesq        0.533   1.34        5.35    1.84        0.497   0.64        4.10    .10
Income       0.139   1.19        0.005   0.81        0.130   1.0         0.00    0.08
Levyplus     0.14    0.80        0.330   1.74        0.07    1.88        0.405   .
Freepoor     0.558   1.74        0.156   0.68        0.691   .3          0.301   0.90
Freerepa     0.16    1.09        0.57    .65         0.188   1.65        0.530   .38
Illness      0.36    9.54        0.104   .4          0.34    8.63        0.106   .14
Actdays      0.151   17.41       0.135   9.18        0.144   16.60       0.18    9.71
Hscore       0.043   .57         0.059   .11         0.044   .73         0.061   .43
Chcond1      0.084   0.47        0.45    .49         0.166   1.41        0.519   3.8
Chcond2      0.98    .07         1.096   5.8         0.05    1.39        0.975   5.88
log(α) (a)   0.506   7.9                             0.115   0.96
ρ                                                    0.604   10.35
Log-likel.   -5806.3                                 -5772.0
AIC          11666.6                                 11600.0

(a) The coefficients log(α) and ρ are common to both the Doctorcon and Nondoccon equations.

We also computed the predicted marginal effects of changes in the regressors on the mean number of visits (Footnote 4). Generally the marginal effects are greater in the bivariate models than in the independent Poisson and Negbin models. The difference in the mean effects for various models is likely to be particularly important for significant regressors. The sample average of the correlation between Doctorcon and Nondoccon is 0.019 for bivariate Poisson, and about 0.3 in both the BIVARNB and GBIVARNB models.

Footnote 4: A GAUSS program used in this paper will be made available at the following Web site: http://www.gsu.edu/~ecosgg; a table showing the marginal effects and moments will also be posted here.

4. Conclusion

This paper has developed a generalization of the bivariate negative binomial regression model. In the empirical application, the generalized model dominates existing bivariate models in terms of various statistical criteria. The approach has recently been extended to estimation of multi-factor error multivariate count models.

References

Cameron, C., Trivedi, P.K., 1998. Regression Analysis of Count Data. Cambridge University Press.

Cameron, C., Trivedi, P.K., Milne, F., Piggott, J., 1988. A microeconometric model of the demand for health care and health insurance in Australia. Review of Economic Studies LV, 85–106.

Gurmu, S., Rilstone, P., Stern, S., 1999. Semiparametric estimation of count regression models. Journal of Econometrics 88, 123–150.

Gurmu, S., Trivedi, P.K., 1994. Recent developments in event count models: A survey. Discussion Paper No. 261, Thomas Jefferson Center, Department of Economics, University of Virginia.

Kocherlakota, S., Kocherlakota, K., 1992. Bivariate Discrete Distributions. Marcel Dekker, New York.

Okoruwa, A.A., Terza, J.V., Nourse, H., 1988. Estimating patronization shares for urban retail centers: An extension of the Poisson gravity model. Journal of Urban Economics 24, 241–259.