The Effect of Urban Agglomeration on Wages: Evidence from Samples of Siblings

Harry Krashinsky
University of Toronto

Abstract

The large and significant relationship between city population and wages has been well established in the agglomeration literature, yet its causal interpretation remains debated. This paper contributes new evidence to this debate by using multiple data sets of siblings to estimate the agglomeration premium while controlling for unobserved individual heterogeneity with a family-specific fixed effect. In the absence of this fixed effect, the agglomeration premium is large and significant. But after a familial fixed effect is included in the regression framework, the city-size wage premium becomes small in magnitude and statistically insignificant in all of the data sets used in the analysis. The results demonstrate the importance of family background for interpreting the agglomeration premium.

Corresponding author: Harry Krashinsky, 121 St. George Street, Centre for Industrial Relations, University of Toronto, Toronto, Ontario, Canada, M5S 2E8. Telephone: (416) 978-1744. Fax: (416) 978-5696. Email: harry.krashinsky@utoronto.ca

I would like to thank Orley Ashenfelter and Cecilia Rouse for permitting me access to the data on identical twins. I would like to thank William Strange and Stuart Rosenthal for several helpful comments and discussions about this paper. I would also like to thank Jeremy Glazier for helpful research assistance.

1 Introduction

The effect of a city's population on wages is large in magnitude and highly significant. Various studies have demonstrated that doubling the population of an individual's city would raise wages by three to seven percent, and that moving from a city of fewer than 500,000 people to one with more than half a million residents would raise wages by over 20 percent. Both of these effects are at least as large as the returns to many standard variables included in a wage regression, and perhaps because of this, researchers have questioned the causal nature of the relationship between city size and wages. Generally, papers that argue against a causal effect of city size on wages find that non-random selection is the reason wages are higher in cities than in non-urban areas: better workers select into cities to obtain the higher-paying jobs there. Conversely, studies that argue that city size does have a causal effect on wages suggest that, even after accounting for selection, agglomeration produces better matches between workers and firms because larger cities have thicker markets, or that cities contain other amenities (or disamenities) that cause wages to be higher in urban areas.

To contribute to this debate, this paper presents new evidence from multiple data sets of siblings, estimating the city-size wage premium with an econometric strategy that is new to the agglomeration literature: controlling for familial effects on wages. The first data set used in the analysis is a sample of identical twins. These data are well suited to assessing the agglomeration premium because they make it possible to contrast the wages of twins living in cities of different sizes. Such a contrast estimates the causal effect of agglomeration on wages because it accounts for the unobserved component of ability, since the twins in each pair are genetically identical, and it also accounts for familial effects on earnings, since the twins share the same family background. The evidence will show that in a cross-sectional regression without controls for familial ability, there is a large and significant effect of city size on wages, but this effect becomes insignificant in the within-twin analysis. More importantly, controlling for familial ability significantly reduces the agglomeration premium across many different specifications and econometric approaches. The analysis uses two popular measures of city size to represent the effect of agglomeration: the log of a city's population and an indicator equal to one if the city's population exceeds 500,000 residents. For both variables, the inclusion of a family-specific fixed effect eliminates the city-size wage premium.

To address the robustness of these findings, two other data sets are also incorporated into the analysis. First, the National Longitudinal Survey of Youth (NLSY) is employed because it is a longitudinal data set that also contains a large number of siblings. Since there are few twins in this sample, it is not possible to assume that the siblings are equally able by virtue of being genetically identical; however, the NLSY does contain a measure of ability that is usually unobserved, since all respondents were assigned a score based on their performance on a series of standardized tests. As such, within-family differences in the unobserved component of ability can be captured by differences in these standardized test scores. The results will demonstrate that, as with the data set of twins, there is a large and significant effect of city size on wages in simple cross-sectional regressions, but the inclusion of a familial fixed effect in these regressions eliminates the significance of the agglomeration premium. The second data set used to assess the robustness of the impact of familial effects on the agglomeration premium is the five-percent Public Use Microdata Sample (PUMS) from the 2000 United States Census. The PUMS is a household-based sample: it contains information about each member of the household, as well as the familial relationship between the head of the household and every other household member.
As such, it is possible to identify two types of sibling sets in the data: household heads together with household members who are siblings of the head, and children of a household head who are living together in the same household. Furthermore, the information collected by the Census makes it possible to estimate the effect on wages of the population of the public-use microdata area (PUMA) where the respondent works. This measure of agglomeration has a significantly positive effect on wages, but including family-specific fixed effects eliminates the significance of the agglomeration premium for both sets of siblings. Overall, the evidence from all three data sets is remarkably consistent and underlines the importance of controlling for familial ability when assessing the magnitude and significance of the agglomeration premium.

2 Literature Review

The literature on the agglomeration wage premium documents the significant effect of city size on wages [1] and discusses how this premium is affected by selection, which arises because the distribution of individual ability may not be the same across cities of different sizes. Wheeler (2001) uses the 1980 IPUMS sample and finds that the return to doubling city size is approximately 3 percent. He also finds that this return varies significantly with education: whereas there is no significant effect of city size on wages for workers without any high school education, college graduates exhibit a return of approximately 4 percent when city size doubles. Glaeser and Mare (2001) study the urban wage premium by comparing individuals who live in a city of at least 500,000 people to those who do not, and find that the wage difference between these two groups is approximately 25 to 30 percent. After incorporating individual fixed effects into their framework, Glaeser and Mare show that this premium falls to between 4 and 10 percent, and they suggest that this decrease could be consistent with the agglomeration premium being caused by ability bias. However, they also argue that the fixed-effect framework does not address the higher wage growth evident within cities. Consistent with this notion, Glaeser and Mare find evidence of higher wage growth for workers in cities, but this growth effect is smaller than the effect of living in cities captured by the fixed effect, and the authors suggest that this fixed effect is an important component of the urban wage premium.

[1] One of the first papers to provide theoretical and empirical evidence of this idea is Roback (1982).

Yankow (2006) also considers big-city wage premia by analyzing cities with at least one million residents and cities with between one-quarter million and one million residents. Similar to Glaeser and Mare, he finds that fixed effects can account for about two-thirds of the urban premium in this specification, and that growth effects can account for some of the remaining premium. Wheeler (2006) also investigates the effect of cities on wage growth, and specifically compares wage growth that occurs within a job to wage growth resulting from job changes. Overall, he finds mixed evidence for higher wage growth in cities: wage growth within a job is not significantly higher for workers at jobs in large cities, and only wage growth resulting from job changes is significantly higher in large cities (although this difference is not significant in the fixed-effect specification). Rosenthal and Strange (2006) consider the transmission mechanisms through which the urban wage premium is conveyed, and find that it is primarily due to the concentration of more-educated workers in urban areas, and that this effect attenuates with distance. After including controls for endogeneity and using an instrumental variables procedure, the authors find that an increase in population size has a significant effect on wages. Bacolod et al. (2007) create measures of skills to explore the urban premium, and find that attributes such as cognitive skills are, in general, uniformly distributed across cities of different sizes. However, these skills are more highly valued in larger cities than in smaller ones, and this greater valuation is robust to the inclusion of AFQT scores and individual fixed effects in the wage equation.

International evidence on the agglomeration premium is generally similar to the evidence from U.S. data sources. Tabuchi and Yoshida (2000) use Japanese data to estimate the urban wage premium, and find that wages would increase by 10 percent if city size doubled.
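The studies above report the premium in two forms: a wage gain per doubling of city size, and a premium for living in a city above a population threshold. A short sketch of the standard conversions follows, under the assumption (not stated for each study) that doubling premia come from a regression of log wages on log population and that threshold premia come from an indicator coefficient in a log-wage regression. Under a log-log specification with coefficient b, doubling city size multiplies wages by 2^b, so a gain of p percent per doubling corresponds to b = ln(1 + p/100) / ln 2; an indicator coefficient c corresponds to an exact premium of 100(e^c - 1) percent. The function names are illustrative.

```python
import math


def elasticity_from_doubling(pct_gain):
    """Log-log coefficient implied by a percent wage gain per doubling: ln(1 + p) / ln 2."""
    return math.log1p(pct_gain / 100.0) / math.log(2.0)


def doubling_gain(elasticity):
    """Percent wage gain from doubling city size under a log-log model: 2**b - 1."""
    return (2.0 ** elasticity - 1.0) * 100.0


def indicator_gain(coef):
    """Exact percent premium implied by an indicator coefficient in a log-wage model."""
    return math.expm1(coef) * 100.0


# Doubling premia cited above (Wheeler 2001; Tabuchi and Yoshida 2000).
print(round(elasticity_from_doubling(3), 3))   # 0.043
print(round(elasticity_from_doubling(10), 3))  # 0.138
# A log-population coefficient of 0.056 implies roughly a 4 percent gain per doubling.
print(round(doubling_gain(0.056), 2))          # 3.96
# Illustrative (hypothetical) indicator coefficient of 0.25 log points.
print(round(indicator_gain(0.25), 1))          # 28.4
```

For small coefficients the log-point and exact percentage figures are close, which is why the two are often used interchangeably; the gap widens for indicator premia of the size reported for cities above 500,000 residents.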
Combes, Duranton and Gobillon (2007) use a large French panel data set to consider the impact of area fixed effects on wages. In their analysis of French wages, Combes et al. find that individual fixed effects are by far the most important determinant of wages, as well as of the area fixed effects, which suggests that sorting on the basis of ability is an important component of the urban wage premium. However, it should also be noted that area fixed effects were not entirely eliminated by the incorporation of individual fixed effects into the wage regression, and that agglomeration still plays a significant role in wage determination.

3 Data and Estimation Approach

To consider the effect of familial ability on the urban wage premium, a common estimation procedure is used for all three data sets, and the analysis begins with the data set of identical twins. These data were collected during the summers of 1991, 1992 and 1993 at the Twinsburg Twins Festival in Twinsburg, Ohio, and the interview questionnaires were modeled after the Census and CPS instruments [2]. The data are drawn from the sub-sample of identical white twins [3], both of whom worked within the two years prior to the interview and live in the United States, and the key variable for the analysis is the population of the city in which each sibling lives. Respondents were asked to report the city in which they lived, and this city's population was then entered separately into the data based on Census statistics [4]. Table 1 displays the characteristics of the twins sample and compares them to those of white workers from reweighted CPS supplements [5]. The data set of identical twins is generally similar to the reweighted CPS samples, with some small differences evident in characteristics such as marital status.

[2] Some of the data from the first three waves of this survey were used by Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998), who provide a discussion of the procedures used to collect these data. Some additional questions were designed specifically for interviewing twins, such as the twin's report of his or her sibling's educational attainment, which was used as an instrumental variable to account for the effect of measurement error on the return to education.

[3] Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998) discuss the fact that, on average, the black twins interviewed for the sample exhibited unrepresentative characteristics. As such, they were dropped from the sample. However, this exclusion does not affect any of the main results presented in this paper.

[4] The respondent's reported city was matched against the city population provided by the 1990 U.S. Census.

[5] Reweighting was conducted on the basis of the twin's state of residence. As with the results in Ashenfelter and Rouse (1998), the differences between the samples have no large effect on the results in this paper. Also, wage regressions using CPS and twin data yield very similar coefficients on all the variables in my wage regressions.

The data set of identical twins provides a unique advantage in assessing the agglomeration premium: it is possible to identify the causal effect of a city's population on wages by assuming that the unobserved component of ability is equal for both twins. This implies that the difference in earnings between a twin in a large city and his or her sibling in a small town can be attributed to the effect of city size on earnings, without bias from the unobserved component of ability. Figures 1 and 2 illustrate the effect of making this assumption about the city-size premium. In Figure 1, the log of each twin's hourly wage is plotted against the log of his or her city's population, and a positive fitted relationship is evident between these two variables [6]. If this positive effect of city size were not subject to ability bias, then comparing the within-twin difference in wages with the within-twin difference in city size should yield a roughly similar result, given that each pair of twins is assumed to be equally able. However, Figure 2 plots the within-twin differences in wages and city size, and there is no significant relationship between these two differences: a difference in city size is not associated with a difference in wages within twin pairs [7]. This suggests that ability bias is important in the analysis of the agglomeration premium. However, Figures 1 and 2 are only suggestive of the importance of ability bias, and it is necessary to explore the agglomeration premium with a more formal econometric framework.
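As a quick check on the two bivariate estimates reported for Figures 1 and 2 (0.056 with a standard error of 0.012 in levels, and 0.014 with a standard error of 0.016 in within-twin differences), the sketch below computes t-statistics and 95 percent confidence intervals. It assumes a large-sample normal approximation, which is where the critical value 1.96 comes from; the helper names are illustrative. Overlapping intervals do not by themselves test whether the two slopes differ, since such a test would need their covariance, which is not reported.

```python
Z_95 = 1.96  # two-sided 95% critical value under a normal approximation (assumption)


def t_stat(coef, se):
    """Coefficient divided by its standard error."""
    return coef / se


def confidence_interval(coef, se, z=Z_95):
    """Symmetric large-sample interval coef +/- z * se."""
    return coef - z * se, coef + z * se


# Bivariate estimates reported in the text: (coefficient, standard error).
reported = {
    "log wage on log city size (Figure 1)": (0.056, 0.012),
    "within-twin differences (Figure 2)": (0.014, 0.016),
}

for label, (coef, se) in reported.items():
    lo, hi = confidence_interval(coef, se)
    covers_zero = lo <= 0.0 <= hi
    print(f"{label}: t = {t_stat(coef, se):.2f}, "
          f"95% CI = [{lo:.3f}, {hi:.3f}], includes zero: {covers_zero}")
```

The levels interval lies entirely above zero, while the within-twin interval straddles it, which is the pattern the two figures describe.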
[6] In a simple bivariate regression of log wages on the log of city size, the coefficient on city size is 0.056 with a standard error of 0.012.

[7] The bivariate regression of the within-twin difference in wages on the within-twin difference in city size yields a coefficient of 0.014 with a standard error of 0.016.

To operationalize this framework, it is assumed that ability has a linear effect on earnings, so that the earnings equations for the two twins in family $j$ can be expressed as follows:

$$y_{1j} = \beta' X_{1j} + \alpha' Z_j + A_j + \varepsilon_{1j}$$
$$y_{2j} = \beta' X_{2j} + \alpha' Z_j + A_j + \varepsilon_{2j}$$

where $X_{ij}$ represents a vector of individual characteristics for twin $i$ from family $j$, $Z_j$ represents characteristics common to family $j$, $A_j$ is a family-specific ability term and $\varepsilon_{ij}$ is an individual-specific error term. The identifying assumptions of the model are that the returns to individual characteristics $X_{ij}$ are the same for both twins, and that ability is correlated between twins. Specifically, $A_j$ is expressed as:

$$A_j = \gamma' \left( \frac{X_{1j} + X_{2j}}{2} \right) + v_j$$

These assumptions lead to the reduced-form correlated random-effects model (Chamberlain 1982):

$$y_{1j} = \beta' X_{1j} + \alpha' Z_j + \gamma' \left( \frac{X_{1j} + X_{2j}}{2} \right) + v_j + \varepsilon_{1j}$$
$$y_{2j} = \beta' X_{2j} + \alpha' Z_j + \gamma' \left( \frac{X_{1j} + X_{2j}}{2} \right) + v_j + \varepsilon_{2j}$$

where $\gamma$ represents the correlation between a family's ability level and each twin's individual characteristics. An attractive feature of this model is that it provides estimates of both $\gamma$, the effect of familial ability on wages, and $\beta$, the effect of individual-specific variables on earnings. An alternative estimation procedure that accounts for familial ability bias is the fixed-effects model, which differences the two equations of the correlated random-effects model. Because $\alpha' Z_j$ and the family ability term $\gamma' (X_{1j} + X_{2j})/2 + v_j$ are common to both twins, they cancel, and the resulting equation is:

$$y_{1j} - y_{2j} = \beta' (X_{1j} - X_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j})$$

Although the fixed-effects model yields estimates of $\beta$ that are not biased by familial ability, it does not provide a direct estimate of $\gamma$.

Estimates from the OLS, correlated random-effects and fixed-effects models are provided in Table 2, which displays results for earnings equations that use two different measures to represent the agglomeration premium: the logarithm of the respondent's city's population, and an indicator variable equal to one if the respondent's city has a population in excess of 500,000 residents. If familial ability had no effect on earnings, then the OLS estimates displayed in Table 2 would provide unbiased estimates of the effects of the exogenous regressors, including both variables used to capture the agglomeration premium. Also, under these circumstances, the OLS and correlated random-effects estimates would differ only because of sampling error. However, this is not the case. Results in the first three columns show that the coefficient on the log of city size differs dramatically depending on the estimation procedure. Without controls for ability, the estimates in column one show that the premium for doubling a city's size is roughly four percent, which is within the range of premia estimated by prior studies [8]. However, the results in columns two and three demonstrate that accounting for familial ability greatly reduces the significance and the magnitude of this coefficient: the city-size premium is essentially reduced to zero, with a correspondingly small t-value. In addition, the estimated correlation between familial ability and city size ($\gamma$) is large in magnitude and highly significant. These results suggest that unobserved ability is a significant factor in determining the city-size premium, which is consistent with studies such as Combes et al. (2007), amongst others [9].

[8] As previously discussed, Wheeler (2001) finds a premium of 3 percent for doubling city size. Also, Bacolod et al. find a premium of approximately 6 to 7 percent.

[9] It is also noted that the other estimated coefficients in the OLS model from the data set of twins (such as the returns to education, marital status and tenure) are similar to those in commonly used data sets. Also, as demonstrated in earlier work, the return to education remains significant even after controlling for familial ability, as does tenure, but the marital premium does not. Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998) showed that education remains significant even in the presence of a family fixed effect, while Krashinsky (2004) showed that the marital premium drops to zero after familial controls are included in the regression.

The last three columns of Table 2 display results from regressions that include an indicator equal to one if the city's population is greater than half a million residents. The findings in the fourth column demonstrate that residing in a city whose population exceeds 500,000 generates a wage premium of approximately 19 percent. Glaeser and Mare argued that this effect may be due to selection (captured by an individual fixed effect) as well as to higher wage growth in cities. The results in columns five and six of this table attest to the importance of fixed effects, since they demonstrate that familial ability has a large effect on this wage premium. In fact, incorporating a familial fixed effect into the econometric framework makes the premium insignificant and significantly lower than it was when no such controls were included in the regression. This suggests that, as in the first three columns of the table, the agglomeration wage premium for cities of at least 500,000 people is also prone to bias from unobserved ability.

Table 3 splits the sample into men and women to determine whether the effects of familial ability controls differ by gender, and uses both measures of agglomeration employed in Table 2: the log of the city's population and an indicator equal to one if the city's population is over 500,000. The first four columns of Table 3 present the OLS and correlated random-effects estimates for men, and the last four columns present the same estimates for women. The first and fifth columns show that the premium associated with the log of a city's population is highly significant for both men and women, and roughly of the same magnitude for both groups: approximately 5 percent for men and 4 percent for women. For both genders, however, columns two and six demonstrate that the effect of controlling for familial ability is the same as in Table 2: the city-size premium is no longer significant, its magnitude is essentially zero, and it is significantly lower than it was in the absence of familial controls. Columns three and seven display the return to living in a city with a population of at least half a million people, and for men, the premium is large and highly significant. Columns four and eight demonstrate that with the inclusion of familial controls, though, the coefficient on this variable is small in magnitude and statistically insignificant.

Recent papers have also documented the value of various skills in cities. For instance, Bacolod et al.
have found that certain types of skills are rewarded more in cities than outside of cities. To that end, the city-size wage premium is investigated within a quantile regression context to allow the agglomeration premium to differ for workers with different levels of skill, and to assess the impact of a familial fixed effect within this context as well. Results from quantile regressions are displayed in Table 4 for each of the different variables that represent 9

the effect of city size on wages, both with and without the inclusion of family controls in the regression specification. 10 The firsttworowsoftable4showtheresultsfromquantile regressions at the 10th, 25th, 50th, 75th and 90th percentiles for models which use the log of the city s population as the independent variable representing the effect of city size. For all five percentiles, city size has a significant effect on wages, and this effect grows in magnitude at higher percentiles (which was also evident in Bacolod et. al.). However, for all five cases, introducing controls for familial ability causes the premium to become statistically insignificant, and significantly smaller than the return to city size in the absence of family controls. This suggests that familial ability not only has an impact on the average return to city size (as was demonstrated in Tables 2 and 3), but it also affects the agglomeration premium throughout the wage distribution as well. The last two rows of Table 4 show similar results for the quantile regressions which include a dummy variable equal to one if the twin resides in a city whose population exceeds 500,000 people: at all five percentiles, the effect of including familial controls reduces the magnitude of the effect of city size on wages. Furthermore, at all percentiles except the 50th, the premium is significant in the absence of familial controls, but insignificant with these controls. Overall, the results in Tables 2 through 4 demonstrate that controlling for familial ability between twins accounts for virtually all of the wage premium associated with city size, and this is true for many different specifications. Familial controls had a significant effect on the premium associated with the log of the city s population and a dummy variable equal to one if the city s population exceeds half a million residents. Also, the family fixed effect had a similar impact on the agglomeration premium evident in quantile regressions. 
Taken together, the results in Tables 2 through 4 suggest that the unobserved component of ability is an important factor in explaining the wage effects of agglomeration. However, because the data set of twins is not large, it is important to demonstrate that the impact of familial effects on the agglomeration premium is also present in other data sets. As such, the agglomeration premium will be analyzed in two separate data sets in the next section.
¹⁰ For brevity's sake, Table 4 only contains estimates from the OLS and correlated random effects models. Within-twin estimates of the agglomeration premium in the quantile regression context are similar to results using the correlated random effects model.
4 Results from the NLSY and the U.S. Census
To consider the robustness of the findings from the data set of twins, the analysis also explores the effect of familial ability on the agglomeration premium within the National Longitudinal Survey of Youth (NLSY) and the five-percent Public Use Microdata Sample (PUMS) from the 2000 United States Census. The NLSY contains several sets of siblings because of its design: the data were assembled from an individual-level survey drawn from households with youths between the ages of 14 and 22 in 1979. A large number of households have multiple youths surveyed for the sample. The data also contain longitudinal information about the urban status of each respondent's town, including well after respondents move out of their parents' home, so it is possible to use these data to compare the wages of siblings in different areas. Similarly, the U.S. Census is a household-level survey that contains information about the head of the household as well as other members of the household, so two types of sibling comparisons are possible with these data. First, some households contain a household head and his or her sibling, and the data contain information on each respondent's Public Use Microdata Area (PUMA) of work. A wage comparison can therefore be conducted for these siblings when they work in different PUMAs. Second, similar comparisons can be made for children of the household head who are working and still living with the household head. Overall, the evidence from the NLSY and Census demonstrates that familial fixed effects make the agglomeration premium statistically insignificant and small in magnitude, which corroborates the findings from the data set of twins.
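The logic of the sibling comparisons can be illustrated with a small simulation. The sketch below assumes a hypothetical data-generating process in which the true urban premium is zero. A shared family ability term both raises log wages and makes each sibling more likely to live in an urban area. A pooled OLS regression of log wages on an urban dummy therefore absorbs the ability gap between urban and non-urban workers. The within-family estimator removes the shared term, because with exactly two siblings per family it is numerically the same as a no-intercept regression of within-pair differences. All function names, parameter values and the seed are illustrative assumptions. The sketch also omits the observable controls and the adjusted AFQT score used in the paper.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def within_sibling_slope(pairs):
    """Family fixed-effects estimate for sibling pairs: no-intercept OLS of the
    within-pair wage difference on the within-pair urban difference."""
    num = sum((x1 - x2) * (y1 - y2) for (x1, y1), (x2, y2) in pairs)
    den = sum((x1 - x2) ** 2 for (x1, _), (x2, _) in pairs)
    return num / den


def simulate_pairs(n_families, true_premium, theta, seed=0):
    """Hypothetical DGP: a shared family ability a_j raises log wages by theta
    and raises each sibling's chance of living in an urban area."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_families):
        a_j = rng.gauss(0.0, 1.0)
        siblings = []
        for _ in range(2):
            urban = 1.0 if a_j + rng.gauss(0.0, 1.0) > 0 else 0.0
            log_wage = true_premium * urban + theta * a_j + rng.gauss(0.0, 0.5)
            siblings.append((urban, log_wage))
        pairs.append((siblings[0], siblings[1]))
    return pairs


# True urban premium is zero; only sorting on family ability creates a gap.
pairs = simulate_pairs(n_families=5000, true_premium=0.0, theta=0.5, seed=1)
x = [urban for pair in pairs for urban, _ in pair]
y = [wage for pair in pairs for _, wage in pair]
print("pooled OLS urban premium:  ", round(ols_slope(x, y), 3))
print("within-sibling (family FE):", round(within_sibling_slope(pairs), 3))
```

Under these assumptions the pooled estimate lies well above zero, while the within-sibling estimate stays close to the true value of zero. The difference form is used because, for pairs, demeaning by family and first-differencing give identical slopes.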
Table 5 presents descriptive statistics from a sample drawn from the 1979 to 2004 waves of the NLSY. The sample includes all respondents who work at least 15 hours per week and more than 26 weeks of the year, excluding respondents from the two oversamples collected by the NLSY.¹¹ For comparability's sake, the analysis is restricted to siblings who are within three years of age of each other. It also uses only same-gender siblings (sisters are compared to sisters and brothers to brothers) in order to avoid issues regarding differential labor force participation for brother-sister pairs.¹² Table 5 displays means from the entire cross-sectional sample of the NLSY as well as means from the sample of same-gender siblings; generally, the samples are quite similar. The first row displays the percentage of respondents who reside in urban areas. The NLSY defines an urban area as a central core or city and its adjacent, closely settled territory with a combined total population of 25,000 or more. This urban indicator is not as detailed as the population of the respondent's city, but it is the best measure of agglomeration in the publicly available files of the NLSY. It is also consistent with the measures used in studies that examine agglomeration by comparing cities with populations above and below a given threshold. The results in the first row of the Table show that approximately seventy-seven percent of the overall sample live in urban areas, and about eighty percent of siblings of both genders live in urban places. The second through eighth rows of the Table display various observable characteristics of the sample, including age, marital status and average log wage. As with the results from the first row, the characteristics of the overall NLSY sample are quite similar to those of the sample of siblings. The ninth row of the Table displays the adjusted average score from a standardized test the respondents wrote, commonly referred to as the Armed Forces Qualification Test (AFQT).
¹¹ The NLSY consists of three main subsamples: the representative cross-sectional subsample, a military oversample, and an oversample of civilian Hispanic or Latino, black, and economically disadvantaged non-black, non-Hispanic youth. In order to use the most representative data, the two oversamples were excluded from the analysis, and siblings were drawn from the representative cross-sectional subsample.
¹² Since females have a lower probability of participating in the labor market than males, excluding siblings of different genders circumvents the need to model these participation decisions.
The score on this test is constructed from scores on a series of tests known as the Armed Services Vocational Aptitude Battery (ASVAB). These tests were given to virtually all respondents in the NLSY in 1980.¹³ However, respondents varied in age and education at the time they wrote the tests, so the scores must be adjusted for these two factors before they are analyzed. As such, Table 5 presents an adjusted AFQT score: the residual from a regression of the AFQT score on a respondent's age and education at the time of writing the tests. The advantage of this variable is that it provides an approximate measure of the unobserved component of the respondent's ability; that is, it represents his or her aptitude above and beyond observable measures.¹⁴ This variable is useful in a regression context because it can help account for within-sibling differences in ability. This is explored in Table 6, which displays results from wage regressions that use a simple OLS procedure as well as a fixed-effects framework to measure the urban premium for siblings in the NLSY. The first two columns of the Table analyze a pooled sample of both brothers and sisters. The results in the first column indicate a highly significant 13 percent return to living in an urban area, even after controlling for observable characteristics. However, including a family-specific fixed effect in the analysis substantially alters the agglomeration premium: the findings in column two suggest that the premium is slightly less than two percent after accounting for a family-specific fixed effect. In addition, these results are strengthened by the fact that each sibling's adjusted AFQT score is included in the regression. Unlike the data set of twins, the siblings in the NLSY are not genetically identical, and it is plausible that a family fixed effect may not capture all of the within-sibling differences in the unobserved component of ability.
However, the within-sibling difference in the adjusted AFQT score should serve as a good proxy for any remaining portion of unobserved ability that is not captured by the familial fixed effect.
¹³ In a few cases, the ASVAB tests were written in 1981.
¹⁴ There are many other ways of normalizing the AFQT measure, such as converting the raw test score to a percentile score within each age cohort. All of the main findings from the NLSY are robust to the use of different normalization adjustments for the AFQT score.
The remaining four columns of Table 6 present results separately for the sample of brothers and the sample of sisters, since the returns to individual regressors in the wage equation may differ for men and women. Generally, though, the impact of a familial fixed effect for each gender is the same as for the pooled sample in columns one and two. Columns three and four demonstrate that the agglomeration premium for brothers is statistically significant and approximately 13 percent in an OLS regression without any family controls. Once familial fixed effects are included in the regression, however, it falls to two-and-a-half percent and is only marginally significant. Similarly, columns five and six show that the agglomeration premium for sisters is statistically significant and approximately twelve percent in a simple OLS regression. Once familial controls are included, it is only one percent and statistically insignificant. Substantively, the results in columns three through six do not alter any of the fundamental conclusions drawn from the first two columns of the Table. Even when the analysis is separated by gender, the large and significant agglomeration premium becomes small in magnitude and statistically insignificant at the five percent level once familial controls are included in the regression. A further examination of the agglomeration premium in the NLSY is presented in Table 7. It contains results from quantile regressions at the tenth, twenty-fifth, fiftieth, seventy-fifth and ninetieth percentiles for the three subsamples considered in Table 6. The first two columns present quantile regression results for the pooled sample of brothers and sisters. These results are similar to those in Table 6, and also to the quantile regression results from the data set of twins. The first column shows a significant urban premium at all five percentiles, and this premium increases in magnitude at higher percentiles.
However, the second column demonstrates that including a family-specific fixed effect makes this premium small and statistically insignificant in all but one case. When the siblings are analyzed by gender in the remaining columns of the Table, similar findings are evident. The results in columns three and five, for brothers and sisters respectively, demonstrate that the urban premium is significant at all of the percentiles and that the premium increases at higher percentiles.

In columns four and six, the results show that the urban premium becomes insignificant (and small in magnitude) in all cases. Overall, as was the case in Table 4, the findings suggest that the urban premium increases at higher percentiles. They also show that it becomes statistically insignificant throughout the wage distribution once familial fixed effects are included in the regression specification.
As previously discussed, the five-percent PUMS from the U.S. Census also allows for a within-sibling analysis. It contains information about two groups that are of particular use to the analysis: household heads and their siblings, and children of household heads who are siblings and live in the home of the household head. Ideally, it would be possible to identify siblings in different households (as was the case with the data set of twins and the NLSY); however, the sample design makes such a comparison impossible. This limitation affects the types of individuals selected from the Census for this analysis. Household heads who have a sibling living with them, or households with two working children still living at home, may not be representative of the overall population. Table 8 confirms this notion by comparing sample means from the overall Census to means from the subsamples used for the analysis. The first column shows the means for the overall Census population of male household heads, male children of household heads and male siblings of household heads. Comparing the characteristics of this group to those of male household heads who live with their male siblings (in column three) makes it clear that the latter sample is less educated, less wealthy and much younger than the former.
A similar comparison can be made for females in the Census. Column two reports sample means for all female household heads, siblings and children, and column five reports them for female household heads who also live with their sisters. Comparing the two columns shows that the same differences are evident, although to a lesser degree. In particular, there are only minor differences in hourly wages, suggesting that selection effects are smaller for the female sample. Given the results from the sample of twins and from the NLSY, the agglomeration premium would be expected to be smaller for siblings from the Census (especially male siblings), since they are drawn from lower percentiles of the wage distribution. Table 9 confirms this expectation. Columns one and three show a relatively small agglomeration premium for the PUMA of work. It is approximately half of a percent for male household heads and their siblings and approximately one-and-a-half percent for male children of household heads. Both estimates are consistent with the returns to agglomeration in the lower wage percentiles seen in the NLSY and the data set of twins. However, as in the other data sets, columns two and four demonstrate that familial fixed effects eliminate the significance of the agglomeration premium for males and significantly reduce its magnitude. An analysis of female siblings from the Census reveals similar patterns. Table 8 showed that female siblings in the Census are far more similar to the overall sample of women in the Census than male siblings are to men, especially with regard to wages. As a result, the agglomeration premium for female siblings in the Census is much closer to estimates in the broader literature. Columns five and seven reveal a premium of approximately three percent both for female household heads and their siblings and for female children living with the household head. Again, the inclusion of a familial fixed effect in columns six and eight makes the premium insignificant and substantially smaller in magnitude (virtually zero) for both subsamples. To further consider the agglomeration premium within the Census, Table 10 replicates the analysis from Table 9 with a different measure of agglomeration. Instead of the log of the population of the respondent's PUMA of work, it uses an indicator variable equal to one if the population of the respondent's PUMA of work exceeds five hundred thousand, and zero otherwise.
The results in Table 10 are highly similar to the findings in Table 9. The first and third columns show that the two types of male siblings exhibit a significant wage premium (between six and nine percent) for working in a PUMA whose population exceeds one-half million. Columns two and four, however, show that this premium becomes insignificant and small in magnitude after family fixed effects are included in the framework. Likewise, the two female samples of siblings show that this type of agglomeration premium is large and significant in the absence of family fixed effects. Columns five and seven report that these women exhibit a twelve to fourteen percent premium for working in a PUMA whose population exceeds one-half million people. However, these large premia become insignificant and small in magnitude once familial fixed effects are included in the regression specification. Overall, the NLSY and Census provide evidence that is consistent with findings from the data set of twins, and the results reinforce the notion that selection is a highly important factor when computing the agglomeration premium. One remaining issue for the analysis, though, involves some econometric complications that can affect conclusions drawn from any sibling-based framework; these issues are discussed in the following section.
5 Within-Sibling Differences in Ability and Measurement Error
The key issue raised in this study is how the wage premium for agglomeration arises from the sorting of workers inside or outside of cities. In particular, there may be a non-random sorting of more able workers into cities that creates this premium. The assumption used to identify the causal effect of agglomeration on wages is that the unobserved component of ability is captured by a familial fixed effect. Any further sorting into or out of cities after accounting for this fixed effect is then assumed to be due to factors unrelated to productivity, such as preferences for the amenities or disamenities present within cities.¹⁵ However, as with other studies that use data on siblings, it can be questioned whether a familial fixed effect actually captures all of a sibling's unobserved ability.
In particular, even after a family-based fixed effect is included in the regression specification, within-sibling differences in ability may still exist. This may be true even when within-sibling differences in test scores are included, as was the case with data from the NLSY.
¹⁵ Roback's (1982) seminal work on this subject provides a model of individual choice to reside inside or outside of cities.
Both Neumark (1999) and Bound and Solon (1999) outlined the potential biases that can affect within-twin estimates of the return to education, and the same biases can affect within-sibling estimates of any other variable in the wage equation. Let $\hat{a}_{ij}$ denote the individual-specific component of ability of sibling $i$ in family $j$. The wage equations for each sibling can then be written as:
$$y_{1j} = \beta x_{1j} + \alpha z_j + \theta a_j + \phi \hat{a}_{1j} + \varepsilon_{1j}$$
$$y_{2j} = \beta x_{2j} + \alpha z_j + \theta a_j + \phi \hat{a}_{2j} + \varepsilon_{2j}$$
In this case, the within-sibling estimates of $\beta$ derived from a regression of $\Delta y_j$ on $\Delta X_j$ are not unbiased, because a within-sibling estimator does not fully remove the effects of ability:
$$y_{1j} - y_{2j} = \beta (x_{1j} - x_{2j}) + \phi (\hat{a}_{1j} - \hat{a}_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j})$$
or, writing $\Delta$ for the within-pair difference,
$$\Delta y_j = \beta \Delta X_j + \phi \Delta \hat{a}_j + \Delta \varepsilon_j$$
The resulting estimates of $\beta$ are therefore biased by the correlation between $\Delta \hat{a}_j$ and $\Delta X_j$:
$$\hat{\beta}_{FE} = (\Delta X_j' \Delta X_j)^{-1} \Delta X_j' \Delta y_j = \beta + \phi (\Delta X_j' \Delta X_j)^{-1} \Delta X_j' \Delta \hat{a}_j + (\Delta X_j' \Delta X_j)^{-1} \Delta X_j' \Delta \varepsilon_j$$
The last term has expectation zero when the idiosyncratic errors are uncorrelated with the regressors. It has been suggested that there is a positive correlation between $\Delta \hat{a}_j$ and a series of regressors in the wage equation, such as education, marital status, tenure and city size. Thus, the vector $\Delta X_j' \Delta \hat{a}_j$ would be expected to contain exclusively positive entries. The more able sibling would also receive a higher wage than his or her counterpart, suggesting that $\phi > 0$. Together, these cause an upward bias in $\hat{\beta}_{FE}$. The sign argument is exact in the single-regressor case, where $(\Delta X_j' \Delta X_j)^{-1}$ is a positive scalar. This led Bound and Solon and Neumark to suggest that the within-sibling estimates are upper bounds on the unbiased return to education. Their reasoning was that differences in educational attainment between siblings could reflect differences in unobserved ability that were not captured by the family-specific fixed effect. However, this criticism is equally valid for any other variable analyzed in the within-sibling framework, including city size, since it could be argued that the more able sibling locates to a larger city.
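The upper-bound argument can be checked with a hypothetical simulation of the two-sibling wage equations above, using a single regressor and omitting the family-level observables. In this minimal sketch, the true within-sibling effect of a big-city location is set to zero. Each sibling's own ability component raises wages by phi > 0 and also makes that sibling more likely to live in the big city. The differenced estimate should then equal phi times the no-intercept regression of the ability difference on the location difference, plus a sampling-error term. The parameter values, the seed and the helper names are assumptions for illustration, not estimates from the paper's data.

```python
import random

PHI = 0.3  # hypothetical return to sibling-specific ability (phi > 0)


def first_difference_slope(dx, dy):
    """No-intercept OLS of within-pair differences, (dX'dX)^-1 dX'dy,
    for a single regressor."""
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def simulate_differences(n_families, beta, theta, phi, sort_strength, seed=0):
    """Hypothetical sibling-pair DGP: family ability a_j cancels in the
    difference, but each sibling's own ability a_ij raises wages (phi) and
    the chance of a big-city location (sort_strength)."""
    rng = random.Random(seed)
    dx, dy, da = [], [], []
    for _ in range(n_families):
        a_j = rng.gauss(0.0, 1.0)  # shared family ability
        sibs = []
        for _ in range(2):
            a_ij = rng.gauss(0.0, 1.0)  # sibling-specific ability
            latent = a_j + sort_strength * a_ij + rng.gauss(0.0, 1.0)
            city = 1.0 if latent > 0 else 0.0
            y = beta * city + theta * a_j + phi * a_ij + rng.gauss(0.0, 0.5)
            sibs.append((city, a_ij, y))
        (x1, a1, y1), (x2, a2, y2) = sibs
        dx.append(x1 - x2)
        da.append(a1 - a2)
        dy.append(y1 - y2)
    return dx, dy, da


dx, dy, da = simulate_differences(5000, beta=0.0, theta=0.5, phi=PHI,
                                  sort_strength=1.0, seed=7)
b_fe = first_difference_slope(dx, dy)
bias_term = PHI * first_difference_slope(dx, da)  # phi (dX'dX)^-1 dX' d(a)
print("within-sibling estimate:", round(b_fe, 3))
print("implied ability bias:   ", round(bias_term, 3))
```

Although the true effect is zero, the within-sibling estimate comes out positive and close to the implied bias term; the remaining gap is the error term. This is the sense in which the fixed-effect estimate is an upper bound when phi > 0 and the more able sibling sorts into the larger city.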

Within-sibling differences in ability may weaken conclusions about the return to education drawn from data sets of siblings, particularly the data set of identical twins. However, they have favorable implications for the evidence on the city-size wage premium presented herein. Since differences in inter-sibling ability cause an upward bias in the within-sibling fixed-effect estimator, the fixed-effect estimate is an upper bound on the true return to city size. Because the fixed-effect estimate of the city-size wage premium is insignificant, this suggests that the unbiased coefficients are also insignificant (and possibly negative). Thus, the presence of any within-sibling differences in ability would actually strengthen the conclusions drawn about the causal effects of city size on wages. One additional consideration for the analysis is the potential effect of measurement error. Many authors (Ashenfelter and Krueger (1994), Griliches (1979)) have demonstrated that measurement error attenuates coefficient estimates in a within-sibling framework. It could therefore be the case that the within-sibling or family fixed-effects estimates of the urban premium are small because of this attenuation. However, this is unlikely, because little measurement error should be present in the variables used to represent agglomeration in the analysis. In the NLSY and Census, the population of a respondent's city or PUMA of work is recorded through a relatively accurate administrative record rather than a relatively inaccurate self-report. Further, in the data set of identical twins, each twin was asked for his or her city of residence, not the population of this place. The population was then coded into the data based on each respondent's report of his or her town.
The likelihood that a respondent misreported his or her hometown is exceptionally small, so the variables used in the analysis are accurately measured in all three data sources. Given this accuracy, any attenuation bias due to measurement error should be very small. It therefore could not account for the change in the estimated agglomeration premium when familial fixed effects are included.
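The attenuation argument can be made concrete with the standard formula for classical measurement error in a single regressor, the mechanism behind the concern raised by Griliches (1979). The formula is restated here as an assumption rather than taken from this paper. Suppose the observed regressor equals the true value plus independent noise with variance var_u, and the noise is also independent across siblings. The OLS attenuation factor in levels is then var_x / (var_x + var_u). After differencing sibling pairs it is (1 - rho) var_x / ((1 - rho) var_x + var_u), where rho is the within-family correlation of the true regressor. The sketch below evaluates both factors; the input values are hypothetical.

```python
def attenuation_levels(var_x, var_u):
    """Reliability ratio for OLS in levels under classical measurement error."""
    return var_x / (var_x + var_u)


def attenuation_within(var_x, var_u, rho):
    """Reliability ratio after differencing sibling pairs: the true regressor's
    variance shrinks by (1 - rho), while the noise variance does not."""
    signal = (1.0 - rho) * var_x
    return signal / (signal + var_u)


# Hypothetical inputs: no noise versus noise equal to 10% of the true variance,
# with a within-family correlation of 0.5 in the true regressor.
for var_u in (0.0, 0.1):
    print(f"var_u={var_u}: levels={attenuation_levels(1.0, var_u):.3f}, "
          f"within={attenuation_within(1.0, var_u, rho=0.5):.3f}")
```

With these inputs the loop prints factors of 1.000 in both columns when var_u = 0, and 0.909 in levels versus 0.833 within siblings when var_u = 0.1. Differencing amplifies attenuation only to the extent that the regressor is mismeasured in the first place. Because the urban indicator is binary, any error in it would be misclassification rather than classical noise, so the formula is only a guide for that variable.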

6 Conclusion
The effect of agglomeration on wages is highly significant and large in magnitude. But the causal nature of this effect has been debated, and the literature remains divided on this subject. This paper draws on multiple samples of siblings, including data from the U.S. Census, the NLSY and a sample of identical twins. These data are analyzed with an econometric approach that uses a family fixed effect to estimate the causal return to city size. The results from all three data sources were remarkably consistent. They showed no significant causal effects on wages for any of the many variables used to represent agglomeration. These variables included the log of a city's population, an indicator variable equal to one if the city had more than 500,000 residents, an indicator variable equal to one if the city had more than 25,000 residents, and similar measures for the population of the respondent's PUMA of work. In a simple cross-sectional regression, all of these variables exhibited significant and large effects on wages. However, these effects became statistically insignificant and small in magnitude once controls for familial ability were included in the regression framework. In addition, the effect of controlling for familial ability was evident not only in regressions that estimated the average effect of agglomeration on wages, but also in quantile regressions. These results relate to the recent finding that agglomeration has greater effects for more skilled workers. Even though the agglomeration premium is higher for more able workers, controlling for familial ability causes the city-size wage premium to become insignificant for both less- and more-skilled workers. Overall, the evidence suggests that familial ability plays a significant role in the effect of city size on wages.

References
[1] Aaronson, Daniel. Using Sibling Data to Estimate the Impact of Neighborhoods on Children's Educational Outcomes. Journal of Human Resources, Autumn 1998, 33(4), pp. 915-946.
[2] Ashenfelter, Orley and Alan Krueger. Estimates of the Economic Return to Schooling from a New Sample of Twins. American Economic Review, December 1994, 84(5), pp. 1157-1173.
[3] Ashenfelter, Orley and Cecilia Rouse. Income, Schooling and Ability: Evidence from a New Sample of Twins. Quarterly Journal of Economics, February 1998, 113(1), pp. 253-284.
[4] Bacolod, Marigee, Bernardo S. Blum and William C. Strange. Skills in the City. Mimeo, University of Toronto, 2007.
[5] Bound, John and Gary Solon. Double Trouble: On the Value of Twins-Based Estimation of the Return to Schooling. Economics of Education Review, April 1999, 18(2), pp. 169-182.
[6] Chamberlain, Gary. Multivariate Regression Models for Panel Data. Journal of Econometrics, January 1982, 18(1), pp. 5-46.
[7] Combes, Pierre-Philippe, Gilles Duranton and Laurent Gobillon. Spatial Wage Disparities: Sorting Matters! Journal of Urban Economics, forthcoming.
[8] Glaeser, Edward L. and David C. Maré. Cities and Skills. Journal of Labor Economics, April 2001, 19(2), pp. 316-342.
[9] Griliches, Zvi. Sibling Models and Data in Economics: Beginnings of a Survey. Journal of Political Economy, October 1979, 87(5), Part 2, pp. S37-S64.
[10] Krashinsky, Harry A. Do Marital Status and Computer Use Really Change the Wage Structure? Journal of Human Resources, Summer 2004, pp. 774-791.
[11] Neumark, David. Biases in Twin Estimates of the Return to Schooling. Economics of Education Review, April 1999, 18(2), pp. 143-148.
[12] Roback, Jennifer. Wages, Rents and the Quality of Life. Journal of Political Economy, December 1982, 90(6), pp. 1257-1278.
[13] Rosenthal, Stuart S. and William C. Strange. The Attenuation of Human Capital Spillovers. Working paper, University of Toronto, 2006.
[14] Tabuchi, Takatoshi and Atsushi Yoshida. Separating Urban Agglomeration Economies in Consumption and Production. Journal of Urban Economics, July 2000, 48, pp. 70-84.
[15] Wheeler, Christopher H. Search, Sorting, and Urban Agglomeration. Journal of Labor Economics, October 2001, 19(4), pp. 879-899.
[16] Wheeler, Christopher H. Cities and the Growth of Wages Among Young Workers: Evidence from the NLSY. Journal of Urban Economics, September 2006, 60, pp. 162-184.
[17] Yankow, Jeffrey J. Why Do Cities Pay More? An Empirical Examination of Some Competing Theories of the Urban Wage Premium. Journal of Urban Economics, September 2006, 60, pp. 139-161.