
The Reliability of Subjective Well-Being Measures

by

Alan B. Krueger, Princeton University
David A. Schkade, University of California, San Diego

CEPS Working Paper No. 138
January 2007

The authors thank our colleagues Daniel Kahneman, Norbert Schwarz, and Arthur Stone for helpful comments and the Hewlett Foundation, the National Institute on Aging, and Princeton University's Woodrow Wilson School and Center for Economic Policy Studies for financial support.

Reliability of SWB Measures

Introduction

Economists are increasingly analyzing data on subjective well-being. Since 2000, 157 papers and numerous books have been published in the economics literature using data on life satisfaction or subjective well-being, according to a search of Econ Lit.¹ Here we analyze the test-retest reliability of two measures of subjective well-being: a standard life satisfaction question and affective experience measures derived from the Day Reconstruction Method (DRM).

Although economists have longstanding reservations about the feasibility of interpersonal comparisons of utility that we can only partially address here, another question concerns the reliability of such measurements for the same set of individuals over time. Overall life satisfaction should not change very much from week to week. Likewise, individuals who have similar routines from week to week should experience similar feelings over time. How persistent are individuals' responses to subjective well-being questions? To anticipate our main findings, both measures of subjective well-being (life satisfaction and affective experience) display a serial correlation of about 0.60 when assessed two weeks apart, which is lower than the reliability ratios typically found for education, income and many other common microeconomic variables (Bound, Brown, and Mathiowetz, 2001 and Angrist and Krueger, 1999), but high enough to support much of the research that has been undertaken on subjective well-being.

The life satisfaction question that we examine is virtually identical to that used in the World Values Survey, and similar to that used in many other surveys. The DRM is a recent development in the measurement of the affective experience of daily life. The gold standard for such measurements is the Experience Sampling Method (ESM), also called Ecological Momentary Assessment (EMA), in which participants are prompted at irregular intervals to record their current circumstances and feelings (Csikszentmihalyi & Larsen, 1987; Stone, Shiffman & DeVries, 1999). This method of measuring affect minimizes the role of memory and interpretation, but it is expensive and difficult to implement in large samples. Consequently, we use the Day Reconstruction Method (DRM), in which participants are required to think about the preceding day, break it up into episodes, and describe each episode by selecting from several menus (Kahneman, Krueger, Schkade, Schwarz, & Stone, 2004). The DRM involves memory, but it is designed to increase the accuracy of emotional recall by inducing retrieval of the specifics of successive episodes (Robinson & Clore, 2002; Belli, 1998). Evidence that the two methods can be expected to yield similar results was presented earlier for subpopulation averages (Kahneman, et al., 2004). A critical advantage of the DRM is that it provides data on time use, a valuable source of information in its own right, which has rarely been combined with the study of subjective well-being.

In this paper we examine reliability measures for a sample of 229 women who each filled out a DRM questionnaire for two Wednesdays, two weeks apart in 2005. We compare these reliabilities to those of global well-being measures more typical in the literature, and we decompose the reliability of duration-weighted net affect into a component due to the similarity of activities across days and other factors. We also use these reliability estimates to correct observed relationships between reported well-being and other variables (e.g., income) for attenuation. We conclude with a discussion of the implications of measurement error for DRM studies and for well-being research more generally.

¹ Prominent examples are Layard (2005), Blanchflower and Oswald (2004), and Frey and Stutzer (2002).

What is reliability and why should we care?

Consider an observed variable, y, which is a noisy measure of the variable of interest, y*. We can write

$$y_i = y_i^* + e_i$$

where $y_i$ is the observed value for individual i, $y_i^*$ is the correct value, and $e_i$ is the error term. Under the classical measurement error assumptions, $e_i$ is a white noise disturbance that is uncorrelated with $y_i^*$ and homoskedastic.² Classical measurement error will lead correlations between y and other variables to be attenuated toward 0 in large samples. If we can measure y at two points in time, and if the measurement errors are independent and have a constant variance over time, then the correlation between the two measures provides an estimate of the ratio of the variance in the signal to the total variance in y. We thus define the reliability ratio, r, as

$$r = \operatorname{corr}(y_i^1, y_i^2),$$

where the superscripts indicate the measurement taken in periods 1 and 2. Under the assumptions stated,

$$\operatorname{plim}\, r = \frac{\operatorname{var}(y^*)}{\operatorname{var}(y^*) + \operatorname{var}(e)}.$$

In addition to summarizing the extent of random noise in subjective well-being reports, the signal-to-total variance ratio is of interest because, in the limit, it equals the proportional bias that arises when SWB is an explanatory variable in a bivariate regression. Furthermore, as we explain below, correlations between SWB and other variables are attenuated by random measurement error in SWB. An important application of SWB data involves estimating the correlation between life satisfaction, affect and other variables such as income (e.g., Argyle, 1999). We can use the reliability ratio to correct those correlations for attenuation.

Of course, if the measurement error is not classical, the test-retest correlation can under- or over-state the signal-to-total variance ratio, depending on the nature of the deviation from classical measurement error. With only two reports of y, and without knowledge of y*, it is not possible to assess the plausibility of the classical measurement error assumptions. If the errors in measurement are positively correlated over time, then the test-retest correlation will overstate the reliability of the data. Nevertheless, the test-retest correlation is a convenient starting point for assessing the reliability of subjective well-being data.

² If y is of limited range (e.g., a binary variable) then e will necessarily be correlated with y*. We ignore this issue for the time being.

Related literature

There is a vast empirical literature on subjective well-being (Kahneman, Diener and Schwarz, 1999). Subjective well-being is most commonly measured by asking people a single question, such as, "All things considered, how satisfied are you with your life as a whole these days?" or "Taken all together, would you say that you are very happy, pretty happy, or not too happy?" Such questions elicit a global evaluation of one's life. Surveys in many countries conducted over decades indicate that, on average, reported global judgments of life satisfaction or happiness have not changed much over the last four decades, in spite of large increases in real income per capita. Although reported life satisfaction and household income are positively correlated in a cross section of people at a given time, increases in income have been found to have mainly a transitory effect on individuals' reported life satisfaction (Easterlin, 1995). Moreover, the correlation between income and subjective well-being is notably weaker when a measure of experienced happiness is used instead of life satisfaction (Kahneman et al., 2006). Of course, such low correlations could be partially due to attenuation, if measurement error is high.

There is a small literature assessing the reliability of individual-level single-item well-being measures, even less on the reliability of ESM, and none as of yet on the DRM (see Table 1). Single-item measures of SWB have been found to have relatively low reliabilities, usually between .40 and .66, even when asked twice in the same session one hour apart (Andrews and Withey, 1976). Kammann and Flett (1983) found that single-item well-being questions under the instructions to consider "the past few weeks" or "these days" had reliabilities of .50 to .55 when asked within the same day. Interestingly, the only study we are aware of that looked at the reliability of an ESM measure of duration-weighted happiness found a correlation on the upper end of the range found for single-item global well-being measures (Steptoe, Wardle and Marmot, 2005). Overall, there has been surprisingly little attention paid to reliability, despite the wide use of these measures.

The Satisfaction with Life Scale (SWLS, Diener et al., 1985) is another commonly used global satisfaction measure. In contrast to the single question measures it consists of the average of five related items, each of which is rated on a 7-point scale from Strongly Disagree (1) to Strongly Agree (7). The items are: "In most ways my life is close to my ideal"; "The conditions of my life are excellent"; "I am satisfied with my life"; "So far I have gotten the important things I want in life"; and "If I could live my life over, I would change almost nothing." A key reason that SWLS has proven more reliable than single-item questions (see Table 1) is that, since it is the sum of multiple items, it benefits from error reduction through aggregation. Eid and Diener (2004) used a structural model to estimate reliability for a sample of 249 students, measured three times with four weeks between successive measurements. After statistically separating out the influence of situation-specific factors, they estimated that the imputed stability for life satisfaction was very high, around 0.90.
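The link between the test-retest correlation and the signal-to-total variance ratio can be checked with a small simulation. The sketch below is illustrative only: the variances (0.6 for the signal, 0.4 for the error), the sample size, the seed, and the function names are all assumptions chosen for the example, not values taken from the study. It draws a fixed true score per person, adds independent homoskedastic noise in each of two periods, and compares the resulting correlation with var(y*) / (var(y*) + var(e)).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_test_retest(n, var_signal, var_error, seed=0):
    """Correlation of two noisy reports of the same true score y*."""
    rng = random.Random(seed)
    sd_s, sd_e = math.sqrt(var_signal), math.sqrt(var_error)
    y_star = [rng.gauss(0.0, sd_s) for _ in range(n)]
    # Classical error: independent across periods, same variance each time.
    y1 = [s + rng.gauss(0.0, sd_e) for s in y_star]
    y2 = [s + rng.gauss(0.0, sd_e) for s in y_star]
    return pearson(y1, y2)


# Hypothetical variances, chosen so that the theoretical ratio is 0.60.
var_signal, var_error = 0.6, 0.4
theoretical = var_signal / (var_signal + var_error)
estimate = simulate_test_retest(20000, var_signal, var_error)
print(f"theoretical ratio = {theoretical:.2f}, simulated r = {estimate:.3f}")
```

If the two error draws were instead given a positive correlation, the simulated r would exceed the theoretical ratio, which is the overstatement the text warns about.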

Table 1. Estimates of Reliability for Well-Being Measures

Study                               Test-retest    Temporal           Variable
                                    correlation    interval
Single-Item Measures
  Andrews & Withey (1976)           .40-.66        1 hour             life satisfaction
  Kammann and Flett (1983)          .50-.55        same day           overall happiness, satisfaction
Multiple-Item Measures*
  Alfonso & Allison (1992a)         .83            2 weeks            SWLS
  Pavot et al. (1991)               .84            1 month            SWLS
  Blais et al. (1989)               .64            2 months           SWLS
  Diener et al. (1985)              .82            2 months           SWLS
  Yardley & Rice (1991)             .50            10 weeks           SWLS
  Magnus et al. (1992)              .54            4 years            SWLS
ESM
  Steptoe, Wardle & Marmot (2005)   .65            weekend-weekday    experienced happiness

*Note: From Pavot and Diener (1993), Table 2.

One reason for the modest reliability of subjective well-being measures compared with education and income, which typically have reliability ratios of around 0.90, could be the susceptibility of SWB questions to transient mood effects. For example, researchers have documented mood changes due to such subtle events as finding a dime before filling out a questionnaire, the current weather, or question order, which in turn influence reported life satisfaction (e.g., Schwarz, 1987). Eid and Diener (2004) used a structural model to attempt to separate situational variability from random error and basic stability; they found that anywhere from 4% to 5% of the variance in various affect and satisfaction measures was accounted for by situation-specific factors. In an earlier study, Ferring et al. (1996) estimated the size of this influence as between 1% and 34% of the total variance. Since the experienced affect measure produced by the DRM is focused on reconstructing a specific event and the affect experienced during it, there is at least the possibility that such measures will be less vulnerable to current mood at the time of the interview.

We might expect DRM measures to be less reliable over time than life satisfaction because a person's activities change from day to day. At the same time, DRM measures are averages of multiple responses, while global life satisfaction or happiness is often assessed with just one question. If ESM is any guide, the DRM may be at least as reliable as reported overall life satisfaction.

Method

We evaluate the test-retest reliability of the DRM by having the same respondents complete a DRM questionnaire two weeks apart regarding the same day of the week (Wednesday). The questionnaire, which is available from the authors on request, also contained standard global life satisfaction measures. The resulting data provide information about the relative stability of the DRM compared to the types of global life satisfaction questions used in most well-being research for the same sample.

For comparability with our previous studies, the respondents (n = 229) were selected by random selection of women from the driver's license list in Travis County, Texas and screening for employment and age between 18 and 60. Respondents were paid $50 upon completing the first questionnaire and an additional $100 upon completing the second one, for a total of $150. The interview dates were two Thursdays, March 31, 2005 and April 14, 2005. Following the DRM procedure, participants reported on the previous day. Completion times for the self-administered instrument ranged from 45 to 75 min. The ethnic composition of the sample was 67% white (non-Hispanic), 7% African American, 21% Hispanic, and 5% other. Average age was 42.8 years. Median household income category was $40,000-$50,000.

The DRM protocol described by Kahneman et al. (2004) was followed. Groups of participants were invited to a central location for a session on Thursday evening, where they answered a series of questions contained in four packets. The first packet included general satisfaction and demographic questions. Next, the respondents were asked to construct a diary of the previous day (Wednesday) as a series of episodes, noting the content and the beginning and end time of each. In the third packet, they were asked for a detailed description of every episode as explained below. The average number of episodes a respondent described for the day was somewhat higher in the second session (14.8 vs 13.2, p < .001, by a paired t-test) although the total time covered by the episodes was no different (16.8 vs 16.7 hrs, p > .20, by a paired t-test). These figures compare to the 14.1 episodes and 15.4 hours reported in Kahneman et al. (2004).

The first few questions in the survey were global SWB questions. First was the overall life satisfaction question, "Taking all things together, how satisfied are you with your life as a whole these days? Are you very satisfied, satisfied, not very satisfied, not at all satisfied?" Next, similar questions were asked for "your life at home" and "your present job." Two global mood questions followed, for home and for work. The question posed was "When you are at home, what percentage of the time are you in a bad mood ___%, a little low or irritable ___%, in a mildly pleasant mood ___%, in a very good mood ___%." The last two response categories were added together to obtain the percentage of time in a good mood. Net mood was computed by subtracting the sum of the first two response categories from the sum of the last two. The same procedure was applied to the work mood question.
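The mood scoring just described is simple arithmetic, and a short sketch may make it concrete. The function name and the example percentages below are hypothetical; only the two scoring rules (good mood is the sum of the last two categories, net mood is that sum minus the sum of the first two) come from the text.

```python
def mood_summary(bad, low, mildly_pleasant, very_good):
    """Return (percent of time in a good mood, net mood) from the four answers."""
    good = mildly_pleasant + very_good   # last two response categories
    net = good - (bad + low)             # minus the first two categories
    return good, net


# Hypothetical respondent: 5% bad, 15% low or irritable, 50% mildly pleasant, 30% very good.
good_home, net_home = mood_summary(bad=5, low=15, mildly_pleasant=50, very_good=30)
print(good_home, net_home)  # 80 60
```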

The affect measures derived from the DRM are combinations of the duration-weighted affective adjectives that respondents rated for each episode. Net affect was computed by subtracting the average of negative affect (NA: tense/stressed, depressed/blue and angry/hostile) from the average of positive affect (PA: happy, affectionate/friendly and calm/relaxed).³ Difmax is the duration-weighted average of happy less the maximum of tense/stressed, depressed/blue, angry/hostile. The U-index is closely related to Difmax: for each episode it equals one when Difmax < 0 and zero otherwise. Intuitively, the duration-weighted U-index indicates the proportion of time that an individual spends in a state in which the strongest emotion is a negative one. Difmax and the U-index are recently proposed summary measures of affective experience (Kahneman & Krueger, 2006; Kahneman, Schkade, Krueger, Fischler & Krilla, 2006).

³ Frustrated was excluded from negative affect for comparability with our other studies.

Results

Table 2 presents the correlations between various measures for the same person in the first and second sessions, as well as 95% confidence intervals. We focus first on overall measures of affective experience. Perhaps the most surprising finding is that the reliabilities of Net Affect (r = .64) and Difmax (r = .60) are at least as high as that for life satisfaction (r = .59). Satisfaction with domains of life (work and home) are both more reliable than satisfaction with life overall. The corresponding home and work mood measures are also more reliable than life satisfaction. Another notable feature of the results is that positive affect appears to be somewhat more reliable than negative affect.
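The three experience measures can be sketched in a few lines. This is a minimal illustration of the definitions in the text, not the authors' code: the episode structure, the key names, and the ratings are hypothetical, and the sketch assumes that the weights are each episode's share of total reported hours.

```python
POSITIVE = ("happy", "affectionate_friendly", "calm_relaxed")
NEGATIVE = ("tense_stressed", "depressed_blue", "angry_hostile")


def affect_summaries(episodes):
    """Duration-weighted net affect, Difmax and U-index for one person-day.

    episodes: list of dicts holding 'hours' plus one rating per adjective.
    """
    total_hours = sum(ep["hours"] for ep in episodes)
    net_affect = difmax = u_index = 0.0
    for ep in episodes:
        weight = ep["hours"] / total_hours
        pa = sum(ep[a] for a in POSITIVE) / len(POSITIVE)
        na = sum(ep[a] for a in NEGATIVE) / len(NEGATIVE)
        # Happy minus the strongest of the three negative feelings.
        gap = ep["happy"] - max(ep[a] for a in NEGATIVE)
        net_affect += weight * (pa - na)
        difmax += weight * gap
        u_index += weight * (1.0 if gap < 0 else 0.0)
    return net_affect, difmax, u_index


# Hypothetical day with three episodes (ratings are made up for illustration).
day = [
    {"hours": 2.0, "happy": 4, "affectionate_friendly": 3, "calm_relaxed": 5,
     "tense_stressed": 1, "depressed_blue": 0, "angry_hostile": 0},
    {"hours": 1.0, "happy": 1, "affectionate_friendly": 1, "calm_relaxed": 1,
     "tense_stressed": 4, "depressed_blue": 2, "angry_hostile": 3},
    {"hours": 1.0, "happy": 3, "affectionate_friendly": 3, "calm_relaxed": 3,
     "tense_stressed": 3, "depressed_blue": 0, "angry_hostile": 0},
]
net_affect, difmax, u_index = affect_summaries(day)
print(net_affect, difmax, u_index)
```

In this toy day only the second episode has a negative feeling stronger than happiness, so the U-index equals that episode's share of the day.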

Table 2. Correlations Between Selected Measures at Period 1 and Period 2

                                          95% confidence interval
Measure                       Observed    Lower      Upper
Global Measures
  Life satisfaction           .59         .49        .67
  Home satisfaction           .74         .68        .80
  Work satisfaction           .68         .61        .75
  Home net mood               .70         .63        .76
  Work net mood               .68         .61        .75
Experience Measures
  Net affect                  .64         .56        .71
  Difmax                      .60         .51        .68
  U-index                     .50         .40        .59
Positive Affect
  happy                       .62         .54        .70
  affectionate/friendly       .68         .61        .75
  calm/relaxed                .56         .46        .64
  PA                          .68         .61        .75
Negative Affect
  tense/stressed              .54         .44        .62
  depressed/blue              .60         .51        .68
  angry/hostile               .54         .44        .63
  frustrated                  .48         .37        .57
  NA                          .60         .51        .68
Other affect adjectives
  impatient for it to end     .56         .47        .65
  competent/doing well        .64         .55        .71
  interested/focused          .57         .47        .65
  tired                       .65         .56        .72
Demographics
  Household income            .96         .95        .97
  Education (yrs)             .98         .98        .99
  Age                         1.00        1.00       1.00

Note: Confidence intervals for the correlations are not symmetric because they are based on the nonlinear Fisher's z transformation (z = .5[ln(1+r) - ln(1-r)]), which is normally distributed and used for significance testing. Sample sizes are 228 or 229, except for age, which is 223 due to missing data.
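The asymmetric intervals in Table 2 follow from the Fisher z transformation given in the note. The sketch below shows one common way to compute such an interval. The standard error 1/sqrt(n - 3) is the usual textbook choice and is an assumption here, since the note does not state which standard error was used; the function name is likewise hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via Fisher's z transformation."""
    z = math.atanh(r)                  # identical to .5 * [ln(1 + r) - ln(1 - r)]
    se = 1.0 / math.sqrt(n - 3)        # assumed standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then map back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Life satisfaction row of Table 2: r = .59 with n = 229.
lower, upper = fisher_ci(0.59, 229)
print(f"{lower:.3f} to {upper:.3f}")
```

For the life satisfaction row this gives an interval close to the reported .49 to .67, and the lower arm is longer than the upper arm, which is the asymmetry the note describes.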

The extent to which a person's rating of a particular adjective over different episodes of the day represents personal traits or is influenced by the variability in situations is likely related to the reliability of that adjective. If a given person tends to feel the same way most of the time (a happy person or a depressed person) regardless of the situation, then this adjective might be expected to have greater reliability across the two sessions, since the activities the person engages in on the two days vary. To crudely gauge the extent to which particular adjectives are person-bound or situation-bound, for each adjective we pooled the two sessions and computed the variance of the duration-weighted personal averages across people and the average variance within each person's days across episodes, and then took the ratio of the between-people to within-person variances. A high ratio would indicate that an adjective is relatively constant for a person (more of an individual difference like a trait) and a low ratio would indicate that an adjective is determined more by the situation than by who the person is. Results are shown in Table 3. Quite plausibly, feeling depressed appears to be a more trait-like descriptor, while feeling tense/stressed or impatient for an episode to end are highly situational. Interestingly, we found a correlation of 0.41 between the variance ratio and the reliability ratios shown in Table 3, which indicates moderate support for the hypothesis of greater reliability for trait-like emotions.⁴

⁴ We also computed these ratios for the DRM sample in Kahneman et al. (2004). The two samples produced very similar sets of ratios for the 8 adjectives in common between the two samples; the correlation of the ratios was .89.
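A rough sketch of the between-to-within variance ratio, and of the correlation between that ratio and reliability, is given below. Several details are assumptions made for illustration: the population (divisor n) variance, the unweighted within-person variance across episodes, the toy two-person data set, and the function names. The two lists at the end are the Ratio and Reliability columns as read from Table 3.

```python
import math


def variance(values):
    """Population variance (divisor n); the choice of divisor is an assumption."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def between_within_ratio(people):
    """people: one list of (duration, rating) episodes per person."""
    person_means, within = [], []
    for episodes in people:
        total = sum(d for d, _ in episodes)
        # Duration-weighted personal average of the adjective.
        person_means.append(sum(d * r for d, r in episodes) / total)
        # Variance of the ratings across that person's episodes.
        within.append(variance([r for _, r in episodes]))
    return variance(person_means) / (sum(within) / len(within))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data: two people who differ a lot from each other but little within a day.
toy = [[(1.0, 4.0), (1.0, 6.0)], [(1.0, 0.0), (1.0, 2.0)]]
toy_ratio = between_within_ratio(toy)

# Ratio and Reliability columns as read from Table 3, in table order.
table3_ratio = [1.32, 1.01, 0.90, 0.86, 0.84, 0.84, 0.73, 0.71, 0.70, 0.68, 0.53]
table3_reliability = [0.60, 0.65, 0.54, 0.64, 0.68, 0.62, 0.56, 0.48, 0.57, 0.54, 0.56]
r_table3 = pearson(table3_ratio, table3_reliability)
print(toy_ratio, round(r_table3, 2))
```

The correlation computed from the tabulated columns comes out at about 0.41, matching the figure quoted in the text.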

Table 3. Are Trait-like Feelings More Reliable?

Adjective                   Mean Within-       Across             Ratio            Reliability
                            Individual         Individuals        (σ²_a / σ²_w)
                            (σ²_w)             (σ²_a)
depressed/blue              .70                .92                1.32             .60
tired                       1.38               1.39               1.01             .65
angry/hostile               .81                .73                .90              .54
competent/doing well        1.02               .88                .86              .64
affectionate/friendly       1.31               1.10               .84              .68
happy                       1.18               .99                .84              .62
calm/relaxed                1.31               .95                .73              .56
frustrated                  1.43               1.02               .71              .48
interested/focused          1.19               .83                .70              .57
tense/stressed              1.50               1.01               .68              .54
impatient for it to end     1.83               .96                .53              .56

Affective Similarity of Time Allocation

We next examine the affective similarity of how individuals spent their time on the survey reference dates. We can decompose the reliability ratio into a component that reflects the hedonic similarity of activities in the survey reference days and all other factors. Let $A_{ij}^1$ denote Net Affect for person i during her activity in episode j in week 1, and $A_{ij}^2$ net affect for person i during the activity in episode j in week 2. Using $h_{ij}^t$ to denote the fraction of the day in week t devoted to episode j, we write average net affect over the course of the day in week 1 and week 2 as $y_i^1$ and $y_i^2$, respectively, defined as

$$y_i^1 = \sum_j h_{ij}^1 A_{ij}^1 \quad \text{and} \quad y_i^2 = \sum_j h_{ij}^2 A_{ij}^2.$$

The reliability of average net affect in successive interviews, which we have emphasized so far, is measured by

$$\rho = \frac{\operatorname{cov}(y^1, y^2)}{\sigma_{y^1} \sigma_{y^2}}.$$

The reliability ratio reflects the accuracy of reporting of the data and the persistence of average net affect over time. The affective experience data could be accurately reported, but if people engage in activities that yield very different affective experiences from week to week, the correlation will nonetheless be low.

To ascertain the proportion of the reliability ratio that results from engaging in activities that yield similar affective experiences over time, we define

$$\hat{y}_i^1 = \sum_j h_{ij}^1 \bar{A}_j,$$

where $\bar{A}_j$ is the average affect taken over all people while they are engaged in activity j. Analogously, we define $\hat{y}_i^2 = \sum_j h_{ij}^2 \bar{A}_j$ for the follow-up interview. Notice that $\hat{y}^1$ and $\hat{y}^2$ are predicted average net affect based entirely on an individual's time allocation and the sample's overall rating of activity j. An individual's affective rating does not enter in these predictions (except through the sample mean).

A straightforward measure of the similarity of activities on the reference days is the correlation between $\hat{y}^1$ and $\hat{y}^2$, which we denote as $r'$. The share of a single day's signal in average net affect that is attributable purely to the affective similarity of the activities engaged in two weeks apart is given by:

$$\kappa = \frac{\operatorname{cov}(\hat{y}^1, \hat{y}^2)}{\operatorname{cov}(y^1, y^2)}.$$

We can also define the fraction of the observed variance in average net affect due to the similarity of activities as:

$$\gamma = \frac{\operatorname{cov}(\hat{y}^1, \hat{y}^2)}{\sigma_{y^1} \sigma_{y^2}}.$$

We measure $\bar{A}_j$ in two ways. First, we simply assign the average net affect associated with activity j. Second, we assign the conditional average based on a linear regression of net affect on 22 activity dummies and 9 interaction partner dummies. Table 4 presents these decompositions for Net Affect.

Table 4: The affective similarity of time use a fortnight apart

Prediction of Ā_j                              r'       κ        γ
Activity (22)                                  .267     .009     .006
Activity (22) and Interaction Partner (9)      .322     .014     .009

Column definitions: $r' = \operatorname{cov}(\hat{y}^1, \hat{y}^2) / (\sigma_{\hat{y}^1} \sigma_{\hat{y}^2})$; $\kappa = \operatorname{cov}(\hat{y}^1, \hat{y}^2) / \operatorname{cov}(y^1, y^2)$; $\gamma = \operatorname{cov}(\hat{y}^1, \hat{y}^2) / (\sigma_{y^1} \sigma_{y^2})$.

The results indicate that individuals would have a correlation of around 0.30 in their net affect on the reference dates if they used the sample-wide average net affect to rate their activities. Because activities and interaction partners only account for around 10 percent of the variation in net affect at the episode level, however, the variance of $\hat{y}^1$ and of $\hat{y}^2$ is considerably lower than the variance of $y^1$ and of $y^2$. Consequently, the share of the covariance or variance of reported net affect that is accounted for by time use is quite small, on the order of around 1 percent. When we look at specific affects we reach a similar conclusion. For example, time use accounts for only 2 percent of the estimated signal in tense/stressed. Thus, the relatively high reliability of the DRM data across two weeks comes about mainly because of individual differences in affect, irrespective of the situations that people find themselves in on the reference days.
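The decomposition can be traced on a small made-up data set. Everything in the sketch below is hypothetical: the four people, the three activities, the time shares and the net affect ratings. It also assumes that the activity averages are simple unweighted means pooled over both weeks, which corresponds to the first of the two ways of measuring the activity average (no regression adjustment). The point is only to show how ρ, r′, κ and γ are formed, and that γ equals κ times ρ by construction.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def sd(xs):
    return math.sqrt(cov(xs, xs))


# One list per person; each episode is (activity, share of day, net affect).
week1 = [
    [("work", 0.5, 2.0), ("commute", 0.1, 0.5), ("social", 0.4, 4.0)],
    [("work", 0.7, 1.0), ("commute", 0.2, -1.0), ("social", 0.1, 3.0)],
    [("work", 0.4, 3.0), ("commute", 0.1, 1.0), ("social", 0.5, 5.0)],
    [("work", 0.6, 0.0), ("commute", 0.3, -2.0), ("social", 0.1, 2.0)],
]
week2 = [
    [("work", 0.6, 2.5), ("commute", 0.1, 1.0), ("social", 0.3, 4.5)],
    [("work", 0.6, 0.5), ("commute", 0.2, -0.5), ("social", 0.2, 2.5)],
    [("work", 0.5, 3.5), ("commute", 0.2, 1.5), ("social", 0.3, 4.0)],
    [("work", 0.8, 1.0), ("commute", 0.1, -1.0), ("social", 0.1, 1.5)],
]


def activity_means(*weeks):
    """Sample-wide average net affect per activity (pooled, unweighted)."""
    ratings = {}
    for week in weeks:
        for episodes in week:
            for activity, _, affect in episodes:
                ratings.setdefault(activity, []).append(affect)
    return {activity: mean(vals) for activity, vals in ratings.items()}


a_bar = activity_means(week1, week2)

# Reported daily averages use each person's own ratings ...
y1 = [sum(h * a for _, h, a in person) for person in week1]
y2 = [sum(h * a for _, h, a in person) for person in week2]
# ... predicted averages use only time shares and the sample-wide ratings.
yhat1 = [sum(h * a_bar[act] for act, h, _ in person) for person in week1]
yhat2 = [sum(h * a_bar[act] for act, h, _ in person) for person in week2]

rho = cov(y1, y2) / (sd(y1) * sd(y2))
r_prime = cov(yhat1, yhat2) / (sd(yhat1) * sd(yhat2))
kappa = cov(yhat1, yhat2) / cov(y1, y2)
gamma = cov(yhat1, yhat2) / (sd(y1) * sd(y2))
print(rho, r_prime, kappa, gamma)
```

The identity γ = κρ is consistent with Table 4 given the net affect reliability of .64: .009 × .64 is about .006 and .014 × .64 is about .009.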

Adjusting Correlations for Attenuation

One consequence of less than complete reliability is that observed correlations between two measured variables x and y are attenuated in proportion to the degree of error. Assuming classical measurement error, the asymptotic equation relating the observed correlation to the true correlation is (Nunnally, 1978):

$$r_{xy} = \rho_{xy} \sqrt{r_{xx}\, r_{yy}}$$

where
$r_{xy}$ = observed correlation between x and y
$\rho_{xy}$ = true correlation between x and y
$r_{xx}$ = reliability of x
$r_{yy}$ = reliability of y

The correction for attenuation uses this relation to produce an asymptotically unbiased estimate of the true correlation. Rearranging to solve for $\rho_{xy}$, we have:

$$\hat{\rho}_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}$$

Since for nondegenerate distributions the denominator is in the interval (0,1), corrected correlations are larger in absolute value than the observed correlations ($|\hat{\rho}_{xy}| > |r_{xy}|$). Attenuation corrections are somewhat controversial because they are only asymptotically unbiased and because the degree of attenuation may vary across data sets. Thus, if the assumptions of classical measurement error are satisfied, the magnitude of the adjusted correlation is best taken as a rule-of-thumb estimate for the true correlation.
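As a worked illustration, the correction can be applied to figures reported in the paper: the observed correlation of .31 between net affect and life satisfaction, and their reliabilities of .64 and .59. The sketch uses the square-root form of the correction, which reproduces the corrected values shown in Table 5; the function name is hypothetical.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Disattenuated correlation: observed r divided by sqrt(r_xx * r_yy)."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Net affect and life satisfaction: r = .31, reliabilities .64 and .59.
rho_hat = correct_for_attenuation(0.31, 0.64, 0.59)
print(round(rho_hat, 2))  # 0.5
```

Because household income is measured with a reliability of .96, almost all of the adjustment in the income rows comes from the well-being measure.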

Using the asymptotic formula as an approximation, we can examine the effect that measurement error might have on observed relationships. Table 5 shows some examples. Perhaps most importantly, the first row shows that the correlation between life satisfaction and net affect rises to .50 when we adjust for measurement error in both variables. Although this is a substantial increase, the resulting correlation nonetheless indicates that daily net affect and life satisfaction are distinct descriptors of individuals' lives.

Table 5. Examples of Correction for Attenuation

x                   y                    r_xy    ρ̂_xy    r_xx    r_yy
Net affect          Life Satisfaction     .31     .50     .64     .59
Difmax              Life Satisfaction     .37     .62     .60     .59
U-index             Life Satisfaction    -.26    -.48     .50     .59
Household Income    Life Satisfaction     .21     .28     .96     .59
Household Income    Net affect            .12     .15     .96     .64
Household Income    Difmax                .10     .13     .96     .60
Household Income    U-index              -.06    -.09     .96     .50

Testing for Heteroskedastic Errors

Although with only two temporal observations we cannot directly test the assumptions of classical measurement error, we can investigate whether discrepancies in Net Affect over time are homoskedastic errors. Specifically, we regress net affect at period 2 on the same measure at period 1 and test for homoskedastic errors. With classical measurement error, the error $e_i$ is assumed to have the same distribution for all individuals i. To examine this property we use the method of Koenker and Bassett (1982), which employs quantile regressions. Figure 1 shows a scatter diagram of net affect at periods 1 and 2, with 20th and 80th quantile regression lines. There is only a marginally significant difference between the 20th and 80th quantile regression lines (t = -1.70, p = .09), which indicates that there is possibly some evidence of heteroskedasticity. However, adjacent comparisons yield mixed results: if we instead use the 40th and 60th or 30th and 70th quantile pairings the test is not significant, but with the 25th and 75th or the 10th and 90th pairings the test is significant (p < .05).

Using a different test for homoskedastic errors due to White (1980), we regress net affect at period 2 on period 1, and then regress the resulting squared residuals on period 1 net affect; $nR^2$ from this second regression is distributed as $\chi^2$. The resulting $R^2$ is .004 and $\chi^2(1) = .916$, n.s., from which we cannot reject the hypothesis of homoskedastic errors. It is possible that the assumption of homoskedastic measurement error could be violated, but the deviation is probably slight.
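The two-step procedure just described can be sketched in plain Python. This is an illustration of the simplified variant in the text (squared residuals regressed on the single period 1 regressor, giving one degree of freedom), not the authors' code; all function names are hypothetical.

```python
import math


def ols(x, y):
    """Fit y = a + b*x by least squares; return (residuals, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in resid) / sst if sst > 0 else 0.0
    return resid, r2


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def nr2_test(x, y):
    """Regress y on x, then squared residuals on x; return (n*R^2, p)."""
    resid, _ = ols(x, y)
    _, r2_aux = ols(x, [e * e for e in resid])
    stat = max(0.0, len(x) * r2_aux)  # guard against tiny negative rounding
    return stat, chi2_1df_pvalue(stat)


# The statistic reported in the text, .916 on 1 df, gives p of about .34.
p_reported = chi2_1df_pvalue(0.916)
```

With real data, `x` would hold period 1 net affect and `y` period 2 net affect, one pair per respondent.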

Figure 1. Scatterplot of Net Affect at Periods 1 and 2 with Quantile Regression Lines (20th and 80th quantiles; horizontal axis: Net Affect at Time 1)

Reliability of Aggregate Activity Experience Ratings

The reliabilities we have computed thus far are defined at the level of the individual. For many applications, however, the key issue is not the reliability of net affect for individuals, but rather the reliability of average net affect across individuals engaged in various activities. The question of reliability in this context is whether a given activity produces the same average experience at different times. A simple test for this is to compute the mean values for each activity for each time period and correlate the vectors across activities. Table 6 presents the mean net affect for each day by activity and interaction partner. The two DRMs produce remarkably similar patterns of mean net affect across activities (r = .96, see Table 6 and Figure 2) and also of relative frequency (r = .99, see third and sixth columns of Table 6).
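The aggregate test described above amounts to two group-by means and one correlation. The sketch below assumes each period's data is a list of `(activity, net_affect)` episode records; the record layout, function names, and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def activity_means(episodes):
    """Map each activity to its mean net affect across episodes."""
    by_activity = defaultdict(list)
    for activity, affect in episodes:
        by_activity[activity].append(affect)
    return {a: mean(v) for a, v in by_activity.items()}


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def aggregate_reliability(period1, period2):
    """Correlate activity-level means across two periods."""
    m1, m2 = activity_means(period1), activity_means(period2)
    shared = sorted(set(m1) & set(m2))  # activities observed in both periods
    return pearson([m1[a] for a in shared], [m2[a] for a in shared])


# Hypothetical episodes, not data from the study.
p1 = [("working", 1.0), ("working", 3.0), ("relaxing", 4.0), ("commuting", 0.0)]
p2 = [("working", 2.0), ("relaxing", 5.0), ("relaxing", 3.0), ("commuting", 1.0)]
r_activities = aggregate_reliability(p1, p2)
```

Only activities that appear in both periods enter the correlation, so a rarely reported activity that is missing on one day is dropped rather than treated as zero.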

Table 6. Net Affect by Activity or Social Context

                                    Period 1                 Period 2
                                    Mean   Stderr   N        Mean   Stderr   N
Activities
playing                             4.09   .6       48       4.16   .4       48
intimate relations                  4.07   .33      9        4.40   .7       6
relaxing                            3.68   .17      103      3.91   .16      119
walking, taking a walk              3.61   .3       53       3.60   .5       59
watching tv                         3.4    .13      187      3.35   .14      193
exercising                          3.37   .3       54       3.4    .9       49
eating                              3.34   .13      10       3.44   .11      16
reading                             3.4    .19      98       3.13   .0       101
preparing food                      3.19   .16      165      3.0    .16      160
praying/worshipping/meditating      3.14   .3       54       3.4    .33      43
talking, conversation               .94    .1       17       .87    .13      0
rest/sleep                          .85    .8       76       .81    .6       88
childcare                           .80    .4       83       .80    .5       87
home computer                       .75    .6       80       .54    .7       71
doing housework                     .73    .19      15       .86    .19      11
grooming, self care                 .65    .14      03       .59    .14      0
shopping, errands                   .55    .6       78       3.18   .18      91
other activities                    .47    .19      148      .47    .18      148
listening to music                  .4     .19      119      .36    .19      113
listening to radio, news            .41    .0       104      .6     .1       107
commuting, traveling                .19    .14      11       .      .14      18
working                             .07    .14      18       .0     .14      14
Social Interactions
friends/relatives                   3.4    .18      16       3.19   .19      133
spouse/significant other            3.10   .17      143      3.18   .17      150
my children                         3.10   .16      1        .91    .18      15
parents                             3.05   .41      40       .55    .34      35
other people                        .53    .18      131      .63    .0       135
customers/students                  .5     .19      104      .38    .4       89
co-workers                          .1     .14      07       .40    .15      06
boss                                1.88   .19      19       .08    .19      11

Figure 2. Mean Net Affect for Activities by Session (r = .96; horizontal axis: Net Affect at Time 1)

Discussion

We analyzed the persistence of various subjective well-being questions over a two-week period. We found that both overall life satisfaction measures and affective experience measures derived from the DRM exhibited test-retest correlations in the range of .50-.70. While these figures are lower than the reliability ratios typically found for education, income and many other common microeconomic variables, they are probably sufficiently high to support much of the research that is currently being undertaken on subjective well-being, particularly in cases where group means are being compared (e.g. rich vs poor, employed vs unemployed) and the benefits of statistical aggregation apply.

It is perhaps surprising that measures intended to assess the general state of SWB over an extended period (such as overall life satisfaction) should be no more reliable than measures of affective experience on different days two weeks apart. One's general level of life satisfaction would be expected to change only very slowly over time, because so do most of its known correlates (age, income, marital status, employment). A key factor behind this result is probably the fact that answering a life satisfaction question explicitly invokes a nonsystematic review of one's life, which leaves such measures vulnerable to transient influences that draw attention to arbitrary or incomplete information (e.g. one's immediate mood, the weather). By contrast, measures of affective experience from experience sampling or the DRM do not rely on such cognitive appraisals, and have the benefit of aggregating over several episodes and adjectives. They also have the disadvantage, however, that no two days (even if intentionally matched, as in our study) are truly the same.

Another application of reliability estimates is to assist in the determination of appropriate sample sizes for the measurement of various emotional experiences. In clinical trials, for example, if SWB measures are one of the outcome variables of interest, reliabilities can be used to help determine the sample size needed to detect an expected difference between groups. Because the reliabilities are modest, the risk of incorrectly concluding that groups do not differ is of particular concern. As we saw in our examples of correction for attenuation, the true strength of relationships could easily be underestimated in the small samples that clinical research must sometimes employ (e.g. with special populations). An alternative design approach to larger samples, of course, would be to reduce error by sampling the same people at different points in time.
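To make the sample-size point concrete, the sketch below uses a standard normal-approximation formula for comparing two group means. It assumes classical measurement error in the outcome, so that a true standardized difference `d_true` shrinks to `d_true * sqrt(reliability)` when measured. The function name, the effect size of 0.5, and the default significance level and power are illustrative assumptions, not values from the study; the reliability of .60 is simply a figure inside the .50-.70 range reported above.

```python
import math
from statistics import NormalDist


def n_per_group(d_true, reliability, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison.

    d_true:      true standardized mean difference (hypothetical input)
    reliability: test-retest reliability of the outcome measure
    """
    d_obs = d_true * math.sqrt(reliability)  # attenuated effect size
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / d_obs ** 2)


print(n_per_group(0.5, 1.0))   # 63  (perfectly reliable outcome)
print(n_per_group(0.5, 0.60))  # 105 (reliability of .60)
```

Under these assumptions the required sample grows in proportion to 1 / reliability, which is why averaging repeated measurements on the same people can substitute for recruiting more participants.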

References

Andrews, F. M. and Withey, S. B. 1976. Social indicators of well being: Americans' perceptions of life quality. New York: Plenum.
Angrist, J. and Krueger, A. B. 1999. Empirical Strategies in Labor Economics. Chapter 23 in O. Ashenfelter and D. Card, eds., The Handbook of Labor Economics, Volume III. North Holland.
Argyle, M. 1999. Causes and correlates of happiness. In D. Kahneman, E. Diener & N. Schwarz (Eds.), Well Being: The Foundations of Hedonic Psychology. Russell Sage Foundation.
Belli, R. 1998. The structure of autobiographical memory and the event history calendar: Potential improvements in the quality of retrospective reports in surveys. Memory, 6, 383.
Blanchflower, David and Andrew J. Oswald. 2004. Well-being over time in Britain and the United States. Journal of Public Economics, 88, 1359-1386.
Bound, John, Charles C. Brown, Nancy Mathiowetz. 2001. Measurement Error in Survey Data. In Handbook of Econometrics, edited by E. E. Leamer and J. J. Heckman. Pp. 3705-3843. New York: North Holland Publishing.
Csikszentmihalyi, M. and Larson, R. 1987. Validity and reliability of the Experience-Sampling Method. Journal of Nervous and Mental Disease, 175(9), 526-36.
Diener, E., R. A. Emmons, R. J. Larsen, S. Griffin. 1985. The Satisfaction With Life Scale. Journal of Personality Assessment, 49, 1.

Easterlin, Richard A. 1995. Will Raising the Incomes of All Increase the Happiness of All? Journal of Economic Behavior and Organization, 27(1), (June), pp. 35-48.
Eid, M., & Diener, E. 2004. Global judgments of subjective well-being: Situational variability and long-term stability. Social Indicators Research, 65, 245-277.
Ferring, D., S.-H. Filipp and K. Schmidt. 1996. The Skala zur Lebensbewertung: Scale construction and findings on reliability, stability, and validity. Zeitschrift für Differentielle und Diagnostische Psychologie, 17, pp. 141-153.
Frey, B. and A. Stutzer. 2002. What Can Economists Learn from Happiness Research? Journal of Economic Literature, 40, No. 2, June 2002.
Kahneman, D., Diener, E. and Schwarz, N. 1999. Well Being: The foundations of hedonic psychology. NY: Russell Sage.
Kahneman, Daniel and Alan B. Krueger. 2006. Developments in the Measurement of Subjective Well-Being. Journal of Economic Perspectives, 20, 3-24.
Kahneman, D., Krueger, A., Schkade, D., Schwarz, N. and Stone, A. 2006. Would you be happier if you were richer? A focusing illusion. Science, 312, 1908-1910.
Kahneman, D., Krueger, A., Schkade, D., Schwarz, N. and Stone, A. 2004. A survey method for characterizing daily life experience: The Day Reconstruction Method (DRM). Science, 306, 1776-1780.
Kammann, R. and Flett, R. 1983. Affectometer 2: A scale to measure current level of happiness. Australian Journal of Psychology, 35, 259-265.

Kammann, R. 1984. The analysis and measurement of happiness as a sense of well-being. Social Indicators Research, 15, 91.
Koenker, Roger, and Bassett, Gilbert. 1982. Robust tests for heteroscedasticity based on quantile regressions. Econometrica, 50, 43-61.
Layard, Richard. 2005. Happiness: Lessons from a New Science. Penguin Books: London.
Lyubomirsky, S. and Lepper, H. S. 1999. A measure of subjective happiness: Preliminary reliability and construct validation. Social Indicators Research, 46, 137-155.
Lyubomirsky, S., Sheldon, K. and Schkade, D. 2005. Pursuing happiness: The architecture of sustainable change. Review of General Psychology, 9, 111-131.
Nunnally, J. C. 1978. Psychometric theory. 2nd edition. New York: McGraw-Hill, pp. 36-37.
Pavot, William and Ed Diener. 1993. Review of the Satisfaction With Life Scale. Psychological Assessment, 5, 164-172.
Pavot, W., E. Diener, C. R. Colvin and E. Sandvik. 1991. Further validation of the Satisfaction with Life Scale: Evidence for the cross-method convergence of well-being measures. Journal of Personality Assessment, 49, 71-75.
Robinson, M. D. and Clore, G. L. 2002. Belief and feeling: Evidence for an accessibility model of emotional self-reports. Psychological Bulletin, 128, 934.
Schwarz, Norbert. 1987. Stimmung als Information: Untersuchungen zum Einfluß von Stimmungen auf die Bewertung des eigenen Lebens. Heidelberg: Springer Verlag.

Steptoe, A., J. Wardle and M. Marmot. 2005. Positive affect and health-related neuroendocrine, cardiovascular, and inflammatory processes. PNAS, 102, no. 18, 6508-12.
Stone, A. A., Schwartz, J. E., Schwarz, N., Schkade, D., Krueger, A. and Kahneman, D. 2006. A population approach to the study of emotion: Diurnal rhythms of a working day examined with the Day Reconstruction Method (DRM). Emotion, 6, 139-149.
Stone, A. A., Shiffman, S. S., DeVries, M. W. 1999. Ecological momentary assessment. In Well-Being: The Foundations of Hedonic Psychology, D. Kahneman, E. Diener, N. Schwarz, Eds. (Russell-Sage, New York), pp. 61-84.
White, H. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838.