Learning Effects in Time Trade-Off Based Valuation of EQ-5D Health States

Similar documents
Time Trade-Off and Ranking Exercises Are Sensitive to Different Dimensions of EQ-5D Health States

Time to tweak the TTO: results from a comparison of alternative specifications of the TTO

Study Protocol: Comparison of Inconsistency between Time Trade Off and Discrete Choice Experiments in EQ-5D-3 L Health State Valuations

How is the most severe health state being valued by the general population?

Department of Economics

Using Discrete Choice Experiments with duration to model EQ-5D-5L health state preferences: Testing experimental design strategies

Valuing health using visual analogue scales and rank data: does the visual analogue scale contain cardinal information?

Time trade-off: one methodology, different methods

Interim Scoring for the EQ-5D-5L: Mapping the EQ-5D-5L to EQ-5D-3L Value Sets

Comparison of Value Set Based on DCE and/or TTO Data: Scoring for EQ-5D-5L Health States in Japan

EuroQol Working Paper Series

LIHS Mini Master Class

Original Research Article Pain Quality of Life as Measured by Utilities

To what extent do people prefer health states with higher values? A note on evidence from the EQ-5D valuation set

Japan Journal of Medicine

Comparing the UK EQ-5D-3L and English EQ-5D-5L value sets

Valuation of EQ-5D Health States in Poland: First TTO-Based Social Value Set in Central and Eastern Europevhe_

Health Economics Working Paper Series HEWPS Number: Exploring differences between TTO and direct choice in the valuation of health states

Waiting times. Deriving utility weights for the EQ-5D-5L

Swedish experience-based value sets for EQ-5D health states

Population Health Metrics

This is a repository copy of Estimating an EQ-5D population value set: the case of Japan.

University of Groningen

Department of Medicine,Yong Loo Lin School of Medicine, National University of Singapore, Singapore

NICE DSU TECHNICAL SUPPORT DOCUMENT 8: AN INTRODUCTION TO THE MEASUREMENT AND VALUATION OF HEALTH FOR NICE SUBMISSIONS

Health Economics & Decision Science (HEDS) Discussion Paper Series

Willingness to pay: a feasible method for assessing treatment benefits in epilepsy?

Functional health state description and valuation by people aged 65 and over: a pilot study

Experience-based VAS values for EQ-5D-3L health states in a national general population health survey in China

Anchoring in the Lead-Time Time Trade-Off: Master Thesis. Carl Haakon Samuelsen. Does the Starting-point Influence Preference Elicitation?

Liv Ariane Augestad and Kim Rand-Hendriksen, 2012

Mapping the EORTC QLQ C-30 onto the EQ-5D Instrument: The Potential to Estimate QALYs without Generic Preference Data

Condition-Specific Preference-Based Measures: Benefit or Burden?

Methods of eliciting time preferences for health A pilot study

Valuation of the SF-6D Health States Is Feasible, Acceptable, Reliable, and Valid in a Chinese Population

Title: The sequence effect in time trade-off elicitation.

University of Bristol - Explore Bristol Research. Publisher's PDF, also known as Version of record

SUPPLEMENTAL MATERIAL

Comparing Generic and Condition-Specific Preference-Based Measures in Epilepsy: EQ-5D-3L and NEWQOL-6D

HEDS Discussion Paper 05/05

VALUE IN HEALTH 21 (2018) Available online at journal homepage:

Unit 1 Exploring and Understanding Data

To what extent can we explain time trade-off values from other information about respondents?

A consistency test of the time trade-off

Changing expectations about speed alters perceived motion direction

Is EQ-5D-5L Better Than EQ-5D-3L? A Head-to-Head Comparison of Descriptive Systems and Value Sets from Seven Countries

The role of non-transparent matching methods in avoiding preference reversals in the evaluation of health outcomes. Fernando I.

Validity of the EuroQoL (EQ-5D) Instrument in a Greek General Population

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Exploring the reference point in prospect theory

Analysis of EQ-5D scores from two phase 3 clinical trials of romiplostim in the treatment of immune thrombocytopenia (ITP)

Kelvin Chan Feb 10, 2015

Exploration and Exploitation in Reinforcement Learning

Choice of EQ 5D 3L for Economic Evaluation of Community Paramedicine Programs Project

Back to the future : Influence of beliefs regarding the future on TTO answers

Volume 30, Issue 3. Boundary and interior equilibria: what drives convergence in a beauty contest'?

This is a repository copy of The estimation of a preference-based measure of health from the SF-36.

Contingent valuation. Cost-benefit analysis. Falls in hospitals

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

NIH Public Access Author Manuscript J Clin Epidemiol. Author manuscript; available in PMC 2010 March 1.

Keep it simple: Ranking health states yields values similar to cardinal measurement approaches

Role of respondents education as a mediator and moderator in the association between childhood socio-economic status and later health and wellbeing

Does the correspondence between EQ-5D health state description and VAS score vary by medical condition?

Impact of Chronic Liver Disease and Cirrhosis on Health Utilities Using SF-6D and the Health Utility Index

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

This is a repository copy of A comparison of the EQ-5D and the SF-6D across seven patient groups.

Tails from the Peak District: Adjusted Limited Dependent Variable Mixture Models of EQ-5D Questionnaire Health State Utility Values

Using HAQ-DI to estimate HUI-3 and EQ-5D utility values for patients with rheumatoid arthritis in Spain

BMC Health Services Research

MEA DISCUSSION PAPERS

Interchangeability of the EQ-5D and the SF-6D in Long-Lasting Low Back Pain

Published online: 10 January 2015 The Author(s) This article is published with open access at Springerlink.com

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

A Comparison of Several Goodness-of-Fit Statistics

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

HPS301 Exam Notes- Contents

Effect of Choice Set on Valuation of Risky Prospects

Adaptive Aspirations in an American Financial Services Organization: A Field Study

[1] provides a philosophical introduction to the subject. Simon [21] discusses numerous topics in economics; see [2] for a broad economic survey.

What is Multilevel Modelling Vs Fixed Effects. Will Cook Social Statistics


Supplementary Appendix

Mapping EORTC QLQ-C30 onto EQ-5D for the assessment of cancer patients

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Effects of Mode and Order of Administration on Generic Health-Related Quality of Life Scoresvhe_

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Examining Relationships Least-squares regression. Sections 2.3

Version No. 7 Date: July Please send comments or suggestions on this glossary to

A Possible Explanation to Yea-saying in Contingent Valuation Surveys*

Testing for non-response and sample selection bias in contingent valuation: Analysis of a combination phone/mail survey

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

A comparison of injured patient and general population valuations of EQ-5D health states for New Zealand

PRIME: impact of previous mental health problems on health-related quality of life in women with childbirth trauma

Responsiveness to feedback as a personal trait

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance

Transcription:

Available online at www.sciencedirect.com journal homepage: www.elsevier.com/locate/jval Learning Effects in Time Trade-Off Based Valuation of EQ-5D Health States Liv Ariane Augestad, MD 1, *, Kim Rand-Hendriksen, Cand. Psychol, MSc 1, Ivar Sønbø Kristiansen, MD, MPH, PhD 2, Knut Stavem, MD, MPH, PhD 1,3,4 1 Health Services Research Centre, Akershus University Hospital, Lørenskog, Norway; 2 Institute of Health Management and Health Economics, University of Oslo, Oslo, Norway; 3 Department of Pulmonary Medicine, Akershus University Hospital, Lørenskog, Norway; 4 Medical Faculty, Faculty Division, Akershus University Hospital, University of Oslo, Lørenskog, Norway A B S T R A C T Objectives: In EuroQol five-dimensional questionnaire valuation studies, each participant typically assesses more than 10 hypothetical health states by using the time trade-off (TTO) method. We wanted to explore potential learning effects when using the TTO method, that is, whether the valuations were affected by the number of previously rated health states (the sequence number). Methods: We included 3773 respondents from the US EQ-5D valuation study, each of whom valued 12 health states (plus unconscious) in random order. With linear regression, we used sequence number to predict mean and standard deviations across all health states. We repeated the analysis separately for TTO responses indicating a state better than death and a state worse than death. Each TTO value requires a specific number of choice iterations. To test whether respondents used fewer iterations with experience, we used linear regression with sequence number as the independent variable and number of iterations as the dependent variable. Results: Mean TTO values were fairly stable across the sequence number, but analyzing state better than death and state worse than death values separately revealed a tendency toward more extreme values: state better than death values increased by 0.02, while state worse than death values decreased by 0.21 (P 0.0001) over the full sequence. The standard deviations increased slightly, while the number of choice iterations was the same over the sequence number. The findings were stable across the levels of health state severity, age, and sex. Conclusions: TTO values become more extreme with increasing experience. Because of the randomized valuation order, these effects do not bias specific health states; however, they reduce the overall validity and reliability of TTO values. Keywords: learning effect, preferences, time trade-off, valuation study. Copyright 2012, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. Introduction Several methods are available to measure preferences for health states. Ideally, the elicitation method should not affect the responses, but there is ample evidence that it does [1,2]. Gaining experience with a specific elicitation method may also influence responses. A study examining willingness to pay reported lower willingness to pay and reduced variance as respondents gained experience with the valuation method [3]. The time trade-off (TTO) method is frequently used to elicit health state values [4,5]. It is used to identify the point of indifference between a fixed length of life in an impaired health state and a shorter life span in perfect health. Utilities are calculated as time in perfect health divided by time in the target health state. The TTO method is a challenging cognitive task, and it is conceivable that gaining experience with the method may influence the resulting values. The EuroQol five-dimensional (EQ-5D) questionnaire is one of the most frequently used multiattribute utility instruments, and it describes composite health states along five dimensions: mobility, selfcare, usual activities, pain/discomfort, and anxiety/depression. Each dimension has three levels: no health problems (level 1), moderate health problems (level 2), and extreme health problems (level 3). The EQ-5D thus describes 3 5 ( 243) health states [6]. The TTO method is dominant in EQ-5D valuation studies, in which respondents typically value 13 to 17 health states [7]. In a Polish EQ-5D TTO valuation study, each participant valued 23 health states but there were no differences between population means or variances for early valuations (6th 17th) and late valuations (18th 23rd) [8]. The first valuations (1st 5th) were considered warm-up exercises and did not include the same sample of health state profiles as the rest of the TTO tasks. To our knowledge, this is the only study that has examined the effect of increasing experience with the TTO exercise on the valuations. It is unknown whether there are effects earlier in the valuation process (1st 5th valuations) or whether experience with the TTO method affects the distribution of the responses in other ways than the mean. In the present study, we use the term learning effect for all systematic differences in responses as a function of increasing experience with the TTO method. Learning effect thus refers to the TTO method, and not learning from valuing specific health states. This is not only * Address correspondence to: Liv Ariane Augestad, Health Services Research Centre, Akershus University Hospital, 1478 Lørenskog, Norway. E-mail: livariane@gmail.com; liv.ariane.augestad@ahus.no. 1098-3015/$36.00 see front matter Copyright 2012, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. doi:10.1016/j.jval.2011.10.010

341 restricted to an improved understanding of the method but may also include strategies enabling the respondent to finish the task quickly and avoid discomfort, exhaustion, boredom, and so on. The primary objective of the study was to identify potential learning effects by analyzing the distribution of respondents values as a function of the number of previously valued health states with the TTO task (sequence number). Because the TTO procedure is a complex task, one might expect that a part of the variance in TTO responses is attributable to respondents not understanding the task (noise) and that this noise would be reduced with increasing experience with the valuation task. Hence, an expected learning effect was a reduction in SD for single health states. The secondary objective was to test whether potential learning effects were stable for health states of different severity, as described by the EQ-5D system, and whether they were stable for respondents across age and sex. Methods Material We used data collected through face-to-face interviews in a large US EQ-5D valuation study and the same exclusion criteria as in the original study [9], after which 3773 respondents were included in our analyses. Each respondent valued 13 health states in a randomized order by using the TTO method. In total, 42 health states were valued. For further details, we refer to the original valuation study [9]. Valuation task Each TTO valuation started by asking the respondent whether he or she (a) would prefer to live 10 years in the presented (impaired) EQ-5D health state, (b) would prefer immediate death, or (c) considered that (a) and (b) were equally (un)preferable. If the respondent chose alternative (c), the valuation for that particular health state was over and the TTO value was set to 0. The method for subsequent valuation was different when choosing alternative (a) states considered better than death (SBD) and alternative (b) states considered worse than death (SWD). The interviewers used props to visualize the different lengths of life A and life B. SBD The aim of the SBD task was to reach a point of indifference between life A (10 years in the impaired EQ-5D health state) and a reduced number of years in life B (perfect health). Throughout the SBD task, life A was held constant at 10 years while life B was varied according to the choices of the respondent (Fig. 1). At equilibrium, the TTO value (u) was calculated by dividing the number of years in perfect health (t) by the number of years in the target health state: u t 10. The upper value is thus restricted to 1, the same as full health. SWD If the respondent preferred life B in the initial choice task, that is, stated that the target health state was worse than death, the choice task proceeded in a different manner. Life A was still 10 years but a composite life of the target health state (x years) and perfect health (10 x years), followed by immediate death. For an SWD valuation, life B was always set at 0 years. The aim of the task was to reach a point of indifference between immediate death and life A, in which time in target health state and time in perfect health were varied simultaneously (Fig. 1). The TTO value (u) was calculated as the negative number of years in perfect health divided by the number of years in the target health state: u t 10 t. Elicited in this way, SWD values have no lower boundary, the lowest possible value depends on the smallest tradable unit of time, which, in this case, was 0.25 years (3 months) [7]. Therefore, the lowest possible score was 39 or 9.75 10 9.75. In previous EQ-5D questionnaire TTO valuation studies, values for SWD had been transformed to restrict the scale to 1 [10]. In this study, we used the most frequently applied transformation method [7]: u t 10 t u u t 1 u 10 Sensitivity analyses were performed by using an alternative transformation algorithm [11], in which the TTO score is simply divided by the positive of the lowest possible value: in this case, 39: u u 39 t 39(10 t) Analyses We plotted all TTO values as a function of the sequence number, regardless of the target health state, and calculated the mean TTO value within each sequence number. We used this mean TTO value as the dependent variable and sequence number as the independent variable in ordinary least squares (OLS) regression. Because SBD and SWD values were elicited differently, the learning effects might also be different. We therefore performed separate analyses for the two categories of responses. To test whether SD decreased with increasing TTO experience, we needed a measure of agreement that was independent of the varying severity of the different health states. To achieve this, we first calculated the means of the TTO values for each of the 42 health states within each sequence number. For each single TTO valuation, we then subtracted the corresponding health state mean. We thus obtained an adjusted mean of zero for each health state, without altering the variation around the mean. We then calculated the SD around the adjusted mean of zero for each sequence number. Finally, we used OLS regression, with the 13 SD values as the dependent variable and the sequence number as the independent variable. Converging values should then translate into a negative slope in the regression model. Using the ping-pong procedure, different TTO values require different number of choice iterations before the respondent reaches the point of indifference. To test whether respondents chose values requiring less choice iteration with cumulative experience, we created a variable corresponding to the number of choice task iterations needed to reach each TTO value. In linear regression, we used this variable as the dependent variable and sequence number as the independent variable. To test whether learning effects were different for health states of different severity, we stratified the health states into three groups (14 states in each) by mean TTO values and replicated all the previous analyses for these. To test whether learning effects differed by sex, we repeated the previous OLS regression analyses with sequence number, sex, and variable sex sequence number interaction as independent variables. We also repeated all the above analyses separately for three age groups ( 35, 35 55, and 55 years). For analyses regarding sex and age, we chose a 5% significance level by using two-sided tests. To test whether learning effects differed by the levels of education, we repeated the previous regression analyses with the material split into five groups according to the level of education (in years of education: 9, 9 11, 12, 13 15, and 15). The analyses were performed by using 12-year education as a baseline, with one dummy variable representing each of the four other education groups, a main sequence number variable, and sequence number education group interaction terms.

342 VALUE IN HEALTH 15 (2012) 340 345 Life A: 10 years in target health state (t) Life B: 0 year in perfect health (p) Number of itera ons 1 2 3 4 5 6 7 8 TTO value 1,000 B: 9.75 0,975 B: 9.5 p 0,950 0,925 B: 9 p 0,900 0,875 B: 8.5 p 0,850 0,825 B: 8 p 0,800 0,775 B: 7.5 p 0,750 0,725 B: 7 p 0,700 0,675 B: 6.5 p 0,650 0,625 B: 6 p 0,600 0,575 B: 5.5 p 0,550 Be er than death. 0,525 Life A is 10 years in B: 5 p 0,500 0,475 target state (t) B: 4.5 p 0,450 Life B varies 0,425 depending on the respondent's choices: B: 4 p 0,400 0,375 B: 3.5 p 0,350 -Prefer Life A: 0,325 -Life A and life B are equal: B:3 p 0,300 0,275 -Prefer Life B: B: 2.5 p 0,250 0,225 B: 2 p 0,200 0,175 B: 1.5 p 0,150 0,125 B: 1 p 0,100 0,075 B: 0.5 p 0,050 0,025 Prefer life A: Life A and B are equal: 0,000 Prefer life B: Transformed TTO values -0,025 A: 9.5 t + 0.5 p -0,050-0,075 A: 9 t + 1 p -0,100-0,125 A: 8.5 t + 1.5 p -0,150-0,175 A: 8 t + 2 p -0,200-0,225 A: 7.5 t + 2.5 p -0,250-0,275 A: 7 t + 3 p -0,300-0,325 A: 6.5 t + 3.5 p -0,350-0,375 A: 6 t + 4 p -0,400-0,425 A: 5.5 t + 4.5 p -0,450 Worse than death -0,475 Life B constant at 0 years A: 5 t + 5 p -0,500 Life A varies depending on the respondent s choices: -0,525 A: 4.5 t + 5.5 p -0,550-0,575 A: 4 t + 6 p -0,600-0,625 -Prefer Life A: -Life A and life B are A: 3.5 t + 6.5 p -0,650-0,675 equal: B: 3 t + 7 p -0,700 -Prefer Life B: -0,725 A: 2.5 t + 7.5 p -0,750-0,775 A: 2 t + 8 p -0,800-0,825 A: 1.5 t + 8.5 p -0,850-0,875 A: 1 t + 9 p -0,900-0,925 A: 0.5 t + 9.5 p -0,950-0,975 Fig. 1 The routing of the time trade-off (TTO) iterations leading to all possible values. The number of iterations needed to reach each specific value is indicated at the top. Results In the OLS regression of the mean TTO score as a function of sequence number, the slope was 0.007 (P 0.0001; adjusted R 2 0.661), indicating a total reduction of 0.084 over the 13 valuations or a reduction of 0.007 per valuation. After decomposing the valuation data into SBD and SWD, the mean valuations for SBD were stable with increasing sequence number while the mean valuations for SWD showed a decrease (Fig. 2). The learning effects were relatively stable throughout the 13 valuations: visually, no specific cutoff could be identified. Stratified OLS regression analyses revealed a mean increase of 0.002 per

343 1.5 0.002 (P 0.0721). This finding was stable across sex and age groups. The model fit for the regressions involving interactions was consistently better than for the main effect analyses in terms of R and (adjusted) R 2. We must point out, however, that these analyses were performed on mean values, and therefore mask an enormous underlying variation in individual response patterns. TTO value 0.5 1 1 4 7 10 13 Sequence number Discussion Summary The results of this study indicate the presence of learning effects in the TTO method; respondents systematically change their responses with increased TTO valuation experience. The mean effects were small when all valuations were analyzed together, but decomposing valuations into SBD or SWD revealed underlying diverging patterns. The mean values for SBD increased slightly with increasing sequence number, while the mean values for SWD decreased substantially. These effects were similar regardless of health state severity. Variance in TTO values increased slightly with the sequence number. The number of iterations when valuing a state remained stable over the 13 states. Fig. 2 Time trade-off (TTO) values as a function of previously rated health states. The circle areas are proportional (log) to the prevalence of the specific TTO value. The continuous line represents the mean of TTO values for states considered better than death (SBD). The dashed line represents the mean of TTO values for states considered worse than dead (SWD). valuation for SBD (P 0.0001; adjusted R 2 0.511) and a difference in mean valuation for SWD of 0.21 from the first to the last valuation or a decrease of 0.016 per valuation for SWD (P 0.0001; adjusted R 2 0.936). On performing the same analysis but using the alternative transformation method for SWD, the mean of SWD values decreased by 0.011 per valuation (P 0.0001). There was a slight increase in SDs as a function of the sequence number (Fig. 3). The OLS regression analysis revealed an increase in SDs from 0.471 to 0.523 from the first to the last valuation. There was a similar pattern of separate learning effects for SBD and SWD valuations across the three severity groups (Table 1). There was no systematic change in the proportion of SWD valuations as a function of sequence number in each of the three severity groups. We observed no significant interactions between sex and sequence number (P 0.44) on learning effects. The oldest age group ( 55 years) did not have any significant increase in SBD valuations as a function of sequence number (P 0.805). When including variables representing education level groups and interactions between the education level and the sequence number, we observed statistically significant interactions for SWD valuations only: The group with the lowest level of education (8 years or less) displayed significantly lower mean values ( 0.069; P 0.001) and significantly less change as a function of sequence number than did the other education level groups ( 0.009 per iteration, as opposed to a mean of 0.016; P 0.009). The group with the highest level of education displayed significantly higher mean values (0.064; P 0.001) and displayed an increased learning effect ( 0.022 per iteration, as compared with a mean of 0.016; P 0.024). The mean number of iterations across all health states was 5.75 (median 7; range 1 8). The correlation between the number of iterations for each valued state and the sequence number was Interpretation The diverging patterns for SBD and SWD values could be interpreted as an increase in the gap effect, mainly for negative values. The gap effect is a previously known characteristic of the TTO method and refers to respondents tendency to avoid values in the vicinity of death (0) [12]. It has been suggested that the gap effect is a framing effect inherent to the TTO method, for instance, by discomfort introduced by the concept of death, which may cause respondents to distance their valuations from death (0) [13]. This is in line with the constructed preference hypothesis that argues that elicited preferences are, to a large extent, formed by the valuation task itself [14]. If we consider that characteristics of the TTO method become more salient with increasing experience, then unwanted framing effects of the method may have a larger impact on later values than on earlier values. There is a negative shift for all SWD values (see Fig. 2), but whether this represents an extended gap effect or a different framing effect remains unknown. Fig. 3 Standard deviations of adjusted time trade-off (TTO) means by sequence number. The TTO values were adjusted by subtracting the corresponding health state mean within each sequence number, resulting in a mean of 0 for all TTO values within each sequence number without altering the variance.

344 VALUE IN HEALTH 15 (2012) 340 345 Table 1 Relationship between sequence number and time trade-off values for health states stratified according to severity. All states Health state severity groups* Mild states Medium states Bad states Coefficient P Coefficient P Coefficient P Coefficient P Linear regression of state better than death values on sequence number Constant 0.747 0.001 0.834 0.001 0.695 0.001 0.602 0.001 Sequence number 0.002 0.001 0.002 0.001 0.003 0.001 0.002 0.086 Linear regression of state worse than death values on sequence number Constant 0.447 0.001 0.412 0.001 0.420 0.001 0.460 0.001 Sequence number 0.016 0.001 0.019 0.001 0.018 0.001 0.015 0.001 * Health states stratified in severity groups of 14 by mean time trade-off values. On the contrary, the discovered preference hypothesis argues that stable and theoretically consistent preferences are typically the product of experience gained through practice and repetition [15]. The TTO method is complex; for instance, the valuation task for SWD requires respondents to find the point of equivalence between death (0) and a composite life of perfect health and impaired health while varying time in perfect health and time in impaired health simultaneously. The complexity of the TTO method might favor the discovered preference hypothesis; respondents might need practice to fully understand the task, leading to increased quality with further repetition. Ideally, increased quality should increase agreement between respondents while we observed slight increases in SDs for single health states. The finding that the number of choice iterations was stable with increasing sequence number suggests that respondents did not alter their stated preferences to finish the task more rapidly. Nevertheless, the respondents might have finished the task faster with increasing sequence number, just by being quicker to state their preferences for life A or life B. We did not, however, have available data to investigate whether respondents used less time for each TTO task with increasing sequence number. The interpretation of how education influences the observed learning effect is uncertain. The values elicited from the high education group displayed the largest shift as a function of sequence number. This shift, however, brought the values from the high education group closer to the values from the low education group. An interpretation of this may be that the TTO task entails an initial framing effect that influences the high education group less than the lower education group. With the repetition of task, the high education group becomes more influenced while the low education group has already exhausted some of the learning effect potential in the first valuation. Limitations The generalizability of these findings beyond TTO-based EQ-5D valuations is uncertain. Within TTO-based EQ-5D valuation studies, the measurement of generalizability would require verification of the findings in other data sets. Health state specific transfer (order) effects [16] or learning effect interactions with specific dimensions within the EQ-5D questionnaire description system may exist but were outside the scope of this study. Our analyses were not suited to determine the number of TTO valuations required to maximize response validity whether later valuations are most valid, as suggested by the discovered preference hypothesis, or the initial valuations are most valid, because they may be less affected by unwanted framing effects. We initially performed separate analyses on each of the 42 valued health states. Because of fluctuations in the number of respondents valuing specific health states at specific sequence number, combined with individual variation in values given to all health states, the noise-to-signal ratio made the results at this level of aggregation inconclusive. To reduce the level of noise, we performed analyses on aggregates across all or groups of health states. Given a much larger sample size, analyses on the level of individual health states could prove highly interesting. A larger sample size could also allow investigation into whether the proportion of respondents valuing specific health states as SWD is a function of sequence number. Practical consequences The health state presentation order was randomized for each respondent, reducing the potential for biasing values for specific health states. The diverging pattern for SBD and SWD responses, however, could have implications for the tariff. Because the mild health states are typically considered SBD, these are primarily influenced by the upward shift of SBD values. Conversely, more severe health states are more often considered SWD and are thereby influenced to a greater extent by the downward shift of the SWD. It follows that the mean distance between health states increases with increasing sequence number. If the observed diverging patterns are generalizable to TTO valuation studies in general, studies where respondents value a large number of health states will result in more extreme values, increasing estimated utility differences between health states. Work is underway to develop an alternative TTO procedure called lead-time TTO (LT-TTO), where SBD values and SWD values are elicited by using one continuous method [13,17]. Because of the added lead time, SWD values may be elicited without a direct comparison to death but rather by trading away the lead time. A feasibility study of the LT-TTO indicates that the new method overcomes the discontinuance of values around dead [13]. We do not know whether this is because the subject of death is less directly invoked or because the scale is uniform (or both). The authors of the study found statistically significant differences in the distribution of LT-TTO values compared with the values derived by using the measurement and valuation of health TTO protocol. The contribution of learning effects to the observed differences between LT-TTO and measurement and valuation of health TTO, however, remains unknown. It is also unknown to which extent respondents give more extreme valuations with accumulating experience with LT-TTO. LT-TTO may have other protocol-specific learning effects. The same applies to other alternative TTO variants, for instance, when using lag time [18]. Future research into the potential learning effects of LT-TTO and other new valuation methods is called for. To gain a better understanding of the processes behind the observed effects, we believe that the best course of action would be qualitative studies using think-aloud techniques while actively manipulating factors

345 believed to contribute. For instance, we speculate that the learning effects may be caused in part by a delayed reaction to discomfort caused by the continuous invocation of immediate death. If this is the case, altered wording to reduce or increase the discomfort should alter the magnitude of change. To better isolate exhaustion effects, participants could be intentionally exhausted before TTO valuation. We observed much greater learning effects for SWD values than for SBD. It is conceivable that this is related to the SWD task manipulating the time in perfect health and the time in the target state simultaneously. If so, altering the SWD task such that only one factor is manipulated at a time should alter the observed learning effects. Conclusion Health state values are usually elicited in face-to-face interviews, and costs are largely driven by the number of respondents. Therefore, there is an interest in having each respondent value a large number of health states. We have demonstrated the presence of learning effects in the assessments of health states with the TTO method, in particular for SWD, indicating that the number of states valued by each respondent may influence the resulting EQ-5D tariffs. The causes of these effects and implications for the validity of TTO responses remain uncertain. An investigation of potential learning effects from other valuation methods could help gain an understanding of the processes involved. Source of financial support: This study was indirectly financed through two PhD grants from the Norwegian Research Council and from the South-Eastern Norway Regional Health Authority. The financing institutions had no involvement in the study or the study design. REFERENCES [1] Krabbe PF, Essink-Bot ML, Bonsel GL. The comparability and reliability of five health-state valuation methods. Soc Sci Med 1997;45:1641 52. [2] Brazier J, Dolan P. Evidence of preference construction in a comparison of variants of the standard gamble method. 2005. Available from: http://eprints.whiterose.ac.uk/10936/. [Accessed April 6, 2011]. [3] Bateman IJ, Burgess D, Hutchinson WG, Matthews DI. Learning effects in repeated dichotomous choice contingent valuation questions. In: Royal Economics Society 2004 Annual Conference, University of Swansea, 2004. [4] Drummond MF. Methods for the Economic Evaluation of Health Care Programmes. Oxford: Oxford University Press, 2007. [5] Dolan P, Gudex C, Kind P, Williams A. Valuing health states: a comparison of methods. J. Health Econ 1996;209 31. [6] Dolan P. Modeling valuations for EuroQol health states. Med Care 1997; 35:1095 108. [7] Norman R, Cronin P, Viney R, et al. International comparisons in valuing EQ-5D health states: a review and analysis. Value Health 2009; 12:1194 200. [8] Golicki D, Jakubczyk M, Niewada M, et al. Valuation of EQ-5D health states in Poland: first TTO-based social value set in central and eastern Europe. Value Health 2010;13:289 97. [9] Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Med Care 2005;43:203 20. [10] Lamers LM. The transformation of utilities for health states worse than death: consequences for the estimation of EQ-5D value sets. Med Care 2007;45:238 44. [11] Torrance GW. Health states worse than death. In: Third International Conference on System Science in Health Care. Berlin: Springer, 1984. [12] Stalmeier PF, Busschbach JJ, Lamers LM, Krabbe PF. The gap effect: discontinuities of preferences around dead. Health Econ 2005;14: 679 85. [13] Devlin NJ, Tsuchiya A, Buckingham K, Tilling C. A uniform time trade off method for states better and worse than dead: feasibility study of the lead time approach. Health Econ 2011;20:348 61. [14] Slovic P. The construction of preference. Am Psychol 1995;50:364 71. [15] Plott CR. The discovered preference hypothesis. In: Arrow KJ, ed., The Rational Foundations of Economic Behaviour: Proceedings of the IEA Conference Held in Turin, Italy. New York: Macmillan Press, 1996. [16] Poulton EC. Biases in quantitative judgements. Appl Ergon 1982;13: 31 42. [17] Robinson A, Spencer A. Exploring challenges to TTO utilities: valuing states worse than dead. Health Econ 2006;15:393 402. [18] Tilling C, Devlin N, Tsuchiya A, Buckingham K. Protocols for time tradeoff valuations of health states worse than dead: a literature review. Med Decis Making 2010;30:610 9.