Introduction

Health-related quality of life (HRQoL) values for health states can be derived indirectly with scaling models from judgmental tasks such as paired comparisons, discrete choices, and ranking. Judgments can also be elicited directly with specific valuation techniques (e.g., time trade-off, visual analogue scale). Frequently, the values for health states obtained with one of these measurement approaches are multiplied by a duration (in years) to compute quality-adjusted life years (QALYs) [1]. In essence, a QALY is obtained by adjusting the number of years lived in a health state by the value associated with that state. The standard QALY model assumes that health-state values are independent of the duration of the corresponding health state [2, 3]. This assumption implies that, for health states valued as better than dead, a longer duration in a given state would always be preferred to a shorter duration in the same state. Earlier research indicated that this basic assumption may not hold [4–7], suggesting that the value assigned to health states can depend on their duration.
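The multiplication underlying the standard QALY model can be illustrated with a minimal sketch. The function name and the numerical values are hypothetical and serve only as an illustration; they are not taken from this study.

```python
def qalys(value: float, years: float) -> float:
    """Standard QALY model: health-state value multiplied by duration in years."""
    return value * years


# Hypothetical health-state value of 0.8 (0 = dead, 1 = full health)
short = qalys(0.8, 5)
long = qalys(0.8, 10)

# For a state valued as better than dead, the longer duration always
# yields more QALYs under this model.
print(short, long)
```

Because the value is assumed to be constant over time, doubling the duration doubles the QALYs; this is the linearity assumption examined in this paper.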

The present study is a first step in the development of models for discrete choice experiments, using health profiles that vary simultaneously in health and duration. Our aim is to model the value of health states at various durations of life. Unlike the standard QALY model, the models proposed in this paper estimate how health and duration are valued for durations longer than the standard 10 years. We discuss the possible consequences of our findings for the standard QALY model and options for future health-valuation studies.

Methods

Measurement framework

We adopted the technique of discrete choice (DC) analysis. With this technique respondents are presented with scenarios, in which two or more alternative options, here two health profiles with varying duration, are described. In each scenario, the respondents are assumed to choose the option providing the highest benefit or value. From the responses, it is possible to determine the importance of each attribute relative to the other attributes included in the scenario, and the willingness to give up the benefits of one attribute to gain the benefits of another attribute. Also, it is possible to assess the value for the different health profiles, as a function of their attribute levels.

Over the past 15 years, DC analysis has been used to elicit preferences for healthcare interventions and to value health benefits and patient experience factors [8]. More recently, DC modeling has also been considered as a technique for health-state valuation [9–16].

Construction of health profiles

We constructed health states based on the EQ-5D health-state system. The EQ-5D descriptive system comprises five domains (mobility, self-care, usual activities, pain/discomfort, anxiety/depression), each one with three possible levels (no problems, some problems, severe problems). Overall, 243 (3^5) different health states can be described with this system, with 11111 representing the best and 33333 the worst health state [17].
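The 243 states follow directly from combining three levels over five domains, as the following minimal sketch shows. The variable names are chosen for illustration only.

```python
from itertools import product

DOMAINS = ("mobility", "self-care", "usual activities",
           "pain/discomfort", "anxiety/depression")
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = severe problems

# Each state is written as a five-digit string, one digit per domain.
states = ["".join(str(level) for level in combo)
          for combo in product(LEVELS, repeat=len(DOMAINS))]

print(len(states), states[0], states[-1])  # 243 11111 33333
```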

Duration was included as a sixth attribute in the profiles (Fig. 1). To achieve sufficient differentiation for duration and to cover a wide range, six different levels were chosen: 1, 5, 10, 15, 30, and 50 years followed by death. In particular, “1 year” was chosen because it was recognized as the minimum duration feasible and comprehensible in this study; “10 years” is used in conventional time trade-off (TTO) valuations from which QALYs are calculated; “50 years” is about the maximum life expectancy of people aged 20–25 years, who participated in this study. To obtain intermediate estimates and to make comparisons between these durations easier for the respondents, we added the attribute levels of 5, 15, and 30 years.

Fig. 1 Example of a scenario containing two health profiles

A fractional factorial experiment was designed to obtain a total of 60 different pairs of health profiles (Table 1). This design allowed the estimation of models with both main effects and several interaction terms. The 60 pairs were selected using a Bayesian efficient discrete choice experimental design applied to the EQ-5D descriptive system [19].

Table 1 Final set of 60 pairs of EQ-5D health states integrated with life durations

To arrive at an acceptable cognitive burden, each participant received a subset of nine pairs. The nine pairs were randomly selected from the 60 pairs using a bounded randomization procedure to ensure that all pairs were valued a similar number of times, allowing a maximum difference of three valuations per pair.
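The details of the bounded randomization are not given here. One way such a procedure could work is sketched below, under the assumption that a pair is eligible for selection only while its count stays within the allowed gap of the least-valued pair. The function name, the seed, and the eligibility rule are hypothetical.

```python
import random


def assign_pairs(n_participants, n_pairs=60, per_person=9, max_gap=3, seed=0):
    """Randomly assign pairs while keeping valuation counts within max_gap."""
    rng = random.Random(seed)
    counts = [0] * n_pairs          # how often each pair has been assigned
    assignments = []
    for _ in range(n_participants):
        chosen = []
        for _ in range(per_person):
            floor = min(counts)
            # Eligible: not yet shown to this participant, and one more
            # valuation keeps the pair within max_gap of the minimum.
            eligible = [p for p in range(n_pairs)
                        if p not in chosen and counts[p] - floor < max_gap]
            pick = rng.choice(eligible)
            counts[pick] += 1
            chosen.append(pick)
        assignments.append(chosen)
    return assignments, counts


assignments, counts = assign_pairs(208)
print(min(counts), max(counts))
```

With this rule the difference between the most and least frequently valued pair never exceeds `max_gap`, while the selection within the eligible set remains random.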

This study was conducted as part of a larger project within the EuroQoL Group, in which the main objective was to explore the application of DC analysis to derive values for health states described with the EQ-5D instrument, and to compare these with values obtained using the TTO approach [14].

The Bayesian approach was adopted in the main study to improve the efficiency of the experimental design, allowing for more precise estimates. In particular, we applied an iterative procedure (nested Monte Carlo simulation) with a computer algorithm [18]. Using this procedure, 2,000 possible fractional experimental designs were randomly selected from the full design. These 2,000 designs were compared on their D-errors, which were computed on the basis of expected values of the main effects model, and the most efficient design was selected. More details about the algorithm and procedure used can be found in the paper by Stolk et al. [14]. As priors for the main effects estimates, we included the weighted average of estimates obtained from previous studies using the TTO technique [9–16].
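The comparison of candidate designs on their D-errors can be illustrated with a simplified sketch. It is not the nested Monte Carlo procedure used in the main study: it assumes fixed point priors rather than prior distributions, a binary logit model, and the common definition of the D-error as the determinant of the inverse information matrix raised to the power 1/K. All function names and the example data are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def d_error(pairs, priors):
    """D-error of a paired-choice design under assumed point priors.

    pairs: list of (x_a, x_b) attribute vectors for the two profiles.
    """
    k = len(priors)
    info = [[0.0] * k for _ in range(k)]
    for x_a, x_b in pairs:
        diff = [a - b for a, b in zip(x_a, x_b)]
        v = sum(beta * d for beta, d in zip(priors, diff))
        p = 1.0 / (1.0 + math.exp(-v))   # probability of choosing profile A
        weight = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += weight * diff[i] * diff[j]
    det = determinant(info)
    return math.inf if det <= 0.0 else det ** (-1.0 / k)


# Two toy candidate designs with two attributes and zero priors.
design_1 = [([1, 0], [0, 0]), ([0, 1], [0, 0])]
design_2 = [([1, 0], [0, 0]), ([0, 1], [0, 0]), ([1, 1], [0, 0])]
best = min([design_1, design_2], key=lambda d: d_error(d, [0.0, 0.0]))
```

A lower D-error indicates a more efficient design, so the candidate with the smallest value is retained.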

Because no prior information for duration was available, levels for duration were included manually in the final design: levels for duration were balanced among the pairs (level balance), and were always different between the health profiles described within each scenario, following the minimum overlap criterion [22] (Table 1).

Subjects

A convenience sample of university students from the Erasmus University in Rotterdam, the Netherlands, was recruited. University students were chosen for practical reasons, including that it is realistic for young respondents to consider scenarios lasting as long as 50 years.

Procedure

Students were gathered into a group, given online exercises to complete, and assisted by a researcher if needed. For their participation in the study the students were offered €20. No ethical approval was required for this study.

Modeling

The results of three models are presented. The first model mimics the QALY model. The second model has many more parameters, to arrive at a flexible description of the preferences for duration. The third model is similar to the first model but uses a logarithmic function of duration to describe non-linear preferences for duration. Conditional logistic regression models were applied to analyze the DC data [23], using STATA software (v.11.0, routine clogit). These models belong to the group of probabilistic choice models and are embedded in random utility theory [24]. From these models, the value assigned to an option and its attribute level can be estimated with a linear, additive function of the attribute levels included in the health profiles.
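For a scenario with two health profiles, the conditional logit model reduces to a logistic function of the difference in values. The following minimal sketch illustrates this relationship; the function name is hypothetical and the code is not the Stata routine used in the analysis.

```python
import math


def choice_probability(v_a: float, v_b: float) -> float:
    """Probability of choosing profile A over profile B in a paired choice."""
    return 1.0 / (1.0 + math.exp(-(v_a - v_b)))


# Equal values imply indifference; a higher value for A raises its probability.
print(choice_probability(0.0, 0.0))  # 0.5
```

Only the difference between the two values matters, which is why the scale of V needs a reference profile to be interpretable.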

We started to analyze the data with model 1, which is intended to reflect the standard QALY model as closely as possible:

$$\begin{aligned} V = &\beta_{0} + \beta_{1} X_{\text{MO2}} + \beta_{2} X_{\text{MO3}} + \beta_{3} X_{\text{SC2}} + \beta_{4} X_{\text{SC3}} + \beta_{5} X_{\text{UA2}} + \beta_{6} X_{\text{UA3}} + \beta_{7} X_{\text{PD2}} + \beta_{8} X_{\text{PD3}} \\ & + \, \beta_{9} X_{\text{AD2}} + \beta_{10} X_{\text{AD3}} + \beta_{11} X_{\text{years}} + \beta_{12} \left( {X_{\text{MO2}} \times X_{\text{years}} } \right) + \beta_{13} \left( {X_{\text{MO3}} \times X_{\text{years}} } \right) \\ & + \beta_{14} \left( {X_{\text{SC2}} \times X_{\text{years}} } \right) + \beta_{15} \left( {X_{\text{SC3}} \times X_{\text{years}} } \right) + \beta_{16} \left( {X_{\text{UA2}} \times X_{\text{years}} } \right) + \beta_{17} \left( {X_{\text{UA3}} \times X_{\text{years}} } \right) \\ & + \beta_{18} \left( {X_{\text{PD2}} \times X_{\text{years}} } \right) + \beta_{19} \left( {X_{\text{PD3}} \times X_{\text{years}} } \right) + \beta_{20} \left( {X_{\text{AD2}} \times X_{\text{years}} } \right) + \beta_{21} \left( {X_{\text{AD3}} \times X_{\text{years}} } \right) \\ \end{aligned}$$
(1)

In model 1, V represents the value assigned to each health profile. The X terms represent the different attribute levels. The EQ-5D attribute levels are included as dummy variables. Each EQ-5D domain has three levels (no problems, some problems, and severe problems); therefore, two dummy variables are included per EQ-5D domain, using “no problems” as the reference. Hence, for each EQ-5D domain, we estimated the value assigned to having some problems (labeled as MO2 for mobility, SC2 for self-care, UA2 for usual activities, PD2 for pain/discomfort, AD2 for anxiety/depression) and to having severe problems (labeled as MO3 for mobility, SC3 for self-care, UA3 for usual activities, PD3 for pain/discomfort, AD3 for anxiety/depression), relative to having no problems. Duration is included in the model as a continuous linear variable to estimate the mean value assigned to each year of duration, ranging from 1 to 50 years. As regards the regression coefficients (β), the constant β0 was initially included to check for the presence of systematic effects on choices, such as a tendency to always choose the health profile on the same side (e.g., the left) [25]; β1–β11 represent the weights estimated for the main effects of the attribute levels and duration, while β12–β21 represent the effects of the interactions between the EQ-5D attribute levels and duration.
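The dummy coding and the structure of model 1 can be sketched as follows. The coefficient values are hypothetical placeholders, not the estimates of this study, the constant is omitted, and the function names are chosen for illustration.

```python
DOMAINS = ("MO", "SC", "UA", "PD", "AD")


def dummies(state: str) -> dict:
    """Map an EQ-5D state such as '12311' to the ten level-2/level-3 dummies."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError("expected five digits, each 1, 2 or 3")
    coded = {}
    for domain, ch in zip(DOMAINS, state):
        coded[f"{domain}2"] = 1 if ch == "2" else 0
        coded[f"{domain}3"] = 1 if ch == "3" else 0
    return coded


def value_model1(state, years, main, b_years, interaction):
    """V = main effects + duration effect + (attribute level x years) terms."""
    v = b_years * years
    for name, flag in dummies(state).items():
        v += flag * (main[name] + interaction[name] * years)
    return v


# Hypothetical coefficients, for illustration only.
names = [f"{d}{lvl}" for d in DOMAINS for lvl in (2, 3)]
main = {name: -0.2 for name in names}
interaction = {name: -0.01 for name in names}

print(value_model1("11111", 10, main, 0.05, interaction))
print(value_model1("21111", 10, main, 0.05, interaction))
```

With "no problems" as the reference level, state 11111 activates no dummy variables, so its value depends on duration alone.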

Model 1 is a multiplicative model since it includes first-order interaction terms between each EQ-5D attribute level and duration. Though we would have preferred to leave out the main effects in model 1, in order to mimic the multiplicative QALY model as closely as possible, this was not possible for statistical reasons. Main effects must remain in a multiplicative model in order to avoid biases in the estimates of the main effect and interaction coefficients.

To understand how duration is modeled for different health states, we created a second model in which duration was included as a categorical variable using the six levels presented to the respondents. In order to obtain a model able to estimate interactions between every EQ-5D attribute and duration, the duration levels were collapsed from 6 to 4, and then combined with the EQ-5D attribute levels (model 2). Finally, we estimated models in which several continuous non-linear functions of duration were tested, such as square root or logarithmic functions. The most efficient model (model 3) is presented. The statistical efficiency of the models was investigated using (1) goodness-of-fit measures, which assess the correspondence of the model with the observations, and (2) the parsimony criterion, which focuses on the minimum number of variables necessary to obtain reliable estimates. In particular, we calculated the pseudo (McFadden) R² adjusted for the number of parameters included in the model, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).
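The three fit statistics can be computed from the model log-likelihood with their usual textbook definitions, as in the minimal sketch below. The log-likelihood value in the example is hypothetical, and the use of the number of choices as the sample size in the BIC is an assumption; software packages may define it differently.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik


def adjusted_mcfadden_r2(loglik: float, null_loglik: float, k: int) -> float:
    """McFadden pseudo R-squared penalized for the number of parameters."""
    return 1 - (loglik - k) / null_loglik


def null_loglik_binary(n_choices: int) -> float:
    """Log-likelihood when both options in each pair are equally likely."""
    return n_choices * math.log(0.5)


# Hypothetical log-likelihood for a model with 21 parameters and 1,872 choices.
ll, k, n = -1000.0, 21, 1872
print(aic(ll, k), bic(ll, k, n), adjusted_mcfadden_r2(ll, null_loglik_binary(n), k))
```

Lower AIC and BIC values and a higher adjusted pseudo R² indicate a better trade-off between fit and parsimony.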

The Wald test was applied to test whether the βs were significantly different from 0 at a two-tailed p value <0.05.
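For a single coefficient, the Wald test compares the ratio of the estimate to its standard error with a standard normal distribution. A minimal sketch follows, with a hypothetical function name and hypothetical input values.

```python
import math


def wald_p_value(beta: float, se: float) -> float:
    """Two-tailed p value for H0: beta = 0, using the normal approximation."""
    z = beta / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimate and standard error.
p = wald_p_value(-0.30, 0.10)
print(p < 0.05)  # True
```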

We checked how many respondents showed ‘lexicographic’ preferences; i.e., whether they always chose the option with the longer time duration. Additional analyses were done for the models while excluding such respondents.
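This check can be expressed compactly. The sketch below assumes that each respondent's choices are stored as pairs of durations (chosen profile, rejected profile); this data layout and the function name are hypothetical.

```python
def is_lexicographic(choices) -> bool:
    """True if the chosen profile had the longer duration in every scenario.

    choices: list of (years_chosen, years_rejected) tuples for one respondent.
    """
    return all(chosen > rejected for chosen, rejected in choices)


# Hypothetical respondents: the first always picks the longer duration.
respondent_a = [(50, 10), (15, 5), (30, 1)]
respondent_b = [(50, 10), (5, 15), (30, 1)]
print(is_lexicographic(respondent_a), is_lexicographic(respondent_b))  # True False
```

Because durations always differed between the two profiles in a scenario, a strict comparison is sufficient.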

From the estimates obtained from the selected models, trends were calculated (shown as curves) for a sample of 15 possible EQ-5D health states: best health (11111), 13 mild, moderate and severe states (11113, 11131, 11133, 11312, 13311, 31311, 12333, 23232, 32211, 32223, 32313, 31333, 33323), and the worst state (33333).

To scale the values, the health profile (11111, 1 year) was chosen as the reference profile. In other words, V equals 0 for the health profile (11111, 1 year) because the reference categories are level 1 (no problems) for the EQ-5D domains and 1 year for duration. V < 0 is therefore interpreted as less preferred than living in (11111, 1 year), and V > 0 is interpreted as more preferred than living in (11111, 1 year). Thus, values obtained with the proposed models do not refer to values anchored between 0, for dead, and 1, for full health.

Results

Subjects

Between June and July 2008, choice data were obtained from 208 of the 209 students enrolled in the study. Data from one respondent were erroneously not saved. Accordingly, a total of 1,872 observations (choices) were obtained. The participants were aged from 17 to 39 years (mean ± SD = 22.7 ± 3.5), and 30.3 % were male. Twelve percent of the participants were Bachelor's students, while 88 % were Master's students. Each pair of health profiles was answered by a minimum of 21 and a maximum of 40 respondents.

Model selection

Sixteen respondents (7.7 %) showed lexicographic preferences for the health profiles, always preferring the longer time durations. The remaining respondents chose a health profile with a shorter time duration at least once. Excluding the respondents with lexicographic preferences from the analyses did not alter the estimated parameters. Therefore, the results obtained from the full study sample are shown and discussed.

In the models analyzed, the constant β0 was not statistically significant, meaning that there was no systematic preference for the right or left health profile. Therefore, β0 was excluded from the models below.

As regards the model in which duration is included with categorical dummy variables, model 2 (Table 2), some duration levels were collapsed to obtain 4 levels: 1 year (reference level), 5 and 10 years collapsed, 15 and 30 years collapsed, and 50 years.
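The collapsing of the six duration levels into four categories, with 1 year as the reference, can be sketched as follows. The category labels and the function name are hypothetical.

```python
# Mapping of the six presented durations to the four collapsed categories.
COLLAPSED_LEVEL = {1: "1", 5: "5-10", 10: "5-10", 15: "15-30", 30: "15-30", 50: "50"}
NON_REFERENCE = ("5-10", "15-30", "50")  # "1" (1 year) is the reference level


def duration_dummies(years: int) -> dict:
    """Dummy variables for the collapsed duration categories."""
    level = COLLAPSED_LEVEL[years]  # raises KeyError for durations outside the design
    return {f"D{name}": int(level == name) for name in NON_REFERENCE}


print(duration_dummies(1))
print(duration_dummies(30))
```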

Table 2 Results of model 1 (including duration as a linear variable), model 2 (in which duration is included as a categorical variable) and model 3 (including a logarithmic function for duration)

Estimates from models 1, 2 and 3 are shown in Table 2. In the three models, a number of estimates for the EQ-5D attribute levels are positive, and in models 1 and 3 some of these are statistically significant. Main effects estimates for duration are positive and significant in all three models. Estimates of interaction terms are negative in most cases and, in models 1 and 3, are often statistically significant.

Relationships between quality and quantity of life on individuals’ preferences

Figures 2a, b and c were generated to show the values for each health state at the different time durations. Model 1, in which duration was linear (similar to the standard QALY model), produced straight diverging lines (see Fig. 2a). The estimates from model 2, with different weights for the 4 collapsed levels of duration, showed non-linear trends for each health state (Fig. 2b). Some states were represented by increasing curves, for instance the state 11111. Some states, e.g. 23232, were initially increasing and then decreasing for longer durations. State 33333 is represented by a decreasing curve. Turning to the logarithmic model 3 in Fig. 2c, curves with a decreasing slope for longer durations, i.e., with a decreasing marginal utility (or disutility), were found for states better and worse than dead. Although there are clear differences between Figs. 2b and c, the more flexible model 2 partly resembles the logarithmic model 3, supporting the logarithmic model.
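The decreasing marginal effect implied by a logarithmic function of duration can be illustrated numerically. The sketch below uses the natural logarithm, which is an assumption about the base; the function name is hypothetical.

```python
import math


def log_gain(start_years: float, end_years: float) -> float:
    """Increase in ln(duration) when duration grows from start to end."""
    return math.log(end_years) - math.log(start_years)


early = log_gain(1, 10)   # 9 additional years
late = log_gain(10, 50)   # 40 additional years
print(round(early, 3), round(late, 3))  # 2.303 1.609
```

Under a logarithmic function, the 9 years added between 1 and 10 years therefore carry more weight than the 40 years added between 10 and 50 years, whereas a linear function would weight each added year equally.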

Fig. 2 Value estimates from model 1 (a), from model 2 (b) and from model 3 (c). V estimated for the state 11111 lived for 1 year is 0 (reference health profile). V < 0 is interpreted as less preferred than living in the state 11111 for 1 year (gray area), V > 0 is interpreted as more preferred than living in the state 11111 for 1 year (white area)

The performance values in Table 2 suggest that values are non-linearly related to duration. This is shown by the poorer performance of model 1 compared to model 3, both having 21 parameters. Model 3 outperforms model 1 on all goodness of fit tests. Specifically, the AIC value for model 3 (AIC = 1,990.84) is much better than that of model 1 (AIC = 2,047.99).
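The size of this difference can be made explicit with a short calculation, assuming the usual definition AIC = 2k − 2 ln L, with k the number of parameters and ln L the model log-likelihood:

$$\Delta \text{AIC} = \text{AIC}_{1} - \text{AIC}_{3} = 2{,}047.99 - 1{,}990.84 = 57.15$$

Because both models have k = 21 parameters, the penalty terms cancel and the difference reflects the log-likelihoods only:

$$\ln L_{3} - \ln L_{1} = \frac{\Delta \text{AIC}}{2} = \frac{57.15}{2} \approx 28.58$$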

Discussion

With the present discrete choice (DC) study we developed a model for values assigned by individuals to health states with various durations. The model gives insights on the likely trend of these values for life durations lived up to 50 years, which corresponds to the life expectancy of the participants. The most important finding is the clear evidence for non-linear values for duration. In particular, values for duration were best summarized using a logarithmic function. This finding is novel in the context of DC experiments comparing health states with various durations.

Recently, Bansback and colleagues [16] published a discrete choice experiment which included a combination of the EQ-5D domain levels and a life-years attribute, in order to compare its performance with values obtained with the time trade-off method. They found that discrete choices are promising as a stand-alone method for producing values amenable for QALY calculations. The authors included in their design durations up to 10 years, and modeled duration as linear. In contrast, we included longer durations up to 50 years and found evidence for non-linear values for duration.

We observed multiplicative terms between duration and the EQ-5D levels, which is in accordance with the multiplication of quality and duration in the QALY model. This agrees with findings by other researchers [4–7, 16] that the value of a health profile depends on the time spent in a certain health state. However, in the standard QALY model, as applied in decision models, the value for different states is estimated on durations of up to 10 years under the assumption of linearity. Then, QALYs are calculated using a linear extrapolation to longer durations. Instead, our results suggest that the standard QALY model may need adjustment, at least when values for health states with longer durations are calculated. Furthermore, our results suggest that non-linear trends might also be present for durations shorter than 10 years.

The results of our study can also be placed in the perspective of the maximum endurable time (MET) phenomenon [6, 26–28]. A study [29] examined the subjects’ preferences for health states of different time durations compared with immediate death. It was shown that some health states, especially moderate and severe ones, are considered better than death for a short duration, but worse than death for longer durations. Similar patterns also appeared to be present in our data.

We certainly do not claim to be the first to establish the non-linear relationship for time using DC in general. Such results have been found previously in the field of inter-temporal choice [30], and in the health domain by comparing health states with different time delays [31]. However, our results are novel in the context of DC experiments which compare health states with various durations, aiming to arrive at tariffs for health economic evaluations.

A limitation of the present study is that the design of the experiment that included duration could have been more efficient. Furthermore, the small sample size could have reduced opportunities for finding significant main effects for the attributes other than time. The sample consisted of university students only, who represent a highly selected part of the general population. The likely consequence of these limitations is that results are less precise, yielding estimates with unexpected signs or low statistical significance. On the other hand, an advantage of the sample used was the possibility of investigating values for durations up to 50 years. Additional studies should be designed and conducted to obtain more reliable results, applicable to a wider population.

To conclude, this study gives clear evidence that values for health states are non-linearly related to duration. These results were obtained in a discrete choice experiment, in which health profiles were presented with durations up to 50 years. This research suggests that refinement of the standard QALY framework may be needed and that further research is required to obtain more precise results and to investigate them in a broader population.