Measurement properties of EQ-5D-3L and EQ-5D-5L in recording self-reported health status in older patients with substantial multimorbidity and polypharmacy

Bhadhuri, Arjun; Kind, Paul; Salari, Paola; Jungo, Katharina Tabea; Boland, Benoît; Byrne, Stephen; Hossmann, Stefanie; Dalleur, Olivia; Knol, Wilma; Moutzouri, Elisavet; O’Mahony, Denis; Murphy, Kevin D.; Wisselink, Linda; Rodondi, Nicolas; Schwenkglenks, Matthias

doi:10.1186/s12955-020-01564-0

Research
Open access
Published: 29 September 2020

Arjun Bhadhuri ORCID: orcid.org/0000-0003-1220-0731¹,
Paul Kind²,
Paola Salari¹,
Katharina Tabea Jungo³,
Benoît Boland⁴,
Stephen Byrne⁵,
Stefanie Hossmann⁸,
Olivia Dalleur⁴,
Wilma Knol⁶,
Elisavet Moutzouri³,⁷,
Denis O’Mahony⁹,
Kevin D. Murphy⁵,
Linda Wisselink⁶,
Nicolas Rodondi³,⁷ &
…
Matthias Schwenkglenks¹

Health and Quality of Life Outcomes volume 18, Article number: 317 (2020)

Abstract

Background

The EQ-5D-3L and EQ-5D-5L are two generic health-related quality of life measures, which may be used in clinical and health economic research. They measure impairment in 5 aspects of health: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The aim of this study was to assess the performance of the EQ-5D-3L and EQ-5D-5L in measuring the self-reported health status of older patients with substantial multimorbidity and associated polypharmacy.

Methods

Between 2017 and 2019, we administered EQ-5D-3L and EQ-5D-5L to a subset of patients participating in the OPERAM trial at 6 months and 12 months after enrolment. The OPERAM trial is a two-arm multinational cluster randomised controlled trial of structured medication review assisted by a software-based decision support system versus usual pharmaceutical care, for older people (aged ≥ 70 years) with multimorbidity and polypharmacy. In the psychometric analyses, we only included participants who completed the measures in full at 6 and 12 months. We assessed whether responses to the measures were consistent by calculating the proportion of EQ-5D-5L responses that were 2 or more levels away from that person’s EQ-5D-3L response. We also compared the measures in terms of informativity, and discriminant validity and responsiveness relative to the Barthel Index, which measures independence in activities of daily living.

Results

224 patients (mean age of 77 years; 56% male) were included in the psychometric analyses. Ceiling effects reported with the EQ-5D-5L (22%) were lower than with the EQ-5D-3L (29%). For the mobility item, the EQ-5D-5L demonstrated better informativity (Shannon’s evenness index score of 0.86) than the EQ-5D-3L (Shannon’s evenness index score of 0.69). Both the 3L and 5L versions of EQ-5D demonstrated good performance in terms of discriminant validity, i.e. out of all items of the EQ-5D-3L and EQ-5D-5L, the pain/discomfort and anxiety/depression items had the weakest correlation with the Barthel Index. Both the 3L and 5L versions of EQ-5D demonstrated good responsiveness to changes in the Barthel Index.

Conclusion

Both EQ-5D-3L and EQ-5D-5L demonstrated validity and responsiveness when administered to older adults with substantial multimorbidity and polypharmacy who were able to complete the measures.

Introduction

Economic evaluations in health care involve the comparison of the costs and the benefits of different health technologies [1]. Cost-effectiveness analysis is a widely accepted form of economic evaluation. Cost-utility analysis (CUA) is a specific form of cost-effectiveness analysis in which the benefits of health technologies are measured in terms of quality adjusted life years (QALYs) [1]. The QALY is a composite measure of both quantity and quality of life.

EQ-5D is a generic measure of health-related quality of life (HrQoL) which can be used in clinical and economic studies, and is the recommended measure in National Institute for Health and Care Excellence (NICE) guidelines for calculating QALYs in cost-utility analysis in England and Wales [2]. EQ-5D consists of 5 dimensions of health, i.e. mobility, self-care, usual activities, pain/discomfort and anxiety/depression [3]. It also includes a visual analogue scale (EQ-VAS), which asks participants to rate their overall health on a scale from 0 to 100. In the original version of the EQ-5D (EQ-5D-3L), each dimension of health included 3 answer options (levels) to measure whether participants were experiencing no problems, some/moderate problems or severe/extreme problems [3]. However, there were concerns that 3 levels were too broad, so that the EQ-5D-3L offered only limited information on the degree to which respondents’ health was impaired, and was also less sensitive to changes in respondents’ health status over time. As a result, a 5-level version of the EQ-5D (EQ-5D-5L) was developed and introduced in 2009 to address these concerns by providing two additional levels for each dimension, enabling a more nuanced profile of an individual’s health status to be elicited. In the EQ-5D-5L, each dimension of health includes 5 levels to measure whether participants are experiencing no problems, slight problems, moderate problems, severe problems, or extreme problems/being unable to perform the activity [4]. Henceforth in this article, we refer to the EQ-5D-3L and EQ-5D-5L instruments as 3L and 5L respectively.

For the purposes of economic evaluation, EQ-5D responses can be converted into a single index summary score based on questionnaire responses to the 5 dimensions of health by using a valuation algorithm based on the social preferences of the general population. Such valuation algorithms are country-specific and are currently available for a number of countries.

The measurement properties of any HrQoL instrument, such as distributional properties, consistency, reliability and validity, should be evaluated in order to assess its appropriateness for use in a specific patient population [5]. A measurement instrument exhibits good distributional properties if ceiling and floor effects are low (so that responses are not concentrated within the highest and/or lowest levels of an instrument). The 3L and 5L demonstrate consistency with each other if participants’ responses to the 5L match the corresponding levels of the 3L when both measures are administered at the same time point [6]. Reliability analysis assesses the ability of an instrument to provide reproducible measurements, whereas validity analysis involves assessing the extent to which an instrument measures what it purports to measure [7]. Convergent (and discriminant) validity and responsiveness are two types of validity analysis. An instrument exhibits convergent validity if it is highly correlated with a related instrument, whereas an instrument exhibits discriminant validity if it has a comparatively low correlation with an unrelated instrument [5].

Responsiveness may be described as ‘longitudinal validity’, and assesses the degree to which an instrument is able to respond to a meaningful or clinically important external change over time [5, 8]. Given the most common function of the EQ-5D is to detect changes in HrQoL over time in clinical trials, it is particularly important to evaluate the responsiveness of EQ-5D. An anchor-based analysis may be performed to assess responsiveness. The objective of an anchor-based analysis is to assess whether scores on the measure of interest (i.e. 3L or 5L) change in the expected direction when compared with changes in the scores of a related construct or measure (the ‘anchor’ measure) [9, 10]. For an anchor-based responsiveness analysis to be undertaken, it is necessary that the anchor measure is responsive in the study population.

We are aware of two previous studies which have compared the 3L and 5L versions of EQ-5D for people of any age with multimorbidity (defined in these studies as ≥ 2 chronic conditions) [11, 12]. To our knowledge, our study is the first to examine the responsiveness of the 3L or 5L in a population with substantial multimorbidity (our sample presented with a mean of 11.5 chronic conditions upon entry into our study) and polypharmacy (defined for this study as 5 or more different regular drugs for more than 30 days). The definition of multimorbidity used in this study was based on the inclusion criteria for the underlying clinical trial that provided the data for the present study (presence of ≥ 3 concurrent chronic conditions), and is stricter than the definition of multimorbidity which is typically used in the clinical field (presence of ≥ 2 concurrent chronic conditions) [13, 14]. Studying the measurement properties of the 3L and 5L in this population is of significant interest, because this population is increasing in prevalence over time [15]. Our study is also the first head-to-head study we are aware of (i.e. the same individual completing both the 3L and 5L) that has been undertaken for this population. Many studies have compared the measurement properties of the 3L and 5L versions of EQ-5D across other populations [16]. Most of these studies showed that 5L responses are highly consistent with 3L responses, and that the 5L performs better than the 3L in terms of reduced ceiling effects and better informativity [16,17,18]. Ceiling effects occur when a high proportion of subjects have maximum scores on the measurement of interest. A smaller number of studies, which have applied modern test theory through Rasch analysis, have also indicated improved performance of the 5L compared to the 3L in terms of demonstrating greater sensitivity [19, 20]. Furthermore, we are aware of only six studies comparing the responsiveness of the 5L and 3L. Of these, three studies found that the 5L was more responsive than the 3L [21,22,23], two found that the measures exhibited similar responsiveness [24, 25], and one study of 112 stroke patients indicated that the 3L was more responsive than the 5L [26].

The main objectives of this study were to:

(a) Assess discriminant validity, informativity and responsiveness of the 3L and 5L versions of EQ-5D in an older adult population with substantial multimorbidity and polypharmacy.
(b) Assess consistency of the 3L and 5L in an older adult population with substantial multimorbidity and polypharmacy. Consistency involves assessing the extent to which responses based on 3L correspond to those based on 5L.

Methods

Data collection

The OPERAM clinical trial is a two-arm, cluster-randomised controlled trial of a structured medication review assisted by a software-based decision support system versus usual care, funded by the European Union Horizon 2020 programme (trial identifier: NCT02986425) [14]. The trial was conducted in four centres in Belgium, Ireland, the Netherlands, and Switzerland with a follow-up period of 12 months. The trial participants were 2,008 people aged 70 years or above with both multimorbidity (experiencing 3 or more chronic conditions concurrently) and polypharmacy (5 or more different regular drugs) [14]. The trial intervention was based on the so-called ‘Systematic Tool to Reduce Inappropriate Prescribing’ (STRIP) assistant, which is deployed using a clinical decision support system [27]. STRIP is a structured method for performing customised medication reviews and detecting potentially inappropriate prescribing [14], based on STOPP/START version 2 criteria for potentially inappropriate medications (STOPP) and potential prescribing omissions (START) [28]. At baseline, the OPERAM trial participants had a mean age of 80 years, 55% were male, 24% were university educated, and they presented with a mean of 11.0 comorbidities. The chronic comorbid conditions most commonly reported by OPERAM trial participants at baseline were hypertension (n = 1309; 65%), hypercholesterolemia (n = 725; 36%) and atrial fibrillation (n = 724; 36%). The OPERAM trial had broad eligibility criteria to improve representativeness for the population of interest, and external validity.

During the 6 month and 12 month participant telephone interviews, which took place between 2017 and 2019, all trial participants were asked to verbally complete questionnaires read out to them by the trial primary researcher, to elicit a range of trial outcomes (including the Barthel Index, EQ-VAS and EQ-5D-5L). The EQ-5D-5L was included in the OPERAM trial as part of a pre-planned, within-trial health economic analysis, but also to assess HrQoL clinically [14] (primarily by using the EQ-VAS). In addition, after initial trial recruitment had been completed, the 3L questionnaire was administered in the same way to a subset of participants of the OPERAM trial at the 6 month and 12 month follow-up time points. We chose these time points to collect EQ-5D data for assessment of responsiveness, as we judged that a 6 month interval was sufficient time for clinically important changes in patients’ health to occur. We did not use a longer time period, as it would have increased the amount of missing data due to the substantial mortality rate in the target population. Based on the standard operating procedure for the administration of trial questionnaires, the 3L was administered at the end of the telephone interview. For patients with potential difficulty in concentrating, the 5L was the first questionnaire administered, followed in sequence by (1) the Morisky Medication Adherence Scale (MMAS-8), (2) the Barthel Index and (3) the Beliefs about Medicines Questionnaire (BMQ).

In the course of the 6 month follow-up interviews, patients were consecutively added to the present study until a maximum of 75 participants was reached for each country [implying a planned maximum of 300 participants in total; 300 being comparable to the sample size of other responsiveness studies comparing 3L and 5L [16]]. In this study nested within a multinational clinical trial, we used the combined sample across all countries to ensure sufficient statistical power. However, for the responsiveness analyses, we also carried out subgroup analyses at the country level to check for potential differences in the responsiveness of the instruments between countries. Questionnaires were completed by patients, or by proxies on behalf of the patient, usually a family member or other responsible individual [i.e. nursing home employee (if applicable) or the patient’s GP [14]], if the patient presented with cognitive impairment or was otherwise unable to respond. However, our present analysis was restricted to participants who self-completed the EQ-5D measures at 6 and 12 months. It was considered necessary to remove participants for whom a proxy EQ-5D report was obtained, as they were shown to have a markedly different health profile compared to participants who self-completed all EQ-5Ds, reflected by a statistically significantly lower 6 month Barthel Index score (i.e. greater impairments in activities of daily living; p < 0.001). Another reason for removing proxy EQ-5D responses was that these can diverge from self-completed EQ-5D responses [29]. The inclusion of the proxy responses might have led to a situation where observed differences between 3L and 5L were partially driven by the proxy responses, with no adequate means of distinguishing this effect. Therefore, we regarded it as more appropriate to focus on the responses directly provided by patients. Ethical approval for the study was obtained at the four OPERAM clinical sites.

Calculation of EQ-5D scores

To be consistent and because no equivalent value set exists for Switzerland, we used German EQ-5D value sets for all analyses. The German time trade-off value set was used to calculate 3L scores (utilities) [30], and the German crosswalk algorithm was used to calculate 5L scores [31]. The German crosswalk algorithm maps 5L responses onto the German 3L value set to calculate 5L scores.

Statistical analysis

All psychometric analyses were restricted to participants who self-completed all items of the 3L and 5L instruments at 6 months and 12 months.

Descriptive statistics

Descriptive statistics were calculated for the study sample, including participant characteristics and the distribution of participants across all of the levels and dimensions of the 3L and 5L at baseline (6 month responses) and follow-up (12 month responses) [26]. Volume and patterns of missing data for the 3L and the 5L were assessed. We also calculated correlation coefficients between the 3L and 5L index scores, between the 3L and VAS, and between the 5L and VAS. A very high correlation between the 3L and 5L index scores might indicate the instruments produce similar results and imply that they could be used interchangeably [32].

Consistency and redistribution properties

The consistency of the EQ-5D at 6 months (i.e. first measurement time point) was evaluated by cross-tabulating within-participant 3L and 5L responses. An inconsistent response was defined as a 5L response that was two or more levels away from the corresponding 3L response [6]. For example, an inconsistent response would be established for a participant who reported level 1 (no problems) using 3L but reported level 3 (moderate problems) or worse for the same dimension using 5L. An exception to this rule was made for the mobility item. Here, we treated as consistent the responses of participants who reported some problems in walking about with the 3L and being unable to walk about with the 5L. This is because the 3L mobility item is categorised into a person having “no problems in walking about”, “some problems in walking about” and being “confined to bed”. Patients who report being unable to walk about with the 5L may not necessarily be confined to bed, and may therefore logically report having “some problems in walking about” with the 3L.

The proportions of inconsistent responses for each of the dimensions were computed. For consistent responses, the redistribution properties of the 5L were also assessed in the cross-tabulation. For example, we were able to assess the redistribution of participants who reported ‘some problems’ for a 3L dimension, across the ‘slight problems’, ‘moderate problems’ and ‘severe problems’ levels of the corresponding 5L dimension.
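The consistency rule described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes that 3L levels are placed on the 5L scale as 1 → 1, 2 → 3 and 3 → 5 before the distance is taken, which reproduces the worked example in the text (3L level 1 with 5L level 3 is inconsistent) but is not spelled out in the article. The function names and the dimension label "mobility" are hypothetical.

```python
# Assumed placement of 3L levels on the 5L scale (1 -> 1, 2 -> 3, 3 -> 5).
# This mapping is an assumption made for illustration.
THREE_TO_FIVE = {1: 1, 2: 3, 3: 5}


def is_inconsistent(dimension, level_3l, level_5l):
    """Return True if the 5L response is two or more levels from the 3L response."""
    # Mobility exception from the text: "some problems" on the 3L paired with
    # "unable to walk about" on the 5L is treated as consistent.
    if dimension == "mobility" and level_3l == 2 and level_5l == 5:
        return False
    return abs(THREE_TO_FIVE[level_3l] - level_5l) >= 2


def inconsistency_rate(pairs, dimension):
    """Proportion of (3L level, 5L level) pairs flagged as inconsistent."""
    flags = [is_inconsistent(dimension, l3, l5) for l3, l5 in pairs]
    return sum(flags) / len(flags)
```

Applied to one dimension at a time, `inconsistency_rate` gives the proportion reported per dimension; the pairs that are not flagged are the ones for which redistribution across 5L levels would then be tabulated.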

Ceiling effects

The proportion of participants who reported ‘no problems’ for each dimension of the 3L and the 5L was assessed. We also examined the proportion of participants who reported no problems for all dimensions of 3L and 5L (i.e. index scores of 1). McNemar’s test was used to test whether there were statistically significant differences in ceiling effects between the measures for each dimension [33]. A previous study of the general German population found that approximately 39% of respondents aged 70–79 years and 7.6% of respondents aged 80+ years reported ‘no problems’ for all 5 items of the EQ-5D-5L [34].
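As a rough illustration of the ceiling-effect comparison, the sketch below computes the share of level 1 responses on each instrument and a two-sided exact McNemar p-value for the paired ceiling indicators. The article does not state whether the exact or the chi-squared form of McNemar's test was used, so the exact binomial form here is an assumption, and the function names are hypothetical.

```python
from math import comb


def ceiling_proportion(levels):
    """Share of responses at level 1 ('no problems')."""
    return sum(1 for level in levels if level == 1) / len(levels)


def mcnemar_exact(levels_3l, levels_5l):
    """Two-sided exact McNemar p-value for paired ceiling indicators."""
    # Discordant pairs: the participant is at the ceiling on one instrument only.
    only_3l = sum(1 for a, b in zip(levels_3l, levels_5l) if a == 1 and b != 1)
    only_5l = sum(1 for a, b in zip(levels_3l, levels_5l) if a != 1 and b == 1)
    n = only_3l + only_5l
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_3l, only_5l)
    # Under the null hypothesis each discordant pair falls either way with
    # probability 0.5; double the lower tail and cap the result at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The inputs are the per-participant levels for one dimension, in the same participant order for both instruments.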

Discriminant validity

The discriminant validity of the EQ-5D-3L and 5L was assessed by computing Spearman’s rho between each of the EQ-5D items and the Barthel Index at 6 months [33]. The Barthel Index is a measure of individual performance in activities of daily living (ADLs) widely used in the field of rehabilitation, consisting of 10 items. Barthel Index scores range from 0 (indicating ‘total’ dependency in ADLs) to 100 (indicating no dependency in ADLs) [35]. Spearman’s rho effect sizes of between 0.20 and 0.35 were considered weak, between 0.35 and 0.50 moderate and > 0.50 strong [33]. We assessed discriminant validity for the 3L and the 5L by testing the hypothesis that Spearman’s rho for the EQ-5D anxiety/depression or pain/discomfort items with the Barthel Index would be lower than for the other EQ-5D items. This is because the other EQ-5D items (mobility, self-care, usual activities) measure functioning, and are therefore expected to correlate more strongly with the Barthel Index, which measures ADL-related functioning [36].
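The correlation step can be sketched with a small pure-Python Spearman's rho that handles tied ranks, followed by the strength bands quoted above. Two points are assumptions rather than statements from the article: the bands are applied to the absolute value of rho (higher EQ-5D levels mean worse health while higher Barthel scores mean better functioning, so the raw coefficient would be expected to be negative), and values exactly on a boundary are assigned to the higher band. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def strength(rho):
    """Label a coefficient using the bands quoted in the text."""
    size = abs(rho)
    if size > 0.50:
        return "strong"
    if size >= 0.35:
        return "moderate"
    if size >= 0.20:
        return "weak"
    return "below the weak band"  # the text gives no label under 0.20
```

Under the stated hypothesis, the pain/discomfort and anxiety/depression items would be expected to receive a lower label than mobility, self-care and usual activities.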

Responsiveness

The responsiveness of the EQ-5D-3L and 5L measures to changes in the Barthel Index and the EQ-VAS over time (i.e. between 6 and 12 months) was assessed by using an anchor-based analysis [8]. The Barthel Index and EQ-VAS were also secondary outcome measures in the OPERAM trial [14], due to their perceived responsiveness in the OPERAM population. The EQ-VAS is a visual analogue scale measure of a person’s self-assessed health status, ranging from 0 to 100 [37]. The ‘anchor’ measures (Barthel Index and EQ-VAS) were each sub-divided into three categories to reflect whether the participant’s score for the anchor measure (1) improved clinically, (2) did not change in a clinically important way, or (3) clinically worsened between 6 and 12 months. The threshold for a clinically important change was determined using a literature-based minimal clinically important difference (MCID) estimate of 8 points for the EQ-VAS [37], and any change in the total score of the Barthel Index was considered clinically important [38]. Standardised effect sizes (Cohen’s d) were calculated for changes in EQ-5D scores between 6 and 12 months. Cohen’s d effect sizes of between 0.2 and 0.5 were considered small, 0.5 and 0.8 moderate and > 0.8 large [39]. A high degree of responsiveness of the EQ-5D-3L/5L measures would be indicated through their demonstrated ability to detect change in the anchor measures, i.e. positive effect sizes (moderate or large) for the EQ-5D when there is an improvement in the anchor measure and negative effect sizes when there is a worsening in the anchor measure. In our study, both the EQ-5D-3L and EQ-5D-5L were administered in full, including their VAS parts, which are introduced slightly differently. We assessed responsiveness of the EQ-5D-3L to change in the 3L-VAS measure, and responsiveness of the EQ-5D-5L to change in the 5L-VAS measure, because 3L-VAS and 5L-VAS responses elicited at the 6 month time point differed in 15 of the 224 participants in our sample. The responses were identical in the remaining 209 participants, suggesting that, broadly, the VAS can still be considered a common anchor measure for our analysis. The 3L-VAS and the 5L-VAS are, for all practical purposes, identical measures.
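The anchor-based procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the article does not give the denominator of the effect size, so the sketch divides the mean change by the standard deviation of the 6 month scores; a change exactly equal to the threshold is counted as clinically important; and "any change" in the Barthel Index is expressed as a threshold of 1 point. All function and variable names are hypothetical.

```python
from statistics import mean, stdev

VAS_MCID = 8      # points, as stated in the text
BARTHEL_MCID = 1  # assumed way to encode "any change" in the total score


def anchor_category(change, threshold):
    """Classify a change in the anchor measure between 6 and 12 months."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


def standardised_effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (assumed denominator)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def effect_size_by_anchor(baseline, follow_up, anchor_change, threshold):
    """Effect size of EQ-5D index change within each anchor category."""
    groups = {}
    for b, f, a in zip(baseline, follow_up, anchor_change):
        scores = groups.setdefault(anchor_category(a, threshold), ([], []))
        scores[0].append(b)
        scores[1].append(f)
    # A standard deviation needs at least two participants per category.
    return {
        category: standardised_effect_size(bs, fs)
        for category, (bs, fs) in groups.items()
        if len(bs) > 1
    }
```

A responsive instrument would show a positive effect size in the "improved" category and a negative one in the "worsened" category, with values interpreted against the 0.2, 0.5 and 0.8 bands given above.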

Informativity

Informativity of the EQ-5D-3L and the 5L measures was assessed at 6 months using the Shannon index (H′) and the Shannon evenness index (J′) [40]. H′ was calculated for each dimension of the 5L using the formula: H′ = −(proportion_none*log2(proportion_none) + proportion_slight*log2(proportion_slight) + proportion_moderate*log2(proportion_moderate) + proportion_severe*log2(proportion_severe) + proportion_extreme/unable*log2(proportion_extreme/unable)), and was calculated similarly for the 3L dimensions [21]. Higher H′ values indicate that responses to the dimension are more evenly spread across the different categories of the dimension, and consequently suggest greater informativity. The formula for the Shannon evenness index is: J′ = H′/H′max. The value of H′max for the 3L is log2(3) = 1.58 and for the 5L is log2(5) = 2.32. Unlike H′ values, J′ values lie on a common 0 to 1 scale, allowing for direct comparison of results from the 3L with the 5L.
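The two indices can be computed directly from the list of responses to one dimension. The sketch below is a minimal illustration with hypothetical function names; levels that no participant selected are simply absent from the sum, which matches the convention that a zero proportion contributes nothing to H′.

```python
from collections import Counter
from math import log2


def shannon_index(levels):
    """Shannon index H' of the observed response distribution (base 2)."""
    counts = Counter(levels)
    n = len(levels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def shannon_evenness(levels, n_levels):
    """Shannon evenness J' = H' / H'max, where H'max = log2(number of levels)."""
    return shannon_index(levels) / log2(n_levels)
```

For a 3L dimension `n_levels` is 3 and for a 5L dimension it is 5, so the two J′ values are on the same 0 to 1 scale.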

Results

Descriptive statistics

At the 6 month follow-up in the OPERAM study, 256 (83%) patients reported the EQ-5D measures themselves, 45 (15%) had the EQ-5D measures reported by proxy by their next of kin, and 8 (2%) had the EQ-5D measures reported by proxy by some other individual (unspecified). Of the 256 participants, 224 also self-reported the EQ-5D measures at 12 months and fully completed all 3L and 5L items at both 6 and 12 months. This sample of 224 participants was used for all analyses and included participants who reported inconsistent responses. Age, gender, education level and comorbidity characteristics of the sample analysed for this study were broadly similar to the characteristics of the overall OPERAM trial population (described in the methods section).

Summary statistics are provided in Table 1, showing that 56% of participants were male, 28% were university educated, 46% had completed high school as their highest level of education, 26% did not complete high school, and 5% had spent some time living in a nursing home in the 6 months before the trial started. Participants had a median of 10 coexistent chronic conditions upon entering the OPERAM trial. A small index score reduction of 0.01 (rounded) was observed between 6 and 12 months for both the 3L and 5L.

Table 1 Summary statistics (n = 224)
