Research Article

Comparison of the EQ-5D-5L and the EQ-5D-3L using individual patient data from the REFORM trial

[version 1; peer review: awaiting peer review]
PUBLISHED 27 Sep 2021

Abstract

Background: This study compares the 5-level version of the EQ-5D (5L) with the 3-level version EQ-5D (3L) in older adults using individual patient data from the REFORM (REducing Falls with Orthoses and a Multifaceted podiatry intervention) trial.
Methods: EQ-5D-5L and EQ-5D-3L were administered to men and women (n=151) over the age of 65 years alongside the REFORM trial. The two versions of the EQ-5D were assessed in terms of feasibility, level of consistency, ceiling effect and discriminatory power. We also undertook a comparison of the performance of different EQ-5D-3L and EQ-5D-5L value sets.
Results: The proportion of participants that returned a complete questionnaire was higher for the 5L (96.7%) than for the 3L (92.7%). Missing values among dimensions were on average 1.59% (5L) and 1.45% (3L). The ceiling effect was reduced from 11.2% (3L) to 6% (5L). On average the proportion of inconsistent responses between both descriptive systems was 3.25%. Redistribution from 3L to 5L showed valid results for the majority of consistent level combinations, with slight inconsistency in the case of Anxiety/Depression. Sixty-seven unique health states were observed for the 5L compared to 25 for the 3L. The absolute informativity improved with the new classification system (5.48 for 5L versus 3.91 for 3L) and relative discriminatory power improved slightly on average (0.90 for 5L versus 0.84 for 3L). The mean difference between the EQ-5D-5L and EQ-5D-3L values was 0.091 (range -0.345 to 0.505); whilst the mean difference between the EQ-5D-5L and the crosswalk values was 0.082 (range -0.035 to 0.293).
Conclusion: In the REFORM clinical trial involving an elderly population, our study supported the feasibility and convergent validity of both EQ-5D-3L and EQ-5D-5L. Results suggest that the 5L improves the ceiling effect and discriminatory power. The EQ-5D-5L scores were significantly higher than both EQ-5D-3L and crosswalk.

Keywords

Health related Quality of life, EQ-5D-3L, EQ-5D-5L, elderly population

Introduction

The National Institute for Health and Care Excellence (NICE) develops evidence-based guidelines on the most effective ways to diagnose, treat and prevent disease and ill health. Part of the evaluation includes a health economic component. Typically, these evaluations use a cost-utility analysis, where health gains are expressed in terms of quality-adjusted life years (QALYs), and whether a treatment is judged efficient depends on whether its cost per QALY falls below a certain threshold. The QALY considers both the quantity and quality of life generated by healthcare interventions. QALYs are estimated following a three-step approach. The first step is to assess the health-related quality of life experienced by patients using a generic instrument such as the EQ-5D. Patients’ EQ-5D responses are then converted, using a value set (tariff), into a utility index score on a scale from 0 (dead) to 1 (perfect health), with negative values for health states considered worse than death. Finally, the EQ-5D index score is used as the quality-adjustment component in the calculation of QALYs, which are estimated by multiplying the time spent in each health state by its corresponding utility value and summing over health states.
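The final step can be illustrated with a minimal Python sketch. The function name `qalys` and the two-period profile are hypothetical and are not REFORM data; the sketch simply multiplies time in each health state by its utility and sums the results.

```python
def qalys(periods):
    """Sum utility-weighted time.

    periods: iterable of (years_in_state, utility) pairs.
    """
    return sum(years * utility for years, utility in periods)


# Hypothetical profile: six months at utility 0.70, then six months at 0.80.
example_profile = [(0.5, 0.70), (0.5, 0.80)]
total = qalys(example_profile)
print(round(total, 3))
```

In trial-based analyses the utility between measurement points is usually interpolated rather than held constant, so this constant-utility version is a simplification.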

The EQ-5D is widely used as a measure of health in economic evaluations and is designed for self-completion by the respondent. NICE1 and other reimbursement agencies2,3 recommend the use of the EQ-5D, as it is a generic quality of life instrument that can be applied to a wide range of health conditions. Therefore, the EQ-5D is commonly included in trials funded by the National Institute for Health Research (NIHR), such as the REFORM (REducing Falls with Orthoses and a Multifaceted podiatry intervention) trial (ISRCTN68240461).

There are currently two versions of the EQ-5D that researchers can use for adults: the original EQ-5D-3L (five dimensions of health with three levels of problems) and the newer, more detailed EQ-5D-5L (the same five dimensions of health with five levels of problems). The EQ-5D-3L has an associated utility value set based on estimates of the preferences of the UK general population.4 There is also a value set for the EQ-5D-5L available for England;5 in addition, utilities for the 5L can be derived using the crosswalk by van Hout et al.6

Following the publication of the English value set, NICE, in collaboration with the Department of Health and Social Care, commissioned a review to evaluate the quality of the data and the modelling techniques used to derive the EQ-5D-5L valuation set for England. While the EQ-5D-5L valuation set is under review, NICE supports funders in using the 5L version of the EQ-5D to collect data on health-related quality of life in randomised controlled trials (RCTs), and recommends that utility values be calculated using the crosswalk developed by van Hout et al (2012).6

Although the EQ-5D-5L implies an improvement of the descriptive system,7 it remains important to explore the use of the new EQ-5D-5L in clinical trials and its potential to improve the sensitivity of the original 3L and reduce ceiling effects. The aim of our study is to compare the use of the EQ-5D-5L to the EQ-5D-3L in the context of the REFORM trial. We compared both versions of the EQ-5D instrument in terms of their feasibility, level of consistency, ceiling effect and discriminatory power. We also investigated the differences in the utility values generated by both valuation systems for the participants in the trial.

Methods

REFORM study design and participants

The REFORM trial was a pragmatic multicentre cohort RCT in England and the Republic of Ireland. The design involved the recruitment of an observational cohort from which eligible, consenting participants (≥65 years old) were randomised into the trial to receive either a podiatry intervention (n = 493), including foot and ankle strengthening exercises, foot orthoses, new footwear if required, and a falls prevention leaflet, or usual podiatry treatment plus a falls prevention leaflet (n = 517). Recruitment took place through 37 NHS podiatry clinics in primary or secondary care between October 2012 and August 2014. Participants were ineligible if they were <65 years of age; reported neuropathy, dementia, or another neurological condition; were unable to walk household distances; had a lower limb amputation; or were unwilling to attend their podiatry clinic for a REFORM appointment. Participants in the cohort were eligible for inclusion in the trial if they had had a fall in the past 12 months, had had a fall in the past 24 months requiring hospital attention, or reported worrying about falling. The same eligibility criteria applied to this sub-study. The primary outcome was the incidence rate of self-reported falls per participant in the 12 months following randomisation. Secondary outcomes included: proportion of fallers and those reporting multiple falls, time to first fall, fear of falling, Frenchay Activities Index, Geriatric Depression Scale, foot pain, health-related quality of life, and cost-effectiveness. The study (and this sub-study) was approved by the East of England National Research Ethics Committee. The REFORM protocol8 and the trial's effectiveness and cost-effectiveness results have been published elsewhere.9,10

The participants were asked to complete a consent form to indicate they wished to take part in the study; they were also informed about the opportunity to take part in this EQ-5D sub-study. The EQ-5D data used for this analysis were collected from a sample of participants in the REFORM trial (n = 151); hence the dataset was a pre intervention dataset. Each participant was asked to complete a baseline questionnaire which included both the 3L and 5L versions of the UK English language EQ-5D. Participants completed in order: (1) EQ-5D-5L questions and associated visual analogue scale (EQ-VAS); and then (2) EQ-5D-3L and associated EQ-VAS. The 5L version was administered first to avoid participants not using the ‘in-between’ levels 2 and 4 of the EQ-5D-5L.11

Instruments

EuroQol

The EQ-5D comprises two parts: a classification of five dimensions of health (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and a visual analogue scale (VAS), which records participants’ overall evaluation of their health on a scale from 100 (best imaginable health) to 0 (worst imaginable health). The 3L classification system considers three severity levels (1 = no problems, 2 = some/moderate problems, 3 = severe problems), which define 243 health states ranging from 11111 (full health) to 33333 (worst health). In the new EQ-5D-5L version the number of levels in each of the five dimensions has been increased from 3 to 5 (1 = no problems, 2 = slight problems, 3 = moderate problems, 4 = severe problems and 5 = unable/extreme problems), and the wording has been standardised across dimensions. In addition, in the mobility dimension the response option “confined to bed” has been replaced with “unable to walk” as the most severe level. The number of possible health states has therefore increased to 3,125 for the 5L, ranging from 11111 (full health) to 55555 (worst health).
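The state counts follow from five dimensions with three or five levels each (3^5 = 243 and 5^5 = 3,125). As an illustrative sketch, with `health_states` being a hypothetical helper name, the profiles can be enumerated directly:

```python
from itertools import product


def health_states(levels, dimensions=5):
    """Enumerate every profile as a digit string, e.g. '11111'."""
    return [
        "".join(str(level) for level in profile)
        for profile in product(range(1, levels + 1), repeat=dimensions)
    ]


states_3l = health_states(3)
states_5l = health_states(5)
print(len(states_3l), states_3l[0], states_3l[-1])
print(len(states_5l), states_5l[0], states_5l[-1])
```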

Other self-reported health outcomes

The self-reported presence of ten long-term health conditions (Diabetes, Arthritis, Osteoporosis, ALS/Lou Gehrig’s disease, Multiple Sclerosis, Parkinson’s disease, Huntington’s disease, Alzheimer’s disease, Depression and Dementia) was collected at baseline for REFORM participants. Participants also completed other related health instruments, including the Frenchay Activities Index (FAI)12 and the Geriatric Depression Scale (GDS).13 The FAI is a 15-item instrument to assess a broad range of activities of daily living by recording the frequency with which each item or activity was undertaken over the previous three or six months. The scale provides a summed total score from 15 to 60 (higher scores represent higher activity levels). The GDS is a 15-item scale to assess geriatric depression, with each item representing a negative state of mind, with which respondents either agree or disagree. The GDS score ranges from 0 to 15 (higher scores represent greater levels of depression), with scores greater than five considered the threshold for mild depression and scores greater than 10 considered the threshold for severe depression.

Data analysis

Feasibility

The feasibility was explored in terms of missing responses. For each EQ-5D version, we assessed the proportion of participants with (i) completely missing responses (all five dimensions missing); (ii) partially missing responses (at least one domain completed); and (iii) missing responses by domain. We also computed the proportion of participants not completing the VAS.
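The missing-response categories above can be sketched as follows. This is an illustration only: the function names, the use of `None` for an unanswered dimension, and the four sample questionnaires are all assumptions, not the trial's data or analysis code.

```python
from collections import Counter


def classify(dimension_answers):
    """Classify one questionnaire; None marks an unanswered dimension."""
    answered = sum(a is not None for a in dimension_answers)
    if answered == 0:
        return "blank"      # all five dimensions missing
    if answered < len(dimension_answers):
        return "partial"    # at least one, but not all, dimensions completed
    return "complete"


def missing_by_dimension(questionnaires):
    """Percentage of missing answers for each of the five dimensions."""
    n = len(questionnaires)
    return [100 * sum(q[d] is None for q in questionnaires) / n for d in range(5)]


# Hypothetical responses: mobility, self-care, usual activities, pain, anxiety.
sample = [
    [1, 1, 2, 2, 1],
    [None, None, None, None, None],
    [2, None, 3, 3, 1],
    [1, 1, 1, 2, 2],
]
status_counts = Counter(classify(q) for q in sample)
missing_pct = missing_by_dimension(sample)
print(status_counts)
print(missing_pct)
```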

Redistribution analysis

The distribution of the 3L and 5L responses was shown in terms of percentage of each level reported. The redistribution patterns of the responses from the 3L to 5L for each dimension were also reported in terms of percentage. An inconsistent 3L response by the same REFORM participant was defined as being at least two levels away from the 5L response given. The mapping between 3L and 5L responses was recorded, and the mean size of inconsistencies calculated following the redistribution diagram and inconsistency weights proposed by Janssen et al14 (Table 1). We analysed the redistribution properties of the consistent response pairs as the proportion of the 3L-5L response pairs within each 3L response level (e.g. 1, 2 and 3) and the corresponding mean and median VAS values.

Table 1. Redistribution diagram and inconsistency weights for the relationship between 3L and 5L responses, as described by Janssen et al.14

| EQ-5D response level (3L) | 5L level 1 | 5L level 2 | 5L level 3 | 5L level 4 | 5L level 5 |
| --- | --- | --- | --- | --- | --- |
| 1 | Consistent | Consistent | Inconsistent (weight 1) | Inconsistent (weight 2) | Inconsistent (weight 3) |
| 2 | Inconsistent (weight 1) | Consistent | Consistent | Consistent | Inconsistent (weight 1) |
| 3 | Inconsistent (weight 3) | Inconsistent (weight 2) | Inconsistent (weight 1) | Consistent | Consistent |
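The weights in Table 1 coincide with the distance from the 5L response to the nearest consistent 5L level. The sketch below encodes that observation; it is a reading of the table rather than Janssen et al.'s own formulation, and it assumes that the "mean size of inconsistencies" is the average weight over inconsistent pairs. The response pairs are hypothetical.

```python
# 5L levels treated as consistent with each 3L level (from Table 1).
CONSISTENT_5L = {1: (1, 2), 2: (2, 3, 4), 3: (4, 5)}


def inconsistency_weight(level_3l, level_5l):
    """Return 0 for a consistent pair, otherwise the Table 1 weight."""
    return min(abs(level_5l - ok) for ok in CONSISTENT_5L[level_3l])


def mean_inconsistency_size(pairs):
    """Average weight over inconsistent (3L, 5L) pairs; 0.0 if none."""
    weights = [inconsistency_weight(l3, l5) for l3, l5 in pairs]
    inconsistent = [w for w in weights if w > 0]
    return sum(inconsistent) / len(inconsistent) if inconsistent else 0.0


# Hypothetical (3L, 5L) response pairs for one dimension.
pairs = [(1, 1), (1, 3), (2, 4), (3, 2), (2, 5)]
mean_size = mean_inconsistency_size(pairs)
print(round(mean_size, 2))
```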

Ceiling effect

Assuming the elderly REFORM participants had at least slight or some problems in one or more dimensions of health, we compared ceiling effects between both systems by estimating the proportion of participants responding with no problems in each dimension and also the proportion of participants reporting no problems in every dimension (i.e. 11111). We would expect the 5L to have a lower ceiling effect compared with the 3L. We estimated the absolute ceiling effect reduction (i.e. the difference between the proportions) and the relative change [(ceiling3L − ceiling5L)/ceiling3L × 100].
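As a worked illustration of the two measures, the sketch below uses the full-health counts reported in the Results (17 participants on the 3L and 9 on the 5L, out of 151). The function name and the choice of 151 as the denominator for both versions are assumptions.

```python
def ceiling_reduction(n_ceiling_3l, n_ceiling_5l, n_total):
    """Return (ceiling 3L %, ceiling 5L %, absolute reduction, relative reduction %)."""
    ceiling_3l = 100 * n_ceiling_3l / n_total
    ceiling_5l = 100 * n_ceiling_5l / n_total
    absolute = ceiling_3l - ceiling_5l
    relative = 100 * absolute / ceiling_3l
    return ceiling_3l, ceiling_5l, absolute, relative


c3, c5, absolute, relative = ceiling_reduction(17, 9, 151)
print(round(c3, 2), round(c5, 2), round(absolute, 2), round(relative, 1))
```

With unrounded proportions the absolute reduction is about 5.3 percentage points; the reported 5.2 corresponds to subtracting the rounded values 11.2 and 6.0.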

Discriminatory power

A key feature of any health status measure is the ability to discriminate among people at a single point in time.15 Discriminatory power was assessed using the Shannon indices. Unlike floor and ceiling effects, the Shannon indices incorporate the frequency distribution across all categories of a health status classification system.15

The Shannon index is defined as:

$$H = -\sum_{i=1}^{C} p_i \log_2 p_i$$

where H represents the absolute amount of informativity captured, C is the number of possible categories (levels or permutations in this study), and p_i = n_i/N is the proportion of observations in the i-th category (i = 1, … , C), where n_i is the observed number of scores (responses) in category i and N is the total sample size. The higher the value of H, the more information is being captured by the system.15

In the case of a homogeneous distribution, where responses are evenly distributed among categories, an optimal amount of information is captured and H reaches its maximum (Hmax). This equals log2 C. If the number of categories (C) is increased, Hmax will increase; however, H will only increase if the additional categories are utilised.15

Shannon’s Evenness Index (J) is defined as:

$$J = \frac{H}{H_{\max}}$$

Shannon’s index H can be considered to represent the absolute informativity of a system whereas Shannon’s Evenness index J expresses the relative informativity, regardless of the number of categories.15

Discriminatory power for the EQ-5D-3L and EQ-5D-5L was assessed by dimension and by instrument as a whole. To calculate Shannon’s indices by dimension, levels are treated as categories, so C here would be the number of levels per instrument and pi is the proportion of responses of the ith level. However, to calculate Shannon’s indices by instrument, permutations are treated as unique categories, where C is the number of permutations (Pmax), pi is the proportion of the ith permutation and Hmax now equals log2Pmax.15 For the EQ-5D-3L there are 243 possible permutations, or unique health states, and 3125 for the EQ-5D-5L.

As the number of participants in this study is N = 151, the maximum number of unique health states (243, 3125) could never have been reached a priori, and consequently maximum informativity (Hmax) and maximum relative informativity (J) would never have been reached. Using methods similar to Polinder et al. (2010) the observed number of unique health states in the population are used, not the theoretical possible number of health states.16

The data shows 25 and 67 unique health states for the EQ-5D-3L and EQ-5D-5L respectively. This gives:

(EQ-5D-3L) Hmax = log2(25) = 4.64
(EQ-5D-5L) Hmax = log2(67) = 6.07
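The Shannon index, the Evenness index and the Hmax values for the observed numbers of health states can be sketched in Python as below. The function names and the four-profile toy sample are hypothetical; only the counts of 25 and 67 observed states come from the text.

```python
import math
from collections import Counter


def shannon_index(observations):
    """H = -sum(p_i * log2(p_i)) over the categories that occur."""
    counts = Counter(observations)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def evenness_index(h, n_categories):
    """J = H / Hmax, with Hmax = log2(number of categories)."""
    return h / math.log2(n_categories)


# Toy sample of health-state profiles (hypothetical, not REFORM data).
profiles = ["11111", "11111", "21211", "11121"]
h_toy = shannon_index(profiles)
j_toy = evenness_index(h_toy, len(set(profiles)))

# Hmax based on the observed numbers of unique health states.
h_max_3l = math.log2(25)
h_max_5l = math.log2(67)
print(round(h_toy, 2), round(j_toy, 2), round(h_max_3l, 2), round(h_max_5l, 2))
```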

Association between multimorbidity and EQ 5D index score

Utility values for the 3L were derived using the UK EQ-5D-3L value set.4 Utility values for the 5L were derived using both the crosswalk by van Hout et al. (2012)6 and the English EQ-5D-5L value set.5 We compared the value sets scores overall and across the different health conditions.

Results

Characteristics of REFORM sub-sample and EQ-5D profile

The sub-sample comprised a total of 151 participants. Characteristics of respondents were comparable in both samples. The mean age of participants in the sub-sample was 78.3 (SD 6.25) years and 56.2% were women, compared to 78.1 years (SD 7.2) and 61.0% women in the main trial. In the sub-sample, 37% had diabetes, 7% were affected by depression, 17% had osteoporosis and 58% had arthritis, compared to 33% diabetes, 9.6% depression, 15% osteoporosis and 58.6% arthritis in the main trial. Table 2 displays the dimension level responses of REFORM participants to both descriptive systems. The proportion of REFORM participants reporting level 2 problems (“some problems” on the 3L, “slight problems” on the 5L) was higher for each dimension of the 3L compared to the 5L. For instance, the proportion of participants reporting level 2 problems with mobility was 61% for the 3L compared to 30% for the 5L, and similarly for pain/discomfort (74% for 3L vs 44% for 5L).

Table 2. Dimension level responses across the EQ-5D-3L and EQ-5D-5L.

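| Level | Mobility 3L N | Mobility 3L % | Mobility 5L N | Mobility 5L % | Self-care 3L N | Self-care 3L % | Self-care 5L N | Self-care 5L % | Usual activities 3L N | Usual activities 3L % | Usual activities 5L N | Usual activities 5L % | Pain/discomfort 3L N | Pain/discomfort 3L % | Pain/discomfort 5L N | Pain/discomfort 5L % | Anxiety/depression 3L N | Anxiety/depression 3L % | Anxiety/depression 5L N | Anxiety/depression 5L % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 58 | 39 | 48 | 33 | 120 | 81 | 113 | 76 | 78 | 52 | 62 | 42 | 28 | 18 | 19 | 13 | 106 | 71 | 96 | 64 |
| 2 | 91 | 61 | 44 | 30 | 28 | 19 | 23 | 15 | 65 | 44 | 46 | 31 | 110 | 74 | 66 | 44 | 42 | 28 | 40 | 27 |
| 3 | 0 | 0 | 37 | 25 | 0 | 0 | 12 | 8 | 6 | 4 | 29 | 19 | 11 | 7 | 49 | 33 | 1 | 1 | 11 | 7 |
| 4 | - | - | 18 | 12 | - | - | 1 | 1 | - | - | 12 | 8 | - | - | 12 | 8 | - | - | 0 | 0 |
| 5 | - | - | 0 | 0 | - | - | 0 | 0 | - | - | 0 | 0 | - | - | 3 | 2 | - | - | 2 | 2 |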

Feasibility

The proportion of completed questionnaires (i.e., health dimensions plus VAS score) was higher for the 5L (96.7%) than for the 3L (92.7%). Less than 1% of 5L questionnaires were returned blank compared to 1.3% for the 3L. Unsurprisingly, the proportion of missing VAS scores was higher for the 3L than for the 5L (4.0% vs 2.0%), as the VAS appears twice in the questionnaire and participants might have seen the second as duplication. The proportion of partially completed questionnaires (i.e., one to four missing domains) was higher for the 3L (3.31%) compared to the 5L (2.64%). Overall, missing dimensions were on average 2.2 (1.45%) for the 3L and 2.4 (1.59%) for the 5L. The range of missing values per dimension was similar for both descriptive systems, ranging from 1.32% to 2.65% in mobility for the 5L, and from 1.32% to 1.99% in self-care for the 3L.

Redistribution analysis

Table 3 shows cross-tabulations of the 3L and 5L responses, with inconsistent responses marked. On average, the proportion of inconsistent responses was 3.25%. The ‘usual activities’ domain had the highest frequency of inconsistent responses (6.76%) while self-care showed no inconsistency at all.

Table 3. Cross tabulation for EQ-5D-3L and 5L.

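| Dimension | EQ-5D 3L response | 5L: No problems | 5L: Slight problems | 5L: Moderate problems | 5L: Severe problems | 5L: Unable to/Extreme |
| --- | --- | --- | --- | --- | --- | --- |
| Mobility | No problems | 46 | 12 | 0* | 0* | 0* |
| | Some problems | 2* | 32 | 37 | 18 | 0* |
| | Confined to bed | 0* | 0* | 0* | 0 | 0 |
| Self-care | No problems | 112 | 7 | 0* | 0* | 0* |
| | Some problems | 0* | 15 | 12 | 1 | 0* |
| | Unable to | 0* | 0* | 0* | 0 | 0 |
| Usual activities | No problems | 57 | 16 | 4* | 0* | 0* |
| | Some problems | 5* | 30 | 23 | 7 | 0* |
| | Unable to | 0* | 0* | 1* | 5 | 0 |
| Pain/discomfort | None | 14 | 13 | 1* | 0* | 0* |
| | Moderate | 5* | 52 | 47 | 5 | 0* |
| | Extreme | 0* | 0* | 1* | 7 | 3 |
| Anxiety/depression | None | 92 | 14 | 0* | 0* | 0* |
| | Moderate | 4* | 25 | 11 | 0 | 0* |
| | Extreme | 0* | 0* | 0* | 0 | 1 |

* Inconsistent responses.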
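The percentage of inconsistent responses for a dimension can be recomputed from its cross-tabulation. The sketch below uses the usual activities counts as read from Table 3 and the consistent 3L-5L pairs from Table 1; the function name and data layout are illustrative assumptions.

```python
# (3L level, 5L level) pairs treated as consistent in Table 1.
CONSISTENT = {(1, 1), (1, 2), (2, 2), (2, 3), (2, 4), (3, 4), (3, 5)}

# Rows: 3L levels 1-3; columns: 5L levels 1-5 (usual activities, Table 3).
usual_activities = [
    [57, 16, 4, 0, 0],
    [5, 30, 23, 7, 0],
    [0, 0, 1, 5, 0],
]


def inconsistency_rate(crosstab):
    """Percentage of response pairs falling in inconsistent cells."""
    total = sum(sum(row) for row in crosstab)
    inconsistent = sum(
        n
        for level_3l, row in enumerate(crosstab, start=1)
        for level_5l, n in enumerate(row, start=1)
        if (level_3l, level_5l) not in CONSISTENT
    )
    return 100 * inconsistent / total


rate = inconsistency_rate(usual_activities)
print(round(rate, 2))
```

Ten of the 148 response pairs fall in inconsistent cells, which reproduces the 6.76% reported for usual activities.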

Redistribution from 3L to 5L is displayed in Table 4. The “some” responses on the 3L are reassigned between levels 2 to 4 (slight, moderate, severe) on the 5L, while the “severe” responses on the 3L are spread between levels 4 (severe) and 5 (extreme) on the 5L. For level 1 on the 3L, there was always a higher proportion of 1→1 than 1→2. The most skewed relative frequency distribution was in self-care (94.1/5.9) and the least skewed in pain/discomfort (51.9/48.1). For level 2 on the 3L, the most evenly spread distribution was in mobility (2→2: 36.8/2→3: 42.5/2→4: 20.7). Between 54.8% (pain/discomfort) and 69.4% (anxiety/depression) of the participants reporting level 2 (moderate problems) with the 3L answered 2 (slight problems) or 4 (severe problems) with the 5L. Only a few participants reported severe problems on the 3L. Among these, participants chose the fourth level (severe) of the 5L in usual activities and the fifth level (extreme) in anxiety/depression, and were spread between levels 4 and 5 in pain/discomfort. Generally, the median and the mean VAS scores decreased as participants moved from a better to a worse level of health, indicating valid results for the majority of pair combinations of consistent responses. We observed a level of discrepancy in particular for anxiety/depression, where decreasing median VAS values were slightly higher than expected.

Table 4. Redistribution of consistent responses from the EQ-5D-3L to the EQ-5D-5L.

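| Dimension | 3L level | n | % | Pair 3L→5L | n | % | VAS 5L median* | VAS 3L median | Difference 5L – 3L, mean (SD) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mobility | 1 | 58 | 39.4 | 1→1 | 46 | 79.3 | 80 | 80 | 2.3 (7.4) |
| | | | | 1→2 | 12 | 20.7 | 82.5 | 82.5 | −0.1 (5.1) |
| | 2 | 89 | 60.5 | 2→2 | 32 | 36.8 | 75 | 75 | 0.7 (3.8) |
| | | | | 2→3 | 37 | 42.5 | 65 | 65 | 2.2 (7.8) |
| | | | | 2→4 | 18 | 20.7 | 40 | 40 | −0.7 (6.8) |
| | 3 | 0 | 0 | 3→4 | 0 | 0 | - | - | - |
| | | | | 3→5 | 0 | 0 | - | - | - |
| Self-care | 1 | 119 | 81.0 | 1→1 | 112 | 94.1 | 80 | 80 | 1.9 (6.6) |
| | | | | 1→2 | 7 | 5.9 | 68 | 65 | 0.7 (7.8) |
| | 2 | 28 | 19.0 | 2→2 | 15 | 53.6 | 50 | 59 | −1.5 (6.7) |
| | | | | 2→3 | 12 | 42.9 | 45 | 47.5 | −0.7 (6.3) |
| | | | | 2→4 | 1 | 3.6 | 25 | 25 | 0 |
| | 3 | 0 | 0 | 3→4 | 0 | 0 | - | - | - |
| | | | | 3→5 | 0 | 0 | - | - | - |
| Activities | 1 | 77 | 52.0 | 1→1 | 57 | 78.1 | 85 | 80 | 2.0 |
| | | | | 1→2 | 16 | 21.9 | 75 | 74.5 | −0.4 (7.1) |
| | 2 | 65 | 44.0 | 2→2 | 30 | 50.0 | 75 | 75 | 1.7 (5.4) |
| | | | | 2→3 | 23 | 38.3 | 50 | 50 | −0.9 (6.9) |
| | | | | 2→4 | 7 | 11.7 | 45 | 45 | 3.1 (5.0) |
| | 3 | 6 | 4.0 | 3→4 | 5 | 100.0 | 40 | 37 | 1.8 (2.1) |
| | | | | 3→5 | 0 | 0 | - | - | - |
| Pain/discomfort | 1 | 28 | 19.0 | 1→1 | 14 | 51.9 | 80 | 79 | 0.6 (1.7) |
| | | | | 1→2 | 13 | 48.1 | 85 | 85 | 2.5 (4.8) |
| | 2 | 109 | 73.6 | 2→2 | 52 | 50.0 | 80 | 80 | 2.2 (6.7) |
| | | | | 2→3 | 47 | 45.2 | 66.5 | 67 | 0.4 (6.1) |
| | | | | 2→4 | 5 | 4.8 | 50 | 50 | 0.8 (7.2) |
| | 3 | 11 | 7.4 | 3→4 | 7 | 70.0 | 30 | 40 | −4.3 (7.9) |
| | | | | 3→5 | 3 | 30.0 | 40 | 39 | 0.3 (5.6) |
| Anxiety/depression | 1 | 106 | 71.6 | 1→1 | 92 | 86.8 | 80 | 80 | 1.3 (6.7) |
| | | | | 1→2 | 14 | 13.2 | 75 | 74.5 | −0.6 (6.4) |
| | 2 | 41 | 27.7 | 2→2 | 25 | 69.4 | 60 | 65 | 0.8 (6.7) |
| | | | | 2→3 | 11 | 30.6 | 45 | 40 | 2.7 (3.3) |
| | | | | 2→4 | 0 | 0 | - | - | - |
| | 3 | 1 | 0.67 | 3→4 | 0 | 0 | - | - | - |
| | | | | 3→5 | 1 | 100.0 | 20 | 20 | 0 |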

* Dimension-specific rating scale values were not available. Therefore, we used VAS scale as comparator.

Ceiling effect

Self-care and anxiety/depression were the domains showing the highest percentage of “no problems” in both the 3L and the 5L, while pain/discomfort was the domain showing the lowest percentage of “no problems” (Table 5). Moving from the 3L to the 5L showed a decrease in “no problems” respondents for all domains, with pain/discomfort showing the highest relative reduction in ceiling effect. Seventeen participants (11.26%) reported full health with the 3L (11111). On the 5L, 41.2% of these participants reported perfect health; 35.3% scored ‘slight problems’ in one dimension; 11.8% scored ‘slight problems’ in two dimensions; 5.9% scored ‘moderate problems’ in one dimension; and 5.8% scored ‘slight problems’ in two dimensions and ‘moderate problems’ in one dimension. Conversely, of the 9 participants (5.96%) who reported full health on the 5L, only 2 reported some problems in one dimension of the 3L (pain/discomfort). The percentage of REFORM participants that reported full health decreased from 11.2% with the 3L to 6% with the 5L, indicating an absolute ceiling reduction of 5.2 percentage points and a relative ceiling reduction of 47.1% on the full profile.

Table 5. Proportion of “no problem” responses for both descriptive systems.

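| | EQ-5D-3L, n (%) | EQ-5D-5L, n (%) | Ceiling effect reduction: absolute | Ceiling effect reduction: relative (%) |
| --- | --- | --- | --- | --- |
| Full health (11111) | 17 (11.2) | 9 (6.0) | 5.2 | 47.1 |
| Mobility | 58 (38.9) | 48 (32.7) | 6.3 | 16.1 |
| Self-care | 120 (81.1) | 113 (75.8) | 5.2 | 6.5 |
| Usual activities | 78 (52.4) | 62 (41.6) | 10.7 | 20.5 |
| Pain/discomfort | 28 (18.8) | 19 (12.8) | 6.0 | 32.1 |
| Anxiety/depression | 106 (71.1) | 96 (64.4) | 6.7 | 9.4 |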

REFORM participants were recruited from NHS podiatry clinics. Participants were eligible for inclusion in the trial if they had had a fall in the past 12 months, or a fall in the past 24 months requiring hospital attention or reported worrying about falling at least some of the time in the 4 weeks prior to completing their baseline questionnaire. Therefore, the ceiling effect was estimated assuming that none of REFORM participants were in full health.

Discriminatory power

The Shannon indices were calculated both by dimension and for each instrument as a whole. The results by dimension are shown in Table 6. The absolute informativity (H) increased for each dimension when moving from the 3L to the 5L. The relative informativity (J) increased for all dimensions except anxiety/depression.

Table 6. Shannon’s indices for the EQ-5D-3L and EQ-5D-5L by dimension.

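| Dimension | EQ-5D-3L (Hmax = 1.58): H | EQ-5D-3L: J | EQ-5D-5L (Hmax = 2.32): H | EQ-5D-5L: J |
| --- | --- | --- | --- | --- |
| Mobility | 0.96 | 0.61 | 1.92 | 0.83 |
| Self-care | 0.70 | 0.44 | 1.06 | 0.46 |
| Usual activities | 1.20 | 0.76 | 1.80 | 0.78 |
| Pain/discomfort | 1.05 | 0.67 | 1.83 | 0.79 |
| Anxiety/depression | 0.91 | 0.58 | 1.28 | 0.55 |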

H′ represents the absolute amount of informativity captured. J′ represents the relative informativity of a system, regardless of the number of categories.
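As a check on how the by-dimension indices arise, the sketch below recomputes the mobility row from the level counts in Table 2 (3L: 58, 91, 0; 5L: 48, 44, 37, 18, 0). The helper name is hypothetical. Levels with no responses are skipped in the sum, while Hmax uses the number of available levels (3 or 5).

```python
import math


def shannon_from_counts(counts):
    """Return (H, J) for a list of level counts; J uses Hmax = log2(number of levels)."""
    n = sum(counts)
    h = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
    return h, h / math.log2(len(counts))


# Mobility level counts as read from Table 2.
h_3l, j_3l = shannon_from_counts([58, 91, 0])
h_5l, j_5l = shannon_from_counts([48, 44, 37, 18, 0])
print(round(h_3l, 2), round(j_3l, 2), round(h_5l, 2), round(j_5l, 2))
```

Under these assumptions the output agrees with the mobility row of Table 6 (0.96 and 0.61 for the 3L, 1.92 and 0.83 for the 5L).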

The Shannon Index and Evenness Index were also calculated for each instrument as a whole. When calculating the Evenness Index (J), Hmax varied based on the number of unique health states found in our sample of participants. The results are shown in Table 7. Again, the absolute informativity (H) increased when moving from the 3L to the 5L. This is to be expected due to the increase in levels. However, the relative informativity (J) also increased, indicating that the extra levels were being used and providing information.

Table 7. Shannon’s Indices for the EQ-5D-3L and EQ-5D-5L overall.

| | EQ-5D-3L | EQ-5D-5L |
| --- | --- | --- |
| H | 3.91 | 5.48 |
| Hmax | 4.64 | 6.07 |
| J | 0.84 | 0.90 |

H′ represents the absolute amount of informativity captured. J′ represents the relative informativity of a system, regardless of the number of categories.

Association between multimorbidity and EQ 5D index score

At the overall level, the EQ-5D-5L scores were significantly higher than both EQ-5D-3L and crosswalk (Table 8). The mean difference between the EQ-5D-5L and EQ-5D-3L values was 0.091 (range -0.345 to 0.505); whilst the mean difference between the EQ-5D-5L and the crosswalk values was 0.082 (range -0.035 to 0.293). As expected, the percentage of states worse than dead was lower for the EQ-5D-5L. Overall long term health problems are associated with a reduction in utility values for all used value sets except among patients with diabetes; even though the decrements associated with specific health problems were different for both value sets.

Table 8. Utility scores according to 3 systems.

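| REFORM sample | N | EQ-5D-3L: mean (SD) | EQ-5D-3L: range | EQ-5D-3L: SWD % | EQ-5D-3L: severity order | Crosswalk: mean (SD) | Crosswalk: range | Crosswalk: SWD % | Crosswalk: severity order | EQ-5D-5L: mean (SD) | EQ-5D-5L: range | EQ-5D-5L: SWD % | EQ-5D-5L: severity order | p-value: 3L-5L | p-value: 5L-Crosswalk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall | 151 | 0.679 (0.24) | 1 to −0.181 | 3.3 | | 0.684 (0.20) | 1 to −0.042 | 2.0 | | 0.766 (0.20) | 1 to 0.016 | 0 | | 0.0000 | 0.0000 |
| Arthritis | 88 | 0.601 (0.26) | 1 to −0.181 | 5.7 | 4 | 0.608 (0.21) | 0.879 to −0.042 | 3.4 | 4 | 0.693 (0.22) | 0.937 to 0.016 | 0 | 3 | 0.0000 | 0.0000 |
| Diabetes | 56 | 0.681 (0.23) | 1 to −0.016 | 3.6 | 6 | 0.695 (0.19) | 1 to −0.021 | 1.8 | 6 | 0.774 (0.18) | 1 to 0.168 | 0 | 6 | 0.0000 | 0.0000 |
| Osteoporosis | 25 | 0.582 (0.28) | 1 to −0.016 | 8.0 | 3 | 0.595 (0.24) | 0.837 to −0.023 | 4.0 | 3 | 0.700 (0.22) | 0.937 to 0.115 | 0 | 4 | 0.0000 | 0.0000 |
| Depression | 10 | 0.500 (0.34) | 1 to −0.181 | 11.1 | 1 | 0.413 (0.28) | 0.75 to −0.042 | 20 | 1 | 0.479 (0.29) | 0.833 to 0.016 | 0 | 1 | 0.0000 | 0.0093 |
| > 1 condition | 78 | 0.650 (0.24) | 1 to −0.181 | 6.4 | 5 | 0.654 (0.20) | 1 to −0.042 | 3.9 | 5 | 0.736 (0.20) | 1 to 0.016 | 0 | 5 | 0.1988 | 0.0000 |
| > 2 condition | 123 | 0.544 (0.29) | 1 to −0.181 | 10.4 | 2 | 0.556 (0.24) | 0.837 to −0.042 | 6.1 | 2 | 0.646 (0.24) | 0.937 to 0.016 | 0 | 2 | 0.2635 | 0.0000 |

SWD: states worse than dead.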

Discussion

The aim of our study was to compare the performance of the new EQ-5D-5L with that of the EQ-5D-3L in the context of the REFORM trial in terms of feasibility, ceiling effect, redistribution properties and discriminatory power. Both descriptive systems had good feasibility. Redistribution was confirmed, indicating valid results for most pair combinations of consistent responses and showing that the elderly population in the REFORM trial were able to respond consistently to both the 3L and the 5L. Compared to the 3L, the 5L reduced ceiling effects. Likewise, the 5L had a higher absolute discriminatory power in all five dimensions; similarly, the relative discriminatory power was slightly better in the 5L than the 3L. The EQ-5D-5L scores were significantly higher than both EQ-5D-3L and crosswalk.

It could be argued that the order of completion of the 5L and the 3L was not randomised, which may have affected how participants completed the first and the second instrument. However, the same study design has been followed in other studies, as it has been shown to lead to a more efficient design and to avoid order bias. Overall, our results are supported by other studies; ceiling effect reductions for the 5L were expected and similar to the reductions found in other studies,11,17-19 which confirm the 5L would be an adequate measure of health-related quality of life among the elderly who suffer from two or more chronic conditions. As in previous studies,11,18-20 we found valid redistribution showing that the largest impact of the addition of two extra intermediate levels is to spread the “some” responses on 3L between levels 2 to 4 on the 5L. Regarding discriminative properties, we also found a greater discriminative ability of the new 5L descriptive system.11,18,21

When we assessed feasibility, we observed that the number of invalid questionnaires was lower for the 5L. This observation might have helpful implications for handling missing data in RCTs. Missing data are a frequent problem in RCTs irrespective of how well designed the data collection is, and the challenge is generally even greater for economic data. QALYs are cumulative measures; hence missing dimensions on the EQ-5D at one follow-up point imply that the aggregate variable (e.g. total QALYs over the trial) is also missing. Missing data can produce different cost-effectiveness results and alter decisions on the value for money of healthcare interventions. Therefore, considering this finding, the use of the EQ-5D-5L may be preferable to the EQ-5D-3L to lower the impact of missing data in cost-effectiveness analysis conducted alongside clinical trials.

Our results highlight differences in the utility values depending on the value set used in the analysis. Results showed that the 5L shifts mean utilities up the utility scale towards full health and that the overall range of values is smaller compared to the 3L when administered to an elderly population. The same pattern is observed across all long-term health conditions for REFORM participants, and is more accentuated for depression. The differences in utilities produced by the UK 3L, the crosswalk and the England 5L value sets are supported by other studies.20,22 Values for the 5L are expected to be higher because the 3L value set has a lower minimum value and a larger proportion of states that are considered worse than dead compared to the 5L. The economic evaluation conducted alongside the REFORM trial used the EQ-5D-3L; the intervention was found to be cost-effective with a marginal gain in QALYs compared to usual care. However, we cannot compare the impact on cost-effectiveness decisions of moving from the 3L to the new descriptive system, as the 5L was only collected at baseline. Therefore, the implications of these findings for the decision-making process by NICE remain unclear. A recent simulation-based study showed that 3L and 5L can produce substantially different estimates of cost-effectiveness in a number of health conditions, severities and technologies.22 The authors concluded that interventions that improve quality of life are more likely to be considered less cost-effective if the 5L is used, while the cost-effectiveness of interventions driven by mortality rather than morbidity would be improved if the 5L is used in place of the 3L. REFORM cost-effectiveness is not driven by mortality; therefore, shifting to the 5L might have made the intervention less cost-effective. However, this is difficult to predict.
NICE’s Decision Support Unit has looked at the differences between both value sets.23 In particular the weight given to mobility decreased in the 5L relative to the 3L; therefore, as the REFORM intervention focused on mobility we would expect lower QALYs using the EQ-5D-5L. However, at the time of writing, the impact of adopting the EQ-5D-5L value set in England is still unclear.

Conclusion

In this clinical trial involving an elderly population, both the EQ-5D-3L and the EQ-5D-5L showed good feasibility. However, the use of the 5L reduced the ceiling effects and improved discriminatory power. Likewise, the 5L instrument is likely to reduce the problem of missing data in cost-effectiveness analysis. Further research is required to explore the impact of using the new EQ-5D-5L value set on estimates of QALYs gained.

Data availability

Underlying data

Full underlying (non-aggregated) data cannot be made publicly available since the ethics approval of this study does not cover openly publishing non-aggregated data.

In order to access this data, it must be requested from the corresponding author. Data requestors will have to provide: i) written description and legally binding confirmation that their data use is within the scope of the study; ii) detailed written description and legally binding confirmation of their actions to be taken to protect the data (e.g. with regard to transfer, storage, back-up, destruction, misuse, and use by other parties), as legally required and to current national and international standards (data protection concept); and iii) legally binding and written confirmation and description that their use of this data is in line with all applicable national and international laws (e.g. the General Data Protection Regulation of the EU).

how to cite this article
Corbacho B, Keding A, Chuang LH et al. Comparison of the EQ-5D-5L and the EQ-5D-3L using individual patient data from the REFORM trial [version 1; peer review: awaiting peer review] F1000Research 2021, 10:974 (https://doi.org/10.12688/f1000research.54554.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.