A head-to-head comparison of EQ-5D-5L and SF-6D in Chinese patients with low back pain

Ye, Ziping; Sun, Lihua; Wang, Qi

doi:10.1186/s12955-019-1137-6

Research
Open access
Published: 11 April 2019

Health and Quality of Life Outcomes volume 17, Article number: 57 (2019)

Abstract

Background

The comparative performance of the 3-level EuroQol 5-dimension (EQ-5D-3L) and the Short Form 6-dimension (SF-6D) has been investigated in patients with low back pain (LBP). The aim of this study was to explore the performance, including agreement, convergent validity and known-groups validity, of the 5-level EuroQol 5-dimension (EQ-5D-5L) and the SF-6D in Chinese patients with LBP.

Methods

Individuals with LBP were recruited from a large tertiary hospital in China. From June 2017 to October 2017, all subjects were interviewed using a standardized questionnaire including the EQ-5D-5L, the 36-item Short Form Health Survey (SF-36), the Oswestry questionnaire and socio-demographic questions. Agreement was evaluated by intra-class correlation coefficients (ICCs) and Bland–Altman plots. Spearman’s rank correlation coefficients were applied to assess convergent validity. For known-groups validity, the Mann–Whitney U test or Kruskal–Wallis H test was used, and effect size (ES) and relative efficiency (RE) were also reported. The efficiency of detecting clinically relevant differences was measured by receiver operating characteristic (ROC) curves between pre-specified groups based on the Oswestry disability index (ODI); ES and RE statistics were also reported.

Results

Two hundred seventy-two LBP patients (mean age 38.1 years, 38% female) took part in the study. Agreement between the EQ-5D-5L and the SF-6D was good (ICC 0.661), but the Bland–Altman plots showed systematic discrepancy. In terms of convergent validity, most of the a priori specified measures were more strongly correlated with the EQ-5D-5L than with the SF-6D, but the mental component summary (MCS) derived from the SF-36 was more strongly associated with the SF-6D. The EQ-5D-5L demonstrated better known-groups performance for most groupings, except location and general health as grouped by the general assessment of health item from the SF-36. Furthermore, when ODI was applied as an external indicator of health status, the area under the ROC curve was larger for the EQ-5D-5L than for the SF-6D (0.892, 95% CI 0.853 to 0.931 versus 0.822, 95% CI 0.771 to 0.873), the effect size was 0.63 for the EQ-5D-5L and 0.44 for the SF-6D, and the EQ-5D-5L was 42% more efficient than the SF-6D at detecting differences measured by ODI.

Conclusions

Both the EQ-5D-5L and the SF-6D are valid measures for LBP patients. Even though the two measures showed good agreement, they cannot be used interchangeably. In this study the EQ-5D-5L was superior to the SF-6D in Chinese patients with low back pain, with a stronger correlation with ODI and better known-groups validity. Further studies are needed to evaluate other properties, such as responsiveness and reliability.

Background

Low back pain (LBP) is a common condition that can cause severe activity impairment and physical limitations [1]. Among employees in China, the prevalence of LBP is around 42.7–72.0%, which makes LBP the most common cause of physical disability [2, 3]. As an incapacitating disease, LBP is associated with a significant reduction in health-related quality of life (HRQoL) [4]. Hence, a valid and reliable HRQoL measure is needed to evaluate interventions or programs for LBP and to inform resource allocation decisions.

In general, HRQoL can be assessed using either disease-specific or generic instruments. Generic instruments can in turn be subdivided into preference-based and non-preference-based measures. The main benefit of generic preference-based measures is their broad range of health dimensions, which makes comparisons across diseases, interventions and health programs possible [5]. In addition, generic preference-based measures provide a general estimate of health outcomes and can be combined with survival data in the form of quality-adjusted life years (QALYs), which are widely used as an indicator of clinical effectiveness [6].

The EuroQol 5-dimension (EQ-5D) is the most frequently used preference-based instrument around the world [7]. Because of the high ceiling effects of the three-level version of the EQ-5D (EQ-5D-3L), a new version (known as the EQ-5D-5L) was developed [8]. With the increasing availability of national value sets, crosswalk algorithms for converting 3L scores to 5L scores, and growing evidence of better psychometric properties, uptake of the EQ-5D-5L has increased. Since Luo and colleagues [9] developed the scoring algorithm for the EQ-5D-5L based on Chinese preferences, the EQ-5D-5L has become popular in clinical studies in China. The Short Form 6-dimension (SF-6D) is a utility measure derived from the 36-item Short Form Health Survey (SF-36) [10], which is considered one of the most widely used generic measures of HRQoL in clinical trials. A number of studies have explored the performance of the EQ-5D and SF-6D in various patient groups, and the results showed that comparative validity and responsiveness differed depending on the target population [11,12,13,14].

The comparative performance of the EQ-5D and SF-6D has been investigated in patients with LBP [15, 16], and it was found that the two measures were not interchangeable, with the SF-6D largely outperforming the EQ-5D in terms of measurement characteristics. However, both studies applied the 3-level version of the EQ-5D (EQ-5D-3L), which was found to have poor discriminative ability [17] and ceiling effects [18]. Several studies found better psychometric properties for the EQ-5D-5L than for the EQ-5D-3L [19,20,21,22]. It is therefore important to compare the EQ-5D-5L with the SF-6D in LBP patients. Hence, this study evaluates the agreement, convergent validity and known-groups validity of the EQ-5D-5L and SF-6D in patients with LBP.

Methods

Study design and patient recruitment

After approval by the ethics committee, consecutive patients were recruited for this cross-sectional study at the General Hospital of Shenyang Military Area Command in Shenyang, China, from June 2017 to October 2017. The inclusion and exclusion criteria were as follows.

Inclusion criteria: patients with LBP aged over 18 years, with or without lower limb pain, not receiving any concurrent treatment for pain other than routine analgesics, and able to understand and speak Mandarin. Exclusion criteria: patients with coexisting infection, malignancy, severe spinal cord disease or inflammatory joint disease; patients with myocardial infarction, cerebrovascular events, chronic lung disease, kidney disease or severe mental illness; pregnant women.

The sample size needed to estimate each mean with a 95% confidence interval of a given margin of error was calculated using the following equation [23]:

$$ \mathrm{n}=\frac{\sigma^2}{{\left[\frac{\omega }{1.96}\right]}^2} $$

Here ω is the margin of error and σ is the standard deviation of the outcome variable (assumed to be the same under the null and alternative hypotheses). We set ω to 0.03 for all measures, with σ = 0.238 for the EQ-5D-5L [24], σ = 0.152 for the SF-6D [15] and σ = 0.2026 for the ODI [15], which gives estimated sample sizes of n = 242, 98 and 176 for the EQ-5D-5L, SF-6D and ODI respectively. Assuming an 80% response rate to the survey, we aimed to interview 300 LBP patients.
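As an illustration only, the sample size equation above can be evaluated with a few lines of Python. The function name and dictionary are hypothetical, and the sketch returns unrounded values because the rounding convention used in the study is not stated.

```python
def sample_size_for_mean(sigma: float, margin: float, z: float = 1.96) -> float:
    """Unrounded n = sigma^2 / (margin / z)^2 for estimating a mean."""
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    return sigma ** 2 / (margin / z) ** 2


# Standard deviations quoted in the text; margin of error 0.03 for all measures.
assumed_sd = {"EQ-5D-5L": 0.238, "SF-6D": 0.152, "ODI": 0.2026}
raw_n = {name: sample_size_for_mean(sd, 0.03) for name, sd in assumed_sd.items()}

for name, n in raw_n.items():
    print(f"{name}: unrounded n = {n:.1f}")
```

The largest of the three requirements (that for the EQ-5D-5L) drives the overall target, which is then inflated for the anticipated response rate.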

The diagnosis of LBP was based on imaging information, physical examination and patients’ complaints of LBP. As all the questionnaires used in this survey had already been validated, no pilot or pre-testing survey was performed. After giving formal consent, every patient was questioned by the same interviewer, who was trained to conduct the survey in a consistent manner. At outpatient clinics, individuals were interviewed in the waiting room after consultation; for inpatients, the survey was conducted on the ward before surgery. The questions were organized in the following order: socio-demographic questions, the Oswestry disability questionnaire, the EQ-5D-5L and the SF-36. The interviewer, procedure and questionnaire were the same for all patients.

Instruments and measures

EQ-5D-5L

The EQ-5D-5L contains two parts that assess the health status of respondents on the day of interview [8]. The first part is a descriptive system with five items (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with five levels of severity. Theoretically, the EQ-5D-5L can define 3125 different health states. In accordance with the Chinese scoring algorithm [9], the EQ-5D-5L gives a score from −0.39 to 1, where 1 is the best possible health state. The other part of the EQ-5D-5L is a visual analogue scale (EQ-VAS), which asks interviewees to mark their present health status on a 20 cm vertical scale from 0 to 100. The simplified Chinese version of the EQ-5D-5L used in our research is approved by the EuroQol Group.

SF-36 based SF-6D

The SF-6D is a utility measure derived from the SF-36 [10]. Health status is defined in terms of 6 dimensions (physical functioning, role limitation, social functioning, pain, energy and mental health), with each dimension having four to six levels. There are potentially 18,000 different health states. A value set for the general population in Hong Kong [25] was used to estimate the utility index for the SF-6D in this study. The SF-6D utility score can range from 0.315 to 1.00. As recommended by previously published research [26], the SF-36v2 was administered as the questionnaire rather than the SF-6D as an independent instrument. The official version of the SF-36 in simplified Chinese was authorized by QualityMetric [27].

Oswestry disability index

The Oswestry Disability Index (ODI) [28, 29] is an instrument measuring the degree of disability in people with LBP. The questionnaire contains 10 items: intensity of pain, personal care, lifting, walking, sitting, standing, sleeping, sex life, social life, and traveling. Each item has 6 levels, scored from 0 (the least disability) to 5 (the most severe disability). The sum of all item scores is transformed into a 0 to 100% index. Patients with scores between 0 and 20% have minimal disability, 21 to 40% moderate disability, 41 to 60% severe disability, 61 to 80% are unable to walk (usually described as crippled), and 81 to 100% are bedbound or overstating their symptoms [30]. Previous studies found the item about “sex life” culturally inappropriate for Chinese citizens [31]. Hence, we applied only 9 items of the ODI. The Chinese version of the ODI was an official version from Mapi Research Trust.
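The ODI scoring described above can be sketched as follows. This is a minimal illustration that assumes the index is the item sum divided by the maximum possible sum for the administered items (45 for the 9 items used here); the text states only that the sum is transformed to a 0 to 100% index, and the function names are hypothetical.

```python
def odi_percent(item_scores):
    """Convert ODI item scores (each 0-5) to a 0-100% index.

    Assumption: index = sum / (5 * number of items), so 9 items give a maximum of 45.
    """
    if not item_scores or any(not 0 <= s <= 5 for s in item_scores):
        raise ValueError("each item must be scored from 0 to 5")
    return 100.0 * sum(item_scores) / (5 * len(item_scores))


def odi_band(percent):
    """Map an ODI percentage to the disability bands quoted in the text."""
    if percent <= 20:
        return "minimal disability"
    if percent <= 40:
        return "moderate disability"
    if percent <= 60:
        return "severe disability"
    if percent <= 80:
        return "crippled"
    return "bedbound or overstating symptoms"


# Hypothetical patient answering the 9 administered items.
example_items = [1, 2, 1, 2, 1, 2, 1, 2, 1]
example_index = odi_percent(example_items)
print(f"{example_index:.1f}% -> {odi_band(example_index)}")
```

Dividing by the number of administered items keeps the index on the same 0 to 100% scale when an item is dropped.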

Statistical analysis

Patient characteristics and descriptive statistics

Only patients who completed all questionnaires were included in this analysis; no imputation was performed for missing scores. Continuous variables were reported as means and standard deviations (SD), and categorical variables as frequencies and proportions. Descriptive statistics (mean, SD, median, inter-quartile range, minimum and maximum) for the ODI, EQ-5D-5L and SF-6D were computed. Floor and ceiling effects for the EQ-5D-5L and SF-6D were evaluated by calculating the proportion of the sample in the worst and best possible health states. Statistical analysis was conducted using IBM SPSS version 23.0 [32].
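The floor and ceiling calculation amounts to counting respondents at the scale extremes. The sketch below is illustrative Python rather than the SPSS procedure used in the study; the scores are hypothetical, and the worst and best values are the EQ-5D-5L bounds quoted in the text.

```python
def floor_ceiling(scores, worst, best, tol=1e-9):
    """Return (floor, ceiling) as proportions of scores at the worst and best values."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor = sum(abs(s - worst) <= tol for s in scores) / n
    ceiling = sum(abs(s - best) <= tol for s in scores) / n
    return floor, ceiling


# Hypothetical utility scores, not study data.
hypothetical_scores = [-0.39, 0.50, 0.70, 1.00]
floor_prop, ceiling_prop = floor_ceiling(hypothetical_scores, worst=-0.39, best=1.0)
print(f"floor {floor_prop:.1%}, ceiling {ceiling_prop:.1%}")
```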

Agreement between the EQ-5D-5L and SF-6D

When the same subjects are measured by each of two methods, agreement analysis is essential to see whether the methods agree sufficiently for one to replace the other [33]. Both the EQ-5D-5L and the SF-6D are measures of health utility, although the EQ-5D-5L has a possible range of −0.39 to 1.00 while the SF-6D has a range of 0.315 to 1.00. Hence, it is necessary to know to what degree these two utility measures agree and whether they can be used interchangeably in the context of LBP patients in China. Agreement was assessed by intra-class correlation coefficients (ICCs) and Bland–Altman plots. The ICCs were calculated with a two-way random effects model using average measures and absolute agreement. The ICC can range between 0 and 1. An ICC < 0.4 suggests poor agreement, 0.4–0.59 fair, 0.6–0.74 good, and 0.75–1 excellent agreement [34]. In the Bland–Altman plots, the differences between the scores of the two instruments were plotted against the average utility scores [35].
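For readers who want to reproduce this kind of agreement analysis outside SPSS, the following Python sketch computes an absolute-agreement, average-measures ICC from the two-way ANOVA mean squares, together with the Bland–Altman bias and limits of agreement. The function names are hypothetical, and the limits are assumed to be the mean difference ± 1.96 SD of the differences, which is the usual convention but is not stated explicitly in the text.

```python
from statistics import mean, stdev


def icc_absolute_average(x, y):
    """ICC for two measures: two-way random effects, absolute agreement, average measures."""
    n, k = len(x), 2
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    grand = mean(list(x) + list(y))
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [mean(x), mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between instruments
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)


def bland_altman(x, y):
    """Return pairwise means, differences, bias and (lower, upper) limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    means = [(a + b) / 2 for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # assumed 95% limits of agreement
    return means, diffs, bias, (bias - spread, bias + spread)


# Hypothetical paired utilities (not study data): instrument A vs instrument B.
a = [0.9, 0.6, 0.3, 0.8]
b = [0.8, 0.6, 0.5, 0.7]
print(round(icc_absolute_average(a, b), 3))
_, _, bias, limits = bland_altman(a, b)
print(round(bias, 3), tuple(round(v, 3) for v in limits))
```

Plotting the returned differences against the pairwise means, with horizontal lines at the bias and the two limits, gives the Bland–Altman plot. Because the ICC is of the absolute-agreement type, a constant offset between instruments lowers it even when the two sets of scores are perfectly correlated.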

Convergent validity

Following previous research [12, 36,37,38], the sizes of the correlations of the EQ-5D-5L and SF-6D scores with the ODI, the EQ-VAS, and the SF-36 physical (PCS) and mental component summary (MCS) scores were compared. The association was evaluated by Spearman’s rank correlation coefficient, considering 0.9–1.0 as very highly correlated, 0.7–0.9 as highly correlated, 0.5–0.7 as moderately correlated, and 0.3–0.5 as weakly correlated [39].
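A minimal Python sketch of the rank correlation and the interpretation bands above is given below. It uses only the standard library and average ranks for ties. Assigning a boundary value such as 0.7 to the higher band is an assumption, since the quoted ranges overlap at their edges, and the data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def correlation_band(rho):
    """Label |rho| using the bands quoted in the text (boundaries go to the higher band)."""
    r = abs(rho)
    if r >= 0.9:
        return "very high"
    if r >= 0.7:
        return "high"
    if r >= 0.5:
        return "moderate"
    if r >= 0.3:
        return "low"
    return "below the quoted bands"


# Hypothetical data: utility falls as ODI rises, so rho is negative.
utility = [0.9, 0.7, 0.4, 0.1]
odi = [10, 20, 35, 60]
rho = spearman_rho(utility, odi)
print(rho, correlation_band(rho))
```

The sign only reflects the direction of the scales (higher ODI means worse health, higher utility means better health); the bands are applied to the absolute value.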

Known-groups validity

General known-group validity

EQ-5D-5L and SF-6D scores were compared across important groups. We divided the sample by demographic characteristics, duration of pain [40], outpatient or inpatient status, the general assessment of health item from the SF-36, and the EQ-VAS. It was hypothesized that lower utility scores would be found in the elderly [41], females [41], patients with a longer duration of disease [36] and lower education [42], patients from rural areas [43] and patients with lower income [44], even though U-shaped relationships between income and health status have been reported in some studies [45].

Age was divided into two groups at the median [36]. Education level was regrouped into three levels: junior school or below, high school, and college or above. Income was divided into four categories: < 1000 yuan, 1001–3500 yuan, 3501–5000 yuan and > 5000 yuan. We categorized the EQ-VAS scores into four groups, with a score < 65 as bad health, 65–79 as fair health, 80–89 as good health, and 90–100 as excellent health [46]. To investigate whether dichotomous variables had a significant impact on utility scores, Mann–Whitney U tests were used [47]. For polytomous variables, Kruskal–Wallis H tests were used. The effect size (ES) and relative efficiency (RE) statistics were also applied. The ES was calculated from the statistics of the above-mentioned tests, as recommended by a recently published review [48], and indicates the percentage of variance in the dependent variable explained by the independent variable. The RE was the ratio of the statistics from the Mann–Whitney U or Kruskal–Wallis H tests for the EQ-5D-5L and the SF-6D, with the SF-6D statistic as the reference. Thus, if the RE was higher than 1, the EQ-5D-5L was considered more efficient than the SF-6D at discriminating between known groups.
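The ES and RE calculations can be illustrated with the short Python sketch below. The exact ES formulas are not given in the text, so the sketch assumes two commonly used rank-based forms: η² = Z²/N for the Mann–Whitney U test and η² = (H − k + 1)/(N − k) for the Kruskal–Wallis H test. Whether these match the formulas actually used is an assumption, and the input statistics are hypothetical.

```python
def eta_squared_mann_whitney(z, n_total):
    """Assumed ES for a Mann-Whitney U test: eta^2 = Z^2 / N."""
    return z ** 2 / n_total


def eta_squared_kruskal_wallis(h, n_groups, n_total):
    """Assumed ES for a Kruskal-Wallis H test: eta^2 = (H - k + 1) / (N - k)."""
    return (h - n_groups + 1) / (n_total - n_groups)


def relative_efficiency(stat_eq5d5l, stat_sf6d):
    """Ratio of test statistics with the SF-6D as the reference instrument."""
    return stat_eq5d5l / stat_sf6d


# Hypothetical Kruskal-Wallis results for a 4-level grouping in a sample of 272.
h_eq5d5l, h_sf6d = 60.0, 40.0
es_eq5d5l = eta_squared_kruskal_wallis(h_eq5d5l, n_groups=4, n_total=272)
es_sf6d = eta_squared_kruskal_wallis(h_sf6d, n_groups=4, n_total=272)
re = relative_efficiency(h_eq5d5l, h_sf6d)
print(round(es_eq5d5l, 3), round(es_sf6d, 3), re)
```

An RE above 1 in this sketch would be read as the EQ-5D-5L separating the groups more efficiently than the SF-6D, in line with the interpretation given in the text.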

Efficiency of detecting clinically relevant differences measured by ODI

The efficiency of the EQ-5D-5L and SF-6D in distinguishing clinically relevant differences among individuals with LBP was measured using the ES, RE, and receiver operating characteristic (ROC) curves. The utility instrument with the largest area under the ROC curve is considered the most sensitive at detecting differences in the external indicator. An area under the curve (AUC) of 1 denotes perfect discrimination, whereas an area of 0.5 indicates discrimination no better than chance [49]. ODI was applied as the external indicator, classifying individuals into five groups. For more valid outcomes [50], ODI was also dichotomized using different cut-off points.
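The AUC for a dichotomized ODI can be sketched in Python through its equivalence with the Mann–Whitney statistic: it is the probability that a randomly chosen patient from the more disabled group has a worse score than a randomly chosen patient from the less disabled group, with ties counted as one half. The data and the grouping are hypothetical, and utilities are negated so that a higher value indicates worse health.

```python
def auc(scores_positive, scores_negative):
    """AUC as P(positive score > negative score) + 0.5 * P(tie)."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical utilities for patients above and below an ODI cut-off.
utility_more_disabled = [0.20, 0.40, 0.55]
utility_less_disabled = [0.50, 0.80, 0.90]

# Negate utilities so that a larger score means worse health.
example_auc = auc([-u for u in utility_more_disabled],
                  [-u for u in utility_less_disabled])
print(round(example_auc, 3))
```

Computing this separately for each instrument against the same ODI grouping allows the two AUCs to be compared directly.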

Results

Patient characteristics and descriptive statistics of ODI, EQ-5D-5L and SF-6D utility scores

Of the 300 patients who participated in the survey, 272 were included in the analysis, giving a response rate of 91%. Twenty-eight individuals were excluded for the following reasons: not completing the questionnaires (N = 17) or being too young or too old for the research (N = 11).

Demographic and clinical characteristics are presented in Table 1. The mean age of participants was 38.1 years and the proportion of females was 38%. Of the sample, 69% came from the urban population. About 40% of the patients had a college education. Around 28% of patients had an income of 1001–3500 yuan. Most patients had suffered from LBP for more than 12 weeks.

Table 1 Patient characteristics (N = 272)

The distributions of scores for the ODI, EQ-5D-5L and SF-6D are displayed in Figs. 1, 2 and 3, and descriptive statistics of the ODI, EQ-5D-5L and SF-6D index are shown in Table 2. The mean ODI was 33.1% (SD 0.210) (median 28.9%; IQR (17.8, 44.4%)), with a distribution skewed towards full health. The mean EQ-5D-5L score was 0.603 (SD 0.336) (median 0.702; IQR (0.438, 0.862)). The distribution of the EQ-5D-5L was also skewed towards full health, similar to the ODI. The EQ-5D-5L scores ranged from −0.39 to 1. The mean SF-6D score was 0.593 (SD 0.143) (median 0.567; IQR (0.500, 0.656)), with a distribution more symmetric around its mean. The SF-6D scores ranged from 0.320 to 0.960 (Fig. 3).

Table 2 Descriptive statistics of ODI, EQ-5D and SF-6D utility scores, n = 272

Floor and ceiling effects were low for all three measures. The EQ-5D-5L showed a slight ceiling effect of 0.4% (N = 1) and a floor effect of 1.1% (N = 3). The ODI and SF-6D showed no ceiling effect, but indicated a small floor effect in 1.1% (N = 3) and 1.5% (N = 4) of the respondents, respectively.

Agreement between the EQ-5D-5L and SF-6D

The agreement between the EQ-5D-5L and SF-6D was good, with an ICC of 0.661 (95%CI 0.57–0.733). Because the ICC might be influenced by scaling differences between the EQ-5D-5L and the SF-6D, we reanalyzed the ICC after truncating the EQ-5D-5L index score at 0; the results were similar to those without truncation. The Bland–Altman plot (Fig. 4) gave a picture comparable with the ICC, with a mean difference in utility between the two measures of 0.01. Approximately 93.8% of the utility scores lay within the limits of agreement (0.52 and −0.50) (Fig. 4). In particular, the EQ-5D-5L and SF-6D utility indices appeared to be less consistent at relatively poor health status, where scores fell outside the limits of agreement. Systematic discrepancy was observed in the difference between the two measures, with higher SF-6D scores at low mean utility scores and higher EQ-5D-5L scores at high mean utility scores.