A new method for quantitative assessment of hand muscle volume and fat in magnetic resonance images

Friedberger, Andreas; Figueiredo, Camille; Bäuerle, Tobias; Schett, Georg; Engelke, Klaus

doi:10.1186/s41927-020-00170-3

Research article
Open access
Published: 22 December 2020

Andreas Friedberger ORCID: orcid.org/0000-0002-5615-8478¹,
Camille Figueiredo²,
Tobias Bäuerle³,
Georg Schett² &
Klaus Engelke¹

BMC Rheumatology volume 4, Article number: 72 (2020)

Abstract

Background

Rheumatoid arthritis (RA) is characterized by systemic inflammation and bone and muscle loss. Recent research showed that obesity facilitates inflammation, but it is unknown if obesity also increases the risk or severity of RA. Further research requires an accurate quantification of muscle volume and fat content.

Methods

The aim was to develop a reproducible (semi-)automated method for hand muscle segmentation and quantification of hand muscle fat content, and to reduce the time-consuming effort of manual segmentation. T1 weighted scans were used for muscle segmentation based on a random forest classifier. Optimal segmentation parameters were determined by cross validation with 30 manually segmented hand datasets (gold standard). An operator reviewed the automatically created segmentation and applied corrections if necessary. For fat quantification, the segmentation masks were automatically transferred to MRI Dixon sequences by rigid registration. In total 76 datasets from RA patients were analyzed. Accuracy was validated against the manual gold standard segmentations.

Results

Average analysis time per dataset was 10 min, more than 10 times faster compared to manual outlining. All 76 datasets could be analyzed and were accurate as judged by a clinical expert. 69 datasets needed minor manual segmentation corrections. Segmentation accuracy compared to the gold standard (Dice ratio 0.98 ± 0.04, average surface distance 0.04 ± 0.10 mm) and reanalysis precision were excellent. Intra- and inter-operator precision errors were below 0.3% (muscle) and 0.7% (fat). Average Hausdorff distances were higher (1.09 mm), but high values originated from a shift of the analysis VOI by one voxel in scan direction.

Conclusions

We presented a novel semi-automated method for quantitative assessment of hand muscles with excellent accuracy and operator precision, which substantially reduced the effort of traditional manual segmentation. This method may greatly facilitate further MRI-based muscle research of the hands.

Background

Rheumatoid arthritis (RA) is a chronic disease characterized by inflammation of the synovial membrane (synovitis) and loss of bone, cartilage and muscle [1]. In contrast to bone, muscle is not yet in the spotlight of RA research, although studies show that muscle atrophy is a hallmark of RA [2, 3]. Compared to healthy controls, RA patients have an accelerated loss of fat-free skeletal muscle, so-called rheumatoid cachexia [4], and impaired hand motor performance [5], probably caused by muscle fiber degeneration.

It is well known that obesity is associated with systemic inflammation, because adipose tissue is a source of pro-inflammatory cytokines and triggers inflammatory responses [6, 7]. In patients with RA, muscle fat infiltration of the thigh contributes to low physical function and activity [8]. Another study reported lower calf muscle area and density in RA patients compared to controls and also found associations between greater joint destruction and greater muscle deficits. Whether these findings can be explained by a cause-effect relation between muscle fat infiltration and RA is a pending question [9], as so far the interrelationships of inflammation and muscle dysfunction [10] in RA are poorly understood.

Obviously, the quantification of muscle volume and fat content is the first step to address this question. The hand with its multitude of joints and associated synovial fluids is usually most severely affected by RA and is probably the prime target for the investigation of RA related muscle and fat characteristics. However, the small volume and multiple different tissues like bone, tendons, vessels and muscle require a sophisticated segmentation for the quantification of muscle volume and adipose tissue in the hand.

Manual segmentation is tedious and for complicated tasks can take hours. A fully automated segmentation that removes the need for manual interaction would be desirable but, in our experience, rarely works in elderly subjects, in whom pathological conditions often define most of the population variance. In the hand, many small and rapidly changing anatomical structures further complicate segmentation. In clinical studies in particular, a careful review of any automatic segmentation, with the option of manual editing by experts, is typically required. Thus, in this paper we describe a method that provides an automated initial segmentation, with two aims:

To significantly reduce overall processing time compared to a standard manual slice by slice based segmentation approach
To provide manual editing, but to limit these interactions to a minimum in order to improve reanalysis precision.

The core of the automated process is based on a random forest algorithm running on standard clinical T1 weighted magnetic resonance (MR) images. It exploits the characteristic muscle feature of a distinct grey value throughout the acquired stack of MR images. 3D editing after the automatic segmentation part can be performed using a set of tools developed earlier [11]. Fat quantification is then done using 2-pt Dixon (MR) images of the hand.

To the best of our knowledge such a method for hand muscle segmentation has not been published yet. So far studies on muscle and fat quantification mainly targeted the thigh, where simple threshold based techniques to separate subcutaneous adipose tissue (SAT), muscle and the femoral bone worked well [12]. A more advanced approach used a combination of fuzzy clustering, morphological operations and active snakes to segment the deep fascia lata of the thigh muscles [13] to further differentiate SAT from intermuscular adipose tissue. Another study [14] used a random walk graph-based formulation with incorporated prior knowledge of shape to segment the individual muscles of the thigh.

Methods

Patient details

MR hand scans of 76 outpatients with RA (37 males, 39 females) were acquired at the Rheumatology Outpatient Clinic of the University of Erlangen. Apart from the diagnosis of RA, no other inclusion or exclusion criteria were applied. Patient characteristics are shown in Table 1.

Table 1 Patient characteristics

MRI scans

This study utilized two different MR sequences – a T1 weighted scan and a two-point Dixon scan, described in detail below. These two sequences were part of the standard hand protocol of the Rheumatology Outpatient Clinic of the University of Erlangen. The scans and the protocol were not tailored to this study, but rather part of clinical routine and were typically performed once annually. The scans were performed on the dominant hand of the patient. The patient consent form included the agreement to also use these data for research purposes (Ethics approval 52_14B of the Medical Faculty of FAU Erlangen-Nuremberg).

MR imaging was performed on a 1.5-Tesla MR system (MAGNETOM Aera, Siemens Healthcare GmbH, Erlangen, Germany) with a hand/wrist radio frequency 16-channel coil. Subjects were positioned prone with head first.

The protocol consisted of a standard axial fat saturated T_1w turbo spin echo (TSE) sequence (Matrix size: 320 × 320, Voxel size: 0.5 × 0.5 × 3.0 mm³, Slices: 64, TR: 760 ms, TE: 13 ms) and an axial T_2w two point Dixon TSE sequence (Matrix size: 320 × 320, Voxel size: 0.5 × 0.5 × 3.0 mm³, Slices: 30, TR: 3040 ms, TE: 78 ms, TD: 0 ms). Throughout this paper, these two sequences are referred to as T1 weighted and Dixon sequences.

The Dixon sequence produces fat and water images [15] calculated from two acquired spin echo images, one with water and fat signals in phase, the other out of phase. These two images are used to calculate the pure-water image V_water and the pure-fat image V_fat. Applying the equation

$$ {V}_{ff}=\frac{V_{fat}}{V_{water}+{V}_{fat}}\cdot 1000\ \text{‰} $$

(1)

results in a quantitative fat fraction (FF) image. This image assigns a fat fraction in per mille (Fig. 1b) to every voxel. The fat fraction intensity values range from 0 to 1000, which corresponds to 0‰ to 1000‰ (0% to 100%) fat per voxel.
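As a minimal sketch of Eq. (1), the following Python function computes the per-voxel fat fraction in per mille from flat lists of water and fat signal values. The function name and the handling of voxels with zero total signal (set to 0) are assumptions made for illustration and are not taken from the study.

```python
def fat_fraction_permille(water, fat):
    """Per-voxel fat fraction in per mille, following Eq. (1)."""
    result = []
    for w, f in zip(water, fat):
        total = w + f
        # Assumption: voxels without any signal (e.g. air) get a fat fraction of 0.
        result.append(1000.0 * f / total if total > 0 else 0.0)
    return result


# Three voxels: pure water, equal water and fat signal, pure fat.
print(fat_fraction_permille([200.0, 100.0, 0.0], [0.0, 100.0, 50.0]))
# → [0.0, 500.0, 1000.0]
```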

Image processing overview

Muscle segmentation was performed in the T1 weighted scans (Fig. 1a), because the muscle boundary is difficult to detect in Dixon fat images (Fig. 1b). The main component of the automated segmentation was a random forest classifier.

It was trained on 30 randomly chosen datasets, which had been segmented manually by a medical expert (gold standard). The trained RF was subsequently used to segment muscle in all 76 datasets. These results were again reviewed by a clinical expert and edited as necessary. The 30 gold standard datasets were used for the validation of accuracy. 14 of the 76 datasets were used for precision analysis. The two distinct domains of the method are depicted in Fig. 2 and consist of the following:

1. Random forest training
2. Segmentation workflow
   a. Preprocessing of T1 scans to remove image inhomogeneities
   b. Segmentation of hand cross sectional area (CSA)
   c. Random forest segmentation of hand muscle
   d. Rigid multimodal registration of segmentation mask to Dixon fat image

Random forest training is required only once and teaches the RF to detect muscle. The following subsections describe these steps in detail.

Random forest training

Random forest (RF) is a well-known ensemble learning method from machine learning [16] that is also widely used for image segmentation [17,18,19]. Before it can be used for classification, it must be trained on training data.

An RF consists of an ensemble of decision trees, whose number is arbitrary but fixed, and is trained for a specific problem using training data. The RF inputs are features, which are calculated for each voxel. Usually a set of different features is created for each voxel. The RF outputs are labels, which are ‘muscle’ and ‘background’ in our study. Background in our case is everything except muscle, i.e. air but also soft tissue, bone, tendons etc. During the RF training, features and their corresponding labels have to be provided, so that the RF is able to learn its decisions. Randomness is introduced by picking a random subset of the available feature-label pairs and a random subset of features for each decision node of each tree. The determination of the optimal RF parameter set (Ƥ_RF), i.e. the number of trees, the maximal tree depth and the number of samples per decision node, was part of the training. The number of features used per node was empirically set to the square root of the number of features, as is typical for RF.

Feature description

Features are the image ‘properties’ based on which the RF makes its decision. In this study the following features were used and calculated for each voxel of the T1 image:

Mean grey value and standard deviation of 2D (in-plane) neighborhoods
Mean grey value and standard deviation of 3D neighborhoods
Gradient magnitude by Sobel operator
Extended Local Binary Pattern
Gabor filter

Mean grey values and standard deviations were calculated in voxel neighborhoods with different radii. Since the ratio between slice thickness and in-plane voxel size was 6, 2D and 3D neighborhoods were differentiated. In the 2D case, only neighboring voxels in the slice itself were considered; voxel radii were 1, 2 and 3 in city block distance. In the 3D case, voxels of the two adjacent slices were taken into account, too. In-plane voxel radii were 4, 5 and 6 in city block distance.
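The 2D neighborhood features can be sketched as follows. This is an illustrative Python version, not the study's C++ implementation: the function name, the use of the population standard deviation and the clipping of the neighborhood at the image border are assumptions.

```python
import statistics


def cityblock_stats_2d(image, row, col, radius):
    """Mean and population SD of grey values within a city block radius (one slice)."""
    values = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            # City block (Manhattan) distance defines a diamond-shaped neighborhood.
            if abs(dr) + abs(dc) > radius:
                continue
            r, c = row + dr, col + dc
            # Assumption: neighbors outside the image are simply skipped.
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                values.append(image[r][c])
    return statistics.fmean(values), statistics.pstdev(values)


slice_2d = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
mean, sd = cityblock_stats_2d(slice_2d, 1, 1, radius=1)  # uses the values 2, 4, 5, 6, 8
```

A 3D variant would apply the same in-plane test to the two adjacent slices as well.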

Extended Local Binary Patterns (ELBP) are 2D texture descriptors calculated for each slice individually. They are invariant to monotonic intensity changes and require little processing time [20]. Two specific ELBP patterns encode intensity relationships of the grey value of a voxel with the mean grey value of the image (ELBP_CI) and of a neighborhood with a specific radius r around the voxel (ELBP_NI) (Fig. 3). Two other patterns encode radial (ELBP_RD) and angular (ELBP_AD) grey value differences around the voxel. These two patterns were calculated using sampling in polar coordinates around the center voxel (Fig. 3). Sample points S were characterized by the radius r and an angle α, S = S(r, α). The angle was determined by the number of samples, which were arranged equidistantly on a circle with radius r. The sample grey values were linearly interpolated. Radial differences (RD) were calculated from sample pairs with the same angle but different radii, S(r₁, α_n) and S(r₂, α_n), and angular differences (AD) from sample pairs with the same radius but different angles, S(r, α_n) and S(r, α_{n+1}). The determination of the optimal ELBP parameter set (Ƥ_ELBP), i.e. the number of samples and the circle radii, was part of the training and is described below.
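The polar sampling behind the radial and angular differences can be illustrated with a short Python sketch. The function names, the sign convention of the differences and the thresholding at zero in `sign_code` are assumptions chosen for illustration; the exact encoding used in the study follows [20] and is not reproduced here. The caller is assumed to keep both circles inside the image.

```python
import math


def bilinear(image, y, x):
    """Linearly interpolated grey value at a non-integer in-plane position."""
    y0, x0 = math.floor(y), math.floor(x)
    y1 = min(y0 + 1, len(image) - 1)
    x1 = min(x0 + 1, len(image[0]) - 1)
    fy, fx = y - y0, x - x0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def circle_samples(image, cy, cx, radius, n_samples):
    """Grey values S(r, alpha_n) at equidistant angles on a circle around (cy, cx)."""
    samples = []
    for n in range(n_samples):
        alpha = 2.0 * math.pi * n / n_samples
        samples.append(bilinear(image,
                                cy + radius * math.sin(alpha),
                                cx + radius * math.cos(alpha)))
    return samples


def radial_and_angular_differences(image, cy, cx, r1, r2, n_samples):
    """RD: same angle, different radii. AD: same radius, neighboring angles."""
    outer = circle_samples(image, cy, cx, r1, n_samples)
    inner = circle_samples(image, cy, cx, r2, n_samples)
    rd = [o - i for o, i in zip(outer, inner)]
    ad = [outer[n] - outer[(n + 1) % n_samples] for n in range(n_samples)]
    return rd, ad


def sign_code(differences):
    """Binary pattern (assumed encoding): bit n is set if difference n is non-negative."""
    return sum(1 << n for n, d in enumerate(differences) if d >= 0)
```

For example, `radial_and_angular_differences(image, 3, 3, 2.0, 1.0, 4)` samples four points on circles of radius 2 and 1 around the voxel at row 3, column 3.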

The Gabor filter response for each voxel was calculated from an in-plane kernel around the voxel. In this area, the image grey values were convolved with a kernel formed by a 2D sinusoidal wave modulated by a Gaussian function. In our case the sinusoid was rotated in-plane by 0°, 45°, 90° and 135°, resulting in 4 different Gabor filter outputs. These were applied to each slice individually. As for ELBP, the optimal Gabor filter parameter set (Ƥ_GF), i.e. the Gaussian σ, the sinusoidal wavelength (λ) and the kernel size, was determined by the training.
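A Gabor kernel for one orientation can be sketched as below. This is a simplified illustration, not the implementation used in the study: it assumes an odd kernel size, an isotropic Gaussian envelope, zero phase offset, and σ and λ given in pixels rather than millimetres.

```python
import math


def gabor_kernel(size, sigma, wavelength, theta_deg):
    """Real part of a 2D Gabor kernel: a cosine wave under a Gaussian envelope."""
    theta = math.radians(theta_deg)
    half = size // 2  # assumes an odd kernel size
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the propagation direction of the rotated wave.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * x_rot / wavelength))
        kernel.append(row)
    return kernel


# One kernel per orientation used in the text; parameter values are placeholders.
kernels = {angle: gabor_kernel(5, 1.5, 3.0, angle) for angle in (0, 45, 90, 135)}
```

Each slice would then be convolved with the four kernels to obtain four filter responses per voxel.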

Training

The training workflow is depicted in Fig. 2a. The aim of the RF training was to determine the optimal parameter set Ƥ, defined as the union of the above-mentioned parameter sets: Ƥ = Ƥ_RF ∪ Ƥ_GF ∪ Ƥ_ELBP. For the training, 30 datasets were randomly selected from the cohort, for which a clinical expert manually outlined the hand muscle to provide the correct labels used as gold standard. Since intensities of the same tissue differ among MR scans (even if obtained from the same scanner and corrected by N4ITK, see below), the T1 scans had to be normalized for comparable features across datasets. This normalization was performed according to Eq. (2), where V and V_norm were the original and normalized voxel grey values, and V_muscle and V_cort were the grey values of the peak maxima of muscle and cortical bone in the grey value histogram, respectively:

$$ {V}_{norm}=\frac{V-{V}_{cort}}{V_{muscle}-{V}_{cort}}\ast 100 $$

(2)

The normalized distance between V_muscle and V_cort was chosen to be 100 intensity units. V_muscle and V_cort were obtained by a watershed-like thresholding of the histogram values, until the two maxima were left. The features described in the prior section were calculated using these normalized grey value images.
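A minimal Python sketch of the normalization in Eq. (2) is shown below. The function name is hypothetical, and the two histogram peak values are assumed to be already known; the watershed-like peak search is not reproduced.

```python
def normalize_grey_values(values, v_cort, v_muscle):
    """Map grey values so that the cortical bone peak is 0 and the muscle peak is 100."""
    if v_muscle == v_cort:
        raise ValueError("muscle and cortical bone peaks must differ")
    return [(v - v_cort) / (v_muscle - v_cort) * 100.0 for v in values]


# Placeholder peak positions: cortical bone at 50, muscle at 300 grey value units.
normalized = normalize_grey_values([50, 175, 300], v_cort=50, v_muscle=300)
```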

Ƥ was determined by an iterative process involving 7-fold cross-validation (CV-7). For this purpose, the 30 datasets were split into two subsets with a cardinality ratio of approximately 6:1: a training set of 26 and a validation set of 4 (≈30/7) datasets. Initially, a sensible value range for each parameter of Ƥ was specified; for example, the number of RF trees was varied between 1 and 150. Afterwards, all parameters of Ƥ were set to their start values, forming a set Ƥ’. Then the RF was trained on the training set of 26 datasets, using Ƥ’. In a subsequent validation step, the RF was used to segment the 4 datasets of the validation set and the resulting segmentation was compared to the gold standard using the Dice ratio as similarity metric. Next, one of the parameters of Ƥ’ was changed to the next value in the specified range, thus forming a new Ƥ’, with which the RF was trained and validated again, until all possible parameter combinations had been tested.
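The parameter search can be outlined as an exhaustive grid search with cross-validation. The sketch below is a generic Python illustration under several assumptions: `train_and_score` is a hypothetical callback that trains an RF on the training datasets and returns the Dice ratio on the validation datasets, all seven folds are evaluated for every parameter combination, and since 30 is not divisible by 7 the validation folds hold 4 or 5 datasets.

```python
import itertools


def seven_fold_splits(dataset_ids, n_folds=7):
    """Yield (training, validation) lists; each dataset is validated exactly once."""
    folds = [dataset_ids[i::n_folds] for i in range(n_folds)]
    for k, validation in enumerate(folds):
        training = [d for j, fold in enumerate(folds) if j != k for d in fold]
        yield training, validation


def grid_search(dataset_ids, parameter_ranges, train_and_score):
    """Return the parameter combination with the highest mean Dice ratio."""
    names = sorted(parameter_ranges)
    best_params, best_dice = None, -1.0
    for combo in itertools.product(*(parameter_ranges[n] for n in names)):
        params = dict(zip(names, combo))
        scores = [train_and_score(train, val, params)
                  for train, val in seven_fold_splits(dataset_ids)]
        mean_dice = sum(scores) / len(scores)
        if mean_dice > best_dice:
            best_params, best_dice = params, mean_dice
    return best_params, best_dice


def dummy_score(train, val, params):
    """Stand-in for RF training and validation; peaks at 50 trees by construction."""
    return 1.0 - abs(params["n_trees"] - 50) / 100.0


best, dice = grid_search(list(range(30)), {"n_trees": [1, 50, 150]}, dummy_score)
```

An exhaustive search grows multiplicatively with the number of values per parameter, which is why sensible value ranges have to be chosen beforehand.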

The training result for each parameter set Ƥ’ was described by the Dice ratio D:

$$ D=\frac{2\left|R\cap M\right|}{\left|R\right|+\left|M\right|} $$

(3)

It quantified the percentage of overlap between the tested RF segmentation of the 4 validation datasets (R) and the corresponding manual segmentation (M). The training finished with picking the parameter set Ƥ’ with the highest average value of the Dice ratio. In our case, a Dice ratio difference of 0.1% meant that on average about 500 voxels were classified differently.
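For illustration, the Dice ratio, i.e. twice the number of voxels shared by both segmentations divided by the sum of their voxel counts, can be computed in Python as follows. Representing each segmentation as a set of voxel coordinates and returning 1 for two empty masks are assumptions of this sketch.

```python
def dice_ratio(result_voxels, manual_voxels):
    """Dice ratio between two segmentations given as collections of voxel coordinates."""
    r, m = set(result_voxels), set(manual_voxels)
    if not r and not m:
        return 1.0  # assumption: two empty masks count as identical
    return 2.0 * len(r & m) / (len(r) + len(m))


rf_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
manual_mask = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}
overlap = dice_ratio(rf_mask, manual_mask)  # 2 shared voxels out of 3 + 3
```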

Parameter value ranges

The optimal number of trees was determined by CV-7 using a range of 1 to 150 trees. Similarly, the number of samples per decision node was varied from 1% to 0.00001% of the total number of feature-label pairs, which in our case corresponded to 125,371 down to 1 feature-label pairs.

Input parameters for Gabor filters (Ƥ_GF) were the size of the 2D Gabor kernel (given by a pixel window of S_x × S_y), the wavelength of the sinusoidal wave and the standard deviation of the Gaussian. The following parameter ranges were used: kernel size: 3 × 3 to 26 × 26 pixel window dimension; wavelength: 0.01 mm to 4 mm; standard deviation: 0.01 mm to 4 mm.

Parameters for ELBP (Ƥ_ELBP) were the circle radius, the number of samples on the circle and the radius of the second, smaller circle for the spatial relationship. The radius of the second circle was empirically set to half the radius of the first one. CV-7 was used with the following ranges: radius of the first circle: 0.5 to 5.5; number of samples: 4 to 28.

Segmentation workflow

The trained RF was used to segment all 76 datasets. The segmentation workflow (Fig. 2b) is described in more detail in the following subsections.

Pre processing of T1 scans

MR images are often distorted by bias fields, caused by inhomogeneous magnetic fields of the coils. These image distortions were corrected in a pre processing step using the N4ITK algorithm [21], which is an improvement of the well-known and established N3 (nonparametric nonuniform intensity normalization) approach [22].

Segmentation of hand cross sectional area

The segmentation of the hand CSA was performed by a succession of basic image processing methods. First, a threshold was used to roughly divide the T1 weighted image into background and hand. After the N4ITK bias field correction described above, background intensities were around 10 and hand intensities around 500 units. For the particular scanner and MR acquisition protocol used in the study, a threshold of 70 was used. This resulted in an image containing one or more volumes of interest (VOIs): one large volume of the hand and, due to image acquisition artifacts, multiple small volumes outside the hand VOI, where intensities were also higher than 70. These VOIs were smoothed by a 3D morphological opening. Finally, the hand VOI, which was always the largest VOI, was extracted.
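The thresholding and the extraction of the largest connected volume can be sketched as follows. For brevity this Python illustration works on a single 2D slice with 4-connectivity and omits the morphological opening; the function name and these simplifications are assumptions, while the threshold of 70 is taken from the text.

```python
from collections import deque


def largest_component(image, threshold=70):
    """Pixels of the largest 4-connected region with grey values above the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            component, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


# Background near 10, hand near 500, one isolated bright artifact in the corner.
slice_2d = [[10, 10, 10, 10, 90],
            [10, 500, 500, 10, 10],
            [10, 500, 500, 10, 10],
            [10, 10, 10, 10, 10]]
hand = largest_component(slice_2d)
```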

Hand muscle segmentation

For segmentation, the features described above were calculated for each voxel using the optimal parameters Ƥ_GF and Ƥ_ELBP determined by the RF training. The trained RF (using Ƥ_RF as determined by the RF training) classifies each voxel into muscle and background, leading to a raw (i.e. without post processing) muscle segmentation.

Post processing of muscle segmentation

The raw segmentation was post processed by a morphological dilation with radius 1, followed by island extraction, where islands (directly connected muscle voxels) with a size smaller than 10 voxels were discarded. Finally, a Gaussian function f(μ,σ) was fitted to the grey value histogram of the muscle segmentation mask and voxels with grey values outside the range [μ ± 3σ] were excluded from the segmentation. The underlying idea of this procedure was to include a wider area of voxels into the segmentation mask and then remove wrongly included voxels by the Gaussian fit. The result of the automated procedure is shown in Fig. 4a. The resulting muscle segmentation was reviewed by a clinical expert and manually edited if necessary.
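The final grey value filtering step can be illustrated with the Python sketch below. As a simplifying assumption, μ and σ are estimated directly as the sample mean and population standard deviation of the mask grey values instead of fitting a Gaussian to the histogram, so results may differ from the fit used in the study. The function name and the dictionary-based data layout are hypothetical.

```python
import statistics


def trim_by_grey_value(mask_voxels, grey_values, n_sigma=3.0):
    """Keep only mask voxels whose grey value lies within mu +/- n_sigma * sigma."""
    values = [grey_values[v] for v in mask_voxels]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return {v for v in mask_voxels if abs(grey_values[v] - mu) <= n_sigma * sigma}


# 20 muscle-like voxels around 100 units and one wrongly included bright voxel.
grey = {i: 100.0 for i in range(20)}
grey[20] = 400.0
cleaned = trim_by_grey_value(set(grey), grey)
```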

Volumes of interest

The volumes of interest (VOIs) are hand volume V_H and muscle volume V_M, obtained from the hand CSA and the muscle segmentation mask, respectively. In order to increase their longitudinal and cross sectional comparability, these two VOIs were manually limited to the metacarpal region defined by the metacarpal bone (MCP) III. For this purpose, the clinical expert had to set the proximal and distal MCP III boundaries. This could be achieved by navigating to the corresponding two slices in a transversal view.

Fat quantification

Fat was quantified using the Dixon fat fraction image. Since muscle was difficult to detect in the FF image, the muscle segmentation mask obtained using the T1 scans was transferred to the FF image via multimodal rigid image registration (Fig. 4b). The similarity metric used was mutual information as described by Mattes et al. [23], optimized by the gradient descent method. In the segmentation VOI, the average and the absolute fat content were calculated.
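Once the mask has been transferred, the fat measures reduce to simple statistics over the fat fraction values inside the mask. The Python sketch below is an illustration only: it assumes that the absolute fat content is expressed as a fat volume (sum of per-voxel fat fractions times the voxel volume), which the text does not specify, and it uses the voxel size of 0.5 × 0.5 × 3.0 mm³ from the scan protocol.

```python
def muscle_fat_content(ff_permille, voxel_volume_mm3=0.5 * 0.5 * 3.0):
    """Mean fat fraction (%) and fat volume (mm^3) from per-mille values inside the mask."""
    mean_ff_percent = sum(ff_permille) / len(ff_permille) / 10.0
    # Assumption: absolute fat content is the fat volume summed over all mask voxels.
    fat_volume_mm3 = sum(ff / 1000.0 for ff in ff_permille) * voxel_volume_mm3
    return mean_ff_percent, fat_volume_mm3


mean_ff, fat_volume = muscle_fat_content([100, 200, 300])
```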

Validation of accuracy

Accuracy of the RF based segmentation was determined using the 30 gold standard datasets. Segmentation masks were compared between the manual and the RF approach using three different image metrics: the Dice ratio (Eq. 3), the average surface distance (Eq. 4) and the Hausdorff distance (Eq. 5). The average surface distance is the average of the distances from all points of one surface (A) to the closest point of the other surface (B):

$$ {d}_{avg}=\frac{1}{\left|A\right|}\sum \limits_{a\in A}\underset{b\in B}{\mathit{\min}}\left\{d\left(a,b\right)\right\} $$

(4)

For d the Euclidean metric was used. The Hausdorff distance h is the maximum of the individual distances, i.e. the maximum local distance between the two segmentation masks:

$$ h=\underset{a\in A}{\mathit{\max}}\left\{\underset{b\in B}{\mathit{\min}}\left\{d\left(a,b\right)\right\}\right\} $$

(5)
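Both distance metrics follow directly from Eqs. (4) and (5) and can be sketched in Python as below. Surfaces are assumed to be given as lists of point coordinates in millimetres, and the brute-force search over all point pairs is chosen for clarity rather than speed. As written in the equations, both measures are directed from A to B; a symmetric variant would also evaluate B to A.

```python
import math


def directed_distances(surface_a, surface_b):
    """For every point of A, the Euclidean distance to the closest point of B."""
    return [min(math.dist(a, b) for b in surface_b) for a in surface_a]


def average_surface_distance(surface_a, surface_b):
    """Eq. (4): mean of the closest-point distances from A to B."""
    distances = directed_distances(surface_a, surface_b)
    return sum(distances) / len(distances)


def hausdorff_distance(surface_a, surface_b):
    """Eq. (5): maximum of the closest-point distances from A to B."""
    return max(directed_distances(surface_a, surface_b))


surface_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
surface_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
d_avg = average_surface_distance(surface_a, surface_b)
h = hausdorff_distance(surface_a, surface_b)
```

Because the Hausdorff distance is a maximum, a single displaced boundary slice can dominate it even when the average surface distance stays small.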

Reanalysis precision

For the determination of reanalysis precision errors, three operators analyzed 14 random data sets once (interoperator) and one operator analyzed the same 14 data sets three times (intraoperator). Reanalysis precision errors were calculated as root mean square average of standard deviation (RMS_SD) and coefficient of variation (CV) of individual data sets [24]. Precision was calculated for the hand segmentation in the T1 weighted scans, which depended on the manual determination of the MCP III length and potential manual segmentation corrections. Additionally the precision of the registration based fat quantification was calculated.
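The two precision measures can be sketched in Python as follows. The sketch assumes the usual definitions: a sample standard deviation per dataset over its repeated analyses, then the root mean square over datasets of these standard deviations and of the per-dataset coefficients of variation. The function name and the example values are placeholders.

```python
import math
import statistics


def rms_precision_errors(repeated_measurements):
    """RMS average of per-dataset SDs and of per-dataset CVs (CV returned in %)."""
    sds = [statistics.stdev(m) for m in repeated_measurements]
    cvs = [sd / statistics.fmean(m) for sd, m in zip(sds, repeated_measurements)]
    rms_sd = math.sqrt(sum(sd * sd for sd in sds) / len(sds))
    rms_cv = 100.0 * math.sqrt(sum(cv * cv for cv in cvs) / len(cvs))
    return rms_sd, rms_cv


# Two datasets, each analyzed three times (placeholder muscle volumes).
rms_sd, rms_cv = rms_precision_errors([[99.0, 100.0, 101.0], [198.0, 200.0, 202.0]])
```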

Implementation details

The method was embedded in the Medical Image Analysis Framework (MIAF, Institute of Medical Physics, Erlangen, Germany). Implementation was done in C++ with the help of the Insight Segmentation and Registration Toolkit (ITK [25]) and the Open Source Computer Vision library (OpenCV [26]). For the analysis, a computer with a 3.4 GHz quad-core processor and 16 GB RAM was used.