Article

Machine-Learning Based Hybrid-Feature Analysis for Liver Cancer Classification Using Fused (MR and CT) Images

1 Department of Computer Science & IT, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
2 Institute of Numerical Sciences, Kohat University of Science & Technology, Kohat 26000, Pakistan
3 College of Computer Sciences, King Khalid University, Abha 61321, Kingdom of Saudi Arabia
4 School of Arts and Sciences, University of Central Asia, Naryn 722919, Kyrgyzstan
5 Department of Statistics, Govt S.A Post Graduate College Dera Nawab Sahib, Bahawalpur 63351, Pakistan
6 Department of Mathematics, Université de Caen, LMNO, Campus II, Science 3, 14032 Caen, France
7 Department of Computer Science, Govt Degree College for Women Ahmadpur East, Bahawalpur 63350, Pakistan
* Author to whom correspondence should be addressed.
Submission received: 2 April 2020 / Revised: 13 April 2020 / Accepted: 16 April 2020 / Published: 30 April 2020
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract
The purpose of this research is to demonstrate the ability of machine-learning (ML) methods for liver cancer classification using a fused dataset of two-dimensional (2D) computed tomography (CT) scans and magnetic resonance imaging (MRI). Datasets of benign (hepatocellular adenoma, hemangioma, cyst) and malignant (hepatocellular carcinoma, hepatoblastoma, metastasis) liver cancer were acquired at Bahawal Victoria Hospital (BVH), Bahawalpur, Pakistan. The final dataset was generated by fusion of 1200 (100 × 6 × 2) MR and CT-scan images, i.e., 200 images (100 MRI and 100 CT scan) of size 512 × 512 for each class of cancer. The acquired dataset was preprocessed by employing Gabor filters to reduce noise, and automated regions of interest (ROIs) were extracted using an Otsu thresholding-based segmentation approach. From the preprocessed dataset, 254 hybrid features were acquired for each ROI, combining histogram, wavelet, co-occurrence, and run-length features, and 10 optimized hybrid features were selected by employing the (probability of error plus average correlation) feature selection technique. For classification, this optimized hybrid-feature dataset was deployed to four ML classifiers: multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), and J48, using tenfold cross-validation. On the individual modalities, MLP showed an overall accuracy of 95.78% on MRI and 97.44% on CT; these single-modality results were not sufficiently promising, owing to limitations of each modality alone. Thereafter, fusion of the MRI and CT-scan datasets generated the fused optimized hybrid-feature dataset, on which MLP showed a promising accuracy of 99%, the best among all the deployed classifiers.

1. Introduction

The liver is the largest internal organ of the human body and performs vital functions such as detoxification of drugs and hormones, protein production, and blood filtration [1]. Liver cancer, or hepatic cancer, is one of the most threatening diseases; hepatocellular carcinoma (HCC) is the most common type, accounting for about 80% of all hepatic cancers [2]. Several factors cause liver cancer, such as alcohol, smoking, and obesity. Liver cancer is not easy to diagnose at an early stage [1]. Medical diagnosis is critically important and needs to be performed accurately and efficiently, yet mistakes remain possible when examining such severe diseases. Therefore, an intelligent system is needed to reduce the probability of error in diagnosis [2].
According to the World Cancer Research Fund International and the American Institute for Cancer Research (2018), liver cancer is the fifth most common cancer in men and the ninth most common in women [3]. In 2018, 840,000 new liver cancer cases were diagnosed worldwide. Among all countries, Mongolia had the highest age-standardized rate of liver cancer in 2018, at 93.7 per 100,000, followed by Egypt, with a rate of 32.2 [4]. The “Continuous Update Project” panel considered that there is clear evidence that a greater amount of body fat, consumption of alcoholic beverages, and foods contaminated with aflatoxins increase the threat of liver cancer. There is strong evidence that drinking coffee decreases the risk of liver cancer, and being physically active and consuming fish can also reduce the risk [4].
The liver is a major parenchymal organ of the human body that performs several functions, which means that a high volume of blood flows through it. This makes the liver vulnerable to secondary cancers, and the vast majority of such cancers originate in other organs. For effective treatment, cancers must be localized and identified accurately [5]. Liver cancer is the second most frequent cause of death from cancer and is less common in women than in men. The risk of liver cancer increases with age, and most cases are diagnosed in patients above 75 years of age [6]. However, people in less developed areas of Africa and Asia are more vulnerable to liver cancer at a younger age (typically around 40) than people in developed countries. Approximately 83% of liver cancer cases are found in less developed geographical regions. The occurrence of liver cancer is lowest in Europe, the Caribbean, and Latin America, and the age-standardized rate of this cancer is more than six times higher in Eastern Asia than in Northern Europe [6].
Image processing is an area of computing that is becoming progressively more prominent. It is applicable in various domains and can be used in industry to regulate production, in safety systems to assess people’s biometrics, etc. [7]. Computer-aided analysis of medical images contributes a great deal to several fields of medicine, such as diagnosis, monitoring, and therapy planning [8]. Due to the aging of society and the prevalence of modern imaging technologies, the number of medical images processed in clinical practice is increasing [9]. There is a great need for software tools that speed up the analysis of medical images and make it beneficial and reproducible. An important part of this field is medical image analysis, in which image processing techniques and artificial intelligence are employed to solve problems [9]. This work offers a computerized diagnostic system adapted to the field of medical imaging that aims to locate and identify cancers in the liver.
Machine learning refers to training a machine to perform a specific task (here, image processing) by providing it with a training dataset. It incorporates several algorithms and approaches [10], among which we can choose whichever provides the best results for image classification. Results also depend on the type of image processing one adopts, as each type has its own inherent properties [7]. For example, there is a high possibility that the cross-entropy loss function will perform better than other loss functions for image classification.

1.1. Literature Review

Reference [11] proposed a study in which liver lesions were classified into malignant and benign using a comparative analysis. Fused fluorodeoxyglucose (FDG) PET/CT and MRI (PET/MRI) were compared with FDG PET/CT and MRI alone to find which performed best. Seventy patients were selected, and a comparison was performed between the results of PET/MRI and those of PET/CT and MRI alone. All lesions were detected by MRI and fused PET/MRI, while PET/CT could detect 89.4% of them. The final accuracy obtained by PET/CT, MRI, and fused PET/MRI was 66.7%, 80.0%, and 94.7%, respectively. Reference [12] proposed a study on the development of a deep-learning algorithm for the detection and classification of liver lesions into malignant and benign. The algorithm was trained on a dataset containing 367 ultrasound (US) images and then tested on 117 new images. For the given model, the receiver operating characteristic–area under the curve (ROC–AUC) score for focal liver lesion (FLL) detection was 0.935, and the score for FLL classification was 0.916. Reference [13] proposed a study that used dynamic contrast-enhanced magnetic resonance (DCE-MR) and T2-weighted images for automatic classification to obtain better results. In this study, 125 benign and 88 malignant lesions were used. Benign lesions included hemangioma, adenoma, and cyst, while malignant lesions included metastasis and hepatocellular carcinoma. Gray-level co-occurrence matrix, gray-level histogram, and contrast-curve texture features were extracted from the T2-weighted and DCE-MR images. The fifty features with the highest ANOVA F-scores were selected to feed a tree classifier. The overall accuracy obtained was 0.77. The sensitivity/specificity was 0.62/0.77, 0.73/0.56, 0.84/0.82, 0.93/0.93, and 0.80/0.78 for metastasis, HCC, hemangioma, cyst, and adenoma, respectively.
Reference [14] classified hepatic lesions as hemangiomas, cysts, and malignant cancers using feature selection methods based on multiple regions of interest (ROIs). In this case, the classification of liver lesions was performed using ultrasound images. This depends to a large extent on certain properties such as internal edges, echogenicity, echo morphology, and posterior echo enhancement. The proposed method achieved improved and stable classification, regardless of the characteristics used. Reference [15] provided an overview of liver cancers in infants. Two of the most common cancers found in infants are hepatocellular carcinomas and hepatoblastomas. Recently, there has been much advancement in treatment outcomes; on the other hand, hepatic cancers in infants are very rare. Pediatric liver cancer study groups, including the international childhood liver tumor strategy group (SIOPEL), the children oncology group (COG), and the Japanese pediatric liver tumors group (JPLT), joined together to work on these cancers. In this collaboration, a new histopathological consensus classification of liver cancers in infants was established, along with additions in chemotherapy, transplantation, and a web application. Reference [16] introduced an end-to-end deep learning model to discriminate between liver metastases from colorectal cancer and benign cysts in CT images. This approach follows the InceptionV3 architecture and achieved 96% accuracy. Reference [17] proposed an improved automatic classification of liver tumors on three-dimensional (3D) computed tomography (CT) volume images using fuzzy C-means (FCM) and graph cuts. The liver volume of interest (VOI) was extracted, and a region-growing algorithm was used to reduce the computational cost and acquire promising results while reducing the processing time. Reference [18] performed mammogram segmentation using second-order texture features.
A clustering comparison was performed between the fuzzy C-means and K-means algorithms, and the segmentation results were measured in terms of error metrics such as the mean square error (MSE) and root mean square error (RMSE). Reference [19] employed machine-learning approaches for the classification of focal liver lesions (FLLs) using contrast-enhanced ultrasonography (CEUS). Spatial and temporal features were extracted in the arterial, portal, and post-vascular phases, as well as from max-hold images, and the tumors were classified into benign and malignant [19].

1.2. Contribution

The contribution of this study can be summarized as follows:
A novel segmentation technique, called Otsu thresholding-based region growing segmentation (OTRGS), is introduced and developed. The overall approach comprises four steps:
  • First, each liver CT and MR image is divided into four equal regions. A group of neighboring seeds is used for the formulation of an identifiable region. These seeds form an irregular polygon with a variable radius from the center of the image, which ensures the maximum chance of grouping seeds that belong to the same region and allows any possible size, dimension, and shape to be considered as a region of interest (ROI). At the post-processing stage, Otsu thresholding-based segmentation is employed on the improved segmented regions;
  • After segmentation, hybrid-feature data are extracted;
  • Probability of error plus average correlation feature selection technique is employed for optimal hybrid-feature selection;
  • Finally, the optimal hybrid-feature dataset is deployed to four ML classifiers, and efficient classification accuracy is acquired.

2. Materials and Methods

This research comprises a dataset of six classes of liver cancer, which are classified using CT and MR images. An image dataset with two categories of liver cancer was collected. These two categories are: (i) benign liver cancer, which includes three subcategories (hepatocellular adenoma, hemangioma, and cyst); (ii) malignant liver cancer, which includes three subcategories (hepatocellular carcinoma, metastasis, and hepatoblastoma), as shown in Figure 1.
Patients suffering from liver cancer were selected as a source of the dataset. The MRI dataset was collected via a Siemens Essenza 1.5T machine with a resolution of 1–2 mm, and the CT-scan dataset was collected via a Siemens Somatom definition-AS 64 machine with a resolution of 0.5–0.625 mm, available in the radiology department of Bahawal Victoria Hospital (BVH) Bahawalpur [20], Pakistan. For each type, 100 patients were selected to examine their liver cancer using CT scan, and 100 patients were examined using MR images of a size of 512 × 512, and a dataset with a total 1200 (100 × 6 × 2) fused (MR and CT) images of liver cancer patients was acquired. All the images were manually examined by an expert radiologist in the light of different medical tests and biopsy reports. Finally, based on a gold standard/ground truth fused liver cancer image dataset, we proposed a novel Otsu thresholding-based region growing segmentation technique.
Data Fusion: This is a very powerful technique for merging multiple datasets to produce a more accurate classification than the individual datasets allow. Depending on the processing stage where the fusion occurs, the data fusion process is often classified as low, medium, or high level. Low-level data fusion combines multiple raw data sources to create new raw data; the fused data should be more informative and synthetic than the original information [21]. In this research, two different data modalities (CT scan and MRI) were used. We generated a fused dataset, which is the combination of the liver cancer CT-scan and MRI datasets, using a data fusion approach.
Modalities’ Importance: MRI is a scan that uses radio waves and magnetic fields to create a detailed image of the soft tissues of the body. A CT scan is a series of X-ray images taken at different angles, and CT uses a computer to create images with X-rays [22]. Both modalities have their own qualities and are equally important, explaining our choice of a fused MRI and CT-scan dataset for the experiments.

2.1. Proposed Methodology

First, the proposed algorithm is described with all the procedural steps in Algorithm 1. Then, the proposed methodology is illustrated in Figure 2.
Algorithm 1. Otsu thresholding-based segmentation and fused hybrid-feature analysis for liver cancer classification.
  Begin
  Main {
  Input ∈ Liver cancer CT and MR image dataset
  For { Step 1 to Step 9
    1. MRI datasets ∈ six liver cancer types
    2. CT-scan datasets ∈ six liver cancer types
    3. Image preprocessing
    4. Otsu thresholding-based region growing segmentation
    5. Extract hybrid features ∈ histogram, co-occurrence matrix, run-length matrix, and wavelet
    6. Fuse the (MRI and CT-scan) hybrid-feature datasets
    7. Optimize hybrid features via the probability of error plus average correlation feature selection technique
    8. Extract the 10-feature optimized fused hybrid-feature dataset
    9. Employ machine-learning classifiers on the fused hybrid-feature dataset
  } End For
  Output = Liver cancer classification results
  }
  End
Now, let us discuss the proposed methodology in detail. The first step consisted of the collection of an image dataset with two categories of liver cancer: (i) benign liver cancer, which includes three subcategories (hepatocellular adenoma, hemangioma, and cyst), and (ii) malignant liver cancer, which includes three subcategories (hepatocellular carcinoma, metastasis, and hepatoblastoma). In this step, for each type, 100 patients were selected to examine their liver cancer using a CT scan, and 100 patients were examined using MR images of a size of 512 × 512, and a dataset of 1200 (100 × 6 × 2) fused (MR and CT) images of liver cancer patients was acquired from the radiology department of Bahawal Victoria Hospital Bahawalpur, Pakistan [20]. The second step was image preprocessing. In this step, the digital MR and CT images were first converted into a gray-level eight-bit image format. Secondly, noise removal was performed using “Gabor filters”. Thirdly, the images were enhanced using “Sharping Algorithm II” to sharpen their edges, and then data cleaning was performed for liver cancer image dataset standardization. The third step was segmentation, which helped to remove extraneous objects, locate the exact position of the lesion, and refine its texture. There are several automated and semiautomated methods for extracting ROIs. Automated ROI extraction is generally based on the idea of image segmentation, but there is no single technique for ideal segmentation. On the other hand, semiautomated techniques are based on expert opinion, but human-based extraction has some limitations. To solve this problem, Otsu thresholding-based region growing segmentation (OTRGS) was employed on the liver cancer fused image dataset. The segmentation process is shown in Figure 3. The fourth step was first-order and second-order hybrid-feature extraction.
In this step, four types of features, namely, “Co-Occurrence Matrix Features”, “Wavelet Features”, “Run-Length Matrix Features”, and “Histogram Features”, were extracted from the standardized liver cancer (MR and CT) image dataset. The fifth step was the formation of a fused hybrid-feature dataset using the data fusion technique. The sixth step was fused hybrid-feature optimization, in which the best 10 optimized fused hybrid features were selected from the extracted feature dataset using the probability of error plus average correlation feature selection technique. The last step was classification, where four ML classifiers, namely multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), and J48, were employed (using tenfold cross-validation) on the selected optimized fused hybrid-feature dataset.

2.2. Image Preprocessing

The acquired (2D) MR and CT-scan image datasets were converted into a gray-level (eight-bit) image format, and Sharping Algorithm II was used to normalize non-uniformities and improve contrast. A gray-level image has 256 gray levels; in its histogram, the vertical axis represents the number of pixels at each level and the horizontal axis extends from 0 to 255 [23]. The probability density function (p.d.f.) of the pixel intensity level is shown in Equation (1):

P(i) = M_i / M  (1)

where 0 ≤ P(i) ≤ 1, M_i is the number of pixels at intensity level i, M represents the total number of pixels, and i = 0 to 255.
During the acquisition of the MR and CT-scan image data, speckle noise was detected due to the environmental conditions of the image sensor. A noise removal process was adopted to solve this problem, in which the Gabor filter [24], based on the Gaussian kernel function, was implemented to enhance the images and detect edges. Mathematically, the Gabor filter is expressed in Equations (2)–(4):

F(j, k) = Σ_scale α^scale g(j′, k′)  (2)

j′ = j cos θ + k sin θ  (3)

k′ = −j sin θ + k cos θ  (4)

Here, (j, k) is the image pixel, scale is the location parameter, and θ is the orientation angle, θ = Eπ/m (E = 0 to m − 1), where m = 8 orientation angles (E = 0 to 7) and five location parameters (scale = 1 to 5) were used. Using this technique, noisy values are replaced with average values in the image, and a smooth, enhanced MR and CT-scan image dataset was acquired.
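To make the filter-bank construction concrete, the sketch below builds Gabor kernels from the rotations of Equations (3) and (4) over the eight orientations and five scales described above. This is only an illustration: the 15 × 15 kernel size, the Gaussian width tied to the scale, and the cosine wavelength are our own assumptions, and the original pipeline was implemented in MATLAB.

```python
import numpy as np

def gabor_kernel(scale, theta, size=15):
    """Gaussian-kernel Gabor filter at one scale/orientation.
    jp and kp implement the axis rotation of Eqs. (3)-(4)."""
    half = size // 2
    j, k = np.meshgrid(np.arange(-half, half + 1),
                       np.arange(-half, half + 1), indexing="ij")
    jp = j * np.cos(theta) + k * np.sin(theta)     # Eq. (3)
    kp = -j * np.sin(theta) + k * np.cos(theta)    # Eq. (4)
    sigma = scale                                  # assumed: width grows with scale
    g = (np.exp(-(jp**2 + kp**2) / (2 * sigma**2))
         * np.cos(2 * np.pi * jp / (4 * sigma)))   # assumed wavelength
    return g / np.abs(g).sum()                     # normalize kernel mass

# bank: 8 orientations theta = E*pi/8 (E = 0..7) and 5 scales (1..5)
bank = [gabor_kernel(s, E * np.pi / 8)
        for s in range(1, 6) for E in range(8)]
```

Each kernel would then be convolved with the image and the responses summed over scales, as in Equation (2).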

2.3. Otsu Thresholding-Based Region Growing Segmentation (OTRGS)

There are several automated and semiautomated methods for extracting ROIs. Automated ROI extraction is generally based on the idea of image segmentation, but there is no single technique for ideal segmentation [25]. On the other hand, semiautomated techniques are based on expert opinion, but human-based extraction has some limitations. To solve this problem, Otsu thresholding-based region growing segmentation (OTRGS) was employed on the liver cancer fused image dataset. Liver CT and MR images were divided into four equal regions. Groups of neighboring seeds were used for the formulation of an identifiable region. These seeds formed an irregular polygon with a variable radius from the center of an image, which ensured a maximum chance of grouping seeds that belong to the same region and also allowed any possible size, dimension, and shape to be considered as a region of interest (ROI). At the postprocessing stage, Otsu thresholding-based segmentation was employed on the improved segmented regions. Otsu thresholding [26] iterates through all possible threshold values and calculates the measure of spread for the pixel levels on each side of the threshold, where each pixel falls into either the foreground or the background.
Let ω β , θ β , and σ β 2 be the weight, mean, and variance for the background, respectively. Meanwhile, ω ϕ , θ ϕ , and σ ϕ 2 are the weight, mean, and variance for the foreground, respectively, i.e., β   = background and ϕ   = foreground.
The within-class variance is denoted by σ_γ1² and the between-class variance by σ_γ2²; they are defined in Equations (5) and (6), respectively:

σ_γ1² = ω_β σ_β² + ω_φ σ_φ²  (5)

σ_γ2² = ω_β ω_φ (θ_β − θ_φ)²  (6)
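The exhaustive threshold search described above can be sketched as follows (a minimal Python illustration of Equations (5) and (6); the study itself used MATLAB):

```python
import numpy as np

def otsu_threshold(gray):
    """Search all gray levels for the threshold that maximizes the
    between-class variance (Eq. 6), which is equivalent to minimizing
    the within-class variance (Eq. 5)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()                       # p.d.f. of gray levels
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w_b = prob[:t].sum()                       # background weight
        w_f = prob[t:].sum()                       # foreground weight
        if w_b == 0 or w_f == 0:
            continue
        mu_b = (levels[:t] * prob[:t]).sum() / w_b     # background mean
        mu_f = (levels[t:] * prob[t:]).sum() / w_f     # foreground mean
        between = w_b * w_f * (mu_b - mu_f) ** 2       # Eq. (6)
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

On a bimodal image, the returned threshold separates the two intensity clusters; pixels below it are labeled background and the rest foreground.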

2.4. Feature Extraction

For this research, the hybrid-feature dataset of liver cancer was acquired from the fused (MR and CT-scan) images, comprising first-order histogram features and second-order co-occurrence matrix, run-length matrix, and wavelet features. These features are grouped as follows: 11 second-order co-occurrence matrix features, each averaged over five pixel distances in all four directions (0, 45, 90, and 135°), for a total of 11 × 5 × 4 = 220 features; 17 wavelet features; 8 histogram features; and 9 run-length matrix features. Thus, 254 features per ROI were extracted, and the total feature vector space (FVS) was 5,486,400 (254 × 21,600) for the acquired fused (MR and CT-scan) image dataset. All these features were acquired using MATLAB software version R2019b. All the experiments were carried out on an Intel® Core i7 3.9 gigahertz (GHz) processor with 16 gigabytes (GB) of RAM and a 64-bit Windows 10 operating system.

2.4.1. Histogram Features

Histogram features were used by selecting the object with respect to rows and columns [27]. This binary object was used as a mask of the original image for feature extraction. Histogram features were calculated based on the intensity of the individual pixels that were part of the objects. These features are based on the histogram and are also called first-order histogram or statistical features. The first-order histogram probability ψ(ε) is described in Equation (7):

ψ(ε) = κ(ε) / N  (7)

Here, N represents the total number of pixels in the image, and κ(ε) counts the total occurrences of the gray-scale value ε. Eight first-order histogram features are calculated (mean, standard deviation, skewness, energy, entropy, etc.). The mean is the average of the values, describing the overall brightness or darkness of an image. It is defined in Equation (8):

η = (1/N) Σ_i Σ_j κ(i, j)  (8)

The values of i (rows) and j (columns) index the pixels. The standard deviation (SD) describes the contrast of the image. It is presented in Equation (9):

σ_ε = [ Σ_{ε=0}^{P−1} (ε − ε̄)² ψ(ε) ]^(1/2)  (9)

Skewness, denoted by γ, is the degree of asymmetry of the distribution around its central value (mean, median, mode). It is defined in Equation (10):

γ = (1/σ_ε³) Σ_{ε=0}^{P−1} (ε − ε̄)³ ψ(ε)  (10)

The gray-level distribution is described by the energy, denoted by ϑ and defined in Equation (11):

ϑ = Σ_{ε=0}^{P−1} [ψ(ε)]²  (11)

Entropy, denoted by ν, describes the randomness in the image data. It is defined in Equation (12):

ν = −Σ_i Σ_j κ(i, j) log2 κ(i, j)  (12)
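As an illustration of how Equations (7) to (12) translate to code, the Python sketch below computes five of the eight first-order features from the gray-level p.d.f. (the study itself used MATLAB):

```python
import numpy as np

def histogram_features(gray):
    """First-order features of Eqs. (7)-(12), computed from the
    normalized gray-level histogram psi."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    psi = hist / hist.sum()                  # Eq. (7): first-order probability
    levels = np.arange(256)
    mean = (levels * psi).sum()              # Eq. (8)
    sd = np.sqrt(((levels - mean) ** 2 * psi).sum())            # Eq. (9)
    skew = (((levels - mean) ** 3 * psi).sum() / sd**3
            if sd > 0 else 0.0)              # Eq. (10)
    energy = (psi ** 2).sum()                # Eq. (11)
    nz = psi[psi > 0]                        # skip empty bins (log2 0 undefined)
    entropy = -(nz * np.log2(nz)).sum()      # Eq. (12), histogram form
    return {"mean": mean, "sd": sd, "skewness": skew,
            "energy": energy, "entropy": entropy}
```

A perfectly uniform image, for example, has zero standard deviation and entropy and maximal energy.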

2.4.2. Co-occurrence Matrix Features

Co-occurrence matrix (COM) features are also called second-order statistical features. They are obtained from the distance and the angle between pixels, based on the gray-level co-occurrence matrix (GLCOM) [28]. For this study, eleven second-order texture features were calculated in four directions, 0, 45, 90, and 135°, up to a five-pixel distance. The eleven second-order COM features include entropy, inertia, correlation, inverse difference, and energy, etc. First, energy, calculated from the distribution of the normalized gray-level co-occurrence values ζ, is defined in Equation (13):

Energy = Σ_i Σ_j (ζ_{i,j})²  (13)

Correlation describes the pixel similarity at a particular pixel distance and is described in Equation (14):

Correlation = (1/(σ_α σ_β)) Σ_x Σ_y (x − μ_α)(y − μ_β) ζ_{x,y}  (14)

Entropy measures the information content of the image. It is described in Equation (15):

Entropy = −Σ_i Σ_j ζ_{i,j} log2 ζ_{i,j}  (15)

The local homogeneity of the image is called the inverse difference, presented in Equation (16):

Inverse Difference = Σ_{i≠j} ζ_{i,j} / |i − j|  (16)

The contrast describes the inertia, which is defined in Equation (17):

Inertia = Σ_i Σ_j (i − j)² ζ_{i,j}  (17)
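A minimal sketch of the second-order features of Equations (13) to (17) follows. The quantization to eight gray levels and the 1 + |i − j| denominator in the inverse difference (used here to avoid division by zero on the diagonal) are our own assumptions, not the paper's exact settings:

```python
import numpy as np

def glcm_features(gray, dx=1, dy=0, levels=8):
    """Normalized co-occurrence matrix zeta at offset (dx, dy) and the
    second-order features of Eqs. (13)-(17). Assumes a non-constant image."""
    q = gray.astype(int) * levels // 256          # quantize to `levels` bins
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    np.add.at(glcm, (a.ravel(), b.ravel()), 1)    # count co-occurring pairs
    zeta = glcm / glcm.sum()
    i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    mu_i, mu_j = (i * zeta).sum(), (j * zeta).sum()
    si = np.sqrt(((i - mu_i) ** 2 * zeta).sum())
    sj = np.sqrt(((j - mu_j) ** 2 * zeta).sum())
    nz = zeta[zeta > 0]
    return {
        "energy": (zeta ** 2).sum(),                                 # Eq. (13)
        "correlation": ((i - mu_i) * (j - mu_j) * zeta).sum() / (si * sj),  # Eq. (14)
        "entropy": -(nz * np.log2(nz)).sum(),                        # Eq. (15)
        "inverse_difference": (zeta / (1 + np.abs(i - j))).sum(),    # Eq. (16) variant
        "inertia": ((i - j) ** 2 * zeta).sum(),                      # Eq. (17)
    }
```

Averaging the results over four offsets and five distances would reproduce the 11 × 5 × 4 feature layout described above.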

2.4.3. Run-Length Matrix Features

Galloway [28] introduced the gray-level run-length matrix (GLRM) to describe run-length (RL) features. A run is the number of consecutive, collinear pixels with the same gray level, measured in a particular direction (0, 45, 90, and 135°). According to Gonzales, a run of gray or color level, also known as the length of the run, is a linear sequence of contiguous pixels with the same color or gray level in a particular direction [29].
Let K_g be the number of discrete intensity values in the image, K_r the number of discrete run lengths, K_p the number of pixels in the image, K_r(μ) the number of runs in the image along angle μ, and ψ(x, y|μ) the run-length matrix for an arbitrary direction μ. Then, the short run emphasis (SRE) is described in Equation (18):

SRE = [ Σ_{x=1}^{K_g} Σ_{y=1}^{K_r} ψ(x, y|μ) / y² ] / K_r(μ)  (18)

SRE measures the distribution of short run lengths, with a greater value indicating shorter run lengths and finer textures. Long run emphasis (LRE) is described in Equation (19):

LRE = [ Σ_{x=1}^{K_g} Σ_{y=1}^{K_r} ψ(x, y|μ) · y² ] / K_r(μ)  (19)

LRE measures the distribution of long run lengths, with a greater value indicating longer run lengths and coarser textures. Gray-level nonuniformity (GLN) is described in Equation (20):

GLN = [ Σ_{x=1}^{K_g} ( Σ_{y=1}^{K_r} ψ(x, y|μ) )² ] / K_r(μ)  (20)

GLN measures the similarity of gray-level intensity values in the image, where a lower GLN value correlates with a greater similarity in intensity values. Gray-level nonuniformity normalized (GLNN) is defined in Equation (21):

GLNN = [ Σ_{x=1}^{K_g} ( Σ_{y=1}^{K_r} ψ(x, y|μ) )² ] / K_r(μ)²  (21)

GLNN measures the same similarity of gray-level intensity values and is the normalized version of the GLN formula. Run-length nonuniformity (RLN) is described in Equation (22):

RLN = [ Σ_{y=1}^{K_r} ( Σ_{x=1}^{K_g} ψ(x, y|μ) )² ] / K_r(μ)  (22)

RLN measures the similarity of run lengths throughout the image, with a lower value indicating more homogeneity among run lengths. Run-length nonuniformity normalized (RLNN), the normalized version of the RLN formula, is defined in Equation (23):

RLNN = [ Σ_{y=1}^{K_r} ( Σ_{x=1}^{K_g} ψ(x, y|μ) )² ] / K_r(μ)²  (23)

Run percentage (RP) is given in Equation (24):

RP = K_r(μ) / K_p  (24)

Finally, RP measures the coarseness of the texture as the ratio of the number of runs to the number of pixels in the ROI. Values lie in the range 0 < RP ≤ 1, with higher values indicating that a larger portion of the ROI consists of short runs (a finer texture).
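The run-length matrix and the measures of Equations (18), (19), and (24) can be sketched as follows (horizontal direction only, with an assumed eight-level quantization; the study computed all four directions in MATLAB):

```python
import numpy as np

def run_length_matrix(gray, levels=8):
    """Horizontal (0 deg) gray-level run-length matrix psi(x, y),
    plus SRE, LRE (Eqs. 18-19) and RP (Eq. 24)."""
    q = gray.astype(int) * levels // 256
    max_run = gray.shape[1]
    psi = np.zeros((levels, max_run))
    for row in q:                              # count maximal runs per row
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                psi[run_val, run_len - 1] += 1
                run_val, run_len = v, 1
        psi[run_val, run_len - 1] += 1         # close the trailing run
    n_runs = psi.sum()                         # K_r(mu)
    y = np.arange(1, max_run + 1)              # run lengths
    sre = (psi / y**2).sum() / n_runs          # Eq. (18)
    lre = (psi * y**2).sum() / n_runs          # Eq. (19)
    rp = n_runs / q.size                       # Eq. (24)
    return psi, sre, lre, rp
```

A single uniform row of four pixels, for instance, yields one run of length four, so SRE = 1/16, LRE = 16, and RP = 1/4.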

2.4.4. Wavelet Features

The discrete wavelet transform is a linear operation that converts a 2^n × 2^n image matrix, where n is a positive integer, into a matrix of the same dimensions. The input image is multiplied by the transform matrix; the result is then transposed and multiplied by the transform matrix once again. Half of the rows of the transform matrix act as the coefficients of a high-pass filter (G), while the rest act as the coefficients of a low-pass filter (L). The resulting matrix is therefore composed of four quadrants, denoted LL, LG, GL, and GG, called sub-bands, and an energy feature is computed for each sub-band. Thus, the total number of wavelet feature values can vary depending on the input image dimensions [30]. The sub-band energy is given in Equation (25):

Energy = (1/λ) Σ_{(x,y) ∈ ROI} t_{x,y}²  (25)

Here, t_{x,y} is the resultant matrix component, the summation runs over each pixel (x, y) situated in the region of interest, and λ is the total number of pixels.
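A one-level Haar transform is a simple concrete instance of the sub-band decomposition described above. The sketch below is an assumption (the paper does not name the wavelet basis, and an averaging normalization is used here) and computes the energy of Equation (25) for each of the four sub-bands:

```python
import numpy as np

def haar_subband_energy(gray):
    """One-level 2-D Haar decomposition (averaging normalization);
    returns the Eq. (25) energy of the LL, LG, GL, GG sub-bands."""
    img = gray.astype(float)
    # low-pass (L) and high-pass (G) filtering along columns first
    lo = (img[:, 0::2] + img[:, 1::2]) / 2
    hi = (img[:, 0::2] - img[:, 1::2]) / 2
    bands = {
        "LL": (lo[0::2] + lo[1::2]) / 2,
        "LG": (lo[0::2] - lo[1::2]) / 2,
        "GL": (hi[0::2] + hi[1::2]) / 2,
        "GG": (hi[0::2] - hi[1::2]) / 2,
    }
    # Eq. (25): mean squared coefficient (ROI = whole sub-band here)
    return {name: (b ** 2).sum() / b.size for name, b in bands.items()}
```

For a constant image, all the detail sub-bands (LG, GL, GG) carry zero energy and only LL is non-zero, as expected.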

2.5. Feature Selection

Feature selection is one of the most important parts of the ML process. Its main goal is to select the most valuable features and remove the worthless ones from a dataset. In this research, it was observed that not all of the extracted features were equally valuable for liver cancer classification. The acquired dataset had a large FVS of 5,486,400 (254 × 21,600), making it very difficult to handle. This problem was addressed by reducing the dimensionality of the feature space, as indicated in [31], so that a faithful representation of all the data was preserved and an adequate classification with a minimum error rate was achieved. The principal component analysis (PCA) technique provides excellent results on linearly separable data, since PCA performs a linear transformation of the input data [32], and it is also used for feature selection. Using PCA, a reduced feature set was obtained that was smaller than the original FVS. Regrettably, this reduced feature set did not provide an accurate picture of the entire dataset, because PCA was unable to retain much of the discriminative information. Additionally, PCA is an unsupervised approach [33], whereas the liver cancer dataset was labeled, and the PCA results were not promising on the labeled data. To solve this problem, an ML-based supervised feature selection technique, namely probability of error (POE) plus average correlation (AC), was used to select the optimized features from this large-scale, high-dimensional fused liver cancer dataset. This approach performed better than PCA and was able to obtain a sub-dataset with the optimal characteristics for this large dataset. The proposed approach selected 10 optimized features out of the 254 using MATLAB software [34]. Mathematically, (POE + AC) is defined as follows:
$$P_A(f_w) = \frac{\text{Misclassified Samples}}{\text{Total Samples}}$$
$$f_2 = f_w : \min_w \left[ P_A(f_w) + \left| \mathrm{Correlate}(f_1, f_w) \right| \right]$$
$$f_k = f_w : \min_w \left[ P_A(f_w) + \frac{1}{k-1} \sum_{j=1}^{k-1} \left| \mathrm{Correlate}(f_j, f_w) \right| \right]$$
When (POE + AC) was deployed on the acquired dataset, it selected 10 optimized features for further processing. The optimized features are described in Table 1.
Finally, 5,486,400 (254 × 21,600) fused hybrid-feature vector spaces were reduced to the 216,000 (10 × 21,600) shown in Figure 4, a (POE + AC)-based optimized dataset for each type of liver cancer, and this optimized fused hybrid-feature dataset was deployed to four machine-learning classifiers.
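The (POE + AC) rule above can be sketched as a greedy loop: seed with the feature of minimum probability of error, then repeatedly add the feature minimising POE plus its average absolute correlation with the features already chosen. The nearest-class-mean rule used here to estimate POE is an illustrative stand-in, not the exact estimator used by the authors:

```python
import numpy as np

def poe(feature, labels):
    # Probability of error of one feature under a nearest-class-mean
    # rule (an illustrative stand-in for the paper's POE estimator).
    classes = np.unique(labels)
    means = np.array([feature[labels == c].mean() for c in classes])
    pred = classes[np.argmin(np.abs(feature[:, None] - means[None, :]), axis=1)]
    return float(np.mean(pred != labels))

def select_poe_ac(X, y, k=10):
    # Greedy (POE + AC): seed with the minimum-POE feature, then add
    # the feature minimising POE plus its average absolute correlation
    # with the features already selected.
    poes = np.array([poe(X[:, j], y) for j in range(X.shape[1])])
    selected = [int(np.argmin(poes))]
    while len(selected) < k:
        best, best_score = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            ac = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                          for s in selected])
            if poes[j] + ac < best_score:
                best, best_score = j, poes[j] + ac
        selected.append(best)
    return selected
```

With `k=10` this reduces a 254-column feature matrix to the 10-feature sub-dataset described above.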

2.6. Classification

Four ML classifiers, namely, multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), and J48, were employed on the fused liver cancer dataset. The MLP classifier performed best among the implemented classifiers because MLP generally performs well on noisy, big, and complex data [35]. The MLP classifier [36,37] is explained below; the products of the inputs and weights, plus the bias, are summed using the summation function $\rho_n$ given in Equation (29).
$$\rho_n = \sum_{m=1}^{k} \eta_{mn} I_m + \theta_n$$
Here, $k$ is the number of inputs, $I_m$ is input $m$, $\theta_n$ is the bias term of neuron $n$, and $\eta_{mn}$ is the weight connecting input $m$ to neuron $n$. There are many activation functions for MLP, including the sigmoid given below.
$$\psi_n(x) = \frac{1}{1 + \exp(-\rho_n)}$$
The output of neuron $n$ can be obtained as:
$$z_n = \psi_n\left( \sum_{m=1}^{k} \eta_{mn} I_m + \theta_n \right)$$
The MLP classifier parameter setting is described in Table 2. The hybrid-feature analysis MLP framework with all regulation parameters is shown in Figure 5.
In Figure 5, the first (input) layer is drawn in green with 10 features; the second, hidden layer is drawn in red with 18 neurons; and the third, output layer has six yellow nodes, one per cancer class. The regulation parameters along with their values are also shown.
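Equations (29)–(31) amount to a standard sigmoid forward pass. A minimal sketch with the layer sizes of Figure 5 (10 inputs, 18 hidden neurons, 6 outputs); the random weights are placeholders standing in for trained values:

```python
import numpy as np

def sigmoid(rho):
    # Equation (30): psi(rho) = 1 / (1 + exp(-rho))
    return 1.0 / (1.0 + np.exp(-rho))

def mlp_forward(x, W_h, b_h, W_o, b_o):
    # Equation (29): rho_n = sum_m eta_mn * I_m + theta_n, per neuron
    hidden = sigmoid(W_h @ x + b_h)      # Equation (31): z_n = psi(rho_n)
    return sigmoid(W_o @ hidden + b_o)   # output layer, one node per class

# Layer sizes from Figure 5: 10 inputs, 18 hidden neurons, 6 outputs.
rng = np.random.default_rng(1)
x = rng.normal(size=10)  # one optimized 10-feature ROI vector
out = mlp_forward(x,
                  rng.normal(size=(18, 10)), rng.normal(size=18),
                  rng.normal(size=(6, 18)), rng.normal(size=6))
```

Each entry of `out` lies in (0, 1); with trained weights, the largest entry indicates the predicted liver cancer class.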

3. Results and Discussion

For this study, four ML classifiers, namely, multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), and J48, were deployed on the selected optimized hybrid features using 10-fold cross-validation for the classification of liver cancer. There are many ML classifiers, but these four performed best in terms of accuracy and runtime in this study. As discussed earlier, four types of texture features, namely, the histogram, wavelet, co-occurrence, and run-length features, were extracted from the MR and CT-scan image datasets, which were later fused to generate a fused hybrid-feature dataset. First, experimentation was performed on the MRI-based hybrid-feature dataset (using 10-fold cross-validation), which did not yield fully satisfactory results: the overall classification accuracies of MLP, SVM, RF, and J48 were 95.88%, 95.78%, 94.44%, and 94.44%, respectively. In the second step, the same approach was employed on the CT-scan-based hybrid-feature dataset, giving overall accuracies of 97.44%, 96.89%, 96.83%, and 96% for MLP, SVM, RF, and J48, respectively, a very promising improvement over the MRI-based results. In this analysis, we observed that both the CT and MRI modalities have their own worth in liver cancer analysis and diagnosis. Thus, for further analysis, we fused the MRI and CT datasets using a data fusion approach to obtain more accurate classification results. Data fusion [38] is a very powerful technique for merging multiple datasets to produce a more accurate classification than the individual datasets. As a final step, the same approach was employed on the fused optimized hybrid features: MLP, SVM, RF, and J48 showed considerably higher classification accuracies of 99%, 98.5%, 98.17%, and 97.11%, respectively.
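The 10-fold protocol above can be sketched as a plain index splitter: shuffle once, deal the indices into ten folds, and hold each fold out as the test set exactly once (unstratified splitting is an assumption; the excerpt does not specify the exact cross-validation tooling):

```python
import random

def ten_fold_indices(n_samples, seed=42):
    # Shuffle once, deal indices into 10 folds round-robin; each fold
    # serves as the held-out test set exactly once.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for k in range(10):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, folds[k]
```

The reported accuracies are then averages of the per-fold test accuracies.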
The results on the MRI-based dataset were also evaluated with other performance factors. Kappa statistics compare the observed accuracy with the expected (chance) accuracy. A true positive (TP) is an outcome where the model correctly predicts the positive class, and a false positive (FP) is an outcome where the model incorrectly predicts the positive class. Precision, which relates to reproducibility and repeatability (the degree to which repeated measurements under unchanged conditions agree), is given in Equation (32).
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall is the fraction of the total amount of relevant instances that were actually retrieved, given in Equation (33).
$$\text{Recall} = \frac{TP}{TP + FN}$$
The f-score (or f-measure) is calculated based on the precision and recall, given in Equation (34).
$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
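Equations (32)–(34) translate directly into code; the counts in the test below are illustrative, not taken from the paper's tables:

```python
def precision(tp, fp):
    # Equation (32): repeatability of positive predictions
    return tp / (tp + fp)

def recall(tp, fn):
    # Equation (33): fraction of relevant instances retrieved
    return tp / (tp + fn)

def f_measure(tp, fp, fn):
    # Equation (34): harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```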
The receiver-operating characteristic (ROC) is a graphical plot of the TP rate against the FP rate of a classifier as its decision threshold varies. The mean absolute error (MAE) measures how close predictions are to the eventual outcomes, and the root mean squared error (RMSE) is the sample standard deviation of the differences between predicted and observed values. These metrics, along with the confusion matrix and time complexity (T), are shown in Table 3. Among the classifiers employed on the MRI-based dataset, the multilayer perceptron (MLP) showed a relatively better classification accuracy of 95.88% compared to the other deployed classifiers, as shown in Figure 6.
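The MAE, RMSE, and kappa values reported in Tables 3–5 can be computed as follows; kappa is written here in its observed-versus-expected-accuracy form, and the inputs are illustrative:

```python
import math

def mae(predicted, observed):
    # mean absolute error: average absolute prediction error
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def rmse(predicted, observed):
    # root mean squared error of the prediction residuals
    return math.sqrt(sum((p - o) ** 2
                         for p, o in zip(predicted, observed)) / len(observed))

def kappa(observed_acc, expected_acc):
    # Cohen's kappa: agreement beyond what chance alone would give
    return (observed_acc - expected_acc) / (1.0 - expected_acc)
```

With six balanced classes the chance accuracy is 1/6, so an observed accuracy of 0.99 gives `kappa(0.99, 1/6) = 0.988`, in line with the MLP row of Table 5.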
To obtain further improvements in classification accuracy, we then deployed the same classifiers on the CT-scan-based dataset, with results shown in Table 4.
It has been observed that the overall accuracy of MLP, SVM, RF, and J48 was 97.44%, 96.89%, 96.83%, and 96%, respectively, as shown in Figure 7.
Finally, since the overall accuracies on the separate MRI and CT-scan datasets were not fully satisfactory, the same strategy with the same classifiers was applied to the fused optimized hybrid-feature dataset of the six liver cancers, and we observed very promising results, with accuracy levels between 97.11% and 99%. The overall classification accuracies of MLP, SVM, RF, and J48 were 99%, 98.5%, 98.17%, and 97.11%, respectively, as shown in Figure 8.
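The excerpt does not spell out the fusion operator itself; one sketch consistent with the reported 254 × 21,600 fused FVS is row-wise pooling of the per-ROI hybrid-feature vectors from the two modalities (this `fuse_datasets` helper is a hypothetical illustration, not the authors' code):

```python
import numpy as np

def fuse_datasets(mri_features, ct_features):
    # Both modalities must expose the same 254 hybrid-feature columns;
    # fusion then pools their per-ROI rows into a single dataset.
    assert mri_features.shape[1] == ct_features.shape[1]
    return np.vstack([mri_features, ct_features])
```

The classifiers are then trained on the pooled rows, so each fold of cross-validation sees ROIs from both modalities.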
These results were encouraging, and we observed that the MLP showed the best accuracy among all the implemented classifiers. The four implemented ML classifiers on the fused optimized hybrid-feature dataset gave the accuracy results shown in Table 5.
Similarly, the confusion matrix (CM) of the fused optimized hybrid-feature dataset is shown in Table 6. The diagonal of the CM shows the instances classified into the correct class, while the off-diagonal entries show instances assigned to other classes. It contains the actual and predicted data for the MLP classifier, which showed a relatively better overall accuracy among the implemented classifiers. The classification accuracies on the fused dataset for the six types of liver cancer, that is, hepatoblastoma, cyst, hemangioma, hepatocellular adenoma, hepatocellular carcinoma, and metastasis, were 99.67%, 99.33%, 98.33%, 99.72%, 97.30%, and 99.67%, respectively. Graphical accuracy results are shown in Figure 9.
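The per-class accuracies quoted above are simply the diagonal of the confusion matrix divided by each row total; for example, using the first two rows of Table 6:

```python
def per_class_accuracy(cm):
    # diagonal count over the row total, one value per class
    return [row[i] / sum(row) for i, row in enumerate(cm)]

# First two rows of Table 6 (hepatoblastoma and cyst, MLP classifier)
cm = [
    [3588, 2, 0, 8, 0, 2],
    [4, 3576, 0, 0, 5, 15],
]
```

This reproduces the 99.67% and 99.33% figures quoted in the text.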
Finally, we present a comparative liver cancer classification graph of the MRI, CT-scan, and fused optimized hybrid-feature dataset using the employed ML classifiers. This graph shows an overall better accuracy (red) for liver cancer classification using the fused dataset as compared to the CT-scan (green) and MRI (blue)-based datasets, as shown in Figure 10.
The existing system has some limitations because it is a supervised-learning-based classifier. All experiments were performed on a fused liver cancer dataset acquired, in compliance with all legal requirements, from the radiology department of the Bahawal Victoria Hospital (BVH), Bahawalpur, Pakistan [21], in collaboration with the Department of Computer Science of the Islamia University of Bahawalpur (IUB) [39], Pakistan, to address the regional and local problem of identifying liver cancer types. The proposed model was also verified on a publicly available dataset from the Radiopaedia image database [40]. Sixty fused images for each of the six liver cancer types, namely, hepatocellular adenoma, hemangioma, cyst, hepatocellular carcinoma, hepatoblastoma, and metastasis, totaling 360 (60 × 6) images, were acquired from the Radiopaedia public archive. Multi-institutional differences between the BVH and Radiopaedia datasets were observed, and we normalized the Radiopaedia dataset with respect to the BVH dataset as far as possible. First, we resized each slice of the Radiopaedia dataset to the BVH standards; then, we employed the proposed technique (OTRGS) with the same classifiers, as shown in Table 7.
Very promising results were observed: MLP, SVM, RF, and J48 had overall accuracies of 98.27%, 96.72%, 96.44%, and 95.95%, respectively, as shown in Figure 11.
The experimental results varied due to variations in the multi-institutional liver cancer image datasets. This supports the development of a global platform for medical patient data so that, despite differences in modality, region, demography, geography, and medical history, such medical health issues can be addressed more accurately. A comparison between the BVH and Radiopaedia fused (MRI and CT-scan) datasets is shown in Figure 12.
A comparison between the proposed methodology and the current state-of-the-art techniques is shown in Table 8.

4. Conclusions

This research focused on the classification of six liver cancers (hepatocellular adenoma, hemangioma, cyst, hepatocellular carcinoma, hepatoblastoma, and metastasis) through texture analysis of a fused dataset based on hybrid-feature analysis. The main contributions were the Otsu thresholding-based segmentation, the selection of suitable optimized hybrid features, and the identification of the best classifiers for efficient classification. The variation in results is due to the different modalities of the MRI and CT-scan datasets. The fused hybrid-feature dataset was generated using a data fusion approach, and four machine-learning classifiers, that is, MLP, SVM, RF, and J48, were employed on this optimized hybrid-feature dataset. The employed classifiers showed satisfactory results, but the MLP results were exceptionally high among all the implemented classifiers: an overall accuracy of 99% was achieved across the six liver tumor types. The accuracies obtained by MLP on hepatoblastoma, cyst, hemangioma, hepatocellular adenoma, hepatocellular carcinoma, and metastasis were 99.67%, 99.33%, 98.33%, 99.72%, 97.30%, and 99.67%, respectively. The proposed model was verified on the publicly available Radiopaedia dataset, where very promising results were observed, with classification accuracies ranging from 95.95% to 98.27%. The system was designed so that it performs well whether a new patient provides a CT or an MR image. Our proposed system can verify results on different MRI and CT-scan databases, which could help radiologists diagnose liver tumors; it is a robust and efficient technique that reduces human error and can be implemented on large clinical datasets. The proposed system accurately diagnoses six types of liver tumors, including three benign and three malignant ones.

Future Work

In the future, this technique can be improved further using 3D visualization of volumetric fused (MRI and CT-scan) data, and the clinical applicability of the proposed model will be investigated.

Author Contributions

S.N., Data curation; A.A., Conceptualization, Methodology, Writing—original draft, Software; S.Q., Supervision; W.K.M., Project administration; N.T., Validation; H.S., Resources; M.F., Investigation; F.J., Formal analysis, Writing—review and editing; C.C., Validation, Writing—review and editing; S.A., Visualization; All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by King Khalid University of Saudi Arabia, grant number R.G.P.2/7/38.

Acknowledgments

This work was supported by King Khalid University of Saudi Arabia under grant number R.G.P.2/7/38. Additionally, the authors would like to thank the referees for their careful reading and comments, which significantly improved the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rane, J.; Jadhao, R.; Bakal, R.L. Liver diseases and herbal drugs: A review. J. Innov. Pharm Biol. Sci. 2016, 3, 24–36. [Google Scholar]
  2. Ntomi, V.; Paspala, A.; Schizas, D. Novel Techniques in the Surgical Management of Hepatocellular Carcinoma. Liver Cancer 2018, 77. [Google Scholar] [CrossRef] [Green Version]
  3. Bandera, E.V.; Fay, S.H.; Giovannucci, E. World Cancer Research Fund International Continuous Update Project Panel. The use and interpretation of anthropometric measures in cancer epidemiology: A perspective from the World Cancer Research Fund international continuous update project. Int. J. Cancer 2016, 139, 2391–2397. [Google Scholar] [CrossRef] [PubMed]
  4. Ferlay, J.; Colombet, M.; Soerjomataram, I.; Mathers, C.; Parkin, D.M.; Piñeros, M.; Znaor, A.; Bray, F. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int. J. Cancer 2019, 144, 1941–1953. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Lam, A.K. Update on adrenal tumours in 2017 World Health Organization (WHO) of endocrine tumours. Endocr. Pathol. 2017, 3, 213–227. [Google Scholar] [CrossRef]
  6. Bruix, J.; Han, K.; Gores, G.; Llovet, J.M.; Mazzaferro, V. Liver cancer: Approaching a personalized care. J. Hepatol. 2015, 1, S144–S156. [Google Scholar] [CrossRef] [Green Version]
  7. Russ, J.C.; Rindel, J.; Lord, P. Forensic Uses of Digital Imaging; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar]
  8. Siuly, S.; Zhang, Y. Medical big data: Neurological diseases diagnosis through medical data analysis. Data Sci. Eng. 2016, 2, 54–64. [Google Scholar] [CrossRef] [Green Version]
  9. Tadeusiewicz, R.; Ogiela, M.R. Medical Image Understanding Technology; Springer: Heidelberg, Germany, 2004. [Google Scholar]
  10. Mohri, M.; Rostamizadeh, A.; Talwalkar, A. Foundations of Machine Learning; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  11. Parsai, A.; Miquel, M.E.; Jan, H.; Kastler, A.; Szyszko, T.; Zerizer, I. Improving liver lesion characterisation using retrospective fusion of FDG PET/CT and MRI. Clin. Imaging 2019, 55, 23–28. [Google Scholar] [CrossRef]
  12. Schmauch, B.; Herent, P.; Jehanno, P.; Dehaene, O.; Saillard, C.; Aubé, C.; Luciani, A.; Lassau, N.; Jégou, S. Diagnosis of focal liver lesions from ultrasound using deep learning. Diagn. Interv. Imaging 2019, 4, 227–233. [Google Scholar] [CrossRef]
  13. Mariëlle, J.A.; Kuijf, H.J.; Veldhuis, W.B.; Wessels, F.J.; Viergever, M.A.; Pluim, J.W. Automatic classification of focal liver lesions based on MRI and risk factors. PLoS ONE 2019, 14, 5. [Google Scholar]
  14. Ta, N.C.; Kono, Y.; Eghtedari, M.; Oh, Y.T.; Robbin, M.L.; Barr, R.G.; Kummel, A.C.; Mattrey, R.F. Focal liver lesions: Computer-aided diagnosis by using contrast-enhanced US cine recordings. Radiology 2018, 3, 1062–1071. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Oliva, J.T.; Lee, H.D.; Spolaôr, N.; Coy, C.S.R.; Wu, F.C. Prototype system for feature extraction, classification and study of medical images. Expert Syst. Appl. 2016, 63, 267–283. [Google Scholar] [CrossRef]
  16. Wu, W.; Zhou, Z.; Wu, S.; Zhang, Y. Automatic liver segmentation on volumetric CT images using supervoxel-based graph cuts. Comput. Math. Methods Med. 2016, 2016. [Google Scholar] [CrossRef] [Green Version]
  17. Wu, W.; Wu, S.; Zhou, Z.; Zhang, R.; Zhang, Y. 3D liver tumor segmentation in CT images using improved fuzzy C-means and graph cuts. BioMed Res. Int. 2017, 2017. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Boss, R.; Chandra, S.; Thangavel, K.; Daniel, D.A.P. Mammogram image segmentation using fuzzy clustering. In Proceedings of the International Conference on Pattern Recognition, Informatics and Medical Engineering, Tamilnadu, India, 21–23 March 2012; pp. 290–295. [Google Scholar]
  19. Kondo, S.; Takagi, K.; Nishida, M.; Iwai, T.; Kudo, Y.; Ogawa, K.; Kamiyama, T.; Shibuya, H.; Kahata, K.; Shimizu, C. Computer-aided diagnosis of focal liver lesions using contrast-enhanced ultrasonography with perflubutane microbubbles. IEEE Trans. Med. Imaging 2017, 7, 1427–1437. [Google Scholar] [CrossRef] [PubMed]
  20. Quaid-e-Azam Medical College. Bahawal Victoria Hospital—Quaid-e-Azam Medical College. Available online: https://www.qamc.edu.pk/bahawalvictoriahospital/ (accessed on 5 December 2018).
  21. Chen, F.; Yuan, Z.; Huang, Y. Multi-source data fusion for aspect-level sentiment classification. Knowl. Based Syst. 2020, 187, 104831. [Google Scholar] [CrossRef]
  22. Kim, Y.-Y.; Choi, J.; Sirlin, C.B.; An, C.; Kim, M. Pitfalls and problems to be solved in the diagnostic CT/MRI Liver Imaging Reporting and Data System (LI-RADS). Eur. Radiol. 2019, 3, 1124–1132. [Google Scholar] [CrossRef]
  23. Ginneken, V.B.; Frangi, A.F.; Staal, J.J.; Romeny, B.M.H.; Viergever, M.A. A non-linear gray-level appearance model improves active shape model segmentation. In Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, Kauai, HI, USA, 9–10 December 2001; pp. 205–212. [Google Scholar]
  24. Darvishnezhad, M.; Ghassemian, H.; Imani, M. Local binary graph feature reduction for three-Dimensional gabor filter based hyperspectral image classification. In International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences; Copernicus Publications: Göttingen, Germany, 2019. [Google Scholar]
  25. Ali, S.; Zhou, F.; Braden, B.; Bailey, A.; Yang, S.; Cheng, G.; Zhang, P.; Li, X.; Kayser, M.; Soberanis-Mukul, R.D.; et al. An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy. Sci. Rep. 2020, 1, 1–15. [Google Scholar] [CrossRef]
  26. Bhandari, K.A.; Kumar, I.V.; Srinivas, K. Cuttlefish algorithm based multilevel 3D Otsu function for color image segmentation. IEEE Trans. Instrum. Meas. 2019. [Google Scholar] [CrossRef]
  27. Saif, S.W.; Alshawi, T.; Esmail, M.A.; Ragheb, A.; Alshebeili, S. Separability of histogram-based features for optical performance monitoring: An investigation using t-SNE technique. IEEE Photonics J. 2019, 3, 1–12. [Google Scholar] [CrossRef]
  28. Abbas, Z.; Rehman, M.; Najam, S.; Rizvi, S.M.D. An efficient gray-level co-occurrence matrix (GLCM) based approach towards classification of skin lesion. In Proceedings of the 2019 Amity International Conference on Artificial Intelligence (AICAI), Dubai, UAE, 4–6 February 2019; pp. 317–320. [Google Scholar]
  29. Prasetyo, H.; Simatupang, J.W. Batik Image Retrieval Using Maximum Run Length LBP and Sine-Cosine Optimizer. In Proceedings of the 2019 International Conference on Sustainable Engineering and Creative Computing (ICSECC), Bandung, Indonesia, 20–22 August 2019; pp. 265–269. [Google Scholar]
  30. Pancholi, S.; Joshi, A.M. Improved Classification Scheme using Fused Wavelet Packet Transform based Features for Intelligent Myoelectric Prostheses. IEEE Trans. Ind. Electron. 2019. [Google Scholar] [CrossRef]
  31. Abdel-Basset, M.; El-Shahat, D.; El-henawy, I.; de Albuquerque, V.H.C.; Mirjalili, S. A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Syst. Appl. 2020, 139, 112824. [Google Scholar] [CrossRef]
  32. Sarker, H.I.; Abushark, Y.B.; Khan, A.I. ContextPCA: Predicting Context-Aware Smartphone Apps Usage Based on Machine Learning Techniques. Symmetry 2020, 4, 499. [Google Scholar] [CrossRef] [Green Version]
  33. Taguchi, Y.H. Applications of PCA Based Unsupervised FE to Bioinformatics. In Unsupervised Feature Extraction Applied to Bioinformatics; Springer: Cham, Switzerland, 2020; pp. 119–211. [Google Scholar]
  34. Qadri, S.; Khan, D.M.; Qadri, S.F.; Razzaq, A.; Ahmad, N.; Jamil, M.; Shah, A.N.; Muhammad, S.S.; Saleem, K.; Awan, S.A. Multisource data fusion framework for land use/land cover classification using machine vision. J. Sens. 2017, 2017. [Google Scholar] [CrossRef] [Green Version]
  35. Pathak, S.; Kumar, B. A robust automated cataract detection algorithm using diagnostic opinion-based parameter thresholding for telemedicine application. Electronics 2016, 3, 57. [Google Scholar] [CrossRef] [Green Version]
  36. Sarker, H.I.; Kayes, A.S.M.; Watters, P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J. Big Data 2019, 1, 57. [Google Scholar] [CrossRef]
  37. Sumsion, R.G.; Bradshaw, M.S.; Hill, K.T.; Pinto, L.D.G.; Piccolo, S.R. Remote sensing tree classification with a multilayer perceptron. PeerJ 2019, 7, e6101. [Google Scholar] [CrossRef] [Green Version]
  38. Lau, B.P.L.; Marakkalage, S.H.; Zhou, Y.; Hassan, N.U.; Yuen, C.; Zhang, M.; Tan, U. A survey of data fusion in smart city applications. Inf. Fusion 2019, 52, 357–374. [Google Scholar] [CrossRef]
  39. Bhatti, R. Information needs and information-seeking behaviour of faculty members at the Islamia University of Bahawalpur. Libr. Philos. Pract. 2009, 3, 7–21. [Google Scholar]
  40. Radiopaedia. Available online: https://radiopaedia.org/ (accessed on 5 August 2019).
Figure 1. Typical six liver cancer CT and MR image datasets.
Figure 2. Otsu thresholding-based segmentation and fused hybrid-feature analysis for liver cancer classification framework.
Figure 3. Otsu thresholding-based region growing segmentation (OTRGS) framework for liver cancer.
Figure 4. Liver cancer dataset visualization in 3D vector space.
Figure 5. MLP framework for classification of liver cancer using optimized fused hybrid features.
Figure 6. The overall accuracy graph of the employed ML classifiers on the MRI dataset.
Figure 7. The overall accuracy graph of the employed ML classifiers on the CT-scan dataset.
Figure 8. The overall accuracy graph of the employed ML classifiers on the fused dataset.
Figure 9. Classification accuracy graph of the six liver cancer types using the MLP classifier on the fused optimized dataset.
Figure 10. Comparative analysis of the liver cancer classification accuracy graph among different modalities’ datasets.
Figure 11. The overall accuracy graph of employed ML classifiers on the Radiopaedia fused optimized public dataset.
Figure 12. Comparison accuracy graph between the Bahawal Victoria Hospital (BVH) and Radiopaedia fused datasets.
Table 1. Optimized fused hybrid-feature selection table (POE + AC).

| # | Feature | # | Feature |
|---|---|---|---|
| 1 | 45dgr_RLNonUni | 6 | WavEnLL_s-4 |
| 2 | 135dr_RLNonUni | 7 | Mean |
| 3 | Horzl_RLNonUni | 8 | WavEnHH_s-4 |
| 4 | 135dr_GLevNonU | 9 | Vertl_LngREmph |
| 5 | S(5,5)Entropy | 10 | Skewness |
Table 2. Implemented multilayer perceptron (MLP) classifier constraint values.

| Parameter | Value |
|---|---|
| Input Layers | 1 |
| Hidden Layers | 15 |
| Neurons | 18 |
| Learning Rate | 0.5 |
| Momentum | 0.4 |
| Validation Threshold | 18 |
| Epochs | 500 |
Table 3. The overall classification accuracy table of employed machine-learning (ML) classifiers on the liver cancer magnetic resonance imaging (MRI) dataset.

| Classifiers | Kappa Statistics | TP Rate | FP Rate | ROC | Recall | F-Measure | MAE | RMSE | Time (s) | Precision |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.9507 | 0.959 | 0.008 | 0.979 | 0.959 | 0.959 | 0.0154 | 0.1152 | 0.42 | 0.959 |
| SVM | 0.9493 | 0.958 | 0.008 | 0.996 | 0.958 | 0.958 | 0.0276 | 0.1083 | 0.31 | 0.958 |
| RF | 0.9333 | 0.944 | 0.011 | 0.986 | 0.944 | 0.944 | 0.2236 | 0.3125 | 0.11 | 0.946 |
| J48 | 0.9333 | 0.944 | 0.011 | 0.995 | 0.944 | 0.944 | 0.0295 | 0.1198 | 0.16 | 0.944 |
Table 4. The overall classification accuracy table of employed ML classifiers on the liver cancer CT-scan dataset.

| Classifiers | Kappa Statistics | TP Rate | FP Rate | ROC | Recall | F-Measure | MAE | RMSE | Time (s) | Precision |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.9693 | 0.974 | 0.005 | 0.995 | 0.974 | 0.975 | 0.2228 | 0.3113 | 0.27 | 0.976 |
| SVM | 0.9627 | 0.969 | 0.006 | 0.998 | 0.969 | 0.969 | 0.0176 | 0.0874 | 0.12 | 0.969 |
| RF | 0.9367 | 0.968 | 0.032 | 0.997 | 0.968 | 0.968 | 0.0302 | 0.1542 | 0.11 | 0.968 |
| J48 | 0.952 | 0.960 | 0.008 | 0.996 | 0.960 | 0.960 | 0.0334 | 0.1203 | 0.09 | 0.960 |
Table 5. The overall classification accuracy table of the employed ML classifiers on the fused dataset.

| Classifiers | Kappa Statistics | TP Rate | FP Rate | Recall | F-Measure | ROC | MAE | RMSE | Time (s) | Precision |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.988 | 0.99 | 0.002 | 0.99 | 0.99 | 0.997 | 0.2225 | 0.3108 | 0.31 | 0.99 |
| SVM | 0.982 | 0.985 | 0.003 | 0.985 | 0.985 | 0.999 | 0.0116 | 0.0655 | 0.19 | 0.985 |
| RF | 0.978 | 0.982 | 0.004 | 0.982 | 0.982 | 0.999 | 0.0114 | 0.068 | 0.28 | 0.982 |
| J48 | 0.9653 | 0.971 | 0.006 | 0.971 | 0.971 | 0.997 | 0.01 | 0.0933 | 0.07 | 0.972 |
Table 6. Confusion matrix of the fused optimized hybrid-feature dataset using the MLP classifier.

| Classified as | Hepatoblastoma | Cyst | Hemangioma | Adenoma | Carcinoma | Metastasis | Total |
|---|---|---|---|---|---|---|---|
| Hepatoblastoma | 3588 | 2 | 0 | 8 | 0 | 2 | 3600 |
| Cyst | 4 | 3576 | 0 | 0 | 5 | 15 | 3600 |
| Hemangioma | 10 | 20 | 3540 | 5 | 25 | 0 | 3600 |
| Adenoma | 0 | 0 | 0 | 3590 | 5 | 5 | 3600 |
| Carcinoma | 10 | 30 | 5 | 45 | 3503 | 8 | 3600 |
| Metastasis | 2 | 0 | 10 | 0 | 0 | 3588 | 3600 |
Table 7. The overall classification accuracy results on the Radiopaedia dataset using ML classifiers.

| Classifiers | Kappa Statistics | TP Rate | FP Rate | ROC | Recall | F-Measure | MAE | RMSE | Time (s) | Precision |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.9793 | 0.983 | 0.003 | 0.999 | 0.983 | 0.984 | 0.0171 | 0.0725 | 0.77 | 0.983 |
| SVM | 0.9607 | 0.967 | 0.007 | 0.990 | 0.967 | 0.966 | 0.0151 | 0.1008 | 0.14 | 0.967 |
| RF | 0.9573 | 0.964 | 0.007 | 0.980 | 0.964 | 0.965 | 0.013 | 0.108 | 0.22 | 0.964 |
| J48 | 0.9513 | 0.959 | 0.008 | 0.976 | 0.959 | 0.958 | 0.0136 | 0.1165 | 0.09 | 0.959 |
Table 8. Comparison table of the proposed and current state-of-the-art techniques.

| Source/Reference | Methodology | Modality | Accuracy |
|---|---|---|---|
| Parsai, A. et al. [11] | Fused Features Based CT/PET/MRI | CT/MRI/PET | 94.7% |
| Schmauch, B. et al. [12] | Deep Learning | US | 91.6% |
| Jansen, M.J. [13] | Texture + Tree Classifier | MRI | 77% |
| Ta, C.N. et al. [14] | Internal Edge, Echogenicity, Echo, Morphology | US | 93.8% |
| Oliva, J.T. et al. [15] | SIOPEL/GPOH, COG, and JPLT | MRI | 92.5% |
| Romero et al. [16] | Inception V3 Modified Architecture + Deep Learning | CT | 96% |
| Wu, W. et al. [17] | Fused Feature + Fuzzy C-Means | CT | 91.63% |
| Boss, R. et al. [18] | Mammogram Segmentation | MRI | 86.67% |
| Kondo, S. [19] | Spatial and Temporal Features | US/CT | 91.8% |
| Proposed Methodology | OTRGS + Optimized Fused Hybrid Features + MLP | BVH (CT and MRI) Dataset | 99% |
| Proposed Methodology (Validation) | OTRGS + Optimized Fused Hybrid Features + MLP | Radiopaedia (CT and MRI) Public Dataset | 98.27% |
