Article

Automated Wheat Diseases Classification Framework Using Advanced Machine Learning Technique

1 Sejong University, Seoul 05006, Korea
2 Interaction Technology Laboratory, Department of Software Convergence, Sejong University, Seoul 05006, Korea
3 Department of Electronics, Islamia College University, Peshawar 25000, Pakistan
* Author to whom correspondence should be addressed.
Submission received: 10 June 2022 / Revised: 23 July 2022 / Accepted: 8 August 2022 / Published: 15 August 2022
(This article belongs to the Section Digital Agriculture)

Abstract: Around the world, agriculture is one of the most important sectors of human life in terms of food, business, and employment opportunities. Wheat is the most widely farmed crop, but every year its final production is badly affected by various diseases. Early and precise recognition of wheat plant diseases can reduce damage, resulting in greater yield. Researchers have used conventional and Machine Learning (ML)-based techniques for crop disease recognition and classification. However, these techniques are often inaccurate and time-consuming due to the unavailability of quality data, inefficient preprocessing techniques, and weak criteria for selecting an efficient model. Therefore, a smart and intelligent system is needed that can accurately identify crop diseases. In this paper, we propose an efficient ML-based framework for wheat disease recognition and classification that automatically identifies brown- and yellow-rust diseases in wheat crops. Our method consists of multiple steps. Firstly, the dataset is collected from different fields in Pakistan with consideration of the illumination and orientation parameters of the capturing device. Secondly, to preprocess the data accurately, specific segmentation and resizing methods are used to distinguish between healthy and affected areas. Finally, ML models are trained on the preprocessed data. Furthermore, for comparative analysis of the models, various performance metrics including overall accuracy, precision, recall, and F1-score are calculated. As a result, the proposed framework achieves the highest accuracy of 99.8%, outperforming the existing ML techniques.

1. Introduction

Agriculture is the most important sector due to its economic influence on society, particularly in developing countries [1]. Food demand is growing rapidly due to the increase in population and the shortage of food ingredients. Crops such as wheat, maize, and rice are the main components of foods [2]. However, one of the factors most damaging to the quality of crop production is crop disease. Crop diseases are the main source of crop yield losses in terms of quality and quantity, affecting large- as well as small-scale farmers [3]. Moreover, small-scale cultivators in developing countries contribute up to 80% of global crop production, yet food losses in these areas are much higher due to a lack of access to resources and the latest technology [4]. Apart from this, according to the World Health Organization (WHO) [5], more than a hundred diseases are caused by contaminated food; almost six hundred million people fall ill and about 0.4 million die yearly. In addition, farmers have no quick way to obtain a timely diagnosis of a particular disease. Thus, the quality and quantity of crops can be improved through precision agriculture.
Around the globe, wheat is one of the most important food ingredients; it is therefore the most popular cereal cultivated by farmers worldwide [6]. According to the Food and Agriculture Organization (FAO) of the United Nations [7], in the years 2018 and 2019, wheat made up almost 28% of total global cereal production from an estimated area of 215 million hectares. However, the demand for wheat is far higher than its production, specifically in developing countries [2]. Different factors are involved in the low production of wheat; one of the most important is disease, which causes 15–20% losses in global wheat production per annum [8]. Common wheat leaf diseases such as leaf rust and yellow rust are the most widespread diseases in wheat plants and can cause huge food losses and economic damage if left uncontrolled [9]. Further, most farmers, especially in developing countries, depend on agriculture experts to identify and diagnose disease [10]. A quick response to disease is crucial to stop it from spreading to the entire plant and even the entire field, particularly in wheat.
Wheat disease detection and identification remain a challenge for farmers, who must look after the whole field and visit and examine each plant themselves or through an agriculture expert. Because of the density of wheat crops in the field, manually monitoring the whole field is very time- and resource-consuming [11]. Due to recent advancements in computer technology such as human-computer interaction [12,13] and AI [14,15,16,17,18,19], intelligent systems can help farmers in the field identify wheat leaf diseases using automatic methods such as Computer Vision (CV) and AI-based approaches [20]. Developing a robust ML- and CV-based system for wheat disease recognition that is capable of working accurately in diverse field conditions poses various challenges, some of which are discussed below.
The first and very important challenge for plant disease recognition, especially for wheat using CV and ML, is the collection of disease data and the creation of a challenging database that contains variety, such as images captured at various angles and with occlusion [21,22,23]. Currently, freely available plant disease datasets include [24], which contains images of wheat diseases, and [25], which contains images with variations in angle and illumination. Thus, acquiring data from fields with wheat diseases under different conditions is essential for accurate automatic disease identification.
The second challenge is the segmentation of leaves and regions of interest from busy backgrounds, which has the potential to improve model accuracy [22,26,27]. Most of the existing methods for wheat disease recognition use manual steps, such as farmers manually cropping images after capture. Inexperienced farmers often crop out the important region of an image and keep the unessential region, which badly affects the accuracy of the system. In contrast, segmentation of the interesting regions of leaves can improve the accuracy of the disease identification system.
The third challenge is the selection of proper methods for feature extraction and for the classification of the extracted features [28]; most existing techniques for wheat leaf disease detection use a narrow set of CV feature extractors and ML models without investigating the performance of other descriptors and models. For example, [29] used only color and texture descriptors with Random Forest (RF) classifiers for wheat disease classification, and [30] used only the Maximum Margin Criterion (MMC) method for severity estimation and classification of wheat leaf diseases. Thus, feature extractors and classifiers should be selected based on comparative performance.
In this paper, we propose an ML-based wheat disease recognition framework that achieved the highest recognition accuracy after a comparative analysis of various feature extraction and classification algorithms. The major contributions of our work are as follows:
  • We propose a machine learning-based framework that detects salient cues of wheat diseases and accurately classifies them into yellow and brown rust. Our model utilizes a mask-based segmentation technique that automatically removes background and noise, identifies healthy and unhealthy wheat crops, and determines the affected and unaffected areas of the crop. The proposed framework is lightweight and automatically identifies wheat crop diseases with a high recognition rate.
  • A new dataset for wheat disease classification is introduced. The dataset is collected from different wheat fields in the Peshawar and Dir regions of Pakistan. We focused on two disease categories, giving a total of three classes: brown rust, yellow rust, and healthy leaves. The dataset will be made publicly available to the research community.
  • A comparative analysis has been conducted among ML techniques for wheat disease recognition. The proposed framework achieved 99.8% accuracy for wheat disease classification. Due to its good generalization and high recognition rate, the system can be employed in various real-time industrial applications.
The rest of the paper is arranged as follows. Section 2 discusses the related work in detail, followed by Section 3, where the proposed system is comprehensively discussed with implementation details. Section 4 illustrates the experimental results in detail. The outcomes and possible future research directions are presented in Section 5.

2. Related Work

Around the globe, researchers are striving to develop significant guidance and insights to help farmers make better decisions and act accordingly. Over the last two decades, advances in technology such as AI and computing have attracted researchers' attention. To produce a more effective system for actual disease diagnosis and to categorize diseases with high accuracy, many alternative schemes with diverse combinations may be used. These include conventional (statistical and image processing) techniques as well as ML-based methods for plant and leaf diseases, specifically wheat disease recognition and classification. Different researchers have contributed to different aspects of precision agriculture [31]. Various advances in digital image processing and ML methods have been used for crop leaf disease detection and recognition from leaf images [28,32,33,34]. The literature can be categorized into two subsections: comparatively less intelligent approaches to precision agriculture, such as pure image processing or CV-based disease classification, and more intelligent ones, such as ML-based task handling in precision agriculture.

2.1. Statistical-Based Approaches

To investigate pure image processing-based approaches toward plant leaf disease recognition, Xu et al. [35] designed an image recognition-based embedded technique for wheat leaf rust identification, where high-resolution images of wheat leaf rust were first converted into single-channel gray-level images, and the Sobel operator was then used to detect vertical edges in the gray image. The background was removed, and a binary feature point set of diseased spots was extracted. A flood-filling algorithm was used to filter out noisy points in the point set. Their framework achieved 92.3% accuracy using pure image processing. However, they targeted only one disease, and the method is not robust when field conditions change. Similarly, Islam et al. [36] proposed an approach that combines image processing and ML for identifying potato diseases from leaf images. They used color-based image segmentation as a preprocessing step, followed by feature extraction using statistical features and multi-class SVM classification to categorize potato leaves. They used only 300 images consisting of three potato leaf classes (late blight, early blight, and healthy potato leaf), and their method achieved 95% accuracy. However, the dataset was very small and taken from the publicly available PlantVillage repository, and they used statistical features instead of other state-of-the-art feature descriptors; the accuracy could be improved by using other ML techniques. Apart from this, Alehegn et al. [37] developed a maize leaf disease classification and recognition system by proposing a hybrid ML approach. They used preprocessing techniques such as segmentation before feature extraction. Their dataset consists of 800 images and targets four classes: healthy leaf, common rust, leaf spot, and leaf blight. An 80/20 ratio was used for training/testing. They obtained the most accurate classification using SVM when color, texture, and morphology features were combined, and their model achieved 95.63% accuracy. Hossain et al. [38] developed an automated SVM-based ML model for the recognition and classification of tea leaf diseases in Bangladesh. Their dataset consisted of three classes: two disease classes (brown blight and algal leaf disease) and a healthy leaf class. They used statistical features for feature extraction, and the suggested technique classified with 93% accuracy. However, their framework has several limitations, including a limited number of samples, reliance on purely statistical features, and low accuracy, which could be improved by evaluating more models on a larger dataset.

2.2. Machine Learning-Based Approaches

ML-based techniques are widely used in various domains such as medicine [39,40], agriculture, and image classification. To investigate other ML models and state-of-the-art feature extractors, Akmal et al. [41] proposed a technique for the recognition of corn and potato leaf diseases, using the PlantVillage dataset for classification and validation. They used three feature extractors: local ternary patterns (LTP), histogram of oriented gradients (HOG), and segmented fractal texture analysis (SFTA). On the chosen crop diseases, competitive results between 92.8% and 98.7% were produced using a multi-class SVM (MSVM) with a cubic kernel function, which was higher than existing methods. Treboux et al. [42] presented an innovative ML approach to discriminate vineyards from other agricultural objects. The dataset is composed of images taken by a drone over five vineyards in Valais, Switzerland, and a 90/10 ratio was used for training/testing. The authors achieved a baseline accuracy of 89.6% using color analysis, which improved to 94.27% with a decision tree ensemble (DTE). For early plant disease detection, Rump et al. [43] proposed a framework for sugar beet diseases using an ML algorithm based on SVM and spectral vegetation indices. Their framework achieved an accuracy of up to 97% in discriminating diseased from healthy sugar beet leaves, and accuracy above 86% in classifying healthy leaves and leaves with symptoms of three diseases. However, the accuracy is lower when differentiating among multiple diseases, which could be improved using more robust descriptors and ML models. To develop a more robust system, Ramesh et al. [44] proposed an RFC-based ML technique for discriminating between healthy and diseased papaya leaves. They used the histogram of oriented gradients (HOG) for feature extraction. For training the ML model, they used only 160 images, which is too few to achieve high accuracy; their framework achieved 70% accuracy, which could be improved with a larger number of images and state-of-the-art feature descriptors.
Phadikar et al. [45] proposed an ML technique for classifying healthy rice leaves and diseased leaves with blast and brown-spot diseases. For training and testing, they used their own dataset, collected from major rice-producing areas of India such as the East Midnapur district of South Bengal. They performed two phases of classification: (a) classification of healthy and diseased leaves, and (b) classification of the various leaf diseases. Bayes' theorem and SVM classifiers were used and their performance compared; they achieved 68.1% and 79.5% accuracy for the SVM- and Bayes-classifier-based systems, respectively. For the same plant, Prajapati et al. [46] presented a technique to identify and categorize three rice plant diseases. For multi-class classification, they used a support vector machine (SVM). They collected samples from a rice field in the village of Shertha in Gujarat, India, to produce their dataset of leaf images. They tested several feasible image processing and background removal methods, as well as several segmentation algorithms for segmenting the diseased part; for disease segmentation, K-means clustering with seeded centroid values was used. For feature extraction, they used three main categories: texture, color, and shape. Their approach achieved up to 93.33% accuracy, which could be improved further by using a dataset with variations (illumination and different angles) and state-of-the-art models. Continuing with rice plant disease identification, Ahmed et al. [47] introduced a model for rice leaf disease detection using ML techniques. The study focused on three of the most widespread rice plant diseases: brown spot, leaf smut, and bacterial leaf blight. Their dataset consists of 480 images, and a 90/10 ratio was used for training/testing. They compared four ML techniques (logistic regression, decision tree, KNN, and Naïve Bayes), and the decision tree performed comparatively best with 97.91% accuracy. However, they used statistical feature extractors, which are not very robust to physical changes (illumination and structure), and the performance could be improved by improving the dataset. To develop a more robust system, Panigrahi et al. [48] proposed a framework based on ML algorithms, including SVM, RFC, DT, and KNN, for detecting various maize crop diseases. The classification approaches were tested and compared to determine the best model; the RFC algorithm showed a good accuracy of 79.23%. The maize dataset was divided into 90% for training and 10% for testing and contains 3823 images in four classes: healthy (1162 images), gray leaf spot (513 images), common rust (1192 images), and northern leaf blight (985 images). However, the models were trained on poorly captured images, which is not sufficient because diseases can affect any part of the plant. Waghmare et al. [49] proposed a multi-class SVM-based machine learning technique for grape plant disease identification. As a preprocessing step, they used segmentation to remove the background area. Their research focused on two major diseases that commonly affect grape plants (black rot and downy mildew). Their dataset consists of 450 samples of grape leaves (160 healthy and 290 diseased).
A special texture-based feature is used to extract the segmented leaf texture, and a multi-class SVM classifies the extracted texture patterns. Their model achieved 96.6% accuracy, which could still be improved. For wheat leaf disease classification, Zhao et al. [50] suggested an integrated ML-based technique for leaf-scale wheat powdery mildew. The proposed framework was trained and evaluated on a hyperspectral image dataset. Three diagnosis models were constructed: SVM, a probabilistic neural network (PNN), and RFC. After comparing the models, the best one, SVM, achieved 93.33% classification accuracy; this accuracy is relatively low and could be improved by using other state-of-the-art models with various feature descriptors. To improve accuracy, GuanLin et al. [51] proposed a novel approach to recognize two types of wheat rust (stripe rust and leaf rust) based on SVM and multiple feature parameters of their dataset. As preprocessing steps, they used image cutting, de-noising, and segmentation, and they extracted color- and texture-related features from the preprocessed images. Using SVMs with a radial basis function (RBF) kernel on the selected twenty-six features, the authors achieved an accuracy of 96.67%. However, they used only one feature extractor, and the model was trained on an invariant dataset, making it less robust to changes in physical appearance. To improve accuracy, Azadbakht et al. [52] proposed ML-based detection of a single wheat disease. They developed their dataset at canopy scale under different leaf area index (LAI) levels. Their framework identified the severity level of wheat leaf rust at canopy scale using four ML techniques: random forest regression (RFR), ν-support vector regression (ν-SVR), Gaussian process regression (GPR), and boosted regression trees (BRT). They achieved accuracy of up to 99% using ν-SVR. However, the experiments were performed on only one disease, and the focus was only on its severity.
In Table 1, we summarize the related work discussed above. Researchers have proposed different ML techniques for various crop leaf disease recognition tasks, covering maize, rice, tea, vineyards, and our focal crop, wheat. They used various methods for preprocessing, commonly resizing, de-noising, and cropping; some used segmentation as a preprocessing step, though for plants other than wheat. For feature extraction, they used statistical and CV-based techniques, and recognition was performed by purely statistical as well as ML approaches such as SVM and RFC. However, we found deficiencies in the existing work on wheat leaf disease recognition and classification using ML methods in terms of the unavailability of diverse datasets, robust preprocessing techniques, and framework accuracy. Therefore, we bridge this gap by proposing a framework for common wheat disease recognition that achieves comparatively high accuracy as a result of the collected dataset, preprocessing steps such as mask-based segmentation, the selected feature descriptors, and the proposed fine-tuned RFC.

3. Materials and Methods

In this section, the proposed approach for wheat disease recognition and classification is discussed. For better understanding, it is divided into three subsections: Section 3.1: Real-Time Data Collection; Section 3.2: Data Preprocessing and Feature Extraction; and Section 3.3: Proposed Fine-Tuned Framework. These steps are comprehensively discussed in the relevant subsections below. A pictorial representation of the overall work is given in Figure 1. The first step is data collection, where data are collected from different wheat fields and a new database is created. In the second step, the collected data are divided into three classes, namely healthy, rusted, and yellow-rusted leaves, and the data of each class are split into training and testing sets with a ratio of 8:2, respectively. In the next step, different ML models are trained, as discussed below in detail. For evaluation, model performance is measured on the test data using different evaluation metrics, including accuracy, confusion matrix, precision, recall, and F1-score on unseen data. In this final step, the remaining 20% of the data allocated for testing is loaded, passed through the same preprocessing and feature extraction steps as in the training phase, and used for model prediction, as sketched below.
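To make the data split concrete, the following minimal sketch (in Python, matching the libraries listed in Section 4.1) loads image paths from hypothetical per-class folders and reserves 20% of each class for testing with scikit-learn; the folder names and paths are illustrative assumptions, not part of the original work.

import os
import glob
from sklearn.model_selection import train_test_split

# Hypothetical directory layout: dataset/<class_name>/*.jpg
DATASET_DIR = "dataset"
CLASSES = ["healthy", "rusted", "yellow_rusted"]

image_paths, labels = [], []
for idx, cls in enumerate(CLASSES):
    for path in glob.glob(os.path.join(DATASET_DIR, cls, "*.jpg")):
        image_paths.append(path)
        labels.append(idx)

# 80/20 split, stratified so every class keeps the same train/test ratio
train_paths, test_paths, y_train_labels, y_test_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42)

print(len(train_paths), "training images,", len(test_paths), "testing images")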

3.1. Real-Time Data Collection

To integrate computer vision technology into agriculture and facilitate plant disease diagnosis, researchers have developed various open-access datasets, such as the PlantVillage dataset containing over 50,000 images of different plant species with diseases annotated by field experts. However, it does not contain sufficient wheat disease data, and to the best of our knowledge, there is no appropriate open-access dataset of wheat leaves. Thus, over 3000 images of wheat leaves covering three important classes (healthy, rusted, and yellow-rusted) were collected from actual fields in two geographically and environmentally different places in Pakistan: Peshawar, located east of the Khyber Pass, with very warm summer weather of up to 40 °C, and Dir, located near the Lowari Pass in Khyber Pakhtunkhwa, with moderate summer temperatures of up to 32 °C. The dataset was collected using smartphone cameras at a resolution of 1024 × 768. The images are resized using the inter-area interpolation technique in a preprocessing step to reduce computation time, which is discussed in the next section. In addition, the data are equally distributed across the three classes (healthy, rusted, and yellow-rusted leaves), each containing 1050 images of wheat leaves.

3.2. Data Preprocessing and Features Extraction

For comprehensive understanding, this section is divided into two subsections which include preprocessing and feature extraction. The details of each subsection are given in the following.

3.2.1. Preprocessing

Preprocessing is a very important step in ML. It helps remove unwanted data and reduces computation time during the training and testing of models. Our preprocessing, shown in Figure 2, includes two techniques. The first is resizing, which adjusts the size of an image while preserving its content. To make image processing systems more accurate and faster, high-resolution images are nearly always down-sampled. In this work, the INTER_AREA interpolation method is used for scaling, and each image is resized from 1024 × 768 to 250 × 250. The interpolation function uses neighboring areas of pixels to expand or reduce the image's size; for reduction, the inter-area method is preferred because it averages over pixel areas and produces moiré-free results. This step improved our system's performance in terms of computational complexity and accuracy. The second technique is image segmentation, the process of dividing an image into clusters based on the similarity of the intensity values of the input image. For instance, pixel values in wheat leaves that are similar to the affected region will belong to the affected cluster, and the rest of the region will be considered the healthy part. This process appears simple, but it becomes difficult when pixel values lie on a boundary, such as the boundary spots of brown rust. Brown-rust pixels on the boundary are sometimes very close to healthy-region values, so it is difficult to decide whether to assign them to the affected region, the healthy region, or another region. Fuzzy-set-based ideas allow us to deal with these situations. For example, given a set of N elements to be divided into C clusters, each element has C membership values, one per cluster, and is assigned to the cluster for which its membership is highest. The intensity values of a wheat leaf image are treated as a set of N pixel values, and the fuzzy C-means clustering algorithm is used; it iteratively updates the memberships of each pixel by minimizing the objective function given in Equation (1).
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}   (1)
where N is the number of image intensity values, C is the number of clusters, and m is any real number greater than 1; u_{ij} is the degree of membership of x_i in cluster j, x_i is the current intensity value, and c_j is the center of cluster j. In our case, m is set to 2, and the highest performance is achieved with three clusters. The images are passed through the algorithm, and colors are assigned to the various parts of the leaves. Generally, diseased parts of the image have greater intensities than healthy parts.
Therefore, we assigned the highest intensity values to the diseased spots and formed a cluster from them, while the remaining parts are divided into the boundary and healthy clusters. As a result of segmentation, more suitable and clearly highlighted images are obtained, as shown in Figure 2, for further processing.
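A minimal sketch of these two preprocessing steps is given below, using OpenCV's INTER_AREA interpolation and the fuzzy c-means implementation from the scikit-fuzzy package; the skfuzzy dependency and the intensity-rank coloring are assumptions made for illustration, not the authors' exact implementation.

import cv2
import numpy as np
import skfuzzy as fuzz  # scikit-fuzzy, assumed here; any FCM implementation would do

def preprocess(image_path, size=(250, 250), n_clusters=3, m=2.0):
    # Down-sample with inter-area interpolation (moire-free reduction)
    img = cv2.imread(image_path)
    img = cv2.resize(img, size, interpolation=cv2.INTER_AREA)

    # Fuzzy c-means on the gray-level intensities, minimizing Equation (1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
    data = gray.reshape(1, -1)                       # shape (1, N), as skfuzzy expects
    cntr, u, *_ = fuzz.cluster.cmeans(data, c=n_clusters, m=m,
                                      error=1e-4, maxiter=200, seed=0)

    # Assign each pixel to the cluster with the highest membership value
    labels = np.argmax(u, axis=0).reshape(gray.shape)
    # Rank clusters by centroid brightness so the brightest marks diseased spots
    rank = np.argsort(np.argsort(cntr.ravel()))
    segmented = (rank[labels] * (255 // (n_clusters - 1))).astype(np.uint8)
    return img, segmented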

3.2.2. Feature Extraction

Feature extraction is one of the most important steps in ML-based model development. The performance of ML algorithms depends on the extracted features: if the extracted features are relevant to the region of interest (ROI), ML classifiers can differentiate among classes with high accuracy. The basic idea of feature extraction is to extract only those features that carry a high weight in representing an object, reducing computational complexity by avoiding further processing of less meaningful features. Researchers use many descriptors for different kinds of features, including texture, shape, and color descriptors. As shown in Figure 1, five relevant feature descriptors are considered: Histogram of Oriented Gradients (HOG), designed particularly for shape extraction; Local Binary Patterns (LBP), mostly used for texture feature extraction; Hu Moments (HM), a statistical descriptor used for shape feature extraction; Color Histogram (CH), used for color feature extraction; and Haralick Texture (HT), which computes 14 texture features. Each descriptor was used separately, and three of them (HM, HT, and CH) were combined for performance analysis. After detailed experiments, the combination of these three descriptors was found to be the most effective in terms of testing accuracy; they are explained below in detail.

Hu Moments (HM)

Hu moments are descriptors used for shape feature extraction, accepting gray-scale and binary input images. In our case, after image segmentation, the healthy and diseased areas are distinguished, and the shape of the segmented clusters can help identify the diseases. For that, simple translation-invariant shape features are calculated using Equation (2).
\mu_{ij} = \sum_{x} \sum_{y} (x - \bar{x})^{i} (y - \bar{y})^{j} I(x, y)   (2)
where x and y represent the location of a pixel belonging to the object region, while \bar{x} and \bar{y} represent the centroid of the object's shape. The centroid is the center of mass, calculated using Equations (3) and (4), where M_{00} is the area and M_{10} and M_{01} give the coordinates of the shape's center.
\bar{x} = \frac{M_{10}}{M_{00}}   (3)
\bar{y} = \frac{M_{01}}{M_{00}}   (4)
To equip the central moments with scale invariance, it is essential to normalize them. The normalized central moment is calculated using Equation (5).
\eta_{ij} = \frac{\mu_{ij}}{\mu_{00}^{(i+j)/2 + 1}}   (5)
The normalized central moment is now translation- and scale-invariant; however, this is not enough for robust shape matching, which also requires rotation and reflection invariance. So, the following seven moments are calculated using Equations (6)–(12) and denoted h_n, where n indexes the moments.
h_0 = \eta_{20} + \eta_{02}   (6)
h_1 = (\eta_{20} - \eta_{02})^{2} + 4\eta_{11}^{2}   (7)
h_2 = (\eta_{30} - 3\eta_{12})^{2} + (3\eta_{21} - \eta_{03})^{2}   (8)
h_3 = (\eta_{30} + \eta_{12})^{2} + (\eta_{21} + \eta_{03})^{2}   (9)
h_4 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^{2} - 3(\eta_{21} + \eta_{03})^{2}\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}\right]   (10)
h_5 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})   (11)
h_6 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^{2} - 3(\eta_{21} + \eta_{03})^{2}\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}\right]   (12)
These seven moment invariants describe the shape of an object as a 7D vector, which is used in this work as a feature and concatenated with the other features for training the ML model.
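In practice, the seven invariants do not need to be coded by hand; a minimal sketch using OpenCV's built-in moment functions is shown below (the log scaling at the end is a common practical addition to compress the invariants' dynamic range and is our assumption, not a step stated in the paper).

import cv2
import numpy as np

def hu_moment_features(segmented_gray):
    # cv2.moments computes the raw, central and normalized moments internally
    moments = cv2.moments(segmented_gray)
    hu = cv2.HuMoments(moments).flatten()   # 7-D shape descriptor, Equations (6)-(12)
    # Optional log scaling to compress the very wide dynamic range of the invariants
    hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
    return hu                               # later concatenated with the CH and HT features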

Color Histogram (CH)

Color plays a very important role in image recognition: in our case, the affected area of a leaf is yellow or brown, while healthy areas are green. For this reason, color features are considered alongside the other features to attain better performance. Our collected dataset is in RGB, where each channel carries a different amount of information; the red channel carries the most and the blue channel the least. To separate illumination from color, images are converted from RGB to HSV before building the histogram. A histogram is a statistical representation of an image obtained by counting the occurrences of each intensity level; it transforms image intensities into frequencies without considering the coordinates of the pixels, which makes the color histogram rotation invariant. The HSV image histogram is calculated using Equation (13): the histogram is calculated for each channel by summing the occurrences of the intensity levels, and the channel histograms are then concatenated.
\{H, S, V\} = \left[ \sum_{i=0}^{L} i_{H}, \;\; \sum_{i=0}^{L} i_{S}, \;\; \sum_{i=0}^{L} i_{V} \right]   (13)
where L denotes the number of intensity levels and i_{H}, i_{S}, and i_{V} represent the frequencies of the levels in the H, S, and V channels, respectively. After calculation, the histogram is converted into a vector and used as the color-based feature of the input image.
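A minimal sketch of the HSV color-histogram feature is shown below, using OpenCV's calcHist; the choice of 8 bins per channel is an illustrative assumption, since the paper does not state the bin count.

import cv2

def color_histogram_features(bgr_image, bins=8):
    # Convert to HSV so illumination (V) is separated from hue and saturation
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # 3-D histogram over the H, S and V channels, flattened into a single vector
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins, bins, bins],
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    return hist.flatten()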

Haralick Texture (HT)

Texture features are also considered very important in CV. They describe the properties of a surface: whether it is smooth or rough, and what pattern exists on it. In automatic plant health analysis, texture features are very important; in our case, healthy and yellow-rusted leaf surfaces are mostly smooth, whereas brown-rusted ones are rough. To make our system aware of leaf texture, texture features are calculated using the HT descriptor. HT calculates a total of 14 features; however, we selected only four of them (contrast, correlation, uniformity, and homogeneity) due to their robustness. The contrast feature returns the variation in intensity between adjacent pixels of the input image; if the image has no variation, the contrast is zero. Correlation is a property of a grayscale image that returns how strongly adjacent pixels are interrelated; its value can be 1, −1, or NaN. Uniformity shows how uniform the surface of the image is and is calculated by summing the squares of the gray-level co-occurrence matrix (GLCM) entries. Homogeneity measures the closeness between adjacent pixels. Finally, these features are combined into a texture feature vector. Contrast, correlation, uniformity, and homogeneity are calculated using Equations (14)–(17), respectively.
Con = \sum_{i=0}^{G_l - 1} \sum_{j=0}^{G_l - 1} \lvert i - j \rvert^{2} \, G(i, j)   (14)
Cor = \sum_{i, j=0}^{G_l - 1} \frac{1}{\sigma_i \sigma_j} \left[ \{ i \cdot j \times G(i, j) \} - \mu_i \mu_j \right]   (15)
Uni = \sum_{i=0}^{G_l - 1} \sum_{j=0}^{G_l - 1} G(i, j)^{2}   (16)
Homo = \sum_{i=0}^{G_l - 1} \sum_{j=0}^{G_l - 1} \left\{ 1 + (i - j)^{2} \right\}^{-1} G(i, j)   (17)
where the texture of an image is stored in a matrix G(i, j) called the GLCM, and G_l represents the total number of gray levels in the image. After extraction, these features are normalized using skewness and kurtosis, as defined in Equations (18) and (19).
Skew = \frac{E(x_i^{3}) - 3\mu\sigma^{2} - \mu^{3}}{\sigma^{3}}   (18)
Kur = E\left[ \left( \frac{X_i - \mu}{\sigma} \right)^{4} \right]   (19)
where Skew and Kur represent skewness and kurtosis, respectively, E denotes the expected value, and X_i is the normalized feature value. After extracting the above features, they are fused into one feature vector and passed to model training in the next module.
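Building on the two sketches above, the texture features and the final fusion step can be sketched as follows; the mahotas dependency and the specific Haralick feature indices (ASM, contrast, correlation, and inverse difference moment in mahotas's documented order) are assumptions for illustration, not the authors' stated implementation.

import cv2
import numpy as np
import mahotas  # assumed dependency for GLCM/Haralick statistics

def haralick_texture_features(gray_image):
    # 13 Haralick statistics per GLCM direction, averaged over the four directions
    ht = mahotas.features.haralick(gray_image).mean(axis=0)
    # Keep contrast, correlation, uniformity (ASM) and homogeneity, cf. Equations (14)-(17)
    return ht[[1, 2, 0, 4]]

def fused_feature_vector(bgr_image, segmented_gray):
    # Concatenate shape (HM), color (CH) and texture (HT) descriptors into one vector
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    return np.hstack([hu_moment_features(segmented_gray),
                      color_histogram_features(bgr_image),
                      haralick_texture_features(gray)])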

3.3. Proposed Fine-Tuned Framework

The major goal of our study is to develop an efficient recognition system that can automatically identify and classify wheat diseases. Machine vision is a technology able to substitute for human inspectors in achieving automatic evaluation and diagnosis of wheat diseases, thus providing objective inspection results. To achieve an efficient recognition framework, a comprehensive baseline study was conducted, as discussed in Section 3.4. In the experimental evaluations, the RFC achieved the best results compared with the other baseline models, and it was therefore selected for fine-tuning to reach the desired performance. RFC is mostly used to solve classification problems; its basic workflow consists of randomly selecting samples from the provided dataset, constructing a decision tree from each sample, and letting each tree cast a vote during prediction. The model chooses the majority-vote decision as its final prediction. Furthermore, RFC does not require large amounts of memory and can be parallelized across multiple CPU cores during training, which speeds up the training process. RFC mainly consists of DTs, each of which randomly selects a subset of the input features and processes it to decide a class for the input sample. A tree in the RFC consists of three components: a root node, from which the tree starts; decision nodes, which follow the splitting of the root node; and leaf nodes, which indicate the end of the tree.
GI = 1 - \sum_{j=1}^{n} (p_j)^{2}   (20)
In addition, RFC performance depends on various hyperparameters, including the number of trees, the feature selection criterion, tree depth, and complexity handling. After performing the experiments and to keep a trade-off between computational complexity and accuracy, we chose 100 trees for the final model. For the feature selection criterion, we tried different options, including Gini, entropy, and log loss; due to its simplicity and lower computational complexity, we chose the Gini index, presented in Equation (20), where GI represents the Gini index and p_j the probability of class j. Regarding tree depth, nodes expand until all leaves are pure. Another important attribute of the RFC is bootstrapping, which uses only a specific part of the data to construct each tree. As mentioned above, the wheat disease classes appear very similar due to color similarities between brown and yellow rust; thus, we set bootstrap to false so that all input data are considered when constructing the trees. In addition, a cost-complexity pruning parameter (CCP alpha) is used to keep only those trees with minimal computational complexity.
Furthermore, a key contribution of the proposed fine-tuned RFC is the self-collected data and its preprocessing, where the proposed segmentation technique is applied to extract the region of interest and reduce computational complexity. Additionally, the selected, most relevant feature descriptors perform well in terms of accuracy, which plays a major role in the proposed lightweight and accurate framework. Compared with other ML models such as neural networks, which can also perform well on classification tasks but are mostly computationally expensive, this configuration of the RFC achieves the high performance presented in the results section.
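A minimal sketch of the fine-tuned random forest with the hyperparameters reported above (100 trees, Gini criterion, unrestricted depth, bootstrap disabled, cost-complexity pruning) is given below; the exact ccp_alpha value is not stated in the paper, so the value here is a placeholder, and X_train/y_train are the fused feature vectors and labels from the earlier steps.

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(
    n_estimators=100,      # trade-off between accuracy and computational cost
    criterion="gini",      # Gini index, Equation (20)
    max_depth=None,        # nodes expand until all leaves are pure
    bootstrap=False,       # use all input data to construct every tree
    ccp_alpha=0.0,         # cost-complexity pruning strength (placeholder value)
    n_jobs=-1,             # train trees in parallel on multiple cores
    random_state=42)

rfc.fit(X_train, y_train)          # X_train: fused feature vectors, y_train: class labels
y_pred = rfc.predict(X_test)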

3.4. Comparative Analysis of Baseline Models

Baseline model analysis demonstrates the effectiveness of the proposed fine-tuned RFC. In our study, it is important to evaluate the performance of our model and comparatively analyze the results on the deployed data, since we are also proposing a new dataset. Therefore, apart from the proposed fine-tuned RFC, we consider five other ML models as baselines. The risk of unnecessary model complexity is checked and reduced accordingly after a detailed comparative analysis of these baseline models. Hence, several experiments were conducted on the selected models, including Logistic Regression (LR), KNN, Decision Tree (DT), NB, and SVM, as given in Table 2. The LR model is used to assign observations to discrete sets of classes; the logistic sigmoid activation function transforms the LR output into a probability value, which can then be mapped to the three classes. It is applied to our multi-class classification problem through the "one vs. all" technique and achieved 89.6% accuracy on the given dataset. KNN is also used by many researchers for crop classification problems, as mentioned in Table 1. This algorithm works on "feature similarity", meaning that a value is assigned to a data point based on how closely it resembles points in the training set; it reached 99.0% accuracy, the third-best in Table 2. The DT technique is also included in our experiments; it starts with a root node and grows into a tree-like structure with more branches. The purpose of this technique is to develop a mathematical model that predicts the value of the target class by using a tree representation. Decision nodes are used for making decisions and have several branches, whereas leaf nodes are the results of those decisions and contain no further branches. DT achieved 99.2% accuracy, ranking as runner-up after our proposed fine-tuned RFC. Furthermore, the NB algorithm is also used to compare results. NB applies Bayes' theorem to create a family of classification algorithms known as Naïve Bayes classifiers, which share the assumption that each pair of features is independent of the others; it is a probabilistic technique that makes predictions based on an object's probabilities. Additionally, SVM is used for comparison, as it is one of the most widely used supervised ML algorithms for both classification and regression problems; according to the literature, it is mostly used for classification. The SVM algorithm's objective is to find a hyperplane in N-dimensional space (N being the number of features) that distinctly classifies the data points. Several hyperplanes are selected to separate the three classes of data points, and SVM achieved 94.6% accuracy. After the comparative analysis in Table 2, it is observed that the proposed fine-tuned RFC model achieved 99.8% accuracy on the given three-class dataset.
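A minimal sketch of this baseline comparison is shown below; the hyperparameters of the five models are not listed in the paper, so the configurations here are illustrative scikit-learn defaults.

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

baselines = {
    "LR":  LogisticRegression(max_iter=1000, multi_class="ovr"),  # one-vs-all scheme
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT":  DecisionTreeClassifier(random_state=42),
    "NB":  GaussianNB(),
    "SVM": SVC(kernel="rbf"),
}

for name, model in baselines.items():
    model.fit(X_train, y_train)                     # same fused features as the RFC
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")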

4. Experimental Results

In this section, we discuss the experimental setting, collected dataset, evaluation metrics, and evaluation of the performance of our proposed framework. Furthermore, we elaborate on the performance of trained models comparatively.

4.1. Experimental Settings

All experiments were carried out on a computer with an Intel® Xeon® X5560 processor with a 2.80 GHz clock speed, 8 GB of installed memory (RAM), and a GeForce GTX 1070 GPU, running the Microsoft Windows operating system. Different libraries were utilized for the implementation of our project, including Python 3.7 as the programming language, OpenCV version 3.4 (a CV library) for preprocessing, and the scikit-learn ML library version 0.24.2 for training and testing the various ML models. Matplotlib, a Python-based visualization library, is used for visualizing images and results and for graph generation. Furthermore, the os and glob libraries are utilized for reading files from the hard drive.

4.2. Dataset

After collecting the data, the images are resized using the inter-area interpolation technique in a preprocessing step to reduce computation time, as discussed in the previous section. The wheat leaf dataset consists of three classes (healthy, rusted, and yellow-rusted leaves), each containing 1050 images of wheat leaves. Furthermore, Figure 3 shows image samples from the different classes: the first row contains healthy samples, the second row rusted samples, and the last row visualizes the yellow-rust class of wheat diseases.

4.3. Evaluation Metrics

When the training of all models is complete, as shown in Figure 1, the remaining unseen 20% of the whole dataset is passed through the preprocessing and feature extraction phases, and the extracted features are then classified using the different trained models discussed above. The models are evaluated based on accuracy, precision, recall, and F1-score. All of these performance values are calculated from the confusion matrix using Equations (21)–(24), respectively. These equations use quantities obtained from the confusion matrix of each model: True Positive (TP), which counts how many positive testing samples the model classified as positive; True Negative (TN), which counts how many negative samples the model classified correctly; and False Positive (FP) and False Negative (FN), which capture the rate of misclassification. As a result, our proposed framework achieved the highest accuracy of 99.8%, followed by DT at 99.2%, whereas the testing accuracy of LR, at 89.6%, was lower than all of the others.
Acc = \frac{TP + TN}{TP + TN + FP + FN}   (21)
Precision = \frac{TP}{TP + FP}   (22)
Recall = \frac{TP}{TP + FN}   (23)
F1\text{-}score = \frac{2 \times (Precision \times Recall)}{Precision + Recall}   (24)
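A minimal sketch of computing Equations (21)–(24) with scikit-learn on the held-out test set is shown below; macro averaging over the three classes is our assumption, since the paper does not state the averaging mode.

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, average="macro")
rec = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
cm = confusion_matrix(y_test, y_pred)               # per-class correct and wrong predictions

print(f"Accuracy: {acc:.3f}  Precision: {prec:.3f}  Recall: {rec:.3f}  F1: {f1:.3f}")
print(cm)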

4.4. Results

To evaluate the performance of our framework, different experiments were conducted on the collected dataset. All testing images were selected randomly. The testing set comprised 20% of the whole dataset, consisting of 210 images for each category (healthy, rusted, yellow-rusted). Based on the testing set, different evaluations are presented, including the confusion matrix, overall accuracy, and a performance evaluation graph containing precision, recall, and F1-score. All of these are discussed below in detail.
In these experiments, a comparative analysis of the performance of the ML models is conducted on the combined extracted color, shape, and Haralick texture features of the wheat leaves. Accuracy is calculated using Equation (21): the number of correct predictions of a trained model is divided by the total number of testing samples, with 20% of the whole dataset held out for testing, as mentioned in the relevant section above. The calculated accuracies of all trained models are shown in Figure 4, in which the proposed framework has the highest accuracy of 99.8%, followed by DT (99.2%) and KNN (99.0%). On the other hand, the accuracy of LR is 89.6%, which is lower than all the other models, while SVM and NB achieved accuracies of 94.0% and 97.7%, respectively.
For better evaluation and verification of our experimental results, we calculated various values from the confusion matrix, including precision, recall, and F1-score, using Equations (22)–(24), respectively. All of these values are depicted in Figure 5, where our proposed framework achieves the highest precision, recall, and F1-score of 99.8%. These results show that almost all testing samples are classified correctly by our proposed framework, followed by DT with 99.2% for all values, meaning that less than 1% of the data of each class is misclassified. However, LR has lower values of 91.0%, 90.0%, and 90.0%, respectively, and the performance of the other ML models lies between that of LR and the proposed framework. All models except LR classified the yellow-rust images correctly owing to the clear pixels in those images, whereas the other classes are occasionally misclassified because the differences between them are minimal, owing to the light and dark disease spots on the wheat leaves.
Confusion matrices are computed for each model, showing the correct and wrong predictions for each class of testing samples. The confusion matrix consists of four values, TP, TN, FP, and FN, in which TP and TN represent correct predictions, while the other two values show the rate of false predictions against the ground truth. The confusion matrices for the proposed framework (highest accuracy), DT (second highest), and LR (lowest accuracy) are tabulated in Table 3. The confusion matrix of the proposed framework confirms its best performance: the framework classified all testing samples correctly except for a single rusted sample classified as healthy. DT has the second-highest accuracy, with only five testing samples misclassified, while the performance of LR is lower because 10.4% of the testing samples are misclassified.
Additionally, for a fair evaluation, the proposed framework is tested on independent data collected from online resources. As mentioned, no suitable wheat leaf dataset is available; therefore, we collected around 50 images of the three classes from Google Images and evaluated the model on them. The results of this evaluation are given in Table 4, where the model achieves around 88% for all classes. The accuracy is lower here because of the low and inconsistent quality of the images.
The results of the suggested method are compared against three state-of-the-art approaches in Table 5, with the data arranged year-wise. The authors of these works proposed methods for wheat disease classification using different techniques: [52] used texture features classified with four ML models and achieved 99% accuracy; in [50], the authors used the affected area divided by the total area of the wheat leaf as a feature and classified it using three ML models, including SVM, PNN, and RFC, achieving 93.33% accuracy; further, in [30], the authors used color, texture, and their combination as features and classified them using the EMMC metric, achieving 94.16% accuracy. In contrast, our proposed framework achieved 99.8% accuracy as a result of a comparative analysis of different ML models together with the proposed RFC model. We achieved this accuracy by concatenating three descriptor features: Haralick texture, color histogram, and Hu moments. The use of such techniques has increased in recent years, as researchers have utilized them in high-level research for forecasting [53], age estimation [54], and time-series analysis [55].

5. Conclusions and Future Work

This paper presented a framework for wheat disease recognition using an ML approach. We collected high-quality images of wheat leaf diseases, including brown-rusted and yellow-rusted leaves along with a healthy leaf class, from different fields in Pakistan to evaluate the proposed system. For accurate preprocessing, segmentation and resizing techniques are used. Various features, such as shape, color, and texture, are extracted using different feature descriptors and combined. Six ML models are trained on the combined features extracted from the segmented images for a comparative analysis of state-of-the-art (SOTA) ML performance. After the comparative analysis, the proposed fine-tuned RFC model showed the best accuracy. The performance of the proposed approach is evaluated using unseen data and a variety of evaluation metrics, including accuracy, precision, recall, and F1-score. For further evaluation, a comparison between our approach and existing SOTA approaches is conducted; as a result, our method is observed to be more accurate in the recognition and classification of wheat diseases than the existing approaches.
In the future, our objective is to extend the developed dataset to more wheat disease classes and to incorporate treatment suggestions for each detected disease. Furthermore, we aim to deploy the model on resource-constrained handheld devices (Android, iPhone, or Jetson Nano) so that it can help farmers efficiently and on time in the field, without wasting resources.

Author Contributions

Conceptualization, H.K.; data curation, M.M.; formal analysis, I.U.H. and S.U.K.; funding acquisition, M.Y.L.; methodology, H.K., M.M. and M.; project administration, M.Y.L.; software, H.K.; supervision, M.Y.L.; validation, H.K., I.U.H. and S.U.K.; visualization, M.; writing—review and editing, I.U.H., M. and M.Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1I1A1A01055652).

Institutional Review Board Statement

Not available.

Data Availability Statement

Not available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aker, J.C. Dial “A” for agriculture: A review of information and communication technologies for agricultural extension in developing countries. Agric. Econ. 2011, 42, 631–647. [Google Scholar] [CrossRef]
  2. Barretto, R.; Buenavista, R.M.; Rivera, J.L.; Wang, S.; Prasad, P.V.; Siliveru, K. Teff (Eragrostis tef) processing, utilization and future opportunities: A review. Int. J. Food Sci. Technol. 2020, 56, 3125–3137. [Google Scholar] [CrossRef]
  3. Chakraborty, S.; Newton, A.C. Climate change, plant diseases and food security: An overview. Plant Pathol. 2011, 60, 2–14. [Google Scholar] [CrossRef]
  4. Nicholls, E.; Ely, A.; Birkin, L.; Basu, P.; Goulson, D. The contribution of small-scale food production in urban areas to the sustainable development goals: A review and case study. Sustain. Sci. 2020, 15, 1585–1599. [Google Scholar] [CrossRef]
  5. WHO. Available online: https://www.who.int/news-room/fact-sheets/detail/food-safety (accessed on 26 April 2022).
  6. Mottaleb, K.A.; Singh, P.K.; Sonder, K.; Kruseman, G.; Tiwari, T.P.; Barma, N.C.; Malaker, P.K.; Braun, H.-J.; Erenstein, O. Threat of wheat blast to South Asia’s food security: An ex-ante analysis. PLoS ONE 2018, 13, e0197555. [Google Scholar] [CrossRef] [PubMed]
  7. Food and Agriculture Organization of the United Nations (FAO). Supply and Demand Brief; Food and Agriculture Organization of the United Nations (FAO): Rome, Italy, 2020. [Google Scholar]
  8. Figueroa, M.; Hammond-Kosack, K.E.; Solomon, P.S. A review of wheat diseases—A field perspective. Mol. Plant Pathol. 2018, 19, 1523–1536. [Google Scholar] [CrossRef]
  9. Huerta-Espino, J.; Singh, R.; German, S.; McCallum, B.; Park, R.; Chen, W.Q.; Bhardwaj, S.; Goyeau, H. Global status of wheat leaf rust caused by Puccinia triticina. Euphytica 2011, 179, 143–160. [Google Scholar] [CrossRef]
  10. Sankaran, S.; Mishra, A.; Ehsani, R.; Davis, C. A review of advanced techniques for detecting plant diseases. Comput. Electron. Agric. 2010, 72, 1–13. [Google Scholar] [CrossRef]
  11. Jha, K.; Doshi, A.; Patel, P.; Shah, M. A comprehensive review on automation in agriculture using artificial intelligence. Artif. Intell. Agric. 2019, 2, 1–12. [Google Scholar] [CrossRef]
  12. Khan, N.; Muhammad, K.; Hussain, T.; Nasir, M.; Munsif, M.; Imran, A.S.; Sajjad, M. An adaptive game-based learning strategy for children road safety education and practice in virtual space. Sensors 2021, 21, 3661. [Google Scholar] [CrossRef] [PubMed]
  13. Haroon, U.; Ullah, A.; Hussain, T.; Ullah, W.; Sajjad, M.; Muhammad, K.; Lee, M.Y.; Baik, S.W. A Multi-Stream Sequence Learning Framework for Human Interaction Recognition. IEEE Trans. Hum.-Mach. Syst. 2022, 52, 435–444. [Google Scholar] [CrossRef]
  14. Khan, S.U.; Haq, I.U.; Khan, N.; Muhammad, K.; Hijji, M.; Baik, S.W. Learning to rank: An intelligent system for person reidentification. Int. J. Intell. Syst. 2022, 37, 5924–5948. [Google Scholar] [CrossRef]
  15. He, J.; Baxter, S.L.; Xu, J.; Xu, J.; Zhou, X.; Zhang, K. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 2019, 25, 30–36. [Google Scholar] [CrossRef] [PubMed]
  16. Khan, S.U.; Hussain, T.; Ullah, A.; Baik, S.W. Deep-ReID: Deep features and autoencoder assisted image patching strategy for person re-identification in smart cities surveillance. Multimed. Tools Appl. 2021, 1–22. [Google Scholar] [CrossRef]
  17. Ullah, W.; Ullah, A.; Hussain, T.; Muhammad, K.; Heidari, A.A.; Del Ser, J.; Baik, S.W.; De Albuquerque, V.H.C. Artificial Intelligence of Things-assisted two-stream neural network for anomaly detection in surveillance Big Video Data. Future Gener. Comput. Syst. 2022, 129, 286–297. [Google Scholar] [CrossRef]
  18. Ullah, W.; Ullah, A.; Haq, I.U.; Muhammad, K.; Sajjad, M.; Baik, S.W. CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimed. Tools Appl. 2021, 80, 16979–16995. [Google Scholar] [CrossRef]
  19. Yar, H.; Hussain, T.; Khan, Z.A.; Koundal, D.; Lee, M.Y.; Baik, S.W. Vision sensor-based real-time fire detection in resource-constrained IoT environments. Comput. Intell. Neurosci. 2021, 2021, 5195508. [Google Scholar] [CrossRef] [PubMed]
  20. Patrício, D.I.; Rieder, R. Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review. Comput. Electron. Agric. 2018, 153, 69–81. [Google Scholar] [CrossRef]
  21. Barbedo, J.G.A. A review on the main challenges in automatic plant disease identification based on visible range images. Biosyst. Eng. 2016, 144, 52–60. [Google Scholar] [CrossRef]
  22. Barbedo, J.G. Factors influencing the use of deep learning for plant disease recognition. Biosyst. Eng. 2018, 172, 84–91. [Google Scholar] [CrossRef]
  23. Barbedo, J.G.A. Automatic image-based detection and recognition of plant diseases—A critical view. In Proceedings of the XI Congresso Brasileiro de Agroinformática, Sao Paulo, Brazil, 2–6 October 2017. [Google Scholar]
  24. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060. [Google Scholar]
  25. Barbedo, J.G.A.; Koenigkan, L.V.; Halfeld-Vieira, B.A.; Costa, R.V.; Nechet, K.L.; Godoy, C.V.; Junior, M.L.; Patricio, F.R.A.; Talamini, V.; Chitarra, L.G. Annotated plant pathology databases for image-based detection and recognition of diseases. IEEE Lat. Am. Trans. 2018, 16, 1749–1757. [Google Scholar] [CrossRef]
  26. Johannes, A.; Picon, A.; Alvarez-Gila, A.; Echazarra, J.; Rodriguez-Vaamonde, S.; Navajas, A.D.; Ortiz-Barredo, A. Automatic plant disease diagnosis using mobile capture devices, applied on a wheat use case. Comput. Electron. Agric. 2017, 138, 200–209. [Google Scholar] [CrossRef]
  27. Barbedo, J.G.A. Plant disease identification from individual lesions and spots using deep learning. Biosyst. Eng. 2019, 180, 96–107. [Google Scholar] [CrossRef]
  28. Ngugi, L.C.; Abelwahab, M.; Abo-Zahhad, M. Recent advances in image processing techniques for automated leaf pest and disease recognition—A review. Inf. Process. Agric. 2021, 8, 27–51. [Google Scholar] [CrossRef]
  29. Wójtowicz, A.; Piekarczyk, J.; Czernecki, B.; Ratajkiewicz, H. A random forest model for the classification of wheat and rye leaf rust symptoms based on pure spectra at leaf scale. J. Photochem. Photobiol. B Biol. 2021, 223, 112278. [Google Scholar] [CrossRef] [PubMed]
  30. Bao, W.; Zhao, J.; Hu, G.; Zhang, D.; Huang, L.; Liang, D. Identification of wheat leaf diseases and their severity based on elliptical-maximum margin criterion metric learning. Sustain. Comput. Inform. Syst. 2021, 30, 100526. [Google Scholar] [CrossRef]
  31. Paul, A.; Ghosh, S.; Das, A.K.; Goswami, S.; Choudhury, S.D.; Sen, S. A review on agricultural advancement based on computer vision and machine learning. In Emerging Technology in Modelling and Graphics; Springer: Cham, Switzerland, 2020; pp. 567–581. [Google Scholar]
  32. Kumar, M.; Hazra, T.; Tripathy, S.S. Wheat leaf disease detection using image processing. Int. J. Latest Technol. Eng. Manag. Appl. Sci. (IJLTEMAS) 2017, 6, 73–76. [Google Scholar]
  33. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine learning in agriculture: A review. Sensors 2018, 18, 2674. [Google Scholar] [CrossRef] [PubMed]
  34. Dixit, A.; Nema, S. Wheat Leaf Disease Detection Using Machine Learning Method—A Review. Int. J. Comput. Sci. Mob. Comput. 2018, 7, 124–129. [Google Scholar]
  35. Xu, P.; Wu, G.; Guo, Y.; Yang, H.; Zhang, R. Automatic wheat leaf rust detection and grading diagnosis via embedded image processing system. Procedia Comput. Sci. 2017, 107, 836–841. [Google Scholar] [CrossRef]
  36. Islam, M.; Dinh, A.; Wahid, K.; Bhowmik, P. Detection of potato diseases using image segmentation and multiclass support vector machine. In Proceedings of the IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), Windsor, ON, Canada, 30 April–3 May 2017; pp. 1–4. [Google Scholar]
  37. Alehegn, E. Ethiopian maize diseases recognition and classification using support vector machine. Int. J. Comput. Vis. Robot. 2019, 9, 90–109. [Google Scholar] [CrossRef]
  38. Hossain, S.; Mou, R.M.; Hasan, M.M.; Chakraborty, S.; Razzak, M.A. Recognition and detection of tea leaf’s diseases using support vector machine. In Proceedings of the IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), Penang, Malaysia, 9–10 March 2018; pp. 150–154. [Google Scholar]
  39. Ullah, W.; Muhammad, K.; Ul Haq, I.; Ullah, A.; Ullah Khattak, S.; Sajjad, M. Splicing sites prediction of human genome using machine learning techniques. Multimed. Tools Appl. 2021, 80, 30439–30460. [Google Scholar] [CrossRef]
  40. Ahmad, F.; Ikram, S.; Ahmad, J.; Ullah, W.; Hassan, F.; Khattak, S.U.; Rehman, I.U. GASPIDs Versus Non-GASPIDs-Differentiation Based on Machine Learning Approach. Curr. Bioinform. 2020, 15, 1056–1064. [Google Scholar] [CrossRef]
  41. Aurangzeb, K.; Akmal, F.; Khan, M.A.; Sharif, M.; Javed, M.Y. Advanced machine learning algorithm based system for crops leaf diseases recognition. In Proceedings of the IEEE 6th Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 4–5 March 2020; pp. 146–151. [Google Scholar]
  42. Treboux, J.; Genoud, D. Improved machine learning methodology for high precision agriculture. In Proceedings of the IEEE Global Internet of Things Summit (GIoTS), Bilbao, Spain, 4–7 June 2018; pp. 1–6. [Google Scholar]
  43. Rumpf, T.; Mahlein, A.-K.; Steiner, U.; Oerke, E.-C.; Dehne, H.-W.; Plümer, L. Early detection and classification of plant diseases with support vector machines based on hyperspectral reflectance. Comput. Electron. Agric. 2010, 74, 91–99. [Google Scholar] [CrossRef]
  44. Ramesh, S.; Hebbar, R.; Niveditha, M.; Pooja, R.; Shashank, N.; Vinod, P. Plant disease detection using machine learning. In Proceedings of the IEEE International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C), Bangalore, India, 25–28 April 2018; pp. 41–45. [Google Scholar]
  45. Phadikar, S.; Sil, J.; Das, A.K. Classification of rice leaf diseases based on morphological changes. Int. J. Inf. Electron. Eng. 2012, 2, 460–463. [Google Scholar]
  46. Prajapati, H.B.; Shah, J.P.; Dabhi, V.K. Detection and classification of rice plant diseases. Intell. Decis. Technol. 2017, 11, 357–373. [Google Scholar] [CrossRef]
  47. Ahmed, K.; Shahidi, T.R.; Alam, S.M.I.; Momen, S. Rice leaf disease detection using machine learning techniques. In Proceedings of the IEEE International Conference on Sustainable Technologies for Industry 4.0 (STI), Dhaka, Bangladesh, 24–25 December 2019; pp. 1–5. [Google Scholar]
  48. Panigrahi, K.P.; Das, H.; Sahoo, A.K.; Moharana, S.C. Maize leaf disease detection and classification using machine learning algorithms. In Progress in Computing, Analytics and Networking; Springer: Cham, Switzerland, 2020; pp. 659–669. [Google Scholar]
  49. Waghmare, H.; Kokare, R.; Dandawate, Y. Detection and classification of diseases of grape plant using opposite colour local binary pattern feature and machine learning for automated decision support system. In Proceedings of the IEEE 3rd International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 11–12 February 2016; pp. 513–518. [Google Scholar]
  50. Zhao, J.; Fang, Y.; Chu, G.; Yan, H.; Hu, L.; Huang, L. Identification of leaf-scale wheat powdery mildew (Blumeria graminis f. sp. Tritici) combining hyperspectral imaging and an SVM classifier. Plants 2020, 9, 936. [Google Scholar] [CrossRef] [PubMed]
  51. Li, G.; Ma, Z.; Wang, H. Image recognition of wheat stripe rust and wheat leaf rust based on support vector machine. J. China Agric. Univ. 2012, 17, 72–79. [Google Scholar]
  52. Azadbakht, M.; Ashourloo, D.; Aghighi, H.; Radiom, S.; Alimohammadi, A. Wheat leaf rust detection at canopy scale under different LAI levels using machine learning techniques. Comput. Electron. Agric. 2019, 156, 119–128. [Google Scholar] [CrossRef]
  53. Tursunov, A.; Choeh, J.Y.; Kwon, S. Age and gender recognition using a convolutional neural network with a specially designed multi-attention module through speech spectrograms. Sensors 2021, 21, 5892. [Google Scholar] [CrossRef] [PubMed]
  54. Mustaqeem; Ishaq, M.; Kwon, S. A CNN-Assisted deep echo state network using multiple Time-Scale dynamic learning reservoirs for generating Short-Term solar energy forecasting. Sustain. Energy Technol. Assess. 2022, 52, 102275. [Google Scholar] [CrossRef]
  55. Maji, B.; Swain, M.; Mustaqeem. Advanced Fusion-Based Speech Emotion Recognition System Using a Dual-Attention Mechanism with Conv-Caps and Bi-GRU Features. Electronics 2022, 11, 1328. [Google Scholar] [CrossRef]
Figure 1. Framework overview. For better understanding, the whole framework is divided into three stages: (1) data preprocessing and feature extraction, in which images collected from wheat fields form a dataset of three classes (healthy, rusted, and yellow-rusted leaves), the data are preprocessed by resizing and segmentation, and features are extracted from the preprocessed images with different feature descriptors; (2) model training, in which 80% of the dataset is loaded for training and six different ML models are trained; (3) testing, in which the trained models are evaluated on unseen data: the remaining 20% of the data pass through the same preprocessing and feature-extraction steps as in the training phase, and the performance of each model is observed.
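As a rough illustration of the training and testing procedure sketched in Figure 1, the following Python snippet assumes the descriptor features have already been extracted into a feature matrix X with class labels y. The 80/20 split, the six candidate models, and the random-forest settings shown here are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal sketch of the Figure 1 pipeline, assuming features are already extracted.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def train_and_evaluate(X: np.ndarray, y: np.ndarray) -> dict:
    # 80% of the data for training, the remaining 20% held out for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    models = {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": SVC(),
        "NB": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "DT": DecisionTreeClassifier(),
        "RFC": RandomForestClassifier(n_estimators=300),  # illustrative "fine-tuned" setting
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```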
Figure 2. Image segmentation: the first column shows the input images, and the second column shows the segmented version of each corresponding input image.
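The paper describes mask-based segmentation; one plausible way to reproduce the effect shown in Figure 2 is HSV thresholding with OpenCV, sketched below. The colour range used here is a hypothetical placeholder and would need tuning to the actual field images.

```python
# Illustrative mask-based leaf segmentation (cf. Figure 2), assuming HSV thresholding.
import cv2
import numpy as np

def segment_leaf(image_path: str, size=(256, 256)) -> np.ndarray:
    img = cv2.imread(image_path)                      # BGR image from disk
    img = cv2.resize(img, size)                       # resizing step
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    # Hypothetical green/yellow range covering healthy and rusted leaf tissue.
    lower = np.array([10, 40, 40], dtype=np.uint8)
    upper = np.array([90, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    return cv2.bitwise_and(img, img, mask=mask)       # background suppressed
```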
Figure 3. Sample images. The first, second, and third rows show healthy, rusted, and yellow-rusted wheat leaves, respectively.
Figure 4. Accuracy of all six trained models: the proposed framework has the highest accuracy, followed by DT, while LR has the lowest; all other models fall between these two.
Figure 5. Performance evaluation based on precision, recall, and F1-score: the proposed framework, followed by DT, achieves the highest values, whereas LR has the lowest precision, recall, and F1-score.
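The metrics plotted in Figures 4 and 5 can be computed from the held-out predictions as sketched below; macro averaging over the three classes is assumed here, since the averaging mode is not stated in the text.

```python
# Sketch of the evaluation in Figures 4 and 5: accuracy plus macro precision/recall/F1.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred) -> dict:
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```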
Table 1. Summary of the literature; the results of the proposed framework are given in the last row for comparison.
| Article | Crop | Preprocessing | Features | Algorithms/Models | Accuracy |
|---|---|---|---|---|---|
| Xu et al., 2017 [35] | Wheat | Conversion of images to a single gray channel (G of RGB), background removal | Binary feature point set | Flood-filling algorithm | 92.3% |
| Islam et al., 2017 [36] | Potato | Color-based segmentation | Statistical features | SVM | 95% |
| Alehegn et al., 2019 [37] | Maize | Segmentation | Texture and morphological | SVM | 95.63% |
| Hossain et al., 2018 [38] | Tea | Image resizing and cropping | Statistical features | SVM | 93% |
| Aurangzeb et al., 2020 [41] | Corn and potato | Image resizing | LTP, HOG, SFTA | MSVM | 92.8% and 98.7% |
| Treboux et al., 2018 [42] | Vineyards | Morphological operations (opening and closing) | First-order statistics, Tamura, Haralick | DTE | 94.275% |
| Rumpf et al., 2010 [43] | Sugar beet | Image resizing, clustering | Physiological parameters | SVM | 97% |
| Ramesh et al., 2018 [44] | Papaya | Image resizing and normalization | HOG | RFC | 70% |
| Phadikar et al., 2012 [45] | Rice | Enhancement via mean filters and segmentation | Color descriptors | SVM, NB | 68.1% and 79.5% |
| Prajapati et al., 2017 [46] | Rice | Background removal, segmentation | Texture, color, and shape | SVM | 93.33% |
| Ahmed et al., 2019 [47] | Rice | Augmentation | Pure statistical features | DT | 97.91% |
| Panigrahi et al., 2020 [48] | Maize | Resizing, denoising, segmentation | Grayscale pixel values | NB, KNN, DT, SVM, and RFC | 79.23% (highest with RFC) |
| Waghmare et al., 2016 [49] | Grapes | Background removal | Texture | SVM | 96.6% |
| Zhao et al., 2020 [50] | Wheat | Image smoothing via S-G filter and derivative function | Disease severity level and affected leaf spots | SVM, PNN, and RFC | 93.33% |
| Li et al., 2012 [51] | Wheat | Cropping, denoising | Color and texture | SVM with RBF | 96.67% |
| Azadbakht et al., 2019 [52] | Wheat | Noise reduction | Disease severity level, leaf area index, and pixel values | V-SVR and RFR | 99% and 79% |
| Proposed framework | Wheat | Resizing, mask-based segmentation | Haralick texture, color histogram, and Hu moments | Fine-tuned RFC | 99.8% |
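The proposed framework's feature set (last row of Table 1) combines Haralick texture, a colour histogram, and Hu moments. A minimal sketch of such a descriptor is given below, assuming the mahotas library for the Haralick features and an illustrative 8-bin histogram; the exact parameters used in the paper may differ.

```python
# Sketch of the three global descriptors (Haralick texture, colour histogram, Hu moments).
import cv2
import mahotas
import numpy as np

def extract_features(bgr_image: np.ndarray, bins: int = 8) -> np.ndarray:
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)

    haralick = mahotas.features.haralick(gray).mean(axis=0)   # 13-d texture vector
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()           # 7 shape/hue moments
    hist = cv2.calcHist([hsv], [0, 1, 2], None,
                        [bins, bins, bins], [0, 180, 0, 256, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()                # 512-d colour histogram

    return np.hstack([haralick, hu, hist])                    # concatenated descriptor
```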
Table 2. Baseline Models Experimental Analysis.
| Model | LR | SVM | NB | KNN | DT | Proposed Framework |
|---|---|---|---|---|---|---|
| Accuracy (%) | 89.6 | 94.4 | 97.7 | 99.0 | 99.2 | 99.8 |
Table 3. Confusion matrices of the trained models.
The Proposed Framework
| Actual Classes \ Predicted Classes | Healthy | Rusted | Yellow Rusted |
|---|---|---|---|
| Healthy | 215 | 1 | 0 |
| Rusted | 0 | 201 | 0 |
| Yellow Rusted | 0 | 0 | 212 |

Overall Accuracy (%): 99.8

DT
| Actual Classes \ Predicted Classes | Healthy | Rusted | Yellow Rusted |
|---|---|---|---|
| Healthy | 213 | 1 | 1 |
| Rusted | 0 | 205 | 1 |
| Yellow Rusted | 1 | 1 | 207 |

Overall Accuracy (%): 99.2

LR
| Actual Classes \ Predicted Classes | Healthy | Rusted | Yellow Rusted |
|---|---|---|---|
| Healthy | 167 | 48 | 0 |
| Rusted | 2 | 204 | 0 |
| Yellow Rusted | 13 | 2 | 194 |

Overall Accuracy (%): 89.6
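For reference, confusion matrices such as those in Table 3 can be generated for any of the trained models as sketched below; the class label strings are assumptions chosen for illustration.

```python
# Sketch: per-class confusion matrix and overall accuracy for a trained model.
from sklearn.metrics import confusion_matrix, accuracy_score

LABELS = ["healthy", "rusted", "yellow_rusted"]

def report_confusion(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)  # rows = actual, cols = predicted
    overall = accuracy_score(y_true, y_pred)
    return cm, overall
```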
Table 4. Performance of the proposed framework on independent data.
| Class | Total Testing Images | Accurate Predictions |
|---|---|---|
| Healthy | 16 | 13 |
| Yellow | 18 | 12 |
| Rusted | 16 | 11 |
Table 5. Comparison with SOTA papers; our work achieved the highest accuracy using the proposed framework on the features extracted by the descriptors listed below.
| Authors | Year | Features | Classifiers | Accuracy (%) |
|---|---|---|---|---|
| Azadbakht et al. [52] | 2019 | Texture | SVR, RFR, GRR, BRT | 99 |
| Zhao et al. [50] | 2020 | Diseased area/total area of leaf | SVM, PNN, RFC | 93.33 |
| Bao et al. [30] | 2021 | Color, texture, and the combination of these two | Elliptical-Maximum Margin Criterion (E-MMC) metric learning | 94.16 |
| Proposed method | 2022 | Haralick texture, color histogram, Hu moments, LBP, HOG | Fine-tuned RFC | 99.8 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
