Article

Hand Gesture Recognition Based on Auto-Landmark Localization and Reweighted Genetic Algorithm for Healthcare Muscle Activities

1 Department of Computer Science, Air University, Islamabad 44000, Pakistan
2 Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain 15551, United Arab Emirates
3 Department of Human-Computer Interaction, Hanyang University, Ansan 15588, Korea
* Author to whom correspondence should be addressed.
Sustainability 2021, 13(5), 2961; https://doi.org/10.3390/su13052961
Submission received: 10 February 2021 / Revised: 4 March 2021 / Accepted: 5 March 2021 / Published: 9 March 2021
(This article belongs to the Special Issue Sustainable Human-Computer Interaction and Engineering)

Abstract:
Due to the constantly increasing demand for the automatic localization of landmarks in hand gesture recognition, there is a need for a more sustainable, intelligent, and reliable system for hand gesture recognition. The main purpose of this study was to develop an accurate hand gesture recognition system that is capable of error-free auto-landmark localization of any gesture detectable in an RGB image. In this paper, we propose a system based on landmark extraction from RGB images regardless of the environment. The extraction of gestures is performed via two methods, namely, the fused and directional image methods. The fused method produced higher gesture extraction accuracy. In the proposed system, hand gesture recognition (HGR) is performed via several different methods, namely, (1) HGR via point-based features, which consist of (i) distance features, (ii) angular features, and (iii) geometric features; and (2) HGR via full-hand features, which are composed of (i) SONG mesh geometry and (ii) an active model. To optimize these features, we applied gray wolf optimization. After optimization, a reweighted genetic algorithm was used for classification and gesture recognition. Experimentation was performed on five challenging datasets: Sign Word, Dexter1, Dexter + Object, STB, and NYU. The experimental results show that auto-landmark localization with the proposed feature extraction technique is an efficient approach towards developing a robust HGR system. The classification results of the reweighted genetic algorithm were compared with those of an Artificial Neural Network (ANN) and a decision tree. The developed system can play a significant role in healthcare muscle exercise.

1. Introduction

Recent developments in artificial intelligence and digital technologies have provided several effective ways to communicate in terms of human–computer interaction (HCI). When gestures are made through human body movements, the physical actions of the fingers, hands, arms, head, and face are recognized by the receiver; this methodology is termed human gesture recognition (HGR) [1,2,3,4]. HGR has wide-ranging applications, such as communication with and between deaf people, as well as interactions between young children and patients using a PC [5,6,7]. For rehabilitation purposes, healthcare centers provide hand muscle exercises in which HGR plays a vital role. According to the World Health Organization (WHO), 15 million people suffer from stroke and 50,000 people suffer from spinal cord injuries. These conditions affect individuals’ upper limb function and can also lead to long-term disabilities. Rehabilitation is an essential strategy for upper limb recovery. HGR can be used to recognize both rehabilitation gestures and daily gestures [8].
Gestures are broadly characterized as static or dynamic in natural communication [9]. A static gesture is observed at a single instant in time, whereas a dynamic gesture changes over a time frame. Static gestures correspond to specific transition phases within a dynamic gesture, each displayed as a specific action. Gestures can be inferred by vision-based and data-glove-based systems, with data collected via (i) cameras, (ii) sensors, and (iii) gloves [10]. Sensors and gloves measure the joint angles and finger positions in real time. The use of gloves and sensors adds a certain burden to the user, and the weight of cables can hinder hand movement, which affects accuracy when measuring gestures. On the other hand, one or more cameras can be used to capture images of gestures performed by an individual. The camera collects static gestures, which are used to train the machine for recognition; for this purpose, only a sufficient dataset is required [11,12,13,14].
In this paper, we propose an effective method to extract gestures from RGB images. First of all, preprocessing was performed on all the images. Then, the hand was segmented from the background by two methods, one being the fused method and the other being the directional image method. Both of these methods extracted the hand from the background successfully, but after a comparison of the two, the fused method gave better results and was used for further processing. In the second step, landmarks were extracted via color quantization. These landmarks were then used for feature extraction. We extracted different features for the accurate recognition of gestures, i.e., angular features, geometric features, and mesh geometry. These features were optimized and then classified into gestures via a genetic algorithm. The five datasets used for experimentation were the Sign Word, Dexter1, Dexter + Object, STB, and NYU datasets. The proposed system produced significantly better recognition accuracy compared with other state-of-the-art methods.
The main contributions of the paper can be summarized as follows:
  • We extracted the hand via the fused method from RGB images for gesture classification.
  • Auto-landmark localization was performed for multi-feature extraction to improve the feature selection process for daily gestures.
  • Multi-features were then optimized via the gray wolf algorithm and classified with a reweighted genetic algorithm.
  • A comprehensive evaluation was performed on five datasets with significantly better performance than other state-of-the-art methodologies.
The rest of the paper is organized as follows. In Section 2, the literature review is presented on the basis of two main categories of HGR feature extraction and recognition. Section 3 addresses the proposed HGR model, which includes angular, geometric, and mesh geometry-based features; gray wolf optimization; and the genetic algorithm as a classifier. Section 4 discusses the experimental setup and a comparison of the proposed method with other state-of-the-art methods. Finally, Section 5 presents the conclusion and future work.

2. Literature Review

2.1. HGR Through Electromyographic Signals

Human gesture recognition is applied in many research areas because the accurate classification of hand gesture electromyography (EMG) signals provides accurate gesture recognition results [15]. However, the collection of features and the labeling of large datasets consume a large amount of processing time. Su et al. [16] proposed a novel method in which they combined depth vision learning and EMG for hand gesture recognition. The system labels data without considering the sequence of hand motion via depth vision learning. The hierarchical k-means (HK-mean) algorithm is used to classify 10 hand gestures using a Myo armband. Motoche et al. [17] used superficial EMG for hand gesture recognition. They applied a sliding window approach, in which a sub-window is applied to observe signal segments through the main window. The data acquired using the Myo armband are then preprocessed for rectification and filtering. After that, features are extracted to build the feature vector. They used a feedforward neural network for classification and obtained 90.7% recognition accuracy. Sapienza et al. [18] presented a model with minimum complexity based on the average threshold crossing (ATC) technique. Four movements of the wrist (flexion, extension, abduction, and grasp) were detected after the acquisition of EMG signals. The number of signal threshold-crossing events was exploited, and the average ATC classifier produced 92.87% accuracy. Arenas et al. [19] collected data via eight Myo armband sensors with the use of a power spectral density map. For classification, they built a feature set consisting of 2880 multi-channel feature maps, which were divided into three equal sets for training, validation, and testing. Convolutional neural networks (CNNs) obtained 98% accuracy in validation and 99% in testing. Benalcazar et al. [20] identified the labels of hand movements in real time. Their model collected hand movements from a Myo armband, and they used a window-based approach to build feature vectors. For classification, they used k-nearest neighbor and a dynamic time warping algorithm, which achieved 89.5% accuracy. Qi et al. [21] reduced the redundancy of EMG signals and enhanced real-time gesture recognition. They used principal component analysis and a General Regression Neural Network (GRNN) to construct a gesture recognition system. The authors collected nine static gestures using an electromyographic instrument and extracted four kinds of signals. After dimension reduction, accuracy reached 95.1%.

2.2. HGR through Smartphone

The pioneering works on hand gesture recognition through smartphones explored different sensing technologies and feature extraction methods for the improvement of recognition accuracy [22]. Wang et al. [23] used a smartphone as an active sonar sensing system for hand movement recognition. An ultrasonic signal is emitted by the speaker, and the phone’s microphone receives an echo that is altered by hand movements. The gesture is identified from the recorded signals. Haseeb et al. [24] introduced a novel machine learning solution for hand gesture recognition. They relied on standard Wi-Fi signals, thresholding filters, and a recurrent neural network (RNN); for recognition, the smartphone does not require any change in either the hardware or the operating system. The experiments included changes in scenarios, as well as in the network traffic between the smartphone and Wi-Fi access points. They classified three gestures with 93% accuracy. Zhang et al. [25] used binary motion gestures on a smartphone with an accelerometer. They used only two simple gestures, which were expressed as “0” and “1”. They first evaluated four kinds of candidate binary gestures and then split the accelerometer signal sequence into multiple separate gesture signal segments using a signal cutting and merging algorithm. The segments were then classified using five algorithms, namely, dynamic time warping (DTW), naïve Bayes, decision tree, support vector machine (SVM), and bidirectional long short-term memory (BLSTM) networks. Panella et al. [26] addressed the issue of gesture segmentation and recognition using a smartphone device. They designed an application that uses low-cost and widespread technologies. They designed a new machine learning algorithm that identifies hand gestures using Hu image moments, which are invariant to rotation, translation, and scaling, all with low computational cost.

2.3. HGR Through Camera

A substantial amount of work has been done on the recognition of static gestures using cameras. For static hand gesture recognition, features are extracted via different methods [27,28,29,30,31]. Features can be extracted using the full hand or by using only the fingers of the hand. This section divides the literature review into two subsections: (i) Section 2.3.1 and (ii) Section 2.3.2.

2.3.1. HGR via Full-Hand Features

HGR of static gestures is a challenging task, as the extraction of features from the full hand is a composite process and requires a lot of machine training for recognition. Many researchers have presented different methods for gesture recognition using the full hand. Oprisescu et al. [32] proposed a method that extracted the contour of the hand and then calculated convexity and finger positioning from the centroid for gestures. Gesture classification is done via a decision tree on nine different gestures with 93.3% mean accuracy. Yun et al. [33] detected the hand via skin color and angle, combined with Hu invariant moments. For classification, they used a Euclidean distance template-matching technique. Ghosh et al. [34] designed a system in which they segmented the hand during preprocessing. A localized contour sequence (LCS) and block-based features are extracted for a better representation of the hand. These features are combined, and an SVM classifier is used for the recognition of static hand gestures. Candrasari et al. [35] extracted the hand via YCbCr values. They extracted features via the discrete wavelet transform (DWT), and those features were passed through a hidden Markov model (HMM) and k-nearest neighbor (KNN) for classification. Rosalina et al. [36] extracted the hand via contour representation using a glove worn by the user. An ANN was applied to the American Sign Language (ASL) alphabet and the digits 0–9 for classification. The accuracy rate of gesture recognition was 90%. Lin [37] segmented the hand via a color model, and hand poses were obtained for training purposes. The recognition accuracy was 95.96% for seven hand gestures. Pansare et al. [38] proposed a system divided into stages: preprocessing, hand extraction using the Sobel edge detection method, and computation of a feature vector via the Euclidean distance between contours. The Euclidean distance is then compared with the ground truth for gesture recognition. Xu et al. [39] proposed a novel hand gesture recognition method in which the hand is extracted via skin-color features and the arm is removed using distance transformation. Hu moments of the gestures are calculated, and SVM is used for classification. This approach produced 95.83% accuracy with eight gestures. Lee et al. [40] introduced a method to extract the hand via wristband-based contour features. A simple feature-matching method was proposed to obtain recognition results. Liu et al. [41] proposed a feature-boosting network for estimating 3D hand pose. They used convolutional layers for feature learning; these convolutional layers were boosted with a new long short-term dependence-aware (LSTD) module that perceived the dependency between different hand parts. To improve the reliability of the feature representation of each hand part, the researchers also added a context consistency gate (CCG). They tested their system on benchmark datasets against other state-of-the-art methods.

2.3.2. HGR via Landmarks Features

Many approaches have been proposed to localize hand landmarks as a feature extraction technique for gesture recognition. The majority of existing methods include fingertip detection, which has been successfully applied by researchers. Puttapirat et al. [42] proposed a system that extracted important landmarks of the hand in the image. They identified the locations of those landmarks, and the landmarks were matched with the corresponding landmarks in a 3D model to estimate the hand posture. Ma et al. [43] designed a method that extracted the region of interest (ROI) by the local neighbor method. They used a convex hull detection algorithm for the identification of fingertips. Al Marouf et al. [44] developed a novel approach to determine the fingertips and the center of the palm. Fingertip detection is performed via an adaptive hill-climbing algorithm applied to distance graphs, and finger identification is performed via the relative distances between fingers and valley points. Mahdikhanlou et al. [45] described a novel multimodal framework that computed two sets of features: the first set consists of angles from the hand joints, and the second set is derived from hand contours. These features are then classified using random forest. Grzejszczak [46] proposed a method for the localization of landmarks in RGB images, analyzing a skin-masked directional image using hand transform and template matching. Landmarks are detected both on the contour and inside the hand mask, and recognition is evaluated by computing the localization error of the landmarks. Kerdvibulvech [47] developed a tracking system for fingertips, achieving detection by matching a semicircular template to the detected skin region and using Bayesian classifiers for classification. Nguyen et al. [48] developed a system that segments the hand, separated from the arm, using color information. Then, features were extracted, namely, the ratio of width to height, the wrist angle, and the number of fingers; the calculations are based on fingertips and cross-sections. SVM was applied for classification, and they achieved 89.5% accuracy.

3. Materials and Methods

The proposed system is comprehensively discussed in this section. The system is divided into various phases. The HGR system starts with the preprocessing phase, where the hand gesture from each RGB image is segmented from the background using a morphological operation. A fused method is used for hand detection. Next is the feature extraction phase, where geodesic distance, landmarks, geometric features, and spatial features are extracted from processed RGB images. Then, the optimization phase results in a representation of features in the vectorized form via a gray wolf optimization algorithm. Finally, in the classification phase, each gesture is classified via a reweighted genetic algorithm. The overall architecture of the HGR system is shown in Figure 1.

3.1. Preprocessing

RGB images are prone to noise, which makes the extraction of a region of interest from the background a challenging task. We can extract the ROI through preprocessing, in which noise is first removed from the image. Then, a sharpening and enhancement technique is used to increase the intensity and brightness of the image. The image is then converted into binary form for further processing in the designed HGR system. In this phase, connected-component analysis is applied to select the largest component in the image. Then, morphological operations, namely, dilation and erosion, are used to extract the desired region of interest [49].
$$X \oplus Y = \{\, z \mid (\hat{Y})_z \cap X \neq \varnothing \,\}$$
$$X \ominus Y = \{\, z \mid (Y)_z \subseteq X \,\}$$
where Y is the structuring element and z is the location of the set of pixels. During the translation by z, the reflection $\hat{Y}$ of Y joins with the pixels of the foreground element X. In this phase, the shape of the object is maintained, and the gesture mask is extracted. Images from all five datasets are passed through this phase, which enhances them at the pixel level for further processing. Preprocessing results are shown in Figure 2.
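As a rough illustration of this phase, the sketch below chains denoising, sharpening, binarization, largest-connected-component selection, and dilation/erosion. It is a minimal Python/OpenCV sketch (the paper's own experiments were run in MATLAB); kernel sizes and the sharpening weights are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical preprocessing sketch for the phase described above.
import cv2
import numpy as np

def preprocess(rgb_image):
    # Denoise, then sharpen by subtracting part of the blurred image.
    blurred = cv2.GaussianBlur(rgb_image, (5, 5), 0)
    sharpened = cv2.addWeighted(rgb_image, 1.5, blurred, -0.5, 0)

    # Binarize with Otsu thresholding on the grayscale image.
    gray = cv2.cvtColor(sharpened, cv2.COLOR_RGB2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Keep only the largest connected component (assumed to contain the hand).
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # label 0 is the background
    mask = np.uint8(labels == largest) * 255

    # Morphological dilation and erosion (the equations above) to clean the gesture mask.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.dilate(mask, kernel)
    mask = cv2.erode(mask, kernel)
    return mask
```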

3.2. Hand Detection

Region of interest (ROI) extraction is the first step in any HGR system [50]. Thus, the ROI, comprising either single- or both-hand gestures in all RGB images, is first extracted from the background using two methods. The two methods implemented to segment gesture silhouettes are described separately in the following subsections.

3.2.1. The Fused Method

RGB silhouette extraction for all five datasets is carried out through the fused method for hand detection. This method combines two detection techniques. First, the image is treated as a two-dimensional array, where the number of columns is defined by the image width and the number of rows by its height. The RGB image is then divided into planes and converted into YCbCr space, where the color of each pixel is stored in 32 bits. For channel extraction, the 32-bit pixel value is right-shifted by 24 bits to obtain the alpha value. The alpha channel is used to check the opacity of the image; if the pixel has a 0% value, it is fully transparent, whereas if it has a 100% value, it is fully opaque. For the red and green channels, right shifts of 16 bits and 8 bits are performed, respectively. The remaining pixel values correspond to the blue channel.
On these calculated values, a bitwise logical AND operation with 0xff is applied to extract the desired color. These operations are applied to all image pixels [51]. To obtain more accurate and precise recognition, we converted the IRGB image into YCbCr color space as in the equation given below:
$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} + \frac{1}{256} \begin{bmatrix} 65.738 & 129.057 & 25.06 \\ -37.945 & -74.494 & 112.43 \\ 112.439 & -94.154 & -18.28 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
where Y is the luminance. To overcome the interference of highlights, Y ∈ (0, 80) is set. Then, using an elliptical equation, human skin color is located via the Cb and Cr values. The equation is shown below:
$$\frac{(i - 1.6)^2}{26.39^2} + \frac{(j - 2.41)^2}{14.03^2} < 1, \qquad \begin{bmatrix} i \\ j \end{bmatrix} = \begin{bmatrix} \cos 2.53 & \sin 2.53 \\ -\sin 2.53 & \cos 2.53 \end{bmatrix} \begin{bmatrix} C_b - 109.38 \\ C_r - 152.02 \end{bmatrix}$$
where i and j are the intermediate values. Each pixel value of IRGB and YCbCr is compared with the standard skin pixel, and a decision on whether each pixel is skin or not is made based on a predefined threshold range for each parameter.
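The sketch below shows one way the color-space conversion and the elliptical skin test above could be applied per pixel. It is a NumPy illustration under the assumption that the conversion and ellipse constants are exactly those in the equations; the function name and the returned boolean mask are our own conventions.

```python
# Hypothetical YCbCr skin-mask sketch built from the equations above.
import numpy as np

def skin_mask(rgb):
    rgb = rgb.astype(np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> YCbCr conversion (matrix equation above).
    Y  =  16 + ( 65.738 * R + 129.057 * G +  25.06  * B) / 256
    Cb = 128 + (-37.945 * R -  74.494 * G + 112.43  * B) / 256
    Cr = 128 + (112.439 * R -  94.154 * G -  18.28  * B) / 256

    # Rotate (Cb, Cr) into the ellipse coordinate frame (intermediate values i, j).
    theta = 2.53
    i =  np.cos(theta) * (Cb - 109.38) + np.sin(theta) * (Cr - 152.02)
    j = -np.sin(theta) * (Cb - 109.38) + np.cos(theta) * (Cr - 152.02)

    # Elliptical skin test, restricted to the luminance range Y in (0, 80) set above.
    inside = ((i - 1.6) ** 2 / 26.39 ** 2 + (j - 2.41) ** 2 / 14.03 ** 2) < 1
    return inside & (Y > 0) & (Y < 80)
```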
Second, a contrast-based method is applied to compute a saliency map. In a saliency map, the dominant part of the gesture is based on saliency values, making the segmentation of gestures faster and more accurate. The algorithm designed for computing the saliency map has three aspects: (1) contrast depends on the color and the area of the two partitions in the image; (2) partitions have a greater impact on each other’s saliency if the distance between them is smaller; (3) the salient object tends to lie close to the center of the image [52]. Saliency map computation is performed by segmenting the input image into super pixels. Then, a sparse color histogram of the super pixels is constructed, and the number of colors in each channel is reduced to simplify the calculations. Each histogram is converted into Lab space, and the color and distance differences between pixels are then calculated.
$$Dis_{pixel}(p_i, p_j) = \sum_{x=1}^{k_1} \sum_{y=1}^{k_2} f(d_{i,x})\, f(d_{j,y})\, Dis(d_{i,x}, d_{j,y})$$
where $Dis(d_{i,x}, d_{j,y})$ is the distance between colors x and y in super pixels $p_i$ and $p_j$, and $k_1$ and $k_2$ represent the color numbers of the super pixels $p_i$ and $p_j$, respectively.
Because closer partitions have a greater impact on each other’s saliency, the distance between $p_i$ and $p_j$ is computed as
$$Dis_d(p_i, p_j) = | m_i - m_j | + | n_i - n_j |$$
where $Dis_d(p_i, p_j)$ is the distance between regions, and m and n represent the X and Y coordinate values of region $p_i$, respectively.
$$SuperP(p_i) = \sum_{t \neq i} n_{p_t}\, Dis_i(p_i, p_t)\, \delta\!\left( Dis_d(p_i, p_t) \right)$$
where $n_{p_t}$ is the total number of pixels in super pixel $p_t$. A greater value represents a greater impact on each super pixel. The original image is then segmented using graph-based segmentation to obtain larger partitions, and the contour of the salient object is generated from the saliency map [53]. Then, the gray values of the saliency map are merged within the contour. The resultant saliency map is overlaid on top of the original image, as shown in Figure 3.
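A greatly simplified sketch of the region-contrast idea above is given below. It is an assumption-laden toy: square image blocks stand in for the super pixels, mean Lab color stands in for the sparse histogram, and a simple exponential spatial weight plays the role of the distance term; the block size and weighting are not taken from the paper.

```python
# Toy region-contrast saliency sketch (illustration only, not the paper's algorithm).
import cv2
import numpy as np

def block_saliency(rgb, block=16):
    lab = cv2.cvtColor(rgb, cv2.COLOR_RGB2LAB).astype(np.float64)
    h, w = lab.shape[:2]
    gh, gw = h // block, w // block
    colors, centers = [], []
    for r in range(gh):
        for c in range(gw):
            patch = lab[r * block:(r + 1) * block, c * block:(c + 1) * block]
            colors.append(patch.reshape(-1, 3).mean(axis=0))   # mean Lab color of the block
            centers.append(((r + 0.5) * block, (c + 0.5) * block))
    colors, centers = np.array(colors), np.array(centers)

    # Saliency grows with color contrast to other regions and shrinks with spatial distance.
    color_d = np.linalg.norm(colors[:, None] - colors[None], axis=-1)
    spatial_d = np.abs(centers[:, None] - centers[None]).sum(axis=-1)
    weight = np.exp(-spatial_d / (h + w))
    sal = (color_d * weight).sum(axis=1)

    sal_map = sal.reshape(gh, gw)
    sal_map = (sal_map - sal_map.min()) / (np.ptp(sal_map) + 1e-9)   # normalize to [0, 1]
    return cv2.resize(sal_map, (w, h))
```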

3.2.2. Directional Images

In the second hand-detection method, the outer and inner edges of the hand region are detected via a new approach. The ROI is obtained by specifying a threshold value T, which compares foreground and background pixel values; as a result, a binary image is generated. For hand detection, a 3 × 3 gradient vector matrix G(x, y) is computed. The gradient vector matrix is computed for every pixel of image I and is represented as follows:
$$G(x, y) = \begin{bmatrix} I(x,y) - I(x, y-1), & I(x,y) - I(x, y+1) \\ I(x,y) - I(x-1, y), & I(x,y) - I(x+1, y) \\ I(x,y) - I(x-1, y-1), & I(x,y) - I(x+1, y+1) \\ I(x,y) - I(x-1, y+1), & I(x,y) - I(x+1, y-1) \end{bmatrix}$$
Every second pixel in the second row of the matrix is compared with the distances of the adjacent pixels in the 3 × 3 window resulting from the gradient vector matrix. The negative distance values of every pixel, calculated after subtraction, are converted into positive values:
$$d_l = d_l \times (-1), \qquad \text{for } d_l < 0$$
The distances that lie vertically, horizontally, and diagonally in the gradient vector are compared with a constant threshold. A distance greater than the threshold is set to the white pixel value of 1, and a distance less than the threshold is set to the black pixel value of 0; as a result, a binary image of the outer and inner boundaries of the hand is formed [54]. The resulting directional image is shown in Figure 4.
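The sketch below illustrates the neighbour-difference-and-threshold rule just described. It assumes a grayscale input and an illustrative threshold value; the function name and constants are ours, not the paper's.

```python
# Hypothetical directional-image boundary detector based on the thresholded
# neighbour differences of the gradient vector matrix above.
import numpy as np

def directional_image(gray, threshold=30):
    I = gray.astype(np.int32)
    edges = np.zeros_like(I, dtype=np.uint8)
    offsets = [(0, -1), (0, 1), (-1, 0), (1, 0),
               (-1, -1), (1, 1), (-1, 1), (1, -1)]
    for y in range(1, I.shape[0] - 1):
        for x in range(1, I.shape[1] - 1):
            # Absolute differences to the eight neighbours (negatives made positive).
            diffs = [abs(I[y, x] - I[y + dy, x + dx]) for dy, dx in offsets]
            # Any difference above the threshold marks a boundary (white) pixel;
            # otherwise the pixel stays black.
            edges[y, x] = 255 if max(diffs) > threshold else 0
    return edges
```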
Both methods were tested on the Sign Word, Dexter1, Dexter + Object, STB, and NYU datasets. The fused method gave more promising hand detection results than the directional image method. The ground truth of the gestures is first computed in order to evaluate the accuracy of the resultant hand detection images for both the fused and directional image methods. Then, the distances between contour pixel index values are compared via geodesic distance for both methods. Table 1 shows a comparison of detection accuracy for the Sign Word dataset. It clearly shows that the fused method produced more accurate results; thus, the fused method was selected for further processing in the system architecture.

3.3. Landmark Detection

The segmented hand is then used for landmark detection. Many approaches have been proposed to localize hand landmarks, which help in feature extraction for distinguishing and determining specific gestures [54,55,56,57,58]. The majority of these techniques are quite simple and limit the exact localization of landmarks. In our proposed method, landmark detection is performed using two different methods on different segmented images for more exact localization of landmarks.

Geodesic Distance

In this method, gestures performed by hands are represented via geodesic wave maps. These maps are generated by calculating the geodesic distance via a fast-marching algorithm. First of all, the center point of the human hand silhouette is located, and its distance value is set to d(h) = 0. Point h is the starting point, which is marked as a visited point. All the other pixel points p are unvisited and are given a distance value d(p) = ∞ on the hand silhouette. The neighbor of each pixel p is represented as n, and the distance of p is measured from n. Every neighboring pixel is taken in each iteration until all pixel points are marked “visited” [55,56,57,58,59]. The distance calculated in each iteration is compared with the distance from previous iterations, and priority is given to the shortest calculated distance. The updated distance is defined as
$$d = \begin{cases} \dfrac{d_x + d_y + \sqrt{\Delta}}{2}, & \text{when } \Delta \geq 0 \\[4pt] \min(d_x, d_y) + w, & \text{otherwise} \end{cases}$$
$$\Delta = 2w^2 - (d_x - d_y)^2$$
where $d_x$ and $d_y$ are the distances in the x- and y-coordinates, respectively, with $d_x = \min(D_{i+1,m}, D_{i-1,m})$ and $d_y = \min(D_{i,n+1}, D_{i,n-1})$. Figure 5 demonstrates the wave propagation of geodesic distance via the fast-marching algorithm (FMA).
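To make the wave-propagation idea concrete, the sketch below computes a geodesic distance map over the hand mask. It is an assumption: a Dijkstra-style propagation with unit and diagonal step costs is used here as a simpler stand-in for the fast-marching update above; the seed point and mask format are illustrative.

```python
# Geodesic wave-map sketch over a binary hand silhouette (Dijkstra-style stand-in for FMA).
import heapq
import numpy as np

def geodesic_distance(mask, seed):
    """mask: binary hand silhouette; seed: (row, col) of the hand centre h."""
    dist = np.full(mask.shape, np.inf)
    dist[seed] = 0.0                      # d(h) = 0, all other pixels start at infinity
    heap = [(0.0, seed)]
    steps = [(-1, 0, 1.0), (1, 0, 1.0), (0, -1, 1.0), (0, 1, 1.0),
             (-1, -1, 2 ** 0.5), (-1, 1, 2 ** 0.5), (1, -1, 2 ** 0.5), (1, 1, 2 ** 0.5)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                      # already settled with a shorter distance
        for dr, dc, w in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1] and mask[nr, nc]:
                nd = d + w
                if nd < dist[nr, nc]:     # keep the shortest distance seen so far
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist
```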
Landmark detection is performed after obtaining the wave propagation of geodesic distance via the fast-marching algorithm (FMA) on the images. The color values of pixels p are computed on the outer boundary b of the hand silhouettes. Pixels having the same color value c are counted first, and then the mean is computed; the landmark l is drawn at the mean pixel position. For the inner landmark, the color value of neon green is taken, and the distance between points is set. The fingertips can be calculated as
$$l = \frac{c(p_x, p_y)}{2}$$
where $p_x$ and $p_y$ belong to the same color in the outer boundary, and $c(p_x, p_y)$ is the total number of pixels of that color located in the outer boundary. Landmarks drawn on the hand silhouettes are shown in Figure 6 below:

3.4. Feature Extraction via Point-Based Method

This section provides a detailed description of feature extraction via landmarks. Features are extracted from the landmarks using a point-based feature extraction method for hand gesture representation, training, and recognition.

3.4.1. Distance Features

Feature extraction for hand gestures is achieved via the point-based method, which includes points on the thumb, index finger, middle finger, ring finger, and little finger (see Figure 7). All the points are combined in various ways to produce a variety of features that are extracted for training and recognition purposes. These features are distance features, geometric features, and angle-based features. The distance feature d measures the distance between the ixy extreme landmark on the fingertip and the cxy inner landmark, using the geodesic distance of the hand, and is formulated as
$$d = \sqrt{ (x_{i2} - x_{c1})^2 + (y_{i2} - y_{c1})^2 }$$
where d represents the distance between the two points; $x_{i2}$ and $x_{c1}$ are the x-coordinates of the extreme landmark and the inner landmark of the hand, respectively [60,61], while $y_{i2}$ and $y_{c1}$ are the y-coordinates of the same landmarks.
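A minimal sketch of this distance feature is given below; the landmark coordinates and function name are hypothetical placeholders.

```python
# Distance-feature sketch: Euclidean distance between a fingertip landmark and
# the inner landmark (equation above).
import math

def distance_feature(fingertip, inner):
    (xi, yi), (xc, yc) = fingertip, inner
    return math.hypot(xi - xc, yi - yc)

# Example (hypothetical landmark positions): distance_feature((120, 45), (96, 130))
```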

3.4.2. Angular Features

The angular features are extracted through the cosines of the angles (i.e., α, β, γ) measured on the geodesic distance angle of two extreme points [62,63,64]. Three points, namely the adjacent point, the side point, and the centroid, form a triangle, as shown in Figure 8b.
Similarly, we have vertices A, B, and C, and a, b, and c are the sides of the triangle, as shown in Figure 8a, with $a = \overline{BC}$, $b = \overline{AC}$, and $c = \overline{AB}$, respectively [65].
$$\alpha = \cos^{-1}\!\left( \frac{b^2 + c^2 - a^2}{2bc} \right), \qquad \beta = \cos^{-1}\!\left( \frac{a^2 + c^2 - b^2}{2ac} \right), \qquad \gamma = \cos^{-1}\!\left( \frac{a^2 + b^2 - c^2}{2ab} \right)$$
where α, β, and γ are the angles between the two adjacent sides b–c, a–c, and a–b of the formed triangle, respectively. Finally, these features are provided to the classifier for further processing towards recognition, which is discussed subsequently [66].
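The sketch below computes the three angles directly from the law of cosines above; the vertex arguments are arbitrary 2D landmark coordinates and the function name is illustrative.

```python
# Angular-feature sketch: interior angles of the triangle formed by the adjacent,
# side, and centroid points, via the law of cosines.
import math

def triangle_angles(A, B, C):
    a = math.dist(B, C)    # side opposite vertex A
    b = math.dist(A, C)    # side opposite vertex B
    c = math.dist(A, B)    # side opposite vertex C
    alpha = math.acos((b**2 + c**2 - a**2) / (2 * b * c))
    beta  = math.acos((a**2 + c**2 - b**2) / (2 * a * c))
    gamma = math.acos((a**2 + b**2 - c**2) / (2 * a * b))
    return alpha, beta, gamma   # radians; the three angles sum to approximately pi
```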

3.4.3. Geometric Features

Hand gestures are formed using different combinations of fingers and palms, which result in different shapes. These shapes form a specific geometry for different gestures, and such geometric shapes are good features for the classification and recognition of gestures [67,68,69,70]. The geometric feature is the third point-based feature and consists of the different irregular shapes formed by two consecutive fingers of the hand in a gesture. The area of the formed shape is computed via Heron’s formula.
In this method, the irregular shape is simply divided into regular shapes; for example, a polygon is divided into two triangles [71]. Each side length of a triangle is measured as the distance between two points, and the values are computed with Heron’s formula:
$$G = \sqrt{ t (t - m)(t - n)(t - o) }, \qquad \text{where } t = \frac{m + n + o}{2}$$
where m, n, and o are the sides of the triangle, as shown in Figure 9. After the area of each triangle is calculated, the areas of both triangles are added together to obtain the area of the irregular shape. In this way, the areas of all the various shapes are computed, and the features are then available for classification and recognition.
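A short sketch of this split-and-sum area computation follows. It assumes the irregular shape is a quadrilateral defined by four landmark points and is split along one diagonal; the point names are placeholders.

```python
# Geometric-feature sketch: split a quadrilateral between two consecutive fingers
# into two triangles and sum their Heron areas (formula above).
import math

def heron(m, n, o):
    t = (m + n + o) / 2                       # semi-perimeter
    return math.sqrt(t * (t - m) * (t - n) * (t - o))

def quad_area(p1, p2, p3, p4):
    # Split the quadrilateral p1-p2-p3-p4 along the diagonal p1-p3.
    t1 = heron(math.dist(p1, p2), math.dist(p2, p3), math.dist(p1, p3))
    t2 = heron(math.dist(p1, p3), math.dist(p3, p4), math.dist(p1, p4))
    return t1 + t2
```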

3.5. Feature Extraction via Full Hand

This section provides a detailed description of feature extraction from the full hand using the index values of points drawn using self-organizing neural gas.

3.5.1. Mesh Geometry

The aim of this stage is to estimate the morphology of the hand. This is accomplished by applying self-organizing neural gas (SONG) to the segmented binary image. SONG is an unsupervised learning model used in applications in which it is important to maintain the topology between the input and output spaces. The input data are clustered so that the intra-cluster variance (the distance between data items within a cluster) is small, while the inter-cluster variance between different classes is large [72,73,74].
A typical SONG training session starts with the first two output neurons (n = 2). For training of the SONG, the training dataset I is used cyclically. All accumulated errors $E_w(1)$, $E_w(2)$, $w \in [1, n]$, are set to zero at the beginning of each epoch. $E_w(1)$ represents the total quantization error that corresponds to the neuron at the end of an epoch, while the increment of the total quantization error obtained after removal of the neuron is represented by $E_w(2)$. For the given input vector $I_x$, the two winning neurons are obtained by
$$Neuron_{a_1}: \quad \| I_x - W_{a_1} \| \leq \| I_x - W_w \|, \quad \forall\, w \in [1, n]$$
$$Neuron_{a_2}: \quad \| I_x - W_{a_2} \| \leq \| I_x - W_w \|, \quad \forall\, w \in [1, n],\ w \neq a_1$$
where the initial weight vectors $W_w$, $w = 1, 2$, are randomly selected for the two neurons in the input space. The values of the local variables $E_{a_1}(1)$ and $E_{a_1}(2)$ change according to the following equations:
$$E_{a_1}(1) = E_{a_1}(1) + \| I_x - W_{a_1} \|, \qquad E_{a_1}(2) = E_{a_1}(2) + \| I_x - W_{a_2} \|, \qquad C_{a_1} = C_{a_1} + 1$$
The counter $C_w$ is assigned a zero value for these two neurons (w = 1, 2), and if $C_{a_1} \leq C_{idle}$, then the local learning rate is defined as
$$\varepsilon_2^{a_1} = \frac{\varepsilon_1^{a_1}}{r^{a_1}}$$
where $\varepsilon_2^{a_1}$ and $\varepsilon_1^{a_1}$ change their values according to Equations (17)–(19). Otherwise, the local values remain constant: $\varepsilon_1^{a_1} = \varepsilon_{1min}$ and $\varepsilon_2^{a_1} = 0$.
$$\varepsilon_1^{a_1} = \varepsilon_{1min} + (\varepsilon_{1max} - \varepsilon_{1min}) \left( \frac{\varepsilon_{1min}}{\varepsilon_{1max}} \right)^{I_{a_1} / I_{idle}}$$
$$r^{a_1} = r_{max} + (1 - r_{max}) \left( \frac{1}{r_{max}} \right)^{I_{a_1} / I_{idle}}$$
The learning rate $\varepsilon_1^w$ is applied to the winner neuron, while $\varepsilon_2^w$ is applied to the weights of the neighbors of the winning neuron. The learning rate decreases from its maximum to its minimum value over a period defined by the $I_{idle}$ parameter. The value of $r^{a_1}$ starts at $r_{min} = 1$ and, over the period defined by $I_{idle}$, reaches its maximum $r_{max}$. The weight vectors of the winning neuron $Neuron_{a_1}$ and its neighbor neurons $Neuron_o$, $o \in ne(a_1)$, are adapted according to the following equations:
$$W_{a_1} = W_{a_1} + \varepsilon_1^{a_1} \cdot ( I_x - W_{a_1} )$$
$$W_o = W_o + \varepsilon_2^{o} \cdot ( I_x - W_o ), \qquad \forall\, o \in ne(a_1)$$
After the neurons $Neuron_{a_1}$ and $Neuron_{a_2}$ are detected, a connection between them is created. At the end of each epoch, all the neurons are in the idle state. If the local counters are greater than the value of $C_{idle}$, the neurons are considered well trained, and the SONG network is assumed to have converged. Figure 10 shows the topological features of input space I extracted by SONG.
Figure 10. Self-organizing neural gas (SONG) extracted on the input space.
 Algorithm 1. Pseudo code for self-organizing neural gas
 Input: Input space, I;
 Output: the map, G = (V, E);
 Initialization:
 First, randomly generate two nodes, N = (n1, n2) in the input space.
 Second, set the neighboring neuron to zero and set the maximum number of nodes to 100.
 1. Randomly generate one input signal Ў from input space I, and calculate the winning nodes x1 and x2 nearest to Ў:
                     $x_1 = \arg\min_{n \in N} \| Ў - w_n \|, \qquad x_2 = \arg\min_{n \in N \setminus \{x_1\}} \| Ў - w_n \|$
 2. Adjust x1 and x2
                (a) Create a connecting edge if there is no connection between x1 and x2.
                      $edge = edge \cup \{ (x_1, x_2) \}$
                (b) Set the age of the edge to 0.
                (c) Adjust the error of the winning node x1:
                      $E_{x_1} = E_{x_1} + \| Ў - w_{x_1} \|$
                (d) Adjust the winning node x1 using the learning rate
                (e) Adjust all of the edges connected with node x 1 :
 3. Remove all edges with an age larger than amax and delete all nodes without connecting edges
 4. Insert new nodes and divide them into two parts.
 5. Insert new nodes in the following steps
                i. Locate the node u with the largest error and its neighboring node v with the largest error, and insert a new node r between them.
                     $V = V \cup \{ r \}, \quad w_r = (w_u + w_v)/2$
                ii. Create the edges of r with u and v, and delete the edge between u and v and locate the induced subgraph
                iii. Lower down the error of u and v, and set the error of node r.
                iv. Regulate error of all nodes
 6. If stop conditions are not satisfied, then go back to Step 1.
The index values of the outer nodes are taken as features; each gesture depicts a different hand morphology. The outer nodes are selected by inspecting the neighboring pixel values: if a neighboring pixel has the white value, the node is selected; otherwise, it is rejected. Figure 11 shows the mesh and the selected outer boundary of the hand.
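One possible reading of this selection rule is sketched below. The assumption is that the check runs on a binary contour image of the hand (white boundary pixels on a black background), and that a SONG node is kept as an outer node when its 3 × 3 neighbourhood contains a white pixel; the function name is illustrative.

```python
# Hypothetical outer-node selection on a binary hand-contour image.
import numpy as np

def outer_nodes(nodes, contour_img):
    """nodes: list of (row, col) SONG node positions; contour_img: white boundary on black."""
    selected = []
    for r, c in nodes:
        window = contour_img[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        if window.max() > 0:            # a white (boundary) pixel lies next to the node
            selected.append((r, c))     # keep the node as an outer-boundary feature
    return selected
```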

3.5.2. Active Model

The second method used for feature extraction from the full hand uses the 8-directional Freeman chain code algorithm. This method measures the change in direction along the curve points on the boundary of hand gestures. First, the boundary of the hand is identified. All the curve points along the hand contour are identified and represented using the 8-directional Freeman chain code [75,76]. Suppose all the points along the boundary b are represented by n points. Here, s is the starting point on the top left side of the thumb, and the points are checked up to n − 1. The curve points on the boundary are represented as Cb, and thus all points are Cb = {s0, s1, …, sn−1}.
We start finding feature points from s0 and move in a clockwise direction along the boundary until a change in direction is observed. Let the next point be s1 and the current point s0; if the directions of s0 and s1 are the same, then point s1 is excluded and the next point, s2, is checked. If the directions of s0 and s1 are different, then s1 is considered a feature point f. All points on the boundary are checked similarly and, if the difference between the directions of the current point and the next point is greater than 0, the next point is selected as a feature point f [77]. Figure 12 depicts the point selection.
A total of 8 cases of 45° and 4 cases of 90° are considered to find the changes in the direction of points in order to obtain the feature points. Figure 13 represents the direction changes of 45° and 90°, in which the yellow line shows the direction of the current curve point while the blue arrow shows the subsequent direction of the curve point.
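The sketch below illustrates the chain-code-based feature-point selection just described. It assumes the boundary is already available as an ordered list of 8-connected (row, col) points; the direction table and function name are our own conventions.

```python
# Chain-code feature-point sketch: encode consecutive boundary steps with the
# 8-direction Freeman code and keep points where the direction changes.
FREEMAN = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
           (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def feature_points(boundary):
    # Direction code for each step between consecutive boundary points.
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        codes.append(FREEMAN[(r1 - r0, c1 - c0)])
    feats = []
    for i in range(1, len(codes)):
        if codes[i] != codes[i - 1]:     # direction change -> keep as feature point f
            feats.append(boundary[i])
    return feats
```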

3.6. Features Optimization

For feature optimization, gray wolf optimization (GWO) is applied in order to obtain the best feature vector for classification. GWO discriminates between the different cases and provides multiple solutions. It resembles the organizational structure of wolves during group hunting, which is a very clever swarm tactic. Four types of wolves simulate the leadership hierarchy. The alpha wolf is the master of the pack. The beta wolf is a subordinate wolf, which also helps the alpha make choices [51,78,79]. The delta wolf submits to the alpha and beta wolves but dominates the omega. The omega is a low-ranked wolf that only reports to the other wolves; it is dominated by the delta wolves and reports to both the alpha and beta. The hunting strategy that identifies a wolf’s location can be expressed mathematically as
$$\hat{A}_\alpha = \left| \vec{Y}_1 \cdot \vec{L}_\alpha - \vec{L}(t) \right|, \qquad \hat{A}_\beta = \left| \vec{Y}_2 \cdot \vec{L}_\beta - \vec{L}(t) \right|, \qquad \hat{A}_\delta = \left| \vec{Y}_3 \cdot \vec{L}_\delta - \vec{L}(t) \right|$$
where t denotes the iteration. When the target is identified, the iteration begins (t = 1). The alpha, beta, and delta wolves instruct the omegas to chase and encircle the target. L is the location trajectory of the gray wolf [80] and is defined as
$$\vec{L}_1 = \vec{L}_\alpha - \vec{Z}_1 \cdot \hat{A}_\alpha, \qquad \vec{L}_2 = \vec{L}_\beta - \vec{Z}_2 \cdot \hat{A}_\beta, \qquad \vec{L}_3 = \vec{L}_\delta - \vec{Z}_3 \cdot \hat{A}_\delta, \qquad \vec{L}(t+1) = \frac{\vec{L}_1 + \vec{L}_2 + \vec{L}_3}{3}$$
where $\vec{L}_1$, $\vec{L}_2$, and $\vec{L}_3$ are the location trajectories of the alpha, beta, and delta wolves, respectively. $\vec{L}$ and d are mixtures of the containing restriction a and the random quantities $x_1$ and $x_2$, defined as
$$\vec{L} = 2 \alpha\, x_1, \qquad d = 2 x_2$$
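A condensed GWO sketch following the standard alpha/beta/delta update pattern above is shown below. It is an illustration under several assumptions: the fitness function, population size, iteration count, and bounds are placeholders, and the coefficient vectors follow the commonly used GWO formulation rather than being a verified transcription of the authors' implementation.

```python
# Minimal grey wolf optimisation sketch for feature-vector optimisation (illustrative).
import numpy as np

def gwo(fitness, dim, n_wolves=20, iters=100, lb=0.0, ub=1.0):
    wolves = np.random.uniform(lb, ub, (n_wolves, dim))
    for t in range(iters):
        scores = np.array([fitness(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(scores)[:3]]   # three best wolves (minimisation)
        a = 2 - 2 * t / iters                                 # control parameter decreasing to 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                Z = 2 * a * np.random.rand(dim) - a           # coefficient from a and a random draw
                d = 2 * np.random.rand(dim)                   # second random coefficient
                A = np.abs(d * leader - wolves[i])            # encircling distance to the leader
                new_pos += leader - Z * A
            wolves[i] = np.clip(new_pos / 3, lb, ub)          # mean of the three guided positions
    scores = np.array([fitness(w) for w in wolves])
    return wolves[np.argmin(scores)]                          # best feature-weight vector found
```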
The optimization results for the Sign Word dataset are shown in Figure 14.

3.7. Classifier: Reweighted Genetic Algorithm

For classification, a modified version of the state-of-the-art genetic algorithm (GA) is introduced. A genetic algorithm is an evolutionary algorithm that is robust, heuristic, and stochastic and is reliable for high-dimensional spaces [81]. The genetic strategy is used for complex computational problems. It is a pool-based algorithm that uses small chunks of data to find optimal solutions with random biological operations, i.e., crossover, mutation, and selection. In the genetic model, operations are performed on a basic unit known as the chromosome. Feature vectors are converted into chromosomes by mapping every single feature to a respective gene [82]. Chromosomes consist of genes, and each gene represents a single feature in the feature vector. Figure 15 shows the basic structure of the genetic model units. To find the optimal solution, chromosomes filter the search space in different orders, while the population is the pool of chromosomes. In the selection process, the first chromosome is selected randomly from the pool; after that, a fitness function is applied to all chromosomes and fitness values are generated. The chromosome with the greatest fitness value is the fittest and is selected for the optimal path solution [83].
In the reweighted genetic algorithm, the classifier is divided into 2 phases: reweighted feature selection and classification. In the first phase, weights are assigned to optimized features using a support vector machine and random forest classifier. In the classification phase, the resultant output is classified into different human gestures.
Initially, the GA starts with the optimized features, on which crossover and mutation techniques are applied. In the crossover function, the optimized features are represented as chromosomes in a subspace known as the population. After this, mutation is applied to the crossed chromosomes to increase diversity; this also provides a way of escaping from local optima. Finally, the resultant chromosomes are duplicated, and weights are assigned to them so that prominent features receive higher weights.
$$C_{opt}(f) = \sum_{k=1}^{K} \left[ (O_{f1}, O_{f2}, \ldots, O_{fn}),\ (O_{f1}, O_{f2}, \ldots, O_{fn}) \right], \qquad M_{opt}(f) = (O_{f1}, O_{f2}, \ldots, O_{fn})^{-1}$$
where $O_{f1}$ is an optimized feature, $C_{opt}$ is the crossover function, and $M_{opt}$ is the mutation function applied over the gray-wolf-optimized features. These GA patterns are then inserted into a codebook and classified by finding the maximum-matching cluster from the codebook [84] (Figure 16).
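For illustration, the sketch below shows a generic GA-style stage with selection, single-point crossover, mutation, and a reweighting of features by how often they survive in the fittest chromosomes. This is an assumption-heavy stand-in, not the authors' exact reweighted GA: the binary chromosome encoding, the random-forest/cross-validation fitness, and all parameter values are placeholders.

```python
# Illustrative GA feature-reweighting sketch (not the paper's exact RGA classifier).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def ga_feature_weights(X, y, pop=30, gens=20, p_mut=0.05, rng=np.random.default_rng(0)):
    n = X.shape[1]
    chroms = rng.integers(0, 2, (pop, n))                 # binary mask over the optimised features

    def fitness(ch):
        if ch.sum() == 0:
            return 0.0
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        return cross_val_score(clf, X[:, ch.astype(bool)], y, cv=3).mean()

    for _ in range(gens):
        scores = np.array([fitness(c) for c in chroms])
        parents = chroms[np.argsort(scores)[-pop // 2:]]  # selection of the fittest half
        cut = rng.integers(1, n)
        children = np.vstack([np.r_[p[:cut], q[cut:]]     # single-point crossover
                              for p, q in zip(parents, parents[::-1])])
        flips = rng.random(children.shape) < p_mut        # mutation for diversity
        children[flips] ^= 1
        chroms = np.vstack([parents, children])

    scores = np.array([fitness(c) for c in chroms])
    weights = chroms[np.argsort(scores)[-5:]].mean(axis=0)   # feature weights from the top chromosomes
    return weights, chroms[np.argmax(scores)]
```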

4. System Validation and Experimentation

This section provides a brief description of the datasets used for the training and testing of the proposed system. All the experiments were performed in MATLAB R2017a. The following parameters were used to validate the system’s performance. First, the recognition rates of single-hand and both-hand gestures from all five datasets are given. Second, the precision, recall, and F1 values obtained via decision tree, ANN, and the genetic algorithm are given for all five datasets. Finally, a comparison of our method with other state-of-the-art methods is provided.

4.1. Dataset Description

Table 2 represents the name, type of input data, and description of each dataset for the training and testing of the proposed system.

4.2. Recognition Accuracy

To validate the system’s performance, we first fed the Sign Word dataset hand gestures to the proposed system to determine the recognition rate using the genetic classifier. The accuracy percentages for each class are given separately in the form of a confusion matrix. Each gesture class of the five datasets used for experimentation achieved satisfactory performance with our proposed system. Table 3, Table 4 and Table 5 show the confusion matrices of accuracy scores for gesture classification with the proposed approach for the Sign Word, Dexter1, and Dexter + Object datasets, respectively. Table 6 shows the mean accuracy for all five datasets used for testing the proposed system.
We used five HGR datasets for experimentation, namely, the Sign Word, Dexter1, Dexter + Object, STB, and NYU datasets, which produced 92.1%, 93.1%, 88.2%, 90.8%, and 85.3% mean accuracy, respectively.

4.3. Precision, Recall, and F1 Score

In this section, the precision, recall, and F1 scores obtained using the decision tree, ANN, and genetic algorithm on all five datasets are given. The results show that the genetic algorithm produced the best performance of the three classifiers. The decision tree omitted sampled features for classification during training, and its classification process was faster than its training. The ANN required a large number of samples for training and, as the number of training samples was less than 100 million, its accuracy was lower than that of the other classifiers. The genetic algorithm gave the best results in the proposed system. Table 7, Table 8, Table 9, Table 10 and Table 11 present the test results for precision, recall, and F1 scores for all three classifiers on the five respective datasets.

4.4. Comparison

A comparison between our proposed method and other state-of-the-art methods is given in Table 12. The results show that our proposed method, which uses a combined feature extraction approach (i.e., both key points and the full hand), produced higher recognition accuracy rates than the other state-of-the-art methods, which use a single feature extraction approach (i.e., either point-based or full-hand). Our proposed method accurately extracted the ROI from RGB images and computed accurate feature vectors. The reweighted genetic algorithm used the optimized features, with 70% of the feature vectors for training and 30% for testing, to produce accurate results. The table shows that on all five datasets used for training and testing, namely, Sign Word, Dexter1, Dexter + Object, STB, and NYU, our proposed method produced higher accuracy than the other methods.

5. Conclusions

In this research work, we developed an efficient HGR system for healthcare muscle exercise via point-based and full-hand features and a reweighted genetic algorithm. The features proposed in this method include the Euclidean distance, the cosines of angles (i.e., α, β, γ), the areas of irregular shapes, the SONG mesh, and the chain model, from which the optimal features are selected. GWO with the RGA is used to optimize, train, and recognize different gestures for muscle exercise. Our proposed system outperformed other HGR systems in terms of accuracy, achieving 92.1%, 93.1%, 88.2%, 90.8%, and 85.3% on the Sign Word, Dexter1, Dexter + Object, STB, and NYU datasets, respectively. Precision, recall, and F1 scores were also measured for overall gesture recognition on all datasets. Finally, the performance of the proposed system was compared with other state-of-the-art systems. We expect our system to perform well for the recognition of daily gestures performed in any environment.
In the future, we plan to improve the features with different techniques, including improved 3D mesh features. We will also develop our own healthcare dataset, which will include complex gestures. Dynamic gestures will also be tackled and recognized by the system.

Author Contributions

Conceptualization, H.A.; methodology, H.A., M.G., and A.J.; software, H.A.; validation, M.G. and A.J.; formal analysis, K.K. and M.G.; resources, A.J. and K.K.; writing—review and editing, A.J. and K.K.; funding acquisition, A.J. and K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (no. 2018R1D1A1A02085645). Moreover, this work was supported by the Korea Medical Device Development Fund grant funded by the Korean government (the Ministry of Science and ICT; the Ministry of Trade, Industry and Energy; the Ministry of Health and Welfare; and the Ministry of Food and Drug Safety) (project number: 202012D05-02).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, H.; Lihui, W. Gesture recognition for human-robot collaboration: A review. Int. J. Ind. Ergon. 2018, 68, 355–367. [Google Scholar] [CrossRef]
  2. Tingting, Y.; Junqian, W.; Lintai, W.; Yong, X. Three-stage network for age estimation. CAAI Trans. Intell. Technol. 2019, 4, 122–126. [Google Scholar] [CrossRef]
  3. Haria, A.P.; BMS College of Engineering; Subramanian, A.; Asokkumar, N.; Podda, S.; Nayak, J.S. Hand Gesture Recognition System. Int. J. Comput. Trends Technol. 2017, 47, 209–212. [Google Scholar] [CrossRef]
  4. Nishihara, H.K.; Hsu, S.P.; Kaehler, A.; Jangaard, L. Northrop Grumman Systems Corp. Hand-Gesture Recognition Method. U.S. Patent No. 9,696,808, April 2017. [Google Scholar]
  5. Sagayam, K.M.; Hemanth, D.J. Hand posture and gesture recognition techniques for virtual reality applications: A survey. Virtual Real. 2017, 21, 91–107. [Google Scholar] [CrossRef]
  6. Bobic, V.; Tadic, P.; Kvascev, G. Hand gesture recognition using neural network based techniques. In Proceedings of the 2016 13th Symposium on Neural Networks and Applications (NEUREL), Belgrade, Serbia, 22–24 November 2016; pp. 1–4. [Google Scholar] [CrossRef]
  7. Oudah, M.; Al-Naji, A.; Chahl, J. Hand Gesture Recognition Based on Computer Vision: A Review of Techniques. J. Imaging 2020, 6, 73. [Google Scholar] [CrossRef]
  8. Li, W.-J.; Hsieh, C.-Y.; Lin, L.-F.; Chu, W.-C. Hand gesture recognition for post-stroke rehabilitation using leap motion. In Proceedings of the 2017 International Conference on Applied System Innovation (ICASI), Sapporo, Japan, 13–17 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 386–388. [Google Scholar] [CrossRef]
  9. Cheng, J.; Wei, F.; Liu, Y.; Li, C.; Chen, Q.; Chen, X. Chinese Sign Language Recognition Based on DTW-Distance-Mapping Features. Math. Probl. Eng. 2020, 2020, 1–13. [Google Scholar] [CrossRef]
  10. Jalal, A.; Uddin, I. Security architecture for third generation (3G) using GMHS cellular network. In Proceedings of the 2007 International Conference on Emerging Technologies, Rawalpindi, Pakistan, 12–13 November 2007; pp. 74–79. [Google Scholar] [CrossRef]
  11. Oyedotun, O.K.; Khashman, A. Deep learning in vision-based static hand gesture recognition. Neural Comput. Appl. 2017, 28, 3941–3951. [Google Scholar] [CrossRef]
  12. Pinto, R.F.; Borges, C.D.B.; Almeida, A.M.A.; Paula, I.C. Static Hand Gesture Recognition Based on Convolutional Neural Networks. J. Electr. Comput. Eng. 2019, 2019, 1–12. [Google Scholar] [CrossRef]
  13. Gao, Q.; Liu, J.; Ju, Z.; Li, Y.; Zhang, T.; Zhang, L. Static Hand Gesture Recognition with Parallel CNNs for Space Human-Robot Interaction. In Constructive Side-Channel Analysis and Secure Design, Proceedings of the 8th International Workshop, COSADE 2017, Paris, France, 13–14 April 2017; Springer International Publishing: Singapore, 2017; pp. 462–473. [Google Scholar] [CrossRef]
  14. Zheng, Q.; Yang, M.; Tian, X.; Jiang, N.; Wang, D. A Full Stage Data Augmentation Method in Deep Convolutional Neural Network for Natural Image Classification. Discret. Dyn. Nat. Soc. 2020, 2020, 1–11. [Google Scholar] [CrossRef]
  15. Asif, A.R.; Waris, A.; Gilani, S.O.; Jamil, M.; Ashraf, H.; Shafique, M.; Niazi, I.K. Performance Evaluation of Convolutional Neural Network for Hand Gesture Recognition Using EMG. Sensors 2020, 20, 1642. [Google Scholar] [CrossRef] [Green Version]
  16. Su, H.; Ovur, S.E.; Zhou, X.; Qi, W.; Ferrigno, G.; De Momi, E. Depth vision guided hand gesture recognition using electromyographic signals. Adv. Robot. 2020, 34, 985–997. [Google Scholar] [CrossRef]
  17. Motoche, C.; Benalcázar, M.E. Real-Time Hand Gesture Recognition Based on Electromyographic Signals and Artificial Neural Networks. In Constructive Side-Channel Analysis and Secure Design, Proceedings of the 9th International Workshop, COSADE 2018, Singapore, 23–24 April 2018; Springer International Publishing: Singapore, 2018; pp. 352–361. [Google Scholar] [CrossRef]
  18. Sapienza, S.; Ros, P.M.; Guzman, D.A.F.; Rossi, F.; Terracciano, R.; Cordedda, E.; Demarchi, D. On-Line Event-Driven Hand Gesture Recognition Based on Surface Electromyographic Signals. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5. [Google Scholar] [CrossRef]
  19. Pinzon-Arenas, J.O.; Jimenez-Moreno, R.; Herrera-Benavides, J.E. Convolutional Neural Network for Hand Gesture Recognition using 8 different EMG Signals. In Proceedings of the 2019 XXII Symposium on Image, Signal Processing and Artificial Vision (STSIVA), Bucaramanga, Colombia, 24–26 April 2019; pp. 1–5. [Google Scholar] [CrossRef]
  20. Benalcazar, M.E.; Jaramillo, A.G.; Jonathan; Zea, A.; Paez, A.; Andaluz, V.H. Hand gesture recognition using machine learning and the Myo armband. In Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos Island, Greece, 28 August–2 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1040–1044. [Google Scholar] [CrossRef] [Green Version]
  21. Qi, J.; Jiang, G.; Li, G.; Sun, Y.; Tao, B. Surface EMG hand gesture recognition system based on PCA and GRNN. Neural Comput. Appl. 2019, 32, 6343–6351. [Google Scholar] [CrossRef]
  22. Qi, W.; Su, H.; Aliverti, A. A Smartphone-Based Adaptive Recognition and Real-Time Monitoring System for Human Activities. IEEE Trans. Hum. Mach. Syst. 2020, 50, 414–423. [Google Scholar] [CrossRef]
  23. Wang, Z.; Hou, Y.; Jiang, K.; Dou, W.; Zhang, C.; Huang, Z.; Guo, Y. Hand Gesture Recognition Based on Active Ultrasonic Sensing of Smartphone: A Survey. IEEE Access 2019, 7, 111897–111922. [Google Scholar] [CrossRef]
  24. Haseeb, M.A.A.; Parasuraman, R. Wisture: Touch-Less Hand Gesture Classification in Unmodified Smartphones Using Wi-Fi Signals. IEEE Sens. J. 2018, 19, 257–267. [Google Scholar] [CrossRef]
  25. Zhang, H.; Xu, W.; Chen, C.; Bai, L.; Zhang, Y. Your Knock Is My Command: Binary Hand Gesture Recognition on Smartphone with Accelerometer. Mob. Inf. Syst. 2020, 2020, 1–16. [Google Scholar] [CrossRef]
  26. Panella, M.; Altilio, R. A Smartphone-Based Application Using Machine Learning for Gesture Recognition: Using Feature Extraction and Template Matching via Hu Image Moments to Recognize Gestures. IEEE Consum. Electron. Mag. 2018, 8, 25–29. [Google Scholar] [CrossRef]
  27. Aldabbagh, G.; AlGhazzawi, D.M.; Hasan, S.H.; Alhaddad, M.; Malibari, A.; Cheng, L. Optimal Learning Behavior Prediction System Based on Cognitive Style Using Adaptive Optimization-Based Neural Network. Complexity 2020, 2020, 1–13. [Google Scholar] [CrossRef]
  28. Wiens, T. Engine speed reduction for hydraulic machinery using predictive algorithms. Int. J. Hydromech. 2019, 2, 16. [Google Scholar] [CrossRef]
  29. Li, G.; Tang, H.; Sun, Y.; Kong, J.; Jiang, G.; Jiang, D.; Tao, B.; Xu, S.; Liu, H. Hand gesture recognition based on convolution neural network. Clust. Comput. 2019, 22, 2719–2729. [Google Scholar] [CrossRef]
  30. Zheng, Q.; Tian, X.; Liu, S.; Yang, M.; Wang, H.; Yang, J. Static Hand Gesture Recognition Based on Gaussian Mixture Model and Partial Differential Equation. IAENG Int. J. Comput. 2018, Sci. 45, 569–583. [Google Scholar]
  31. Cheng, H.; Dai, Z.; Liu, Z.; Zhao, Y. An image-to-class dynamic time warping approach for both 3D static and trajectory hand gesture recognition. Pattern Recognit. 2016, 55, 137–147. [Google Scholar] [CrossRef]
  32. Oprisescu, S.; Christoph, R.; Bochao, S. Automatic static hand gesture recognition using tof cameras. In Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania, 27–31 August 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 2748–2751. [Google Scholar]
  33. Yun, L.; Lifeng, Z.; Shujun, Z. A Hand Gesture Recognition Method Based on Multi-Feature Fusion and Template Matching. Proc. Eng. 2012, 29, 1678–1684. [Google Scholar] [CrossRef] [Green Version]
  34. Ghosh, D.K.; Ari, S. Static Hand Gesture Recognition Using Mixture of Features and SVM Classifier. In Proceedings of the 2015 Fifth International Conference on Communication Systems and Network Technologies, Gwalior, India, 4–6 April 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1094–1099. [Google Scholar] [CrossRef]
  35. Candrasari Banuwati, E.; Novamizanti, L.; Aulia, S. Discrete Wavelet Transform on static hand gesture recognition. J. Phys. Conf. Ser. 2019, 1367, 012022. [Google Scholar] [CrossRef]
  36. Jalal, A.; Khalid, N.; Kim, K. Automatic Recognition of Human Interaction via Hybrid Descriptors and Maximum Entropy Markov Model Using Depth Sensors. Entropy 2020, 22, 817. [Google Scholar] [CrossRef] [PubMed]
37. Chen, X.; Shi, C.; Liu, B. Static hand gesture recognition based on finger root-center-angle and length weighted Mahalanobis distance. In Real-Time Image and Video Processing; International Society for Optics and Photonics: Bellingham, WA, USA, 2016; Volume 9897, p. 98970U. [Google Scholar] [CrossRef]
  38. Bhavana, V.; Mouli, G.M.S.; Lokesh, G.V.L. Hand Gesture Recognition Using Otsu’s Method. In Proceedings of the 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Tamilnadu, India, 14–16 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–4. [Google Scholar] [CrossRef]
  39. Yusnita, L.; Rosalina, R.; Roestam, R.; Wahyu, R.B. Implementation of Real-Time Static Hand Gesture Recognition Using Artificial Neural Network. Commun. Inf. Technol. J. 2017, 11, 85–91. [Google Scholar] [CrossRef] [Green Version]
  40. Jalal, A.; Quaid, M.A.K.; Tahir, S.B.U.D.; Kim, K. A Study of Accelerometer and Gyroscope Measurements in Physical Life-Log Activities Detection Systems. Sensors 2020, 20, 6670. [Google Scholar] [CrossRef] [PubMed]
  41. Liu, K.; Kehtarnavaz, N. Real-time robust vision-based hand gesture recognition using stereo images. J. Real Time Image Process. 2013, 11, 201–209. [Google Scholar] [CrossRef]
  42. Ahmed, W.; Chanda, K.; Mitra, S. Vision based Hand Gesture Recognition using Dynamic Time Warping for Indian Sign Language. In Proceedings of the 2016 International Conference on Information Science (ICIS), Kochi, India, 11–14 December 2016; IEEE: Piscataway, NJ, USA, 2017; pp. 120–125. [Google Scholar] [CrossRef]
  43. Al-Shamayleh, A.S.; Ahmad, R.; Abushariah, M.A.M.; Alam, K.A.; Jomhari, N. A systematic literature review on vision based gesture recognition techniques. Multim. Tools Appl. 2018, 77, 28121–28184. [Google Scholar] [CrossRef]
  44. Pansare, J.R.; Ingle, M. Vision-based approach for American Sign Language recognition using Edge Orientation Histogram. In Proceedings of the 2016 International Conference on Image, Vision and Computing (ICIVC), Portsmouth, NH, USA, 3–5 August 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 86–90. [Google Scholar] [CrossRef]
  45. Hussain, S.; Saxena, R.; Han, X.; Khan, J.A.; Shin, H. Hand gesture recognition using deep learning. In Proceedings of the 2017 International SoC Design Conference (ISOCC), Seoul, Korea, 5–8 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 48–49. [Google Scholar] [CrossRef]
  46. Mahdikhanlou, K.; Ebrahimnezhad, H. Multimodal 3D American sign language recognition for static alphabet and numbers using hand joints and shape coding. Multim. Tools Appl. 2020, 79, 22235–22259. [Google Scholar] [CrossRef]
  47. Liu, J.; Ding, H.; Shahroudy, A.; Duan, L.-Y.; Jiang, X.; Wang, G.; Kot, A.C. Feature Boosting Network For 3D Pose Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 494–501. [Google Scholar] [CrossRef] [Green Version]
  48. Jalal, A.; Akhtar, I.; Kim, K. Human Posture Estimation and Sustainable Events Classification via Pseudo-2D Stick Model and K-ary Tree Hashing. Sustainability 2020, 12, 9814. [Google Scholar] [CrossRef]
  49. Kerdvibulvech, C. A methodology for hand and finger motion analysis using adaptive probabilistic models. EURASIP J. Embed. Syst. 2014, 2014, 18. [Google Scholar] [CrossRef] [Green Version]
  50. Nguyen, T.-N.; Vo, D.-H.; Huynh, H.-H.; Meunier, J. Geometry-based static hand gesture recognition using support vector machine. In Proceedings of the 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), Singapore, 10–12 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 769–774. [Google Scholar] [CrossRef]
  51. Jalal, A.; Kim, Y.-H.; Kim, Y.-J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [Google Scholar] [CrossRef]
  52. Jalal, A.; Uddin, Z.; Kim, T.-S. Depth video-based human activity recognition system using translation and scaling invariant features for life logging at smart home. IEEE Trans. Consum. Electron. 2012, 58, 863–871. [Google Scholar] [CrossRef]
53. Kolkur, S.; Kalbande, D.; Shimpi, P.; Bapat, C.; Jatakia, J. Human Skin Detection Using RGB, HSV and YCbCr Color Models. In Proceedings of the International Conference on Communication and Signal Processing 2016 (ICCASP 2016), Lonere, India, 26–27 December 2016; Atlantis Press: Amsterdam, The Netherlands, 2017. [Google Scholar] [CrossRef] [Green Version]
  54. Lv, Y.; Zhou, W. Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency. Comput. Intell. Neurosci. 2020, 2020, 1–9. [Google Scholar] [CrossRef]
  55. Zhang, Q.; Yang, M.; Kpalma, K.; Zheng, Q.; Zhang, X. Segmentation of hand posture against complex backgrounds based on saliency and skin colour detection. IAENG Int. J. Comput. Sci. 2018, 45, 435–444. [Google Scholar]
  56. Grzejszczak, T.; Kawulok, M.; Galuszka, A. Hand landmarks detection and localization in color images. Multim. Tools Appl. 2016, 75, 16363–16387. [Google Scholar] [CrossRef] [Green Version]
  57. Jalal, A.; Batool, M.; Kim, K. Sustainable Wearable System: Human Behavior Modeling for Life-Logging Activities Using K-Ary Tree Hashing Classifier. Sustainability 2020, 12, 10324. [Google Scholar] [CrossRef]
  58. Kim, T.; Jalal, A.; Han, H.; Jeon, H.; Kim, J. Real-Time Life Logging via Depth Imaging-based Human Activity Recognition towards Smart Homes Services. In Proceedings of the International Symposium on Renewable Energy Sources and Healthy Buildings, Seoul, Korea, 26–29 August 2014; p. 63. [Google Scholar] [CrossRef]
  59. Tahir, S.B.U.D.; Jalal, A.; Kim, K. Wearable Inertial Sensors for Daily Activity Analysis Based on Adam Optimization and the Maximum Entropy Markov Model. Entropy 2020, 22, 579. [Google Scholar] [CrossRef]
  60. Jalal, A.; Batool, M.; Kim, K. Stochastic Recognition of Physical Activity and Healthcare Using Tri-Axial Inertial Wearable Sensors. Appl. Sci. 2020, 10, 7122. [Google Scholar] [CrossRef]
  61. Ahmed, A.; Jalal, A.; Kim, K. A Novel Statistical Method for Scene Classification Based on Multi-Object Categorization and Logistic Regression. Sensors 2020, 20, 3871. [Google Scholar] [CrossRef] [PubMed]
62. Mahmood, M.; Jalal, A.; Kim, K. WHITE STAG model: Wise human interaction tracking and estimation (WHITE) using spatio-temporal and angular-geometric (STAG) descriptors. Multim. Tools Appl. 2019, 79, 1–32. [Google Scholar] [CrossRef]
  63. Shehzed, A.; Jalal, A.; Kim, K. Multi-Person Tracking in Smart Surveillance System for Crowd Counting and Normal/Abnormal Events Detection. In Proceedings of the 2019 International Conference on Applied and Engineering Mathematics (ICAEM), London, UK, 3–5 July 2019; Volume 12, pp. 163–168. [Google Scholar] [CrossRef]
  64. Jalal, A.; Quaid, M.A.K.; Hasan, A.S. Wearable Sensor-Based Human Behavior Understanding and Recognition in Daily Life for Smart Environments. In Proceedings of the 2018 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, 17–19 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 105–110. [Google Scholar] [CrossRef]
  65. Jalal, A.; Sharif, N.; Kim, J.T.; Kim, T.-S. Human activity recognition via recognized body parts of human depth silhouettes for residents monitoring services at smart homes. Indoor Built Environ. 2013, 22, 271–279. [Google Scholar] [CrossRef]
  66. Jalal, A.; Kamal, S.; Kim, D. A Depth Video Sensor-Based Life-Logging Human Activity Recognition System for Elderly Care in Smart Indoor Environments. Sensors 2014, 14, 11735–11759. [Google Scholar] [CrossRef]
  67. Susan, S.; Agrawal, P.; Mittal, M.; Bansal, S. New shape descriptor in the context of edge continuity. CAAI Trans. Intell. Technol. 2019, 4, 101–109. [Google Scholar] [CrossRef]
  68. Osterland, S.; Weber, J. Analytical analysis of single-stage pressure relief valves. Int. J. Hydromechatron. 2019, 2, 32–53. [Google Scholar] [CrossRef]
  69. Ghesmoune, M.; Lebbah, M.; Azzag, H. A new Growing Neural Gas for clustering data streams. Neural Netw. 2016, 78, 36–50. [Google Scholar] [CrossRef]
  70. Sun, Q.; Liu, H.; Harada, T. Online growing neural gas for anomaly detection in changing surveillance scenes. Pattern Recognit. 2017, 64, 187–201. [Google Scholar] [CrossRef]
  71. Zhong, C.; Zhang, B.; Wang, J. Scale-Adaptive Growing Neural Network Based on Distortion Error Stability and its Application in Image Topological Feature Extraction. IEEE Access 2021, 9, 767–776. [Google Scholar] [CrossRef]
  72. Ghaderi, A.; Morovati, V.; Dargazany, R. A Physics-Informed Assembly of Feed-Forward Neural Network Engines to Predict Inelasticity in Cross-Linked Polymers. Polymers 2020, 12, 2628. [Google Scholar] [CrossRef] [PubMed]
  73. Jedynak, R. Approximation of the inverse Langevin function revisited. Rheol. Acta 2014, 54, 29–39. [Google Scholar] [CrossRef] [Green Version]
  74. Hossain, M.; Steinmann, P. More hyperelastic models for rubber-like materials: Consistent tangent operators and comparative study. J. Mech. Behav. Mater. 2013, 22, 27–50. [Google Scholar] [CrossRef]
  75. Kroon, M. An 8-chain Model for Rubber-like Materials Accounting for Non-affine Chain Deformations and Topological Constraints. J. Elast. 2010, 102, 99–116. [Google Scholar] [CrossRef]
  76. Hong, F.; Lu, C.; Liu, C.; Liu, R.; Jiang, W.; Ju, W.; Wang, T. PGNet: Pipeline Guidance for Human Key-Point Detection. Entropy 2020, 22, 369. [Google Scholar] [CrossRef] [Green Version]
  77. Pławiak, P. Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Syst. Appl. 2018, 92, 334–349. [Google Scholar] [CrossRef]
  78. Demidova, L.; Nikulchev, E.; Sokolova, Y. The SVM Classifier Based on the Modified Particle Swarm Optimization. Int. J. Adv. Comput. Sci. Appl. 2016, 7. [Google Scholar] [CrossRef] [Green Version]
  79. Emary, E.; Zawbaa, H.M.; Grosan, C. Experienced Gray Wolf Optimization through Reinforcement Learning and Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 681–694. [Google Scholar] [CrossRef]
  80. Lessmann, S.; Stahlbock, R.; Crone, S. Genetic Algorithms for Support Vector Machine Model Selection. In Proceedings of the 2006 IEEE International Joint Conference on Neural Network Proceedings, Vancouver, BC, Canada, 16–21 July 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 3063–3069. [Google Scholar] [CrossRef]
  81. Quaid, M.A.K.; Jalal, A. Wearable sensors based human behavioral pattern recognition using statistical features and re-weighted genetic algorithm. Multim. Tools Appl. 2020, 79, 6061–6083. [Google Scholar] [CrossRef]
82. Batool, M.; Jalal, A.; Kim, K. Telemonitoring of Daily Activity Using Accelerometer and Gyroscope in Smart Home Environments. J. Electr. Eng. Technol. 2020, 15, 2801–2809. [Google Scholar] [CrossRef]
  83. Ong, Y.S.; Nair, P.B.; Keane, A.J. Evolutionary Optimization of Computationally Expensive Problems via Surrogate Modeling. AIAA J. 2003, 41, 687–696. [Google Scholar] [CrossRef] [Green Version]
  84. Jalal, A.; Lee, S.; Kim, J.T.; Kim, T.-S. Human Activity Recognition via the Features of Labeled Depth Body Parts. In Computer Vision; Springer International Publishing: New York, NY, USA, 2012; pp. 246–249. [Google Scholar] [CrossRef]
  85. Rahim, A.; Islam, R.; Shin, J. Non-Touch Sign Word Recognition Based on Dynamic Hand Gesture Using Hybrid Segmentation and CNN Feature Fusion. Appl. Sci. 2019, 9, 3790. [Google Scholar] [CrossRef] [Green Version]
  86. Sridhar, S.; Oulasvirta, A.; Theobalt, C. Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 2456–2463. [Google Scholar] [CrossRef]
  87. Sridhar, S.; Mueller, F.; Zollhöfer, M.; Casas, D.; Oulasvirta, A.; Theobalt, C. Real-Time Joint Tracking of a Hand Manipulating an Object from RGB-D Input. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar] [CrossRef] [Green Version]
88. Zhang, J.; Jiao, J.; Chen, M.; Qu, L.; Xu, X.; Yang, Q. 3D hand pose tracking and estimation using stereo matching. arXiv 2016, arXiv:1610.07214. [Google Scholar]
  89. Tompson, J.; Stein, M.; LeCun, Y.; Perlin, K. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks. ACM Trans. Graph. 2014, 33, 1–10. [Google Scholar] [CrossRef]
  90. Vaitkevičius, A.; Taroza, M.; Blažauskas, T.; Damaševičius, R.; Maskeliūnas, R.; Woźniak, M. Recognition of American Sign Language Gestures in a Virtual Reality Using Leap Motion. Appl. Sci. 2019, 9, 445. [Google Scholar] [CrossRef] [Green Version]
  91. Ahlawat, S.; Batra, V.; Banerjee, S.; Saha, J.; Garg, A.K. Hand Gesture Recognition Using Convolutional Neural Network. In Proceedings of the International Conference on Innovative Computing and Communications. Lecture Notes in Networks and Systems, Delhi, India, 5–6 May 2018; Springer International Publishing: Singapore, 2018; pp. 179–186. [Google Scholar] [CrossRef]
  92. Wang, J.; Liu, T.; Wang, X. Human hand gesture recognition with convolutional neural networks for K-12 double-teachers instruction mode classroom. Infrared Phys. Technol. 2020, 111, 103464. [Google Scholar] [CrossRef]
  93. Cai, Y.; Ge, L.; Cai, J.; Yuan, J. Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images. In Constructive Side-Channel Analysis and Secure Design, Proceedings of the 9th International Workshop, COSADE 2018, Singapore, 23–24 April 2018; Springer International Publishing: Singapore, 2018; pp. 678–694. [Google Scholar] [CrossRef]
  94. Imashev, A.; Mukushev, M.; Kimmelman, V.; Sandygulova, A. A Dataset for Linguistic Understanding, Visual Evaluation, and Recognition of Sign Languages: The K-RSL. In Proceedings of the 24th Conference on Computational Natural Language Learning, online, 19–20 November 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 631–640. [Google Scholar] [CrossRef]
95. Shan, D.; Geng, J.; Shu, M.; Fouhey, D.F. Understanding Human Hands in Contact at Internet Scale. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 9866–9875. [Google Scholar] [CrossRef]
  96. Spurr, A.; Song, J.; Park, S.; Hilliges, O. Cross-Modal Deep Variational Hand Pose Estimation; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2018; pp. 89–98. [Google Scholar] [CrossRef] [Green Version]
97. Brahmbhatt, S.; Tang, C.; Twigg, C.D.; Kemp, C.C.; Hays, J. ContactPose: A Dataset of Grasps with Object Contact and Hand Pose. In Constructive Side-Channel Analysis and Secure Design, Proceedings of the 11th International Workshop, COSADE 2020, Lugano, Switzerland, 1–3 April 2020; Springer International Publishing: Singapore, 2020; pp. 361–378. [Google Scholar]
98. Li, M.; Gao, Y.; Sang, N. Exploiting Learnable Joint Groups for Hand Pose Estimation. arXiv 2021, arXiv:2012.09496. [Google Scholar]
  99. Chen, L.; Lin, S.Y.; Xie, Y.; Tang, H.; Xue, Y.; Lin, Y.Y.; Xie, X.; Fan, W. Tagan: Tonality-alignment generative adversarial networks for realistic hand pose synthesis. In Proceedings of the 30th British Machine Vision Conference, BMVC, Cardiff, UK, 9–12 September 2019. [Google Scholar]
  100. Dai, S.; Liu, W.; Yang, W.; Fan, L.; Zhang, J. Cascaded Hierarchical CNN for RGB-Based 3D Hand Pose Estimation. Math. Probl. Eng. 2020, 2020, 1–13. [Google Scholar] [CrossRef]
101. Zhou, Y.; Habermann, M.; Xu, W.; Habibie, I.; Theobalt, C.; Xu, F. Monocular real-time hand shape and motion capture using multi-modal data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 5346–5355. [Google Scholar]
102. Deng, X.; Yang, S.; Zhang, Y.; Tan, P.; Chang, L.; Wang, H. Hand3D: Hand pose estimation using 3D neural network. arXiv 2017, arXiv:1704.02224. [Google Scholar]
103. Moon, G.; Chang, J.Y.; Lee, K.M. V2V-PoseNet: Voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5079–5088. [Google Scholar]
Figure 1. Flow chart of the proposed hand gesture recognition (HGR) system.
Figure 2. Enhanced and binary images of three gestures in the Sign Word dataset: (a) call, (b) close, and (c) correct.
Figure 3. Fused method results for three gestures in the Sign Word dataset: (a) call, (b) fine, and (c) correct.
Figure 4. Directional images of gestures from the Sign Word dataset: (a) close, (b) single, and (c) cold.
Figure 5. Wave propagation of geodesic distance via fast-marching algorithm (FMA) on the Sign Word dataset classes of (a) call, (b) single, and (c) fine.
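Figure 5 visualizes wave propagation of geodesic distance computed with the fast-marching algorithm on hand silhouettes. As a rough illustration of the idea only, and not the authors' FMA implementation, the sketch below approximates geodesic distance inside a binary hand mask by breadth-first wave propagation from a hypothetical palm-center seed; a true fast-marching solver would instead solve the Eikonal equation and produce sub-pixel distances.

```python
import numpy as np
from collections import deque

def geodesic_distance(mask, seed):
    """Breadth-first wave propagation of distance from `seed`, restricted to the
    foreground of a binary mask (a coarse stand-in for fast marching)."""
    h, w = mask.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and np.isinf(dist[nr, nc]):
                dist[nr, nc] = dist[r, c] + 1.0  # unit steps; FMA would yield sub-pixel values
                queue.append((nr, nc))
    return dist

# Hypothetical usage: hand_mask is a binary silhouette, palm_seed its centroid pixel.
hand_mask = np.zeros((8, 8), dtype=bool)
hand_mask[2:7, 2:6] = True
palm_seed = (4, 3)
wave = geodesic_distance(hand_mask, palm_seed)
```

Pixels with the largest propagated distance lie farthest (along the silhouette) from the seed, which is one common way such a wave front is read when localizing extremities.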
Figure 6. Landmark detection on the Sign Word dataset with (a) call, (b) fine, and (c) please hand gestures.
Figure 7. Distance features computed on the Sign Word dataset classes (a) call, (b) please, and (c) fine.
Figure 8. Angular feature extraction from triangles drawn on two classes of the Sign Word dataset: (a) call, (b) angle description, and (c) fine.
Figure 9. (a) Geometric feature collected from irregular shapes of the Sign Word dataset class call; (b) irregular shape divided into different sized triangles; (c) single triangle side representation.
Figure 11. (a) SONG on the call gesture with the outer region selected as a feature; (b) SONG on the fine gesture with the outer region selected as a feature.
Figure 12. (a) Active model feature extraction and the direction of the extracted points; (b) the extracted points.
Figure 13. Cases of full-hand feature extraction: (a) three cases of 45° change in direction; (b) three cases of 90° change in direction.
Figure 14. Gray wolf optimization best solution on the Sign Word dataset.
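Figure 14 reports the best solution found by gray wolf optimization (GWO) on the Sign Word feature set. As background only, the snippet below is a generic sketch of the canonical GWO update (minimization form, with the coefficient a decaying linearly from 2 to 0), not the feature-optimization variant used in this work; the objective function, dimensionality, and bounds are illustrative assumptions.

```python
import numpy as np

def gwo(objective, dim, bounds, n_wolves=20, iters=100, seed=0):
    """Canonical grey wolf optimizer: wolves move toward the three best
    solutions (alpha, beta, delta) found so far and the average pull is taken."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    for t in range(iters):
        fitness = np.array([objective(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]
        a = 2 - 2 * t / iters                        # decays from 2 to 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - wolves[i])   # distance to the leader
                new_pos += (leader - A * D) / 3.0    # average of the three pulls
            wolves[i] = np.clip(new_pos, lo, hi)
    fitness = np.array([objective(w) for w in wolves])
    return wolves[np.argmin(fitness)]

# Hypothetical usage: minimize a sphere function in 5 dimensions.
best = gwo(lambda x: float(np.sum(x ** 2)), dim=5, bounds=(-1.0, 1.0))
```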
Figure 15. Representation of the basic units of a genetic model.
Figure 16. Flow chart of the reweighted genetic model for HGR.
Table 1. Comparison of detection accuracy for the Sign Word dataset.
Classes | Fused Method Accuracy (%) | Directional Image Accuracy (%) | Classes | Fused Method Accuracy (%) | Directional Image Accuracy (%)
call | 92 | 69 | ok | 92 | 68
close | 91 | 62 | please | 93 | 58
cold | 92 | 67 | single | 92 | 63
correct | 92 | 60 | sit | 92.5 | 52
fine | 92 | 66 | tall | 91 | 56
help | 92 | 59 | wash | 92 | 62
home | 91 | 55 | work | 91 | 61
like | 92 | 65 | yes | 91 | 60
love | 91.5 | 65 | you | 92 | 62
no | 91 | 66 | iLoveYou | 92 | 68
Table 2. Descriptions of datasets used for evaluation and experimentation.
Name of Dataset | Type of Input Data | Gesture Classes
Sign Word | RGB images | This dataset contains 20 isolated hand gestures (11 single-hand and 9 double-hand gestures): call, close, cold, correct, fine, help, home, like, love, no, ok, please, single, sit, tall, wash, work, yes, you, and iLoveYou. The images were collected at a resolution of 200 × 200 pixels; three volunteers (mean age 25) performed the Sign Word gestures [85].
Dexter1 | RGB frames from five Sony DFW-V500 RGB cameras at 25 fps | Dexter1 consists of seven sequences performed with an actor's right hand: abduction–adduction (adbadd), finger count, finger wave, flexion–extension (flexex1), pinch, random, and tiger grasp. Roughly the first 250 frames of each sequence are slow motions, while the remaining frames are fast motions [86].
Dexter + Object | RGB frames from a Creative Senz3D color camera | Dexter + Object is a dataset for evaluating algorithms for joint hand and object tracking. It consists of six sequences (grasp1, grasp2, pinch, rigid, rotate, and occlusion) performed by two actors (one female) with varying interactions with a simple object shape [87].
STB | RGB and RGB-D frames | The STB dataset contains 18,000 images with ground truth. Six people performed counting and random poses against different backgrounds [88].
NYU | RGB-D data with ground-truth images | The NYU hand pose dataset consists of 72,757 training and 8252 test frames of RGB-D and RGB images. The training set contains a single user, while the test set contains two users with different hand poses [89].
Table 3. Confusion matrix of accuracy scores for gesture classification for the Sign Word dataset.
Classes | C 1 | CL 2 | CO 3 | CR 4 | F 5 | H 6 | HM 7 | L 8 | LV 9 | N 10 | O 11 | P 12 | S 13 | ST 14 | T 15 | WA 16 | WO 17 | Y 18 | U 19 | IL 20
C 1 | 0.99 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
CL 2 | 0.00 | 0.96 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
CO 3 | 0.01 | 0.00 | 0.87 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.03 | 0.05 | 0.00 | 0.00 | 0.00
CR 4 | 0.00 | 0.00 | 0.00 | 0.97 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00
F 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.99 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
H 6 | 0.03 | 0.00 | 0.01 | 0.00 | 0.00 | 0.85 | 0.01 | 0.00 | 0.02 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.02 | 0.03 | 0.00 | 0.00 | 0.00
HM 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.95 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00
L 8 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02
LV 9 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.94 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00
N 10 | 0.01 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.91 | 0.01 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 0.02 | 0.00
O 11 | 0.02 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.03 | 0.00 | 0.03 | 0.86 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00
P 12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
S 13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01
ST 14 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.02 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.85 | 0.00 | 0.07 | 0.02 | 0.00 | 0.00 | 0.00
T 15 | 0.00 | 0.04 | 0.01 | 0.00 | 0.00 | 0.02 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.89 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00
WA 16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.87 | 0.08 | 0.00 | 0.00 | 0.00
WO 17 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.89 | 0.02 | 0.00 | 0.00
Y 18 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.88 | 0.07 | 0.00
U 19 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.08 | 0.89 | 0.00
IL 20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.97
1 call, 2 close, 3 cold, 4 correct, 5 fine, 6 help, 7 home, 8 like, 9 love, 10 no, 11 ok, 12 please, 13 single, 14 sit, 15 tall, 16 wash, 17 work, 18 yes, 19 you, 20 iLoveYou.
Table 4. Confusion matrix of accuracy scores for gesture classification for the Dexter1 dataset.
Gesture Classes | AD 1 | FC 2 | FW 3 | F 4 | P 5 | R 6 | TG 7
AD 1 | 0.95 | 0.00 | 0.00 | 0.05 | 0.00 | 0.00 | 0.00
FC 2 | 0.03 | 0.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02
FW 3 | 0.00 | 0.04 | 0.94 | 0.02 | 0.00 | 0.00 | 0.00
F 4 | 0.03 | 0.00 | 0.02 | 0.95 | 0.00 | 0.00 | 0.00
P 5 | 0.01 | 0.01 | 0.00 | 0.00 | 0.92 | 0.00 | 0.06
R 6 | 0.00 | 0.03 | 0.05 | 0.00 | 0.03 | 0.89 | 0.00
TG 7 | 0.00 | 0.00 | 0.00 | 0.02 | 0.07 | 0.00 | 0.91
1 adbadd, 2 fingercount, 3 fingerwave, 4 flexex1, 5 pinch, 6 random, 7 tigergrasp.
Table 5. Confusion matrix of accuracy scores for gesture classification for the Dexter + Object dataset.
Classes | G 1 | GR 2 | O 3 | P 4 | R 5 | RO 6 (columns: predicted gesture classes)
G 1 | 0.91 | 0.07 | 0.00 | 0.00 | 0.00 | 0.02
GR 2 | 0.09 | 0.89 | 0.00 | 0.00 | 0.01 | 0.01
O 3 | 0.00 | 0.00 | 0.85 | 0.00 | 0.09 | 0.06
P 4 | 0.06 | 0.05 | 0.00 | 0.84 | 0.03 | 0.02
R 5 | 0.00 | 0.01 | 0.06 | 0.00 | 0.92 | 0.01
RO 6 | 0.00 | 0.00 | 0.00 | 0.05 | 0.07 | 0.88
1 grasp1, 2 grasp2, 3 occlusion, 4 pinch, 5 rigid, 6 rotate.
Table 6. Mean accuracy for gesture classification of datasets.
Datasets | Mean Accuracy (%)
Sign Word | 92.1
Dexter1 | 93.1
Dexter + Object | 88.2
STB | 90.8
NYU | 85.3
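The mean accuracies in Table 6 are consistent with averaging the per-class (diagonal) entries of the corresponding confusion matrices in Tables 3–5. A minimal sketch of that check, assuming the printed matrices are row-normalized:

```python
import numpy as np

def mean_class_accuracy(confusion):
    """Average of the diagonal of a row-normalized confusion matrix,
    i.e., the mean of the per-class recognition accuracies."""
    return float(np.mean(np.diag(np.asarray(confusion, dtype=float))))

# Diagonal of Table 5 (Dexter + Object): grasp1, grasp2, occlusion, pinch, rigid, rotate.
dexter_object = np.diag([0.91, 0.89, 0.85, 0.84, 0.92, 0.88])
print(round(100 * mean_class_accuracy(dexter_object), 1))  # 88.2, matching Table 6
```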
Table 7. Test results of the three classifiers using the Sign Word dataset.
Classifier | Accuracy | Precision | Recall | F1
Decision tree | 0.9142 | 0.9012 | 0.8412 | 0.8701
ANN | 0.8924 | 0.8214 | 0.8516 | 0.8362
Genetic algo | 0.9212 | 0.8817 | 0.8833 | 0.8824
Table 8. Test results of the three classifiers using the Dexter1 dataset.
Classifier | Accuracy | Precision | Recall | F1
Decision tree | 0.9212 | 0.9102 | 0.8702 | 0.8897
ANN | 0.9024 | 0.8313 | 0.8714 | 0.8508
Genetic algo | 0.9312 | 0.8927 | 0.8923 | 0.8924
Table 9. Test results of the three classifiers using the Dexter + Object dataset.
Classifier | Accuracy | Precision | Recall | F1
Decision tree | 0.9021 | 0.9012 | 0.8412 | 0.8701
ANN | 0.8761 | 0.7915 | 0.8315 | 0.8110
Genetic algo | 0.8822 | 0.8315 | 0.8012 | 0.8160
Table 10. Test results of the three classifiers using the STB dataset.
Classifier | Accuracy | Precision | Recall | F1
Decision tree | 0.8901 | 0.8542 | 0.8612 | 0.8576
ANN | 0.8912 | 0.8612 | 0.8415 | 0.8512
Genetic algo | 0.9081 | 0.8522 | 0.8724 | 0.8621
Table 11. Test results of the three classifiers using the NYU dataset.
Classifier | Accuracy | Precision | Recall | F1
Decision tree | 0.8641 | 0.8414 | 0.8213 | 0.8312
ANN | 0.8421 | 0.8321 | 0.8101 | 0.8214
Genetic algo | 0.8532 | 0.8462 | 0.8387 | 0.8424
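Tables 7–11 report accuracy, precision, recall, and F1 for the three classifiers on each dataset. As a minimal sketch of how such scores can be derived from predicted and true labels (assuming macro averaging over classes, which the tables do not state explicitly), the snippet below uses only NumPy; the toy labels in the usage line are illustrative.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 from integer label arrays."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: true class, columns: predicted class
    tp = np.diag(cm)
    accuracy = tp.sum() / cm.sum()
    precision = np.divide(tp, cm.sum(axis=0), out=np.zeros(n_classes), where=cm.sum(axis=0) > 0)
    recall = np.divide(tp, cm.sum(axis=1), out=np.zeros(n_classes), where=cm.sum(axis=1) > 0)
    f1 = np.divide(2 * precision * recall, precision + recall,
                   out=np.zeros(n_classes), where=(precision + recall) > 0)
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Hypothetical usage with toy labels for three classes.
acc, prec, rec, f1 = classification_metrics([0, 1, 2, 2, 1], [0, 1, 2, 1, 1], 3)
```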
Table 12. Result comparison with other state-of-the-art methods on all five datasets.
Dataset | Feature Extraction Method | Authors | Recognition Accuracy (%)
Sign Word | Point-based | Vaitkevičius et al. [90] | 86.1
Sign Word | Full-hand | Ahlawat et al. [91] | 90
Sign Word | Full-hand | Wang et al. [92] | 92
Sign Word | Point-based + full-hand | Proposed methodology | 92.1
Dexter1 | Point-based | Cai [93] | 88
Dexter1 | Full-hand | Imashev [94] | 86
Dexter1 | Full-hand | Shan et al. [95] | 89
Dexter1 | Point-based + full-hand | Proposed methodology | 93.1
Dexter + Object | Point-based | Spurr et al. [96] | 85
Dexter + Object | Point-based | Brahmbhatt et al. [97] | 86.49
Dexter + Object | Full-hand | Li et al. [98] | 84
Dexter + Object | Point-based + full-hand | Proposed methodology | 88.2
STB | Point-based | Chen et al. [99] | 75
STB | Point-based | Dai et al. [100] | 77
STB | Full-hand | Zhou et al. [101] | 89
STB | Point-based + full-hand | Proposed methodology | 90.8
NYU | Point-based | Deng et al. [102] | 74
NYU | Full-hand | Moon et al. [103] | 83.4
NYU | Point-based + full-hand | Proposed methodology | 85.3
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.