Article

A Hybrid Method Based on Extreme Learning Machine and Wavelet Transform Denoising for Stock Prediction

College of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
*
Author to whom correspondence should be addressed.
Submission received: 8 March 2021 / Revised: 2 April 2021 / Accepted: 7 April 2021 / Published: 9 April 2021

Abstract

Predicting stock trends is a major challenge. Accidental factors often cause short-term sharp fluctuations in stock markets, deviating from the original normal trend. These short-term fluctuations of stock prices are highly noisy, which hampers the prediction of stock trends. Therefore, we used discrete wavelet transform (DWT)-based denoising to denoise stock data, which helped us eliminate the influence of short-term random events on the continuous trend of a stock. The denoised data show more stable trend characteristics and greater smoothness. Extreme learning machine (ELM) is an effective training algorithm for fully connected single-hidden-layer feedforward neural networks (SLFNs); it converges quickly, yields a unique solution, and does not get trapped in local minima. Therefore, this paper proposes a method combining ELM and DWT-based denoising to predict stock trends. The proposed method was used to predict the trends of 400 stocks in China. The prediction results demonstrate the efficacy of DWT-based denoising for stock trend prediction, and show excellent performance compared to 12 machine learning algorithms (e.g., recurrent neural network (RNN) and long short-term memory (LSTM)).

1. Introduction

In the era of big data, deep learning for predicting stock prices [1] and trends has become more popular [2,3]. The fully connected feedforward neural network (FNN) possesses excellent performance and is superior to traditional time-series forecasting techniques (e.g., the autoregressive integrated moving average (ARIMA)) [4,5]. The FNN is mainly trained using the well-known backpropagation (BP) algorithm [6]. The traditional BP algorithm essentially optimizes parameters by gradient descent, and is accompanied by the problems of slow convergence and local minima. Extreme learning machine (ELM) is an effective training algorithm for single-hidden-layer feedforward neural networks (SLFNs). In ELM, hidden nodes are initialized randomly, and the network parameters at the input end are fixed without iterative tuning; the only parameter that needs to be learned is the connection (weight) matrix between the hidden layer and the output layer [7]. Theoretically, even with randomly generated hidden nodes, ELM maintains the universal approximation capability of SLFNs [8]. Compared with traditional FNN algorithms, the advantages of ELM in efficiency and generalization performance have been demonstrated on a wide range of problems in different fields [9]. ELM possesses faster learning and better generalization performance [10,11,12] than traditional gradient-based learning algorithms [13]. Due to its efficient learning ability, ELM is widely used in classification and regression problems [14,15]. In addition to traditional classification and regression tasks, ELM has recently been extended to clustering, feature selection, and representation learning [16]. For more research on ELM, please refer to the related literature [17,18,19,20,21,22,23,24,25].
In recent years, research on hybrid models for time-series prediction has become more popular [26]. Owing to the advantages of ELM, its use on time-series datasets has also gradually increased. A number of scholars have applied ELM to carry out feature engineering on time-series data [27], and ELM has been extensively utilized in various hybrid models for predicting time-series data. To discover the features of the original data, Yang et al. proposed an ELM-based recognition framework to deal with the recognition problem [28]. Li et al. proposed the design and architecture of a trading-signal mining platform that uses ELM to simultaneously predict stock prices based on market news and stock price datasets [29]. Wang et al. introduced a new mutual information-based sentiment analysis method and employed ELM to improve the prediction accuracy and accelerate the prediction of the proposed model [30]. Jiang et al. combined empirical mode decomposition, ELM, and an improved harmony search algorithm to establish a two-stage ensemble model for stock price prediction [31]. Tang et al. optimized the ELM with the differential evolution algorithm to construct a new hybrid predictive model [32]. Weng et al. presented an improved genetic algorithm regularized online ELM to predict the gold price [33]. Jiang et al. proposed a hybrid approach consisting of pigeon-inspired optimization (PIO) and ELM based on wavelet packet analysis (WPA) for predicting bulk commodity prices; that hybrid model performed better in terms of horizontal precision, directional precision, and robustness [34]. Khuwaja et al. presented a framework to predict stock price movement using phase space reconstruction (PSR) and ELM, and the results of the proposed framework were compared with a conventional machine learning pipeline as well as baseline methods [35]. Jeyakarthic and Punitha introduced a new method based on multi-core ELM for forecasting stock market returns [36]. Xu et al. presented a new carbon price prediction model using time-series complex network analysis and ELM [37]. In addition to using ELM for feature processing and hybrid models, a number of studies have modified ELM to make it better suited to a variety of practical scenarios. Wang et al. introduced a non-convex loss function and developed a robust regularized ELM, emphasizing the key problem of low efficiency [38]. Guo et al. presented a robust adaptive online sequential ELM-based algorithm for online modeling and prediction of non-stationary data streams with outliers [39].
The research on stock prediction is inseparable from the efficient market hypothesis [40]. There are many studies on market efficiency [41]. The general conclusion is that a market in which stock trends can be predicted must be an inefficient market; otherwise, it would be impossible to predict stock trends. Relevant studies have pointed out that China's stock market is a relatively emerging market [42,43], which is relatively inefficient. Therefore, it is feasible to use machine learning models to predict stock trends through transaction data analysis. In order to predict stock trends in an inefficient market, we need to consider the influence of noise on trend prediction. Because various accidental factors affect the financial market, the impact of noise on financial time-series data has drawn scholars' attention. To our knowledge, noise often distorts investors' judgments of stock trends and seriously affects further analysis and processing. However, financial time-series data possess non-stationary and non-linear characteristics, and traditional denoising methods are often accompanied by several defects. The traditional methods of denoising financial time-series data mainly include the moving average (MA) [44], Kalman filter [45], Wiener filter [46], and fast Fourier transform (FFT) [47]. As a simple data smoothing technique, the MA processes the time-series data only approximately; in the denoising process, it may also discard useful information. The FFT generally treats high-frequency signals as noise, sets all Fourier coefficients above a certain threshold frequency to zero, and then converts the data back to the time domain through the inverse Fourier transform to achieve denoising [47]. Because financial data have a low signal-to-noise ratio and contain many useful high-frequency signals, the FFT has difficulty achieving effective denoising. The Kalman filter is a recursive estimator that estimates the state of a system by minimizing the mean-square error of the residual [48]. As financial time-series data are non-stationary and nonlinear, it is difficult to describe their state and behavior with a definite equation. In signal processing, the Wiener filter produces an estimate of a desired or target random process by linear time-invariant (LTI) filtering of an observed noisy process, assuming known stationary signal and noise spectra and additive noise; it minimizes the mean square error between the estimated and desired processes. Wavelet analysis is a mathematical technique that can decompose a signal into multiple lower-resolution levels by controlling the scaling and shifting factors of a single wavelet function. The FFT has no locality, whereas the wavelet transform not only has locality but also scaling parameters that change the spectrum structure and the shape of the window, so that wavelet analysis achieves multi-resolution analysis [49]. Wavelet methods can decompose a noisy signal into different scales and remove the noise while preserving the signal, regardless of its frequency content. Wavelet transforms were developed to meet the requirements of time-frequency localization. They possess adaptive properties and are particularly appropriate for processing non-stationary and non-linear signals [50]. A recent study employed the wavelet transform to reduce the amount of noisy data [51]. Xu et al. proposed a novel method based on a wavelet-denoising algorithm and multiple echo state networks to improve the prediction accuracy of noisy multivariate time series [52]. Bao et al. developed a deep learning framework combining wavelet transform, stacked autoencoders, and long short-term memory (LSTM) for stock price prediction [53]. Yang et al. presented an image processing method based on the wavelet transform for big-data analysis of historical data of individual stocks, obtaining images of individual stock volatility data [54]. Lahmiri introduced a new method combining the stationary wavelet transform and Tsallis entropy for empirical analysis of return series [55]. Li and Tang proposed the WT-FCD-MLGRU model, a combination of wavelet transform, filter cycle decomposition, and multi-lag neural networks, which achieved the minimum forecasting error in stock index prediction [56]. Wen et al. used the wavelet transform to extract features of the Shanghai Composite Index and the S&P Index to study the relationship between China's stock market and international commodities [57]. Mohammed et al. employed the continuous wavelet transform and the wavelet coherence method to study the relationship between stock indices in different markets [58]. Yang applied the differential method to highlight the trend of stock price changes; to further suppress the influence of noisy stock data, they employed the wavelet transform to decompose the stock data into principal and detailed components [59]. Xu et al. presented the design and implementation of a stacked system to predict stock prices; their model used the wavelet transform to reduce the noise of market data, and a stacked autoencoder to filter unimportant features from the preprocessed data [60]. He et al. proposed a new shrinkage (threshold) function to improve the performance of wavelet shrinkage denoising [61]. Yu et al. used wavelet coherence and wavelet phase difference based on the continuous wavelet transform to quantitatively analyze the correlation of stock returns in the time-frequency domain [62]. Faraz and Khaloozadeh applied the wavelet transform to reduce the noise in stock indices and smooth the data [63,64]. Chen utilized a recursive adaptive separation algorithm based on the discrete wavelet transform (DWT) to denoise data [65]. Li et al. proposed a hybrid model based on wavelet transform denoising, ELM, and k-nearest neighbor regression optimization for stock prediction [66]. There are several other similar studies as well [67,68,69,70,71,72,73,74,75]. In essence, related research has improved time-series prediction through the wavelet transform.
Given the applicability of the wavelet transform to financial data, the need of the continuous-trend-based labeling method for smooth data, and the advantages of ELM, we propose a hybrid method for stock trend prediction based on ELM and wavelet transform denoising.
This paper is organized as follows:
In the second section, the theories related to ELM and wavelet transform are introduced, and then, the hybrid method DELM (the combined model of wavelet transform denoising and ELM) is described.
In the third section, an overview of the continuous trend labeling method based on time-series data is presented, and the stock datasets used in this paper are introduced. The statistical metrics used for evaluation of experimental results are described, and the reasons for choosing those metrics are discussed.
In the fourth section, the differences between the denoised data and the raw data are compared. Besides, the stationarity testing of the denoised data after feature preprocessing is carried out. The difference in the labeling effect after the combination of the labeling method and DWT-based denoising is analyzed. The prediction results of ELM and DELM are compared. The prediction results of the DELM and other commonly used algorithms are compared.
The descriptions of the related abbreviations are listed in Appendix A Table A1.

2. Methods

In this section, we briefly introduce the theory of ELM and the wavelet transform, and then describe the proposed DELM method.

2.1. ELM

ELM is an algorithm developed for SLFNs [76]. In ELM, the input-layer weights are randomly assigned, and the output-layer weights are obtained using the generalized inverse of the hidden-layer output matrix. Traditional feedforward neural network training algorithms are slow, easily fall into local minima, and are sensitive to the choice of learning rate. In contrast, the ELM algorithm randomly generates the connection weights of the input layer and the thresholds of the hidden-layer neurons, and these weights need not be adjusted during training. Only the number of hidden-layer neurons needs to be set, after which a unique optimal solution is obtained. Compared with traditional training algorithms, ELM offers fast learning and good generalization performance. It is appropriate not only for regression and fitting problems, but also for classification [77,78] and pattern recognition. The structure of a typical SLFN is shown in Figure 1. It consists of an input layer, a hidden layer, and an output layer, which are fully connected.
There are n neurons in the input layer, corresponding to the n input variables; l neurons in the hidden layer; and m neurons in the output layer, corresponding to the m output variables. W denotes the weight matrix linking the input layer and the hidden layer, where w_{ij} is the weight between the i-th neuron in the hidden layer and the j-th neuron in the input layer. The weight matrix linking the hidden layer and the output layer is β, where β_{jk} is the weight connecting the j-th neuron in the hidden layer and the k-th neuron in the output layer, and b represents the thresholds of the hidden-layer neurons. W, β, and b are shown in Equation (1).
$$W=\begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n}\\ w_{21} & w_{22} & \cdots & w_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ w_{l1} & w_{l2} & \cdots & w_{ln} \end{bmatrix}_{l\times n},\quad \beta=\begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1m}\\ \beta_{21} & \beta_{22} & \cdots & \beta_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ \beta_{l1} & \beta_{l2} & \cdots & \beta_{lm} \end{bmatrix}_{l\times m},\quad b=\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_l \end{bmatrix}_{l\times 1} \tag{1}$$
For a training set of N samples, the input matrix is X, the output matrix is Y, and T is the expected output matrix (see Equation (2)).
$$X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N}\\ x_{21} & x_{22} & \cdots & x_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{nN} \end{bmatrix}_{n\times N},\quad Y=\begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1N}\\ y_{21} & y_{22} & \cdots & y_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ y_{m1} & y_{m2} & \cdots & y_{mN} \end{bmatrix}_{m\times N},\quad T=\begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1N}\\ t_{21} & t_{22} & \cdots & t_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ t_{m1} & t_{m2} & \cdots & t_{mN} \end{bmatrix}_{m\times N} \tag{2}$$
The goal of ELM when learning an SLFN is to minimize the output error, which is expressed in Equation (3).
$$\sum_{j=1}^{N}\left\|y_j-t_j\right\|=0,\quad t_j=[t_{1j},t_{2j},\dots,t_{mj}]^{\mathrm T},\quad y_j=[y_{1j},y_{2j},\dots,y_{mj}]^{\mathrm T} \tag{3}$$
That is, there exist w_i, β_i, and b_i such that Equation (4) holds, which can be written compactly as Equation (5).
$$\sum_{i=1}^{l}\beta_i\, g(w_i\cdot x_j+b_i)=t_j,\quad j=1,\dots,N \tag{4}$$
$$H\beta=T \tag{5}$$
Expanding Equations (4) and (5) into Equations (6) and (7) gives the specific form of the network output, as well as the specific form of H.
$$t_j=\begin{bmatrix} t_{1j}\\ t_{2j}\\ \vdots\\ t_{mj} \end{bmatrix}=\begin{bmatrix} \sum_{i=1}^{l}\beta_{i1}\,g(w_i\cdot x_j+b_i)\\ \sum_{i=1}^{l}\beta_{i2}\,g(w_i\cdot x_j+b_i)\\ \vdots\\ \sum_{i=1}^{l}\beta_{im}\,g(w_i\cdot x_j+b_i) \end{bmatrix}\ (j=1,2,\dots,N),\quad w_i=[w_{i1},w_{i2},\dots,w_{in}],\quad x_j=[x_{1j},x_{2j},\dots,x_{nj}]^{\mathrm T} \tag{6}$$
$$H(w_1,\dots,w_l,b_1,\dots,b_l,x_1,\dots,x_N)=\begin{bmatrix} g(w_1\cdot x_1+b_1) & g(w_2\cdot x_1+b_2) & \cdots & g(w_l\cdot x_1+b_l)\\ g(w_1\cdot x_2+b_1) & g(w_2\cdot x_2+b_2) & \cdots & g(w_l\cdot x_2+b_l)\\ \vdots & \vdots & \ddots & \vdots\\ g(w_1\cdot x_N+b_1) & g(w_2\cdot x_N+b_2) & \cdots & g(w_l\cdot x_N+b_l) \end{bmatrix}_{N\times l} \tag{7}$$
In order to train an SLFN, we seek $\hat{w}_i$, $\hat{b}_i$, and $\hat{\beta}_i$ such that Equation (8) holds; this minimizes the loss function in Equation (9).
$$\left\|H(\hat{w}_i,\hat{b}_i)\hat{\beta}_i-T\right\|=\min_{w,b,\beta}\left\|H(w_i,b_i)\beta_i-T\right\|,\quad i=1,\dots,l \tag{8}$$
$$E=\sum_{j=1}^{N}\left(\sum_{i=1}^{l}\beta_i\,g(w_i\cdot x_j+b_i)-t_j\right)^{2} \tag{9}$$
Some traditional algorithms based on gradient descent can be used to solve such problems, but gradient-based learning algorithms must iteratively adjust all parameters. For the ELM algorithm, once the input weights and the hidden-layer biases are randomly determined, the output matrix of the hidden layer is uniquely determined, and the output weights can be solved analytically; i.e., the ELM algorithm randomly assigns the input weights and hidden-layer biases rather than tuning all parameters, as, e.g., a backpropagation neural network (BPNN) does. Through the proofs of Theorems 2.1 and 2.2 in [76], the minimum-norm solution for the weights is given, which is simple to implement and fast in prediction. Therefore, W and b are randomly selected and remain unchanged during training, and β is obtained as the least-squares solution of Equation (10), given by Equation (11).
$$\min_{\beta}\left\|H\beta-T\right\| \tag{10}$$
$$\hat{\beta}=H^{+}T \tag{11}$$
where H⁺ is the Moore–Penrose generalized inverse of the hidden-layer output matrix H.
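To make the training procedure above concrete, the following is a minimal sketch of an ELM in Python with NumPy. It is an illustration of Equations (5)–(11), not the authors' implementation or the toolbox used in Section 4.4; the sigmoid activation and the way labels are thresholded are assumptions.

```python
import numpy as np

class SimpleELM:
    """Minimal single-hidden-layer ELM: random, frozen input weights W and
    biases b; only the output weights beta are learned (Equation (11))."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_hidden, n_inputs))  # fixed input weights
        self.b = rng.standard_normal(n_hidden)              # fixed hidden biases
        self.beta = None

    def _hidden(self, X):
        # Hidden-layer output matrix H with sigmoid activation g(w.x + b);
        # rows of X are samples, so H has shape (N, l) as in Equation (7).
        return 1.0 / (1.0 + np.exp(-(X @ self.W.T + self.b)))

    def fit(self, X, T):
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T  # beta = H+ T (Moore-Penrose inverse)
        return self

    def predict(self, X):
        # Continuous network output; threshold it (e.g., at 0.5) for labels.
        return self._hidden(X) @ self.beta
```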

2.2. Wavelet Analysis

The wavelet transform is a mathematical approach that gives a time-frequency representation of a signal, with the possibility to adjust the time-frequency resolution. It can simultaneously represent functions and manifest their local characteristics in the time-frequency domain; these characteristics facilitate training neural networks accurately on strongly nonlinear signals. It is a time-frequency localized analysis method in which the size of the window is fixed while its shape may change [79]. In other words, it has lower time resolution and higher frequency resolution in the low-frequency band [80], and higher time resolution and lower frequency resolution in the high-frequency band, making it highly appropriate for analyzing non-stationary signals and extracting their local characteristics. The wavelet transform includes the continuous wavelet transform (CWT) and the DWT [81]. The wavelet transform translates a function called the basic wavelet by a parameter τ, scales it by a scaling parameter a, and takes the inner product with the signal x(t), as formulated in Equation (12), where a > 0 is the scaling factor used to scale the function φ(t), and τ is used to translate it. Both a and τ are continuous variables; thus, Equation (12) is called the CWT [82]. The DWT decomposes a given signal into a number of sets, where each set is a time series of coefficients describing the time evolution of the signal in the corresponding frequency band [83]. The DWT discretizes a signal according to a power series of the scaling parameter, and is often used for signal decomposition and reconstruction [84]. The DWT constructs a scaling function vector group and a wavelet function vector group at different scales and time periods, i.e., the scaling function vector space and the wavelet function vector space [85]. At a certain level, the signal convolved in the scaling space is the approximate, low-frequency information of the raw signal (e.g., the "cA" component in Figure 2), and the signal convolved in the wavelet space is the detailed, high-frequency information of the raw signal (e.g., the "cD" component in Figure 2). The DWT has two important functions: the scaling function, shown in Equation (13), and the wavelet function, presented in Equation (14) [79].
$$WT_x(a,\tau)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty}x(t)\,\varphi\!\left(\frac{t-\tau}{a}\right)dt \tag{12}$$
$$\phi_{jk}(t)=2^{j/2}\,\phi(2^{j}t-k),\quad j,k\in\mathbb{Z} \tag{13}$$
$$\psi_{jk}(t)=2^{j/2}\,\psi(2^{j}t-k),\quad j,k\in\mathbb{Z} \tag{14}$$
The signal passes through a decomposition high-pass filter and a decomposition low-pass filter. The high-pass filter outputs the high-frequency component of the signal, called the detail component; the low-pass filter outputs the relatively low-frequency component, called the approximate component [86]. In general, the short-term volatility of a stock is affected by various information and has the character of short-term noise. The labeling method used in the third part of this paper labels the data and generates training datasets based on the continuous trend characteristics of stocks; therefore, noisy data may have an obvious impact on the labeling process. We would like the continuous trend characteristics of a stock to be relatively stable, with short-term noise filtered out during data labeling, especially since the noise is Gaussian white noise in the majority of cases. Thus, we denoised the raw data with the wavelet transform and labeled the denoised data to generate the training datasets. Using the DWT to denoise stock data generally requires selecting the wavelet basis, the number of decomposition levels, and the threshold [79]. The DWT-based denoising process is illustrated in Figure 2. To select the wavelet function, we ran the whole experimental pipeline on dozens of stocks with different wavelet functions; the final trend prediction results were best with the db8 wavelet. We therefore set the wavelet function to db8 with a threshold parameter of 0.04.
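As an illustration of this denoising step, the sketch below uses the PyWavelets library with the db8 wavelet named above. The decomposition level and the use of soft thresholding scaled by the 0.04 threshold parameter are our assumptions; the paper does not specify how the threshold is applied.

```python
import numpy as np
import pywt

def dwt_denoise(prices, wavelet="db8", level=4, threshold=0.04):
    """Decompose the price series, shrink the detail (cD) coefficients,
    and reconstruct; the approximation (cA) coefficients are kept intact.
    level=4 and the relative soft threshold are assumptions."""
    coeffs = pywt.wavedec(prices, wavelet, level=level)
    denoised = [coeffs[0]]  # cA: low-frequency trend, left untouched
    for c in coeffs[1:]:    # cD: high-frequency details, soft-thresholded
        denoised.append(pywt.threshold(c, threshold * np.max(np.abs(c)), mode="soft"))
    rec = pywt.waverec(denoised, wavelet)
    return rec[: len(prices)]  # waverec may pad one sample for odd lengths
```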

2.3. The Proposed Hybrid Method

The main purpose of the current research is to propose a hybrid method for stock prediction combining the ELM model and wavelet transform denoising. Hence, we first import the raw stock data, use the labeling method to label them, perform feature preprocessing on the raw data based on Equations (17) and (18), and then split the data into training, validation, and test datasets. The training datasets are used to train the proposed denoised ELM (DELM). Besides, we trained the ELM model and the 12 common algorithms on training datasets built from the raw data, and the results on the corresponding test sets were compared with those of the DELM to examine the positive influence of the DWT on the ELM classification results (i.e., C1 in Figure 3). The ELM-associated parameters are shown in Table 1. Finally, the results of the 12 common algorithms were compared with those of the DELM method (i.e., C2 in Figure 3).
We used the features that were extracted by DWT-based denoising to train the DELM, and the parameters used were consistent with those applied in training of the ELM model with the raw data. Table 1 summarizes parameters required for training of the ELM model.
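The workflow of Figure 3 can be summarized by the schematic sketch below. Every name in it is an illustrative placeholder for a step described in the text (the helper functions are sketched in the surrounding sections), and the hidden-layer size is hypothetical, as the contents of Table 1 are not reproduced here.

```python
# Schematic outline of the DELM pipeline (placeholders, not the authors' code).
raw = load_close_prices("000005")            # hypothetical loader for raw data
smooth = dwt_denoise(raw)                    # DWT denoising, db8 (Section 2.2)
y = trend_labels(smooth, omega=0.05)[10:]    # labels (Section 3.1), aligned with
X = rolling_features(smooth, lam=11)         # the feature rows of Eqs. (17)-(18)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = chronological_split(X, y)  # 70/15/15
delm = SimpleELM(n_inputs=11, n_hidden=100).fit(X_tr, y_tr)  # hidden size assumed
scores = delm.predict(X_te)                  # evaluate with Acc, P, R, F1, AUC
```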

3. Feature Engineering for Stock Trend Prediction

In this section, we mainly introduce the labeling method used to forecast stock trends; through this labeling method, we can clearly define the rising and falling trends of a stock. We also introduce the datasets used in this paper, the statistical metrics used to evaluate the experimental results, and the considerations behind selecting these metrics.

3.1. Labeling Method

In previous research, we proposed a labeling method based on the continuous trend characteristics of time-series data [87]. In the current study, this labeling method was used to label the stock data, and training and test datasets were then generated for predicting the trends of the corresponding stocks.
In this paper, training and test datasets were generated from the closing prices of the transaction data. First, we expand the closing-price series into vectors of historical data of length λ. The process is formulated in Equation (15), where x represents the original closing-price sequence, X denotes the matrix after dimension expansion, and each row of X is one vector. Equation (16) gives the labels of these vectors, which are calculated by Equation (22).
$$x=\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_{N-1}\\ x_N \end{bmatrix}\;\Rightarrow\; X=\begin{bmatrix} x_{\lambda} & x_{\lambda-1} & x_{\lambda-2} & \cdots & x_1\\ x_{\lambda+1} & x_{\lambda} & x_{\lambda-1} & \cdots & x_2\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ x_{N-1} & x_{N-2} & x_{N-3} & \cdots & x_{N-\lambda}\\ x_{N} & x_{N-1} & x_{N-2} & \cdots & x_{N-\lambda+1} \end{bmatrix} \tag{15}$$
$$y=[\,label_{\lambda},\ label_{\lambda+1},\ \dots,\ label_{N-1},\ label_{N}\,]^{\mathrm T} \tag{16}$$
After expanding the dimension of the raw closing-price data based on the parameter λ, we carry out basic feature processing on the expanded data, so that the processed features are stable, in line with standardization, as summarized in Equations (17) and (18), where x_{ij} is a closing-price entry of the matrix X, and M_{λs} denotes the mean value over the sliding window of length λ. In this way, the feature processing uses historical data only, and there is no look-ahead bias [88]. In the experiments section, we examine the stationarity of the processed data. λ was set to 11, consistent with [87]. A minimal code sketch of this preprocessing follows Equation (18).
$$f_{ij}=\frac{x_{ij}-M_{\lambda i}}{M_{\lambda i}},\quad x_{ij}\in X \tag{17}$$
$$M_{\lambda s}=\frac{1}{\lambda}\sum_{i=s}^{s+\lambda-1}x_i,\quad x_i\in x,\ s=1,2,\dots,N-\lambda+1 \tag{18}$$
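As referenced above, the following is a minimal sketch of Equations (17) and (18) in Python/NumPy. Note that the windows here are in ascending time order, whereas Equation (15) lists each window in reverse order; the normalization is the same either way.

```python
import numpy as np

def rolling_features(close, lam=11):
    """Normalize each length-lam window of closing prices by its own mean,
    f = (x - M) / M, so only historical data are used (no look-ahead bias)."""
    X = np.lib.stride_tricks.sliding_window_view(close, lam)  # shape (N-lam+1, lam)
    M = X.mean(axis=1, keepdims=True)                         # window mean M_lambda
    return (X - M) / M
```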
Then, the relative maxima and minima of the time-series data are defined in Equation (19) with respect to the fluctuation (labeling) parameter ω, and the continuous trend characteristics of the corresponding stock are calculated according to Equations (20) and (21). Finally, the labels of the data are obtained by Equation (22). (A simplified code sketch follows Equation (22).)
$$h=[\,h_1,\ h_2,\ \dots,\ h_{t-1},\ h_t\,],\quad l=[\,l_1,\ l_2,\ \dots,\ l_{m-1},\ l_m\,] \tag{19}$$
$$TD(h_i,l_{i-1})=\mathrm{abs}\!\left(\frac{h_i-l_{i-1}}{l_{i-1}}\right),\quad i>1 \tag{20}$$
$$TD(l_i,h_{i-1})=\mathrm{abs}\!\left(\frac{l_i-h_{i-1}}{h_{i-1}}\right),\quad i>1 \tag{21}$$
$$x_{label}=\begin{cases} 1, & \text{if } x\in\{x \mid l_{i-1}\le x_0\le h_i,\ TD(h_i,l_{i-1})\ge\omega,\ i=2,3,\dots,t;\ x_0\in\{x_j\},\ j=0,1,\dots,\lambda\}\\[4pt] 0, & \text{if } x\in\{x \mid h_{i-1}\le x_0\le l_i,\ TD(l_i,h_{i-1})\ge\omega,\ i=2,3,\dots,m;\ x_0\in\{x_j\},\ j=0,1,\dots,\lambda\} \end{cases} \tag{22}$$
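To make the labeling idea concrete, below is a hedged sketch of a simplified zigzag-style variant of Equations (19)–(22): the trend direction flips whenever the retracement TD from the latest extreme reaches ω, and every point in a confirmed up-segment is labeled 1 and in a down-segment 0. It omits details of the authors' exact formulation (e.g., the role of the window offset x₀ and λ), so it is an approximation, not their code.

```python
import numpy as np

def trend_labels(prices, omega=0.05):
    """Label 1 on confirmed up-segments, 0 on down-segments; the trailing,
    not-yet-confirmed segment keeps the default label 0 in this sketch."""
    labels = np.zeros(len(prices), dtype=int)
    extreme, extreme_i, start, up = prices[0], 0, 0, True
    for i, p in enumerate(prices):
        if up:
            if p > extreme:                         # new relative high h_i
                extreme, extreme_i = p, i
            elif (extreme - p) / extreme >= omega:  # TD >= omega: trend flips
                labels[start:extreme_i + 1] = 1     # close the up-segment
                start, extreme, extreme_i, up = extreme_i, p, i, False
        else:
            if p < extreme:                         # new relative low l_i
                extreme, extreme_i = p, i
            elif (p - extreme) / extreme >= omega:
                labels[start:extreme_i + 1] = 0     # close the down-segment
                start, extreme, extreme_i, up = extreme_i, p, i, True
    return labels
```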

3.2. Datasets

The stock data used in the current research come from a pool of hundreds of stocks in the Shanghai and Shenzhen stock markets in China, covering various industries. The trading data span 1 January 2001 to 3 December 2020, approximately 20 years. The transaction data of each trading day are taken as the raw data, including stock code, opening price, highest price, lowest price, closing price, trading volume, etc. Suspended or newly listed stocks were deleted, and 400 stocks with more than 4000 rows of data were screened out as our dataset. The data are from https://tushare.pro/ (accessed on 2 January 2021), downloadable in the sub-category of “Backward Answer Authority Quotes” under the category of “Quotes Data”. The data can also be downloaded for free from https://github.com/justbeat99/400_stocks_data_zips.git (accessed on 2 January 2021). After downloading the raw data, we performed feature preprocessing according to Equations (17) and (18), labeled the data according to Equations (20)–(22) to generate labeled datasets, and segmented each stock’s data chronologically: the first 70% of the dates for the training dataset, the middle 15% for the validation dataset, and the last 15% for the test dataset. As a result, labeled training, validation, and test datasets for the 400 stocks were obtained. We checked the balance of positive and negative samples on the training, validation, and test datasets of these 400 stocks and found that all the datasets were relatively balanced. The balance table is provided in the Supplementary Materials, and Appendix A Table A2 lists the balance for some stocks. As can be seen from Table A2, the training datasets contain roughly half positive and half negative samples, i.e., they are all relatively balanced. Among the validation datasets, the proportion of positive samples is 39% for 000005, 34% for 000025, and 31% for 000520, which are imbalanced. Among the test datasets, the proportion of positive samples is 35% for 000055, 37% for 000068, and 39% for 000523; the balance in these cases is slightly worse. However, these cases occur only in the validation or test datasets of a small number of stocks, and their impact is not great. We further checked the data of all stocks and found that the training datasets were basically balanced. The sample balance sheet for all 400 stocks is provided in the Supplementary Materials.
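A minimal sketch of this chronological 70%/15%/15% split follows (our illustration; the exact segmentation code is not given in the paper):

```python
def chronological_split(X, y, train=0.70, val=0.15):
    """Split by date order, never shuffling, so the validation and test
    periods always come after the training period (no look-ahead)."""
    n = len(y)
    i, j = int(n * train), int(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])
```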

3.3. Statistical Metrics

Since our datasets are relatively balanced, five common statistical metrics were employed to evaluate the classification effect, including Accuracy (Acc), Recall (R), Precision (P), F1 score (F1), and area under the curve (AUC) [89], as shown in Table 2.
In Table 2, TP represents the proportion of positive samples predicted correctly; FN denotes the proportion of positive samples predicted incorrectly; FP represents the proportion of negative samples predicted incorrectly; and TN denotes the proportion of negative samples predicted correctly [90]. In terms of AUC, $x_i^+$ and $x_j^-$ represent data points with positive and negative labels, respectively. Besides, f is a general classification model, and $\mathbb{1}$ is an indicator function equal to 1 when $f(x_i^+)\ge f(x_j^-)$ and 0 otherwise; N⁺ (resp., N⁻) is the number of data points with positive (resp., negative) labels, and M = N⁺N⁻ denotes the number of pairs of points with opposite labels (x⁺, x⁻). The AUC ranges from 0 to 1 [91].
Acc is the most basic evaluation metric, mainly reflecting the overall correctness of the forecasts [92]. It simply calculates a ratio and does not differentiate categories. Because the costs of different error types may differ, it is not advisable to use Acc alone for unbalanced samples, since it ignores the generalizability of the model and the random-prediction problem caused by sample skew. Generally speaking, the higher the Acc, the better the classifier. Our datasets are relatively balanced, so Acc is a suitable evaluation metric. Precision measures exactness: the proportion of examples classified as positive that are actually positive. Recall measures coverage: the proportion of positive cases that are correctly classified as positive. F1 takes both the precision and recall of the classification model into account [93], and can be regarded as their harmonic mean. The physical meaning of the AUC is the probability that a randomly chosen positive case is ranked before a randomly chosen negative case; it reflects the ranking ability of a classifier. Notably, the AUC is insensitive to whether the sample categories are balanced, which is why classifier performance on unbalanced samples is typically evaluated by the AUC [94]. For a specific classifier, it is impossible to improve all the above metrics simultaneously; however, a classifier that correctly classifies all instances is optimal on all of them. Therefore, we mainly considered the practical results, i.e., the values of Acc and, given its insensitivity to sample balance, the AUC.
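The five metrics of Table 2 can be computed with scikit-learn as in the sketch below (our illustration; the paper does not state which implementation was used). The AUC is computed from a continuous score rather than hard labels so that it reflects ranking quality.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """y_pred: hard 0/1 labels; y_score: continuous decision values."""
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "P": precision_score(y_true, y_pred),
        "R": recall_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_score),
    }
```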

4. Experiments

In this section, we visualize and analyze the results of the DWT, test the stationarity of the feature data, describe the labeling process in detail through visualization, compare and analyze the results of the ELM and the DELM, and finally compare the results of the DELM with those of some common classification algorithms.

4.1. The Visualization of the DWT-Based Denoising

After completing the DWT-based denoising process, we obtained the denoised data and compared them with the raw data. Figure 4 shows line charts of the raw and denoised data for four stocks. Since each series exceeds 4000 points, visualizing all the data in one graph is not very intuitive, so we partially enlarged the last 300 data points for each stock. As displayed in Figure 4, the denoised time series is smoother, and its continuous trend characteristics are more stable. Abnormal fluctuations in raw time-series data are often caused by random accidents; such fluctuations typically appear as short-term surges and plunges that drive the stock price away from its normal trend, and when the accident passes, the price often returns to the original trend. Therefore, we hope the denoised data filter out such abnormal noise and maintain better trend continuity. As Figure 4 shows, the denoised data are much less sensitive to such abnormal points, improving the continuity of the trend. This result is in line with our needs and expectations. We also noticed that, locally, the relative high-price points of the denoised data are lower than in the raw data, and the relative low-price points are higher; this, too, is in line with our denoising goals. We hope that DWT-based denoising enhances the trend characteristics of stocks and thereby helps machine learning models better predict changes in stock trends.
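A sketch of how such a raw-versus-denoised comparison (Figure 4, last 300 points) can be drawn, assuming a 1-D array of closing prices and the dwt_denoise helper sketched in Section 2.2; the file name is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

close = np.loadtxt("000005_close.csv")   # hypothetical closing-price file
smooth = dwt_denoise(close)              # illustrative helper from Section 2.2

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(close[-300:], label="raw", alpha=0.6)
ax.plot(smooth[-300:], label="DWT-denoised (db8)")
ax.set(xlabel="trading day (last 300)", ylabel="closing price")
ax.legend()
plt.show()
```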

4.2. Testing of Stationarity

In general, it is essential to standardize or normalize the data before they are used to train a model, so that the data are mapped to a relatively stable fluctuation interval with relatively stable volatility, making it easier for a model to learn such a pattern from the feature vectors. Traditional standardization or normalization methods process all the data at once [95], which is inappropriate for time-series data and introduces look-ahead bias [96]. The raw data processed by Equations (17) and (18) were stationary. It is therefore necessary to test the stationarity of the DWT-denoised data after processing by Equations (17) and (18), to confirm that they form a stationary sequence that is convenient for machine learning models to learn; stationary data can improve the prediction ability of machine learning models. Figure 5 shows the results of feature processing based on Equations (17) and (18) for the DWT-denoised data of stock code 000005. Figure 5a illustrates the sequence diagram of the denoised data; it shows that the denoised data are not stationary and do not have a stable fluctuation form. Figure 5b displays the features after the denoised data are processed by Equations (17) and (18): the data are mapped to a relatively stable fluctuation interval with a mean around zero. Figure 5c is the autocorrelation plot, and Figure 5d is the partial autocorrelation plot; once the lag exceeds 15, the corresponding autocorrelation and partial autocorrelation values fluctuate around zero. It can be concluded from Figure 5 that the features obtained by Equations (17) and (18) from the denoised data are basically stationary. To obtain exact results at the statistical level, we conducted an augmented Dickey–Fuller (ADF) test on the 400 stocks [97,98]. Appendix A Table A3 shows the ADF test results for some stocks; the others are provided in the Supplementary Materials. From Table A3, the ADF statistic of 000005 is −10.77, which is less than the critical values at 1% (−3.43), 5% (−2.86), and 10% (−2.57); simultaneously, the p value is 2.37 × 10⁻¹⁹, which is close to zero. The ADF results show that the features processed by Equations (17) and (18) from DWT-denoised data are stationary. All 400 stocks were checked, and the results are consistent: all the feature sequences are stationary.
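The ADF statistics in Table A3 can be reproduced with statsmodels, as in the sketch below (our illustration of the test, not the authors' script); adfuller returns the test statistic, p value, lag used, number of observations, and the critical values.

```python
from statsmodels.tsa.stattools import adfuller

def adf_report(series):
    """Run the augmented Dickey-Fuller test; a statistic below the 1%
    critical value (with p close to zero) rejects the unit root, i.e.,
    the feature sequence is stationary."""
    stat, pvalue, usedlag, nobs, crit, _ = adfuller(series)
    return {"Test Statistic": stat, "p Value": pvalue,
            "Used Lag": usedlag, "N of Observations": nobs, **crit}
```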

4.3. Labeling Process

The labeling method used in this paper is based on the continuous trend features of the corresponding stock. In the process of labeling the stock data, the volatility parameter ω must be given; based on ω, the relative high- and low-price points are calculated according to Equations (19)–(21), and the continuous trend indicator TD is calculated from these points. The labeling method does not care much about short-term normal fluctuations of the stock price; it is more concerned with long-term, continuous trends. Because it uses the relative high- and low-price points within a period to calculate TD, the correct identification of these points determines the labeling of all the data around them. If the price fluctuation caused by some short-term random event exceeds the threshold ω, i.e., the price suddenly and sharply rises or falls due to an accidental event, the labeling method may regard this fluctuation as a rising or falling trend, label the data accordingly, and use the mislabeled data to train the model. In the worst case, the labeled data in this period are biased (especially if the price returns to normal after the abnormal event), which may bias the training results of the model.
Therefore, we used the DWT-based denoising algorithm on the raw stock data to obtain denoised data with better continuous trend characteristics, i.e., data in which such abnormal fluctuations are smoothed and the relative high- and low-price points of the corresponding period change relative to the raw data. Figure 6 visualizes the labeling of stock code 000005 with ω equal to 0.05; red lines indicate a downward trend and green lines an upward trend. Figure 6a displays the labeling process on the DWT-denoised data, and Figure 6b on the raw data. Although the data are dense, it can be roughly seen that the labeling of the denoised data is more continuous. The labeling of the last 300 data points of the two datasets is partially enlarged in Figure 6c,d. It can be clearly seen that the DWT-denoised data, labeled with the proposed continuous-trend-based method, reflect better trend continuity. In particular, in the section circled by the orange box in the lower right corner, plot c labels the whole corresponding period as a downtrend, while the two small rebounds in plot d are labeled green, i.e., as uptrends. This difference originates from the different sensitivity of the high- and low-price point calculation during this period. The same situation occurs in the orange box in the upper left corner: in plot c, the labeling method labels the trends of this period as (fall, rise, fall), whereas in plot d they are labeled as (fall, rise, fall, rise, fall). Clearly, the denoised data are smoother, and the labeling process is more realistic and better reflects continuous trend characteristics. The region between serial numbers 100 and 200 in plots c and d supports the same conclusion. Labeling differences caused by noise may produce an oversensitive model whose prediction accuracy and other metrics are also sensitive to the validation and test sets. The graphical results therefore show that the DWT-denoised data possess good smoothness and continuous trend characteristics, which improve the effect of our labeling method; the trend continuity of the labeled data is better and reflects the continuous trend characteristics of the corresponding stocks, which is more in line with actual investment behavior.

4.4. Results of ELM and DELM

After analyzing the smoothness of the DWT-denoised data and the stationarity of the features processed by Equations (17) and (18), the obtained features were used to train the DELM. To better evaluate the effect of the DWT on the final results, we established two models: the ELM model trained on features from the raw data, and the DELM trained on features from the denoised data. In the ELM training phase, we used the “High-Performance Extreme Learning Machines” toolbox [99]. Table 3, Table 4, Table 5, Table 6 and Table 7 present the Acc, P, R, F1, and AUC results of the two models on some stocks. The results on the validation dataset were mainly used to verify the selection of relevant parameters and to guard against problems such as overfitting on the training dataset. We analyzed the results on the test dataset in conjunction with those on the validation dataset. As shown in Table 3, the Acc of the DELM on the test dataset significantly exceeds that of the ELM model, and it is higher for every stock. The Acc of stock code 000007 increased from 0.5770 to 0.6483, a gain of seven percentage points; the Acc of 000025 increased from 0.6057 to 0.6894, a gain of eight percentage points; the improvements for codes 000048 and 000402 were not significantly different. For 000520, the DELM raised the ELM's Acc from 0.6225 to 0.7565, a gain of 13 percentage points, and the Acc for code 000530 was elevated by 10 percentage points. For the other stocks, the Acc improvement was basically five to six percentage points. Averaging these results, the DELM increased the ELM's mean Acc from 0.6445 to 0.6909, an average gain of more than five percentage points. This is in line with the improvement on the validation dataset, where the average gain is about three percentage points.
From Table 4, it can be seen that the Precision metric does not improve as much as the Acc metric. For some stocks (e.g., codes 000005, 000150, 000151, 000402, 000404, 000420, 000430, 000507, and 000509) the ELM model achieves the better Precision, while for others the DELM does. In general, however, the average Precision of the DELM increased from 0.6170 to 0.6506, indicating improvement to a certain degree.
Regarding the Recall metric, Table 5 shows no significant improvement; in many cases, the Recall of the ELM model is even higher. On average, the mean Recall of the DELM is about 0.53 percentage points lower than that of the ELM model.
The F1 metric in Table 6 leads to a conclusion similar to that for Recall: for individual stocks, each model has its better cases, while the mean F1 of the DELM is about 1.2 percentage points higher.
The AUC values are presented in Table 7. Comparing the ELM and the DELM, the DELM failed to improve the AUC for only one stock: the ELM's AUC of 0.6387 for code 000005 was higher than the DELM's value of 0.6108. The mean AUC rose from 0.6447 for the ELM model to 0.6856 for the DELM, a gain of four percentage points, which is very significant.
At the same time, we calculated the mean values of the statistical metrics over all 400 stocks, presented in Table 8 (the per-stock values are provided in the Supplementary Materials). Table 8 leads to basically the same conclusion: the Acc and AUC values of the DELM are significantly improved compared with the ELM model. The improvement in F1 for the DELM is not statistically significant (0.6343 versus 0.6369). We also noticed that the P value of the ELM model is better (within 3.7%) than that of the DELM, and that the R metric decreased by an average of 2.4 percentage points. The average AUC rose from 0.6517 to 0.6892, a gain of 3.75 percentage points. In the section on statistical metrics, we compared the differences between the metrics and chose to concentrate on Acc and AUC. Checking the results of all 400 stocks (see the Supplementary Materials), we found that the Acc and AUC of the DELM were elevated for every stock, indicating that DWT-based denoising can remarkably improve the ELM's prediction results under the proposed labeling method. This confirms our interpretation of how the continuous-trend-based labeling method and DWT-based denoising interact, and shows that the denoised data are highly appropriate for the continuous-trend-based labeling method. The results also highlight the rationality and superiority of the architecture of the proposed hybrid method.

4.5. The DELM Method and Other Classification Algorithms

In order to further verify the prediction ability of the proposed hybrid method (DELM), we also tested the prediction ability of other common models on the datasets of the 400 stocks. Among them, the deep learning models that handle time-series problems well (e.g., recurrent neural network (RNN) and LSTM) were trained with PyTorch (ver. 1.5.1) [100], and the other models were trained with the Sklearn toolkit (ver. 0.23.2) [101]. All the models and associated parameters are summarized in Table 9; the parameter names and their specific functions are not detailed here (please refer to the related literature).
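As an illustration of the comparison setup, the sketch below trains a few of the scikit-learn baselines from Table 9 on one stock's features. Default hyperparameters are used here for brevity (the actual parameters are those listed in Table 9), and X_tr/y_tr/X_te/y_te stand for the illustrative splits sketched in Section 3.2.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Subset of the baselines in Table 9, with default (not the paper's) settings.
baselines = {
    "RF": RandomForestClassifier(),
    "GBT": GradientBoostingClassifier(),
    "ABT": AdaBoostClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(probability=True),  # probability=True enables AUC scoring
}
for name, model in baselines.items():
    model.fit(X_tr, y_tr)                         # X_tr/y_tr: training split
    print(name, "Acc:", model.score(X_te, y_te))  # accuracy on the test split
```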
Due to space limitations, we present the results of only six stocks; the results of the other stocks are provided in the Supplementary Materials. As shown in Table 10, among the results of all six stocks, the Acc values of the DELM were the most promising, significantly surpassing those of the other common models and basically reaching an accuracy of 0.7; this again verifies the efficacy of DWT-based denoising. In addition, the results of LSTM, RNN, RF, GBT, ABT, and SVC were relatively better than those of the other common models (basically between 0.67 and 0.68). As far as the AUC metric was concerned, the DELM was not the best among all algorithms for stock codes 000005, 000007, and 000025, but it was the best for the remaining stock codes; as far as Acc was concerned, the proposed DELM was the best on all the stock codes. Beyond these six stocks, we checked the results of all 400 stocks, and the conclusions were basically consistent, i.e., the proposed DELM significantly improves the Acc and AUC values in prediction. Regarding P, R, and F1, as explained in the section on statistical metrics, each model in general has its own advantages; this paper concentrates mainly on Acc and AUC. Therefore, we conclude that the proposed hybrid DELM method better predicts changes in the continuous trend of stocks, with higher prediction accuracy and AUC.

5. Conclusions

This research proposed a hybrid method for stock trend prediction based on ELM and wavelet transform denoising. The raw data were first denoised with the DWT, feature preprocessing was performed on the denoised data, and the DELM was then trained on the resulting features. The trained DELM was compared with the plain ELM model on a dataset of 400 stocks, and the prediction results greatly improved the Acc and AUC metrics. The results demonstrate the superiority of the DELM and show that wavelet transform denoising can improve the prediction ability of the ELM model. At the same time, the logical relationship between DWT-based denoising and the continuous-trend-based labeling method was analyzed, providing a logical explanation for the good results. Additionally, to better assess the efficacy of the proposed DELM, its predictive results were compared with those of 12 common algorithms, all of which the proposed DELM outperformed; this further confirms its superiority. However, this paper does not investigate in depth how the choice of wavelet function affects the accuracy of stock trend prediction, which remains for future research.

Supplementary Materials

Author Contributions

D.W.: Conceptualization, methodology, programming, validation of the results, analyses, writing, review & editing, supervision, investigation, and data curation. X.W.: Resources, supervision, and project administration. S.W.: Conceptualization, supervision, and review. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology and Innovation Commission of Shenzhen Municipality (Grant No. JCYJ20190806112210067).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are from https://tushare.pro/ (accessed on 2 January 2021), which can be downloaded in the sub-category of “Backward Answer Authority Quotes” under the category of “Quotes Data”. The data can also be downloaded for free through https://github.com/justbeat99/400_stocks_data_zips.git (accessed on 2 January 2021).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

Table A1. The descriptions of related abbreviations.

Abbreviation   Full Name                           Description
ELM            Extreme Learning Machine            Extreme learning machine model.
CWT            Continuous Wavelet Transform        Wavelet transform.
DWT            Discrete Wavelet Transform          Wavelet transform.
DELM           Denoised ELM                        ELM model trained on the denoised data.
LSTM           Long Short-Term Memory              Classifier of comparative experiment.
RNN            Recurrent Neural Network            Classifier of comparative experiment.
KNN            k-Nearest Neighbors                 Classifier of comparative experiment.
LR             Logistic Regression                 Classifier of comparative experiment.
RF             Random Forest                       Classifier of comparative experiment.
DT             Decision Tree                       Classifier of comparative experiment.
GBT            Gradient Boosting                   Classifier of comparative experiment.
ABT            AdaBoost                            Classifier of comparative experiment.
NB             Naive Bayes                         Classifier of comparative experiment.
LDA            Linear Discriminant Analysis        Classifier of comparative experiment.
QDA            Quadratic Discriminant Analysis     Classifier of comparative experiment.
SVC            Support Vector Machine Classifier   Classifier of comparative experiment.
RBF            Radial Basis Function               Activation function.
Acc            Accuracy                            Statistical metric.
R              Recall                              Statistical metric.
P              Precision                           Statistical metric.
F1             F1 Score                            Statistical metric.
AUC            Area Under the Curve                Statistical metric.
Table A2. Balance information of positive and negative samples on the training, validation, and test datasets for some stocks. PN denotes the number of positive samples, NN the number of negative samples, and PN% the ratio of positive samples to the total number of samples in the corresponding dataset.

Code    Train PN  Train NN  PN%   Val PN  Val NN  PN%   Test PN  Test NN  PN%
000001  1460      1802      0.45  402     297     0.58  347      352      0.50
000005  1479      1584      0.48  259     397     0.39  301      356      0.46
000007  1586      1292      0.55  289     328     0.47  270      347      0.44
000012  1652      1643      0.50  403     303     0.57  325      382      0.46
000014  1604      1659      0.49  354     345     0.51  322      378      0.46
000025  1743      1542      0.53  239     465     0.34  284      421      0.40
000026  1788      1517      0.54  428     280     0.60  343      366      0.48
000031  1624      1544      0.51  308     371     0.45  322      357      0.47
000032  1729      1418      0.55  338     336     0.50  372      303      0.55
000048  1607      1593      0.50  330     356     0.48  304      382      0.44
000050  1603      1572      0.50  378     302     0.56  365      316      0.54
000055  1754      1566      0.53  341     371     0.48  249      463      0.35
000056  1579      1587      0.50  338     341     0.50  293      386      0.43
000061  1533      1639      0.48  339     341     0.50  311      369      0.46
000065  1702      1530      0.53  391     302     0.56  311      382      0.45
000068  1531      1406      0.52  269     360     0.43  233      397      0.37
000090  1580      1701      0.48  368     335     0.52  344      360      0.49
000150  1578      1482      0.52  300     356     0.46  220      436      0.34
000151  1788      1551      0.54  337     378     0.47  373      343      0.52
000155  1583      1333      0.54  360     265     0.58  279      346      0.45
000158  1708      1561      0.52  366     334     0.52  341      360      0.49
000402  1690      1630      0.51  404     308     0.57  363      349      0.51
000404  1682      1624      0.51  423     286     0.60  328      381      0.46
000411  1720      1332      0.56  320     334     0.49  339      315      0.52
000420  1816      1481      0.55  396     310     0.56  283      424      0.40
000422  1762      1516      0.54  374     329     0.53  407      296      0.58
000430  1686      1486      0.53  370     310     0.54  358      322      0.53
000507  1677      1625      0.51  405     303     0.57  347      361      0.49
000509  1557      1425      0.52  311     328     0.49  255      384      0.40
000519  1591      1588      0.50  318     363     0.47  364      318      0.53
000520  1411      1445      0.49  189     423     0.31  293      319      0.48
000523  1777      1457      0.55  344     349     0.50  267      426      0.39
000524  1739      1469      0.54  337     351     0.49  345      343      0.50
000526  1450      1504      0.49  317     316     0.50  324      309      0.51
000530  1728      1567      0.52  371     335     0.53  329      378      0.47
000531  1682      1557      0.52  322     372     0.46  341      354      0.49
000532  1730      1532      0.53  342     357     0.49  365      334      0.52
Table A3. The results of ADF test with critical values of 1% (−3.43), 5% (−2.86), and 10% (−2.57), respectively. “Used lag” denotes the number of lags used. “N of observations” represents the number of observations used for the ADF regression and calculation of the critical values.
Table A3. The results of ADF test with critical values of 1% (−3.43), 5% (−2.86), and 10% (−2.57), respectively. “Used lag” denotes the number of lags used. “N of observations” represents the number of observations used for the ADF regression and calculation of the critical values.
Stock Code | Test Statistic | p Value | Used Lag | N of Observations
000001 | −11.40 | 7.74 × 10^−21 | 17 | 4642
000005 | −10.77 | 2.37 × 10^−19 | 25 | 4350
000007 | −14.24 | 1.52 × 10^−26 | 11 | 4100
000012 | −15.30 | 4.32 × 10^−28 | 11 | 4696
000014 | −13.10 | 1.71 × 10^−24 | 22 | 4639
000025 | −9.56 | 2.48 × 10^−16 | 31 | 4662
000026 | −12.87 | 4.93 × 10^−24 | 14 | 4707
000031 | −10.32 | 3.00 × 10^−18 | 31 | 4494
000032 | −10.68 | 3.98 × 10^−19 | 32 | 4463
000048 | −7.85 | 5.75 × 10^−12 | 32 | 4539
000050 | −10.04 | 1.55 × 10^−17 | 32 | 4503
000055 | −10.91 | 1.11 × 10^−19 | 32 | 4711
000056 | −14.87 | 1.64 × 10^−27 | 11 | 4512
000061 | −14.81 | 2.02 × 10^−27 | 13 | 4518
000065 | −9.71 | 1.03 × 10^−16 | 32 | 4585
000068 | −14.96 | 1.26 × 10^−27 | 12 | 4183
000090 | −9.55 | 2.55 × 10^−16 | 28 | 4659
000150 | −9.68 | 1.23 × 10^−16 | 26 | 4345
000151 | −9.70 | 1.06 × 10^−16 | 32 | 4737
000155 | −13.01 | 2.59 × 10^−24 | 15 | 4150
000158 | −14.13 | 2.33 × 10^−26 | 17 | 4652
000402 | −10.29 | 3.67 × 10^−18 | 32 | 4711
000404 | −9.90 | 3.30 × 10^−17 | 32 | 4691
000411 | −10.43 | 1.65 × 10^−18 | 31 | 4328
000420 | −9.56 | 2.48 × 10^−16 | 31 | 4678
000422 | −9.04 | 5.06 × 10^−15 | 31 | 4652
000430 | −10.57 | 7.22 × 10^−19 | 29 | 4502
000507 | −14.42 | 7.95 × 10^−27 | 13 | 4704
000509 | −7.83 | 6.45 × 10^−12 | 31 | 4228
000519 | −14.87 | 1.64 × 10^−27 | 10 | 4531
000520 | −16.16 | 4.50 × 10^−29 | 12 | 4067
000523 | −10.40 | 1.93 × 10^−18 | 31 | 4588
000524 | −12.33 | 6.40 × 10^−23 | 16 | 4567
000526 | −9.02 | 5.79 × 10^−15 | 31 | 4188
000530 | −15.52 | 2.26 × 10^−28 | 18 | 4689
000531 | −8.81 | 2.04 × 10^−14 | 32 | 4595
000532 | −10.59 | 6.70 × 10^−19 | 30 | 4629
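The columns of Table A3 correspond one-to-one to the values returned by the adfuller function in statsmodels, which is a natural way to reproduce the test. A minimal sketch, assuming the statsmodels package and using a synthetic series as a stand-in for the real stock series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
series = rng.standard_normal(4660).cumsum()      # stand-in for a closing-price series
diffed = np.diff(series)                         # differenced series, as for returns

# autolag="AIC" selects the "used lag"; nobs is the sample size after lagging.
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(diffed, autolag="AIC")
print(f"Test statistic = {stat:.2f}, p value = {pvalue:.2e}")
print(f"Used lag = {usedlag}, N of observations = {nobs}")
print("Critical values:", crit)                  # keys "1%", "5%", "10%", as in Table A3
```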

References

1. Ding, G.; Qin, L. Study on the prediction of stock price based on the associated network model of LSTM. Int. J. Mach. Learn. Cybern. 2020, 11, 1307–1317.
2. Maqsood, H.; Mehmood, I.; Maqsood, M.; Yasir, M.; Afzal, S.; Aadil, F.; Selim, K.; Muhammad, K. A local and global event sentiment based efficient stock exchange forecasting using deep learning. Int. J. Inf. Manag. 2020, 50, 432–451.
3. Carta, S.; Corriga, A.; Ferreira, A.; Podda, A.S.; Recupero, D.R. A multi-layer and multi-ensemble stock trader using deep learning and deep reinforcement learning. Appl. Intell. 2021, 51, 889–905.
4. Domingos, S.D.O.; de Oliveira, J.F.; de Mattos Neto, P.S. An intelligent hybridization of ARIMA with machine learning models for time series forecasting. Knowl. Based Syst. 2019, 175, 72–86.
5. He, J.; Wang, J.; Jiang, X.; Chen, X.; Chen, L. The long-term extreme price risk measure of portfolio in inventory financing: An application to dynamic impawn rate interval. Complexity 2015, 20, 17–34.
6. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
7. Huang, G.; Huang, G.B.; Song, S.; You, K. Trends in extreme learning machines: A review. Neural Netw. 2015, 61, 32–48.
8. Huang, G.B.; Chen, L. Convex incremental extreme learning machine. Neurocomputing 2007, 70, 3056–3062.
9. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B 2011, 42, 513–529.
10. Zhou, H.; Zhuang, Z.; Liu, Y.; Liu, Y.; Zhang, X. Defect classification of green plums based on deep learning. Sensors 2020, 20, 6993.
11. Li, Y.; Zeng, Y.; Qing, Y.; Huang, G.B. Learning local discriminative representations via extreme learning machine for machine fault diagnosis. Neurocomputing 2020, 409, 275–285.
12. Ouyang, T.; Wang, C.; Yu, Z.; Stach, R.; Mizaikoff, B.; Huang, G.B.; Wang, Q.J. NOx measurements in vehicle exhaust using advanced deep ELM networks. IEEE Trans. Instrum. Meas. 2021, 70, 1–10.
13. Nayak, S.C.; Misra, B.B. Extreme learning with chemical reaction optimization for stock volatility prediction. Financ. Innov. 2020, 6, 1–23.
14. Liu, Z.; Jin, W.; Mu, Y. Variances-constrained weighted extreme learning machine for imbalanced classification. Neurocomputing 2020, 403, 45–52.
15. Chen, Y.; Xie, X.; Zhang, T.; Bai, J.; Hou, M. A deep residual compensation extreme learning machine and applications. J. Forecast. 2020, 39, 986–999.
16. Huang, G.; Song, S.; Gupta, J.N.; Wu, C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans. Cybern. 2014, 44, 2405–2417.
17. Albadr, M.A.A.; Tiuna, S. Extreme learning machine: A review. Int. J. Appl. Eng. Res. 2017, 12, 4610–4623.
18. Ding, S.; Zhao, H.; Zhang, Y.; Xu, X.; Nie, R. Extreme learning machine: Algorithm, theory and applications. Artif. Intell. Rev. 2015, 44, 103–115.
19. Alade, O.A.; Selamat, A.; Sallehuddin, R. A review of advances in extreme learning machine techniques and its applications. In Proceedings of the International Conference of Reliable Information and Communication Technology, Johor Bahru, Malaysia, 23–24 April 2017; Springer: Cham, Switzerland, 2017; pp. 885–895.
20. Alaba, P.A.; Popoola, S.I.; Olatomiwa, L.; Akanle, M.B.; Ohunakin, O.S.; Adetiba, E.; Alex, A.A.; Daud, W.M.A.W. Towards a more efficient and cost-sensitive extreme learning machine: A state-of-the-art review of recent trend. Neurocomputing 2019, 350, 70–90.
21. Zhang, G.; Li, Y.; Cui, D.; Mao, S.; Huang, G.B. R-ELMNet: Regularized extreme learning machine network. Neural Netw. 2020, 130, 49–59.
22. Chen, J.; Zeng, Y.; Li, Y.; Huang, G.B. Unsupervised feature selection based extreme learning machine for clustering. Neurocomputing 2020, 386, 198–207.
23. Zeng, Y.; Chen, J.; Li, Y.; Qing, Y.; Huang, G.B. Clustering via adaptive and locality-constrained graph learning and unsupervised ELM. Neurocomputing 2020, 401, 224–235.
24. Zeng, Y.; Li, Y.; Chen, J.; Jia, X.; Huang, G.B. ELM embedded discriminative dictionary learning for image classification. Neural Netw. 2020, 123, 331–342.
25. Li, Y.; Zeng, Y.; Liu, T.; Jia, X.; Huang, G.B. Simultaneously learning affinity matrix and data representations for machine fault diagnosis. Neural Netw. 2020, 122, 395–406.
26. Das, S.P.; Padhy, S. A novel hybrid model using teaching–learning-based optimization and a support vector machine for commodity futures index forecasting. Int. J. Mach. Learn. Cybern. 2018, 9, 97–111.
27. Wang, H.B.; Liu, X.; Song, P.; Tu, X.Y. Sensitive time series prediction using extreme learning machine. Int. J. Mach. Learn. Cybern. 2019, 10, 3371–3386.
28. Yang, L.; Song, S.; Li, S.; Chen, Y.; Huang, G. Graph embedding-based dimension reduction with extreme learning machine. IEEE Trans. Syst. Man Cybern. Syst. 2019, 1–12.
29. Li, X.; Xie, H.; Wang, R.; Cai, Y.; Cao, J.; Wang, F.; Min, X.; Deng, X. Empirical analysis: Stock market prediction via extreme learning machine. Neural Comput. Appl. 2016, 27, 67–78.
30. Wang, F.; Zhang, Y.; Rao, Q.; Li, K.; Zhang, H. Exploring mutual information-based sentimental analysis with kernel-based extreme learning machine for stock prediction. Soft Comput. 2017, 21, 3193–3205.
31. Jiang, M.; Jia, L.; Chen, Z.; Chen, W. The two-stage machine learning ensemble models for stock price prediction by combining mode decomposition, extreme learning machine and improved harmony search algorithm. Ann. Oper. Res. 2020, 1–33.
32. Tang, Z.; Zhang, T.; Wu, J.; Du, X.; Chen, K. Multistep-ahead stock price forecasting based on secondary decomposition technique and extreme learning machine optimized by the differential evolution algorithm. Math. Probl. Eng. 2020, 1–13.
33. Weng, F.; Chen, Y.; Wang, Z.; Hou, M.; Luo, J.; Tian, Z. Gold price forecasting research based on an improved online extreme learning machine algorithm. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 4101–4111.
34. Jiang, F.; He, J.; Zeng, Z. Pigeon-inspired optimization and extreme learning machine via wavelet packet analysis for predicting bulk commodity futures prices. Sci. China Inf. Sci. 2019, 62, 70204.
35. Khuwaja, P.; Khowaja, S.A.; Khoso, I.; Lashari, I.A. Prediction of stock movement using phase space reconstruction and extreme learning machines. J. Exp. Theor. Artif. Intell. 2020, 32, 59–79.
36. Jeyakarthic, M.; Punitha, S. An effective stock market direction prediction model using water wave optimization with multi-kernel extreme learning machine. IIOAB J. 2020, 11, 103–109.
37. Xu, H.; Wang, M.; Jiang, S.; Yang, W. Carbon price forecasting with complex network and extreme learning machine. Phys. A 2020, 545, 122830.
38. Wang, K.; Pei, H.; Cao, J.; Zhong, P. Robust regularized extreme learning machine for regression with non-convex loss function via DC program. J. Frankl. Inst. 2020, 357, 7069–7091.
39. Guo, W. Robust adaptive online sequential extreme learning machine for predicting nonstationary data streams with outliers. J. Algorithms Comput. Technol. 2019, 13, 1748302619895421.
40. Hu, Y.; Valera, H.G.A.; Oxley, L. Market efficiency of the top market-cap cryptocurrencies: Further evidence from a panel framework. Financ. Res. Lett. 2019, 31, 138–145.
41. Kristoufek, L. On Bitcoin markets (in)efficiency and its evolution. Phys. A 2018, 503, 257–262.
42. Liu, B.; Xia, X.; Xiao, W. Public information content and market information efficiency: A comparison between China and the US. China Econ. Rev. 2020, 60, 101405.
43. Han, C.; Wang, Y.; Xu, Y. Efficiency and multifractality analysis of the Chinese stock market: Evidence from stock indices before and after the 2015 stock market crash. Sustainability 2019, 11, 1699.
44. Chen, S.W.; Chen, H.C.; Chan, H.L. A real-time QRS detection method based on moving-averaging incorporating with wavelet denoising. Comput. Methods Programs Biomed. 2006, 82, 187–195.
45. Bai, Y.T.; Wang, X.Y.; Jin, X.B.; Zhao, Z.Y.; Zhang, B.H. A neuron-based Kalman filter with nonlinear autoregressive model. Sensors 2020, 20, 299.
46. Manju, B.R.; Sneha, M.R. ECG denoising using Wiener filter and Kalman filter. Procedia Comput. Sci. 2020, 171, 273–281.
47. Mustafi, A.; Ghorai, S.K. A novel blind source separation technique using fractional Fourier transform for denoising medical images. Optik 2013, 124, 265–271.
48. Ma, H.; Yan, L.; Xia, Y.; Fu, M. Introduction to Kalman Filtering; Springer: Singapore, 2020; pp. 3–9.
49. Kato, K.; Takahashi, K.; Mizuguchi, N.; Ushiba, J. Online detection of amplitude modulation of motor-related EEG desynchronization using a lock-in amplifier: Comparison with a fast Fourier transform, a continuous wavelet transform, and an autoregressive algorithm. J. Neurosci. Methods 2018, 293, 289–298.
50. Chen, H.; Xu, W.; Broderick, N.; Han, J. An adaptive denoising method for Raman spectroscopy based on lifting wavelet transform. J. Raman Spectrosc. 2018, 49, 1529–1539.
51. Liu, T.; Wei, H.; Zhang, C.; Zhang, K. Time series forecasting based on wavelet decomposition and feature extraction. Neural Comput. Appl. 2017, 28, 183–195.
52. Xu, M.; Han, M.; Lin, H. Wavelet-denoising multiple echo state networks for multivariate time series prediction. Inf. Sci. 2018, 465, 439–458.
53. Bao, W.; Yue, J.; Rao, Y. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 2017, 12, e0180944.
54. Yang, J.; Li, J.; Liu, S. A new algorithm of stock data mining in Internet of Multimedia Things. J. Supercomput. 2020, 76, 2374–2389.
55. Lahmiri, S. Randomness in denoised stock returns: The case of Moroccan family business companies. Phys. Lett. A 2018, 382, 554–560.
56. Li, X.; Tang, P. Stock index prediction based on wavelet transform and FCD-MLGRU. J. Forecast. 2020, 39, 1229–1237.
57. Wen, S.; An, H.; Huang, S.; Liu, X. Dynamic impact of China’s stock market on the international commodity market. Resour. Policy 2019, 61, 564–571.
58. Mohammed, S.A.; Bakar, M.A.A.; Ariff, N.M. Analysis of relationships between Malaysia’s Islamic and conventional stock markets using wavelet techniques. In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2019; Volume 2111, p. 020018.
59. Yang, Z.; Yi, X.; Zhu, A. A mixed model based on wavelet transform and support vector regression to forecast stock price. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 27–29 June 2020; pp. 420–426.
60. Xu, L.; Chhim, B.; Zheng, Y.; Nojima, Y. Stacked deep learning structure with bidirectional long-short term memory for stock market prediction. In Proceedings of the International Conference on Neural Computing for Advanced Applications, Shenzhen, China, 3–5 July 2020; Springer: Singapore, 2020; pp. 447–460.
61. He, F.; He, X. A continuous differentiable wavelet shrinkage function for economic data denoising. Comput. Econ. 2019, 54, 729–761.
62. Yu, M.; Yu, K.; Han, T.; Wan, Y.; Zhao, D. Research on application of fractional calculus in signal analysis and processing of stock market. Chaos Solitons Fractals 2020, 131, 109468.
63. Faraz, H.; Khaloozadeh, H. Multi-step-ahead stock market prediction based on least squares generative adversarial network. In Proceedings of the 2020 28th Iranian Conference on Electrical Engineering (ICEE), Tabriz, Iran, 4–6 August 2020; pp. 1–6.
64. Faraz, M.; Khaloozadeh, H.; Abbasi, M. Stock market prediction-by-prediction based on autoencoder long short-term memory networks. In Proceedings of the 2020 28th Iranian Conference on Electrical Engineering (ICEE), Tabriz, Iran, 4–6 August 2020; pp. 1–5.
65. Chen, Y.T.; Lai, W.N.; Sun, E.W. Jump detection and noise separation by a singular wavelet method for predictive analytics of high-frequency data. Comput. Econ. 2019, 54, 809–844.
66. Li, W.; Kong, D.; Wu, J. A novel hybrid model based on extreme learning machine, k-nearest neighbor regression and wavelet denoising applied to short-term electric load forecasting. Energies 2017, 10, 694.
67. Štifanić, D.; Musulin, J.; Miočević, A.; Baressi Šegota, S.; Šubić, R.; Car, Z. Impact of COVID-19 on forecasting stock prices: An integration of stationary wavelet transform and bidirectional long short-term memory. Complexity 2020, 2020.
68. Jiang, Z.; Yoon, S.M. Dynamic co-movement between oil and stock markets in oil-importing and oil-exporting countries: Two types of wavelet analysis. Energy Econ. 2020, 90, 104835.
69. Dai, Z.; Zhu, H.; Kang, J. New technical indicators and stock returns predictability. Int. Rev. Econ. Financ. 2020, 71, 127–142.
70. Al-Yahyaee, K.H.; Mensi, W.; Rehman, M.U.; Vo, X.V.; Kang, S.H. Do Islamic stocks outperform conventional stock sectors during normal and crisis periods? Extreme co-movements and portfolio management analysis. Pac. Basin Financ. J. 2020, 62, 101385.
71. Asafo-Adjei, E.; Agyapong, D.; Agyei, S.K.; Frimpong, S.; Djimatey, R.; Adam, A.M. Economic policy uncertainty and stock returns of Africa: A wavelet coherence analysis. Discret. Dyn. Nat. Soc. 2020, 2020.
72. Alshammari, T.S.; Ismail, M.T.; Al-Wadi, S.; Saleh, M.H.; Jaber, J.J. Modeling and forecasting Saudi stock market volatility using wavelet methods. J. Asian Financ. Econ. Bus. 2020, 7, 83–93.
73. Mariani, M.C.; Bhuiyan, M.A.M.; Tweneboah, O.K.; Beccar-Varela, M.P.; Florescu, I. Analysis of stock market data by using Dynamic Fourier and Wavelets techniques. Phys. A 2020, 537, 122785.
74. Tan, Z.; Liu, J.; Chen, J. Detecting stock market turning points using wavelet leaders method. Phys. A 2020, 565, 125560.
75. Bouri, E.; Shahzad, S.J.; Roubaud, D.; Kristoufek, L.; Lucey, B. Bitcoin, gold, and commodities as safe havens for stocks: New insight through wavelet analysis. Q. Rev. Econ. Financ. 2020, 77, 156–164.
76. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
77. Kuppili, V.; Tripathi, D.; Reddy Edla, D. Credit score classification using spiking extreme learning machine. Comput. Intell. 2020, 36, 402–426.
78. Zhang, W.B.; Ji, H.B. Fuzzy extreme learning machine for classification. Electron. Lett. 2013, 49, 448–450.
79. Shensa, M.J. The discrete wavelet transform: Wedding the à trous and Mallat algorithms. IEEE Trans. Signal Process. 1992, 40, 2464–2482.
80. Sifuzzaman, M.; Islam, M.R.; Ali, M.Z. Application of wavelet transform and its advantages compared to Fourier transform. J. Phys. Sci. 2009, 13, 121–134.
81. Rhif, M.; Ben Abbes, A.; Farah, I.R.; Martínez, B.; Sang, Y. Wavelet transform application for/in non-stationary time-series analysis: A review. Appl. Sci. 2019, 9, 1345.
82. Zhang, D. Wavelet transform. In Fundamentals of Image Data Mining; Springer: Cham, Switzerland, 2019; pp. 35–44.
83. Altunkaynak, A.; Ozger, M. Comparison of discrete and continuous wavelet–multilayer perceptron methods for daily precipitation prediction. J. Hydrol. Eng. 2016, 21, 04016014.
84. De Faria, M.L.L.; Cugnasca, C.E.; Amazonas, J.R.A. Insights into IoT data and an innovative DWT-based technique to denoise sensor signals. IEEE Sens. J. 2017, 18, 237–247.
85. Chen, D.; Wan, S.; Xiang, J.; Bao, F.S. A high-performance seizure detection algorithm based on Discrete Wavelet Transform (DWT) and EEG. PLoS ONE 2017, 12, e0173138.
86. Hajiabotorabi, Z.; Kazemi, A.; Samavati, F.F.; Ghaini, F.M.M. Improving DWT-RNN model via B-spline wavelet multiresolution to forecast a high-frequency time series. Expert Syst. Appl. 2019, 138, 112842.
87. Wu, D.; Wang, X.; Su, J.; Tang, B.; Wu, S. A labeling method for financial time series prediction based on trends. Entropy 2020, 22, 1162.
88. Long, W.; Lu, Z.; Cui, L. Deep learning-based feature engineering for stock price movement prediction. Knowl. Based Syst. 2019, 164, 163–173.
89. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1.
90. Lever, J.; Krzywinski, M.; Altman, N. Classification Evaluation; Nature Publishing Group: Washington, DC, USA, 2016.
91. Ma, W.; Lejeune, M.A. A distributionally robust area under curve maximization model. Oper. Res. Lett. 2020, 48, 460–466.
92. Lever, J. Classification evaluation: It is important to understand both what a classification metric expresses and what it hides. Nat. Methods 2016, 13, 603–605.
93. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
94. Narkhede, S. Understanding AUC-ROC curve. Towards Data Sci. 2018, 26, 220–227.
95. Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524.
96. Moskowitz, T.J.; Ooi, Y.H.; Pedersen, L.H. Time series momentum. J. Financ. Econ. 2012, 104, 228–250.
97. Mushtaq, R. Augmented Dickey Fuller Test. 2011. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1911068 (accessed on 27 June 2017).
98. Ajewole, K.P.; Adejuwon, S.O.; Jemilohun, V.G. Test for stationarity on inflation rates in Nigeria using augmented Dickey–Fuller test and Phillips–Perron test. J. Math. 2020, 16, 11–14.
99. Akusok, A.; Björk, K.M.; Miche, Y.; Lendasse, A. High-performance extreme learning machines: A complete toolbox for big data applications. IEEE Access 2015, 3, 1011–1025.
100. Ketkar, N. Introduction to PyTorch. In Deep Learning with Python; Apress: Berkeley, CA, USA, 2017; pp. 195–208.
101. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Duchesnay, E. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Figure 1. Schematic diagram of a single-hidden-layer feedforward neural network (SLFN).
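To make the training rule behind the SLFN in Figure 1 concrete, the following NumPy sketch implements basic ELM learning: the input-side weights are drawn randomly and frozen, and only the hidden-to-output weights are solved analytically with the Moore-Penrose pseudoinverse. The function names and toy data are ours, not the paper's code.

```python
import numpy as np

def elm_train(X, Y, n_hidden=50, seed=0):
    """Train a single-hidden-layer ELM: random frozen input weights,
    analytic output weights via the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # input-to-hidden weights (fixed)
    b = rng.standard_normal(n_hidden)                 # hidden biases (fixed)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ Y                      # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: 11 input features, one-hot up/down targets.
X = np.random.rand(200, 11)
Y = np.eye(2)[np.random.randint(0, 2, 200)]
W, b, beta = elm_train(X, Y)
labels = elm_predict(X, W, b, beta).argmax(axis=1)
```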
Figure 2. Discrete wavelet transform (DWT)-based denoising process. cA denotes the approximate component, and cD represents the detail component. Plot (a) shows the process of signal decomposition and reconstruction. Plot (b) shows the whole denoising process.
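A compact PyWavelets sketch of the decompose, threshold, and reconstruct pipeline in Figure 2. The db4 wavelet, two decomposition levels, and the soft universal threshold are our assumptions for illustration; the paper's exact wavelet and threshold settings may differ.

```python
import numpy as np
import pywt

def dwt_denoise(prices, wavelet="db4", level=2):
    """Decompose, soft-threshold the detail coefficients, reconstruct (cf. Figure 2)."""
    prices = np.asarray(prices, dtype=float)
    coeffs = pywt.wavedec(prices, wavelet, level=level)   # [cA_L, cD_L, ..., cD_1]
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise scale from finest details
    thr = sigma * np.sqrt(2.0 * np.log(len(prices)))      # universal threshold
    coeffs[1:] = [pywt.threshold(cD, thr, mode="soft") for cD in coeffs[1:]]
    recon = pywt.waverec(coeffs, wavelet)
    return recon[: len(prices)]                           # waverec may pad odd lengths

# Toy usage on a noisy trend.
t = np.linspace(0, 1, 501)
noisy = np.sin(4 * np.pi * t) + 0.2 * np.random.randn(501)
smooth = dwt_denoise(noisy)
```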
Figure 3. Flowchart of the DELM method.
Figure 4. The diagrams of four stocks using raw data and data denoised by DWT.
Figure 5. Stock data feature diagram of stock code 000005: (a) the sequence diagram of the denoised data, where the Y axis represents the denoised closing price; (b) the feature diagram of the time-series data after the denoised data are processed by Equations (17) and (18), where the Y axis represents the feature value; (c) the autocorrelation graph, where the Y axis represents the autocorrelation value; (d) the partial autocorrelation graph, where the Y axis represents the partial autocorrelation value. The X axis represents the date in (a,b) and the lag parameter in (c,d).
Figure 6. The labeling process. The Y axis represents the closing price of the corresponding stock, and the X axis indicates the date. Green segments are those the labeling method marks as an upward trend, and red segments are those it marks as a downward trend. (a) displays the labeling process on the DWT-denoised data, and (b) illustrates the labeling process on the raw data. The labeling of the last 300 data points of each dataset is enlarged in (c,d).
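As a deliberately simplified stand-in for the trend labeling illustrated in Figure 6, the sketch below assigns an upward label whenever the closing price is higher a fixed number of days ahead. The horizon parameter and function name are hypothetical; the paper's trend-based labeling method is more elaborate than this.

```python
import numpy as np

def label_trend(close, horizon=5):
    """Label each day 1 (upward) if the close `horizon` days ahead is higher, else 0.
    A simplified stand-in for the trend-based labeling shown in Figure 6."""
    close = np.asarray(close, dtype=float)
    return (close[horizon:] > close[:-horizon]).astype(int)

# Toy usage: the last `horizon` days receive no label and are dropped.
prices = np.array([10.0, 10.2, 10.1, 10.5, 10.4, 10.8, 10.6, 10.9])
print(label_trend(prices, horizon=3))   # labels for the first len(prices) - 3 days
```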
Table 1. Parameters required for training of the extreme learning machine (ELM) model.
Name | Input Neurons | Hidden Neurons | Activation Function | Hidden Layers | Output Neurons
First hidden layer | 11 | 50 | Sigmoid | 1 | 50
Second hidden layer | 50 | 50 | RBF | 1 | 1
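The configuration in Table 1 can be approximated with the hpelm toolbox that the paper cites for high-performance ELM training [99]. Note that hpelm mixes neuron types within a single hidden layer rather than stacking two layers, so the sketch below only approximates Table 1, and the data here are random placeholders.

```python
# pip install hpelm numpy
import numpy as np
from hpelm import ELM

# Random placeholders for 11 technical features and one-hot up/down labels.
X = np.random.rand(1000, 11)
T = np.eye(2)[np.random.randint(0, 2, 1000)]

elm = ELM(11, 2)                 # 11 inputs, 2 outputs (up/down)
elm.add_neurons(50, "sigm")      # 50 sigmoid neurons (first row of Table 1)
elm.add_neurons(50, "rbf_l2")    # 50 RBF neurons (second row of Table 1)
elm.train(X, T, "c")             # "c" = classification
predicted = elm.predict(X).argmax(axis=1)
```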
Table 2. Metrics used for evaluation of classification effects.
Metrics | Formula | Evaluation Focus
Accuracy (Acc) | $\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$ | The ratio of correctly classified samples to the total number of samples.
Recall (R) | $R = \frac{TP}{TP + FN}$ | The proportion of the true positive samples that are correctly classified.
Precision (P) | $P = \frac{TP}{TP + FP}$ | The proportion of the samples predicted to be positive that are truly positive.
F1_score (F1) | $F1 = \frac{2 \times P \times R}{P + R}$ | The harmonic mean of precision and recall; its value is closer to the smaller of the two.
AUC | $\mathrm{AUC} = \frac{1}{M} \sum_{i=1}^{N^{+}} \sum_{j=1}^{N^{-}} \mathbb{1}\left( f(x_i^{+}) > f(x_j^{-}) \right)$, where $M = N^{+} N^{-}$ is the number of positive–negative pairs | The area under the ROC curve, between 0 and 1. As a single value, AUC intuitively evaluates the quality of the classifier: the larger the value, the better.
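All five metrics in Table 2 are available in scikit-learn, which the paper uses for the baseline classifiers [101]. A small self-contained sketch with hypothetical labels and scores:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical ground-truth trend labels and classifier scores.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1])
y_pred = (y_score >= 0.5).astype(int)           # hard decisions at a 0.5 threshold

print("Acc:", accuracy_score(y_true, y_pred))
print("P:  ", precision_score(y_true, y_pred))
print("R:  ", recall_score(y_true, y_pred))
print("F1: ", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))   # equals the pairwise formula in Table 2
```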
Table 3. The Acc metric of corresponding validation datasets and test datasets of ELM and DELM for some stocks.
Code | ELM Validation Acc | ELM Test Acc | DELM Validation Acc | DELM Test Acc
000001 | 0.5794 | 0.6638 | 0.7096 | 0.6953
000005 | 0.6311 | 0.6499 | 0.7241 | 0.6941
000007 | 0.6062 | 0.5770 | 0.6726 | 0.6483
000012 | 0.6275 | 0.6506 | 0.6728 | 0.6973
000014 | 0.6452 | 0.6486 | 0.6166 | 0.7000
000025 | 0.5156 | 0.6057 | 0.6662 | 0.6894
000026 | 0.6370 | 0.6276 | 0.6582 | 0.6996
000031 | 0.6568 | 0.6716 | 0.7128 | 0.7231
000032 | 0.6098 | 0.6400 | 0.6261 | 0.7037
000048 | 0.6224 | 0.6822 | 0.6254 | 0.6837
000050 | 0.5912 | 0.6197 | 0.6309 | 0.6814
000055 | 0.6433 | 0.6124 | 0.6713 | 0.6699
000056 | 0.6141 | 0.6318 | 0.6686 | 0.7025
000061 | 0.6088 | 0.6441 | 0.6618 | 0.6897
000065 | 0.6494 | 0.6320 | 0.6349 | 0.6912
000068 | 0.6502 | 0.6095 | 0.6836 | 0.6587
000090 | 0.5733 | 0.6619 | 0.6117 | 0.6903
000150 | 0.6433 | 0.6692 | 0.6799 | 0.7317
000151 | 0.6336 | 0.6508 | 0.6545 | 0.6732
000155 | 0.6512 | 0.6352 | 0.6608 | 0.6976
000158 | 0.6357 | 0.6519 | 0.6143 | 0.6790
000402 | 0.6306 | 0.6826 | 0.6306 | 0.6896
000404 | 0.6432 | 0.6700 | 0.6953 | 0.6953
000411 | 0.6040 | 0.6346 | 0.6437 | 0.7034
000420 | 0.6516 | 0.6238 | 0.6275 | 0.6436
000422 | 0.6700 | 0.6743 | 0.7226 | 0.6885
000430 | 0.6235 | 0.6882 | 0.6088 | 0.6941
000507 | 0.6172 | 0.6398 | 0.6554 | 0.6822
000509 | 0.6401 | 0.6369 | 0.6260 | 0.6870
000519 | 0.6417 | 0.6584 | 0.6681 | 0.6935
000520 | 0.6716 | 0.6225 | 0.7369 | 0.7565
000523 | 0.6508 | 0.6494 | 0.7215 | 0.7071
000524 | 0.6148 | 0.6555 | 0.6294 | 0.6846
000526 | 0.6288 | 0.6319 | 0.6209 | 0.6367
000530 | 0.6445 | 0.6436 | 0.6969 | 0.7440
000531 | 0.6614 | 0.6489 | 0.6369 | 0.6964
000532 | 0.6295 | 0.6524 | 0.6552 | 0.6609
Mean | 0.6283 | 0.6445 | 0.6603 | 0.6909
Table 4. The precision metric for the corresponding validation datasets and test datasets of ELM and DELM for some stocks.
Code | ELM Validation P | ELM Test P | DELM Validation P | DELM Test P
000001 | 0.7547 | 0.7074 | 0.6987 | 0.7425
000005 | 0.5371 | 0.6524 | 0.4686 | 0.5909
000007 | 0.5558 | 0.5108 | 0.5955 | 0.6316
000012 | 0.7035 | 0.6523 | 0.7000 | 0.7380
000014 | 0.6743 | 0.6462 | 0.6481 | 0.6708
000025 | 0.3599 | 0.5072 | 0.4686 | 0.5693
000026 | 0.6879 | 0.5975 | 0.6781 | 0.6205
000031 | 0.6119 | 0.6678 | 0.6381 | 0.7120
000032 | 0.5926 | 0.6503 | 0.6355 | 0.7486
000048 | 0.6172 | 0.6629 | 0.6139 | 0.7273
000050 | 0.6437 | 0.6828 | 0.6638 | 0.7437
000055 | 0.6090 | 0.4638 | 0.6400 | 0.5248
000056 | 0.6250 | 0.5776 | 0.6317 | 0.6947
000061 | 0.6337 | 0.6273 | 0.5817 | 0.6061
000065 | 0.6796 | 0.5711 | 0.6402 | 0.7300
000068 | 0.5878 | 0.4749 | 0.6398 | 0.4868
000090 | 0.6076 | 0.6906 | 0.6349 | 0.7356
000150 | 0.5994 | 0.5052 | 0.6457 | 0.5000
000151 | 0.5895 | 0.6573 | 0.6250 | 0.5749
000155 | 0.6793 | 0.5710 | 0.6641 | 0.6045
000158 | 0.6384 | 0.6260 | 0.6239 | 0.7045
000402 | 0.6732 | 0.6952 | 0.6690 | 0.6580
000404 | 0.7310 | 0.6655 | 0.7546 | 0.5248
000411 | 0.5662 | 0.6111 | 0.6266 | 0.6332
000420 | 0.6569 | 0.5195 | 0.5862 | 0.4848
000422 | 0.6749 | 0.7236 | 0.7683 | 0.8235
000430 | 0.6319 | 0.6690 | 0.5429 | 0.6289
000507 | 0.6650 | 0.6411 | 0.7046 | 0.6062
000509 | 0.6160 | 0.5385 | 0.6098 | 0.5020
000519 | 0.6128 | 0.6825 | 0.6498 | 0.7782
000520 | 0.4700 | 0.6360 | 0.4350 | 0.7057
000523 | 0.6192 | 0.5319 | 0.7094 | 0.5525
000524 | 0.5849 | 0.6364 | 0.6166 | 0.7429
000526 | 0.6530 | 0.6608 | 0.6779 | 0.7254
000530 | 0.6493 | 0.6129 | 0.7642 | 0.7021
000531 | 0.6218 | 0.6456 | 0.5867 | 0.6810
000532 | 0.6056 | 0.6580 | 0.6376 | 0.6676
Mean | 0.6222 | 0.6170 | 0.6345 | 0.6506
Table 5. The values of the Recall metric for the corresponding validation datasets and test datasets of the ELM and DELM for some stocks.
Code | ELM Validation R | ELM Test R | DELM Validation R | DELM Test R
000001 | 0.3980 | 0.5504 | 0.5604 | 0.6201
000005 | 0.4749 | 0.5050 | 0.4824 | 0.3467
000007 | 0.7924 | 0.7852 | 0.7050 | 0.6755
000012 | 0.6005 | 0.5138 | 0.6485 | 0.5831
000014 | 0.5791 | 0.5217 | 0.5029 | 0.5514
000025 | 0.5481 | 0.7394 | 0.5091 | 0.4656
000026 | 0.7313 | 0.7055 | 0.7132 | 0.6573
000031 | 0.6656 | 0.6118 | 0.7128 | 0.6897
000032 | 0.7101 | 0.7500 | 0.6018 | 0.7048
000048 | 0.5667 | 0.5757 | 0.5706 | 0.5994
000050 | 0.5926 | 0.5425 | 0.6403 | 0.5852
000055 | 0.7126 | 0.6948 | 0.6747 | 0.5944
000056 | 0.5621 | 0.5461 | 0.6461 | 0.5414
000061 | 0.5103 | 0.5466 | 0.6357 | 0.6569
000065 | 0.7161 | 0.7235 | 0.6420 | 0.5356
000068 | 0.6097 | 0.5279 | 0.6140 | 0.5311
000090 | 0.5217 | 0.5581 | 0.5437 | 0.5630
000150 | 0.6633 | 0.6682 | 0.6544 | 0.6364
000151 | 0.7329 | 0.6890 | 0.7205 | 0.7932
000155 | 0.7472 | 0.7348 | 0.7581 | 0.8137
000158 | 0.6995 | 0.7067 | 0.5812 | 0.5959
000402 | 0.6782 | 0.6722 | 0.7066 | 0.7825
000404 | 0.6359 | 0.5762 | 0.6993 | 0.6883
000411 | 0.8156 | 0.8112 | 0.7485 | 0.7979
000420 | 0.7929 | 0.7986 | 0.7930 | 0.8703
000422 | 0.7326 | 0.7076 | 0.6931 | 0.5255
000430 | 0.7378 | 0.8073 | 0.7550 | 0.8286
000507 | 0.6667 | 0.6023 | 0.7046 | 0.6701
000509 | 0.6913 | 0.6314 | 0.6431 | 0.6318
000519 | 0.6321 | 0.6731 | 0.5825 | 0.6129
000520 | 0.4974 | 0.4949 | 0.6444 | 0.8430
000523 | 0.7703 | 0.7491 | 0.6942 | 0.6174
000524 | 0.7359 | 0.7304 | 0.6073 | 0.5417
000526 | 0.5521 | 0.5772 | 0.5403 | 0.5895
000530 | 0.7035 | 0.6353 | 0.7057 | 0.6856
000531 | 0.6894 | 0.6305 | 0.6506 | 0.7033
000532 | 0.6959 | 0.6959 | 0.6986 | 0.6657
Mean | 0.6530 | 0.6484 | 0.6482 | 0.6431
Table 6. The F1 score metric for the corresponding validation datasets and test datasets of the ELM and DELM for some stocks.
Code | ELM Validation F1 | ELM Test F1 | DELM Validation F1 | DELM Test F1
000001 | 0.5212 | 0.6191 | 0.6220 | 0.6758
000005 | 0.5041 | 0.5693 | 0.4754 | 0.4370
000007 | 0.6534 | 0.6190 | 0.6456 | 0.6528
000012 | 0.6479 | 0.5749 | 0.6733 | 0.6515
000014 | 0.6231 | 0.5773 | 0.5663 | 0.6053
000025 | 0.4345 | 0.6017 | 0.4880 | 0.5122
000026 | 0.7089 | 0.6471 | 0.6952 | 0.6384
000031 | 0.6376 | 0.6386 | 0.6734 | 0.7006
000032 | 0.6460 | 0.6966 | 0.6182 | 0.7260
000048 | 0.5908 | 0.6162 | 0.5914 | 0.6572
000050 | 0.6171 | 0.6046 | 0.6519 | 0.6550
000055 | 0.6568 | 0.5563 | 0.6569 | 0.5574
000056 | 0.5919 | 0.5614 | 0.6388 | 0.6085
000061 | 0.5654 | 0.5842 | 0.6075 | 0.6305
000065 | 0.6974 | 0.6383 | 0.6411 | 0.6179
000068 | 0.5985 | 0.5000 | 0.6266 | 0.5080
000090 | 0.5614 | 0.6174 | 0.5857 | 0.6379
000150 | 0.6297 | 0.5753 | 0.6500 | 0.5600
000151 | 0.6534 | 0.6728 | 0.6693 | 0.6667
000155 | 0.7116 | 0.6426 | 0.7080 | 0.6937
000158 | 0.6675 | 0.6639 | 0.6018 | 0.6457
000402 | 0.6757 | 0.6835 | 0.6873 | 0.7148
000404 | 0.6802 | 0.6176 | 0.7259 | 0.5955
000411 | 0.6684 | 0.6971 | 0.6821 | 0.7061
000420 | 0.7185 | 0.6295 | 0.6741 | 0.6228
000422 | 0.7026 | 0.7155 | 0.7288 | 0.6416
000430 | 0.6808 | 0.7316 | 0.6316 | 0.7151
000507 | 0.6658 | 0.6211 | 0.7046 | 0.6365
000509 | 0.6515 | 0.5812 | 0.6260 | 0.5595
000519 | 0.6223 | 0.6777 | 0.6143 | 0.6857
000520 | 0.4833 | 0.5566 | 0.5194 | 0.7683
000523 | 0.6865 | 0.6221 | 0.7017 | 0.5832
000524 | 0.6518 | 0.6802 | 0.6119 | 0.6265
000526 | 0.5983 | 0.6161 | 0.6013 | 0.6505
000530 | 0.6753 | 0.6239 | 0.7338 | 0.6937
000531 | 0.6539 | 0.6380 | 0.6170 | 0.6920
000532 | 0.6476 | 0.6764 | 0.6667 | 0.6667
Mean | 0.6319 | 0.6255 | 0.6382 | 0.6377
Table 7. The AUC values in the corresponding validation datasets and test datasets of ELM and DELM for some stocks.
Code | ELM Validation AUC | ELM Test AUC | DELM Validation AUC | DELM Test AUC
000001 | 0.6115 | 0.6630 | 0.6904 | 0.6972
000005 | 0.6040 | 0.6387 | 0.6455 | 0.6108
000007 | 0.6172 | 0.6001 | 0.6769 | 0.6489
000012 | 0.6319 | 0.6404 | 0.6738 | 0.6940
000014 | 0.6461 | 0.6392 | 0.6161 | 0.6789
000025 | 0.5235 | 0.6274 | 0.6233 | 0.6378
000026 | 0.6121 | 0.6301 | 0.6525 | 0.6927
000031 | 0.6576 | 0.6686 | 0.7128 | 0.7212
000032 | 0.6095 | 0.6275 | 0.6263 | 0.7036
000048 | 0.6204 | 0.6713 | 0.6228 | 0.6847
000050 | 0.5910 | 0.6257 | 0.6301 | 0.6847
000055 | 0.6461 | 0.6314 | 0.6716 | 0.6525
000056 | 0.6139 | 0.6215 | 0.6667 | 0.6820
000061 | 0.6085 | 0.6365 | 0.6579 | 0.6844
000065 | 0.6395 | 0.6405 | 0.6348 | 0.6813
000068 | 0.6451 | 0.5927 | 0.6753 | 0.6266
000090 | 0.5758 | 0.6596 | 0.6123 | 0.6865
000150 | 0.6449 | 0.6690 | 0.6777 | 0.7015
000151 | 0.6390 | 0.6492 | 0.6564 | 0.6911
000155 | 0.6340 | 0.6448 | 0.6518 | 0.7135
000158 | 0.6327 | 0.6534 | 0.6144 | 0.6775
000402 | 0.6232 | 0.6828 | 0.6173 | 0.6901
000404 | 0.6449 | 0.6634 | 0.6946 | 0.6935
000411 | 0.6084 | 0.6278 | 0.6414 | 0.7125
000420 | 0.6319 | 0.6528 | 0.6320 | 0.6990
000422 | 0.6657 | 0.6680 | 0.7250 | 0.6991
000430 | 0.6125 | 0.6816 | 0.6235 | 0.7033
000507 | 0.6089 | 0.6391 | 0.6455 | 0.6804
000509 | 0.6414 | 0.6360 | 0.6264 | 0.6721
000519 | 0.6411 | 0.6573 | 0.6609 | 0.7016
000520 | 0.6234 | 0.6173 | 0.7038 | 0.7601
000523 | 0.6517 | 0.6680 | 0.7200 | 0.6845
000524 | 0.6172 | 0.6553 | 0.6286 | 0.6813
000526 | 0.6289 | 0.6332 | 0.6259 | 0.6448
000530 | 0.6413 | 0.6430 | 0.6949 | 0.7362
000531 | 0.6633 | 0.6486 | 0.6381 | 0.6966
000532 | 0.6309 | 0.6503 | 0.6558 | 0.6609
Mean | 0.6254 | 0.6447 | 0.6547 | 0.6856
Table 8. The mean values of statistical metrics in validation datasets and test datasets for the ELM and DELM for the 400 stocks.
Metric | ELM Validation | ELM Test | DELM Validation | DELM Test
Acc | 0.6386 | 0.6523 | 0.6634 | 0.7013
P | 0.6539 | 0.6312 | 0.6811 | 0.6681
R | 0.6648 | 0.6497 | 0.6436 | 0.6257
F1 | 0.6548 | 0.6343 | 0.6567 | 0.6369
AUC | 0.6357 | 0.6517 | 0.6602 | 0.6892
Table 9. Related parameters for training of the 12 common models.
Models | Related Parameters
LSTM | Input size = 11; hidden size = 11; output size = 2; layer num = 1; activation function = ReLU; optimization function = Adam with learning rate = 0.009, betas = (0.9, 0.999), eps = 1 × 10^−8; loss function = cross-entropy loss; stop training epoch = 200
RNN | Input size = 11; hidden size = 11; output size = 2; layer num = 1; activation function = ReLU; optimization function = Adam with learning rate = 0.009, betas = (0.9, 0.999), eps = 1 × 10^−8; loss function = cross-entropy loss; stop training epoch = 200
KNN | n of neighbors = 5
LR | penalty = ‘l2’
RF | n of estimators = 50
DT | max depth = 3
GBT | learning rate = 0.1; n_estimators = 100
ABT | n of estimators = 50
NB | priors = None; var smoothing = 1 × 10^−8
LDA | solver = ‘svd’; store covariance = False; tol = 1 × 10^−4
QDA | store covariance = False; tol = 1 × 10^−4
SVC | kernel = ‘rbf’; C = 2
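The non-recurrent baselines in Table 9 map directly onto scikit-learn estimators [101]. A sketch instantiating them with the listed parameters (the LSTM and RNN baselines are PyTorch models and are omitted here); the commented fit loop assumes hypothetical X_train/y_train arrays:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.svm import SVC

# Parameters follow Table 9; anything not listed there is left at its default.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR": LogisticRegression(penalty="l2"),
    "RF": RandomForestClassifier(n_estimators=50),
    "DT": DecisionTreeClassifier(max_depth=3),
    "GBT": GradientBoostingClassifier(learning_rate=0.1, n_estimators=100),
    "ABT": AdaBoostClassifier(n_estimators=50),
    "NB": GaussianNB(var_smoothing=1e-8),
    "LDA": LinearDiscriminantAnalysis(solver="svd", store_covariance=False, tol=1e-4),
    "QDA": QuadraticDiscriminantAnalysis(store_covariance=False, tol=1e-4),
    "SVC": SVC(kernel="rbf", C=2),
}

# for name, model in models.items():
#     model.fit(X_train, y_train)            # X_train/y_train: one stock's features/labels
#     print(name, model.score(X_test, y_test))
```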
Table 10. Classification results of the DELM and other classification algorithms on several test datasets and validation datasets.
Table 10. Classification results of the DELM and other classification algorithms on several test datasets and validation datasets.
CodeModelValidation
Acc
Test
Acc
Validation
P
Test
P
Validation
R
Test
R
Validation
F1
Test
F1
Validation
AUC
Test
AUC
000001 | ELM | 0.5794 | 0.6638 | 0.7547 | 0.7074 | 0.3980 | 0.5504 | 0.5212 | 0.6191 | 0.6115 | 0.6630
000001 | DELM | 0.7096 | 0.6953 | 0.6987 | 0.7425 | 0.5604 | 0.6201 | 0.6220 | 0.6758 | 0.6904 | 0.6972
000001 | LSTM | 0.5711 | 0.6522 | 0.6875 | 0.6512 | 0.4662 | 0.6450 | 0.5555 | 0.6480 | 0.5897 | 0.6522
000001 | RNN | 0.5732 | 0.6694 | 0.7200 | 0.6870 | 0.4348 | 0.6222 | 0.5272 | 0.6501 | 0.5977 | 0.6690
000001 | KNN | 0.5594 | 0.6409 | 0.7217 | 0.6548 | 0.3806 | 0.5850 | 0.4984 | 0.6180 | 0.5910 | 0.6405
000001 | LR | 0.4893 | 0.6209 | 0.7320 | 0.7412 | 0.1766 | 0.3631 | 0.2846 | 0.4874 | 0.5445 | 0.6191
000001 | RF | 0.6052 | 0.6481 | 0.7716 | 0.6624 | 0.4453 | 0.5937 | 0.5647 | 0.6261 | 0.6334 | 0.6477
000001 | DT | 0.6223 | 0.6552 | 0.7674 | 0.6779 | 0.4925 | 0.5821 | 0.6000 | 0.6264 | 0.6453 | 0.6547
000001 | GBT | 0.6223 | 0.6810 | 0.7828 | 0.7000 | 0.4751 | 0.6254 | 0.5913 | 0.6606 | 0.6483 | 0.6806
000001 | ABT | 0.6180 | 0.6853 | 0.7668 | 0.7042 | 0.4826 | 0.6311 | 0.5924 | 0.6657 | 0.6420 | 0.6849
000001 | NB | 0.4864 | 0.6109 | 0.6693 | 0.6697 | 0.2114 | 0.4265 | 0.3214 | 0.5211 | 0.5350 | 0.6096
000001 | LDA | 0.5508 | 0.6423 | 0.7785 | 0.7137 | 0.3060 | 0.4669 | 0.4393 | 0.5645 | 0.5941 | 0.6411
000001 | QDA | 0.5894 | 0.6052 | 0.6471 | 0.5894 | 0.6294 | 0.6744 | 0.6381 | 0.6290 | 0.5824 | 0.6056
000001 | SVC | 0.6338 | 0.6838 | 0.8202 | 0.7218 | 0.4652 | 0.5908 | 0.5937 | 0.6498 | 0.6636 | 0.6832
000005 | ELM | 0.6311 | 0.6499 | 0.5371 | 0.6524 | 0.4749 | 0.5050 | 0.5041 | 0.5693 | 0.6040 | 0.6387
000005 | DELM | 0.7241 | 0.6941 | 0.4686 | 0.5909 | 0.4824 | 0.3467 | 0.4754 | 0.4370 | 0.6455 | 0.6108
000005 | LSTM | 0.5726 | 0.5976 | 0.4549 | 0.5684 | 0.4000 | 0.5136 | 0.4249 | 0.5385 | 0.5426 | 0.5911
000005 | RNN | 0.6405 | 0.6507 | 0.5643 | 0.6468 | 0.3965 | 0.5246 | 0.4643 | 0.5775 | 0.5981 | 0.6409
000005 | KNN | 0.6250 | 0.6134 | 0.5267 | 0.5848 | 0.4942 | 0.5382 | 0.5100 | 0.5606 | 0.6023 | 0.6076
000005 | LR | 0.6311 | 0.6362 | 0.5497 | 0.6535 | 0.3629 | 0.4385 | 0.4372 | 0.5249 | 0.5845 | 0.6210
000005 | RF | 0.6570 | 0.6499 | 0.5659 | 0.6371 | 0.5637 | 0.5482 | 0.5648 | 0.5893 | 0.6408 | 0.6421
000005 | DT | 0.5945 | 0.6149 | 0.4880 | 0.5839 | 0.5483 | 0.5548 | 0.5164 | 0.5690 | 0.5865 | 0.6103
000005 | GBT | 0.6204 | 0.6575 | 0.5205 | 0.6450 | 0.4903 | 0.5615 | 0.5050 | 0.6004 | 0.5978 | 0.6501
000005 | ABT | 0.6143 | 0.6423 | 0.5103 | 0.6162 | 0.5753 | 0.5814 | 0.5408 | 0.5983 | 0.6075 | 0.6376
000005 | NB | 0.5793 | 0.5616 | 0.4388 | 0.5631 | 0.2355 | 0.1927 | 0.3065 | 0.2871 | 0.5195 | 0.5331
000005 | LDA | 0.6311 | 0.6606 | 0.5365 | 0.6653 | 0.4826 | 0.5216 | 0.5081 | 0.5847 | 0.6053 | 0.6498
000005 | QDA | 0.5930 | 0.5951 | 0.4778 | 0.5989 | 0.3320 | 0.3522 | 0.3918 | 0.4435 | 0.5476 | 0.5764
000005 | SVC | 0.6463 | 0.6530 | 0.5534 | 0.6420 | 0.5405 | 0.5482 | 0.5469 | 0.5914 | 0.6280 | 0.6449
000007 | ELM | 0.6062 | 0.5770 | 0.5558 | 0.5108 | 0.7924 | 0.7852 | 0.6534 | 0.6190 | 0.6172 | 0.6001
000007 | DELM | 0.6726 | 0.6483 | 0.5955 | 0.6316 | 0.7050 | 0.6755 | 0.6456 | 0.6528 | 0.6769 | 0.6489
000007 | LSTM | 0.6062 | 0.5874 | 0.5540 | 0.5185 | 0.8201 | 0.8033 | 0.6611 | 0.6301 | 0.6189 | 0.6113
000007 | RNN | 0.5951 | 0.6049 | 0.5488 | 0.5332 | 0.7827 | 0.8041 | 0.6444 | 0.6404 | 0.6063 | 0.6270
000007 | KNN | 0.6045 | 0.5754 | 0.5587 | 0.5102 | 0.7405 | 0.7407 | 0.6369 | 0.6042 | 0.6126 | 0.5937
000007 | LR | 0.5397 | 0.5348 | 0.5049 | 0.4833 | 0.8927 | 0.9111 | 0.6450 | 0.6316 | 0.5607 | 0.5766
000007 | RF | 0.6256 | 0.6207 | 0.5797 | 0.5526 | 0.7301 | 0.7000 | 0.6462 | 0.6176 | 0.6318 | 0.6295
000007 | DT | 0.6402 | 0.6677 | 0.5965 | 0.6012 | 0.7163 | 0.7148 | 0.6509 | 0.6531 | 0.6447 | 0.6730
000007 | GBT | 0.6159 | 0.6548 | 0.5703 | 0.5785 | 0.7301 | 0.7778 | 0.6404 | 0.6635 | 0.6227 | 0.6684
000007 | ABT | 0.6207 | 0.6532 | 0.5749 | 0.5765 | 0.7301 | 0.7815 | 0.6433 | 0.6635 | 0.6272 | 0.6674
000007 | NB | 0.5429 | 0.5381 | 0.5073 | 0.4841 | 0.8374 | 0.8444 | 0.6319 | 0.6154 | 0.5605 | 0.5721
000007 | LDA | 0.5802 | 0.5900 | 0.5349 | 0.5197 | 0.7958 | 0.8296 | 0.6398 | 0.6391 | 0.5930 | 0.6165
000007 | QDA | 0.5997 | 0.5624 | 0.5714 | 0.5000 | 0.5813 | 0.5481 | 0.5763 | 0.5230 | 0.5986 | 0.5608
000012 | ELM | 0.6275 | 0.6506 | 0.7035 | 0.6523 | 0.6005 | 0.5138 | 0.6479 | 0.5749 | 0.6319 | 0.6404
000012 | DELM | 0.6728 | 0.6973 | 0.7000 | 0.7380 | 0.6485 | 0.5831 | 0.6733 | 0.6515 | 0.6738 | 0.6940
000012 | LSTM | 0.5271 | 0.6758 | 0.5820 | 0.6313 | 0.6077 | 0.7102 | 0.5941 | 0.6680 | 0.5137 | 0.6784
000012 | RNN | 0.5666 | 0.6898 | 0.6473 | 0.6833 | 0.5290 | 0.6065 | 0.5822 | 0.6424 | 0.5728 | 0.6836
000012 | KNN | 0.5921 | 0.6492 | 0.6780 | 0.6351 | 0.5434 | 0.5569 | 0.6033 | 0.5934 | 0.6001 | 0.6423
000012 | LR | 0.6686 | 0.6818 | 0.7354 | 0.6534 | 0.6551 | 0.6554 | 0.6929 | 0.6544 | 0.6708 | 0.6798
000012 | RF | 0.6161 | 0.6846 | 0.6844 | 0.6735 | 0.6079 | 0.6092 | 0.6439 | 0.6397 | 0.6175 | 0.6790
000012 | DT | 0.5935 | 0.6733 | 0.6747 | 0.6643 | 0.5558 | 0.5846 | 0.6095 | 0.6219 | 0.5997 | 0.6667
000012 | GBT | 0.6289 | 0.6931 | 0.7080 | 0.6915 | 0.5955 | 0.6000 | 0.6469 | 0.6425 | 0.6344 | 0.6861
000012 | ABT | 0.6048 | 0.6846 | 0.6987 | 0.6875 | 0.5409 | 0.5754 | 0.6098 | 0.6265 | 0.6154 | 0.6764
000012 | NB | 0.4986 | 0.6181 | 0.6070 | 0.6821 | 0.3449 | 0.3169 | 0.4399 | 0.4328 | 0.5239 | 0.5956
000012 | LDA | 0.6586 | 0.6535 | 0.7213 | 0.6227 | 0.6551 | 0.6246 | 0.6866 | 0.6237 | 0.6592 | 0.6513
000012 | QDA | 0.6091 | 0.5601 | 0.6584 | 0.5202 | 0.6551 | 0.5538 | 0.6567 | 0.5365 | 0.6015 | 0.5596
000012 | SVC | 0.6275 | 0.6945 | 0.6966 | 0.7011 | 0.6154 | 0.5846 | 0.6535 | 0.6376 | 0.6295 | 0.6863
000014 | ELM | 0.6452 | 0.6486 | 0.6743 | 0.6462 | 0.5791 | 0.5217 | 0.6231 | 0.5773 | 0.6461 | 0.6392
000014 | DELM | 0.6166 | 0.7000 | 0.6481 | 0.6708 | 0.5029 | 0.5514 | 0.5663 | 0.6053 | 0.6161 | 0.6789
000014 | LSTM | 0.6278 | 0.6541 | 0.6646 | 0.6530 | 0.5404 | 0.5416 | 0.5947 | 0.5883 | 0.6289 | 0.6458
000014 | RNN | 0.6020 | 0.5893 | 0.6294 | 0.5614 | 0.5203 | 0.5053 | 0.5690 | 0.5310 | 0.6031 | 0.5831
000014 | KNN | 0.5851 | 0.5729 | 0.6119 | 0.5367 | 0.4944 | 0.5217 | 0.5469 | 0.5291 | 0.5863 | 0.5691
000014 | LR | 0.6295 | 0.6471 | 0.6610 | 0.6364 | 0.5508 | 0.5435 | 0.6009 | 0.5863 | 0.6305 | 0.6395
000014 | RF | 0.6237 | 0.6229 | 0.6619 | 0.5980 | 0.5254 | 0.5497 | 0.5858 | 0.5728 | 0.6250 | 0.6174
000014 | DT | 0.6295 | 0.6357 | 0.7039 | 0.6314 | 0.4633 | 0.5000 | 0.5588 | 0.5581 | 0.6316 | 0.6257
000014 | GBT | 0.6295 | 0.6629 | 0.6480 | 0.6387 | 0.5876 | 0.6149 | 0.6163 | 0.6266 | 0.6300 | 0.6593
000014 | ABT | 0.6409 | 0.6329 | 0.6604 | 0.6080 | 0.5989 | 0.5683 | 0.6281 | 0.5875 | 0.6415 | 0.6281
000014 | NB | 0.5451 | 0.6000 | 0.5732 | 0.5946 | 0.3983 | 0.4099 | 0.4700 | 0.4853 | 0.5470 | 0.5859
000014 | LDA | 0.6223 | 0.6600 | 0.6424 | 0.6448 | 0.5734 | 0.5807 | 0.6060 | 0.6111 | 0.6230 | 0.6541
000014 | QDA | 0.5594 | 0.5729 | 0.5991 | 0.5466 | 0.3927 | 0.4193 | 0.4744 | 0.4745 | 0.5615 | 0.5615
000014 | SVC | 0.6423 | 0.6600 | 0.6733 | 0.6533 | 0.5706 | 0.5559 | 0.6177 | 0.6007 | 0.6433 | 0.6523
000025 | ELM | 0.5156 | 0.6057 | 0.3599 | 0.5072 | 0.5481 | 0.7394 | 0.4345 | 0.6017 | 0.5235 | 0.6274
000025 | DELM | 0.6662 | 0.6894 | 0.4686 | 0.5693 | 0.5091 | 0.4656 | 0.4880 | 0.5122 | 0.6233 | 0.6378
000025 | LSTM | 0.5709 | 0.6353 | 0.4006 | 0.5470 | 0.5285 | 0.5528 | 0.4553 | 0.5491 | 0.5606 | 0.6219
000025 | RNN | 0.6075 | 0.5891 | 0.4414 | 0.4880 | 0.5121 | 0.5475 | 0.4693 | 0.5093 | 0.5843 | 0.5823
000025 | KNN | 0.5881 | 0.5943 | 0.4141 | 0.4966 | 0.5146 | 0.5106 | 0.4590 | 0.5035 | 0.5702 | 0.5807
000025 | LR | 0.4602 | 0.5560 | 0.3593 | 0.4725 | 0.7531 | 0.8768 | 0.4865 | 0.6141 | 0.5314 | 0.6082
000025 | RF | 0.6023 | 0.6539 | 0.4183 | 0.5654 | 0.4393 | 0.6092 | 0.4286 | 0.5864 | 0.5627 | 0.6466
000025 | DT | 0.6051 | 0.6567 | 0.4325 | 0.5724 | 0.5230 | 0.5845 | 0.4735 | 0.5784 | 0.5852 | 0.6450
000025 | GBT | 0.6051 | 0.6667 | 0.4330 | 0.5714 | 0.5272 | 0.6901 | 0.4755 | 0.6252 | 0.5862 | 0.6705
000025 | ABT | 0.5881 | 0.6582 | 0.4118 | 0.5683 | 0.4979 | 0.6303 | 0.4508 | 0.5977 | 0.5662 | 0.6536
000025 | NB | 0.3935 | 0.4879 | 0.2793 | 0.4291 | 0.4979 | 0.8204 | 0.3579 | 0.5635 | 0.4188 | 0.5420
000025 | LDA | 0.5270 | 0.5759 | 0.3917 | 0.4830 | 0.7113 | 0.7500 | 0.5052 | 0.5876 | 0.5718 | 0.6042
000025 | QDA | 0.6435 | 0.5787 | 0.4737 | 0.4586 | 0.4519 | 0.2535 | 0.4625 | 0.3265 | 0.5969 | 0.5258
000025 | SVC | 0.5710 | 0.6667 | 0.4060 | 0.5671 | 0.5690 | 0.7289 | 0.4739 | 0.6379 | 0.5705 | 0.6768
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
