Article

PM2.5 Concentration Prediction Based on LightGBM Optimized by Adaptive Multi-Strategy Enhanced Sparrow Search Algorithm

1 School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
2 College of Science, Nanchang Institute of Technology, Nanchang 330099, China
* Authors to whom correspondence should be addressed.
Submission received: 22 September 2023 / Revised: 20 October 2023 / Accepted: 26 October 2023 / Published: 27 October 2023
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

Abstract

The atmospheric environment is of great importance to human health; however, its influencing factors are complex and variable, so an efficient technique is required to estimate PM2.5 concentration values more precisely. In this paper, a Light Gradient Boosting Machine (LightGBM) optimized by an enhanced Sparrow Search Algorithm (LASSA) is proposed for PM2.5 concentration prediction. This approach can provide accurate predictions while also reducing the potential losses resulting from unexpected events. LightGBM is regarded as an outstanding machine learning approach; however, its hyperparameters must be optimally combined to achieve the desired results. We enhance the Sparrow Search Algorithm (SSA) and utilize it to identify the optimal combination of the most crucial parameters, applying cross-validation to increase reliability. Using limited air quality data and meteorological data as inputs, PM2.5 concentration values were predicted. The output of LASSA-LGB was compared with that of the standard LGB, SSA-LGB and ISSA-LGB. The findings demonstrate that LASSA-LGB outperforms the other models in terms of prediction accuracy: the RMSE and MAPE error indices were lowered by 3% to 16%, the concordance correlation coefficient is not less than 0.91, and the R2 reaches 0.96. This indicates that the proposed model has potential advantages in the field of PM2.5 concentration prediction.

1. Introduction

With the rapid development of the economy and the ongoing advancement of urban industrialization, the issue of air contamination has grown more prominent in recent decades. The atmospheric environment contains a range of suspended substances, including tiny particulate matter with a particle size of less than 2.5 microns, referred to as PM2.5. Excessive PM2.5 content results in deteriorating air quality, which endangers human health, causes respiratory diseases and, in extreme cases, even leads to death [1]. PM2.5 concentration has become a crucial reference for objectively evaluating urban air quality. Therefore, the timely prediction of the PM2.5 concentration is necessary to mitigate damage to health [2]. Statistical approaches and machine learning prediction models have made up the majority of recent PM2.5 concentration prediction research. Methods based on statistics primarily rely on gathering historical data, followed by manual experience summaries and statistical analysis. However, achieving accurate predictions with them is challenging due to the complex sources of pollutants and numerous influencing factors. Artificial intelligence technology is aiding this pursuit, with the application of machine learning methods becoming increasingly prevalent. Machine learning models have superior generalization capabilities compared to traditional statistical models. Classical machine learning models, such as the k-nearest-neighbors regressor and support vector regressor, have been applied to forecast air quality [3,4,5]. These methods offer the advantages of simplicity and fast calculation. Nevertheless, they suffer from relatively inferior accuracy and an inability to extract crucial information from complex data.
Numerous neural network techniques, including the long short-term memory network, recurrent neural network and backpropagation network, have recently been used in this field [6,7,8]. Despite the advantages of avoiding complicated data preprocessing and achieving higher output accuracy, the drawbacks cannot be ignored [9,10,11]. These include challenging algorithm parameter adjustments, lengthy training times and slow calculation speeds [12].
New machine learning methods achieve high accuracy in predicting the PM2.5 concentration because they analyze the nonlinear relationship between atmospheric pollutants and variable meteorological factors. As a result, researchers employ cutting-edge machine learning algorithms to forecast the haze concentration. A recent study introduced a gradient boosting decision tree (GBDT) model that incorporates both human activities and natural factors [13]; it is utilized to directly estimate terrestrial PM2.5 concentrations from national satellite aerosol optical depth data, with a fitting coefficient of 0.81. As reported in [14], remote sensing data retrieved by satellites are used to compensate for the lack of ground monitoring data. These data are used to estimate historical daily PM2.5 concentrations on a national scale, together with meteorological and land use data; the random forest model demonstrated the best fit for this prediction. The study in [15] employed a novel feature engineering approach in tandem with the spatial effect of meteorological data, and the LightGBM model was utilized to forecast across varied timescales; as a result, the fitting index reached up to 0.88. Candice et al. [16] used a variety of UCI public datasets to compare many techniques in the gradient boosting class, demonstrating that LightGBM is an accurate model with extraordinarily rapid training performance. Ye et al. [17] applied SVM, GBDT and Naive Bayes to the fault identification of sensor nodes, with the findings indicating that GBDT delivers a higher fault detection rate.
Notably, the LightGBM model is an advanced version of GBDT which boasts superior accuracy and speed, while simultaneously curbing overfitting problems [18]. Given these advantages, LightGBM has found widespread use across industries. However, the selection of hyperparameters directly influences the optimal results due to the many parameter settings involved. Commonly used parameter optimization methods, such as manual search and grid search, are inefficient and unproductive, or have a rapidly decreasing efficiency as the number and range of parameters increase, which often results in insufficient precision. To combat this issue, swarm intelligence algorithms offer a novel solution [19].
The goal of a swarm intelligence algorithm is to mimic the actions of biological swarms and other natural phenomena. This class of algorithms originated with particle swarm optimization [20] and has since given rise to others, such as the ant colony algorithm, the whale optimization algorithm and gray wolf optimization [21,22,23,24,25]. These algorithms are known for their simple execution, minimal parameters and high degree of flexibility. A unique method for group intelligence optimization, called the sparrow search algorithm, was presented in 2020. Its underlying idea is inspired by the foraging and early-warning behaviors of sparrow groups [26]; it exhibits robustness and high optimization ability, but also some shortcomings. Hence, researchers have integrated multiple strategies into the algorithm to enhance its effectiveness [27,28,29,30,31,32,33,34]. For instance, [35] used the SSA to adjust a convolutional neural network’s initial parameters, attaining a classification accuracy of 98% for pipeline leak detection. Similarly, [36] utilized logistic mapping and mutation operators to enhance the SSA’s global optimization capability and to provide the supervision mechanism parameters of a stochastic configuration network, resulting in a substantial improvement in the network’s performance. The study in [37] employs a discrete decoding strategy, a reverse learning strategy and a control step size factor to enhance the sparrow search algorithm; the enhanced algorithm is utilized to optimize LGB and perform industrial control intrusion detection. The sparrow algorithm’s efficacy is demonstrated through the optimization efficiency and accuracy of the parameter model.
Some researchers have validated the performance of other machine learning models. For example, the study in [38] used a mix of the FFA and PSO algorithms to improve the SVM. To calculate dissolved oxygen, several combinations of pH, T, EC and Q were employed as inputs, and the resulting model significantly outperformed SVM-PSO, SVM-FFA and pure SVM. Reham R. et al. [39] improved the RVFL and the RVM using the QANA and AHA, respectively, resulting in the fusion models RVFL-AHA, RVFL-QANA, RVM-AHA and RVM-QANA. These models’ ability to simulate potential evapotranspiration was compared, and the findings indicate that the AHA and QANA have a substantial impact on efficiency. Li et al. [40] suggested a short-term weather forecasting model using wavelet denoising and CatBoost; the model outperforms deep learning methods such as LSTM and Seq2Seq in terms of accuracy and convergence time. These studies provide two key takeaways: (i) compared to deep learning models, machine learning models have similar fitting accuracy and faster convergence rates; (ii) hybrid models that combine metaheuristic techniques and machine learning models outperform ordinary single models. As a result, in this study, LightGBM is used as a benchmark and combined with an augmented sparrow search algorithm to create a hybrid model.
To address the shortcomings of the original SSA, this research presents an enhanced version that mitigates the issues of insufficient individual discreteness and slipping into locally optimal solutions. The LASSA incorporates goodpoint set initialization, Lévy flight and an adaptive weights strategy. Given the potential benefits of LightGBM on real data, the improved algorithm optimizes its hyperparameters to construct an atmospheric PM2.5 prediction model called LASSA-LightGBM. Compared with the forecasting results of ISSA-LGB and parameter-tuning models, the experimental results prove the superiority and dependability of the proposed approach.
Our goal is to present a novel hybrid model: LightGBM optimized by the augmented sparrow search algorithm. LASSA-LGB is proposed and compared to ISSA-LGB, SSA-LGB and LGB. To the best of our knowledge, no one has documented combining an enhanced SSA with LGB for haze prediction, which is the primary motivation for our research. The remainder of this paper is structured as follows: Section 2 first presents the original sparrow search algorithm, and then discusses and analyzes the enhancement techniques of this study; the algorithm workflow, data processing methodologies and prediction model are also described. Section 3 gives the performance of the method on benchmark functions, along with the experimental findings on haze prediction and the accompanying analysis. Section 4 presents the conclusions of the work.

2. Materials and Methods

2.1. Original Sparrow Search Algorithm

The SSA is a novel metaheuristic algorithm based on the habits of sparrow groups. Many animals utilize the collective wisdom of their populations to locate food and elude predators in the wild, and sparrow populations are no different. Within the sparrow population, diverse individuals exhibit distinct feeding behaviors, and they may be divided into three groups according to their roles in foraging: discoverers, followers and scouts. The discoverer searches for food and communicates its location to the rest of the group. The followers not only obtain food using the knowledge supplied by the discoverer, but also continually watch other sparrows and compete for food. Additionally, a particular fraction of the sparrow population is chosen to scout and sound alerts throughout the foraging process. When the alert value reaches the safety line, the sparrows will start to move. This permits sparrow groups to locate more food by continually updating their whereabouts.
It is vital to remember that the proportions of discoverers and followers stay constant over time; only the role of each individual sparrow varies with its fitness level. The following formula shows how the discoverer’s location is updated:
$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-t}{\alpha \cdot G_{\max}}\right), & R_2 < ST \\[4pt] X_{i,j}^{t} + \lambda \cdot \rho, & R_2 \geq ST \end{cases} \tag{1}$$
$G_{\max}$ is the maximum number of iterations, while t is the current iteration number. $X_{i,j}^{t}$ represents the position of the i-th sparrow in the j-th dimension at iteration t. α is a random value between 0 and 1. The random number λ conforms to the normal distribution. ρ is an all-ones matrix of dimension 1 × d. $R_2$ is the current alert value, a stochastic number like α except that it can also take the value zero. ST is the preset safety threshold.
Some of the birds in the group with high fitness become discoverers, and the others become followers. The following method is used to update the followers’ location:
$$X_{i,j}^{t+1} = \begin{cases} \lambda \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > \dfrac{P}{2} \\[4pt] X_{p}^{t+1} + \left|X_{i,j}^{t} - X_{p}^{t+1}\right| \cdot \psi \cdot \rho, & i \leq \dfrac{P}{2} \end{cases} \tag{2}$$
$X_{p}$ is the best foraging position currently found by the discoverer. $X_{worst}$ is the worst position in the current search area, i.e., the position with the worst fitness value. ψ is a 1 × d matrix whose elements are randomly set to 1 or −1, and P is the total population size.
To prevent predators, certain sparrows are designated as scouts while foraging. When the scouts detect a predator, they sound the alarm, and the finder then takes the remaining individuals to another safe location. The following formula is used to update the scouts’ position:
$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \eta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & F_{i}^{t} > F_{g}^{t} \\[4pt] X_{i,j}^{t} + \kappa \cdot \dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{F_{g}^{t} - F_{worst}^{t} + \varepsilon}, & F_{i}^{t} = F_{g}^{t} \end{cases} \tag{3}$$
$X_{best}^{t}$ is the best position among all current individuals. η and κ are stochastic numbers that control the direction and length of the sparrow’s movement. ε is a small positive number that guarantees a nonzero denominator. $F_{g}^{t}$ and $F_{worst}^{t}$ are the global best and worst fitness values, respectively, and $F_{i}^{t}$ denotes the fitness of the current individual.
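To make these update rules concrete, the following is a minimal NumPy sketch of one SSA iteration. It is an illustrative reduction under stated assumptions, not the authors' implementation: the role proportions PD and SD, the shared alert value and the scalar random factors are simplifying choices, and bound handling is omitted.

```python
import numpy as np

def ssa_iteration(X, fitness, t, G_max, ST=0.8, PD=0.2, SD=0.1, rng=None):
    """One iteration of the original SSA (Equations (1)-(3)).
    X: (n, d) positions; fitness: callable, smaller value = better."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    fit = np.array([fitness(x) for x in X])
    order = np.argsort(fit)                       # best individual first
    X, fit = X[order].copy(), fit[order]
    x_best, x_worst = X[0].copy(), X[-1].copy()
    n_disc = max(1, int(PD * n))                  # top PD share become discoverers

    R2 = rng.random()                             # shared alert value
    for i in range(n_disc):                       # Equation (1): discoverers
        if R2 < ST:
            X[i] = X[i] * np.exp(-t / (rng.random() * G_max + 1e-12))
        else:
            X[i] = X[i] + rng.normal(size=d)      # lambda * rho

    for i in range(n_disc, n):                    # Equation (2): followers
        if i > n / 2:
            X[i] = rng.normal() * np.exp((x_worst - X[i]) / i ** 2)
        else:
            psi = rng.choice([-1.0, 1.0], size=d)
            X[i] = X[0] + np.abs(X[i] - X[0]) * psi

    scouts = rng.choice(n, max(1, int(SD * n)), replace=False)
    for i in scouts:                              # Equation (3): scouts
        if fit[i] > fit[0]:
            X[i] = x_best + rng.normal() * np.abs(X[i] - x_best)
        else:
            X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - x_worst) \
                   / (fit[0] - fit[-1] + 1e-50)
    return X
```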

2.2. LASSA

According to prior research, the original SSA exhibits fast convergence and strong robustness; nonetheless, it is prone to local optima and limited solution accuracy. The initialization method of SSA, a basic stochastic approach, makes the algorithm heavily dependent on the distribution of the population at initialization. More importantly, in the subsequent rounds of the algorithm, the individual sparrows quickly congregate around the current optimum position. To enhance the capability of the SSA, this paper proposes the following improvement strategies for the LASSA: Firstly, a goodpoint set is introduced for population initialization to improve population diversity and ergodicity. Secondly, the LASSA updates the discoverer’s position using a fusion of Lévy flights, while the follower’s position update adopts adaptive, dynamically changing weights. Finally, a Cauchy mutation perturbation is used to improve the precision of the best solution. The LASSA enhancement specifics are as follows.

2.2.1. Goodpoint Set

The starting population is a key factor influencing the algorithm’s speed and convergence. To establish the population of individuals, the original SSA uses a random initialization approach, which frequently fails to guarantee the variety of the incipient population or the individual fitness values. To combat this issue, many optimization algorithms utilize chaotic sequence mapping; however, chaotic mapping is characterized by a high dependence on initial values and instability. Tent mapping has been demonstrated to produce a more even distribution; nonetheless, it has short, unstable cycles and is apt to become trapped at fixed points [41]. To enhance the algorithm’s global search capability, this study introduces a uniformly distributed set of goodpoints and incorporates it into the population initialization. Figure 1 shows the homogeneity of the populations formed using different initialization methods. Hua Luogeng first proposed the goodpoint set [42]. In its simplest terms, it is defined as follows:
Assume $G_D$ is the unit cube in D-dimensional Euclidean space, i.e., 0 ≤ $x_i$ ≤ 1 (i = 1, 2, …, D). A point r ∈ $G_D$ is a goodpoint if the set of points of the form
$$P_n(k) = \left\{ \left( \left\{ r_1^{(n)} \cdot k \right\}, \ldots, \left\{ r_i^{(n)} \cdot k \right\}, \ldots, \left\{ r_D^{(n)} \cdot k \right\} \right) \;\middle|\; 1 \leq k \leq n \right\} \tag{4}$$
has deviation $\Phi(n) = C(r, \varepsilon) \cdot n^{-1+\varepsilon}$, wherein C(r, ε) is a constant related only to r and ε (ε is an arbitrarily small positive number). The cluster of points $P_n(k)$ is then referred to as a goodpoint set, and r is the goodpoint. When using the goodpoint set in practice, r is generally taken from the base of the real subcircular domain; that is:
$$r_k = \left\{ 2 \cos \frac{2 \pi k}{P} \right\}, \quad 1 \leq k \leq D \tag{5}$$
where P is the smallest prime satisfying P − 3 ≥ 2D. The mapping to the search range of this paper is $X_i(j) = (ub_j - lb_j) \cdot \{ r_j \cdot i \} + lb_j$, where ub and lb denote the upper and lower bounds of the search space.
In this study, we utilize the aforementioned formulas to establish a goodpoint set for enhancing the population initialization of the SSA. As demonstrated by the properties described earlier, the goodpoint set displays significantly reduced deviation compared to a random distribution. It can enhance the diversity and ergodicity of the population while ensuring algorithmic stability. Moreover, it weakens the attraction of local extrema and precludes premature convergence.
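As a sketch of how Equations (4) and (5) translate into an initializer: the prime search and fractional-part mapping below follow the definitions above, while the function name and the example bounds (taken from the LGB parameter ranges discussed in Section 2.4) are our own illustrative choices.

```python
import numpy as np

def goodpoint_init(n, dim, lb, ub):
    """Initialize n individuals in [lb, ub]^dim using a goodpoint set.

    r_k = frac(2*cos(2*pi*k / P)), with P the smallest prime satisfying
    P - 3 >= 2*dim; individual i gets X_i(j) = (ub_j - lb_j)*frac(r_j * i) + lb_j.
    """
    def is_prime(p):
        return p > 1 and all(p % q for q in range(2, int(p ** 0.5) + 1))

    P = 2 * dim + 3
    while not is_prime(P):
        P += 1
    k = np.arange(1, dim + 1)
    r = np.mod(2 * np.cos(2 * np.pi * k / P), 1)   # the goodpoint r
    i = np.arange(1, n + 1).reshape(-1, 1)
    frac = np.mod(r * i, 1)                         # {r_j * i}, the goodpoint set
    return lb + (ub - lb) * frac

# usage: 30 sparrows in a 4-dimensional hyperparameter space
# (bounds mirror the max_depth, learning_rate, num_leaves, subsample ranges)
pop = goodpoint_init(30, 4,
                     lb=np.array([3, 0.01, 1, 0.1]),
                     ub=np.array([7, 0.15, 127, 1.0]))
```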

2.2.2. Lévy Flight

In the SSA algorithm, the discoverer refers to the individual occupying the most abundant area within the entire population. The algorithm’s solution ability is heavily contingent upon the discoverer’s behavior. Analyzing the original Formula (1), it can be determined that the random numbers that are uniformly distributed make it difficult for the discoverer to move out of the current range. To solve this issue, this paper introduces the Lévy flight strategy to improve the discoverer’s position update method.
Lévy flight is a concept used by optimization researchers to control an algorithm’s exploratory and exploitative abilities. The motion trajectories of numerous organisms in nature follow a Lévy flight pattern where the step size’s probability distribution is heavily tailed. Figure 2 shows the flight trace. Therefore, it can explore a bigger space than random wandering in the same number of steps. Combining the Lévy flight algorithm with the sparrow algorithm can enhance diversity among individuals and improve global search capabilities. Additionally, this approach enables an easier escape from the local extreme values.
As sampling directly from the Lévy distribution is analytically difficult, the Mantegna method is typically used to generate a random step that follows a Lévy distribution [43], which is formulated as shown below:
$$Levy = \frac{\mu}{|v|^{\frac{1}{\tau}}} \tag{6}$$
$$\theta = \left\{ \frac{\Gamma(1+\tau) \cdot \sin\left(\frac{\pi \tau}{2}\right)}{\Gamma\left(\frac{1+\tau}{2}\right) \cdot \tau \cdot 2^{\frac{\tau - 1}{2}}} \right\}^{\frac{1}{\tau}} \tag{7}$$
In the above equations, μ obeys a normal distribution with mean 0 and variance $\theta^2$, while ν obeys the same type of distribution but with variance 1. τ is a constant taking the value 1.5. The following equation is used to update the position of the finder with the Lévy flight method:
$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot (1 + Levy), & R_2 < ST \\[2pt] Levy \cdot X_{i,j}^{t} + \lambda \cdot \rho, & R_2 \geq ST \end{cases} \tag{8}$$
Introducing Lévy flight into the position update formula for the discoverer’s large-scale search involves multiplying the Lévy flight length by the finder’s position to obtain a step length; the finder’s new position is then determined by adding this step length to its existing position.
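A hedged sketch of Equations (6)–(8) in NumPy follows; the function names are our own, and the second branch reuses the normally distributed λ·ρ term from Equation (1).

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, tau=1.5, rng=None):
    """Mantegna's algorithm: draw a Levy-distributed step (Equations (6)-(7))."""
    rng = rng or np.random.default_rng()
    theta = (gamma(1 + tau) * sin(pi * tau / 2) /
             (gamma((1 + tau) / 2) * tau * 2 ** ((tau - 1) / 2))) ** (1 / tau)
    mu = rng.normal(0, theta, size=dim)   # mu ~ N(0, theta^2)
    v = rng.normal(0, 1, size=dim)        # v  ~ N(0, 1)
    return mu / np.abs(v) ** (1 / tau)

def discoverer_update(x, R2, ST, rng=None):
    """Equation (8): Levy-enhanced discoverer position update."""
    rng = rng or np.random.default_rng()
    step = levy_step(x.size, rng=rng)
    if R2 < ST:
        return x * (1 + step)
    return step * x + rng.normal(size=x.size)   # Levy*X + lambda*rho
```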

2.2.3. Adaptive Nonlinear Decreasing Weights

The majority of the sparrow population consists of followers, who move closely behind the finders, resulting in a decreased ability to search independently. When i is greater than half of n, a global search capability exists, but it is constrained by the position of the worst individual as well as the follower’s own position, limiting wide-range search. When i is less than or equal to half of n, the updating mechanism quickly brings the follower closer to the discoverer, reducing diversity. To overcome this problem, this research suggests a dynamic adaptive technique that uses varying weights in the follower phase. The formula shown below illustrates how the variable weight is used to update each follower sparrow’s position:
$$X_{i,j}^{t+1} = \begin{cases} \omega \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > \dfrac{N}{2} \\[4pt] X_{p}^{t+1} + \omega \cdot \left|X_{i,j}^{t} - X_{p}^{t+1}\right| \cdot \psi \cdot \rho, & i \leq \dfrac{N}{2} \end{cases} \tag{9}$$
$$\omega = \frac{e^{\left(1 - t/G_{\max}\right)} - e^{-\left(1 - t/G_{\max}\right)}}{e^{\left(1 - t/G_{\max}\right)} + e^{-\left(1 - t/G_{\max}\right)}} \tag{10}$$
The weight ω relates solely to the iteration number; it involves no other parameters and is unaffected by the search area. As the iterations increase, the weight progressively decreases, helping to ensure that optimal solutions are not overlooked and that the search is not prematurely trapped in a local minimum. Considering the algorithm’s global search and limited precision, this study adopts an adaptive decreasing weight that modifies the follower search after each cycle. This prevents early convergence to a local optimum and enhances accuracy in the later stages of convergence. The weight additionally improves follower flexibility and increases the granularity of the search.
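Under the reconstruction in Equation (10), ω is simply the hyperbolic tangent of 1 − t/G_max; a quick numeric check of this decreasing schedule:

```python
import numpy as np

def omega(t, G_max):
    """Equation (10): adaptive nonlinear decreasing weight, tanh(1 - t/G_max)."""
    x = 1.0 - t / G_max
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# the weight decays from tanh(1) ~= 0.76 at t = 0 down to 0 at t = G_max
print([round(float(omega(t, 100)), 3) for t in (0, 50, 100)])  # [0.762, 0.462, 0.0]
```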

2.2.4. Cauchy Mutation

After a thorough analysis of the SSA’s search and update formulas, it becomes evident that if the current optimum is a local optimum, other members of the population will quickly converge to that position. To deal with this situation, individuals are typically subjected to mutation operations. Because the Gaussian distribution is concentrated, the probability of drawing values near its mean of 0 is high; the Cauchy distribution, in comparison, has a much wider range of values and yields large values with non-negligible probability. In this study, the sparrow with the best fitness value is chosen for mutation using the Cauchy mutation approach [44]. The Cauchy distribution’s probability density function is:
$$f(x) = \frac{1}{\pi} \cdot \frac{\gamma}{(x - x_0)^2 + \gamma^2} \tag{11}$$
In this paper, the standard Cauchy distribution, with $x_0$ = 0 and γ = 1, is used; the standard Cauchy distribution has high stability. To further improve the solution performance, after the mutation perturbation below, the fitness values before and after the mutation are compared, and the greedy method selects the better fitness value to confirm the updated position:
$$X_{pbest}^{t+1} = X_{pbest}^{t} + cauchy(0,1) \cdot X_{pbest}^{t} \tag{12}$$

$$X_{pbest}^{t+1} = \begin{cases} X_{pbest}^{t+1}, & f\left(X_{pbest}^{t+1}\right) \leq f\left(X_{pbest}^{t}\right) \\[2pt] X_{pbest}^{t}, & otherwise \end{cases} \tag{13}$$
where Xpbest denotes the location of the individual’s optimal value.
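A minimal sketch of Equations (12) and (13), assuming, as stated in Section 2.4, that a smaller fitness value is better:

```python
import numpy as np

def cauchy_mutation(x_best, f, rng=None):
    """Equations (12)-(13): perturb the current best with standard Cauchy noise,
    then keep the mutant only if its fitness (smaller is better) does not worsen."""
    rng = rng or np.random.default_rng()
    mutant = x_best + rng.standard_cauchy(size=x_best.size) * x_best
    return mutant if f(mutant) <= f(x_best) else x_best
```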
The following are the precise LASSA algorithm implementation steps:
(1) Initial settings. Determine the parameter values, including the total population size P, the maximum number of iterations Gmax, the proportions of the three types of sparrows in the population Pnum and Snum, the safety threshold ST, etc. Meanwhile, obtain the initial positions of the population using Equation (5).
(2) Determine each sparrow’s fitness value and sort the population. Record the worst fitness value and the current best fitness value, together with the corresponding position information.
(3) Choose the top Pnum · P sparrows, ordered by fitness, to serve as discoverers; update their positions in accordance with Equation (8) and determine the optimal position of the discoverers.
(4) Choose scouts at random according to the Snum ratio. Update the positions of the remaining sparrows separately.
(5) Determine the fitness values and, using the Cauchy mutation approach, perturb the current optimal solution to obtain a new one. Then confirm the position update according to the greedy rule.
(6) If the iteration condition has been satisfied, move on to the next step; otherwise, repeat steps (3) through (6).
(7) The algorithm terminates, and the best outcome is output.
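Putting steps (1)–(7) together, below is a condensed sketch of the LASSA main loop that reuses the helper functions sketched in Sections 2.2.1–2.2.4 (goodpoint_init, discoverer_update, omega, cauchy_mutation). It is a simplified illustration under our own assumptions: the scout update of Equation (3) and some bookkeeping are omitted for brevity.

```python
import numpy as np

def lassa(f, lb, ub, n=30, G_max=100, PD=0.2, ST=0.8, rng=None):
    """Minimize f over the box [lb, ub] with the LASSA (steps (1)-(7))."""
    rng = rng or np.random.default_rng()
    dim = lb.size
    X = goodpoint_init(n, dim, lb, ub)            # step (1): goodpoint set
    for t in range(G_max):
        fit = np.array([f(x) for x in X])
        order = np.argsort(fit)                   # step (2): sort by fitness
        X, fit = X[order].copy(), fit[order]
        n_disc = max(1, int(PD * n))
        R2, w = rng.random(), omega(t, G_max)
        for i in range(n_disc):                   # step (3): Levy discoverers
            X[i] = discoverer_update(X[i], R2, ST, rng)
        for i in range(n_disc, n):                # step (4): weighted followers
            if i > n / 2:
                X[i] = w * np.exp((X[-1] - X[i]) / i ** 2)
            else:
                psi = rng.choice([-1.0, 1.0], size=dim)
                X[i] = X[0] + w * np.abs(X[i] - X[0]) * psi
        X = np.clip(X, lb, ub)                    # keep candidates in bounds
        X[0] = cauchy_mutation(X[0], f, rng)      # step (5): greedy Cauchy mutation
    fit = np.array([f(x) for x in X])             # steps (6)-(7): final evaluation
    return X[np.argmin(fit)]
```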

2.3. LightGBM

LightGBM is a machine learning approach based on the gradient-boosting decision tree. As a member of the boosting family of ensemble learning, it delivers both high accuracy and rapid speed. LightGBM introduces two innovative techniques: exclusive feature bundling (EFB) and gradient-based one-side sampling (GOSS). GOSS reduces the number of training samples by retaining those with large gradients and downsampling those with small gradients, i.e., those with a minimal training error. EFB bundles mutually exclusive features, which reduces complexity without compromising split-point accuracy. Together, these techniques greatly improve the learning speed of the traditional gradient decision tree while ensuring prediction accuracy. The LGB tree employs a leaf-wise splitting strategy that incorporates depth restrictions during the growth process. Growth and pruning are the two stages of the tree model construction process. When the growth stage begins, the best variables and split points are selected from the input variables to maximize the gain after splitting. The pruning phase involves identifying the optimal split points and removing redundant branches that may impede the model accuracy. When utilizing tree models for predictive tasks, it is crucial to establish an appropriate combination of hyperparameters: different parameter values lead to distinct model results, and most research modifies them empirically, which can hinder achieving the best regression outcomes. Hence, optimization algorithms are required to tune the parameters. Most of the parameters of the LGB model are pre-fixed empirically; the most important parameters that need to be carefully tuned are shown in Table 1, below.
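As a minimal illustration of fitting the model with the four parameters tuned here (assuming LightGBM’s scikit-learn interface; the values shown are placeholders within the ranges of Section 2.4, not the tuned optimum, and subsample_freq is set so that row subsampling actually takes effect):

```python
import numpy as np
from lightgbm import LGBMRegressor

# synthetic stand-in for the standardized feature matrix and PM2.5 targets
rng = np.random.default_rng(0)
X, y = rng.random((1000, 10)), rng.random(1000)

model = LGBMRegressor(
    max_depth=5,         # maximum tree depth
    num_leaves=31,       # leaf count; shapes the tree together with max_depth
    learning_rate=0.05,  # shrinkage applied to each boosting step
    subsample=0.8,       # row subsampling rate
    subsample_freq=1,    # apply row subsampling at every iteration
)
model.fit(X, y)
preds = model.predict(X)
```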

2.4. Framework of LASSA-LGB

The LASSA presented in this paper is applied to optimize the max_depth, learning_rate, num_leaves and subsample parameters of LGB. max_depth indicates the maximum depth of the tree structure; the complexity of the tree model increases as the value rises, yet too much depth may result in overfitting. The acceptable range for the depth is between 3 and 7. num_leaves denotes the number of leaf nodes, which, together with the maximum depth, controls the shape of the tree; its range is between 1 and 127. learning_rate represents the learning rate: when updating the tree model’s leaf nodes, the weight value of the previous generation is shrunk by the learning rate, leading to a more stable model; its range is [0.01, 0.15]. subsample determines the rate of subsampling, regulating whether some training features are repeatedly drawn across iterations; its range is [0.1, 1]. The following steps can be taken when optimizing the LGB parameters using the LASSA:
• Define the fitness function (a sketch of this function is given below). In the sparrow algorithm, the fitness value of an individual is calculated as the mean square error; individuals with smaller fitness values are ranked higher. The goal is to reach the global error minimum within the maximum iteration limit.
• Initialize the sparrow population. Set the basic parameters, such as the maximum iteration limit, population size, discoverer ratio and safety threshold. The upper and lower bounds of the sparrow’s search space are defined by the valid parameter ranges of the LGB model.
• Determine the fitness values and sort them in ascending order. The value at the top of the list indicates the best fitness and the matching point in the training set sample.
• Update the positions iteratively. Update the population position information using the aforementioned formulas, and determine the best position based on the greedy approach.
• Check the stopping criterion. If it is not met, repeat steps 3–4. Eventually, the algorithm terminates and outputs the ideal parameter combination.
• Update the model using the parameters gained in the previous stage, and predict the test set output values. Calculate additional metrics for error assessment.
Figure 3 displays the training and prediction procedure of the model.
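A hedged sketch of the fitness function from the first step above, pairing each candidate parameter vector with a 5-fold cross-validated mean squared error (cross-validation as described in Section 2.5); the function and variable names are illustrative:

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

def fitness(params, X_train, y_train):
    """Fitness of one sparrow = 5-fold cross-validated MSE of the LGB model
    built from its position vector (smaller is better)."""
    max_depth, learning_rate, num_leaves, subsample = params
    model = LGBMRegressor(
        max_depth=int(round(max_depth)),      # integer parameters are rounded
        num_leaves=int(round(num_leaves)),
        learning_rate=learning_rate,
        subsample=subsample,
        subsample_freq=1,
    )
    scores = cross_val_score(model, X_train, y_train,
                             scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()
```

The LASSA then minimizes this fitness over the parameter bounds of Table 1, and the best position found is decoded back into an LGB configuration.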

2.5. Dataset and Merit Criteria

To examine the practicality of the LASSA in predicting haze, this paper uses multisite air quality monitoring datasets from Beijing [45]. Because the earlier years contain a large amount of missing data, we used data starting from 2014. In total, the study covers monitoring data from 2014 to 2017 for three sites and comprises the air quality variables PM2.5, PM10, SO2, NO2, CO and O3, together with climatic factors including ground temperature, atmospheric pressure, dew point temperature and rainfall. Each row in the original data represents one hour of data. Due to equipment status and other factors, some data are missing; this study employs average interpolation to fill the gaps. Given that PM2.5 concentrations are influenced by both air pollutants and meteorological factors, these data are distributed across different numerical ranges, so the data are standardized prior to model training and prediction. Furthermore, we used the quartile deviation approach to examine outliers: points lying more than 1.5 times the IQR beyond the quartiles were considered extreme outliers. Figure 4 shows the distribution of the data. These irrational extreme points were eliminated, since anomalous noise has a significant impact on the model accuracy, particularly in iterative procedures, where errors accumulate over time and cause deviations in the model output values. Handling outliers and missing values helps improve the model accuracy. This paper utilizes the concentration data from the preceding 24 h to forecast the PM2.5 concentration for the upcoming hour.
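The preprocessing described above might look as follows in pandas; the file name and the use of linear interpolation as the "average" fill are assumptions based on the public UCI dataset layout:

```python
import pandas as pd

# one station file from the UCI Beijing multi-site dataset (illustrative name)
df = pd.read_csv("PRSA_Data_Changping_20130301-20170228.csv")

# fill gaps left by equipment downtime by interpolating neighbouring readings
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].interpolate(limit_direction="both")

# quartile-deviation rule: drop rows whose PM2.5 lies > 1.5*IQR beyond the quartiles
q1, q3 = df["PM2.5"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["PM2.5"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# standardize all numeric columns to comparable scales before training
df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
```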
It is a common observation that atmospheric activity is weaker during winter, resulting in the accumulation of particulate matter closer to the ground; conversely, summer’s stronger atmospheric activity leads to the faster diffusion and transfer of particulate matter in the air. Notably, the atmospheric haze concentration values exhibit distinctive distribution characteristics across the seasons. To account for this, we forecast separately through time segments, with each quarter serving as one data segment. Specifically, this study collected annual monitoring data from the Changping site between March 2016 and February 2017, which were then divided into four smaller data sets, numbered S1–S4. The air quality data from the Aotizhongxin station during June–August 2015 and December 2015–February 2016 were used as supplements, numbered S5–S6. Monitoring readings from the Gucheng station from March 2014 to February 2015 were added for validation, numbered S7–S10. Regression prediction models were established for these 10 time periods to predict the PM2.5 concentration values. Ultimately, 80% of the subsamples were chosen as the training set and 20% as the test set. Each round of model development was subjected to 5-fold cross-validation.
To assess the predictive power of the LASSA-LGB model on the atmospheric PM2.5 concentration, this paper utilizes various statistical indicators. These performance metrics comprise the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), Theil Index (TIL) and Willmott’s index of agreement (WIA). Initially designed in economics to gauge income inequality among regions, the Theil Index serves as a key indicator: the greater the value, the larger the gap between the rich and the poor. This article uses the value to represent the difference in trend between the predicted and actual data; the closer the number is to zero, the more accurate the prediction. The formulas for the TIL and WIA are provided below (the RMSE, MAE and MAPE follow their standard definitions):
$$TIL = \frac{\sqrt{\dfrac{1}{M} \sum_{t=1}^{M} \left(Y(t) - \hat{Y}(t)\right)^2}}{\sqrt{\dfrac{1}{M} \sum_{t=1}^{M} Y^2(t)} + \sqrt{\dfrac{1}{M} \sum_{t=1}^{M} \hat{Y}^2(t)}} \tag{14}$$
$$WIA = 1 - \frac{\sum_{i=1}^{M} \left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{M} \left(\left|\hat{Y}_i - \bar{Y}\right| + \left|Y_i - \bar{Y}\right|\right)^2}, \quad 0 \leq WIA \leq 1 \tag{15}$$
In the formulas, M is the number of data rows, Ŷ indicates the model prediction value, Y is the true value of the data and Ȳ is the mean of the true values. The MAE shows the difference between the true sample values and the model’s predicted values, and the RMSE reflects the stability of the model output; the better the model, the lower the values of these two indicators.
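For reference, Equations (14) and (15) expressed in NumPy (under the Theil-U reconstruction given above):

```python
import numpy as np

def til(y, y_hat):
    """Theil inequality coefficient (Equation (14)); closer to 0 is better."""
    num = np.sqrt(np.mean((y - y_hat) ** 2))
    return num / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(y_hat ** 2)))

def wia(y, y_hat):
    """Willmott's index of agreement (Equation (15)); 1 means perfect agreement."""
    y_bar = np.mean(y)
    return 1 - np.sum((y - y_hat) ** 2) / np.sum(
        (np.abs(y_hat - y_bar) + np.abs(y - y_bar)) ** 2)
```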

3. Results and Analysis

3.1. Benchmark Function Experiment

This work conducts simulation testing on 10 regularly used benchmark functions with diverse properties to evaluate the solution quality and efficiency of the LASSA. Its performance is analyzed and compared with particle swarm optimization (PSO), gray wolf optimization (GWO), the whale optimization algorithm (WOA), the SSA and the ISSA. Specific function details are provided in Table 3.
The article’s experiments were run on an Intel(R) Core(TM) i7-10870H CPU @ 2.20 GHz under Windows 10, with the implementation written in Python. The specific parameter settings for each algorithm are displayed in Table 2 to harmonize the experimental environment and guarantee the reliability and authenticity of the experimental outcomes. For the GWO algorithm, the convergence variable a decreases linearly in the range [0, 2] as the number of iterations increases, and the variables r1 and r2 are random numbers in the range [0, 1]. For PSO, the learning factor parameters C1 and C2 are both set to 2. For the WOA, the constant b, which defines the shape of the spiral motion during the attack, is set to 1. For the SSA and its enhanced variants, PD signifies the discoverer proportion, SD the vigilant (scout) proportion and ST the safety threshold; these parameters are set to identical values, with a discoverer proportion of 0.2, a vigilant proportion of 0.1 and a safety threshold of 0.8.
Table 3 displays information about the test functions. F1–F5 are unimodal benchmark functions with multiple dimensions. F6–F9 are multidimensional and multimodal benchmark functions. F10 is a low-dimensional multi-modal benchmark function that serves to test the algorithm’s performance on low-dimensional problems. The unimodal function has a single global optimum value, which may be used to demonstrate the algorithm’s fundamental search capability and quick convergence. The multimodal function possesses a global optimum and multiple local extrema, making it an appropriate tool for emphasizing the algorithm’s global search ability. F1–F9 are tested in 30 dimensions to evaluate the algorithm’s performance in higher dimensions. To ensure fairness, the quantity of the population and the maximum number of iterations are set to the same values of 30 and 500, respectively. To eliminate any accidental errors, we independently executed each algorithm 30 times on various test functions. We determined the top value (Best), worst outcome (Worst), average outcome (Avg) and standard deviation (Std). The algorithm’s optimization performance improves as the average value decreases. Similarly, a smaller standard deviation indicates greater algorithmic stability.
It is evident from the testing function results shown in Table 4 that, under the same constraints, the LASSA outperforms the other algorithms on all 10 test functions. Notably, for F7 and F9, the solution reaches the theoretical optimum. In addition to F3, the LASSA and ISSA can also calculate the theoretical best value for the single-peak test functions; however, the Std and Avg obtained using the LASSA are either the same or lower, far surpassing those of PSO, GWO and the WOA. For function F2, while none of the six algorithms found the theoretical value, the LASSA’s result was the closest and had a lower standard deviation. In the case of the multimodal functions F7 and F9, the three SSA-type algorithms achieved the theoretical value with almost no error. As for function F6, the LASSA’s various indicators were significantly better than those of the comparison algorithms. With regard to function F8, the SSA, ISSA and LASSA found the same solution, while the WOA found an equal optimal value. On the contrary, for the low-dimensional multimodal function F10, only PSO aligns closest with the theoretical value; the LASSA follows next, but its mean and standard deviation are still the smallest, proving that the LASSA’s optimization accuracy remains high for low-dimensional functions. Among these test functions, the LASSA found solutions that were better than or equal to those of the SSA, implying that the LASSA’s enhancement strategies effectively boosted the algorithm’s optimization capability. On multiple functions, the standard deviation of the LASSA is significantly lower than that of the contrast algorithms, suggesting that its stability surpasses that of its counterparts.
The convergence trend graphs (Figure 5) provide a more intuitive view of the convergence and each algorithm’s ability to overcome local stagnation. The horizontal axis shows the iteration count, while the vertical axis displays the fitness value; for ease of observation, the vertical axis is shown on a logarithmic scale. Movement along the x-axis while the y-value stays the same indicates that the algorithm has either reached the global optimum or fallen into a local optimum. According to the figure, on the high-dimensional single-peak functions F1–F5, the LASSA decreases rapidly along an exponential curve and finds the optimal value the fastest. On the high-dimensional multi-peak functions F6–F9, the LASSA is even more dominant, descending to the vicinity of the optimal value while almost never falling into a stagnant state, which shows its exceptional ability to escape local optima. On the low-dimensional function, this algorithm also searches steadily for the location of the ideal value, outperforming all of the other algorithms in terms of accuracy.
In conclusion, through numerous benchmark function comparison tests, further analysis reveals that the LASSA has a powerful search capability and can adapt to solving problems across multiple environments. First, the initial fitness value of the LASSA is the smallest, which indicates the highest initial population quality. Furthermore, the slope of its curve is the largest on multiple functions, which shows that Lévy flight and the adaptive nonlinear inertia weights can augment the search speed. Thus, the feasibility and effectiveness of the LASSA are underscored.

3.2. Results and Analysis

Default parameters were selected for the base LGB model. The population quantity and count of iterations were set to 30 and 100 for both the SSA and LASSA. The other parameters remain unaltered. An optimized model is established to forecast the concentration of PM2.5 by optimizing the max_depth, learning_rate, num_leaves and the subsample.
In order to show the efficacy of the LASSA-LGB, this paper uses three comparison models; namely, the base LGB, SSA-LGB and ISSA-LGB models.
To obtain the fitting curve, the predicted values of each model and the actual values of the original data are plotted side by side; each error index is then determined by calculating the difference between the two. It can be inferred from Table 5 that the accuracy differs between the models. More significantly, the PM2.5 concentration values exhibit divergent fluctuation trends across the seasons; even so, the LASSA-LGB model’s prediction results remain undisturbed and display a good performance across the various data sets. For instance, on data set S5, the LASSA-LGB model showed an MAE value of 35.2, which is 9.4% lower than that of LGB. Additionally, among the four models, LASSA-LGB exhibited the smallest error values, leading to a significant improvement in prediction accuracy. From the perspective of output stability, the RMSE value of the LASSA-LGB model is 61.8, with the TIL coefficient at 0.06 and the WIA value at 0.97; these differ from LGB by 14.4%, 14.2% and 0.8%, respectively. The predicted values’ absolute deviation from the actual values is small, the model can learn mutation data information effectively, and it can adjust precisely when there is a sharp fluctuation. Looking at the comparison of the output statistics for all of the data, the RMSE index of LASSA-LGB is lower than that of single LGB by 3% to 16%, and the percentage error and Theil index of LASSA-LGB are reduced by about 1.1% to 17.9%, respectively. The forecasting outcomes of these practical applications of the different models make it evident that the LASSA possesses certain advantages in engineering optimization problems. The LASSA-optimized LGB model exhibits an outstanding performance and surpasses the basic single model and the other modified models in predicting haze. This model has robust learning capabilities and a high fault tolerance rate, and it can efficiently achieve the desired optimization.
Compared to the PSO and GWO algorithms, the SSA already exhibits certain advantages. The algorithm’s capacity to evade local extreme values is improved by the mutation processing and disturbance applied to the optimal value in the LASSA presented in this paper. Coupled with the introduction of a dynamic weight strategy, the algorithm can more comprehensively and efficiently explore.
In general, the LASSA demonstrates an exceptional optimization speed and convergence accuracy across multiple processes. Both the multimodal and unimodal functions exhibit strong means, optimal values and standard deviations. The LASSA boasts faster convergence in the unimodal benchmark function and greater capability for avoiding local optima in the multimodal benchmark function. All of the above provide conclusive evidence of the algorithm’s stability and effectiveness.
The LASSA excels at optimizing performance, but it does have some limitations: the calculation workload increases, leading to longer processing times, and the exploration of individual directions remains suboptimal. Further research could examine integrating other intelligent algorithmic mechanisms into the SSA; to achieve this, strategies such as learning from surrounding individuals and broadening the scope of the search could be added. Various influencing factors, including pollutant emissions and land use data, will be considered when predicting haze in the future.
Figure 6 shows the scatterplot comparison of the LGB-based models’ outcomes in forecasting the PM2.5 concentration. The LASSA-LGB model is without a doubt the one that comes closest to the idealized situation, in which the fitted linear equation would be y = x. The proposed LASSA-LGB model has a significant advantage over the other three models in estimating the PM2.5 concentration values, as shown by the equations and R2 values in the figure: its fitted straight lines have slopes and intercepts closer to 1 and 0, and greater coefficients of determination. Additionally, compared to the single LGB model, the prediction accuracy of the SSA-LGB and ISSA-LGB models is better in terms of the fitted lines and coefficient values. The standard deviation, correlation and centered root-mean-square error are the major pieces of information displayed in a Taylor diagram. According to the Taylor diagram in Figure 7, the LASSA-LGB model outperforms the other comparison models on all three criteria: in comparison to the ISSA-LGB, SSA-LGB and regular LGB models, the fusion model has the smallest difference in standard deviation between the predicted and true values, the highest correlation and the smallest centered root-mean-square error. The violin plot mainly depicts the distribution of the data. Figure 8 illustrates how closely the distribution of the output values from the LASSA-LGB model matches that of the true values: the two have nearly identical means and medians, and the shapes of the boxes are likewise the closest. The comparisons in Figures 6 through 8 clearly demonstrate how much LASSA-LGB increases the standard LGB model’s forecast accuracy. Table 6 shows the specific percentage improvements in the prediction accuracy of LASSA-LGB compared to LGB.
Awad et al. [46] used nu-SVR to forecast short-term haze exposure in British Columbia, allowing for geographical and temporal pattern variations, and their final model had a coefficient of determination of 0.79. Li et al. [47] used a generalized additive model to quantify the nonlinear relationships between predictors, and trained several models using a bagging strategy to reduce bias, with experimental results achieving an R2 of 0.89. The study in [48] created a model of daily PM2.5 values in the London area using satellite aerosol optical thickness, land use and meteorological data as inputs; they employed an integrated strategy that included random forest, gradient boosting and k-nearest neighbors, and the R2 value after cross-validation was 0.828. Wu et al. [49] explored several machine learning algorithms, including support vector methods, decision trees and Long Short-Term Memory (LSTM) networks, to predict seasonal PM2.5 concentrations, with LSTM outperforming the other models at a consistency coefficient of 0.52 and an RMSE of 19.58. In our investigation, the LASSA-LGB model produced R2 values of up to 0.96 for monitoring data from a large number of stations, and the coefficient of consistency reached 0.97. This demonstrates that our proposed model predicts haze with excellent accuracy.

4. Conclusions

The Sparrow Search Algorithm is a cutting-edge metaheuristic optimizer offering an impressive performance, albeit with some limitations. Concerning the issues of uneven population distribution and susceptibility to local extreme values, this article introduces a multi-strategy-enhanced Sparrow Search Algorithm, named the LASSA. The algorithm employs several key strategies: uniformly distributed goodpoint sets, Lévy flights, adaptive nonlinear decreasing weights and Cauchy mutation perturbation. The uniformly distributed goodpoint sets are utilized to initialize the population; by spreading individuals across the search range, the population diversity is improved, thus increasing the efficiency of subsequent optimization searches. Lévy flight is incorporated into the discoverer’s movement mode, improving the discoverer’s movement path and search ability. The adaptive inertia weights for followers then effectively balance global exploration and local exploitation. Additionally, the Cauchy mutation overcomes the problem of excessive individual aggregation, helping to prevent the algorithm from sticking to local extreme values and boosting the accuracy of the solution. The powerful solving ability of the LASSA has been verified on ten benchmark functions, on which it outperformed the comparison algorithms in solution accuracy, convergence rate and robustness.
The LightGBM model is utilized for predicting atmospheric PM2.5 while optimizing its parameters through the LASSA. As a result, the LASSA-LGB regression prediction model is established. By investigating monitoring data from various times and regions, the prediction curve of the haze concentration indicates that the model exhibits a strong learning ability, a high fitting degree and significantly improved accuracy and stability. It has been demonstrated that the LASSA algorithm significantly enhances the optimization accuracy. The enhanced LASSA-LGB model possesses strong practicality and is well-suited to forecasting PM2.5 concentrations, which can serve as a point of reference for predicting trends in air quality alterations.
The hybrid model developed in this study by tuning LGB with the LASSA is useful for haze prediction using minimal data. However, it is unclear whether it remains valid for long-term data. Furthermore, the input factors considered in this study are rather limited: factors such as humidity and land use, for example, have a significant impact on the generation and transformation of PM2.5, so more factors should be examined in future studies. Moreover, the importance of each input variable’s influence on the outcomes needs to be examined in future research.

Author Contributions

Conceptualization, Z.L.; Formal analysis, K.Z. and L.W.; Funding acquisition, Z.L.; Investigation, X.L.; Methodology, X.L.; Supervision, Z.L.; Validation, K.Z. and L.W.; Visualization, X.L.; Writing—original draft, X.L.; Writing—review and editing, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42261077.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is publicly available at https://archive.ics.uci.edu/dataset/501/beijing+multi+site+air+quality+data (accessed on 25 December 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Megaritis, A.G.; Fountoukis, C.; Charalampidis, P.E.; Denier Van Der Gon, H.A.C.; Pilinis, C.; Pandis, S.N. Linking Climate and Air Quality over Europe: Effects of Meteorology on PM2.5 Concentrations. Atmos. Chem. Phys. 2014, 14, 10283–10298. [Google Scholar] [CrossRef]
  2. Di, Q.; Koutrakis, P.; Schwartz, J. A Hybrid Prediction Model for PM2.5 Mass and Components Using a Chemical Transport Model and Land Use Regression. Atmos. Environ. 2016, 131, 390–399. [Google Scholar] [CrossRef]
  3. Belachsen, I.; Broday, D.M. Imputation of Missing PM2.5 Observations in a Network of Air Quality Monitoring Stations by a New kNN Method. Atmosphere 2022, 13, 1934. [Google Scholar] [CrossRef]
  4. Lee, Y.S.; Choi, E.; Park, M.; Jo, H.; Park, M.; Nam, E.; Kim, D.G.; Yi, S.-M.; Kim, J.Y. Feature Extraction and Prediction of Fine Particulate Matter (PM2.5) Chemical Constituents Using Four Machine Learning Models. Expert Syst. Appl. 2023, 221, 119696. [Google Scholar] [CrossRef]
  5. Zhao, R.; Gu, X.; Xue, B.; Zhang, J.; Ren, W. Short Period PM2.5 Prediction Based on Multivariate Linear Regression Model. PLoS ONE 2018, 13, e0201011. [Google Scholar] [CrossRef]
  6. Zhu, H.; Lu, X. The Prediction of PM2.5 Value Based on ARMA and Improved BP Neural Network Model. In Proceedings of the 2016 International Conference on Intelligent Networking and Collaborative Systems (INCoS), Ostrava, Czech Republic, 7–9 September 2016; pp. 515–517. [Google Scholar]
  7. Ehteram, M.; Ahmed, A.N.; Khozani, Z.S.; El-Shafie, A. Graph Convolutional Network—Long Short Term Memory Neural Network- Multi Layer Perceptron- Gaussian Progress Regression Model: A New Deep Learning Model for Predicting Ozone Concertation. Atmos. Pollut. Res. 2023, 14, 101766. [Google Scholar] [CrossRef]
  8. Chang, Y.; Chiao, H.; Abimannan, S.; Huang, Y.; Tsai, Y.; Lin, K.-M. An LSTM-Based Aggregated Model for Air Pollution Forecasting. Atmos. Pollut. Res. 2020, 11, 1451–1463. [Google Scholar] [CrossRef]
  9. Adnan, R.M.; Mostafa, R.R.; Islam, A.R.M.T.; Kisi, O.; Kuriqi, A.; Heddam, S. Estimating Reference Evapotranspiration Using Hybrid Adaptive Fuzzy Inferencing Coupled with Heuristic Algorithms. Comput. Electron. Agric. 2021, 191, 106541. [Google Scholar] [CrossRef]
  10. Adnan, R.M.; Mostafa, R.R.; Dai, H.-L.; Heddam, S.; Kuriqi, A.; Kisi, O. Pan Evaporation Estimation by Relevance Vector Machine Tuned with New Metaheuristic Algorithms Using Limited Climatic Data. Eng. Appl. Comput. Fluid Mech. 2023, 17, 2192258. [Google Scholar] [CrossRef]
  11. Adnan, R.M.; Meshram, S.G.; Mostafa, R.R.; Islam, A.R.M.T.; Abba, S.I.; Andorful, F.; Chen, Z. Application of Advanced Optimized Soft Computing Models for Atmospheric Variable Forecasting. Mathematics 2023, 11, 1213. [Google Scholar] [CrossRef]
  12. Li, H.; Yang, Y.; Wang, H.; Li, B.; Wang, P.; Li, J.; Liao, H. Constructing a Spatiotemporally Coherent Long-Term PM2.5 Concentration Dataset over China during 1980–2019 Using a Machine Learning Approach. Sci. Total Environ. 2021, 765, 144263. [Google Scholar] [CrossRef] [PubMed]
  13. Moisan, S.; Herrera, R.; Clements, A. A Dynamic Multiple Equation Approach for Forecasting PM2.5 Pollution in Santiago, Chile. Int. J. Forecast. 2018, 34, 566–581. [Google Scholar] [CrossRef]
  14. Chen, G.; Li, S.; Knibbs, L.D.; Hamm, N.A.S.; Cao, W.; Li, T.; Guo, J.; Ren, H.; Abramson, M.J.; Guo, Y. A Machine Learning Method to Estimate PM2.5 Concentrations across China with Remote Sensing, Meteorological and Land Use Information. Sci. Total Environ. 2018, 636, 52–60. [Google Scholar] [CrossRef] [PubMed]
  15. Zhong, J.; Zhang, X.; Gui, K.; Wang, Y.; Che, H.; Shen, X.; Zhang, L.; Zhang, Y.; Sun, J.; Zhang, W. Robust Prediction of Hourly PM2.5 from Meteorological Data Using LightGBM. Natl. Sci. Rev. 2021, 8, nwaa307. [Google Scholar] [CrossRef] [PubMed]
  16. Candice, B.; Anna, C.; Gonzalo, M.-M. A Comparative Analysis of Gradient Boosting Algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
  17. Yuan, Y.; Li, S.; Zhang, X.; Sun, J. A Comparative Analysis of SVM, Naive Bayes and GBDT for Data Faults Detection in WSNs. In Proceedings of the 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), Lisbon, Portugal, 16–20 July 2018; pp. 394–399. [Google Scholar]
  18. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Brooklyn, NY, USA, 2017; Volume 30. [Google Scholar]
  19. Yue, Y.; Cao, L.; Lu, D.; Hu, Z.; Xu, M.; Wang, S.; Li, B.; Ding, H. Review and Empirical Analysis of Sparrow Search Algorithm. Artif. Intell. Rev. 2023, 56, 10867–10919. [Google Scholar] [CrossRef]
  20. Kennedy, J.; Eberhart, R.C. Particle Swarm Optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  21. Blum, C. Ant Colony Optimization: Introduction and Recent Trends. Phys. Life Rev. 2005, 2, 353–373. [Google Scholar] [CrossRef]
  22. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  23. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  24. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris Hawks Optimization: Algorithm and Applications. Future Gener. Comput. Syst. 2019, 97, 849–872. [Google Scholar] [CrossRef]
  25. Karaboga, D.; Gorkemli, B.; Ozturk, C.; Karaboga, N. A Comprehensive Survey: Artificial Bee Colony (ABC) Algorithm and Applications. Artif. Intell. Rev. 2014, 42, 21–57. [Google Scholar] [CrossRef]
  26. Xue, J.; Shen, B. A Novel Swarm Intelligence Optimization Approach: Sparrow Search Algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34. [Google Scholar] [CrossRef]
  27. Zhang, Z.; Han, Y. Discrete Sparrow Search Algorithm for Symmetric Traveling Salesman Problem. Appl. Soft Comput. 2022, 118, 108469. [Google Scholar] [CrossRef]
  28. Zhang, Z.; He, R.; Yang, K. A Bioinspired Path Planning Approach for Mobile Robots Based on Improved Sparrow Search Algorithm. Adv. Manuf. 2022, 10, 114–130. [Google Scholar] [CrossRef]
  29. Liu, J.; Wang, Z. A Hybrid Sparrow Search Algorithm Based on Constructing Similarity. IEEE Access 2021, 9, 117581–117595. [Google Scholar] [CrossRef]
  30. Xu, B.; Tan, Y.; Sun, W.; Ma, T.; Liu, H.; Wang, D. Study on the Prediction of the Uniaxial Compressive Strength of Rock Based on the SSA-XGBoost Model. Sustainability 2023, 15, 5201. [Google Scholar] [CrossRef]
  31. Nguyen, H.; Bui, X.-N.; Topal, E. Reliability and Availability Artificial Intelligence Models for Predicting Blast-Induced Ground Vibration Intensity in Open-Pit Mines to Ensure the Safety of the Surroundings. Reliab. Eng. Syst. Saf. 2023, 231, 109032. [Google Scholar] [CrossRef]
  32. Wang, M.; Hui, G.; Pang, Y.; Wang, S.; Chen, S. Optimization of Machine Learning Approaches for Shale Gas Production Forecast. Geoenergy Sci. Eng. 2023, 226, 211719. [Google Scholar] [CrossRef]
  33. Zhu, Y.; Yousefi, N. Optimal Parameter Identification of PEMFC Stacks Using Adaptive Sparrow Search Algorithm. Int. J. Hydrogen Energy 2021, 46, 9541–9552. [Google Scholar] [CrossRef]
  34. Liu, X.; Guo, H. Air Quality Indicators and AQI Prediction Coupling Long-Short Term Memory (LSTM) and Sparrow Search Algorithm (SSA): A Case Study of Shanghai. Atmos. Pollut. Res. 2022, 13, 101551. [Google Scholar] [CrossRef]
  35. Li, Q.; Shi, Y.; Lin, R.; Qiao, W.; Ba, W. A Novel Oil Pipeline Leakage Detection Method Based on the Sparrow Search Algorithm and CNN. Measurement 2022, 204, 112122. [Google Scholar] [CrossRef]
  36. Zhang, C.; Ding, S. A Stochastic Configuration Network Based on Chaotic Sparrow Search Algorithm. Knowl.-Based Syst. 2021, 220, 106924. [Google Scholar] [CrossRef]
  37. Zhao, Z.; Wang, H. Research on Intrusion Detection of Industrial Control System Based on ISSA-LightGBM. J. East China Univ. Sci. Technol. 2022, 48, 1–9. [Google Scholar] [CrossRef]
  38. Adnan, R.M.; Dai, H.-L.; Mostafa, R.R.; Parmar, K.S.; Heddam, S.; Kisi, O. Modeling Multistep Ahead Dissolved Oxygen Concentration Using Improved Support Vector Machines by a Hybrid Metaheuristic Algorithm. Sustainability 2022, 14, 3470. [Google Scholar] [CrossRef]
  39. Mostafa, R.R.; Kisi, O.; Adnan, R.M.; Sadeghifar, T.; Kuriqi, A. Modeling Potential Evapotranspiration by Improved Machine Learning Methods Using Limited Climatic Data. Water 2023, 15, 486. [Google Scholar] [CrossRef]
40. Li, D.; Dan, N.; Zengliang, Z.; Chen, C. Short-Term Weather Forecast Based on Wavelet Denoising and Catboost. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 3760–3764. [Google Scholar]
  41. Dong, N.; Wu, C.-H.; Ip, W.-H.; Chen, Z.-Q.; Chan, C.-Y.; Yung, K.-L. An Opposition-Based Chaotic GA/PSO Hybrid Algorithm and Its Application in Circle Detection. Comput. Math. Appl. 2012, 64, 1886–1902. [Google Scholar] [CrossRef]
  42. Hua, L.; Wang, Y. Estimation of Discrepancy. In Applications of Number Theory to Numerical Analysis; Springer: Berlin/Heidelberg, Germany, 1981; pp. 70–98. [Google Scholar]
  43. Houssein, E.H.; Saad, M.R.; Hashim, F.A.; Shaban, H.; Hassaballah, M. Lévy Flight Distribution: A New Metaheuristic Algorithm for Solving Engineering Optimization Problems. Eng. Appl. Artif. Intell. 2020, 94, 103731. [Google Scholar] [CrossRef]
  44. Bloch, D. A Note on the Estimation of the Location Parameter of the Cauchy Distribution. J. Am. Stat. Assoc. 1966, 61, 852–855. [Google Scholar] [CrossRef]
  45. Zhang, S.; Guo, B.; Dong, A.; He, J.; Xu, Z.; Chen, S.X. Cautionary Tales on Air-Quality Improvement in Beijing. Proc. R. Soc. Math. Phys. Eng. Sci. 2017, 473, 20170457. [Google Scholar] [CrossRef]
  46. Awad, Y.A.; Koutrakis, P.; Coull, B.A.; Schwartz, J. A Spatio-Temporal Prediction Model Based on Support Vector Machine Regression: Ambient Black Carbon in Three New England States. Environ. Res. 2017, 159, 427–434. [Google Scholar] [CrossRef]
  47. Li, L.; Zhang, J.; Qiu, W.; Wang, J.; Fang, Y. An Ensemble Spatiotemporal Model for Predicting PM2.5 Concentrations. Int. J. Environ. Res. Public. Health 2017, 14, 549. [Google Scholar] [CrossRef] [PubMed]
  48. Danesh Yazdi, M.; Kuang, Z.; Dimakopoulou, K.; Barratt, B.; Suel, E.; Amini, H.; Lyapustin, A.; Katsouyanni, K.; Schwartz, J. Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach Using Machine Learning Methods. Remote Sens. 2020, 12, 914. [Google Scholar] [CrossRef]
  49. Wu, Y.; Lin, S.; Shi, K.; Ye, Z.; Fang, Y. Seasonal Prediction of Daily PM2.5 Concentrations with Interpretable Machine Learning: A Case Study of Beijing, China. Environ. Sci. Pollut. Res. 2022, 29, 45821–45836. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Dispersion of different types of chaotic mapping: (a) points of the logistic map; (b) points of the good-point set.
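Figure 1 contrasts a logistic-map initialization with the good-point set of Hua and Wang [42]. The following is a minimal Python sketch of how such point sets can be generated; the seed values and the choice of prime are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np

# Minimal sketch (not the authors' code) of the two initializers compared in
# Figure 1: logistic-map chaos vs. the good-point set of Hua and Wang [42].

def logistic_map_points(n, dim=2, mu=4.0):
    """n points per dimension from the logistic map x_{k+1} = mu * x_k * (1 - x_k)."""
    pts = np.empty((n, dim))
    x = 0.7 + 0.01 * np.arange(dim)   # assumed distinct seeds in (0, 1) per dimension
    for k in range(n):
        x = mu * x * (1.0 - x)
        pts[k] = x
    return pts

def good_point_set(n, dim=2):
    """Good-point set: p_k = frac(k * r), with r_j = 2 cos(2*pi*j / p), p prime >= 2*dim + 3."""
    p = 7                              # smallest qualifying prime for dim = 2
    r = 2.0 * np.cos(2.0 * np.pi * np.arange(1, dim + 1) / p)
    k = np.arange(1, n + 1).reshape(-1, 1)
    return np.mod(k * r, 1.0)          # fractional parts fall evenly in [0, 1)^dim

# The good-point set covers the unit square more uniformly (lower discrepancy),
# which is what the two panels of Figure 1 illustrate.
print(logistic_map_points(500)[:3], good_point_set(500)[:3], sep="\n")
```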
Figure 2. Trajectory of a Lévy flight.
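The trajectory in Figure 2 is characteristic of a Lévy flight: many short steps punctuated by occasional long jumps. The sketch below draws such steps with Mantegna's algorithm; β = 1.5 is a common choice in the literature and is assumed here rather than taken from the paper.

```python
import numpy as np
from math import gamma, pi, sin

def levy_steps(n, beta=1.5, rng=None):
    """Draw n two-dimensional Lévy-distributed steps via Mantegna's algorithm."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2)
               / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size=(n, 2))
    v = rng.normal(0.0, 1.0, size=(n, 2))
    return u / np.abs(v) ** (1 / beta)

# Cumulating the steps traces the walk shown in Figure 2: mostly short moves
# with rare long jumps, which helps a search escape local optima.
track = np.cumsum(levy_steps(500), axis=0)
print(track.min(axis=0), track.max(axis=0))
```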
Figure 3. Flowchart of LASSA-LGB.
Figure 4. Distribution of the research data.
Figure 5. Convergence curves on the benchmark functions. Panels (a–j) show the solution iteration curves for the ten functions; in panel (e), the gray and blue lines nearly coincide.
Figure 6. Scatterplots of the observed and predicted PM2.5 concentrations using different LGB-based models at site S1.
Figure 7. Taylor diagrams of PM2.5 concentrations using different LGB-based models. Panels (a–j) show the results for the ten sites.
Figure 8. Violin plots of PM2.5 concentrations using different LGB-based models at site S1. Panels (a–d) show the results for the four datasets.
Table 1. Key parameters of LightGBM.
| Parameter | Range | Default Value |
| --- | --- | --- |
| max_depth | [3, 7] | 5 |
| num_leaves | [1, 127] | 31 |
| subsample | [0.1, 1.0] | 1.0 |
| learning_rate | [0.01, 0.15] | 0.1 |
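The paper tunes these parameters with the improved sparrow search and scores candidates by cross-validation; the sketch below is an assumed minimal encoding of the Table 1 search space (function names, clipping, and the 5-fold CV setting are ours, not the authors' exact implementation).

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

SPACE = {                      # (low, high) bounds from Table 1
    "max_depth":     (3, 7),
    "num_leaves":    (1, 127),
    "subsample":     (0.1, 1.0),
    "learning_rate": (0.01, 0.15),
}

def fitness(position, X, y):
    """Decode a 4-D sparrow position into LightGBM parameters; return CV RMSE."""
    params = {
        "max_depth":     int(round(position[0])),
        "num_leaves":    max(2, int(round(position[1]))),  # LightGBM needs >= 2 leaves
        "subsample":     float(np.clip(position[2], *SPACE["subsample"])),
        "learning_rate": float(np.clip(position[3], *SPACE["learning_rate"])),
    }
    model = LGBMRegressor(**params)
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()      # smaller is better; the optimizer minimizes this
```

Rounding the integer-valued dimensions is one common way to let a continuous swarm optimizer drive discrete hyperparameters.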
Table 2. Compared algorithms and their parameter settings.
| Algorithm | Parameter Settings |
| --- | --- |
| PSO | C1 = C2 = 2, ω = 0.9 |
| GWO | r1, r2 ∈ [0, 1] |
| WOA | b = 1 |
| SSA | PD = 20%, SD = 10%, ST = 0.8 |
| ISSA | PD = 20%, SD = 10%, ST = 0.8 |
| LASSA | PD = 20%, SD = 10%, ST = 0.8 |
Table 3. Benchmark functions.
| Name | Definition | Value Range | Ideal Optimum |
| --- | --- | --- | --- |
| Sphere | $F_1(x)=\sum_{i=1}^{n} x_i^2$ | $[-100, 100]^{30}$ | 0 |
| Schwefel 2.22 | $F_2(x)=\sum_{i=1}^{n}\lvert x_i\rvert+\prod_{i=1}^{n}\lvert x_i\rvert$ | $[-10, 10]^{30}$ | 0 |
| Rosenbrock | $F_3(x)=\sum_{i=1}^{n-1}\left[100(x_{i+1}-x_i^2)^2+(x_i-1)^2\right]$ | $[-30, 30]^{30}$ | 0 |
| Schwefel 2.21 | $F_4(x)=\max_i\{\lvert x_i\rvert,\ 1\le i\le n\}$ | $[-100, 100]^{30}$ | 0 |
| Schwefel 1.2 | $F_5(x)=\sum_{i=1}^{n}\left(\sum_{j=1}^{i}x_j\right)^2$ | $[-100, 100]^{30}$ | 0 |
| Penalized 1 | $F_6(x)=\frac{\pi}{n}\left\{10\sin^2(\pi y_1)+\sum_{i=1}^{n-1}(y_i-1)^2\left[1+10\sin^2(\pi y_{i+1})\right]+(y_n-1)^2\right\}+\sum_{i=1}^{n}u(x_i,10,100,4)$ | $[-50, 50]^{30}$ | 0 |
| Rastrigin | $F_7(x)=\sum_{i=1}^{n}\left[x_i^2-10\cos(2\pi x_i)+10\right]$ | $[-5.12, 5.12]^{30}$ | 0 |
| Ackley | $F_8(x)=-20\exp\left(-0.2\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}x_i^2}\right)-\exp\left(\tfrac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right)+20+e$ | $[-32, 32]^{30}$ | 0 |
| Griewank | $F_9(x)=\frac{1}{4000}\sum_{i=1}^{n}x_i^2-\prod_{i=1}^{n}\cos\left(\frac{x_i}{\sqrt{i}}\right)+1$ | $[-600, 600]^{30}$ | 0 |
| Shekel's | $F_{10}(x)=-\sum_{i=1}^{5}\left[(x-a_i)(x-a_i)^T+c_i\right]^{-1}$ | $[0, 10]^{4}$ | −10.153 |
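As a reference for the definitions above, three of the benchmarks translate directly into NumPy (standard forms, with n = 30 as in the table):

```python
import numpy as np

def sphere(x):      # F1: unimodal, optimum 0 at the origin
    return np.sum(x ** 2)

def rastrigin(x):   # F7: highly multimodal, optimum 0 at the origin
    return np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10)

def ackley(x):      # F8: multimodal; double precision bottoms out near 4.44e-16
    n = x.size
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / n) + 20 + np.e)

x0 = np.zeros(30)
print(sphere(x0), rastrigin(x0), ackley(x0))  # 0.0, 0.0, ~4.44e-16
```

The ≈4.44 × 10^−16 floating-point floor of Ackley is exactly the best value the SSA variants report for F8 in Table 4.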
Table 4. Benchmark function results.
| Function | Metric | PSO | GWO | WOA | SSA | ISSA | LASSA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| F1 | Avg | 6.93 × 10^−1 | 1.43 × 10^−27 | 1.26 × 10^−81 | 1.36 × 10^−170 | 6.61 × 10^−211 | 2.14 × 10^−217 |
| | Std | 6.05 × 10^−1 | 3.03 × 10^−27 | 1.15 × 10^−80 | 2.01 × 10^−170 | 0 | 0 |
| | Worst | 0.38 × 10^1 | 2.65 × 10^−26 | 1.86 × 10^−79 | 6.79 × 10^−168 | 3.30 × 10^−208 | 1.07 × 10^−214 |
| | Best | 7.22 × 10^−2 | 6.84 × 10^−30 | 1.36 × 10^−100 | 6.53 × 10^−169 | 0 | 0 |
| F2 | Avg | 0.19 × 10^1 | 9.66 × 10^−17 | 1.45 × 10^−53 | 1.06 × 10^−90 | 6.97 × 10^−82 | 1.22 × 10^−127 |
| | Std | 0.41 × 10^1 | 7.92 × 10^−17 | 1.67 × 10^−52 | 2.37 × 10^−89 | 1.56 × 10^−80 | 2.73 × 10^−126 |
| | Worst | 2.01 × 10^1 | 6.23 × 10^−16 | 3.02 × 10^−51 | 5.31 × 10^−88 | 3.48 × 10^−79 | 6.11 × 10^−125 |
| | Best | 1.89 × 10^−2 | 4.49 × 10^−18 | 1.20 × 10^−66 | 3.79 × 10^−91 | 0 | 0 |
| F3 | Avg | 5.97 × 10^3 | 2.70 × 10^1 | 9.50 × 10^0 | 1.10 × 10^−4 | 7.17 × 10^−5 | 4.52 × 10^−6 |
| | Std | 2.17 × 10^4 | 7.47 × 10^−1 | 1.25 × 10^1 | 2.40 × 10^−4 | 1.59 × 10^−4 | 1.36 × 10^−5 |
| | Worst | 9.03 × 10^4 | 2.88 × 10^1 | 2.88 × 10^1 | 3.11 × 10^−3 | 1.58 × 10^−3 | 1.98 × 10^−4 |
| | Best | 3.95 × 10^1 | 2.52 × 10^1 | 8.12 × 10^−6 | 3.25 × 10^−11 | 2.05 × 10^−10 | 1.72 × 10^−13 |
| F4 | Avg | 0.87 × 10^1 | 7.62 × 10^−7 | 1.90 × 10^−9 | 8.60 × 10^−99 | 1.81 × 10^−102 | 2.12 × 10^−139 |
| | Std | 0.15 × 10^1 | 7.81 × 10^−7 | 3.27 × 10^−8 | 1.92 × 10^−97 | 4.05 × 10^−101 | 4.73 × 10^−138 |
| | Worst | 1.37 × 10^1 | 5.47 × 10^−6 | 7.28 × 10^−7 | 4.30 × 10^−96 | 9.06 × 10^−100 | 1.06 × 10^−136 |
| | Best | 0.46 × 10^1 | 3.32 × 10^−8 | 1.64 × 10^−20 | 0 | 0 | 0 |
| F5 | Avg | 3.36 × 10^3 | 1.76 × 10^−5 | 5.30 × 10^3 | 1.28 × 10^−210 | 2.31 × 10^−220 | 2.13 × 10^−274 |
| | Std | 2.59 × 10^3 | 1.07 × 10^−4 | 1.00 × 10^4 | 1.09 × 10^−209 | 0 | 0 |
| | Worst | 1.68 × 10^4 | 2.00 × 10^−3 | 5.79 × 10^4 | 6.40 × 10^−208 | 1.16 × 10^−217 | 1.07 × 10^−271 |
| | Best | 5.40 × 10^2 | 1.37 × 10^−9 | 1.89 × 10^−16 | 1.14 × 10^−211 | 0 | 0 |
| F6 | Avg | 4.73 × 10^−1 | 4.87 × 10^−2 | 3.46 × 10^−3 | 1.53 × 10^−4 | 7.86 × 10^−6 | 3.43 × 10^−8 |
| | Std | 5.80 × 10^−1 | 4.07 × 10^−2 | 3.98 × 10^−3 | 3.40 × 10^−4 | 2.50 × 10^−5 | 1.19 × 10^−7 |
| | Worst | 0.35 × 10^1 | 5.63 × 10^−1 | 3.55 × 10^−2 | 2.93 × 10^−3 | 3.38 × 10^−4 | 1.32 × 10^−6 |
| | Best | 9.67 × 10^−4 | 1.06 × 10^−5 | 1.06 × 10^−6 | 1.18 × 10^−9 | 2.13 × 10^−11 | 5.17 × 10^−16 |
| F7 | Avg | 1.76 × 10^−3 | 1.80 × 10^−2 | 1.08 × 10^−2 | 0 | 0 | 0 |
| | Std | 1.56 × 10^−3 | 7.64 × 10^−2 | 4.94 × 10^−2 | 0 | 0 | 0 |
| | Worst | 1.27 × 10^−2 | 3.34 × 10^−2 | 2.00 × 10^−2 | 0 | 0 | 0 |
| | Best | 9.36 × 10^−5 | 3.96 × 10^−5 | 1.98 × 10^−5 | 0 | 0 | 0 |
| F8 | Avg | 0.11 × 10^1 | 5.77 × 10^−14 | 2.58 × 10^−15 | 4.44 × 10^−16 | 4.44 × 10^−16 | 4.44 × 10^−16 |
| | Std | 6.38 × 10^−1 | 1.09 × 10^−14 | 2.11 × 10^−15 | 0 | 0 | 0 |
| | Worst | 0.27 × 10^1 | 9.28 × 10^−14 | 7.55 × 10^−15 | 4.44 × 10^−16 | 4.44 × 10^−16 | 4.44 × 10^−16 |
| | Best | 7.51 × 10^−2 | 3.24 × 10^−14 | 4.44 × 10^−16 | 4.44 × 10^−16 | 4.44 × 10^−16 | 4.44 × 10^−16 |
| F9 | Avg | 6.09 × 10^−1 | 4.72 × 10^−3 | 2.05 × 10^−3 | 1.83 × 10^−3 | 0 | 0 |
| | Std | 2.25 × 10^−1 | 9.19 × 10^−3 | 3.57 × 10^−2 | 1.71 × 10^−2 | 0 | 0 |
| | Worst | 0.10 × 10^1 | 8.53 × 10^−2 | 7.49 × 10^−1 | 1.41 × 10^−1 | 0 | 0 |
| | Best | 6.67 × 10^−2 | 0 | 0 | 0 | 0 | 0 |
| F10 | Avg | −5.96 | −9.29 | −8.45 | −7.05 | −6.61 | −10.11 |
| | Std | 3.41 | 2.02 | 2.38 | 2.05 | 1.92 | 0.33 |
| | Worst | −2.63 | −2.63 | −1.34 | −5.05 | −5.05 | −5.05 |
| | Best | −10.15 | −10.15 | −10.15 | −10.15 | −10.15 | −10.15 |
Table 5. Prediction results of PM2.5 concentration at multiple sites.
| Site | Model | RMSE | MAE | MAPE (%) | R² | TIL | CCC | WIA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S1 | LASSA-LGB | 65.681 | 36.527 | 30.699 | 0.881 | 0.056 | 0.841 | 0.915 |
| | ISSA-LGB | 70.428 | 36.000 | 30.923 | 0.841 | 0.069 | 0.817 | 0.901 |
| | SSA-LGB | 71.265 | 37.238 | 36.613 | 0.784 | 0.068 | 0.812 | 0.899 |
| | LGB | 80.891 | 41.320 | 35.629 | 0.784 | 0.079 | 0.785 | 0.882 |
| S2 | LASSA-LGB | 55.708 | 29.009 | 25.945 | 0.899 | 0.047 | 0.868 | 0.931 |
| | ISSA-LGB | 61.774 | 29.191 | 26.267 | 0.889 | 0.063 | 0.839 | 0.914 |
| | SSA-LGB | 63.649 | 30.988 | 30.469 | 0.823 | 0.066 | 0.829 | 0.908 |
| | LGB | 66.191 | 32.356 | 28.305 | 0.794 | 0.070 | 0.821 | 0.904 |
| S3 | LASSA-LGB | 53.217 | 34.311 | 35.368 | 0.820 | 0.028 | 0.840 | 0.916 |
| | ISSA-LGB | 56.219 | 35.828 | 36.137 | 0.818 | 0.035 | 0.824 | 0.907 |
| | SSA-LGB | 56.872 | 34.634 | 39.842 | 0.817 | 0.035 | 0.818 | 0.905 |
| | LGB | 58.272 | 35.971 | 40.404 | 0.809 | 0.038 | 0.809 | 0.900 |
| S4 | LASSA-LGB | 40.167 | 25.803 | 16.779 | 0.825 | 0.094 | 0.915 | 0.957 |
| | ISSA-LGB | 40.595 | 25.907 | 16.637 | 0.827 | 0.097 | 0.912 | 0.955 |
| | SSA-LGB | 40.985 | 25.060 | 16.799 | 0.818 | 0.096 | 0.909 | 0.953 |
| | LGB | 40.897 | 26.282 | 17.143 | 0.819 | 0.096 | 0.908 | 0.953 |
| S5 | LASSA-LGB | 61.876 | 35.276 | 21.932 | 0.906 | 0.067 | 0.954 | 0.976 |
| | ISSA-LGB | 63.964 | 36.739 | 23.292 | 0.899 | 0.069 | 0.951 | 0.975 |
| | SSA-LGB | 65.137 | 37.851 | 24.320 | 0.895 | 0.071 | 0.948 | 0.974 |
| | LGB | 72.306 | 38.963 | 22.799 | 0.871 | 0.078 | 0.938 | 0.968 |
| S6 | LASSA-LGB | 63.466 | 36.933 | 23.399 | 0.906 | 0.066 | 0.954 | 0.976 |
| | ISSA-LGB | 65.510 | 38.289 | 24.653 | 0.899 | 0.069 | 0.951 | 0.975 |
| | SSA-LGB | 66.284 | 39.107 | 25.824 | 0.897 | 0.070 | 0.949 | 0.974 |
| | LGB | 73.766 | 40.241 | 23.553 | 0.872 | 0.077 | 0.938 | 0.968 |
| S7 | LASSA-LGB | 44.574 | 27.580 | 17.420 | 0.965 | 0.035 | 0.983 | 0.991 |
| | ISSA-LGB | 47.514 | 29.404 | 17.813 | 0.960 | 0.037 | 0.980 | 0.990 |
| | SSA-LGB | 49.023 | 29.449 | 19.914 | 0.958 | 0.038 | 0.979 | 0.989 |
| | LGB | 50.726 | 31.023 | 19.409 | 0.954 | 0.040 | 0.977 | 0.989 |
| S8 | LASSA-LGB | 41.287 | 25.199 | 16.916 | 0.962 | 0.041 | 0.981 | 0.990 |
| | ISSA-LGB | 43.981 | 26.628 | 17.150 | 0.957 | 0.043 | 0.979 | 0.989 |
| | SSA-LGB | 43.314 | 26.689 | 19.101 | 0.958 | 0.043 | 0.979 | 0.989 |
| | LGB | 46.747 | 28.234 | 18.542 | 0.951 | 0.046 | 0.976 | 0.988 |
| S9 | LASSA-LGB | 42.379 | 25.979 | 12.757 | 0.943 | 0.037 | 0.972 | 0.986 |
| | ISSA-LGB | 42.507 | 25.630 | 12.622 | 0.943 | 0.037 | 0.971 | 0.985 |
| | SSA-LGB | 43.777 | 26.386 | 12.351 | 0.939 | 0.039 | 0.969 | 0.984 |
| | LGB | 42.983 | 25.758 | 12.484 | 0.942 | 0.038 | 0.970 | 0.985 |
| S10 | LASSA-LGB | 52.174 | 27.109 | 12.844 | 0.917 | 0.043 | 0.959 | 0.979 |
| | ISSA-LGB | 49.192 | 26.245 | 12.657 | 0.927 | 0.041 | 0.964 | 0.981 |
| | SSA-LGB | 50.200 | 27.346 | 12.505 | 0.923 | 0.042 | 0.962 | 0.980 |
| | LGB | 51.148 | 27.004 | 12.709 | 0.920 | 0.043 | 0.960 | 0.979 |
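The Table 5 metrics are not defined in this excerpt; under their usual definitions they can be computed as below ("TIL" is assumed to be Theil's inequality coefficient, CCC is Lin's concordance correlation coefficient, and WIA is Willmott's index of agreement).

```python
import numpy as np

# Hedged sketches of the Table 5 metrics; y = observed, p = predicted (1-D arrays).

def rmse(y, p):
    return np.sqrt(np.mean((y - p) ** 2))

def mae(y, p):
    return np.mean(np.abs(y - p))

def mape(y, p):                      # in percent, as reported in Table 5
    return 100 * np.mean(np.abs((y - p) / y))

def r2(y, p):                        # coefficient of determination
    return 1 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

def til(y, p):                       # Theil's inequality coefficient (assumed meaning)
    return rmse(y, p) / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(p ** 2)))

def ccc(y, p):                       # Lin's concordance correlation coefficient
    sxy = np.mean((y - y.mean()) * (p - p.mean()))
    return 2 * sxy / (y.var() + p.var() + (y.mean() - p.mean()) ** 2)

def wia(y, p):                       # Willmott's index of agreement
    ym = y.mean()
    return 1 - np.sum((y - p) ** 2) / np.sum((np.abs(p - ym) + np.abs(y - ym)) ** 2)
```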
Table 6. The percentage of accuracy increase in LASSA-LGB compared to LGB (%).
| Metric | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RMSE | 23.16 | 18.82 | 9.50 | 1.82 | 16.86 | 16.23 | 13.80 | 13.22 | 1.42 | 1.97 |
| CCC | 6.68 | 5.38 | 3.67 | 0.84 | 1.71 | 1.67 | 0.54 | 0.56 | 0.18 | 0.03 |
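Table 6 does not state how the percentages are computed; the RMSE row is consistent (to rounding) with the relative reduction measured against the LASSA-LGB value from Table 5, as this check illustrates:

```python
# Sanity check of Table 6, RMSE row, site S1 (values copied from Table 5).
rmse_lgb, rmse_lassa = 80.891, 65.681
gain = 100 * (rmse_lgb - rmse_lassa) / rmse_lassa
print(f"{gain:.2f}%")   # -> 23.16%, matching Table 6
```

The CCC row appears to follow an analogous relative-gain computation, though rounding in Table 5 makes an exact reproduction harder to confirm.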
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
