Article

Estimating Soil Available Phosphorus Content through Coupled Wavelet–Data-Driven Models

1 Water Engineering Department, Faculty of Agriculture, University of Tabriz, Tabriz 51666-16471, Iran
2 Laboratory of Remote Sensing and GIS, Department of Soil Science, University of Tehran, P.O.Box: 4111, Karaj 31587-77871, Iran
3 Department of Mining Engineering, Hacettepe University, 06800 Beytepe, Ankara, Turkey
4 Faculty of Natural Sciences and Engineering, Ilia State University, 0162 Tbilisi, Georgia
5 Soil Erosion and Degradation Research Group, Department of Geography, University of Valencia, 46010 Valencia, Spain
6 Physical Geography, Trier University, 54286 Trier, Germany
* Author to whom correspondence should be addressed.
Sustainability 2020, 12(5), 2150; https://0-doi-org.brum.beds.ac.uk/10.3390/su12052150
Submission received: 16 January 2020 / Revised: 4 March 2020 / Accepted: 4 March 2020 / Published: 10 March 2020

Abstract

Soil phosphorus (P) is a vital but limited element that is usually leached from the soil through drainage. As a soluble substance, soil P can be transported from agricultural fields by runoff or soil loss. It is one of the most essential nutrients for crop sustainability and for energy transfer in living organisms. Because soil P is considered a point-source pollutant at elevated contents, it must be simulated accurately. As this is a crucial issue for sustainable soil and water management, the current research examined the capability of five wavelet-based data-driven models, namely gene expression programming (GEP), neural networks (NN), random forest (RF), multivariate adaptive regression spline (MARS), and support vector machine (SVM), for modeling soil P. To this end, soil pH, organic carbon (OC), clay content, and soil P data were collected from different regions of the Neyshabur plain, Khorasan-e-Razavi Province (Northeast Iran). First, a discrete wavelet transform (DWT) was applied to the input variables (pH, OC, and clay), and their subcomponents were used in the data-driven techniques. The statistical Gamma test was also used to identify the soil parameters that influence soil P. The models were assessed through 10-fold cross-validation. The results demonstrated that the wavelet–GEP (WGEP) model outperformed the other models with respect to the correlation coefficient (R), scatter index (SI), and Nash–Sutcliffe coefficient (NS) criteria.
Compared with the wavelet-based MARS, RF, SVM, and NN models, the WGEP model yielded SI values 35%, 30%, 40%, and 50% lower and NS values 3.1%, 2.3%, 4.3%, and 7.6% higher, respectively.

1. Introduction

Controlling point-source pollutants is a crucial issue in soil and water resource management plans [1,2]. For the sustainability of natural ecosystems and human activities, managing runoff and soil losses from agricultural lands is essential [3,4]. Agricultural drainage systems usually remove excess water from the land through surface or subsurface networks to keep soil moisture at levels suitable for crop production [5,6]. However, water flowing through these systems can carry soluble elements that are stored in the soil and modify soil quality [7,8]. During drainage, these soluble elements are leached from the soil and can pollute the drainage effluent [9], increasing the risk of environmental problems. Nature-based solutions are therefore needed to balance the drained and retained excess water so that neither waterlogging nor environmental side effects occur [10].
Although soil phosphorus (P) is an essential nutrient for sustained crop production, it is also considered a water pollutant. It affects the growth of crops and terrestrial systems as well as the living conditions of soil microorganisms [11]. Water drained from agricultural lands can contain significant amounts of P, with high spatial and temporal variation. Eutrophication, which degrades water quality, is another environmentally damaging result of this transfer from land to water resources or artificial drainage systems [12,13]. A considerable amount of soil P is generally found in most of the calcareous soils of the arid and semiarid regions of Iran, as a result of absorption reactions and illuviation of carbonate minerals [14,15]. Given these diverse detrimental effects, strict management of phosphorus in surface and subsurface water bodies is essential to improve water quality.
To better manage soil phosphorus, precise knowledge of available soil P content and an efficient fertilization plan are strongly needed [16,17]. Transport and source factors must also be considered to conserve water quality. In some soils, applying P fertilizer in excess of crop needs is considered desirable because it is thought to secure optimal crop yield [18]. However, soils containing a significant amount of amorphous clay and highly calcareous soils are exceptions, because increasing their P levels can be problematic. Several other parameters, including soil pH and texture, organic matter, and the presence of iron and aluminium oxides, affect plant uptake of P [19,20]. Accurate knowledge of the dynamics of the studied soils and of the complex soil P system is therefore key for agricultural lands and for soil and human health [21,22]. Some soil parameters can be estimated by modeling approaches from easily measured soil parameters, such as soil separates. In this context, regression-based models that relate easily measured pedological parameters (input variables) to target variables are referred to as pedotransfer functions (PTFs). Nonetheless, artificial intelligence (AI) techniques (e.g., gene expression programming (GEP), neural networks (NN), random forest (RF), multivariate adaptive regression spline (MARS), and support vector machine (SVM)) tend to be more applicable for mapping the interrelations among soil parameters. The wavelet transform can also be applied in this context to improve the models' overall accuracy.
However, the use of AI techniques for this purpose is very limited, at least in semiarid areas and at larger scales (e.g., [23,24]), despite their wide application to various soil and water problems, such as modeling soil cation exchange capacity [8,25] and soil bulk density [26], predicting groundwater level fluctuations [27], estimating reference evapotranspiration [28], simulating watershed sediment yield [29], estimating terrestrial parameters such as solar radiation [30], and groundwater pollution studies [31].
In recent decades, the wavelet transform (WT) has become a reliable technique for analyzing periodicities and variations in time series [32,33,34,35]. Owing to its multiresolution in the frequency and time domains, wavelet analysis is a promising tool for modeling hydrological problems: by decomposing a time series into subseries, it provides more detailed information about the behavior of the physical process to be estimated or simulated and thereby improves the accuracy of forecasting models (e.g., [36,37,38]). WT constructs both low- and high-frequency components of a time series at various resolution levels. It has also proved powerful in some spatial simulation problems [39,40]. Combinations of WT with AI models have been applied to estimation problems in water resources and hydrological research [41,42,43,44], and these studies show that WT improves the estimation accuracy of AI methods.
Nevertheless, little information is available on developing models for soil P from a limited number of measured soil variables, or on applying them in semiarid catchments. The current study therefore aimed to evaluate the GEP, MARS, RF, SVM, and NN methods for estimating soil P content from easily measured soil variables. To the best of the authors' knowledge, apart from the NN applications in the studies mentioned above, the current research is the first application of these AI techniques, and of their coupling with the wavelet transform, to modeling soil P. In soil parameter modeling, a single data partition is commonly used to feed the models with the input–target matrices. Despite its simplicity, such partitioning may produce overfitted models, because only part of the data is used for training and the results may depend on the records selected for the training stage. The current research instead applies k-fold cross-validation, in which all available input–target pairs are involved in both developing and testing the models.

2. Materials and Methods

2.1. Study Region

The study area, the Neyshabur plain (Northeast Iran), lies between latitudes 35°40′ N and 36°40′ N and longitudes 58°12′ E and 59°31′ E, has an average altitude of 1256 m a.s.l., and covers approximately 90 km² [45] (Figure 1). The plain is characterized mainly (78%) by gentle slopes of 5 to 20 degrees, and the rest of the land has moderate inclination [46]. According to an earlier study [47], most of the soils in this plain belong to the Aridisol and Entisol orders [48], and irrigated farming is the main land use. The general physiographic trend of the plain is NW–SE. Land unit and land use maps, based on [49], are shown in Figure 2.

2.2. Soil Sampling Procedures and Soil Analysis

ArcGIS 10.2 (ESRI, USA) was used to lay out a fishnet (regular grid) sampling design of 300 grid cells (Figure 1). To account for the spatial variation of the soil parameters affecting P in the Neyshabur plain, a grid interval of 500 × 500 m was used. A portable Global Positioning System (GPS) receiver was used to locate the sampling points precisely. Twelve of the 300 grid cells fall in urbanized areas (mostly fenced) and were therefore not sampled, so a total of 288 disturbed samples were collected from the 0–30 cm soil depth. The samples were air-dried, crushed, and sieved through a 2 mm sieve; large plant residues and pebbles were separated and discarded, and the samples were then transported for laboratory analyses. Soil P was determined using the Olsen method [50], and soil organic carbon (OC) content was estimated using the Walkley–Black approach, i.e., dichromate oxidation and titrimetric quantification [51]. Particle size distribution (sand: 0.05–2 mm, silt: 0.002–0.05 mm, and clay: < 0.002 mm) was determined using the Bouyoucos hydrometer method [52], and soil texture was classified following Schoeneberger's methodology [53]. Soil pH was measured with a digital pH meter in saturated paste extract [54], and calcium carbonate equivalent (CCE) was determined using the back-titration technique [55]. Descriptive statistics of the soil data are given in Table 1.
Soil P has the highest standard deviation (SD) and coefficient of variation (CV), which can be attributed to parent material, terrain attributes, and agricultural practices [24,56].
Soil OC has the highest kurtosis and skewness coefficients, indicating a deviation from the Gaussian distribution that might be attributed to fertilizer application, as discussed by Fard et al. [57]. Differences were also observed in other soil properties, which might be effective parameters influencing soil P content, because P sorption and desorption may change under particular environmental conditions [24,58].
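As a minimal sketch of how the dispersion and shape statistics discussed above can be computed, the Python function below (hypothetical name `describe`) returns SD, CV, skewness, and kurtosis for one soil variable. It assumes moment-based skewness and excess kurtosis (0 for a Gaussian); the software behind Table 1 may apply small-sample bias corrections, so its values could differ slightly. The input values are a toy series, not the study data.

```python
import numpy as np

def describe(values):
    """Mean, SD, CV (%), skewness, and excess kurtosis of one soil variable.

    Moment-based estimators are assumed; statistical packages may apply
    small-sample bias corrections to skewness and kurtosis.
    """
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)                    # sample standard deviation
    z = (x - mean) / x.std()              # standardized with the population SD
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,  # coefficient of variation
        "skewness": np.mean(z ** 3),      # 0 for a symmetric distribution
        "kurtosis": np.mean(z ** 4) - 3,  # excess kurtosis, 0 for a Gaussian
    }

stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])  # toy values, not the study data
print(round(stats["cv_percent"], 2), round(stats["kurtosis"], 2))  # 52.7 -1.3
```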

2.3. Soil Models

2.3.1. Discrete Wavelet Transform (DWT)

Wavelet analysis, which builds on Fourier analysis, is characterized by the compact support of its basis functions. Whereas the Fourier transform cannot provide a valid time–frequency analysis, the wavelet transform is an effective tool for it [59]. WT is broadly similar to the windowed Fourier transform, but their merit functions are completely different: the Fourier transform decomposes a signal into sines and cosines (functions localized in Fourier space), whereas WT uses functions localized in both real and Fourier space. Wavelets are waveforms of effectively limited duration with zero mean. WT also provides a time-scale localization of processes, which improves the accuracy of forecasting models by providing time subseries. The discrete wavelet transform (DWT) translates signals from the time domain to the time–frequency domain. In this process, the original signal is decomposed into frequency components associated with the father (scaling) and mother wavelets [60]. The mother wavelet $\psi(t)$ satisfies $\int_{-\infty}^{+\infty}\psi(t)\,dt = 0$, and the wavelet family $\psi_{a,b}(t)$ is obtained by compressing or expanding (dilating) and translating $\psi(t)$ according to Equation (1):
$$\psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right), \qquad b \in \mathbb{R},\; a \in \mathbb{R},\; a \neq 0 \tag{1}$$
where $\psi_{a,b}(t)$ is the successive (continuous) wavelet, $a$ is the scale factor related to frequency, $b$ is the translation (time) factor, and $\mathbb{R}$ is the set of real numbers. If $\psi_{a,b}(t)$ satisfies Equation (1), then for a finite-energy signal $f(t) \in L^2(\mathbb{R})$ the successive wavelet transform of $f(t)$ is given by Equation (2):
$$W_\psi f(a,b) = |a|^{-1/2} \int_{\mathbb{R}} f(t)\,\bar{\psi}\!\left(\frac{t-b}{a}\right) dt \tag{2}$$
where $\bar{\psi}(t)$ is the complex conjugate of $\psi(t)$. The WT thus filters $f(t)$ with a family of filters; in practical applications, the successive wavelet is usually discretized. Setting $a = a_0^{j}$ and $b = k b_0 a_0^{j}$, with $a_0 > 1$, $b_0 \in \mathbb{R}$, and integers $j$ and $k$, the DWT of $f(t)$ can be written as Equation (3):
$$W_\psi f(j,k) = a_0^{-j/2} \int_{\mathbb{R}} f(t)\,\bar{\psi}\!\left(a_0^{-j} t - k b_0\right) dt \tag{3}$$
The most common choice is $a_0 = 2$ and $b_0 = 1$ time step. This power-of-two logarithmic scaling, known as the dyadic grid arrangement, is regarded as the most efficient for practical applications [61]. With $a_0 = 2$ and $b_0 = 1$, Equation (3) becomes the binary (dyadic) wavelet transform of Equation (4):
$$W_\psi f(j,k) = 2^{-j/2} \int_{\mathbb{R}} f(t)\,\bar{\psi}\!\left(2^{-j} t - k\right) dt \tag{4}$$
$W_\psi f(a,b)$ or $W_\psi f(j,k)$ simultaneously describes the properties of the original data in the frequency domain (through $a$ or $j$) and the time domain (through $b$ or $k$). Decreasing $a$ or $j$ lowers the frequency resolution of the WT and raises its time-domain resolution, and vice versa [36]. For a discrete time series $f(t)$ sampled at integer time steps $t = 0, 1, \ldots, N-1$, the DWT becomes Equation (5):
$$W_\psi f(j,k) = 2^{-j/2} \sum_{t=0}^{N-1} f(t)\,\bar{\psi}\!\left(2^{-j} t - k\right) \tag{5}$$
where $W_\psi f(j,k)$ is the wavelet coefficient of the discrete wavelet at scale $a = 2^{j}$ and location $b = 2^{j}k$. In practice, the DWT operates through high-pass and low-pass filters, which pass the original time series and separate it into components at successive levels. The detail subseries (Ds) and the approximation subseries (A) were calculated through Equation (5); the Ds represent the low-scale, high-frequency components of the signal, and A represents the high-scale, low-frequency components [61].
Hybrid wavelet–artificial intelligence models can capture variations, periodicities, and trends in time series more precisely, which makes the models more viable. The coupled wavelet–AI models were obtained by combining the DWT and AI models. For example, WGEP is a GEP-based model whose inputs are the subcomponents obtained by applying the DWT, through the Mallat algorithm [61], to the original patterns. Figure 3, Figure 4 and Figure 5 show the decomposed wavelet subcomponents of the input parameters. The sum of the detail and approximation subcomponents equals the original input data. Each subcomponent has different attributes and provides useful information at a different resolution level [62], so adding this detailed information to the inputs improves the prediction accuracy of the AI models. D1 contains the details with the highest frequency and usually includes noise, whereas D3 contains the details with the lowest frequency; A is the approximation of the original signal without the detail components. Although all these subcomponents carry different information, removing the highest-frequency (noisy) component sometimes improves model accuracy.
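The multilevel decomposition described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the text does not name the mother wavelet, so the Haar wavelet is assumed here, the helper names (`haar_step`, `haar_mra`) are hypothetical, and the series length must be divisible by 2^levels (288 is divisible by 8). Each subcomponent is rebuilt at the original length from its own coefficients only, so D1 + D2 + D3 + A3 reproduces the input, as stated above. The number of levels uses ceil(log10 N), an assumed reading of the log(N) rule applied in Section 3.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_step(x):
    """One Mallat analysis step with Haar filters: (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2   # low-pass, high-pass

def haar_inverse_step(approx, detail):
    """Exact inverse of haar_step (one synthesis step)."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / SQRT2
    out[1::2] = (approx - detail) / SQRT2
    return out

def haar_mra(x, levels):
    """Return [D1, ..., D_levels, A_levels], each at the length of x.

    Each subcomponent is rebuilt from its own coefficients only (all others
    set to zero), so the subcomponents sum back to the original series.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2 ** levels:
        raise ValueError("series length must be divisible by 2**levels")
    approx, details = x, []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)

    def rebuild(coeffs, level, is_detail):
        zeros = np.zeros_like(coeffs)
        if is_detail:
            y = haar_inverse_step(zeros, coeffs)
        else:
            y = haar_inverse_step(coeffs, zeros)
        for _ in range(level - 1):          # climb back to the original length
            y = haar_inverse_step(y, np.zeros_like(y))
        return y

    parts = [rebuild(d, j, True) for j, d in enumerate(details, start=1)]
    parts.append(rebuild(approx, levels, False))
    return parts

rng = np.random.default_rng(0)
clay = rng.uniform(10.0, 40.0, size=288)     # synthetic stand-in, not the study data
levels = int(np.ceil(np.log10(clay.size)))   # log10(288) ~ 2.46 -> 3 (assumed rounding)
D1, D2, D3, A3 = haar_mra(clay, levels)
print(np.allclose(D1 + D2 + D3 + A3, clay))  # True
```

Swapping in another mother wavelet (for example, a Daubechies filter) would change the subcomponents but not the additive structure that the final check illustrates.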
Figure 6 shows, as an example, the flowchart and structure of the WGEP model. The effective detail and approximation subcomponents were selected by correlation analysis, as explained in the Modeling Protocol subsection.

2.3.2. Gene Expression Programming (GEP)

GEP is a genetic-based algorithm that uses chromosomes and expression trees as programs. Each chromosome is composed of genes, each of which encodes a smaller subexpression tree. The structural organization of the linear chromosomes allows genetic operators such as mutation, transposition, and recombination to be applied without restriction. Selecting the data sets to be fitted, so as to describe the relationship between variables, is important at this step. GEP also has several advantages that make it stronger than other learning algorithms. Because of its nature, genetic diversity is created simply in GEP [63]; its robustness lies in how easily it produces genetic variation and in its ability to evolve complex programs composed of numerous subprograms [64]. This procedure also enables GEP to express several relationships between input and output variables, a feature that makes it stronger than other methods [65,66].
The following plan was used for the GEP-based simulation of soil P (target variable) from the input variables [67]. The first step was to select the fitness function: although various absolute- and relative-error-based fitness functions could be used for modeling soil P, the root relative squared error (RRSE) was applied, as advised in the literature (e.g., [25]). The second step was to choose the predictor variables and the function set. The input variables were selected using the Gamma test, as mentioned before. Various GEP function sets were evaluated (not presented here), and the following function set gave the optimal results.
The third step was to select the chromosome architecture; a head size of 8 and three genes per chromosome were used, as advised by previous authors (e.g., [26]). Finally, the genetic operators were selected; the GeneXpro default operators were chosen following previous research [65].
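To illustrate how a linear GEP gene is expressed as an expression tree, the sketch below decodes Karva notation breadth-first and evaluates it. Everything beyond the head size and gene count is a hypothetical assumption: a function set of {+, −, ×, ÷} (maximum arity 2), protected division, terminals `a`, `b`, `c` standing in for the inputs, and genes linked by addition. With head size h = 8 and maximum arity n = 2, the standard GEP tail rule t = h(n − 1) + 1 gives t = 9, so each gene has 17 symbols and a three-gene chromosome has 51.

```python
import operator

def protected_div(x, y):
    # Hypothetical protected division: avoid dividing by (near) zero.
    return x / y if abs(y) > 1e-9 else 1.0

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}
ARITY = {sym: 2 for sym in FUNCS}  # every function in this assumed set is binary

def tail_length(head, max_arity):
    """GEP rule: the tail must hold enough terminals to close any head."""
    return head * (max_arity - 1) + 1

def decode(gene):
    """Karva (breadth-first) decoding into (symbol, child-indices) nodes."""
    nodes = [(gene[0], [])]
    i, next_sym = 0, 1
    while i < len(nodes):
        sym, children = nodes[i]
        for _ in range(ARITY.get(sym, 0)):
            nodes.append((gene[next_sym], []))
            children.append(len(nodes) - 1)
            next_sym += 1
        i += 1
    return nodes  # unused tail symbols are simply ignored

def evaluate(nodes, env, idx=0):
    """Recursively evaluate the expression tree rooted at node idx."""
    sym, children = nodes[idx]
    if sym in FUNCS:
        left, right = (evaluate(nodes, env, c) for c in children)
        return FUNCS[sym](left, right)
    return env[sym]  # terminal: look up the input value

def evaluate_chromosome(chrom, gene_len, env):
    """Split into genes and link their sub-trees by addition (assumed)."""
    genes = [chrom[i:i + gene_len] for i in range(0, len(chrom), gene_len)]
    return sum(evaluate(decode(g), env) for g in genes)

gene_len = 8 + tail_length(8, 2)       # 8 + 9 = 17 symbols
env = {"a": 1.0, "b": 2.0, "c": 3.0}   # hypothetical input values
gene1 = "+*a-bcaa" + "bcabcabca"       # head + tail -> ((c - a) * b) + a
gene2 = "-cbaaaaa" + "aaaaaaaaa"       # c - b
gene3 = "*abaaaaa" + "aaaaaaaaa"       # a * b
print(evaluate(decode(gene1), env))                               # 5.0
print(evaluate_chromosome(gene1 + gene2 + gene3, gene_len, env))  # 8.0
```

Because the tail contains only terminals, any mutation in the head still decodes to a valid tree, which is what allows GEP to apply its genetic operators without restriction.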

2.3.3. Multivariate Adaptive Regression Spline (MARS)

MARS is a nonparametric regression approach that can be regarded as an extension of linear models that automatically models nonlinearities and interactions between variables. It excels at searching for optimal nonparametric regression models and handles data easily [68]. Its advantages, such as flexibility, speed, and accuracy, make it usable for predicting continuous and binary outputs through a combination of linear and nonlinear terms. MARS can also identify the underlying functional relationships between input and output variables without any prior assumptions [69]. MARS generates its basis functions in a search step that consists of forward and backward stepwise passes. The forward pass keeps adding basis functions and searching for candidate knots [70]; after a number of splits, it yields a complex, overtrained model with lower predictive accuracy. The backward pass therefore removes the least effective terms, eliminating unnecessary variables from the previously chosen set. The resulting splines, whose linear pieces change slope at thresholds (knots), give the MARS technique its accuracy and flexibility [71].
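The hinge (truncated linear) basis functions that MARS builds can be illustrated with a minimal NumPy sketch. It assumes a single input and one mirrored pair of hinges, and implements only a simplified forward step that tries each interior data value as a knot and keeps the one with the lowest residual sum of squares; the full MARS forward pass adds many such pairs and interactions, and the backward pruning pass is not shown. Function names are hypothetical.

```python
import numpy as np

def hinge_pair(x, knot):
    """Mirrored MARS hinge functions max(0, x - t) and max(0, t - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def fit_single_knot(x, y, knot):
    """Least-squares fit of y ~ b0 + b1*max(0, x - t) + b2*max(0, t - x)."""
    B = np.column_stack([np.ones_like(x), *hinge_pair(x, knot)])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = float(np.sum((B @ coef - y) ** 2))
    return coef, rss

def forward_step(x, y):
    """Try every interior data value as a knot and keep the lowest-RSS one."""
    candidates = np.unique(x)[1:-1]
    fits = [(fit_single_knot(x, y, t)[1], t) for t in candidates]
    return min(fits)[1]

x = np.arange(101) / 10.0                    # 0.0, 0.1, ..., 10.0
y = 2.0 + 3.0 * np.maximum(0.0, x - 5.0)     # synthetic series with one kink at x = 5
knot = forward_step(x, y)
coef, rss = fit_single_knot(x, y, knot)
print(knot, coef.round(6))  # knot 5.0, coefficients close to [2, 3, 0]
```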

2.3.4. Random Forest (RF)

The RF algorithm was first presented by Breiman [72] as a collection of trees, and it has since been used for spatial and time-series simulations. As an ensemble learning algorithm, it handles high-dimensional regression and classification problems and has been widely applied to forecasting [69]. One of its advantages is its capability to account for interactions among different factors. It is a tree-based ensemble method in which each tree depends on randomly selected variables, and the forest is grown from many regression trees combined into an ensemble [72]. Its two key parameters are the number of variables and the number of trees [73]. RF fits random binary trees to bootstrap samples (bagging) and reaches its final decision by averaging the outputs of the individual trees. The bias of the bagged ensemble is the same as that of the single trees, while its variance is reduced because the correlation between the trees is reduced [74].
After various numbers of trees were examined, 150 trees were selected, as they produced the lowest errors, and 10 cycles were found to be best for iteratively calculating the mean error. For the training error, a percentage decrease of 5% was found by trial and error; on that basis, the minimum child node size to stop and the maximum number of levels were set to 5 and 10, respectively.
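A comparable configuration can be sketched with scikit-learn's `RandomForestRegressor`. This approximates the STATISTICA settings and is not the authors' setup: mapping the "minimum child node size" to `min_samples_leaf` and the "maximum number of levels" to `max_depth` is an assumption, the 10-cycle error averaging and 5% stopping rule have no direct counterpart here, `max_features` is left at its library default because the text does not report it, and the data are synthetic. The final check illustrates the bagging behavior described above: the forest prediction is the average of the 150 tree predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.uniform(size=(288, 3))            # synthetic stand-ins for pH, OC, clay
y = 10 * X[:, 1] + 5 * X[:, 2] - 3 * X[:, 0] + rng.normal(0, 0.5, 288)

rf = RandomForestRegressor(
    n_estimators=150,     # number of trees selected in the text
    max_depth=10,         # assumed mapping of "maximum number of levels"
    min_samples_leaf=5,   # assumed mapping of "minimum child node size"
    bootstrap=True,       # bagging: each tree sees a bootstrap resample
    random_state=0,
)
rf.fit(X, y)

# The forest prediction is the average of the individual tree predictions.
tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])
print(np.allclose(tree_preds.mean(axis=0), rf.predict(X)))  # True
```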

2.3.5. Support Vector Machine (SVM)

SVM is a machine learning approach that has recently been applied broadly in different scientific fields. It was first developed by Vapnik [75] and is based on the concept of decision planes for linear data classification. SVM minimizes an upper bound on the generalization error rather than the local training error, which gives the algorithm a greater ability to generalize, the key aim of statistical learning for solving complicated problems [57,76]. In the current research, regression SVM type 1 was employed, as it showed higher performance accuracy in previous research [77]. Through a trial-and-error procedure, the SVM capacity and epsilon constants were set to 10 and 0.15, respectively. Linear, sigmoid, polynomial, and radial basis function kernels were evaluated, and the radial basis function kernel (Gamma = 0.25) gave the best results. Finally, the maximum number of iterations was set to 1000, and training was stopped at an error value of 0.005.
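A roughly equivalent epsilon-SVR can be configured in scikit-learn, assuming that STATISTICA's regression SVM type 1 corresponds to epsilon-SVR and that the 0.005 stopping error maps to the solver tolerance `tol`; both mappings are assumptions, and the data are synthetic. The check shows that the fitted model is a kernel expansion over the support vectors with the RBF kernel exp(−γ‖x − z‖²), γ = 0.25. With `max_iter=1000`, scikit-learn may emit a convergence warning if the cap is reached.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(288, 3))                 # synthetic, standardized inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(0, 0.1, 288)

svr = SVR(
    kernel="rbf",
    C=10.0,          # capacity constant
    epsilon=0.15,    # half-width of the epsilon-insensitive tube
    gamma=0.25,      # K(x, z) = exp(-gamma * ||x - z||^2)
    tol=0.005,       # assumed mapping of the 0.005 stopping error
    max_iter=1000,   # iteration cap; may raise a ConvergenceWarning if reached
)
svr.fit(X, y)

# Prediction = sum over support vectors of dual coefficient * kernel + intercept.
K = rbf_kernel(svr.support_vectors_, X, gamma=0.25)
manual = (svr.dual_coef_ @ K + svr.intercept_).ravel()
print(np.allclose(manual, svr.predict(X)))  # True
```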

2.3.6. Artificial Neural Networks (NN)

NN is a computing method inspired by the behavior of biological neural networks. Its main processing units are nodes, which, like neurons, are linked by weighted connections [78]. An advantage of this method is its ability to learn relationships between dependent and independent variables, which enables it to model nonlinear behavior and makes it efficient for noisy data. The network is organized in several layers that recognize complex patterns. In this study, a multilayer perceptron (MLP) feed-forward network was applied to the data sets, with different transfer functions in the hidden and output layers. To select the best network, 100 networks were examined in each training stage, and the conjugate gradient training algorithm with 180 cycles gave the highest accuracy. The Tanh and identity activation functions gave the best results for the hidden and output layers, respectively. Through an iterative process, the number of hidden layer nodes was set to 12. The SVM, NN, RF, and MARS analyses were carried out using STATISTICA software (StatSoft, Hamburg, Germany).
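A minimal sketch of the network described above (one hidden layer of 12 tanh nodes and an identity output node) trained by conjugate gradient is given below, using NumPy for backpropagation and SciPy's `minimize(method="CG")` capped at 180 iterations. It is not the STATISTICA implementation: the three synthetic inputs, the mean-squared-error loss, the weight initialization, and the treatment of one "cycle" as one CG iteration are assumptions, and the 100-network search is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

N_IN, N_HIDDEN = 3, 12                   # 3 hypothetical inputs, 12 hidden nodes
N_PARAMS = N_IN * N_HIDDEN + 2 * N_HIDDEN + 1

def unpack(w):
    """Split the flat parameter vector into layer weights and biases."""
    i = N_IN * N_HIDDEN
    W1 = w[:i].reshape(N_IN, N_HIDDEN)
    b1 = w[i:i + N_HIDDEN]
    W2 = w[i + N_HIDDEN:i + 2 * N_HIDDEN]
    b2 = w[-1]
    return W1, b1, W2, b2

def forward(w, X):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)             # tanh activation in the hidden layer
    return h @ W2 + b2, h                # identity activation in the output layer

def loss_and_grad(w, X, y):
    """Mean squared error and its gradient (backpropagation)."""
    _, _, W2, _ = unpack(w)
    pred, h = forward(w, X)
    d_pred = 2.0 * (pred - y) / len(y)
    d_z = np.outer(d_pred, W2) * (1.0 - h ** 2)   # back through tanh
    grad = np.concatenate([(X.T @ d_z).ravel(), d_z.sum(axis=0),
                           h.T @ d_pred, [d_pred.sum()]])
    return np.mean((pred - y) ** 2), grad

rng = np.random.default_rng(2)
X = rng.normal(size=(288, N_IN))         # synthetic, standardized inputs
y = np.tanh(X[:, 0]) - 0.5 * X[:, 1] + rng.normal(0, 0.1, 288)
w0 = rng.normal(scale=0.1, size=N_PARAMS)

# Conjugate-gradient training capped at 180 iterations ("cycles").
res = minimize(loss_and_grad, w0, args=(X, y), jac=True, method="CG",
               options={"maxiter": 180})
initial_loss = loss_and_grad(w0, X, y)[0]
print(f"MSE {initial_loss:.3f} -> {res.fun:.3f}")
```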

2.4. Modeling Protocol

In this research, the statistical Gamma test was used to identify the independent variables that most influence soil P and thus to select the most important input parameters. Building models on this test can be considered inflexible, although it may be advantageous where model selection is integrated with variable selection [26]. The test showed that the best input combination was pH, OC, and clay content, which had the maximum influence on soil P, with the minimum Gamma statistic (0.0193). The wavelet decompositions of the selected input variables were then extracted, and the correlations of these subcomponents with the target parameter (soil P) were computed; the results are summarized in Table 2. Finally, the subcomponents with correlation values > 0.09 were chosen as predictor variables for the wavelet–AI models. Although the linear correlation of each component is low, all of them may be useful for training the nonlinear models.
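The Gamma test used for input selection can be sketched with NumPy following its usual near-neighbour formulation: for k = 1…p, δ(k) is the mean squared distance from each input vector to its k-th nearest neighbour, γ(k) is half the mean squared difference between the corresponding outputs, and the intercept of the regression of γ on δ is the Gamma statistic, an estimate of the output noise variance. The brute-force distance matrix (adequate for a few hundred samples), p = 10, the synthetic data, and the helper names are assumptions. A second helper applies the > 0.09 correlation threshold, assuming it refers to the absolute Pearson coefficient.

```python
import numpy as np

def gamma_test(X, y, p=10):
    """Near-neighbour Gamma test: returns (Gamma, A).

    Gamma (regression intercept) estimates the variance of the output noise
    that no smooth model of X can explain; A is the regression slope.
    """
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    np.fill_diagonal(d2, np.inf)                              # ignore self-matches
    nn = np.argsort(d2, axis=1)[:, :p]                        # k-th neighbours, k = 1..p
    rows = np.arange(len(y))[:, None]
    delta = d2[rows, nn].mean(axis=0)                         # delta(k)
    gamma = 0.5 * ((y[nn] - y[:, None]) ** 2).mean(axis=0)    # gamma(k)
    A, Gamma = np.polyfit(delta, gamma, 1)                    # gamma = A * delta + Gamma
    return Gamma, A

def select_by_correlation(components, target, threshold=0.09):
    """Names of subcomponents whose |Pearson r| with the target exceeds threshold."""
    target = np.asarray(target, dtype=float)
    return [name for name, series in components.items()
            if abs(np.corrcoef(series, target)[0, 1]) > threshold]

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2.0 * np.pi, 500)
y = np.sin(x) + rng.normal(0.0, 0.2, 500)   # synthetic data, true noise variance 0.04
Gamma, A = gamma_test(x, y)
print(round(Gamma, 3))                      # should be near 0.04
```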
A k-fold procedure was applied to divide the data into training and testing blocks. In this method, all available data are split into k blocks; we used k = 10 independent blocks. In each phase, the models were trained on part of the data and tested on the remaining block. In total, 50 train–test phases (5 models × 10 folds) were run for the single AI models and for the wavelet-based AI models. This approach provided reliable simulated values, reduced the risk of overfitting, and allowed all the data to take part in both the training and testing stages.
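The 10-fold procedure can be sketched as follows: the sample indices are shuffled once, split into ten disjoint blocks, and each block is predicted by a model trained on the other nine, so every sample is tested exactly once and the out-of-fold predictions can be pooled. The shuffling seed, the helper names, and the least-squares stand-in model are hypothetical; any of the AI models could be passed in as `fit_predict`.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split into k disjoint, nearly equal test blocks."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_validate(fit_predict, X, y, k=10, seed=0):
    """Pooled out-of-fold predictions: each sample is tested exactly once."""
    pooled = np.full(len(y), np.nan)
    folds = kfold_indices(len(y), k, seed)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)   # the other k-1 blocks
        pooled[test] = fit_predict(X[train], y[train], X[test])
    return pooled, folds

def linear_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in model: ordinary least squares with intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

rng = np.random.default_rng(4)
X = rng.normal(size=(288, 3))                  # synthetic inputs
y = 1.0 + X @ np.array([2.0, -1.0, 0.5])       # noise-free linear target
pooled, folds = cross_validate(linear_fit_predict, X, y)
print([len(f) for f in folds])  # [29, 29, 29, 29, 29, 29, 29, 29, 28, 28]
```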
A comprehensive evaluation of the models' performance requires validation indices. Three statistical indicators, the correlation coefficient (R), scatter index (SI), and Nash–Sutcliffe coefficient (NS) (Equations (6)–(8)), were used. A t-test was also performed to check the results.
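$$R = \frac{\sum_{i=1}^{n} \left(P_i^{o} - \bar{P}^{o}\right)\left(P_i^{m} - \bar{P}^{m}\right)}{\sqrt{\sum_{i=1}^{n} \left(P_i^{o} - \bar{P}^{o}\right)^2 \sum_{i=1}^{n} \left(P_i^{m} - \bar{P}^{m}\right)^2}} \tag{6}$$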
$$SI = \frac{RMSE}{\bar{P}^{o}} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(P_i^{o} - P_i^{m}\right)^2}}{\bar{P}^{o}} \tag{7}$$
$$NS = 1 - \frac{\sum_{i=1}^{n} \left(P_i^{m} - P_i^{o}\right)^2}{\sum_{i=1}^{n} \left(P_i^{o} - \bar{P}^{o}\right)^2} \tag{8}$$
In Equations (6)–(8), $P_i^{o}$ denotes the measured soil P value of the i-th observation, $P_i^{m}$ the corresponding simulated value, $\bar{P}^{o}$ and $\bar{P}^{m}$ the mean observed and simulated values, and n the number of patterns. R alone cannot be used as a measure of fit, so other indices such as SI and NS are essential for model validation. The best fit corresponds to R = 1, SI = 0, and NS = 1. These indices were computed both for the full series of patterns, by pooling the simulations of all test blocks, and separately for each test phase (1/10 of the pattern matrix).
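Equations (6)–(8) translate directly into NumPy, as in the minimal sketch below (helper names are hypothetical). It also includes the RRSE used as the GEP fitness function in Section 2.3.2: comparing its definition with Equation (8) shows that NS = 1 − RRSE², so minimizing RRSE during GEP training is equivalent to maximizing NS. The `per_fold` helper evaluates the indices on each test block, and the toy values illustrate a model with a constant bias of one unit.

```python
import numpy as np

def r_si_ns(obs, sim):
    """R, SI, and NS as defined in Equations (6)-(8)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]                                        # Eq. (6)
    si = np.sqrt(np.mean((obs - sim) ** 2)) / obs.mean()                   # Eq. (7)
    ns = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)  # Eq. (8)
    return r, si, ns

def rrse(obs, sim):
    """Root relative squared error (GEP fitness); NS = 1 - RRSE**2."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return np.sqrt(np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def per_fold(obs, sim, folds):
    """Indices evaluated separately on each test block (arrays of row indices)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return [r_si_ns(obs[idx], sim[idx]) for idx in folds]

obs = np.array([1.0, 2.0, 3.0, 4.0])   # toy values
sim = obs + 1.0                        # constant bias of one unit
r, si, ns = r_si_ns(obs, sim)          # approximately 1.0, 0.4, 0.2
```

The example also shows why R alone is insufficient: a biased model can have R = 1 while SI and NS reveal the error.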

3. Results and Discussion

The coupled wavelet–AI models (WGEP, WMARS, WRF, WSVM, and WNN) were obtained by combining the DWT and AI models. The decomposition level was determined with the log(N) formula, where N is the number of data points: with the 288 data used in this study, log(288) ≈ 2.46, which gave three decomposition levels in the DWT applications. One approximation (A) and three details (D1, D2, and D3) were thus obtained for each input, and each of these subcomponents was used as an input in the wavelet-based AI models. Table 3 presents the statistical indices of the single and wavelet-based AI models. According to the t-test, the differences between the measured and simulated values are not significant; however, the main aim here was to compare the wavelet-based models, and the model with the highest significance level and the lowest t-statistic was considered the most robust.
The single models could not simulate soil P content with reasonable accuracy, which might be linked to the complex chemistry of P in soils. Inorganic P reacts with other elements, such as calcium, iron, and aluminium, and can be converted to phosphates [15]. In the study area, organic P also occurs in different forms; it can resist microbial degradation in soil and is highly correlated with the variations of OC (Table 1) and soil texture. In particular, lime might cause discrepancies in P content, which makes extrapolation difficult [24]. Clay content and Fe and Al oxides can enhance P sorption [79,80,81], whereas soil OC has the opposite effect [82]. Moreover, Demaria et al. [83] stated that soil pH and metal ions significantly influence soil P content; the variations in soil P content might therefore stem from variations in soil properties, since soil OC and clay distributions differed considerably across the study area.
According to Table 3, the WGEP model has the lowest SI (0.127) and the highest R (0.990) and NS (0.978) values, followed, in order, by WRF, WMARS, WSVM, and WNN. WNN gives the worst results, with the highest SI (0.256) and the lowest R (0.960) and NS (0.911). Apart from WNN, the differences in performance among the models are small (SI differences of 0.067, 0.054, 0.086, and 0.129 between WGEP and WRF, WMARS, WSVM, and WNN, respectively). Relative to the WMARS, WRF, WSVM, and WNN models, the WGEP model improves R by 1.5%, 1.2%, 2.1%, and 3.1% and NS by 3.1%, 2.3%, 4.3%, and 7.4%, respectively. A t-test at the 95% confidence level was also used to compare the measured and simulated soil P values; the results are presented in Table 3. In the t-test analyses, the WGEP model again showed the highest performance among the wavelet-based AI models.
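The percentage improvements quoted above can be checked, as a sketch, from the WGEP and WNN values given in this paragraph. The formula is an assumption about how the percentages were derived: the gain is expressed relative to the competing model's value, with the sign reversed for SI, where lower is better. Under that assumption, the R and NS gains over WNN come out at 3.1% and 7.4%, and the SI reduction at about 50%.

```python
# Values quoted in the text for the best and worst wavelet-based models.
wgep = {"R": 0.990, "SI": 0.127, "NS": 0.978}
wnn = {"R": 0.960, "SI": 0.256, "NS": 0.911}

def relative_gain(best, other, lower_is_better=False):
    """Improvement of `best` over `other`, as a percentage of `other`."""
    diff = other - best if lower_is_better else best - other
    return 100.0 * diff / other

for index in ("R", "NS", "SI"):
    gain = relative_gain(wgep[index], wnn[index], lower_is_better=(index == "SI"))
    print(index, round(gain, 1))
# R 3.1
# NS 7.4
# SI 50.4
```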
Our findings demonstrated that the GEP technique used all the introduced input variables. These inputs were selected primarily with the Gamma test and from the correlations between the wavelet subcomponents and the target variable. Thus, the outcomes of the Gamma test are confirmed by GEP. Shiri et al. [26] argued that differences between the input selections of GEP and the Gamma test might be linked to their functional structures. A major assumption of the Gamma test is that the relationship under study consists of a smooth function plus a random noise component, whereas GEP develops the structure and the constants of its formulas simultaneously. The extent to which site-specific constants are embedded in the formulation, and whether those constants are similar to or different from those at another station, would determine the transferability of the function between stations. Therefore, in the current case, the agreement between the two input selections might be linked to the nature of the studied problem and to the use of the wavelet transform to decompose the original data into subcomponents. GEP gave the greatest weight to the subcomponents of soil OC and clay content. The MARS model, however, gives the highest weight to D12 (pH) and D21 (OC), while the RF model gives the highest weights to D22 (OC), D21 (OC), D12 (pH), and A31 (clay). Moreover, analysis of the obtained equation reveals that soil clay and OC content have a direct, positive effect on soil P, while pH shows an inverse relation. This confirms the results of previous studies (e.g., [18,84]).
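The Gamma test assumption described above (output = smooth function plus noise) can be made concrete with a brute-force sketch of the test's standard formulation. This is an illustration, not the authors' implementation. The number of near neighbours (p = 10), plain Euclidean distance on unscaled inputs, and the synthetic check data are all assumptions. The intercept Γ of the fitted line estimates the noise variance, so a smaller Γ for a candidate input set suggests that those inputs explain more of the output.

```python
import math
import random


def gamma_test(X, y, p=10):
    """Brute-force Gamma test sketch (p = 10 is an assumed default).

    For k = 1..p it averages, over all points, the squared input distance to
    the k-th nearest neighbour (delta_k) and half the squared output
    difference to that neighbour (gamma_k), then fits gamma = A*delta + Gamma
    by least squares. Gamma estimates Var(r) in y = f(x) + r for smooth f.
    Returns (Gamma, A).
    """
    n = len(X)
    if n <= p:
        raise ValueError("need more samples than near neighbours p")
    delta = [0.0] * p
    gamma = [0.0] * p
    for i in range(n):
        # (squared distance, index) to every other point, nearest first
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        for k in range(p):
            dist2, j = neighbours[k]
            delta[k] += dist2
            gamma[k] += 0.5 * (y[j] - y[i]) ** 2
    delta = [d / n for d in delta]
    gamma = [g / n for g in gamma]
    # ordinary least-squares line through the p points (delta_k, gamma_k)
    mean_d = sum(delta) / p
    mean_g = sum(gamma) / p
    sxx = sum((d - mean_d) ** 2 for d in delta)
    sxy = sum((d - mean_d) * (g - mean_g) for d, g in zip(delta, gamma))
    slope = sxy / sxx
    return mean_g - slope * mean_d, slope


if __name__ == "__main__":
    # synthetic check: smooth signal plus noise with variance 0.1**2 = 0.01
    rng = random.Random(0)
    xs = [[rng.uniform(0.0, 6.0)] for _ in range(400)]
    ys = [math.sin(x[0]) + rng.gauss(0.0, 0.1) for x in xs]
    Gamma, A = gamma_test(xs, ys)
    print(f"Gamma = {Gamma:.4f} (true noise variance 0.0100), A = {A:.3f}")
```

In practice, inputs are usually rescaled before distances are computed. Otherwise the variable with the largest numeric range dominates the neighbour search.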
Figure 7 shows the SI and NS values of the applied techniques for each test stage. WGEP yields the lowest SI and the highest NS in 6 of the 10 test stages (1st, 2nd, 4th, 6th, 8th, and 10th), and it has accuracy similar to the WRF model in the 3rd and 5th test stages. Figure 7 also shows considerable differences in the models' accuracy across test stages.
The range (maximum minus minimum) of SI values across test stages is 0.104 for WGEP and 0.247 for WSVM. A similar trend holds for NS: WGEP shows the smallest fluctuations and WMARS the largest. This study applied k-fold testing, in which each data block is held out in turn for testing while the remaining data are used for training, and such variations underline the need for this method when evaluating the models. A robust validation is needed over all available data, as well as over other data within the same statistical ranges, and results from a single data partition would not satisfy these criteria. Moreover, the patterns used in this research were collected at various spatial locations, which makes k-fold testing even more indispensable. The k-fold testing approach is particularly important for checking the models' generalizability in external applications, for instance, models trained with data from some locations and tested with data from another, because the test patterns include data from different points [66].
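The per-stage statistics discussed above (SI and NS for each of the 10 test stages, and their max−min ranges) can be sketched as follows. Several assumptions apply. The folds are contiguous and unshuffled. SI is taken as RMSE divided by the mean of the observations, and NS as the Nash–Sutcliffe efficiency; these are the standard definitions and are assumed to match the paper's Methods section. `fit_predict` is a hypothetical callback standing in for any of the five wavelet-based models.

```python
import math


def kfold_indices(n, k=10):
    """Contiguous k-fold split of indices 0..n-1 (no shuffling, an assumption).

    Each of the k near-equal blocks is held out once as the test set while the
    remaining indices form the training set."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def scatter_index(obs, sim):
    """SI = RMSE / mean(observed)."""
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / (sum(obs) / len(obs))


def nash_sutcliffe(obs, sim):
    """NS = 1 - SSE / (sum of squared deviations of obs from their mean)."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


def per_stage_stats(obs, fit_predict, k=10):
    """Per-test-stage SI and NS, plus the (max - min) range of each.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns predictions for test_idx."""
    si, ns = [], []
    for train, test in kfold_indices(len(obs), k):
        pred = fit_predict(train, test)
        observed = [obs[i] for i in test]
        si.append(scatter_index(observed, pred))
        ns.append(nash_sutcliffe(observed, pred))
    return si, ns, max(si) - min(si), max(ns) - min(ns)


if __name__ == "__main__":
    soil_p = [float(v) for v in range(1, 21)]  # toy values, not the study data

    def mean_model(train, test):
        # trivial baseline: predict the training mean for every test sample
        m = sum(soil_p[i] for i in train) / len(train)
        return [m] * len(test)

    si, ns, si_range, ns_range = per_stage_stats(soil_p, mean_model, k=5)
    print("SI range:", round(si_range, 3), "NS range:", round(ns_range, 3))
```

Shuffling the indices before splitting, or grouping samples by location, would be alternative designs. This section does not state how sampling points were assigned to test stages.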
Figure 8 plots the measured versus simulated soil P values obtained with the applied methods.
The scatterplots show that the WGEP estimates are less scattered than those of the other applied methods and fall within the 95% prediction interval, except for one or two values. The WRF model gives the second most accurate estimates, while WNN gives the least accurate and most scattered soil P estimates. The WGEP model adequately estimates high soil P values, whereas the other models generally tend to overestimate them. These graphs confirm the statistics reported in Table 3.
Overall, the wavelet-based GEP model excelled among all the applied models, providing the most precise soil P estimates in the present study. The results also showed that each of the wavelet subcomponents (DWs) obtained with DWT played a distinct role in representing the original data and had a different effect on the output (here, soil P). This study supports previous applications [85,86,87] and indicates that the discrete wavelet transform can be considered a useful data preprocessing technique.
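The minimal sketch below shows how DWT preprocessing can produce the A, D1, D2, and D3 subcomponents named in Table 2, using a three-level Haar multiresolution decomposition in plain Python. The Haar wavelet, the unnormalised average/half-difference form, and the requirement that the series length be a multiple of 2^3 are all assumptions made for illustration. This section does not state the mother wavelet or boundary handling the authors used. Each returned component has the same length as the input, and the components sum back to the original series, so each one could be fed to a model as a separate input.

```python
def haar_mra(x, levels=3):
    """Haar multiresolution sketch (assumed wavelet; not the paper's setup).

    Returns (A, [D1, ..., D_levels]). Every component has len(x) samples and
    x == A + D1 + ... + D_levels up to rounding. Uses unnormalised
    averages/half-differences; len(x) must be a multiple of 2**levels."""
    if len(x) % (2 ** levels):
        raise ValueError("len(x) must be a multiple of 2**levels")
    approx = [float(v) for v in x]
    details = []
    for level in range(1, levels + 1):
        half = 2 ** (level - 1)  # samples in each half of a level-j block
        pairs = range(0, len(approx), 2)
        d = [(approx[i] - approx[i + 1]) / 2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / 2 for i in pairs]
        # expand each detail coefficient to full length:
        # +d on the first half of its block, -d on the second half
        full = []
        for v in d:
            full.extend([v] * half + [-v] * half)
        details.append(full)
    block = 2 ** levels
    a_full = [v for v in approx for _ in range(block)]
    return a_full, details


if __name__ == "__main__":
    series = [7.9, 8.1, 7.6, 7.8, 8.0, 8.2, 7.7, 7.9]  # toy pH-like values
    A, (D1, D2, D3) = haar_mra(series)
    rebuilt = [a + d1 + d2 + d3 for a, d1, d2, d3 in zip(A, D1, D2, D3)]
    print(all(abs(r - s) < 1e-12 for r, s in zip(rebuilt, series)))
```

The decomposition follows whatever order the samples are arranged in. For spatial soil samples, the ordering of sampling points is therefore itself a modeling choice.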
Finally, it should be noted that land use may affect the relationships among soil variables, as discussed by other authors (e.g., [88,89,90,91]). Topography also affects soil properties through the local redistribution of water, solar radiation, and parent material by erosion and deposition processes [92,93,94]. The developed models might not be directly transferable to other regions, since the parameters influencing soil P may differ between locations. Even so, the proposed approaches can be useful for assessing soil P in different regions. Accordingly, the current study provides practitioners with a fully described blueprint for applying these techniques.

4. Conclusions and Challenges

The current research demonstrated the necessity of applying k-fold cross-validation to assess five wavelet-based AI models (WGEP, WMARS, WRF, WSVM, and WNN) in modeling soil P. To this end, soil pH, OC, and clay content were used as inputs, as these soil properties were the most suitable given the conditions of our study area. Although applications based on a single data set assignment are common, k-fold testing is advisable for a thorough evaluation of the modeling procedure over a data set, as stated by Marti et al. [95]. The Gamma test was used to select the optimal inputs for the applied models and, together with 10-fold cross-validation, to prevent the AI-based models from overfitting. In addition, DWT with three decomposition levels was applied to the input data, and each resulting subcomponent was used as a model input. Our findings can be summarized in five main points: (i) the wavelet transform can be considered an effective tool for preprocessing the input data of AI methods in modeling soil P; (ii) among the applied AI models, WGEP provided the best estimates, followed by WRF, while WNN showed the worst results; (iii) the models were also compared using the t-test, and WGEP obtained the most robust results in predicting soil P; (iv) relative to the WMARS, WRF, WSVM, and WNN models, WGEP reduced SI by 35%, 30%, 40%, and 50% and increased NS by 3.1%, 2.3%, 4.3%, and 7.4%, respectively; and (v) k-fold testing can be considered essential in modeling soil P, as it revealed high spatial variation and helps prevent the applied models from overtraining.
Applying hybrid wavelet–data-driven models to other water- and soil-related problems, such as groundwater quality modeling and soil moisture simulation, is a promising topic for future investigation. Coupling different wavelet transforms, from discrete (DWT) to continuous (CWT), with other data-driven models for the same problems would be another direction for further research.

Author Contributions

Conceptualization, J.S., A.K., and O.K.; Methodology and validation, J.S., A.K., O.K., S.M.K., and S.K.; Formal analysis, J.S., A.K., O.K., S.M.K., S.K., and A.H.N.; Writing, J.S., A.K., O.K., S.M.K., S.K., A.H.N., and J.R.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This study was partially supported by the Department of Soil Science, University of Tehran, Iran.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Keesstra, S.; Geissen, V.; Mosse, K.; Piiranen, S.; Scudiero, E.; Leistra, M.; van Schaik, L. Soil as a filter for groundwater quality. Curr. Opin. Environ. Sustain. Terr. Syst. 2012, 4, 507–516.
  2. Kumar, V.; Parihar, R.D.; Sharma, A.; Bakshi, P.; Singh Sidhu, G.P.; Bali, A.S.; Karaouzas, I.; Bhardwaj, R.; Thukral, A.K.; Gyasi-Agyei, Y.; et al. Global evaluation of heavy metal content in surface water bodies: A meta-analysis using heavy metal pollution indices and multivariate statistical analyses. Chemosphere 2019, 236, 124364.
  3. Ribolzi, O.; Valles, V.; Gomez, L.; Voltz, M. Speciation and origin of particulate copper in runoff water from a Mediterranean vineyard catchment. Environ. Pollut. 2002, 117, 261–271.
  4. Serpa, D.; Nunes, J.P.; Keizer, J.J.; Abrantes, N. Impacts of climate and land use changes on the water quality of a small Mediterranean catchment with intensive viticulture. Environ. Pollut. Barking Essex. 2017.
  5. García-Díaz, A.; Bienes, R.; Sastre, B.; Novara, A.; Gristina, L.; Cerdà, A. Nitrogen losses in vineyards under different types of soil groundcover. A field runoff simulator approach in central Spain. Agric. Ecosyst. Environ. 2017, 236, 256–267.
  6. Atucha, A.; Merwin, I.A.; Brown, M.G.; Gardiazabal, F.; Mena, F.; Adriazola, C.; Lehmann, J. Soil erosion, runoff and nutrient losses in an avocado (Persea americana Mill) hillside orchard under different groundcover management systems. Plant Soil 2013, 368, 393–406.
  7. Khaledian, Y.; Kiani, F.; Ebrahimi, S.; Brevik, E.C.; Aitkenhead-Peterson, J. Assessment and monitoring of soil degradation during land use change using multivariate analysis. Land Degrad. Dev. 2016, 28, 128–141.
  8. Sulieman, M.; Saeed, I.; Hassaballa, A.; Rodrigo-Comino, J. Modeling cation exchange capacity in multi geochronological-derived alluvium soils: An approach based on soil depth intervals. CATENA 2018, 167, 327–339.
  9. Ritzema, H.P.; Braun, H.M.H. Environmental aspects of drainage. In Drainage Principles and Applications; Ritzema, H.P., Ed.; International Institute for Land Reclamation and Improvement (ILRI): Wageningen, The Netherlands, 1994; Volume 16, pp. 1041–1065.
  10. Keesstra, S.; Nunes, J.P.; Novara, A.; Finger, D.; Avelar, D.; Kalantari, Z.; Cerdà, A. The superior effect of nature based solutions in land management for enhancing ecosystem services. Sci. Total Environ. 2018, 610–611, 997–1009.
  11. D’Angelo, E.; Crutchfield, J.; Vandiviere, M. Rapid, sensitive microscale determination of phosphate in water and soil. J. Environ. Qual. 2001, 30, 2206–2209.
  12. Madramootoo, C.A.; Johnston, W.R.; Willardson, L.S. Management of Agricultural Drainage Water Quality; Water Reports 13; International Commission on Irrigation and Drainage (ICID): New Delhi, India; Food and Agriculture Organization of the United Nations (FAO): Rome, Italy, 1997.
  13. Bergström, L.; Kirchmann, H.; Djodjic, F.; Kyllmar, K.; Ulén, B.; Liu, J.; Andersson, H.; Aronsson, H.; Börjesson, G.; Kynkänniemi, P.; et al. Turnover and losses of phosphorus in Swedish agricultural soils: Long-Term changes, leaching trends and mitigation measures. J. Environ. Qual. 2015, 44, 512–523.
  14. Musavi, R.; Sepehr, E. Phosphorus efficiency of some barley genotypes in the presence of phosphate-solubilizing microorganisms. EJGCTS 2013, 4, 27–40. (In Persian)
  15. Hosseini, M.; Rajabi Agereh, S.; Khaledian, Y.; Jafarzadeh Zoghalchali, H.; Brevik, E.C.; Movahedi Naeini, S.A.R. Comparison of multiple statistical techniques to predict soil phosphorus. Appl. Soil Ecol. 2017, 114, 123–131.
  16. Sharpley, A. Identifying sites vulnerable to soil phosphorus loss on agricultural runoff. J. Environ. Qual. 1995, 24, 947–951.
  17. Daré, J.K.; Silva, C.F.; Freitas, M.P. Revealing chemophoric sites in organophosphorus insecticides through the MIA-QSPR modeling of soil sorption data. Ecotoxicol. Environ. Saf. 2017, 144, 560–563.
  18. Cox, F.R. Predicting increases in extractable phosphorus from fertilizing soils of varying clay content. Soil Sci. Soc. Am. J. 1994, 58, 1249–1253.
  19. Freeman, J.S.; Rowell, D.L. The adsorption and precipitation of phosphate on to calcite. Eur. J. Soil Sci. 1981, 32, 75–84.
  20. Mohebbi Sadegh, M.J. Investigation of relationships between available phosphorus, potassium and some soil properties in agricultural lands of Varamin—Iran. Int. J. Agric. Biosci. 2014, 3, 7–12.
  21. Neil, L.L.; McCullough, C.D.; Lund, M.A.; Evans, L.H.; Tsvetnenko, Y. Toxicity of acid mine pit lake water remediated with limestone and phosphorus. Ecotoxicol. Environ. Saf. 2009, 72, 2046–2057.
  22. Rocha, G.S.; Lombardi, A.T.; Melão, M.d.G.G. Influence of phosphorus on copper toxicity to Selenastrum gracile (Reinsch) Korshikov. Ecotoxicol. Environ. Saf. 2016, 128, 30–35.
  23. Nour, M.H.; Smith, D.W.; Gamal El-Din, M.; Prepas, E.E. The application of artificial neural networks to flow and phosphorus dynamics in small streams on the Boreal Plain, with emphasis on the role of wetlands. Ecol. Model. 2006, 191, 19–32.
  24. Keshavarzi, A.; Sarmadian, F.; Omran, E.E.; Iqbal, M. A neural network model for estimating soil phosphorus using terrain analysis. Egypt. J. Remote Sens. Space Sci. 2015, 18, 127–135.
  25. Shiri, J.; Keshavarzi, A.; Kisi, O.; Iturraran-Viveros, U.; Bagherzadeh, A.; Mousavi, R.; Karimi, S. Modeling soil cation exchange capacity using soil parameters: Assessing the heuristic models. Comput. Electron. Agric. 2017, 135, 242–251.
  26. Shiri, J.; Keshavarzi, A.; Kisi, O.; Karimi, S.; Iturraran-Viveros, U. Modeling soil bulk density through a complete data scanning procedure: Heuristic alternatives. J. Hydrol. 2017, 549, 592–602.
  27. Shiri, J.; Kisi, O.; Yoon, H.; Lee, K.K.; Nazemi, A.H. Predicting groundwater level fluctuations with meteorological effect implications: A comparative study among soft computing techniques. Comput. Geosci. 2013, 56, 32–44.
  28. Karimi, S.; Kisi, O.; Kim, S.; Nazemi, A.H.; Shiri, J. Modelling daily reference evapotranspiration in humid locations of South Korea using local and cross-station data management scenarios. Int. J. Climatol. 2017, 37, 3238–3246.
  29. Kisi, O.; Hosseinzadeh Dalir, A.; Cimen, M.; Shiri, J. Suspended sediment modeling using genetic programming and soft computing techniques. J. Hydrol. 2012, 450–451, 48–58.
  30. Landeras, G.; Lopez, J.J.; Kisi, O.; Shiri, J. Comparison of gene expression programming with neuro-fuzzy and neural network computing techniques in estimating daily incoming solar radiation in the Basque Country (Northern Spain). Energy Convers. Manag. 2012, 62, 1–13.
  31. Kisi, O.; Keshavarzi, A.; Shiri, J.; Zounemat-Kermani, M.; Omran, E.E. Groundwater quality modeling using neuro-particle swarm optimization and neuro-differential evolution techniques. Hydrol. Res. 2017, 48, 1508–1519.
  32. Coulibaly, P.; Burn, H.D. Wavelet analysis of variability in annual Canadian streamflows. Water Resour. Res. 2004, 40, W03105.
  33. Partal, T.; Kucuk, M. Long-Term trend analysis using discrete wavelet components of annual precipitations measurements in Marmara region (Turkey). Phys. Chem. Earth 2006, 31, 1189–1200.
  34. Shiri, J.; Kisi, O. Estimation of daily suspended sediment load by using wavelet conjunction models. J. Hydrol. Eng. 2012, 17, 986–1000.
  35. Tamaddun, K.A.; Kalra, A.; Ahmad, S. Wavelet analyses of western US streamflow with ENSO and PDO. J. Water Clim. Chang. 2017, 8, 26–39.
  36. Wang, W.; Ding, J. Wavelet network model and its application to the prediction of the hydrology. Nat. Sci. 2003, 1, 67–71.
  37. Partal, T.; Kisi, O. Wavelet and neuro-fuzzy conjunction model for precipitation forecasting. J. Hydrol. 2007, 342, 199–212.
  38. Kisi, O.; Shiri, J. Wavelet and neuro-fuzzy conjunction model for predicting water table depth fluctuations. Hydrol. Res. 2012, 43, 286–300.
  39. Lee, Y.Y.; Liew, K.M. Detection of damage locations in a beam using the wavelet analysis. Int. J. Struct. Stab. Dyn. 2001, 1, 455–460.
  40. Lam, H.F.; Lee, Y.Y.; Sun, H.Y.; Cheng, G.F.; Guo, X. Application of the spatial wavelet transform and Bayesian approach to the crack detection of a partially obstructed beam. Thin Walled Struct. 2005, 43, 1–21.
  41. Zheng, T.; Girgis, A.A.; Makram, E.B. A hybrid wavelet kalman filter method for load forecasting. Electr. Pow. Syst. Res. 2000, 54, 11–17.
  42. Zhou, H.; Wu, L.; Guo, Y. Mid and longterm hydrologic forecasting for drainage are based on WNN and FRM. In ISDA: Sixth International Conference on Intelligent Systems Design and Applications; International Swaps and Derivatives Association (ISDA): New York, NY, USA, 2006; pp. 7–12.
  43. Shiri, J.; Kisi, O. Short-Term and long-term streamflow forecasting using a wavelet and neuro-fuzzy conjunction model. J. Hydrol. 2010, 394, 486–493.
  44. Feng, Q.; Wen, X.H.; Li, J.G. Wavelet analysis-support vector machine coupled models for monthly rainfall forecasting in arid regions. Water Resour. Manag. 2015, 29, 1049–1065.
  45. Mansouri Daneshvar, M.R.; Bagherzadeh, A.; Alijani, B. Application of multivariate approach in agrometeorological suitability zonation at northeast semiarid plains of Iran. Theor. Appl. Climatol. 2013, 114, 139–152.
  46. Bhunia, S.G.; Keshavarzi, A.; Shit, P.K.; Omran, E.; Bagherzadeh, A. Evaluation of groundwater quality and its suitability for drinking and irrigation using GIS and geo-statistics techniques in semiarid region of Neyshabur, Iran. Appl. Water Sci. 2018, 8, 168.
  47. Bagherzadeh, A.; Ghadiri, E.; Souhani Darban, A.R.; Gholizadeh, A. Land suitability modeling by parametric-based neural networks and fuzzy methods for soybean production in a semi-arid region. Model. Earth Syst. Environ. 2016, 2, 104.
  48. Soil Survey Staff. Keys to Soil Taxonomy, 12th ed.; USDA-Natural Resources Conservation Service: Washington, DC, USA, 2014.
  49. Bagherzadeh, H.R.; Bagherzadeh, A.; Moeinrad, H. Analysis of parametric approaches in qualitative land suitability evaluation for irrigated wheat (Triticum aestivum L.) cultivation at Neyshabur plain. Agroecology 2012, 4, 121–130.
  50. Olsen, S.R.; Cole, C.V.; Watanabe, F.S.; Dean, L.A. Estimation of Available Phosphorus in Soils by Extraction with Sodium Bicarbonate; U.S. Government Printing Office: Washington, DC, USA, 1954.
  51. Nelson, D.W.; Sommers, L.P. Total carbon, organic carbon and organic matter. In Methods of Soil Analysis: Part 2; Agronomy Handbook 9; Page, A.L., Ed.; American Society of Agronomy: Madison, WI, USA; Soil Science Society of America: Madison, WI, USA, 1986; pp. 539–579.
  52. Gee, G.W.; Bauder, J.W. Particle size analysis. In Methods of Soil Analysis: Part 1; Agronomy Handbook 9; Klute, A., Ed.; American Society of Agronomy: Madison, WI, USA; Soil Science Society of America: Madison, WI, USA, 1986; pp. 383–411.
  53. Schoeneberger, P.J.; Wysocki, D.A.; Benham, E.C.; Broderson, W.D. Field Book for Describing and Sampling Soils; Version 2.0; NRCS-National Soil Survey Center: Lincoln, NE, USA, 2012.
  54. Thomas, G.W. Soil pH and soil acidity. In Methods of Soil Analysis: Part 2; Agronomy Handbook 9; Page, A.L., Ed.; American Society of Agronomy: Madison, WI, USA; Soil Science Society of America: Madison, WI, USA, 1996; pp. 475–490.
  55. Nelson, R.E. Carbonate and gypsum. In Methods of Soil Analysis: Part 1; Agronomy Handbook 9; Page, A.L., Ed.; American Society of Agronomy: Madison, WI, USA; Soil Science Society of America: Madison, WI, USA, 1982; pp. 181–197.
  56. Keshavarzi, A.; Omran Omran, E.E.; Bateni Sayed, M.; Pradhan, B.; Vasu, D.; Bagherzadeh, A. Modeling of available soil phosphorus (ASP) using multi-objective group method of data handling. Model. Earth Syst. Environ. 2016, 2, 157.
  57. Fard, M.M.; Harchagani, H.B. Comparison of artificial neural network and regression pedotransfer functions models for prediction of soil cation exchange capacity in Chaharmahal-e-Bakhtiari province. J. Soil Water Conserv. 2009, 23, 90–99.
  58. Stutter, M.I. The composition, leaching, and sorption behavior of some alternative sources of phosphorus for soils. AMBIO 2015, 44, 207–216.
  59. Sharifi, S.S.; Rezaverdinejada, R.; Nourani, V. Estimation of daily global solar radiation using wavelet regression, ANN, GEP and empirical models: A comparative study of selected temperature based approaches. J. Atmos. Sol. Terr. Phys. 2016, 149, 131–145.
  60. Kaboudan, M. Extended daily exchange rates forecasts using wavelet temporal resolutions. New Math. Nat. Comput. 2005, 1, 79–107.
  61. Mallat, S.G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693.
  62. Kim, T.W.; Valdes, J.B. Nonlinear model for drought forecasting based on a conjunction of wavelet transforms and neural networks. J. Hydrol. Eng. 2003, 6, 319–328.
  63. Shiri, J.; Kisi, O. Application of artificial intelligence to estimate daily pan evaporation using available and estimated climatic data in the Khozestan Province (South-Western Iran). J. Irrig. Drain. Eng. 2011, 137, 412–425.
  64. Ferreira, C. Gene expression programming: A new adaptive algorithm for solving problems. Complex Syst. 2001, 13, 87–129.
  65. Landeras, G.; Bekoe, E.; Ampofo, J.; Logah, F.; Diop, M.; Cisse, M.; Shiri, J. New alternatives for reference evapotranspiration estimation in West Africa using limited weather data and ancillary data supply strategies. Theor. Appl. Climatol. 2018, 132, 701–716.
  66. Shiri, J. Evaluation of FAO56-PM, empirical, semi-empirical and gene expression programming approaches for estimating daily reference evapotranspiration in hyper-arid regions of Iran. Agric. Water Manag. 2017, 188, 101–114.
  67. Ferreira, C. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 2006; 478p.
  68. Friedman, J.H. Multivariate adaptive regression splines. Ann. Stat. 1991, 19, 1.
  69. Fan, J.; Wu, L.; Zhang, F.; Cai, H.; Zeng, W.; Wang, X.; Zou, H. Empirical and machine learning models for predicting daily global solar radiation from sunshine duration: A review and case study in China. Renew. Sustain. Energy Rev. 2019, 100, 186–212.
  70. Andres, J.D.; Lorca, P.; de Cos Juez, F.J.; Sánchez-Lasheras, F. Bankruptcy forecasting: A hybrid approach using Fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS). Expert Syst. Appl. 2010, 38, 1866–1875.
  71. Zhang, W.G.; Goh, A.T.C. Multivariate adaptive regression splines for analysis of geotechnical engineering systems. Comput. Geotech. 2013, 48, 82–95.
  72. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  73. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi, Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13.
  74. Hastie, T.; Tibshirani, R.; Friedman, J. Random forests. In The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY, USA, 2009; pp. 587–604.
  75. Vapnik, V.; Golowich, S.; Smola, A.J. Support vector method for function approximation, regression estimation, and signal processing. In Advances in Neural Information Processing Systems 9; Mozer, M., Jordan, M., Petsche, T., Eds.; MIT Press: Boston, MA, USA, 1997; pp. 281–287.
  76. Gunn, S.R. Support Vector Machines for Classification and Regression; Technical Report; University of Southampton: Southampton, UK, 1998.
  77. Shiri, J.; Nazemi, A.H.; Sadraddini, A.A.; Landeras, G.; Kisi, O.; Fakheri Fard, A.; Marti, P. Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput. Electron. Agric. 2014, 108, 230–241.
  78. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice-Hall: Upper Saddle River, NJ, USA, 1999; 842p.
  79. Singh, B.; Gilkes, R.J. Phosphorus sorption in relation to soil properties for the major soil types of south-western Australia. Aust. J. Soil Res. 1991, 29, 603–618.
  80. Freese, D.; Van der Zee, S.; Van Riemsdijk, W.H. Comparison of different models for phosphate sorption as a function of the iron and aluminum oxides in soils. Eur. J. Soil Sci. 1992, 43, 729–738.
  81. Frossard, E.; Brossard, M.; Hedley, M.J.; Metherell, A. Reactions controlling the cycling of P in soils. In Phosphorus Cycling in Terrestrial and Aquatic Ecosystems: A Global Perspective; Tiessen, H., Ed.; SCOPE/John Wiley: New York, NY, USA, 1995; pp. 107–137.
  82. Dubus, I.G.; Becquer, T. Phosphorus sorption and desorption in oxide-rich Ferralsols of New Caledonia. Aust. J. Soil Res. 2001, 39, 403–414.
  83. Demaria, P.; Sinaj, S.; Flisch, R.; Frossard, E. Soil properties and phosphorus isotopic exchangeability in cropped temperate soils. Commun. Soil Sci. Plant Anal. 2013, 44, 287–300.
  84. Yousef, B.B.; Akiri, B. Sodium bicarbonate extraction to estimate nitrogen, phosphorus, and potassium availability in soils. Soil Sci. Soc. Am. J. 1978, 42, 319–323.
  85. Nayak, P.C.; Venkatesh, B.; Krishna, B.; Jain, S.K. Rainfall-Runoff modeling using conceptual, data driven, and wavelet based computing approach. J. Hydrol. 2013, 493, 57–67.
  86. Liu, Z.Y.; Zhou, P.; Chen, G.; Guo, L.D. Evaluating a coupled discrete wavelet transform and support vector regression for daily and monthly streamflow forecasting. J. Hydrol. 2014, 519, 2822–2831.
  87. Shoaib, M.; Shamseldin, A.Y.; Melville, B.W. Comparative study of different wavelet based neural network models for rainfall-runoff modeling. J. Hydrol. 2014, 515, 47–58.
  88. Raheem Lahmod, N.; Talib Alkooranee, J.; Gatea Alshammary, A.A.; Rodrigo-Comino, J. Effect of wheat straw as a cover crop on the chlorophyll, seed, and oilseed yield of Trigonella foenum graecum L under water deficiency and weed competition. Plants 2019, 8, 503.
  89. Rodrigo-Comino, J.; Giménez-Morera, A.; Panagos, P.; Pourghasemi, H.R.; Pulido, M.; Cerdà, A. The potential of straw mulch as a nature-based solution for soil erosion in olive plantation treated with glyphosate: A biophysical and socioeconomic assessment. Land Degrad. Dev. 2019.
  90. Reijneveld, J.A.; Ehlert, P.A.I.; Termorshuizen, A.J.; Oenema, O. Changes in the soil phosphorus status of agricultural land in The Netherlands during the 20th century. Soil Use Manag. 2010, 26, 399–411.
  91. Yazdanbakhsh, A.; Alavi, S.N.; Valadabadi, S.A.; Karimi, F.; Karimi, Z. Heavy metals uptake of salty soils by ornamental sunflower, using cow manure and biosolids: A case study in Alborz city, Iran. Air Soil Water Res. 2020, 13.
  92. Gessler, P.E.; Chadwick, O.A.; Chamran, R.; Althouse, L.; Holmes, K. Modeling soil-landscape and ecosystem properties using terrain attributes. Soil Sci. Soc. Am. J. 2000, 64, 2046–2056.
  93. Kozar, B.; Lawrence, R.; Long, D.S. Soil phosphorus and potassium mapping using a spatial correlation model incorporating terrain slope gradient. Precis. Agric. 2002, 3, 407–417.
  94. Anderson, D.W. The effect of parent material and soil development on nutrient cycling in temperate ecosystems. Biogeochemistry 1988, 5, 71–97.
  95. Marti, P.; Shiri, J.; Duran-Ros, M.; Arbat, G.; Cartagena, F.R.; Puig-Bargues, J. Artificial neural networks vs. gene expressions programming for estimating outlet dissolved oxygen in micro irrigation sand filters fed with effluents. Comput. Electron. Agric. 2013, 99, 176–185.
Figure 1. Location of the study area and sampling points.
Figure 2. The land units/land use maps in the study region (obtained from [49]).
Figure 3. Decomposed wavelet subcomponents of soil pH data.
Figure 4. Decomposed wavelet subcomponents of soil organic carbon (OC) data.
Figure 5. Decomposed wavelet subcomponents of clay content data.
Figure 6. Structure of the wavelet–gene expression programming (WGEP) model.
Figure 7. Split-up of the test-stage statistics (SI and NS) of the applied models.
Figure 8. Measured vs. simulated soil P magnitudes of the applied techniques (for all available patterns): (a) WGEP-based soil P; (b) WMARS-based soil P; (c) WRF-based soil P; (d) WSVM-based soil P; (e) WNN-based soil P.
Table 1. Main statistical descriptors of the used soil data set.
Statistic | Clay (%) | Silt (%) | Sand (%) | OC (%) | pH (−) | P (ppm)
Maximum | 37 | 58.4 | 70.0 | 2.22 | 8.3 | 70.4
Minimum | 10 | 16.4 | 19.0 | 0.17 | 7.5 | 2.4
Mean | 22.6 | 36.3 | 41.1 | 0.73 | 7.9 | 18.7
Standard deviation | 5.9 | 6.1 | 9.3 | 0.33 | 0.2 | 16.2
Coefficient of variation | 0.3 | 0.2 | 0.2 | 0.45 | 0.02 | 0.9
Skewness | −0.193 | 0.086 | 0.403 | 1.788 | −0.054 | 1.363
Kurtosis | −0.722 | 0.330 | 0.094 | 4.244 | −0.523 | 0.987
* OC: organic carbon; P: soil phosphorus values.
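The descriptors in Table 1 can be computed from the raw sample values roughly as sketched below. The sketch makes four assumptions: the standard deviation is the sample (n − 1) form, CV = SD / mean, skewness is moment-based, and kurtosis is excess kurtosis (0 for a normal distribution). The negative kurtosis values in the table are consistent with that last choice. Statistical packages differ in their bias corrections, so small differences from the published values are possible. `describe` is a hypothetical helper, not the authors' code.

```python
from statistics import mean, stdev


def describe(values):
    """Hypothetical helper reproducing the Table 1 descriptors for one variable."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    # Population central moments, used for the shape statistics (assumption).
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return {
        "Maximum": max(values),
        "Minimum": min(values),
        "Mean": m,
        "Standard deviation": sd,
        "Coefficient of variation": sd / m,
        "Skewness": m3 / m2 ** 1.5,
        "Kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis: normal distribution -> 0
    }
```

For example, `describe([1, 2, 3, 4, 5])` gives a mean of 3, a skewness of 0, and a kurtosis of about −1.3, as expected for a flat, symmetric sample.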
Table 2. The correlation coefficient between the subcomponents of input parameters and soil P values.
| Input | A | D1 | D2 | D3 |
|---|---|---|---|---|
| pH | 0.090 | −0.092 | −0.182 | −0.099 |
| OC | 0.137 | 0.215 | 0.178 | 0.128 |
| Clay | −0.204 | −0.082 | −0.054 | −0.107 |

A: approximation subcomponent; D1, D2, and D3: the three detail subcomponents of each input.
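Table 2 reports the correlation coefficient between each wavelet subcomponent and soil P. The sketch below assumes the ordinary Pearson coefficient and shows how such a screening table can be built. `pearson_r` and `rank_subcomponents` are hypothetical helpers. Ordering subcomponents by |r| is only an illustration, not the authors' input-selection rule.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def rank_subcomponents(subcomponents, target):
    """Hypothetical helper: (name, r) pairs sorted by |r|, strongest first."""
    scores = {name: pearson_r(series, target) for name, series in subcomponents.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Applied to the values in Table 2, such a ranking would place OC-D1 (0.215) and Clay-A (−0.204) first. This illustrates that the sign of r does not matter for the strength of the association.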
Table 3. Error statistics of the applied models.
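| Statistic | GEP | MARS | RF | SVM | NN | WGEP | WMARS | WRF | WSVM | WNN |
|---|---|---|---|---|---|---|---|---|---|---|
| SI | 0.564 | 0.612 | 0.600 | 0.689 | 0.704 | 0.127 | 0.194 | 0.181 | 0.213 | 0.256 |
| R | 0.678 | 0.562 | 0.587 | 0.432 | 0.412 | 0.990 | 0.975 | 0.978 | 0.970 | 0.960 |
| NS | 0.600 | 0.562 | 0.578 | 0.502 | 0.498 | 0.978 | 0.949 | 0.956 | 0.938 | 0.911 |

t-test results

| Statistic | WGEP | WMARS | WRF | WSVM | WNN |
|---|---|---|---|---|---|
| t-Statistic | −0.281 | 0.258 | −0.260 | 0.614 | 0.628 |
| Resultant significance level | 0.901 | 0.880 | 0.889 | 0.821 | 0.791 |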
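Table 3 compares the models by SI, R, and NS. The sketch below computes these three criteria from paired observed (O) and simulated (S) values. It assumes the usual definitions: SI is the RMSE divided by the mean observed value, R is the Pearson correlation, and NS = 1 − Σ(O − S)² / Σ(O − Ō)². The formulas are standard but are not restated in this section, so treat them as assumptions. `evaluate` is a hypothetical helper.

```python
import math


def evaluate(observed, simulated):
    """Hypothetical helper: scatter index, correlation and Nash-Sutcliffe efficiency."""
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need paired series of equal length >= 2")
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    rmse = math.sqrt(sse / n)
    sxy = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((s - mean_sim) ** 2 for s in simulated)
    return {
        "SI": rmse / mean_obs,              # assumed: RMSE normalised by observed mean
        "R": sxy / math.sqrt(sxx * syy),    # Pearson correlation
        "NS": 1.0 - sse / sxx,              # Nash-Sutcliffe efficiency
    }
```

A constant offset shows why R and NS are reported together. For observed values [2, 4, 6] and simulated values [3, 5, 7], R is 1, while NS drops to 0.625 and SI is 0.25.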

Share and Cite

MDPI and ACS Style

Shiri, J.; Keshavarzi, A.; Kisi, O.; Karimi, S.M.; Karimi, S.; Nazemi, A.H.; Rodrigo-Comino, J. Estimating Soil Available Phosphorus Content through Coupled Wavelet–Data-Driven Models. Sustainability 2020, 12, 2150. https://0-doi-org.brum.beds.ac.uk/10.3390/su12052150

AMA Style

Shiri J, Keshavarzi A, Kisi O, Karimi SM, Karimi S, Nazemi AH, Rodrigo-Comino J. Estimating Soil Available Phosphorus Content through Coupled Wavelet–Data-Driven Models. Sustainability. 2020; 12(5):2150. https://0-doi-org.brum.beds.ac.uk/10.3390/su12052150

Chicago/Turabian Style

Shiri, Jalal, Ali Keshavarzi, Ozgur Kisi, Sahar Mohsenzadeh Karimi, Sepideh Karimi, Amir Hossein Nazemi, and Jesús Rodrigo-Comino. 2020. "Estimating Soil Available Phosphorus Content through Coupled Wavelet–Data-Driven Models" Sustainability 12, no. 5: 2150. https://0-doi-org.brum.beds.ac.uk/10.3390/su12052150

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
