1. Introduction
Saline-alkaline soil is one of the main land degradation threats that affect soil fertility, stability, and biodiversity [
1]. The accumulation of high levels of sodium salts relative to other exchangeable cations is the main attribute of sodic soils [
2]. Most saline-alkaline soils are distributed in arid and semi-arid climatic regions. Evaporation and deposition increase the salt concentration in the root zone of vegetation [
2], leading to changes in the chemical and biological functions of the soil [
3,
4]. On the one hand, excessive soil alkalinity adversely affects the infiltration capacity of the soil [5], increases its susceptibility to water and wind erosion [6], and leads to greater decomposition of soil organic matter [
7]. On the other hand, excessively high soil salinity can disrupt soil respiration, the nitrogen cycle, and the decomposition function of soil microorganisms [
7,
8]. In addition, salinity stress directly affects vegetation growth by reducing plant water uptake (osmotic stress) and/or by deteriorating the transpiring leaves (specific ion effects) [
9], which reduces organic input to the soil and can ultimately lead to desertification [
10]. Extreme environmental conditions, the dispersion of saline dust [
11,
12], poverty, migration, and the high costs of soil reclamation are some of the long-term socioeconomic consequences of soil salinization.
The radiation on the surface of the earth is partially polarized [
13,
14]. The properties of light include intensity and polarization. Polarized reflectance (Rp) refers to the polarized part of the reflected light [
15]. Polarized light can usually be obtained through a polarizer. Rp carries a large amount of surface information and characterizes the optical properties of the earth’s surface [16,17]. In addition, it can be used as a boundary condition when obtaining the optical properties of aerosols [
18,
19,
20]. According to previous reports, Rp follows an anisotropic distribution pattern [
21,
22,
23]. The angular distribution of Rp can be characterized by a bidirectional polarized distribution function (BPDF). Therefore, the BPDF model is very important for the estimation of Rp. Thus far, a significant amount of research has been carried out on BPDF [
17,
22,
24,
25]. These studies pertained to vegetation [
13,
17,
26,
27,
28,
29,
30], soil [
31,
32], ice and snow [
16,
23], urban surface [
33], and other basic elements such as smoke and dust [
16,
34] and man-made targets [
14]. Over the past three decades, several BPDF models have been proposed, based on various measurements and results [
35]. The BPDF models can be broadly categorized as (a) physical models based on simplified radiative transfer equations [
27,
31,
36] and Monte Carlo simulation [
27], and (b) semi-empirical models based on the combination of physical models and free parameters that are empirically parameterized [
19,
22,
33,
37]. Six semi-empirical BPDF models have been developed to parameterize the polarized reflectance of land surfaces. Using spaceborne Polarization and Directionality of the Earth’s Reflectances (POLDER) measurements, three semi-empirical BPDF models, i.e., the Nadal and Bréon model [
21], the Maignan model [
22], and the Xie–Cheng model [
33] were proposed for 11 surface types, 14 International Geosphere-Biosphere Programme (IGBP) classes, and urban areas, respectively. At the airborne level, Waquet et al. proposed a scaled Fresnel model that accounts for the mutual shadowing of facets, using MICROPOL measurements over forests, cropped surfaces, and urban areas [38]. Litvinov et al. explored a three-parameter BPDF model using Research Scanning Polarimeter (RSP) measurements over vegetation and soil surfaces [39]. For ground-level measurements, Diner et al. developed a semi-empirical BPDF model for grass surfaces using a ground-based multi-angle spectropolarimetric imager (Ground-MSPI) [
37].
With the development of machine learning and deep learning methods, many algorithms have been proposed, such as the generalized regression neural network (GRNN) [
40], K–nearest neighbor (KNN) algorithm [
41], support vector regression (SVR) [
42], random forest (RF) [
43], and deep neural networks (DNN). These methods have been widely used in other fields of remote sensing, such as for classification and change detection [
44,
45], and in the investigation of the biophysical and biochemical characteristics of vegetation [
43,
46,
47]. Therefore, it is possible to develop BPDF models based on machine learning using these popular algorithms. Such work, though relatively rare, is very important for improving the accuracy of inversion and enhancing the study of the polarized light characteristics of saline-alkaline soil using deep learning.
There are two main purposes of this research. The first is to propose a new method, based on the principles of deep learning, that is suitable for estimating the polarized reflectance of the saline-alkaline soil surface. The second is to compare this new method with the existing semi-empirical BPDF models and machine learning methods, in order to establish which method is most suitable for predicting the polarized reflectance of a saline-alkaline soil surface. The hypothesis of this study is that the polarized reflectance of saline-alkaline soil is similar in some ways to that of an ordinary soil surface, and that the principles of deep learning can be applied to this problem. In this research, the polarized reflectance of saline-alkaline soil is measured under laboratory conditions. After the data are sorted, they are processed with deep learning methods, and the most suitable deep learning model is explored by varying the corresponding parameters. In addition, using the laboratory data, the results of several existing semi-empirical BPDF models and machine learning methods are compared.
5. Discussion
The discussion is divided into three parts. The first and second parts discuss the parameters of the machine learning methods and try to find the best method. Compared with the BPDF model based on machine learning, the semi-empirical BPDF model is more mature, owing to years of research, whereas few results are available for the BPDF model based on machine learning. In addition, many factors affect the final results of machine learning, for example, data quality, the ratio of the training set to the test set, and the learning rate. The choice of these factors largely depends on past experience. Because limited reference research is available for the DNN-BPDF model, this section further discusses the training set size and learning rate.
5.1. Influence of the Training Ratio on the Fitting Effect
According to the principles of machine learning, the original dataset needs to be divided into two parts: a training set and a test set. The training set is used to train the model so that the model fits the problem to be studied. The test set is used to evaluate the fitness of the model [
52]. The size of the training set has an impact on the training results. If too little data is used for training, the model is not fully trained, and it is difficult to achieve the desired effect; if too much data is used for training, the test set will be too small to evaluate the fit reliably, which also makes the final result unsatisfactory. In addition, the best training ratio varies with the research problem, and it is usually chosen based on experience. Therefore, it is necessary to discuss the training ratios.
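The division into a training set and a test set can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure used in this study: the function name `split_train_test`, the fixed random seed, and the placeholder dataset of 100 sample indices are all assumptions made for the example.

```python
import random

def split_train_test(samples, train_ratio, seed=0):
    """Shuffle the samples and split them into a training set and a test set.

    train_ratio is the fraction of samples assigned to the training set.
    The seed is fixed only so that the illustration is reproducible.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical dataset: 100 placeholder indices standing in for measurements.
samples = list(range(100))
for ratio in (0.1, 0.5, 0.7, 0.8):
    train, test = split_train_test(samples, ratio)
    print(f"train ratio {ratio:.0%}: {len(train)} training, {len(test)} test samples")
```

A small training ratio leaves few samples to fit the model, while a large one leaves few samples to evaluate it, which is the trade-off described above.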
To illustrate the impact of the training and test data ratio on the final result, we selected training data ratios of 10%, 20%, 30%, 40%, 50%, 60%, 70%, and 80%. For each ratio, the RMSE, Cor, and R2 values, which describe the degree of fit of the machine learning BPDF models, were calculated at the two bands of 670 nm and 865 nm. The results are shown in Table 5 and Table 6. When the proportion of the training set is particularly small, the training results may be unsatisfactory, due to the small number of samples. This study therefore uses oversampling to alleviate the problem of insufficient samples [68]. The results shown in Table 5 and Table 6 were obtained with this method.
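The three statistics and the oversampling step can be sketched as below. This is a hedged illustration rather than the implementation used in this study. It assumes that Cor denotes the Pearson correlation coefficient and that R2 denotes the coefficient of determination, and it uses plain random duplication as the oversampling scheme, since the text does not specify which variant was applied. All function names and the example values are hypothetical.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_cor(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of 'Cor').

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed meaning of 'R2')."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def oversample(samples, target_size, seed=0):
    """Random oversampling: keep every original sample and append randomly
    chosen duplicates until the set reaches target_size."""
    rng = random.Random(seed)
    n_extra = max(0, target_size - len(samples))
    return list(samples) + [rng.choice(samples) for _ in range(n_extra)]

# Hypothetical polarized reflectance values, for illustration only.
measured = [0.010, 0.020, 0.030, 0.040]
predicted = [0.012, 0.018, 0.033, 0.039]
print("RMSE:", rmse(measured, predicted))
print("Cor:", pearson_cor(measured, predicted))
print("R2:", r_squared(measured, predicted))
print("oversampled size:", len(oversample(measured, 10)))
```

Duplicating samples enlarges a small training set without adding new information, so it mitigates, but does not remove, the effect of a very low training ratio.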
The above two tables show the influence of different grouping ratios on the fitting effect. The numbers in bold type indicate the best value of each statistic. The optimal results in the 670 nm band are concentrated at training set proportions of 60–80%, and the optimal results in the 865 nm band are concentrated at training set proportions of 50–80%. Therefore, the grouping ratio adopted in this experiment, that is, a 70% training set and a 30% test set, has a relatively good fitting effect. This ratio does not necessarily yield the optimal RMSE, Cor, and R2 simultaneously, and other ratios may give better values for individual statistics. In addition, as mentioned in the previous section, the Cor value of the fit of the machine learning methods in the 865 nm band is smaller than that of the semi-empirical model. After changing the training set ratio, the machine learning methods can obtain higher Cor values: the Cor value of SVR at an 80% training set ratio is 0.9087, that of RF at a 60% training set ratio is 0.9085, and that of DNN at an 80% training set ratio is 0.9276. These Cor values are higher than those obtained with the 70% training set, and all are above 0.90.
5.2. Optimal Learning Rate of BPDF Model Based on DNN Method
In the deep learning method, the learning rate is a very important hyperparameter. It controls how strongly the network weights are adjusted along the gradient of the loss function, which directly affects the quality of the deep learning results. However, the selection of the learning rate value is highly subjective. The optimal learning rate of a neural network depends on the characteristics of the model, the input data, and the research problem.
An extremely small or large learning rate may have an unsatisfactory effect. If the learning rate is too low, the fitting or classification process will converge slowly. On the other hand, if the learning rate is too high, the loss may oscillate or even fail to converge. Therefore, it was particularly important to find a suitable learning rate for the DNN used in this study. Because no previous experience was available as a reference for this experiment, an enumeration method was adopted: many candidate learning rates were listed and the loss was calculated for each, in order to find the best learning rate.
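The enumeration idea can be illustrated with a toy problem. The sketch below does not use the DNN or the data of this study. It assumes a one-parameter quadratic loss, L(w) = 2(w - 3)^2, minimized by plain gradient descent, and a hypothetical list of candidate learning rates. It only shows the qualitative behavior described above: slow convergence for small rates and divergence for large ones. The best rate found for this toy loss has no relation to the optimal rate of the DNN-BPDF model.

```python
def loss(w):
    """Toy quadratic loss with its minimum at w = 3 (an assumption for illustration)."""
    return 2.0 * (w - 3.0) ** 2

def grad(w):
    """Analytical gradient of the toy loss."""
    return 4.0 * (w - 3.0)

def final_loss(learning_rate, steps=20, w0=0.0):
    """Run plain gradient descent, w <- w - learning_rate * grad(w),
    and return the loss after the given number of steps."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return loss(w)

# Hypothetical candidate learning rates to enumerate.
candidate_rates = [0.01, 0.05, 0.1, 0.2, 0.4, 0.6]
losses = {lr: final_loss(lr) for lr in candidate_rates}
best_lr = min(losses, key=losses.get)

for lr in candidate_rates:
    print(f"learning rate {lr}: final loss {losses[lr]:.3e}")
print("best learning rate among candidates:", best_lr)
```

For this toy loss, each step multiplies the distance to the minimum by (1 - 4 × learning rate), so rates above 0.5 make the iterates diverge, whereas very small rates leave the loss far from zero after 20 steps.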
Figure 9a shows the loss results for different learning rates in the 670 nm band, while Figure 9b shows the loss results for different learning rates in the 865 nm band [69]. In the 670 nm band (Figure 9a), the loss decreases significantly as the learning rate increases from 0.2 to 0.5; as the learning rate increases further, the loss rises and oscillates significantly, such that the best learning rate is approximately 0.5. In the 865 nm band, the loss decreases as the learning rate increases from 0.2 to 0.4, and it increases and oscillates at higher learning rates, such that the optimal learning rate is approximately 0.4. These two values are relatively close; hence, the learning rate of the deep learning BPDF model in this experiment could be selected as a value between 0.4 and 0.5.
6. Conclusions
This study explored the application of semi-empirical BPDF models and machine learning-based BPDF models to saline-alkaline soil in the laboratory. The results showed that the six models used in this study, whether semi-empirical or machine learning-based, performed relatively well. However, the machine learning-based BPDF models generally presented better results than the semi-empirical BPDF models. Among these, the deep learning method had the best effect. Therefore, the machine learning methods were discussed further: the influence of different training data ratios and different learning rates on the fit of the polarized reflectance of the saline-alkaline soil surface was examined, and the best training ratio and learning rate were determined. The results indicate that both the proportion of training data and the learning rate have a considerable impact on the fitting results. A training data proportion that is too low or too high will reduce the fitting effect. The best training ratio was between 60% and 70%, with a corresponding test set of 40% to 30%. The best learning rate of the DNN-BPDF model was between 0.4 and 0.5.
In summary, this study explored different types of models for the polarized reflectance of saline-alkaline soils, which is helpful for both remote sensing investigations of saline-alkaline soils and the study of BPDF models.