Shallow Landslide Prediction Using a Novel Hybrid Functional Machine Learning Algorithm

Tien Bui, Dieu; Shahabi, Himan; Omidvar, Ebrahim; Shirzadi, Ataollah; Geertsema, Marten; Clague, John J.; Khosravi, Khabat; Pradhan, Biswajeet; Pham, Binh Thai; Chapi, Kamran; Barati, Zahra; Bin Ahmad, Baharin; Rahmani, Hosein; Gróf, Gyula; Lee, Saro

doi:10.3390/rs11080931

by Dieu Tien Bui ¹,², Himan Shahabi ³,*, Ebrahim Omidvar ⁴, Ataollah Shirzadi ⁵, Marten Geertsema ⁶, John J. Clague ⁷, Khabat Khosravi ⁸, Biswajeet Pradhan ⁹,¹⁰, Binh Thai Pham ¹¹, Kamran Chapi ⁵, Zahra Barati ⁴, Baharin Bin Ahmad ¹², Hosein Rahmani ¹³, Gyula Gróf ¹⁴ and Saro Lee ¹⁵,¹⁶,*

¹ Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam

² Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam

³ Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁴ Department of Rangeland and Watershed Management, Faculty of Natural Resources and Earth Sciences, University of Kashan, Kashan 87317-53153, Iran

⁵ Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁶ British Columbia Ministry of Forests, Lands, Natural Resource Operations and Rural Development, Prince George, BC V2L 1R5, Canada

⁷ Department of Earth Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada

⁸ Department of Watershed Management, Faculty of Natural Resources, Sari Agricultural Sciences and Natural Resources University, Sari, Mazandaran 48181-68984, Iran

⁹ Center for Advanced Modeling and Geospatial System (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, CB11.06.106, Building 11, 81 Broadway, Ultimo, NSW 2007, Australia

¹⁰ Department of Energy and Mineral Resources Engineering, Choongmu-gwan, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea

¹¹ Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam

¹² Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Malaysia

¹³ Department of Computer Science and Engineering and IT, School of Electrical and Computer Engineering, Shiraz University, Shiraz 84334-71964, Iran

¹⁴ Department of Energy Engineering, Budapest University of Technology and Economics, 1111 Budapest, Hungary

¹⁵ Geoscience Platform Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124, Gwahak-ro Yuseong-gu, Daejeon 34132, Korea

¹⁶ Department of Geophysical Exploration, Korea University of Science and Technology, 217 Gajeong-ro Yuseong-gu, Daejeon 34113, Korea

* Authors to whom correspondence should be addressed.

Remote Sens. 2019, 11(8), 931; https://0-doi-org.brum.beds.ac.uk/10.3390/rs11080931

Submission received: 17 March 2019 / Revised: 11 April 2019 / Accepted: 12 April 2019 / Published: 17 April 2019

(This article belongs to the Special Issue Mass Movement and Soil Erosion Monitoring Using Remote Sensing)

Abstract: We used a novel hybrid functional machine learning algorithm to predict the spatial distribution of landslides in the Sarkhoon watershed, Iran. We developed a new ensemble model, named ABSGD, that combines a functional algorithm, stochastic gradient descent (SGD), with an AdaBoost (AB) Meta classifier to predict the landslides. The model incorporates 20 landslide conditioning factors, which we ranked using the least-square support vector machine (LSSVM) technique. For the modeling, we considered 98 landslide locations, of which 70% (79) were used for training and 30% (19) for validation processes. Model validation was performed using sensitivity, specificity, accuracy, the root mean square error (RMSE) and the area under the receiver operating characteristic curve (AUC). We also used soft computing benchmark models, including SGD, logistic regression (LR), logistic model tree (LMT) and functional tree (FT) algorithms, for model validation and comparison. The selected conditioning factors were significant in landslide occurrence, but distance to road was found to be the most important factor. The ABSGD model (AUC = 0.860) outperformed the LR (0.797), SGD (0.776), LMT (0.740) and FT (0.734) models. Our results confirm that the combined use of a functional algorithm and a Meta classifier prevents over-fitting, reduces noise and enhances the prediction power of the individual SGD algorithm for the spatial prediction of landslides.

Keywords: landslide modeling; stochastic gradient descent; AdaBoost; Meta classifier; GIS; Iran

1. Introduction

Landslides are important geohazards that can seriously impact the natural and built environment [1,2,3]. About 66 million people live in landslide-prone areas, with the greatest risk in terms of numbers in Asia [4,5]. Managing this risk involves a multi-step process centered on identification, characterization and prediction of landslides [6]. In this paper, we focus on spatial prediction of landslides, while recognizing that landslide prediction has temporal and magnitude components [7,8].

Spatial predictions of landslides commonly involve the production of landslide susceptibility maps [9]. Such mapping is challenging because it relies on adequate high-quality data [10]. Moreover, there is not yet a globally accepted standard approach, in spite of the numerous techniques that have been proposed and used [11,12]. Yet, over the past several decades, there have been remarkable advances in geographic information system (GIS) and remote sensing tools that have been applied to assess landslide susceptibility, hazards, risks and mapping [13,14,15].

Models for predicting landslide susceptibility can be created using qualitative or quantitative methods [16,17]. Qualitative methods based on landslide inventories and parameter weighting rely on expert judgment, whereas quantitative statistical, probabilistic and deterministic methods are mathematically based. With adequate input data, quantitative methods will generally outperform qualitative methods [18,19].

Many quantitative GIS-based techniques and approaches are being developed and applied to landslide susceptibility mapping (LSM), including weights of evidence (WoE) [20,21], analytic hierarchy processes (AHP) [22,23,24], frequency ratios (FR) [25,26], simple additive weighting (SAW), bivariate statistics (BS) [27,28], statistical index (SI) [29], logistic regression (LR) [19,30,31], weighted linear combinations (WLC) [32,33,34], multivariate adaptive regression splines (MARS) [35,36], Fisher’s linear discriminant function (FLDA) [37], certainty factor (CF) [38], multivariate regression (MR) [39,40], index of entropy (IOE) [41,42], random forest (RF) [43], discriminant analysis (DA) [44], genetic algorithm (GA) [45], generalized additive models (GAMs) [46], Bayesian logistic regression (BLR) [47,48] and evidential belief functions (EBFs) [49]. Among these different approaches, machine learning methods have received much recent attention for landslide prediction [50,51,52,53,54]. Unlike the statistical/probabilistic approaches that assume a relationship between historical records of landslides and several conditioning factors, machine learning methods can efficiently query a large suite of spatially explicit landslide data and extract information directly from the data [51]. A variety of machine learning methods have been used for LSM, including support vector machine algorithms [55,56,57], artificial neural networks [58], neuro-fuzzy techniques [39,59], decision trees [60], naive Bayes [61], radial basis function (RBF) [62,63], Alternate Decision Tree (ADTree) [64], reduced error pruning trees (REPT) [65] and naive Bayes tree [66]. Each model has advantages and disadvantages that depend on the characteristics of the study area, so new approaches remain worth testing and validating.

A recent development that shows considerable promise is the combination of different methods to build hybrid models that can generate more accurate spatial predictions of landslides [67]. Data mining approaches are being combined with other methods, such as ANN-Bayes analysis [68], stepwise weight assessment ratio analysis (SWARA), the adaptive neuro-fuzzy inference system (ANFIS) [69], rough set (RS)-SVM [70], neuro fuzzy inference system optimized by particle swarm optimization (PSOANFIS) [71], ANFIS optimized by shuffled frog leaping algorithm (SFLA) [72], ANFIS with grey wolf optimizer (GWO) and biogeography-based optimization (BBO) [73], random subspace and the naive Bayes tree (RS-NBT) [66], and weights of evidence (WoE) and evidential belief function (EBF) [49]. These approaches have provided reasonable results; however, no single hybrid model has emerged as superior to the others.

The objective of this study is to introduce a new hybrid machine learning approach for landslide prediction. Our new approach merges the AdaBoost (AB) Meta classifier with the stochastic gradient descent (SGD) algorithm as a base classifier. We refer to this approach as the stochastic gradient descent-AdaBoost ensemble (ABSGD) method. Here we use it to predict locations of shallow landslides in Chahar Mahaal-o-Bakhtiari Province, Iran. To our knowledge, this hybrid approach has not previously been used for LSM and landslide prediction. To test the performance of our proposed approach, we compare results from the study area to those of several soft computing benchmark models, including logistic regression (LR), the logistic model tree (LMT) and the functional tree (FT).

2. Study Area

Our study area is the Sarkhoon watershed, located within the Zagros Mountains, Iran (50°25.4′–50°38.45′E, 31°42.05′–31°52.05′N; Figure 1). The study area ranges in elevation from 1370 to 3375 m above sea level. The watershed is underlain mainly by sedimentary rocks of Late Cretaceous, Eocene, Miocene and Pliocene age, including limestone, dolomite, marl, sandstone and conglomerate. Complex folds and both reverse and strike-slip faults are present within the study area [74].

Average annual precipitation is 874 mm and temperatures range from below freezing during winter to 40 °C during summer. Land cover/land use in the watershed is approximately 59% forest, 34% rangeland, 3.5% rock outcrop, 3% dry farming and 0.7% residential land. Drought, conversion of land to farms and road construction over the past four decades have degraded the land [75] and increased the susceptibility of the watershed to landslides.

3. Methodology

3.1. Landslide Inventory Map (LIM)

To frame this study, we collected both landslide and non-landslide points in the Sarkhoon watershed, taking into account published studies from other areas [76,77,78,79,80]. We obtained some of the landslide polygons from the Forests, Rangelands and Watershed Management Organization of Iran. The polygons cover both the scar and the accumulation (body) zones; in this study, however, we used the center of each scar zone as the landslide location. The other landslides were identified from 1:20,000-scale aerial photographs provided by the provincial Department of Natural Resources and Watershed Management. We then ground-truthed the landslides in the field and recorded their GPS locations. Our inventory of 98 landslide points included 55 translational slides, 22 complex landslides and 21 rotational slides ranging in size from 100 to 60,000 m² (Figure 2).

We also randomly chose 100 non-landslide points to be used for LSM. Both the landslide and non-landslide points were divided into training and testing subsets for modeling purposes. About 70% of the points were randomly chosen for the training dataset and 30% were selected for testing.
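The random 70/30 partition described above can be sketched as follows. This is only an illustration under stated assumptions: the point identifiers, the helper name `split_points` and the fixed random seed are hypothetical and are not taken from the study.

```python
import random


def split_points(points, train_fraction=0.7, seed=0):
    """Shuffle a list of points and split it into training and testing subsets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the inventory points.
landslides = [("landslide", i) for i in range(98)]
non_landslides = [("non-landslide", i) for i in range(100)]

ls_train, ls_test = split_points(landslides)
nl_train, nl_test = split_points(non_landslides)

# Each class is split separately so that both subsets contain both classes.
training = ls_train + nl_train
testing = ls_test + nl_test
```

Splitting each class separately is a design choice of this sketch; it keeps the landslide to non-landslide ratio similar in the two subsets.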

3.2. Landslide Conditioning Factors

We selected the following twenty landslide conditioning factors: land use, lithology, average annual precipitation, altitude, slope angle, aspect, European Slope Length and Steepness Factor (LS-Factor), general curvature, profile curvature, plan curvature, longitudinal curvature, tangential curvature, solar radiation, stream power index (SPI), topographic position index (TPI), topographic wetness index (TWI), terrain roughness index (TRI), distance to streams, distance to roads and distance to faults. The classification of different conditioning factors is presented in Table 1. We used seven land use classes in the study area. These include: dry farming, sparse forest, dense forest, poor rangeland, good rangeland, residential area and rock outcrops, which have been mapped by the Chahar Mahaal-o-Bakhtiari Department of Natural Resources and Watershed Management (http://www.frw.org.i). We derived lithological units and faults from the geology map of Ardales and Dehdez sheets prepared by Geological Survey & Mineral Explorations of Iran (GSI) at a 1:100,000 scale [74]. A total of ten lithological units were identified in the Sarkhoon watershed (Table 1). We built an average annual precipitation map using a relationship between average annual precipitation and elevation based on 42 years of average annual precipitation data (1972-2014) from nine meteorological stations in the watershed.

We created a Digital Elevation Model (DEM) with 12.5 m resolution from ALOS PALSAR data provided by the Alaska Satellite Facility (https://vertex.daac.asf.alaska.edu/#). Maps of elevation, slope angle, aspect and length, general, profile, plan, longitudinal and tangential curvature, solar radiation, SPI, TPI, TWI, TRI and distance to stream were constructed from the DEM using ArcGIS 10.3 and SAGA 6.0.0 software. The distance to road map was constructed from the road network, which was produced by the Iran National Cartographic Center in DGN format at 1:25,000 scale. The flowchart for the landslide susceptibility mapping and analysis of spatial data of the watershed is shown in Figure 3.

3.3. AdaBoost Meta Classifier

First introduced by Freund and Schapire [81], AdaBoost is a boosting ensemble technique used to improve the predictive capability of weak classifiers. The technique incrementally constructs one classifier at a time; each classifier is trained on a dataset generated selectively from the original dataset by progressively increasing, at each step, the likelihood of sampling “difficult” data points [82]. AdaBoost has been used in ensembles to improve the prediction ability of, in particular, support vector machines [83], neural networks [84] and decision trees [85].

We apply the technique in this study as follows. Let $U = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{n}, y_{n})\}$ be an original training dataset, where $x_{i}$, $i = 1, 2, \dots, n$, is a set of landslide conditioning factors, $y_{i} \in \{-1, 1\}$ represents the two classes for classification and $W = \{w_{1}, w_{2}, \dots, w_{n}\}$ is the weight distribution over the samples at the $t^{th}$ boosting iteration. For a given iteration, AdaBoost constructs a new training dataset, which is sampled from the original training dataset with the weight distribution $W$. Thereafter, the weak learner is called to build a base classifier $S_{t}$, which uses the new training dataset for learning. The error of $S_{t}$, denoted as $E_{s}$, is calculated using the following equation [86]:

$$E_{s} = \sum_{i : S_{t}(x_{i}) \neq y_{i}} w_{i} \tag{1}$$

The weights of the samples are updated during the learning process as follows:

$$w_{i}^{(t+1)} = w_{i}^{(t)} \cdot \exp(-\beta \cdot z_{i}) \tag{2}$$

where $\beta$ and $z_{i}$ are calculated using the following equations [87]:

$$\beta = 0.5 \ln\left(\frac{1 - E_{s}}{E_{s}}\right) \tag{3}$$

$$z_{i} = \begin{cases} 1 & \text{if } S_{t}(x_{i}) = y_{i} \\ -1 & \text{if } S_{t}(x_{i}) \neq y_{i} \end{cases} \tag{4}$$

The calculated weights are then normalized to add up to one, as follows:

$$w_{i}^{(t+1)} = \frac{w_{i}^{(t+1)}}{\sum_{k=1}^{n} w_{k}^{(t+1)}} \tag{5}$$

In the final step, AdaBoost combines the classification results of all the classifiers.
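The weight updates in Equations (1) to (5) can be illustrated with a minimal sketch. Several choices here are assumptions made only for illustration: the weak learner is a one-feature threshold rule (a decision stump) instead of SGD, the samples are reweighted instead of resampled, the final combination is a vote weighted by β, and the toy data are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= thr else -sign for row in X]
                # Eq. (1): sum of the weights of the misclassified samples
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign


def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                      # start from a uniform distribution
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # clipped to keep the log finite
        beta = 0.5 * math.log((1 - err) / err)       # Eq. (3)
        z = [1 if stump_predict(stump, r) == yi else -1
             for r, yi in zip(X, y)]                 # Eq. (4)
        w = [wi * math.exp(-beta * zi) for wi, zi in zip(w, z)]  # Eq. (2)
        total = sum(w)
        w = [wi / total for wi in w]                 # Eq. (5)
        ensemble.append((beta, stump))
    return ensemble, w


def predict(ensemble, row):
    """Combine the base classifiers with a vote weighted by beta (an assumption)."""
    score = sum(beta * stump_predict(s, row) for beta, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-factor toy data: -1 = non-landslide, 1 = landslide.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [-1, -1, -1, 1, 1, 1]
ensemble, weights = adaboost(X, y)
```

Misclassified samples have z = -1, so their weights grow by a factor exp(β) and the next base classifier concentrates on them.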

3.4. Stochastic Gradient Descent Algorithm

The stochastic gradient descent algorithm (SGDA) is a drastic simplification of gradient descent [88] that uses a small, randomly selected subset of the training data to compute the gradient of the objective function [89]. The number of training samples used for the approximation in one iteration is called the batch size. With a small batch size, the parameters can be updated more frequently than in gradient descent, which accelerates convergence. In the extreme case, a batch size of 1 provides the maximum frequency of updates and a very simple perceptron-like algorithm. In the SGDA, the weights of the features are updated for each training sample using the following equation [90]:

$$w^{z+1} = w^{z} + \alpha_{z} \frac{\partial}{\partial w} \left( L(j, w) - \frac{M}{N} \sum_{i} |\omega_{i}| \right) \tag{6}$$

where $N$ is the batch size, $M$ is the meta-parameter that controls the degree of regularization, $z$ is the iteration counter, $\alpha_{z}$ is the learning rate, $\omega_{i}$ is the weight of the $i$th feature, and $L(j, w)$ is the conditional log-likelihood of the $j$th training sample [89].
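A minimal sketch of the update in Equation (6) is given below, assuming a logistic model with labels in {0, 1}, a batch size of 1 (so N = 1), a constant learning rate and the sign of each weight as the subgradient of its absolute value. The function name `sgd_l1`, the parameter values and the toy data are hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sgd_l1(X, y, M=0.01, N=1, alpha=0.1, epochs=500, seed=0):
    """Per-sample ascent on the log-likelihood with an L1 penalty (Eq. 6)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)                  # visit the samples in random order
        for j in order:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, X[j])))
            for i in range(len(w)):
                grad_ll = (y[j] - p) * X[j][i]        # d L(j, w) / d w_i
                sign = (w[i] > 0) - (w[i] < 0)        # subgradient of |w_i|
                w[i] += alpha * (grad_ll - (M / N) * sign)
    return w


# Hypothetical toy data: a constant bias feature plus one conditioning factor.
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.8], [1.0, 0.9]]
y = [0, 0, 1, 1]
w = sgd_l1(X, y)
p_low = sigmoid(w[0] + w[1] * 0.1)
p_high = sigmoid(w[0] + w[1] * 0.9)
```

In this sketch the penalty is also applied to the bias weight for brevity; excluding it would be an equally reasonable choice.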

3.5. Logistic Regression

Logistic regression (LR) is a popular statistical method applied to landslide susceptibility mapping [91,92,93]. It establishes a multivariate regression relationship between independent variables and a dependent variable [31,92]. The variables can be discrete, continuous or both. The LR algorithm estimates the probability of a certain landslide event by utilizing the maximum likelihood estimation [39]. In the case of landslide prediction, the dependent variable is a binary variable (landslide and non-landslide). The algorithm of LR can be expressed in a simple form as follows [92]:

$$P = \frac{1}{1 + e^{-f}} \tag{7}$$

where $P$ is defined as the probability of a past landslide event and $f$ is determined by:

$$f = a_{0} + a_{1} x_{1} + a_{2} x_{2} + \dots + a_{n} x_{n} \tag{8}$$

where $n$ is the number of factors, $a_{0}$ is the intercept of the algorithm, $a_{i}$, $i = 1, 2, \dots, n$, are the slope coefficients of the algorithm and $x_{i}$, $i = 1, 2, \dots, n$, are the attributes of the factors.
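As a small worked example of Equations (7) and (8), the sketch below evaluates the logistic probability for one location. The coefficient values and factor values are hypothetical and serve only to show the arithmetic.

```python
import math


def landslide_probability(x, a0, a):
    """Evaluate Eq. (8) for the linear term f, then Eq. (7) for the probability P."""
    f = a0 + sum(ai * xi for ai, xi in zip(a, x))
    return 1.0 / (1.0 + math.exp(-f))


# Hypothetical intercept and slope coefficients for two scaled factors.
a0 = -1.0
a = [2.0, 0.5]

p_neutral = landslide_probability([0.0, 0.0], 0.0, a)   # f = 0, so P = 0.5
p_low = landslide_probability([0.1, 0.2], a0, a)        # f = -0.7
p_high = landslide_probability([0.9, 0.8], a0, a)       # f = 1.2
```

A positive slope coefficient raises P as the corresponding factor increases, and f = 0 corresponds to P = 0.5.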

3.6. Logistic Model Tree

Logistic model tree (LMT) is a classification tree method that combines decision tree and logistic regression learning [94]. In LMT, the classification and regression tree (CART) algorithm is used to prune the tree, whereas the LogitBoost algorithm is used to construct the logistic regression model at every node of the tree; the splitting is based on a logistic variant of information gain [94,95]. LMT uses cross-validation to find the number of LogitBoost iterations and thus prevent over-fitting. For each class $N$, the LogitBoost algorithm fits an additive logistic regression model by least squares, as follows [94]:

$$L_{N}(x) = \sum_{i=1}^{n} \alpha_{i} x_{i} + \alpha_{0} \tag{9}$$

where $n$ is the number of factors and $\alpha_{0}$ and $\alpha_{i}$ are, respectively, the initial coefficient and the coefficient of the $i$th component of vector $x$.

In LMT, the posterior probabilities of the leaf nodes are calculated using the linear logistic regression method [94]:

$$P(N \mid x) = \frac{\exp(L_{N}(x))}{\sum_{N'=1}^{C} \exp(L_{N'}(x))} \tag{10}$$

where $C$ is the number of classes.
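Equations (9) and (10) amount to a softmax over per-class linear functions. The sketch below shows this for a single leaf, assuming hypothetical coefficients for two classes; subtracting the largest score before exponentiating is a standard numerical safeguard and does not change the result.

```python
import math


def posteriors(x, coeffs):
    """coeffs holds one (alpha_0, [alpha_1, ..., alpha_n]) pair per class."""
    # Eq. (9): linear function L_N(x) for every class N
    scores = [a0 + sum(ai * xi for ai, xi in zip(a, x)) for a0, a in coeffs]
    # Eq. (10): normalized exponentials of the class scores
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical leaf model: class 0 = non-landslide, class 1 = landslide.
leaf = [(0.5, [-1.0, -0.5]), (-0.5, [1.0, 0.5])]
probs = posteriors([0.9, 0.8], leaf)
equal = posteriors([1.0, 2.0], [(0.0, [0.0, 0.0]), (0.0, [0.0, 0.0])])
```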

3.7. Functional Tree

Functional tree (FT) is a tree classifier that can use combinations of attributes at decision nodes, at leaf nodes or at both when learning the classification tree. FT uses the logistic regression function to split at the functional inner nodes and to predict at the functional leaves. In FT, the functional leaves are used to reduce the variance, whereas the functional inner nodes are used to reduce the bias of classification. The application of FT in landslide prediction is limited to a few case studies [96].

Let $x_{i}$, $i = 1, 2, \dots, n$, be a set of attributes of the factors and let $y_{i}$ represent the output classes (landslide and non-landslide). The classification of the FT algorithm is carried out using the following steps: (1) construct the model, which is the probability distribution of the output classes, using the linear Bayes discriminant function as the constructor; (2) generate a new dataset by extending the original one with the new factors constructed for the landslide and non-landslide classes; and (3) construct the classification tree by selecting factors from both the initial training dataset and the new dataset.

3.8. Factor Selection Using the Least Square Support Vector Machine (LSSVM)

Factor selection techniques are used to improve and enhance the predictive ability of models during the modeling process with a training dataset. Problems with over-fitting and noise in the training dataset can be overcome by removing factors that have no predictive power. To achieve this objective, we used the least square support vector machine (LSSVM), which was originally proposed by Suykens et al. [97] as a SVM-modified method. LSSVM is a kernel supervised machine learning method that uses the least square linear function for classification and regression problems [98]. It depends on standardization networks and uses the quadratic cost function to reduce the variance in the training dataset and solve a set of linear equations [99].

Consider a training dataset of $n$ data points $\{(x_{1}, y_{1}), \dots, (x_{n}, y_{n})\}$, where $x_{i} \in R^{d}$ is a feature vector and $y_{i} \in \{-1, +1\}$ is the class label (landslide or non-landslide). A nonlinear function $\varphi$ is used to map the data points into a high-dimensional Hilbert space. The LSSVM classifier is formulated by minimizing [99]:

$$\frac{1}{2} w^{T} w + \gamma \frac{\sum_{i=1}^{n} e_{i}^{2}}{2} \tag{11}$$

subject to the equality constraints:

$$y_{i} - (w \cdot \varphi(x_{i}) - b) = e_{i} \tag{12}$$

where $\gamma > 0$ is a regularization factor, $b$ is a bias term and $e_{i}$ is the difference between the estimated and the actual outputs.
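Eliminating w and the errors from Equations (11) and (12) with Lagrange multipliers leaves a set of linear equations in the bias b and one multiplier per training point. The sketch below solves that system for a toy dataset. The RBF kernel, the value of γ, the helper names and the data are assumptions for illustration, and the sketch shows only the classifier, not the factor-ranking procedure used in this study.

```python
import math


def rbf(u, v, sigma=1.0):
    """Gaussian kernel standing in for the inner product of the mapped points."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=10.0):
    n = len(X)
    # First row: the multipliers sum to zero (stationarity with respect to b).
    A = [[0.0] + [1.0] * n]
    # Remaining rows: -b + sum_k (K_ik + delta_ik / gamma) * alpha_k = y_i,
    # which follows from Eq. (12) with w = sum_k alpha_k * phi(x_k), e_i = alpha_i / gamma.
    for i in range(n):
        A.append([-1.0] + [rbf(X[i], X[k]) + (1.0 / gamma if i == k else 0.0)
                           for k in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]              # bias b and multipliers alpha


def lssvm_predict(X, b, alpha, x):
    f = sum(a * rbf(xk, x) for a, xk in zip(alpha, X)) - b
    return 1 if f >= 0 else -1


# Hypothetical one-factor toy data: -1 = non-landslide, +1 = landslide.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [-1, -1, 1, 1]
b, alpha = lssvm_fit(X, y)
fitted = [lssvm_predict(X, b, alpha, row) for row in X]
```

Because the constraints are equalities and the cost is quadratic, training reduces to one linear solve instead of the quadratic program of a standard SVM.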

3.9. Evaluation and Comparison of Algorithms

3.9.1. Statistical Index-Based Evaluation

In this study, we used several statistically based measures, including sensitivity (SST), specificity (SPC), accuracy (ACC), root mean squared error (RMSE) and the area under the receiver operating characteristic curve (AUC), to evaluate the landslide modeling process. These quantitative measures were obtained using a 2×2 contingency/confusion table that captures the four possible outcomes: true positive (TP), false positive (FP), true negative (TN) and false negative (FN) (Table 2). In a binary classification such as landslide versus non-landslide, the 2×2 contingency/confusion table is obtained using a cutoff value (here 0.5). It is computed by comparing each landslide ground truth pixel (actual landslide location) with the corresponding pixel on the classified map. TP and FN refer to landslide locations that are classified as, respectively, landslide and non-landslide locations. FP and TN refer to non-landslide locations that are classified as, respectively, landslide and non-landslide locations. Statistical values derived from these four outcomes are computed as follows [100]:

$$SST = \frac{TP}{TP + FN}; \quad SPC = \frac{TN}{TN + FP}; \quad ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_{observation} - X_{estimation})^{2}} \tag{14}$$

where $n$ is the total number of samples in the landslide training dataset or validation dataset, $X_{observation}$ is the observed value in the landslide training dataset or validation dataset and $X_{estimation}$ is the value predicted by the landslide susceptibility model.
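Equations (13) and (14) can be checked on a small example. The sketch below applies the 0.5 cutoff to predicted probabilities and computes the four counts and the derived measures; the labels and probabilities are hypothetical.

```python
import math


def evaluate(y_true, prob, cutoff=0.5):
    """Confusion-table measures (Eq. 13) and RMSE (Eq. 14) for labels in {0, 1}."""
    pred = [1 if p >= cutoff else 0 for p in prob]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, prob)) / len(y_true))
    return {
        "SST": tp / (tp + fn),
        "SPC": tn / (tn + fp),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "RMSE": rmse,
    }


# Hypothetical data: 1 = landslide, 0 = non-landslide.
y_true = [1, 1, 1, 0, 0, 0]
prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
measures = evaluate(y_true, prob)
```

Here two of the three landslide points and two of the three non-landslide points are classified correctly, so SST, SPC and ACC all equal 2/3.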

3.9.2. AUC

The area under the receiver operating characteristic curve (AUC) is a standard tool for evaluating the general performance of models [27,49,66,85,101,102]. We used AUC to check the performance of our landslide models. The y-axis of the curve shows the model sensitivity and the x-axis shows 100 − specificity (in percent) [66,103]. The AUC ranges from 0.5 for an inaccurate model to 1 for an ideal model [85,104]. The index is computed as follows [105]:

$$AUC = \frac{\sum TP + \sum TN}{P + N} \tag{15}$$

where $P$ is the total number of landslide locations and $N$ is the total number of non-landslide locations.
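A common way to obtain the area under the ROC curve directly from predicted probabilities is the pairwise (Mann-Whitney) form: the fraction of landslide and non-landslide pairs in which the landslide location receives the higher score, with ties counted as one half. The sketch below uses that form, which is an assumption of this example and not necessarily the exact computation behind Equation (15); the data are hypothetical.

```python
def pairwise_auc(y_true, prob):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    positives = [p for t, p in zip(y_true, prob) if t == 1]
    negatives = [p for t, p in zip(y_true, prob) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0       # landslide ranked above non-landslide
            elif pos == neg:
                wins += 0.5       # a tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = landslide, 0 = non-landslide.
y_true = [1, 1, 1, 0, 0, 0]
prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
auc = pairwise_auc(y_true, prob)   # 8 of the 9 pairs are ordered correctly
```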

4. Results and Analysis

4.1. The Most Significant Conditioning Factors in the Modeling Process

One of the most important steps in any environmental modeling process is the determination of the most significant conditioning factors. Not all factors have the same effect on event occurrences; some may have no effect and must be removed from further consideration. In the present study, the LSSVM model was applied to rate the effectiveness of each conditioning factor based on average merit (AM). Application of this model revealed that distance to road is the most important conditioning factor for landslide occurrences in the Sarkhoon watershed (AM = 19.9), followed by elevation (AM = 18.7), aspect (17.8), rainfall (17), general curvature (15.2), land use (14.6), longitudinal curvature (13.2), profile curvature (11.8), solar radiation (11.3), TPI (10), TWI (8.4), TRI (8.1), SPI (7.1), slope angle (6.7), plan curvature (6.5), STI (6), lithology (5.3), distance to fault (4.5), tangential curvature (4) and distance to river (3.9) (Figure 4).

4.2. Model Validation and Comparison

The performance of the modeling process in terms of SST, SPC, ACC, RMSE and AUC for both the training and testing phases is shown in Table 3. LMT has the highest sensitivity in the training set (0.783), meaning that 78.3% of the landslide locations are classified as landslide, followed by FT (75.4%), LR (86.6%) and SGD and ABSGD (83.6%). ABSGD had the highest specificity (87.7%), followed by FT (85.5%), LR (84.5%), LMT (83.1%) and SGD (81.7%). The ABSGD model thus classified 87.7% of the non-landslide locations as non-landslide. ABSGD and LR had the highest performance (0.855) in terms of ACC, followed by SGD (0.826), LMT (0.807) and FT (0.804). The ABSGD model correctly classified the pixels in 85.5% of the cases. ABSGD had the lowest value of RMSE (0.323) and thus the best model performance, followed by LR (0.338), SGD (0.446), LMT (0.451) and FT (0.502). ABSGD had the highest AUC (0.941), followed by LR (0.917), SGD (0.904), LMT (0.871) and FT (0.819).

The results of the testing phase are similar to those of the training phase. Ranked from best to worst performance, the models are ordered as follows: for SST, ABSGD > LMT > SGD > LR > FT; for SPC, ABSGD > LR > SGD > LMT > FT; for ACC, ABSGD > LMT > LR > SGD > FT; for RMSE (where a lower value is better), ABSGD performed best, followed by LR, SGD, LMT and FT; and for AUC, ABSGD > LR > SGD > LMT > FT. Generally, the results show that ABSGD has the highest prediction capability and FT the lowest capability (Table 3).

4.3. Landslide Susceptibility Mapping

After determining the conditioning factors that provided the best model prediction power, we determined the optimal parameter settings for each model. We used a trial-and-error process to determine the optimum values of all parameters in each algorithm such that the goodness-of-fit and performance of the applied algorithms yielded the highest values. All parameters were changed stage-by-stage and the performance of the models was checked. The optimum values of these parameters were selected for the final stage of modeling (Table 4).

We transformed the study area into raster format with a pixel size of 10 m. Each pixel was classified as either landslide or non-landslide. We next estimated the landslide indexes that show the probability of landslide occurrence for each pixel based on the training dataset and the learned model. Thus, each pixel of the study area was assigned a unique index. Finally, the indexes for each model were assigned to five classes, namely very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility (MS), high susceptibility (HS) and very high susceptibility (VHS) using the quantile classification scheme [106,107], as shown in Figure 5a–e. The results show that the northeast, middle and southern parts of the Sarkhoon watershed have very high landslide susceptibility and that they are mostly located along the roads.
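The quantile scheme places roughly the same number of pixels in each of the five classes. The sketch below illustrates this for a handful of index values; the function name, the exact rule used for the class breaks and the index values are assumptions made for illustration, and GIS packages may place the breaks slightly differently.

```python
def quantile_classes(indexes, labels=("VLS", "LS", "MS", "HS", "VHS")):
    """Assign each index to one of len(labels) classes with about equal counts."""
    ranked = sorted(indexes)
    n, k = len(ranked), len(labels)
    # Upper break of class c: the value below which about (c + 1) / k of the data lie.
    breaks = [ranked[max(0, (n * (c + 1)) // k - 1)] for c in range(k - 1)]
    # The class number is the count of breaks that the value exceeds.
    return [labels[sum(1 for b in breaks if value > b)] for value in indexes]


# Hypothetical landslide susceptibility indexes for ten pixels.
susceptibility = [0.55, 0.05, 0.95, 0.25, 0.75, 0.15, 0.85, 0.35, 0.65, 0.45]
classes = quantile_classes(susceptibility)
```

With ten pixels and five classes, each class receives two pixels; unlike equal-interval classes, the break values depend on the distribution of the indexes.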

4.4. Map Verification and Comparison

Model evaluation is an important step in any environmental modeling process, without which the results cannot be shown to have scientific significance [106,107]. We determined the validity of the five landslide maps of the Sarkhoon watershed (Figure 5) using AUC for both the training and testing datasets. The area under the curve was considered to be the quantitative criterion for model validity and comparison. The results of the model validation using the training dataset showed that ABSGD (AUC = 0.868) outperformed LR (AUC = 0.827), SGD (AUC = 0.779), LMT (AUC = 0.753) and FT (AUC = 0.737) (Figure 6a). The results of the testing dataset are similar to those of the training dataset: the ABSGD model (AUC = 0.860) outperformed LR (AUC = 0.797), SGD (AUC = 0.776), LMT (AUC = 0.740) and FT (AUC = 0.734) (Figure 6b). Although all models yielded good and reasonable results, the ABSGD ensemble model had the highest predictive power for landslide susceptibility assessment. Success and prediction rate curves based only on landslide locations were also constructed for the ABSGD and SGD models using the training and validation landslides. In this comparison, ABSGD had the higher performance and prediction capability for the training (AUC = 0.855) and validation (AUC = 0.765) datasets. Corresponding values for SGD are lower (AUC training = 0.843; AUC validation = 0.727) (Figure 7a,b).

5. Discussion

A goal of spatial landslide modeling is to produce a reliable susceptibility map with high prediction accuracy. Therefore, research is focused on developing and evaluating the performance of predictive landslide susceptibility models [55]. Although many methods have been developed for landslide modeling over the past four decades, machine learning algorithms and their ensemble techniques have been favored in recent years. Their efficiency in enhancing the performance of the models has been stressed by many researchers [31,108].

The main objective of this study was to introduce a new machine learning ensemble model, namely ABSGD, that combines stochastic gradient descent (SGD) as a base function classifier and AdaBoost as a Meta classifier. Using the least square support vector machine (LSSVM) with 10-fold cross-validation, we identified the distance to road as the most significant factor for landslides in the Sarkhoon watershed. Similar findings have been previously reported by Pham et al. [61,91,101]. The results of the factor selection also indicated that all other factors are important for the modeling and prediction of landslides in the Sarkhoon watershed.

We compared the model results and the validation process to assess the ability of ABSGD to spatially predict landslides using four soft computing benchmark models—the SGD, LR, LMT and FT models. Five measures, namely sensitivity, specificity, accuracy, RMSE and AUC, were used for the comparison. The results indicated that the ABSGD model had a better goodness-of-fit (using the training dataset) and prediction capability (using the validation dataset) than the other models.

Additionally, the results of this study showed that the LR model had a higher value of goodness-of-fit and prediction capability than the SGD model and that the SGD model outperformed the LMT and FT decision tree classifiers. The results confirmed that AdaBoost improved the performance of the SGD algorithm. This finding is in agreement with those of Bui et al. [55], Pham et al. [91] and Shirzadi et al. [66], all of whom state that Meta classifiers can enhance the performance of base classifiers. Shirzadi et al. [66] reported that the random subspace (RS) can improve the predictive power of the naive Bayes tree (NBTree) for landslide modeling. In addition, Pham et al. [102] revealed that RS improved the performance of the classification and regression tree (CART) for preparing landslide susceptibility maps.

The main advantage of AdaBoost as a Meta classifier is that it can provide a good balance between accuracy and diversity and reduce the effects of noise and over-fitting in the training dataset [109]. In sum, AdaBoost, as a boosting algorithm, has a good generalization capability, fast performance and low implementation complexity in classification problems [110].

6. Conclusions

A key objective in predictive modeling of landslides is to produce reliable susceptibility maps that can assist managers, land use planners and decision makers to better manage landslide-prone areas. We have shown that machine learning ensemble models can improve spatial landslide predictions by improving the performance of the base classifier. In this study, we used a novel ensemble model, which we refer to as the stochastic gradient descent-AdaBoost ensemble (ABSGD), to prepare a reliable landslide susceptibility map for the Sarkhoon watershed in Chahar-Mahaal-o-Bakhtiari Province, Iran. This ensemble model combines a functional classifier (SGD) and a Meta classifier (AdaBoost).

The results of landslide factor selection using LSSVM with 10-fold cross-validation showed that all conditioning factors affected the spatial landslide modeling, with distance to road proving the most important. Steep slopes crossed by roads are prone to landslides largely due to cut-and-fill construction techniques and diversion of drainage. Additionally, our results indicate that, although all models performed reliably, the ABSGD model outperformed the LR, SGD, LMT and FT models. Therefore, we suggest that a combination of SGD and AdaBoost provides a better optimized model for increasing the accuracy of predictive landslide susceptibility mapping. In this study, we showed that hybrid models can enhance the performance of individual models in predicting landslides.

Author Contributions

D.T.B., H.S., E.O., A.S., M.G., J.J.C., K.K., B.P., B.T.P., K.C., Z.B., B.B.A., H.R., G.G., and S.L. contributed equally to the work. H.S., E.O., and Z.B. collected field data and conducted the landslide susceptibility mapping and analysis. D.T.B., H.S., E.O., A.S., K.K., K.C., and Z.B. wrote the manuscript. D.T.B., M.G., J.J.C., B.P., B.T.P., H.R., G.G., and S.L. provided critical comments in planning this paper and edited the manuscript. All the authors discussed the results and edited the manuscript.

Funding

This research was supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM) funded by the Ministry of Science and ICT, and by Universiti Teknologi Malaysia (UTM) under Research University Grant (Q.J130000.2527.17H84).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Dehnavi, A.; Aghdam, I.N.; Pradhan, B.; Varzandeh, M.H.M. A new hybrid model using step-wise weight assessment ratio analysis (swara) technique and adaptive neuro-fuzzy inference system (anfis) for regional landslide hazard assessment in iran. Catena 2015, 135, 122–148.
2. Petley, D. Global patterns of loss of life from landslides. Geology 2012, 40, 927–930.
3. Aleotti, P.; Chowdhury, R. Landslide hazard assessment: Summary review and new perspectives. Bull. Eng. Geol. Environ. 1999, 58, 21–44.
4. Sassa, K.; Canuti, P. Landslides-Disaster Risk Reduction; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008.
5. Nadim, F.; Kjekstad, O.; Peduzzi, P.; Herold, C.; Jaedicke, C. Global landslide and avalanche hotspots. Landslides 2006, 3, 159–173.
6. Dowling, C.A.; Santi, P.M. Debris flows and their toll on human life: A global analysis of debris-flow fatalities from 1950 to 2011. Nat. Hazards 2014, 71, 203–227.
7. Chen, W.; Peng, J.; Hong, H.; Shahabi, H.; Pradhan, B.; Liu, J.; Zhu, A.-X.; Pei, X.; Duan, Z. Landslide susceptibility modelling using gis-based machine learning techniques for chongren county, jiangxi province, china. Sci. Total Environ. 2018, 626, 1121–1135.
8. Corominas, J.; van Westen, C.; Frattini, P.; Cascini, L.; Malet, J.-P.; Fotopoulou, S.; Catani, F.; Van Den Eeckhaut, M.; Mavrouli, O.; Agliardi, F. Recommendations for the quantitative analysis of landslide risk. Bull. Eng. Geol. Environ. 2014, 73, 209–263.
9. Shadman Roodposhti, M.; Aryal, J.; Shahabi, H.; Safarrad, T. Fuzzy shannon entropy: A hybrid gis-based landslide susceptibility mapping method. Entropy 2016, 18, 343.
10. Shafizadeh-Moghadam, H.; Minaei, M.; Shahabi, H.; Hagenauer, J. Big data in geohazard; pattern mining and large scale analysis of landslides in iran. Earth Sci. Inform. 2019, 12, 1–17.
11. Ercanoglu, M.; Gokceoglu, C. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (west black sea region, turkey). Eng. Geol. 2004, 75, 229–250.
12. Zhao, C.; Lu, Z. Remote Sensing of Landslides—A Review; Multidisciplinary Digital Publishing Institute: Basel, Switzerland, 2018.
13. Zhao, C.; Kang, Y.; Zhang, Q.; Lu, Z.; Li, B. Landslide identification and monitoring along the jinsha river catchment (wudongde reservoir area), china, using the insar method. Remote Sens. 2018, 10, 993.
14. Shahabi, H.; Hashim, M. Landslide susceptibility mapping using gis-based statistical models and remote sensing data in tropical environment. Sci. Rep. 2015, 5, 9899.
15. Golovko, D.; Roessner, S.; Behling, R.; Wetzel, H.-U.; Kleinschmit, B. Evaluation of remote-sensing-based landslide inventories for hazard assessment in southern kyrgyzstan. Remote Sens. 2017, 9, 943.
16. Kutlug Sahin, E.; Ipbuker, C.; Kavzoglu, T. Investigation of automatic feature weighting methods (fisher, chi-square and relief-f) for landslide susceptibility mapping. Geocarto Int. 2017, 32, 956–977.
17. Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 2018, 180, 60–91.
18. Pradhan, B.; Seeni, M.I.; Kalantar, B. Performance evaluation and sensitivity analysis of expert-based, statistical, machine learning and hybrid models for producing landslide susceptibility maps. In Laser Scanning Applications in Landslide Assessment; Springer: Cham, Switzerland, 2017; pp. 193–232.
19. Trigila, A.; Frattini, P.; Casagli, N.; Catani, F.; Crosta, G.; Esposito, C.; Iadanza, C.; Lagomarsino, D.; Mugnozza, G.S.; Segoni, S. Landslide susceptibility mapping at national scale: The italian case study. In Landslide Science and Practice; Springer: Berlin/Heidelberg, Germany, 2013; pp. 287–295.
20. Kayastha, P.; Dhital, M.R.; De Smedt, F. Landslide susceptibility mapping using the weight of evidence method in the tinau watershed, nepal. Nat. Hazards 2012, 63, 479–498.
21. Tsangaratos, P.; Ilia, I.; Hong, H.; Chen, W.; Xu, C. Applying information theory and gis-based quantitative methods to produce landslide susceptibility maps in nancheng county, china. Landslides 2017, 14, 1091–1111.
22. Yalcin, A. Gis-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in ardesen (turkey): Comparisons of results and confirmations. Catena 2008, 72, 1–12.
23. Park, S.; Choi, C.; Kim, B.; Kim, J. Landslide susceptibility mapping using frequency ratio, analytic hierarchy process, logistic regression and artificial neural network methods at the inje area, korea. Environ. Earth Sci. 2013, 68, 1443–1464.
24. Alizadeh, M.; Hashim, M.; Alizadeh, E.; Shahabi, H.; Karami, M.; Beiranvand Pour, A.; Pradhan, B.; Zabihi, H. Multi-criteria decision making (mcdm) model for seismic vulnerability assessment (sva) of urban residential buildings. ISPRS Int. J. Geo-Inf. 2018, 7, 444.
25. Choi, J.; Oh, H.-J.; Lee, H.-J.; Lee, C.; Lee, S. Combining landslide susceptibility maps obtained from frequency ratio, logistic regression and artificial neural network models using aster images and gis. Eng. Geol. 2012, 124, 12–23.
26. Regmi, A.D.; Devkota, K.C.; Yoshida, K.; Pradhan, B.; Pourghasemi, H.R.; Kumamoto, T.; Akgun, A. Application of frequency ratio, statistical index and weights-of-evidence models and their comparison in landslide susceptibility mapping in central nepal himalaya. Arabian J. Geosci. 2014, 7, 725–742.
27. Shirzadi, A.; Chapi, K.; Shahabi, H.; Solaimani, K.; Kavian, A.; Ahmad, B.B. Rock fall susceptibility assessment along a mountainous road: An evaluation of bivariate statistic, analytical hierarchy process and frequency ratio. Environ. Earth Sci. 2017, 76, 152.
28. Shahabi, H.; Ahmad, B.; Khezri, S. Evaluation and comparison of bivariate and multivariate statistical methods for landslide susceptibility mapping (case study: Zab basin). Arabian J. Geosci. 2013, 6, 3885–3907.
29. Chen, W.; Shahabi, H.; Shirzadi, A.; Hong, H.; Akgun, A.; Tian, Y.; Liu, J.; Zhu, A.-X.; Li, S. Novel hybrid artificial intelligence approach of bivariate statistical-methods-based kernel logistic regression classifier for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2018.
30. Akgun, A. A comparison of landslide susceptibility maps produced by logistic regression, multi-criteria decision and likelihood ratio methods: A case study at İzmir, turkey. Landslides 2012, 9, 93–106.
31. Pham, B.T.; Pradhan, B.; Bui, D.T.; Prakash, I.; Dholakia, M. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of uttarakhand area (india). Environ. Modell. Softw. 2016, 84, 240–250.
32. Michael, E.A.; Samanta, S. Landslide vulnerability mapping (lvm) using weighted linear combination (wlc) model through remote sensing and gis techniques. Model. Earth Syst. Environ. 2016, 2, 88.
33. He, X.; Hong, Y.; Yu, X.; Cerato, A.B.; Zhang, X.; Komac, M. Landslides susceptibility mapping in oklahoma state using gis-based weighted linear combination method. In Landslide Science for a Safer Geoenvironment; Springer: Cham, Switzerland, 2014; pp. 371–377.
34. Hong, H.; Shahabi, H.; Shirzadi, A.; Chen, W.; Chapi, K.; Ahmad, B.B.; Roodposhti, M.S.; Hesar, A.Y.; Tian, Y.; Bui, D.T. Landslide susceptibility assessment at the wuning area, china: A comparison between multi-criteria decision making, bivariate statistical and machine learning methods. Nat. Hazards 2018.
35. Conoscenti, C.; Ciaccio, M.; Caraballo-Arias, N.A.; Gómez-Gutiérrez, Á.; Rotigliano, E.; Agnesi, V. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: A case of the belice river basin (western sicily, italy). Geomorphology 2015, 242, 49–64.
36. Chen, W.; Pourghasemi, H.R.; Naghibi, S.A. Prioritization of landslide conditioning factors and its spatial modeling in shangnan county, china using gis-based data mining algorithms. Bull. Eng. Geol. Environ. 2017.
37. Chen, W.; Pradhan, B.; Li, S.; Shahabi, H.; Rizeei, H.M.; Hou, E.; Wang, S. Novel hybrid integration approach of bagging-based fisher’s linear discriminant function for groundwater potential analysis. Nat. Resour. Res. 2019.
38. Azareh, A.; Rahmati, O.; Rafiei-Sardooi, E.; Sankey, J.B.; Lee, S.; Shahabi, H.; Ahmad, B.B. Modelling gully-erosion susceptibility in a semi-arid region, iran: Investigation of applicability of certainty factor and maximum entropy models. Sci. Total Environ. 2019, 655, 684–696.
39. Pradhan, B. Remote sensing and gis-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in malaysia. Adv. Space Res. 2010, 45, 1244–1256.
40. Althuwaynee, O.F.; Pradhan, B.; Park, H.-J.; Lee, J.H. A novel ensemble bivariate statistical evidential belief function with knowledge-based analytical hierarchy process and multivariate statistical logistic regression for landslide susceptibility mapping. Catena 2014, 114, 21–36.
41. Chen, W.; Xie, X.; Peng, J.; Shahabi, H.; Hong, H.; Bui, D.T.; Duan, Z.; Li, S.; Zhu, A.-X. Gis-based landslide susceptibility evaluation using a novel hybrid integration approach of bivariate statistical based random forest method. Catena 2018, 164, 135–149.
42. Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Alizadeh, M.; Chen, W.; Mohammadi, A.; Ahmad, B.; Panahi, M.; Hong, H. Landslide detection and susceptibility mapping by airsar data using support vector machine and index of entropy models in cameron highlands, malaysia. Remote Sens. 2018, 10, 1527.
43. Catani, F.; Lagomarsino, D.; Segoni, S.; Tofani, V. Landslide susceptibility estimation by random forests technique: Sensitivity and scaling issues. Nat. Hazards Earth Syst. Sci. 2013, 13, 2815–2831.
44. Guzzetti, F.; Reichenbach, P.; Ardizzone, F.; Cardinali, M.; Galli, M. Estimating the quality of landslide susceptibility models. Geomorphology 2006, 81, 166–184.
45. Jaafari, A.; Zenner, E.K.; Panahi, M.; Shahabi, H. Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability. Agric. For. Meteorol. 2019, 266, 198–207.
46. Goetz, J.N.; Guthrie, R.H.; Brenning, A. Integrating physical and empirical landslide susceptibility models using generalized additive models. Geomorphology 2011, 129, 376–386.
47. Taheri, K.; Shahabi, H.; Chapi, K.; Shirzadi, A.; Gutiérrez, F.; Khosravi, K. Sinkhole susceptibility mapping: A comparison between bayes-based machine learning algorithms. Land Degrad. Dev. 2019.
48. Abedini, M.; Ghasemian, B.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Pham, B.T.; Bin Ahmad, B.; Tien Bui, D. A novel hybrid approach of bayesian logistic regression and its ensembles for landslide susceptibility assessment. Geocarto Int. 2018.
49. Chen, W.; Shahabi, H.; Shirzadi, A.; Li, T.; Guo, C.; Hong, H.; Li, W.; Pan, D.; Hui, J.; Ma, M. A novel ensemble approach of bivariate statistical-based logistic model tree classifier for landslide susceptibility assessment. Geocarto Int. 2018.
50. Vahidnia, M.H.; Alesheikh, A.A.; Alimohammadi, A.; Hosseinali, F. A gis-based neuro-fuzzy procedure for integrating knowledge and data in landslide susceptibility mapping. Comput. Geosci. 2010, 36, 1101–1114.
51. Dickson, M.E.; Perry, G.L. Identifying the controls on coastal cliff landslides using machine-learning approaches. Environ. Modell. Softw. 2016, 76, 117–127.
52. Hong, H.; Liu, J.; Zhu, A.-X.; Shahabi, H.; Pham, B.T.; Chen, W.; Pradhan, B.; Bui, D.T. A novel hybrid integration model using support vector machines and random subspace for weather-triggered landslide susceptibility assessment in the wuning area (china). Environ. Earth Sci. 2017, 76, 652.
53. Pradhan, B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using gis. Comput. Geosci. 2013, 51, 350–365.
54. Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance evaluation of the gis-based data mining techniques of best-first decision tree, random forest and naïve bayes tree for landslide susceptibility modeling. Sci. Total Environ. 2018, 644, 1006–1018.
55. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression and logistic model tree. Landslides 2016, 13, 361–378.
56. Chen, W.; Pourghasemi, H.R.; Panahi, M.; Kornejady, A.; Wang, J.; Xie, X.; Cao, S. Spatial prediction of landslide susceptibility using an adaptive neuro-fuzzy inference system combined with frequency ratio, generalized additive model and support vector machine techniques. Geomorphology 2017, 297, 69–85.
57. Rahmati, O.; Naghibi, S.A.; Shahabi, H.; Bui, D.T.; Pradhan, B.; Azareh, A.; Rafiei-Sardooi, E.; Samani, A.N.; Melesse, A.M. Groundwater spring potential modelling: Comprising the capability and robustness of three different modeling approaches. J. Hydrol. 2018, 565, 248–261.
58. Ermini, L.; Catani, F.; Casagli, N. Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 2005, 66, 327–343.
59. Bui, D.T.; Pradhan, B.; Lofman, O.; Revhaug, I.; Dick, O.B. Spatial prediction of landslide hazards in hoa binh province (vietnam): A comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. Catena 2012, 96, 28–40.
60. Tsai, F.; Lai, J.-S.; Chen, W.W.; Lin, T.-H. Analysis of topographic and vegetative factors with data mining for landslide verification. Ecol. Eng. 2013, 61, 669–677.
61. Pham, B.T.; Bui, D.; Prakash, I.; Dholakia, M. Evaluation of predictive ability of support vector machines and naive bayes trees methods for spatial prediction of landslides in uttarakhand state (india) using gis. J. Geomat. 2016, 10, 71–79.
62. He, Q.; Shahabi, H.; Shirzadi, A.; Li, S.; Chen, W.; Wang, N.; Chai, H.; Bian, H.; Ma, J.; Chen, Y. Landslide spatial modelling using novel bivariate statistical based naïve bayes, rbf classifier and rbf network machine learning algorithms. Sci. Total Environ. 2019, 663, 1–15.
63. Zhang, T.; Han, L.; Chen, W.; Shahabi, H. Hybrid integration approach of entropy with logistic regression and support vector machine for landslide susceptibility modeling. Entropy 2018, 20, 884.
64. Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Pradhan, B.; Chen, W.; Khosravi, K.; Panahi, M.; Bin Ahmad, B.; Saro, L. Land subsidence susceptibility mapping in south korea using machine learning algorithms. Sensors 2018, 18, 2464.
65. Pham, B.T.; Prakash, I.; Singh, S.K.; Shirzadi, A.; Shahabi, H.; Bui, D.T. Landslide susceptibility modeling using reduced error pruning trees and different ensemble techniques: Hybrid machine learning approaches. Catena 2019, 175, 203–218.
66. Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60.
67. Bui, D.T.; Pradhan, B.; Revhaug, I.; Nguyen, D.B.; Pham, H.V.; Bui, Q.N. A novel hybrid evidential belief function-based fuzzy logic model in spatial prediction of rainfall-induced shallow landslides in the lang son city area (vietnam). Geomat. Nat. Hazards Risk 2015, 6, 243–271.
68. He, S.; Pan, P.; Dai, L.; Wang, H.; Liu, J. Application of kernel-based fisher discriminant analysis to map landslide susceptibility in the qinggan river delta, three gorges, china. Geomorphology 2012, 171, 30–41.
69. Aghdam, I.N.; Varzandeh, M.H.M.; Pradhan, B. Landslide susceptibility mapping using an ensemble statistical index (wi) and adaptive neuro-fuzzy inference system (anfis) model at alborz mountains (iran). Environ. Earth Sci. 2016, 75, 1–20.
70. Chang, S.-H.; Wan, S. Discrete rough set analysis of two different soil-behavior-induced landslides in national shei-pa park, taiwan. Geosci. Front. 2015, 6, 807–816.
71. Nguyen, V.V.; Pham, B.T.; Vu, B.T.; Prakash, I.; Jha, S.; Shahabi, H.; Shirzadi, A.; Ba, D.N.; Kumar, R.; Chatterjee, J.M. Hybrid machine learning approaches for landslide susceptibility modeling. Forests 2019, 10, 157.
72. Chen, W.; Panahi, M.; Tsangaratos, P.; Shahabi, H.; Ilia, I.; Panahi, S.; Li, S.; Jaafari, A.; Ahmad, B.B. Applying population-based evolutionary algorithms and a neuro-fuzzy system for modeling landslide susceptibility. Catena 2019, 172, 212–231.
73. Jaafari, A.; Panahi, M.; Pham, B.T.; Shahabi, H.; Bui, D.T.; Rezaie, F.; Lee, S. Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility. Catena 2019, 175, 430–445.
74. Berberian, M.; King, G. Towards a paleogeography and tectonic evolution of iran. Can. J. Earth Sci. 1981, 18, 210–265.
75. Azarnivand, H.; Alizadeh, E.; Sour, A.; Hajibeglo, A. The effects of range management plans of soil properties and rangelands vegetation (case study: Eshtehard rangelands). J. Rangel. Sci. 2012, 2, 625–633.
76. Dhakal, A.S.; Amada, T.; Aniya, M. Landslide hazard mapping and its evaluation using gis: An investigation of sampling schemes for a grid-cell based quantitative method. Photogramm. Eng. Remote Sens. 2000, 66, 981–989.
77. Congalton, R.G.; Green, K. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices; CRC Press: Boca Raton, FL, USA, 2008.
78. Neuhäuser, B.; Damm, B.; Terhorst, B. Gis-based assessment of landslide susceptibility on the base of the weights-of-evidence model. Landslides 2012, 9, 511–528.
79. Chen, W.; Shahabi, H.; Zhang, S.; Khosravi, K.; Shirzadi, A.; Chapi, K.; Pham, B.; Zhang, T.; Zhang, L.; Chai, H. Landslide susceptibility modeling based on gis and novel bagging-based kernel logistic regression. Appl. Sci. 2018, 8, 2540.
80. Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Hoang, N.-D.; Pham, B.; Bui, Q.-T.; Tran, C.-T.; Panahi, M.; Bin Ahamd, B. A novel integrated approach of relevance vector machine optimized by imperialist competitive algorithm for spatial modeling of shallow landslides. Remote Sens. 2018, 10, 1538.
81. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
82. Sun, J.; Jia, M.-Y.; Li, H. Adaboost ensemble for financial distress prediction: An empirical comparison with data from chinese listed companies. Expert Syst. Appl. 2011, 38, 9305–9312.
83. Wang, S.-J.; Mathew, A.; Chen, Y.; Xi, L.-F.; Ma, L.; Lee, J. Empirical analysis of support vector machine ensemble classifiers. Expert Syst. Appl. 2009, 36, 6466–6476.
84. West, D.; Dellana, S.; Qian, J. Neural network ensemble strategies for financial decision applications. Comput. Oper. Res. 2005, 32, 2543–2559.
85. Hong, H.; Liu, J.; Bui, D.T.; Pradhan, B.; Acharya, T.D.; Pham, B.T.; Zhu, A.-X.; Chen, W.; Ahmad, B.B. Landslide susceptibility mapping using j48 decision tree with adaboost, bagging and rotation forest ensembles in the guangchang area (china). Catena 2018, 163, 399–413.
86. Schwenk, H.; Bengio, Y. Boosting neural networks. Neural Comput. 2000, 12, 1869–1887.
87. Roe, B.P.; Yang, H.-J.; Zhu, J.; Liu, Y.; Stancu, I.; McGregor, G. Boosted decision trees as an alternative to artificial neural networks for particle identification. Nuclear Instrum. Methods Phys. Res. Sect. A 2005, 543, 577–584.
88. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of the 19th International Conference on Computational Statistics, Paris, France, 22–27 August 2010; pp. 177–186.
89. Tsuruoka, Y.; Tsujii, J.I.; Ananiadou, S. Stochastic gradient descent training for l1-regularized log-linear models with cumulative penalty. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language, Singapore, 2–7 August 2009; pp. 477–485.
90. Gardner, W.A. Learning characteristics of stochastic-gradient-descent algorithms: A general study, analysis and critique. Signal Process. 1984, 6, 113–133.
91. Pham, B.T.; Tien Bui, D.; Prakash, I.; Nguyen, L.H.; Dholakia, M.B. A comparative study of sequential minimal optimization-based support vector machines, vote feature intervals and logistic regression in landslide susceptibility assessment using gis. Environ. Earth Sci. 2017, 76, 371.
92. Bai, S.-B.; Wang, J.; Lü, G.-N.; Zhou, P.-G.; Hou, S.-S.; Xu, S.-N. Gis-based logistic regression for landslide susceptibility mapping of the zhongxian segment in the three gorges area, china. Geomorphology 2010, 115, 23–31.
93. Lee, S.; Pradhan, B. Landslide hazard mapping at selangor, malaysia using frequency ratio and logistic regression models. Landslides 2007, 4, 33–41.
94. Landwehr, N.; Hall, M.; Frank, E. Logistic model trees. Mach. Learn. 2005, 59, 161–205.
95. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160.
96. Pham, B.T.; Tien Bui, D.; Pourghasemi, H.R.; Indra, P.; Dholakia, M.B. Landslide susceptibility assessment in the uttarakhand area (india) using gis: A comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks and functional trees methods. Theor. Appl. Climatol. 2015, 122, 1–19.
97. Suykens, J.A.; De Brabanter, J.; Lukas, L.; Vandewalle, J. Weighted least squares support vector machines: Robustness and sparse approximation. Neurocomputing 2002, 48, 85–105.
98. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300.
99. Smola, A.J.; Schölkopf, B.; Müller, K.-R. The connection between regularization operators and support vector kernels. Neural Netw. 1998, 11, 637–649.
100. Beguería, S. Validation and evaluation of predictive models in hazard assessment and risk management. Nat. Hazards 2006, 37, 315–329.
101. Pham, B.T.; Nguyen, V.-T.; Ngo, V.-L.; Trinh, P.T.; Ngo, H.T.T.; Bui, D.T. A novel hybrid model of rotation forest based functional trees for landslide susceptibility mapping: A case study at kon tum province, vietnam. In Proceedings of the International Conference on Geo-Spatial Technologies and Earth Resources, Hanoi, Vietnam, 5–6 October 2017; pp. 186–201.
102. Pham, B.T.; Prakash, I.; Bui, D.T. Spatial prediction of landslides using a hybrid machine learning approach based on random subspace and classification and regression trees. Geomorphology 2018, 303, 256–270.
103. Shirzadi, A.; Solaimani, K.; Roshan, M.H.; Kavian, A.; Chapi, K.; Shahabi, H.; Keesstra, S.; Ahmad, B.B.; Bui, D.T. Uncertainties of prediction accuracy in shallow landslide modeling: Sample size and raster resolution. Catena 2019, 178, 172–188.
104. Bui, D.T.; Ho, T.-C.; Pradhan, B.; Pham, B.-T.; Nhu, V.-H.; Revhaug, I. Gis-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with adaboost, bagging and multiboost ensemble frameworks. Environ. Earth Sci. 2016, 75, 1101.
105. Gorsevski, P.V.; Gessler, P.E.; Foltz, R.B.; Elliot, W.J. Spatial prediction of landslide hazard using logistic regression and roc analysis. Trans. GIS 2006, 10, 395–415.
106. Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11.
107. Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Bui, D.T. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at haraz watershed, northern iran. Sci. Total Environ. 2018, 627, 744–755.
108. Bui, D.T.; Ho, T.C.; Revhaug, I.; Pradhan, B.; Nguyen, D.B. Landslide susceptibility mapping along the national road 32 of vietnam using gis-based j48 decision tree classifier and its ensembles. In Cartography from Pole to Pole; Springer: Berlin/Heidelberg, Germany, 2014; pp. 303–317.
109. Li, X.; Wang, L.; Sung, E. Adaboost with svm-based component classifiers. Eng. Appl. Artif. Intell. 2008, 21, 785–795.
110. Viola, P.; Jones, M. Robust Real-Time Face Detection; IEEE: Piscataway Township, NJ, USA, 2001; p. 747.

Figure 1. Location of the study area in Iran; the red circles denote landslides for testing; the red triangles denote landslides for training; the green circles denote non-landslides for testing; and the green triangles denote non-landslides for training.

Figure 2. Some recent landslides in the Sarkhoon watershed.

Figure 3. The overall flowchart of landslide susceptibility modeling in the Sarkhoon watershed.

Figure 4. Factor selection using the least squares support vector machine (LSSVM).

Figure 5. Landslide susceptibility maps using: (a) AdaBoost-stochastic gradient descent (ABSGD); (b) stochastic gradient descent (SGD); (c) logistic regression (LR); (d) logistic model tree (LMT); and (e) functional tree (FT).

Figure 6. Area under the ROC curve (AUC) of the models using: (a) training dataset and (b) validation dataset.

Figure 7. Landslide model validation and comparison using: (a) success rate curve and (b) prediction rate curve.

Table 1. Landslide conditioning factors and their classes.

Factors	Classes	GIS Data Type	Scale	Classification Method
Land use	(1) Dry farming; (2) Sparse forest; (3) Dense forest; (4) Poor rangeland; (5) Good rangeland; (6) Residential area; (7) Rock outcrop	Polygon	1:25,000	Supervised classification
Lithology *	(1) Mmm; (2) MPlsma; (3) PlCb; (4) Qal; (5) Q2t; (6) Q3t; (7) Edj; (8) Klt; (9) Kmg; (10) KlSi	Polygon	1:100,000	Lithological classification
Average annual precipitation (mm)	(1) 523–650; (2) 650–800; (3) 800–950; (4) 950–1100; (5) 1100–1250; (6) 1250<	GRID	10 m × 10 m	Natural breaks
Altitude (m)	(1) 1370–1620; (2) 1620–1870; (3) 1870–2120; (4) 2120–2370; (5) 2370–2620; (6) 2620–2870; (7) 2870–3120; (8) 3120–3375	GRID	10 m × 10 m	Natural breaks
Slope angle (°)	(1) 0–5; (2) 5–10; (3) 10–15; (4) 15–20; (5) 20–30; (6) 30–45; (7) 45<	GRID	10 m × 10 m	Manual
Aspect (°)	(1) −1–0; (2) 0–22.5, 337.5–360; (3) 22.5–67.5; (4) 67.5–112.5; (5) 112.5–157.5; (6) 157.5–202.5; (7) 202.5–247.5; (8) 247.5–292.5; (9) 292.5–337.5	GRID	10 m × 10 m	Azimuth classification
LS	(1) <−70; (2) −70–−45; (3) −45–−15; (4) −15–15; (5) 15–45; (6) 45<	GRID	10 m × 10 m	Natural breaks
General curvature	(1) <−0.1; (2) −0.1–−0.05; (3) −0.05–0; (4) 0–0.05; (5) 0.05<	GRID	10 m × 10 m	Natural breaks
Profile curvature	(1) −1.369–−0.084; (2) −0.084–−0.008; (3) −0.008–0.26	GRID	10 m × 10 m	Natural breaks
Plan curvature	(1) −49.714–−0.0119; (2) −0.0119–0.0008; (3) 0.0008–0.0143; (4) 0.0143–8.3923	GRID	10 m × 10 m	Natural breaks
Longitudinal curvature	(1) <−0.1; (2) −0.1–−0.05; (3) −0.05–0; (4) 0–0.05; (5) 0.05–1; (6) 0.1–1.37	GRID	10 m × 10 m	Natural breaks
Tangential curvature	(1) −1.21–−0.051; (2) −1.21–−0.004; (3) −0.004–0.28	GRID	10 m × 10 m	Natural breaks
Solar radiation	(1) <350,000; (2) 350,000–700,000; (3) 700,000–1,050,000; (4) 1,050,000–1,400,000; (5) 1,400,000–1,750,000; (6) 1,750,000<	GRID	10 m × 10 m	Natural breaks
SPI	(1) 4–6; (2) 6–8; (3) 8–10; (4) 10–12; (5) 12–14; (6) 14–16; (7) 16–18; (8) 18–20	GRID	10 m × 10 m	Natural breaks
TPI	(1) <−30; (2) −30–−15; (3) −15–0; (4) 0–15; (5) 15–30; (6) 30<	GRID	10 m × 10 m	Natural breaks
TWI	(1) 4.71–6.69; (2) 6.69–8.67; (3) 8.67–10.56; (4) 10.56–12.64; (5) 12.64–14.62; (6) 14.62–16.60; (7) 16.60–18.58; (8) 18.58–20.56	GRID	10 m × 10 m	Natural breaks
TRI	(1) <5; (2) 5–15; (3) 15–25; (4) 25–35; (5) 35–45; (6) 45<	GRID	10 m × 10 m	Natural breaks
Distance to stream (m)	(1) 0–100; (2) 100–200; (3) 200–300; (4) 300–400; (5) 400–500; (6) 500<	Line	1:25,000	Manual
Distance to road (m)	(1) 0–100; (2) 100–200; (3) 200–300; (4) 300–400; (5) 400–500; (6) 500<	Line	1:25,000	Manual
Distance to fault (m)	(1) 0–100; (2) 100–200; (3) 200–300; (4) 300–400; (5) 400–500; (6) 500<	Line	1:100,000	Manual

* M^m_m: Olive, grey, green marl (Mishan Formation); MPl^sm_a: Red sandstone and marl (Aghajari Formation); Pl^C_b: Conglomerate with sandstone (Bakhtiari Formation); Q_al: Active stream channel deposits; Q²_t: Quaternary young terraces; Q³_t: Quaternary low-level terraces; E^d_j: Thick to medium bedded grey dolomite (Jahrum Formation); K^l_t: Thick to medium bedded cream fossiliferous limestone (Tarbur Formation); K^m_g: Alternation of bluish grey marl with limestones (Gurpi Formation); K^l_Si: Massive brownish grey limestone (Sarvak-Ilam Formation).

Table 2. Contingency table with four types of possible consequences for the modeling evaluation process.

Predicted class	Actual class: Landslide (1)	Actual class: Non-landslide (0)
Landslide (1)	TP	FP
Non-landslide (0)	FN	TN
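
As a minimal sketch of how the four cells of Table 2 can be tallied, the Python snippet below counts TP, FP, FN, and TN from paired actual and predicted labels (1 = landslide, 0 = non-landslide). The function name `contingency_counts` and the label lists are hypothetical and made up for illustration; they are not data from this study.

```python
from collections import Counter


def contingency_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels: 1 = landslide, 0 = non-landslide."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Key each observation by (predicted class, actual class), as in Table 2.
    pairs = Counter(zip(predicted, actual))
    tp = pairs[(1, 1)]  # predicted landslide, actual landslide
    fp = pairs[(1, 0)]  # predicted landslide, actual non-landslide
    fn = pairs[(0, 1)]  # predicted non-landslide, actual landslide
    tn = pairs[(0, 0)]  # predicted non-landslide, actual non-landslide
    return tp, fp, fn, tn


# Made-up labels, for illustration only (not taken from the study).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = contingency_counts(actual, predicted)
print(dict(zip(("TP", "FP", "FN", "TN"), counts)))
```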

Table 3. Model results and analysis using training and validation datasets.

Measure	ABSGD (T)	ABSGD (V)	SGD (T)	SGD (V)	LR (T)	LR (V)	LMT (T)	LMT (V)	FT (T)	FT (V)
TP	61	20	56	19	58	21	54	18	52	18
TN	57	25	58	23	60	22	59	25	59	21
FP	8	9	13	10	11	8	12	12	10	11
FN	12	4	11	6	9	7	15	4	17	8
SST	0.836	0.833	0.836	0.760	0.866	0.750	0.783	0.818	0.754	0.692
SPC	0.877	0.735	0.817	0.697	0.845	0.733	0.831	0.676	0.855	0.656
ACC	0.855	0.776	0.826	0.724	0.855	0.741	0.807	0.729	0.804	0.672
RMSE	0.323	0.411	0.446	0.531	0.338	0.443	0.451	0.536	0.502	0.540
AUC	0.941	0.861	0.904	0.830	0.917	0.839	0.871	0.731	0.819	0.708

TP: true positive; TN: true negative; FP: false positive; FN: false negative; SST: sensitivity; SPC: specificity; ACC: accuracy; RMSE: root mean square error; AUC: area under the ROC curve; T: training; V: validation.
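
The sensitivity, specificity, and accuracy rows of Table 3 follow directly from the four counts above them. The Python sketch below uses the standard definitions (SST = TP/(TP + FN), SPC = TN/(TN + FP), ACC = (TP + TN)/total) with the ABSGD counts from Table 3. The function name `evaluate` is a hypothetical helper, and RMSE and AUC are left out because they require the predicted probabilities, which the table does not give.

```python
def evaluate(tp, tn, fp, fn):
    """Compute sensitivity, specificity and accuracy from contingency counts."""
    sensitivity = tp / (tp + fn)                # share of actual landslides detected
    specificity = tn / (tn + fp)                # share of actual non-landslides detected
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all cases classified correctly
    return {
        "SST": round(sensitivity, 3),
        "SPC": round(specificity, 3),
        "ACC": round(accuracy, 3),
    }


# ABSGD counts copied from Table 3 (training and validation columns).
absgd_train = evaluate(tp=61, tn=57, fp=8, fn=12)
absgd_valid = evaluate(tp=20, tn=25, fp=9, fn=4)
print("ABSGD training:  ", absgd_train)
print("ABSGD validation:", absgd_valid)
```

Both results agree with the ABSGD columns of Table 3 to three decimal places, and the same check can be repeated for the other four models.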

Table 4. Parameters of machine learning algorithms applied in this study.

Algorithm	Parameters
SGD	Batch size, 100; Debug, False; Do not check capability, False; Not normalized, True; Do not replace missing, False; Epoch, 500; Epsilon, 0.001; Lambda, 0.0001; Learning rate, 0.01; Loss function, Logistic regression; Number of decimal places, 2; Number of seeds, 1.
ABSGD	Batch size, 100; Classifier, SGD; Debug, False; Do not check capability, False; Number of decimal places, 2; Number of iterations, 10; Number of seeds, 1; Use resampling, False; Weight threshold, 100.
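
To make the SGD settings in Table 4 concrete, the following Python sketch trains a logistic-loss classifier by mini-batch stochastic gradient descent with the listed learning rate (0.01), L2 penalty lambda (0.0001), epochs (500), batch size (100), and seed (1). It is an illustrative reimplementation under simplifying assumptions, not the software used in the study: the function names, the toy data, and the exact update rule are assumptions, and the epsilon parameter is not used here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd_logistic(X, y, learning_rate=0.01, lam=0.0001,
                       epochs=500, batch_size=100, seed=1):
    """Mini-batch SGD on the logistic loss with an L2 penalty on the weights."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
                err = p - y[i]  # derivative of the logistic loss w.r.t. the score
                for j in range(n_features):
                    grad_w[j] += err * X[i][j]
                grad_b += err
            m = len(batch)
            for j in range(n_features):
                w[j] -= learning_rate * (grad_w[j] / m + lam * w[j])
            b -= learning_rate * grad_b / m
    return w, b


def predict(w, b, x):
    """Return 1 (landslide) if the modelled probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy, linearly separable data, made up for illustration only.
X = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.5],
     [-1.0, -2.0], [-2.0, -1.0], [-1.5, -1.5], [-2.0, -2.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_sgd_logistic(X, y)
print([predict(w, b, x) for x in X])
```

In the ABSGD configuration of Table 4, a base learner of this kind is wrapped in AdaBoost for 10 boosting iterations; that boosting loop is not shown here.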

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tien Bui, D.; Shahabi, H.; Omidvar, E.; Shirzadi, A.; Geertsema, M.; Clague, J.J.; Khosravi, K.; Pradhan, B.; Pham, B.T.; Chapi, K.; et al. Shallow Landslide Prediction Using a Novel Hybrid Functional Machine Learning Algorithm. Remote Sens. 2019, 11, 931. https://0-doi-org.brum.beds.ac.uk/10.3390/rs11080931
