Article

Assessment of Landslide Susceptibility Using Different Machine Learning Methods in Longnan City, China

1
Geological Hazards Prevention Institute, Gansu Academy of Sciences, Lanzhou 730099, China
2
College of Grassland Agriculture, Northwest Agriculture and Forestry University, Yangling 712100, China
*
Author to whom correspondence should be addressed.
Sustainability 2022, 14(24), 16716; https://0-doi-org.brum.beds.ac.uk/10.3390/su142416716
Submission received: 10 November 2022 / Revised: 1 December 2022 / Accepted: 9 December 2022 / Published: 13 December 2022
(This article belongs to the Section Hazards and Sustainability)

Abstract

In recent decades, with longer periods of extreme climate and the continuous urbanization of China, the threat of landslide disasters has become increasingly serious, and more and more scholars are paying attention to landslide prevention. Landslide susceptibility prediction has emerged in response and can play an important role in the design of land development and urban development schemes in mountainous areas. Based on an analysis of landslide distribution, 12 influencing factors (elevation, slope, aspect, plan curvature, profile curvature, distance to faults, rainfall, distance to rivers, soil types, land cover, Normalized Difference Vegetation Index (NDVI), and distance to roads) were selected to establish the landslide susceptibility evaluation index system. The frequency ratio (FR) model was used to quantitatively analyze the relationship between each factor and the occurrence of landslides. Historical landslide points were randomly divided into training (70% of the total) and validation (30%) sets. Thereafter, decision tree (DT), logistic regression (LR), and random forest (RF) models were used to generate landslide susceptibility maps (LSM), and the predictive performance of the three models was evaluated using receiver operating characteristic (ROC) curves. The FR model results showed that landslides mostly occurred at slopes of 0–15°, elevations of <1000 m, distance to rivers of 0–500 m, rainfall of 750–840 mm, NDVI of 0.8–0.9, distance to roads of 0–500 m, distance to faults of 1500–2000 m, and on transportation land. The RF model showed a strong capability to identify areas highly susceptible to landslides and was the most reliable of the three models. High and very high landslide susceptibility was detected for 29.73% of the land area of Longnan City, Gansu Province, mainly in the eastern, northeastern, and southern regions. The importance ranking of the RF model also revealed that elevation, NDVI, distance to roads, and rainfall dominated the spatial distribution of landslides. Our results could help government agencies and decision-makers make sound decisions on future natural hazard prevention in Longnan City.

1. Introduction

A landslide is the downslope movement of all or part of a rock and soil mass under the action of gravity [1]. Landslides are triggered by natural events (e.g., earthquakes and heavy rainfall) or human activities (e.g., road construction and deforestation) [2,3,4,5,6,7,8]. They are widely distributed in mountainous areas around the world and pose a catastrophic threat to human life, property, and the natural environment [9]. In the past several decades, urbanization in China has greatly promoted the development of western mountainous areas. However, landslides are becoming more frequent due to human activities and extreme climate [3].
Landslide susceptibility assessment can clarify the spatial distribution of landslide occurrence at the regional scale [10] and can provide a scientific basis for landslide disaster prevention and regional development planning [11]. In the past few decades, owing to the rapid development of computer technology, a wide range of machine learning models has been developed and used to map landslide susceptibility alongside traditional statistical models [12,13,14]. For instance, support vector machine (SVM) [15], random forest (RF) [10], decision tree (DT) [16], and artificial neural network (ANN) [17] models have all been used in landslide susceptibility assessment. However, the choice of model affects the credibility and accuracy of the evaluation results [18]. For example, in the Sultan Mountains area in southwest Turkey, Ozdemir et al. [19] compared the frequency ratio (FR), weight of evidence (WOE), and logistic regression (LR) models for landslide susceptibility mapping (LSM) and found that the FR model was more suitable than the other two models for that region. In another study, LR, RF, and ANN models were used to evaluate landslide susceptibility, and the RF model had the better prediction accuracy [20]. This may be because RF models handle high-dimensional data well while limiting overfitting.
In addition, a large body of literature has shown that the selection of landslide-influencing factors also has a significant effect on the LSM [21]. For example, many factors, including topographic conditions (elevation, slope, aspect, plan curvature, profile curvature, etc.), geological conditions (lithology of the formation, distance to faults, etc.), and environmental conditions (land cover, Normalized Difference Vegetation Index (NDVI), vegetation types, soil types, precipitation, distance to rivers, etc.), are applied in the production of the LSM [22,23,24]. In China, however, rapid urbanization has deepened the human transformation of nature, and landslide disasters caused by human activities are becoming more and more serious [25]. Therefore, more evidence is needed to enrich our understanding of the role of human activities (the construction of roads and large buildings) in triggering landslides [9].
Existing studies show that different factors are selected for LSM in different regions. Achour et al. [26] used nine factors affecting landslide occurrence (elevation, slope, aspect, stream power index (SPI), topographic wetness index (TWI), plan curvature, profile curvature, lithology, and NDVI) to evaluate the landslide susceptibility of highway A1. However, some factors that induce landslides (rainfall, land cover, distance to roads, distance to rivers, etc.) were not considered, which affects the accuracy of the susceptibility assessment. Chen et al. [27] took elevation, annual average rainfall, slope, aspect, SPI, sediment transport index (STI), TWI, plan curvature, profile curvature, lithology, NDVI, distance to roads, soil types, land cover, and distance to rivers as the factors in evaluating landslide susceptibility in Zichang City. It is therefore necessary to select factors for landslide susceptibility assessment that suit the actual situation of the study area. Because our study area is mountainous and subject to intense human activity, elevation and distance to roads are included to explore the susceptibility of landslides.
Longnan City has complex mountainous terrain and rainy summers. In addition, with the continuous advancement of urbanization, unreasonable development of mountainous areas leads to frequent landslide disasters. However, there are few studies on regional landslide susceptibility, and a relatively accurate LSM is urgently needed. In this study, based on the results of previous research and the actual situation of the region, we construct a comprehensive index system containing 12 causative factors. First, the frequency ratio (FR) model is used to quantitatively analyze the relationship between each factor and the occurrence of landslides. Second, we generate a relatively accurate LSM using a statistical model (LR) and two machine learning models (DT and RF). We believe that our results and conclusions can identify and delimit landslide-prone areas and thereby provide theoretical support for landslide prevention and the construction planning of major regional projects.

2. Study Area

The study area, Longnan City (104°01′19″–106°35′20″ E, 32°35′45″–34°32′00″ N), is located in the southeast of Gansu Province and has a total land area of 27,000 km² (Figure 1). The average altitude of Longnan City is 1000 m, the highest altitude is 4168 m, and the lowest altitude is 566 m [28]. Longnan City consists of one district and eight counties and can be divided into two climatic parts. The first is the northern warm temperate semi-humid area, which lies north of Wudu District and Kang County. In Li County, the mean annual temperature is 11.4 °C, and mean monthly temperatures range from −1.5 °C in January to 23.2 °C in July [29]. The mean annual precipitation is 500 mm, 64% of which falls from June to September. The second is the southern subtropical sub-humid area, which roughly includes Kang County, Wen County, and Wudu District. In Wudu District, for example, the mean annual temperature is 14.9 °C, and mean monthly temperatures range from 3.7 °C in January to 25.2 °C in July. The mean annual precipitation is 460 mm, >80% of which falls from May to September [29]. The geological structure of the study area is dominated by prominent folding: the Variscan fold belt of the Middle Qinling Mountains, the Indosinian fold belt of the South Qinling Mountains, and the Mesoproterozoic fold belt of Bikou all pass through the area. The lithology is hard, soft rocks are widely distributed in the river valleys, and Vientiane Cave in Wudu District is a typical karst landform area of the northwest [29,30]. The terrain undulates markedly, and the landform is mainly composed of mountains, valleys, hills, and basins, which are distributed in a staggered manner. The study area is the only area in Gansu Province that belongs to the Yangtze River drainage system and has a dense river network. Rivers in the region include the Jialing River, Bailong River, Xihanshui River, Lesu River, and Qinghe River. The vertical zonality of soil type distribution is obvious; the main soils are yellow brown soil, brown soil, and cinnamon soil. The vegetation is mainly alpine meadow and subtropical evergreen deciduous forest. The vegetation is luxuriant, and the vegetation coverage rate reaches 85% [30].

3. Methods

As shown in Figure 2, the workflow of this study has five main steps: (1) historical landslide points are collected and a landslide susceptibility evaluation index system is built according to the actual situation of the region; (2) the relationship between the selected factors and landslides is clarified, and a multicollinearity analysis of the selected factors is carried out; (3) landslide susceptibility is modeled with the FR, LR, DT, and RF models using the 12 selected influencing factors; (4) the nnet, randomForest, and rpart packages in R [31] are used to train the models and make predictions, and the landslide susceptibility map is drawn from the predicted results; (5) the performance of the models is evaluated and the dominant factors that trigger landslides are identified.

3.1. Spatial Dataset and Landslide Inventory

Several data sources were used to investigate the environmental factors leading to landslides, and the susceptibility to landslides was analyzed. All data sources are listed in Table 1.
A landslide inventory records the spatial distribution of existing landslides and is an indispensable tool for landslide disaster prevention and control [37]. In the present study, 1656 historical landslides were obtained from the National Cryosphere Desert Science Data Center (Figure 3a). Field investigations and news reports were used in typical landslide areas to check the accuracy of the historical landslide points. For example, Figure 3b,c show typical landslides in Tanchang County and Cheng County, respectively, as reported in the news, and Figure 3d,e show typical landslides in Wudu District recorded during field investigation. For the subsequent landslide susceptibility analysis, the same number of non-landslide points was randomly selected from areas that were not prone to landslides. The dataset was divided into a training dataset (70%) and a validation dataset (30%) for model training and prediction [9,25].
In the study area, the landslide types mainly include accumulation layer landslides (47.6%), rock landslides (21.9%), complex landslides (13.4%), and loess landslides (17.0%) (Figure 4a). According to volume, landslides are divided into small, medium, large, and giant [29]. Large landslides (10 × 10⁵–10 × 10⁶ m³) are the most numerous in the study area, accounting for 33.9% of the total. Medium landslides (10 × 10⁴–10 × 10⁵ m³) come second, accounting for 30.6% of the total. Giant landslides (≥10 × 10⁶ m³) are the least common, accounting for 7.9% of the total (Figure 4b).
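As an illustration of the 70/30 division, the following is a minimal Python sketch rather than the procedure actually used in the study. The point identifiers, the function name `split_points`, the random seeds, and the choice to split the landslide and non-landslide points separately are all assumptions made for the example.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a copy of the points and split it into training and validation lists."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical point IDs: 1656 landslide points (label 1)
# and the same number of non-landslide points (label 0).
landslides = [(i, 1) for i in range(1656)]
non_landslides = [(i, 0) for i in range(1656, 3312)]

# Splitting each class separately keeps both sets balanced
# (an assumption for this sketch, not stated in the text).
train_l, val_l = split_points(landslides, seed=42)
train_n, val_n = split_points(non_landslides, seed=43)
train = train_l + train_n
validation = val_l + val_n

print(len(train), len(validation))
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same validation set.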

3.2. Landslide Influencing Factors

Because landslides are affected by numerous factors, their formation mechanism is extremely complex [9]. The selection of evaluation factors is therefore a crucial step before landslide susceptibility evaluation. This selection is commonly constrained by the characteristics of the study area, the type of landslide, and the availability of data [38,39,40]. Candidate factors include elevation, slope, aspect, plan curvature, profile curvature, distance to rivers, NDVI, land cover, rainfall, soil types, vegetation types, distance to roads, distance to faults, and geomorphic types. After reviewing the literature and considering the actual local situation, this paper selected 12 influencing factors (elevation, slope, aspect, plan curvature, profile curvature, distance to faults, rainfall, distance to rivers, soil types, land cover, NDVI, and distance to roads) to construct the landslide susceptibility evaluation index system. The relationship between landslide occurrence and each influencing factor is as follows.

3.2.1. Topographic Factors

Elevation is the most common factor in evaluating landslide susceptibility [11,32]. Different elevation ranges produce different soil types and vegetation types, as well as different rainfall and human activities. Therefore, elevation is usually considered an important driving factor of landslides (Figure 5a).
In general, slope is closely related to the stress field of the slope body, which affects the failure mode and dynamic characteristics of landslides [33]. Slope does not have a linear relationship with the occurrence of landslides; in practice, the probability of landslides is higher within a certain slope range [32]. In this study, we reclassified slope into six categories (Figure 5b).
Aspect is another significant topographic factor. It controls microclimate parameters such as rainfall intensity, soil moisture, and slope exposure, thus indirectly affecting the occurrence of landslides [34]. Based on the Digital Elevation Model (DEM), aspect was discretized by azimuth angle at 45° intervals, giving nine categories in total (Figure 5c).
Plan curvature is the curvature of the contour lines, that is, of the line formed where the surface intersects a horizontal plane [32]. The plan curvature is divided into five groups, and it determines the divergence or convergence of water flowing downhill. The plan curvature map was generated from the DEM and divided into eight classes (Figure 5d).
Profile curvature refers to the curvature of the curved line formed by the intersection of the earth’s surface and the vertical plane, and the profile curvature value is a direct reflection of the geometric characteristics of the slope surface [27]. The profile curvature was divided into five groups (Figure 5e).

3.2.2. Geological Factors

Geological structure is an essential factor leading to the occurrence of landslide disasters, and the role of fault zones is the most obvious. The rock mass in a fault zone is broken and its stability is poor, which provides sufficient source material for landslides. At the same time, fault planes, joint planes, and other geological structural planes control the spatial location of the sliding plane and the boundary of the landslide, which favors the formation of landform conditions for landslide disasters [36]. In general, the shorter the distance to a fault, the greater the probability of landslides. Therefore, we used the distance to faults as the index reflecting the effect of faults on landslides, and the index had seven classes (Figure 5f).

3.2.3. Environmental Factors

In mountainous areas of China, rainfall is the main factor triggering landslides [9]. This is because short-term high-intensity rainfall will increase pore water pressure, reduce slope stability, and then lead to landslides. Therefore, in our study, the annual average rainfall data from 1980 to 2018 were used for landslide susceptibility assessment. Rainfall was divided into seven classes (Figure 5g).
Different distribution densities and development degrees of rivers lead to different erosion abilities on slopes and surfaces [41]. The study area has a well-developed water system and a dense river network. Distance to rivers was divided into eight classes (Figure 5h).
Soil type is also an important factor affecting the occurrence of landslides [42]. Different soil types have different characteristics. For example, loess has the characteristics of water sensitivity and impermeability, so it is easy for loess collapsibility to occur when it encounters water, resulting in a lot of landslides. In addition, vegetation varies under different soil types. Soil type was divided into eleven classes (Figure 5i).
Different land cover types have different effects on slope stability. Moreover, the development and utilization of land will cause the destruction of the original vegetation type and the variation of surface runoff, which may cause the surface water to directly wash the slope body, thus causing landslide disasters [42]. Land cover was divided into ten classes (Figure 5j).
The development and distribution of landslides are affected by the vegetation cover [27], mainly through the anchoring effect of the roots and stems of vegetation on the slope surface. This helps to slow down the overland flow velocity and the infiltration velocity. NDVI was reclassified into four classes using the natural breaks method (Figure 5k).

3.2.4. Factors of Human Engineering Activities

The destruction of the natural environment by human activities is one of the most influential factors in the occurrence of landslides. The construction of roads leads to fragmentation of habitats, reduction of vegetation, and an increase in unstable slopes. In this paper, the distribution of roads and the spatial distribution of landslide disaster points are used to reflect the influence of human activities on the development of landslide disasters [27]. Six buffer classes were therefore generated for distance to roads (Figure 5l).

3.3. Multicollinearity Analysis

Before data analysis, the factors controlling landslides must be independent of each other. In other words, a strong linear correlation between the above factors indicates a multicollinearity problem. Multicollinearity makes the prediction of landslide occurrence more difficult and may lead to errors in the results. A tolerance (TOL) > 0.1 or a variance inflation factor (VIF) < 10 indicates that the variables are independent of each other [34]. TOL and VIF are calculated using the following formulas:
$$\mathrm{TOL} = 1 - R_N^2$$
$$\mathrm{VIF} = \frac{1}{\mathrm{TOL}}$$
where $R_N^2$ is the coefficient of determination obtained by regressing a given factor on all the other factors. When the VIF value is >10 or the TOL value is <0.1, the corresponding factor is affected by multicollinearity and should be eliminated from the landslide prediction model.
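The TOL and VIF screening can be sketched as follows. This is a minimal, standard-library Python illustration and not the code used in the study. It relies on the standard identity that the VIF of each factor equals the corresponding diagonal element of the inverse of the correlation matrix of the factors. The function names and the two example columns are hypothetical.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for a dozen factors)."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif_and_tol(columns):
    """Return a (VIF, TOL) pair per factor, with TOL = 1 / VIF = 1 - R^2."""
    inverse = invert(correlation_matrix(columns))
    return [(inverse[i][i], 1.0 / inverse[i][i]) for i in range(len(columns))]

# Two hypothetical factor columns whose Pearson correlation is 0.6.
factor_a = [1.0, 2.0, 3.0, 4.0]
factor_b = [2.0, 1.0, 4.0, 3.0]
results = vif_and_tol([factor_a, factor_b])
for vif, tol in results:
    # A factor is flagged when VIF > 10 or TOL < 0.1.
    print(round(vif, 4), round(tol, 4), vif > 10 or tol < 0.1)
```

With only two factors, $R_N^2$ is simply the squared correlation between them, so a correlation of 0.6 gives TOL = 0.64 and VIF = 1.5625, well within the thresholds.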

3.4. Modeling Approach

3.4.1. Frequency Ratio (FR) Model

The FR model is typically used to calculate the probability of landslide occurrence of each influencing factor in different grading intervals and to analyze the spatial relationship between landslide distribution and grading of each influencing factor [43]. The basic idea is to indirectly determine the input variables of the model by calculating the influence degree of each sub-interval of environmental factors on landslide development [27,44]. FR is generally an index reflecting the distribution density of a landslide within a certain range. When FR > 1, it indicates that the environmental factor interval is strongly correlated with landslides, and the landslide occurrence probability is large. When FR < 1, the correlation between the environmental factor interval and landslide is weak, and the probability of landslides is small. The frequency ratio can be calculated as follows:
$$\mathrm{FR} = \frac{A / A'}{S / S'}$$
where $A$ is the number of landslides in a particular class of a factor; $A'$ is the total number of landslides; $S$ is the number of pixels of that class; and $S'$ is the total number of pixels.
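A short Python sketch of the frequency ratio calculation is given below for illustration. The class labels and the landslide and pixel counts are invented for the example and are not taken from Table 3; the function name `frequency_ratio` is likewise hypothetical.

```python
def frequency_ratio(landslides_per_class, pixels_per_class):
    """FR = (A / A') / (S / S') for every class of one influencing factor."""
    total_landslides = sum(landslides_per_class.values())   # A'
    total_pixels = sum(pixels_per_class.values())           # S'
    return {
        # Algebraically the same as (A / A') / (S / S').
        cls: (landslides_per_class[cls] * total_pixels)
             / (total_landslides * pixels_per_class[cls])
        for cls in pixels_per_class
    }

# Hypothetical counts for three classes of a single factor.
landslide_counts = {"class_1": 30, "class_2": 50, "class_3": 20}
pixel_counts = {"class_1": 10, "class_2": 50, "class_3": 40}

fr = frequency_ratio(landslide_counts, pixel_counts)
for cls, value in fr.items():
    # FR > 1 marks a class with more landslides than its share of area would suggest.
    print(cls, round(value, 2), "strong" if value > 1 else "weak")
```

In this example the first class holds 30% of the landslides on 10% of the pixels, so its FR is 3, whereas the third class holds 20% of the landslides on 40% of the pixels, so its FR is 0.5.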

3.4.2. Logistic Regression (LR) Model

The LR model is a multivariate statistical analysis method for studying the relationship between a binary outcome and several unrelated influencing factors (x1, x2, …, xn) [45]. The model can be used to predict the probability of landslide occurrence by studying the relationship between landslide susceptibility and disaster factors, in which the independent variables are the factor index values (x1, x2, …, xn). The dependent variable is whether a landslide occurs, coded as 1 (landslide) or 0 (no landslide). Independent variables can be continuous, discrete, or any combination of the two, and they do not need to follow a normal distribution [46]. The logistic regression function is as follows [47]:
$$P(y = 1) = \frac{1}{1 + e^{-(\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}}$$
where $P$ is the probability of landslide occurrence, $\alpha$ is the intercept, and $\beta_1, \beta_2, \ldots, \beta_n$ are the regression coefficients. $P$ ranges from 0 to 1. To facilitate understanding and calculation, the equation is rearranged into the odds $P/(1-P)$ and the natural logarithm of both sides is taken, which gives the linear regression equation:
$$\ln\left(\frac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
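The two forms of the logistic regression equation can be checked numerically with a small Python sketch. The intercept, coefficients, and factor values below are made up for illustration and are not the fitted values of this study.

```python
from math import exp, log

def landslide_probability(alpha, betas, xs):
    """P(y = 1) = 1 / (1 + exp(-(alpha + sum of beta_i * x_i)))."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + exp(-z))

def logit(p):
    """ln(P / (1 - P)), which recovers the linear predictor."""
    return log(p / (1.0 - p))

# Hypothetical intercept, coefficients, and factor values.
alpha = -1.0
betas = [0.8, -0.5]
xs = [2.0, 1.0]

linear_predictor = alpha + sum(b * x for b, x in zip(betas, xs))
p = landslide_probability(alpha, betas, xs)
print(round(p, 4), round(logit(p), 4), round(linear_predictor, 4))
```

Applying the logit to the predicted probability returns the linear predictor, which is why the coefficients can be read as changes in the log-odds of landslide occurrence per unit change of a factor.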

3.4.3. Decision Tree (DT) Model

The DT model is a supervised non-parametric machine learning model that can be applied without prior knowledge of the data distribution. It is easy to build and interpret, and it reduces the complexity of the data and of the relationships between variables. Compared with other models, the DT model is flexible, fast, and robust, and it can handle nonlinear relationships between input features and discrete classes, so that nonlinearity between parameters does not affect the performance of the tree. In addition, the DT model is easy to construct and explain, which makes it easy for decision-makers to use [48,49].

3.4.4. Random Forest (RF) Model

The RF is an ensemble learning method originally proposed by Breiman [50], which builds multiple decision trees from different data subsets and combines the results of these trees by voting to obtain the output of the forest. Numerous existing studies have shown that random forests have considerable tolerance for outliers and noise, are unlikely to over-fit, and have high prediction accuracy and stability [44]. When the RF model is used for classification, n decision trees are built, each tree casts one vote for a class, and the final class is predicted by simple voting. The process of classification and prediction using the RF model is as follows (Figure 6).
  • The RF model draws m samples from the full dataset by random sampling with replacement to form an initial training dataset. Because the sampling is with replacement, nearly a third of the data is not drawn into each initial training dataset; these data are called out-of-bag data and are used to evaluate the model performance.
  • A total of n initial training datasets are extracted in this way, and each initial training dataset is used to train one fully grown decision tree without pruning, giving n classification results.
  • The output of the RF model is the class with the highest average probability over the n decision trees, and its probability value is calculated by the following formula:
    $$P_c = \max\left\{ P_i = \frac{1}{n}\sum_{j=1}^{n} P_{ij} \;\middle|\; i \in I \right\}$$
    where $I$ is the set of all classes; $n$ is the number of decision trees; $P_i$ is the average probability of class $i$; $P_{ij}$ is the probability of class $i$ given by the $j$-th decision tree; and $P_c$ is the probability value corresponding to the finally selected class.
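The bootstrap sampling and the probability averaging described above can be sketched in a few lines of Python. This illustrates the two steps only and is not a random forest implementation: the per-tree probabilities are invented, and the function names, sample size, and random seed are assumptions.

```python
import random

def bootstrap_sample(m, rng):
    """Draw m indices with replacement; return the in-bag list and the out-of-bag set."""
    in_bag = [rng.randrange(m) for _ in range(m)]
    out_of_bag = set(range(m)) - set(in_bag)
    return in_bag, out_of_bag

def forest_decision(tree_probabilities):
    """Average the class probabilities over all trees and return (class, P_c)."""
    n = len(tree_probabilities)
    classes = tree_probabilities[0].keys()
    mean_prob = {c: sum(tree[c] for tree in tree_probabilities) / n for c in classes}
    best = max(mean_prob, key=mean_prob.get)
    return best, mean_prob[best]

# Step 1: one bootstrap sample of 1000 points (hypothetical size and seed).
rng = random.Random(0)
in_bag, out_of_bag = bootstrap_sample(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print("out-of-bag fraction:", oob_fraction)

# Steps 2 and 3: made-up probabilities from n = 3 trees for a single location.
tree_outputs = [
    {"landslide": 0.9, "non-landslide": 0.1},
    {"landslide": 0.6, "non-landslide": 0.4},
    {"landslide": 0.3, "non-landslide": 0.7},
]
label, p_c = forest_decision(tree_outputs)
print(label, round(p_c, 2))
```

For large m the expected out-of-bag fraction is about 1 − 1/e ≈ 0.37, which is the "nearly a third" mentioned in the first step.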

3.5. Validation of Model

Landslide susceptibility prediction is a typical binary classification problem. Confusion matrices are often used to analyze the prediction accuracy of models. In this study, 0.5 was used as the threshold for the predicted value [11]: a predicted value >0.5 is considered a landslide (assigned 1); otherwise, it is considered a non-landslide (assigned 0). Based on the confusion matrix, we selected sensitivity, specificity, and accuracy as the model evaluation indices, which are defined as:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively [51].
Receiver operating characteristic (ROC) curves are widely used to evaluate model performance in landslide binary classification; they take 1-specificity and sensitivity as the X and Y axes, respectively [52]. Sensitivity is the proportion of actual landslides that are correctly predicted as landslides. The 1-specificity is the proportion of actual non-landslides that are wrongly predicted as landslides. The area under the ROC curve is the AUC value, which ranges from 0.5 to 1. A model with a larger AUC value has a better prediction effect [10].
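The evaluation indices can be illustrated with a small Python sketch. The labels and predicted values below are invented. The AUC is computed here with the rank-based (Mann–Whitney) formulation, which equals the area under the ROC curve; this is an assumption about how one might compute it and not necessarily the routine used in the study.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count TP, FP, TN, FN, treating a predicted value > threshold as a landslide."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def auc(y_true, scores):
    """Probability that a random landslide point scores higher than a random non-landslide point."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical validation labels (1 = landslide) and predicted values.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

tp, fp, tn, fn = confusion_counts(y_true, scores)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fp + tn + fn)
auc_value = auc(y_true, scores)
print(round(sensitivity, 3), round(specificity, 3), round(accuracy, 3), round(auc_value, 3))
```

Unlike sensitivity, specificity, and accuracy, the AUC does not depend on the 0.5 threshold, which is why the two kinds of index can rank models differently.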

4. Results

4.1. Considering Multicollinearity of Factors Contributing to Landslide Susceptibility

All landslide impact factors were screened with the multicollinearity test before the LSMs were created using the different models. Table 2 lists the results of the multicollinearity analysis of the landslide influencing factors. The VIF values of all landslide impact factors are smaller than the threshold value (10). Therefore, all impact factors are used to evaluate landslide susceptibility.

4.2. Influencing Factors Analyses Using FR Model

Table 3 summarizes the relationship between landslide occurrence and the influencing factors according to the FR model. There is a strong relationship between elevation and the occurrence of landslides. In the elevation dataset, the <1000 m, 1000–1500 m, and 1500–2000 m sub-classes had FR values above 1 (3.04, 1.55, and 1.05, respectively), indicating that these elevation ranges have high landslide susceptibility. For slope, the 0–15° sub-class had the highest FR value (FR = 1.27). For aspect, flat had the highest FR value (FR = 15.19), followed by southwest, south, west, and southeast (1.20, 1.16, 1.12, and 1.12). For distance to rivers and distance to roads, there was a significant negative correlation between FR value and distance; that is, the FR value increased as the distance decreased. Precipitation is the most important factor inducing landslides in mountainous areas. The 750–840 mm sub-class had the highest FR value (FR = 1.32), followed by the 480–570 mm and 660–750 mm sub-classes with FR values of 1.12 and 1.01, respectively. For NDVI, except for the 0.6–0.8 sub-class, the FR value increased with the class value and reached its maximum in the 0.8–0.9 sub-class. For soil type, the FR values of Yellow frozen soil, Yellow earth, Cinnamon soil, and Carbonate Cinnamon soil were 1.74, 1.55, 1.08, and 1.05, respectively, indicating that these soil types were significantly correlated with landslides. For land cover, the FR value of transportation land was the highest (14.63), followed by residential land, water body, agricultural land, bare land, and industrial and mining storage land (3.12, 2.74, 1.88, 1.72, and 1.29). For distance to faults, the 1500–2000 m sub-class had the highest FR value (1.38), followed by the 0–1500 m sub-class (FR = 1.31). For profile curvature, the 20–30 sub-class had the highest FR value (1.28).

4.3. Landslide Susceptibility Models

The LSMs of the LR, DT, and RF models were compared and analyzed. The LSM of each model was divided into five susceptibility zones, namely, very low, low, moderate, high, and very high. The relative area percentages of these classes in each model were then calculated (Table 4). For the LR model, the significance level varied greatly among the variables. Five factors, namely elevation, rainfall, NDVI, land cover, and distance to roads, were significantly correlated with the probability of landslide occurrence, whereas the other seven factors were not (Table 5). The proportions of very low, low, moderate, high, and very high levels were 22.48% (6272.23 km²), 24.70% (6891.50 km²), 21.94% (6120.90 km²), 19.04% (5311.27 km²), and 11.84% (3304.09 km²), respectively (Table 4; Figure 7). For the DT model, Figure 8 shows the pruned regression tree; pruning takes the full tree as the first argument and the chosen complexity parameter as the second. According to Figure 8, the most influential factors were elevation, NDVI, and distance to faults, in that order. The proportions of very low, low, moderate, high, and very high levels were 15.52% (4329.67 km²), 17.73% (4945.91 km²), 16.59% (4629.25 km²), 18.38% (5127.19 km²), and 31.78% (8867.98 km²), respectively (Table 4; Figure 9). For the RF model, we tuned the hyper-parameter n_estimators to optimize the model. The prediction results indicate that, when the resulting model is applied to new observations, about 25% of the predictions are expected to be wrong; that is, about 75% of the results are accurate, which indicates a better model (Figure 10). The proportions of very low, low, moderate, high, and very high levels were 27.43% (7653.20 km²), 23.24% (6483.50 km²), 19.61% (5470.81 km²), 17.42% (4858.97 km²), and 12.31% (3433.52 km²), respectively (Table 4; Figure 11). In addition, the high and very high susceptibility areas in the study area were mainly distributed in the eastern, northeastern, and southern regions (Figure 7, Figure 9 and Figure 11). Overall, the DT model has the largest proportion of very high and high susceptibility regions (Table 4; Figure 12).

4.4. Accuracy Assessment and Comparison

To evaluate model performance, accuracy values and receiver operating characteristic (ROC) curves were used to test the three models. For the ROC curves, the horizontal axis is the false positive rate (1-specificity), the vertical axis is the true positive rate (sensitivity), and the area under the curve serves as an index of the accuracy of a given method. The accuracy values of the LR, DT, and RF models were 0.73, 0.74, and 0.76, respectively (Table 6). The AUC values of the LR, DT, and RF models were 0.81, 0.77, and 0.83, respectively (Table 6, Figure 13). The false negative rates of the LR, DT, and RF models were 0.23, 0.17, and 0.19, respectively. These results indicate that the prediction power of the RF model was higher than that of the DT and LR models, and that it can be adequately applied to the evaluation of regional landslide susceptibility.

4.5. Relative Importance of Impact Factors

Using the RF algorithm in R, we obtained the relative contribution of the 12 factors to landslide occurrence. Elevation (19.58%), NDVI (12.78%), distance to roads (11.72%), and rainfall (8.51%) had higher weights than the other factors, indicating that these four factors were the most important for landslide occurrence (Figure 14). They were followed by distance to rivers (8.00%), distance to faults (7.55%), slope (6.87%), aspect (6.60%), plan curvature (5.78%), profile curvature (5.13%), land cover (4.24%), and soil types (3.24%).
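The reported weights can be ranked and accumulated to see how much of the total importance the leading factors carry. The following sketch simply reuses the percentages quoted above; `cumulative_ranking` is a hypothetical helper, and nothing here reproduces how R computes the importance scores.

```python
# Relative importance (%) of each factor, as reported for the RF model.
IMPORTANCE = {
    "elevation": 19.58, "NDVI": 12.78, "distance to roads": 11.72,
    "rainfall": 8.51, "distance to rivers": 8.00, "distance to faults": 7.55,
    "slope": 6.87, "aspect": 6.60, "plan curvature": 5.78,
    "profile curvature": 5.13, "land cover": 4.24, "soil types": 3.24,
}

def cumulative_ranking(importance):
    """Sort factors by importance and attach the running total."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    running, rows = 0.0, []
    for name, value in ranked:
        running += value
        rows.append((name, value, round(running, 2)))
    return rows

ranking = cumulative_ranking(IMPORTANCE)
for name, value, running in ranking:
    print(f"{name:<20}{value:>6.2f}{running:>8.2f}")
```

Read this way, the four leading factors together account for slightly more than half of the total importance.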

5. Discussion

5.1. Evaluation of Landslide Susceptibility Model

With changes in the natural environment and the intensification of human activities, landslide disasters have increased in recent decades, often with unpredictable consequences [53]. Landslides are affected by many factors (topographic, geological, environmental, and human), so their causes are complex and their prediction is difficult. LSM is of great significance for the visual analysis of landslide-prone areas, and a high-precision LSM is urgently needed because a low-precision LSM can lead to serious economic losses and irreversible consequences. Therefore, the main objective of this study was to apply a statistical model (LR) and two machine learning models (DT and RF) to regional landslide susceptibility analysis and to compare them, in order to establish a prediction model suitable for landslide susceptibility in mountainous areas and to provide informational support for landslide prevention.
Based on the performance assessment, the RF model performed better than the other two models, with the highest AUC value (0.83). This result agrees with the previous literature [10,54]. For example, Rahmati et al. [54] compared a variety of machine learning models and found that the RF model had the highest prediction accuracy in susceptibility mapping. Dou et al. [10] compared two models for LSM on the Izu-Oshima volcanic island and reached the same conclusion. The RF model showed that the eastern, northeastern, and southern parts of Longnan had a higher landslide risk than other areas.
In addition, high and very high landslide susceptibility was detected for 29.73% of the land area of Longnan City, mainly in the eastern, northeastern, and southern regions (Figure 11). These areas lie mainly in lower-elevation river valleys (Figure 5a) and in areas with more human activity (Figure 5l), which results in a large number of unstable slopes. They also receive relatively high rainfall (Figure 5g), which further destabilizes slopes. As a result, these areas are at higher risk of landslides.

5.2. The Impact of Influencing Factor on Landslide Occurrence

In our study, 12 factors were used to construct the landslide susceptibility models. Before assessing landslide susceptibility, however, it is necessary to check whether the influencing factors are correlated with one another. To this end, we used multicollinearity analysis to estimate the correlation between the factors. The tolerance (TOL) and variance inflation factor (VIF) values confirm that there is no multicollinearity among the 12 landslide influencing factors (Table 2), so all factors were used to evaluate landslide susceptibility.
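The multicollinearity check can be sketched as follows, using the tolerance values listed in Table 2. VIF is the reciprocal of tolerance, so reciprocals of the rounded tolerances may differ slightly from the VIF column. The cut-offs `TOL_MIN = 0.1` and `VIF_MAX = 10` are commonly used rules of thumb and are an assumption here, since this section does not state which thresholds were applied.

```python
# Tolerance (TOL) values as listed in Table 2.
TOL = {
    "elevation": 0.500, "slope": 0.682, "aspect": 0.988,
    "plan curvature": 0.718, "profile curvature": 0.915,
    "distance to faults": 0.862, "rainfall": 0.486,
    "distance to rivers": 0.815, "soil types": 0.724,
    "land cover": 0.756, "NDVI": 0.566, "distance to roads": 0.753,
}

# Assumed rule-of-thumb cut-offs; the section itself does not state thresholds.
TOL_MIN = 0.1
VIF_MAX = 10.0

def vif_from_tolerance(tol):
    """VIF is the reciprocal of tolerance, where TOL = 1 - R^2."""
    return 1.0 / tol

def collinear_factors(tolerances):
    """Return the factors that violate either cut-off."""
    return [
        name for name, tol in tolerances.items()
        if tol < TOL_MIN or vif_from_tolerance(tol) > VIF_MAX
    ]

flagged = collinear_factors(TOL)
largest_vif = max(vif_from_tolerance(t) for t in TOL.values())
print(flagged, round(largest_vif, 3))
```

Under these assumed cut-offs no factor is flagged, which is consistent with the decision to keep all 12 factors.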
Furthermore, to quantify the influence of these factors on landslides, we used the RF algorithm to rank their importance. Elevation (19.58%), NDVI (12.78%), distance to roads (11.72%), and rainfall (8.51%) had higher weights than the other factors, indicating that these four factors were the most important for landslide occurrence (Figure 14). The importance of elevation may be due to the typical mountainous terrain of Longnan City: high-altitude areas are mainly rocky cliffs with strong resistance to weathering, whereas low-altitude areas are prone to landslides because of disturbance by anthropogenic activities [44,55]. A strong relationship between NDVI and landslides has also been reported in the literature [34]. First, surface vegetation protects the slope surface and inhibits rainfall infiltration to a certain extent, which is consistent with the results of Wang et al. [56]. Second, it is widely held that vegetation is correlated with some stable old landslides. Distance to roads was also strongly associated with landslide occurrence (Figure 14). Other recent studies have reached the same conclusion and attribute it to increasing urbanization, which damages the natural environment and exacerbates geological hazards [39,44]. In addition, our results indicate that rainfall is another major factor affecting landslide occurrence. The mean annual precipitation in the study area ranges from 480 to 932 mm, is concentrated mainly from June to September, and often falls as heavy rain [37]. Alvioli et al. [57] also pointed out that sudden rainstorms caused by global warming have become frequent in recent years, leading to frequent landslides.
It is important to note that this ranking does not hold in all regions: the importance of the factors affecting landslides can change with local natural conditions and the intensity of human activity. For example, Ngo et al. [11] reported that slope, land cover, distance to faults, and geology were the four most important factors affecting landslide occurrence.

6. Conclusions

The present research contributes to the comparison and evaluation of RF with DT and LR models in LSM. The major conclusions are summarized as follows:
  • Overall, the RF model outperformed the LR and DT models by 2 and 6 percentage points of AUC (0.83 versus 0.81 and 0.77), respectively, indicating that the RF model had the best landslide prediction performance. High and very high landslide susceptibility was detected for 29.73% of the land area of Longnan City, Gansu Province, mainly in the eastern, northeastern, and southern regions.
  • According to the FR model, landslides were most concentrated (highest frequency ratios) at slopes of 0–15°, elevations of <1000 m, distances to rivers of 0–500 m, rainfall of 750–840 mm, NDVI of 0.8–0.9, distances to roads of 0–500 m, distances to faults of 1500–3000 m, and on transportation land (Table 3).
  • The RF importance ranking indicated that elevation is the most influential variable for landslide occurrence in Longnan City, followed by NDVI, distance to roads, rainfall, distance to rivers, distance to faults, slope, aspect, plan curvature, profile curvature, land cover, and soil types.
  • All the models used in this paper are single models, and the accuracy of the results needs to be further improved; coupled models may be a better approach. This paper identifies the areas with high and very high landslide susceptibility, which offers guidance for the future urban development planning of the study area. However, the internal mechanism of landslide occurrence has not yet been studied. Follow-up research should investigate and sample typical landslides in this area and explore the internal mechanisms of landslide occurrence at the micro-scale using a structural equation model.
Our results help to better understand landslide hazards, which is very useful for developing appropriate hazard management measures.

Author Contributions

J.G. collected all the data. J.G., X.S., L.L., Z.Z. and J.W. analyzed the data. J.G. and X.S. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Postdoctoral Fund of Gansu Academy of Sciences (grant number BSH2021-02) and the Science and Technology Foundation for Young Scientists of Gansu Province (grant number 22JR5RA773). The APC was funded by the Postdoctoral Fund of Gansu Academy of Sciences.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available on request.

Acknowledgments

The authors would like to thank the editor and the reviewers for their valuable suggestions and constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

FR: Frequency ratio
NDVI: Normalized Difference Vegetation Index
DT: Decision tree
LR: Logistic regression
RF: Random forest
LSM: Landslide susceptibility mapping
ROC: Receiver operating characteristic
SVM: Support vector machine
ANN: Artificial neural network
WOE: Weight of evidence
SPI: Stream power index
TWI: Topographic wetness index
STI: Sediment transport index
DEM: Digital Elevation Model
TOL: Tolerance
VIF: Variance inflation factor
AUC: Area under the ROC curve

References

  1. Malamud, B.D.; Turcotte, D.L.; Guzzetti, F.; Reichenbach, P. Landslide inventories and their statistical properties. Earth Surf. Process. Landf. 2004, 29, 687–711.
  2. Sarker, A.A.; Rashid, A.K.M. Landslide and flashflood in Bangladesh. In Disaster Risk Reduction Approaches in Bangladesh; Springer: Tokyo, Japan, 2013; pp. 165–189.
  3. Dang, V.H.; Dieu, T.B.; Tran, X.L.; Hoang, N.-D. Enhancing the accuracy of rainfall-induced landslide prediction along mountain roads with a GIS-based random forest classifier. Bull. Eng. Geol. Environ. 2019, 78, 2835–2849.
  4. Saito, H.; Uchiyama, S.; Hayakawa, Y.S.; Obanawa, H. Landslides triggered by an earthquake and heavy rainfalls at Aso volcano, Japan, detected by UAS and SfM-MVS photogrammetry. Prog. Earth Planet. Sci. 2018, 5, 15.
  5. Hussain, G.; Singh, Y.; Singh, K.; Bhat, G.M. Landslide susceptibility mapping along national highway-1 in Jammu and Kashmir State (India). Innov. Infrastruct. Solut. 2019, 4, 59.
  6. Shao, L. Geological disaster prevention and control and resource protection in mineral resource exploitation region. Int. J. Low-Carbon Technol. 2019, 14, 142–146.
  7. Zhang, W.; Li, H.; Han, L.; Chen, L.; Wang, L. Slope stability prediction using ensemble learning techniques: A case study in Yunyang County, Chongqing, China. J. Rock Mech. Geotech. Eng. 2022, 4, 1089–1099.
  8. Wang, H.; Wang, L.; Zhang, L. Transfer learning improves landslide susceptibility assessment. Gondwana Res. 2022, 15, 8765–8784.
  9. Sun, D.L.; Wen, H.J.; Wang, D.Z.; Xu, J.H. A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm. Geomorphology 2020, 362, 107201.
  10. Dou, J.; Yunus, A.P.; Bui, D.T.; Merghadi, A.; Sahana, M.; Zhu, Z.; Chen, C.-W.; Khosravi, K.; Yang, Y.; Pham, B.T. Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. 2019, 662, 332–346.
  11. Ngo, P.T.T.; Panahi, M.; Khosravi, K.; Ghorbanzadeh, O.; Kariminejad, N.; Cerda, A.; Lee, S. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci. Front. 2021, 12, 505–519.
  12. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378.
  13. An, K.; Kim, S.; Chae, T.; Park, D. Developing an accessible landslide susceptibility model using open-source resources. Sustainability 2018, 10, 293.
  14. Moresi, F.V.; Maesano, M.; Collalti, A.; Sidle, R.C.; Matteucci, G.; Scarascia Mugnozza, G. Mapping Landslide Prediction through a GIS-Based Model: A Case Study in a Catchment in Southern Italy. Geosciences 2020, 10, 309.
  15. Pal, S.C.; Chowdhuri, I. GIS-based spatial prediction of landslide susceptibility using frequency ratio model of Lachung River basin, North Sikkim, India. SN Appl. Sci. 2019, 1, 416.
  16. Tien Bui, D.; Pham, B.T.; Nguyen, Q.P.; Hoang, N.-D. Spatial prediction of rainfall-induced shallow landslides using hybrid integration approach of Least-Squares Support Vector Machines and differential evolution optimization: A case study in Central Vietnam. Int. J. Digit. Earth 2016, 9, 1077–1097.
  17. Chen, W.; Pourghasemi, H.R.; Zhao, Z. A GIS-based comparative study of Dempster-Shafer, logistic regression and artificial neural network models for landslide susceptibility mapping. Geocarto Int. 2017, 32, 367–385.
  18. Tian, N.M.; Lan, H.X.; Wu, Y.M.; Li, L.P. Performance comparison of BP artificial neural network and CART decision tree model in landslide susceptibility prediction. J. Geo-Inf. Sci. 2020, 22, 2304–2316.
  19. Ozdemir, A.; Altural, T. A comparative study of frequency ratio, weights of evidence and logistic regression methods for landslide susceptibility mapping: Sultan Mountains, SW Turkey. J. Asian Earth Sci. 2013, 64, 180–197.
  20. Sevgen, E.; Kocaman, S.; Nefeslioglu, H.A.; Gokceoglu, C. A novel performance assessment approach using photogrammetric techniques for landslide susceptibility mapping with logistic regression, ANN and random forest. Sensors 2019, 19, 3940.
  21. Jebur, M.N.; Pradhan, B.; Tehrany, M.S. Optimization of landslide conditioning factors using very high-resolution airborne laser scanning (LiDAR) data at catchment scale. Remote Sens. Environ. 2014, 152, 150–165.
  22. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856.
  23. Zhu, A.X.; Miao, Y.M.; Yang, L.; Bai, S.B.; Liu, J.Z.; Hong, H.Y. Comparison of the presence-only method and presence-absence method in landslide susceptibility mapping. Catena 2018, 171, 222–233.
  24. Sun, D.; Wen, H.; Zhang, Y.; Xue, M. An optimal sample selection-based logistic regression model of slope physical resistance against rainfall-induced landslide. Nat. Hazards 2021, 105, 1255–1279.
  25. Zhao, W.Y.; Tian, Y.; Wu, L.; Liu, Y. Human Impact Index in Landslide Susceptibility Mapping. In Proceedings of the 18th International Conference on Geoinformatics, Beijing, China, 18–20 June 2010.
  26. Achour, Y.; Pourghasemi, H.R. How do machine learning techniques help in increasing accuracy of landslide susceptibility maps? Geosci. Front. 2020, 11, 871–883.
  27. Chen, X.; Chen, W. GIS-based landslide susceptibility assessment using optimized hybrid machine learning methods. Catena 2021, 196, 104833.
  28. Bai, S.B.; Wang, J.; Thiebes, B.; Cheng, C.; Chang, Z.Y. Susceptibility assessments of the Wenchuan earthquake-triggered landslides in Longnan using logistic regression. Environ. Earth Sci. 2014, 71, 731–743.
  29. Xie, X. Risk Assessment of Geological Disasters in Longnan City Based on GIS. 2016. Available online: http://cdmd.cnki.com.cn/Article/CDMD-10718-1015721029.htm (accessed on 8 July 2022).
  30. Tian, F.; Zhang, J.; Ran, Y.H. Assessment of debris flow disaster hazard and influence factors in Longnan district. J. Catastrophol. 2017, 32, 197–203.
  31. R Core Team. R: A language and environment for statistical computing. R Found. Stat. Comput. 2020. Available online: http://www.R-project.org/ (accessed on 8 July 2022).
  32. Guo, Z.; Shi, Y.; Huang, F.; Fan, X.; Huang, J. Landslide susceptibility zonation method based on C5.0 decision tree and K-means cluster algorithms to improve the efficiency of risk management. Geosci. Front. 2021, 12, 101249.
  33. Hong, H.; Shahabi, H.; Shirzadi, A.; Chen, W.; Chapi, K.; Ahmad, B.B.; Roodposhti, M.S.; Hesar, A.Y.; Tian, Y.; Bui, D.T. Landslide susceptibility assessment at the Wuning area, China: A comparison between multi-criteria decision making, bivariate statistical and machine learning methods. Nat. Hazards 2019, 96, 173–212.
  34. Wang, Q.; Guo, Y.; Li, W.; He, J.; Wu, Z. Predictive modeling of landslide hazards in Wen County, northwestern China based on information value, weights-of-evidence, and certainty factor. Geomat. Nat. Hazards Risk 2019, 10, 820–835.
  35. Chen, W.; Chen, X.; Peng, J.; Panahi, M.; Lee, S. Landslide susceptibility modeling based on ANFIS with teaching-learning-based optimization and Satin bowerbird optimizer. Geosci. Front. 2021, 12, 93–107.
  36. Bui, D.T.; Tsangaratos, P.; Nguyen, V.T.; Liem, N.V.; Trinh, P.T. Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. Catena 2020, 188, 104426.
  37. Tian, Y.; Xu, C.; Ma, S.; Zhang, H. Inventory and spatial distribution of landslides triggered by the 8th August 2017 MW 6.5 Jiuzhaigou earthquake, China. J. Earth Sci. 2019, 30, 206–217.
  38. Guzzetti, F.; Mondini, A.C.; Cardinali, M.; Fiorucci, F.; Santangelo, M.; Chang, K.-T. Landslide inventory maps: New tools for an old problem. Earth-Sci. Rev. 2012, 112, 42–66.
  39. Pham, B.T.; Nguyen-Thoi, T.; Qi, C.; Phong, T.V.; Dou, J.; Ho, L.S.; Le, H.V.; Prakash, I. Coupling RBF neural network with ensemble learning techniques for landslide susceptibility mapping. Catena 2020, 195, 104805.
  40. Pourghasemi, H.R.; Kornejady, A.; Kerle, N.; Shabani, F. Investigating the effects of different landslide positioning techniques, landslide partitioning approaches, and presence-absence balances on landslide susceptibility mapping. Catena 2020, 187, 104364.
  41. Kavzoglu, T.; Sahin, E.K.; Colkesen, I. Selecting optimal conditioning factors in shallow translational landslide susceptibility mapping using genetic algorithm. Eng. Geol. 2015, 192, 101–112.
  42. Yuvaraj, R.M.; Dolui, B. Statistical and machine intelligence based model for landslide susceptibility mapping of Nilgiri district in India. Environ. Chall. 2021, 5, 100211.
  43. Zhang, T.; Han, L.; Han, J.; Li, X.; Zhang, H.; Wang, H. Assessment of landslide susceptibility using integrated ensemble fractal dimension with kernel logistic regression model. Entropy 2019, 21, 218.
  44. Zhou, X.; Wen, H.; Zhang, Y.; Xu, J.; Zhang, W. Landslide susceptibility mapping using hybrid random forest with GeoDetector and RFE for factor optimization. Geosci. Front. 2021, 12, 101211.
  45. Schumacher, M.; Roßner, R.; Vach, W. Neural networks and logistic regression: Part I. Comput. Stat. Data Anal. 1996, 21, 661–682.
  46. Ohlmacher, G.C.; Davis, J.C. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas USA. Eng. Geol. 2003, 69, 331–343.
  47. Hosmer, D.; Lemeshow, S. Wiley Series in Probability and Statistics. In Applied Logistic Regression, 2nd ed.; John Wiley & Sons, Inc.: New York, NY, USA, 2000; p. 354.
  48. Yeon, Y.K.; Han, J.G.; Ryu, K.H. Landslide susceptibility mapping in Injae, Korea, using a decision tree. Eng. Geol. 2010, 116, 274–283.
  49. Kadavi, P.R.; Lee, C.W.; Lee, S. Landslide-susceptibility mapping in Gangwon-do, South Korea, using logistic regression and decision tree models. Environ. Earth Sci. 2019, 78, 116.
  50. Breiman, L.; Cutler, A. Random forest. Mach. Learn. 2001, 45, 5–32.
  51. Pham, B.T.; Shirzadi, A.; Shahabi, H.; Omidvar, E.; Singh, S.K.; Sahana, M.; Asl, D.T.; Ahmad, B.B.; Quoc, N.K.; Lee, S. Landslide susceptibility assessment by novel hybrid machine learning algorithms. Sustainability 2019, 11, 4386.
  52. Brenning, A. Spatial prediction models for landslide hazards: Review, comparison and evaluation. Nat. Hazards Earth Syst. Sci. 2005, 5, 853–862.
  53. He, S.; Pan, P.; Dai, L.; Wang, H.; Liu, J. Application of kernel-based fisher discriminant analysis to map landslide susceptibility in the qinggan river delta, three gorges, China. Geomorphology 2012, 171, 30–41.
  54. Rahmati, O.; Tahmasebipour, N.; Haghizadeh, A.; Pourghasemi, H.R.; Feizizadeh, B. Evaluation of different machine learning models for predicting and mapping the susceptibility of gully erosion. Geomorphology 2017, 298, 118–137.
  55. Ding, Q.; Chen, W.; Hong, H. Application of frequency ratio, weights of evidence and evidential belief function models in landslide susceptibility mapping. Geocarto Int. 2017, 32, 619–639.
  56. Wang, Y.; Fang, Z.C.; Hong, H.Y. Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci. Total Environ. 2019, 666, 975–993.
  57. Alvioli, M.; Guzzetti, F.; Rossi, M. Scaling properties of rainfall induced landslides predicted by a physically based model. Geomorphology 2014, 213, 38–47.
Figure 1. Location of the study area. (a) China; (b) Longnan City.
Figure 2. Flowchart of the methods.
Figure 3. Preparation of the landslide inventory. (a) Distribution of landslides; (b) typical landslide in Tanchang County; (c) typical landslide in Cheng County; (d,e) typical landslides in Wudu District.
Figure 4. (a) Types and (b) volumes of historical landslides.
Figure 5. Landslide conditioning factors: (a) elevation, (b) slope, (c) aspect, (d) plan curvature, (e) profile curvature, (f) distance to faults, (g) rainfall, (h) distance to rivers, (i) soil types, (j) land cover, (k) NDVI, (l) distance to roads.
Figure 6. The process of random forest.
Figure 7. The landslide susceptibility map of the LR model.
Figure 8. Optimally pruned regression tree for the study area.
Figure 9. The landslide susceptibility map of the DT model.
Figure 10. Error rate of the RF model: overall out-of-bag (OOB) error (black line); error for class 0, landslide absent (red line); and error for class 1, landslide present (green line).
Figure 11. The landslide susceptibility map of the RF model.
Figure 12. Area percentages of different landslide susceptibility classes.
Figure 13. ROC curves of three models.
Figure 14. Relative importance calculated from the RF model.
Table 1. Landslide conditioning factors and their classes.
| Factors | Classes | Data source | Scale | Techniques | Ref. |
|---|---|---|---|---|---|
| Elevation (m) | <1000, 1000–1500, 1500–2000, 2000–2500, 2500–3000, 3000–3500, >3500 | www.gscloud.cn/search (8 July 2022), Geospatial data cloud | 30 × 30 m | Digital Elevation Model (DEM) | [32] |
| Slope (°) | 0–15, 15–30, 30–45, 45–60, 60–75, 75–82 | www.gscloud.cn/search (8 July 2022), Geospatial data cloud | 30 × 30 m | DEM | [33] |
| Aspect (°) | F (−1), N (0–22.5; 337.5–360), NE (22.5–67.5), E (67.5–112.5), SE (112.5–157.5), S (157.5–202.5), SW (202.5–247.5), W (247.5–292.5), NW (292.5–337.5) | www.gscloud.cn/search (8 July 2022), Geospatial data cloud | 30 × 30 m | DEM | [34] |
| Plan curvature | 0–10, 10–20, 20–30, 30–40, 40–50, 50–60, 60–70, 70–82 | www.gscloud.cn/search (8 July 2022), Geospatial data cloud | 30 × 30 m | DEM | [32] |
| Profile curvature | 0–10, 10–20, 20–30, 30–40, 40–50 | www.gscloud.cn/search (8 July 2022), Geospatial data cloud | 30 × 30 m | DEM | [32] |
| NDVI | 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–0.9 | https://www.resdc.cn/data.aspx?DATAID=122 (8 July 2022), Resources and Environmental Science and Data Center | | NDVI = (NIR − IR)/(NIR + IR), where NIR and IR are the near-infrared and red bands of the electromagnetic spectrum | [27] |
| Land cover | Water bodies, Grassland, Agricultural land, Residential land, Industrial and mining storage land, Transportations, Woodland, Bare land, Garden land, Land for green, Marsh | https://www.ncdc.ac.cn (11 July 2022), National Data Center for Glacial and Frozen Desert Science | | Supervised classification (maximum likelihood) | [34] |
| Rainfall (mm) | 480–570, 570–660, 660–750, 750–840, 840–932 | https://www.resdc.cn/data.aspx?DATAID=122 (11 July 2022), Resources and Environmental Science and Data Center | | Kriging interpolation method | [35] |
| Soil types | Yellow frozen soil, Cultivated loessial soils, Cinnamon soil, Subalpine meadow steppe soil, Brown earths, Carbonate Cinnamon soil, Dark Chestnut soil, Sticky disc yellow brown earths, Alpine meadow soil, Argillaceous dark brown soil, Yellow earths | https://www.ncdc.ac.cn (11 July 2022), National Data Center for Glacial and Frozen Desert Science | | Digitization process | [27] |
| Distance to rivers (m) | 0–500, 500–1000, 1000–1500, 1500–2000, 2000–2500, >2500 | https://www.webmap.cn/main.do?method=index (12 July 2022), National Catalogue Service For Geographic Information | | Buffering | [32] |
| Distance to faults (m) | 0–1500, 1500–3000, 3000–4500, 4500–6000, 6000–7500, >7500 | https://data.earthquake.cn (12 July 2022), China Earthquake Data center | | Buffering | [36] |
| Distance to roads (m) | 0–500, 500–1000, 1000–1500, 1500–2000, 2000–2500, >2500 | https://www.webmap.cn/main.do?method=index (12 July 2022), National Catalogue Service For Geographic Information | | Buffering | [27] |
Table 2. Multicollinearity analysis of landslide influencing factors.
| Landslide Influencing Factors | TOL | VIF |
|---|---|---|
| Elevation (m) | 0.500 | 1.999 |
| Slope (°) | 0.682 | 1.466 |
| Aspect (°) | 0.988 | 1.002 |
| Plan curvature | 0.718 | 1.392 |
| Profile curvature | 0.915 | 1.093 |
| Distance to faults (m) | 0.862 | 1.161 |
| Rainfall (mm) | 0.486 | 2.056 |
| Distance to rivers (m) | 0.815 | 1.227 |
| Soil types | 0.724 | 1.380 |
| Land cover | 0.756 | 1.323 |
| NDVI | 0.566 | 1.766 |
| Distance to roads (m) | 0.753 | 1.328 |
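Table 3. Analysis of the relationship between each selected factor and landslides.
| Influencing Factor | Class | No. of Landslides | Percent of Landslides (%) | No. of Pixels in Domain | Percentage of Domain (%) | Frequency Ratio |
|---|---|---|---|---|---|---|
| Elevation (m) | <1000 | 200 | 12.07 | 1,391,135 | 3.97 | 3.04 |
| | 1000–1500 | 703 | 42.43 | 9,570,770 | 27.29 | 1.55 |
| | 1500–2000 | 659 | 39.77 | 13,299,098 | 37.92 | 1.05 |
| | 2000–2500 | 92 | 5.55 | 6,947,401 | 19.81 | 0.28 |
| | 2500–3000 | 3 | 0.18 | 3,066,317 | 8.74 | 0.02 |
| | 3000–3500 | 0 | 0.00 | 694,982 | 1.98 | 0.00 |
| | >3500 | 0 | 0.00 | 99,015 | 0.28 | 0.00 |
| Slope (°) | 0–15 | 551 | 33.25 | 9,184,017 | 26.22 | 1.27 |
| | 15–30 | 739 | 44.60 | 16,791,391 | 47.94 | 0.93 |
| | 30–45 | 335 | 20.22 | 8,297,378 | 23.69 | 0.85 |
| | 45–60 | 32 | 1.93 | 738,705 | 2.11 | 0.92 |
| | 60–75 | 0 | 0.00 | 13,107 | 0.04 | 0.00 |
| | 75–82 | 0 | 0.00 | 29 | 0.00 | 0.00 |
| Aspect (°) | F (−1) | 8 | 0.48 | 11,135 | 0.03 | 15.19 |
| | N (0–22.5; 337.5–360) | 170 | 10.26 | 4,801,028 | 13.71 | 0.75 |
| | NE (22.5–67.5) | 179 | 10.80 | 4,637,529 | 13.24 | 0.82 |
| | E (67.5–112.5) | 167 | 10.08 | 3,847,610 | 10.99 | 0.92 |
| | SE (112.5–157.5) | 246 | 14.85 | 4,637,898 | 13.24 | 1.12 |
| | S (157.5–202.5) | 273 | 16.48 | 4,984,234 | 14.23 | 1.16 |
| | SW (202.5–247.5) | 246 | 14.85 | 4,338,038 | 12.39 | 1.20 |
| | W (247.5–292.5) | 185 | 11.16 | 3,498,271 | 9.99 | 1.12 |
| | NW (292.5–337.5) | 183 | 11.04 | 4,268,884 | 12.19 | 0.91 |
| Plan curvature | 0–10 | 246 | 14.85 | 5,488,779 | 15.69 | 0.95 |
| | 10–20 | 376 | 22.69 | 8,061,712 | 23.05 | 0.98 |
| | 20–30 | 313 | 18.89 | 6,297,132 | 18.00 | 1.05 |
| | 30–40 | 211 | 12.73 | 4,315,824 | 12.34 | 1.03 |
| | 40–50 | 136 | 8.21 | 2,991,327 | 8.55 | 0.96 |
| | 50–60 | 119 | 7.18 | 2,408,318 | 6.88 | 1.04 |
| | 60–70 | 100 | 6.04 | 1,952,865 | 5.58 | 1.08 |
| | 70–82 | 156 | 9.41 | 3,464,690 | 9.90 | 0.95 |
| Profile curvature | 0–10 | 1212 | 73.14 | 25,788,508 | 73.72 | 0.99 |
| | 10–20 | 385 | 23.23 | 8,193,863 | 23.42 | 0.99 |
| | 20–30 | 58 | 3.50 | 954,715 | 2.73 | 1.28 |
| | 30–40 | 2 | 0.12 | 43,212 | 0.12 | 0.98 |
| | 40–50 | 0 | 0.00 | 349 | 0.00 | 0.00 |
| Distance to faults (m) | <1500 | 204 | 12.31 | 4168 | 9.37 | 1.31 |
| | 1500–3000 | 212 | 12.79 | 4129 | 9.29 | 1.38 |
| | 3000–4500 | 157 | 9.47 | 4089 | 9.20 | 1.03 |
| | 4500–6000 | 163 | 9.84 | 4064 | 9.14 | 1.08 |
| | 6000–7500 | 176 | 10.62 | 3922 | 8.82 | 1.20 |
| | >7500 | 745 | 44.96 | 24,097 | 54.19 | 0.83 |
| Rainfall (mm) | 480–570 | 699 | 42.18 | 42,037 | 37.73 | 1.12 |
| | 570–660 | 272 | 16.42 | 27,709 | 24.87 | 0.66 |
| | 660–750 | 389 | 23.48 | 25,840 | 23.20 | 1.01 |
| | 750–840 | 282 | 17.02 | 14,345 | 12.88 | 1.32 |
| | 840–932 | 15 | 0.91 | 1472 | 1.32 | 0.69 |
| Distance to rivers (m) | 0–500 | 490 | 29.57 | 8269 | 18.59 | 1.59 |
| | 500–1000 | 365 | 22.03 | 7790 | 17.52 | 1.26 |
| | 1000–1500 | 247 | 14.91 | 7181 | 16.15 | 0.92 |
| | 1500–2000 | 164 | 9.90 | 6072 | 13.65 | 0.72 |
| | 2000–2500 | 130 | 7.85 | 4986 | 11.21 | 0.70 |
| | >2500 | 261 | 15.75 | 10,171 | 22.87 | 0.69 |
| Soil types | Yellow frozen soil | 362 | 21.85 | 4621 | 12.56 | 1.74 |
| | Cultivated loessial soils | 0 | 0.00 | 17 | 0.05 | 0.00 |
| | Cinnamon soil | 313 | 18.89 | 6435 | 17.49 | 1.08 |
| | Subalpine meadow steppe soil | 0 | 0.00 | 487 | 1.32 | 0.00 |
| | Brown earths | 725 | 43.75 | 19,195 | 52.17 | 0.84 |
| | Carbonate Cinnamon soil | 151 | 9.11 | 3244 | 8.82 | 1.03 |
| | Dark Chestnut soil | 0 | 0.00 | 257 | 0.70 | 0.00 |
| | Sticky disc yellow brown earths | 4 | 0.24 | 458 | 1.24 | 0.19 |
| | Alpine meadow soil | 7 | 0.42 | 494 | 1.34 | 0.31 |
| | Argillaceous dark brown soil | 0 | 0.00 | 218 | 0.59 | 0.00 |
| | Yellow earths | 95 | 5.73 | 1365 | 3.71 | 1.55 |
| Land cover | Water body | 13 | 0.78 | 104 | 0.29 | 2.74 |
| | Grassland | 143 | 8.63 | 4095 | 11.26 | 0.77 |
| | Agricultural land | 954 | 57.57 | 11,126 | 30.59 | 1.88 |
| | Residential land | 23 | 1.39 | 162 | 0.45 | 3.12 |
| | Industrial and mining storage land | 1 | 0.06 | 17 | 0.05 | 1.29 |
| | Transportations | 2 | 0.12 | 3 | 0.01 | 14.63 |
| | Woodland | 495 | 29.87 | 20,526 | 56.44 | 0.53 |
| | Bare land | 26 | 1.57 | 332 | 0.91 | 1.72 |
| | Land for green | 0 | 0.00 | 1 | 0.00 | 0.00 |
| | Marsh | 0 | 0.00 | 2 | 0.01 | 0.00 |
| NDVI | 0.2–0.4 | 5 | 0.30 | 104 | 0.37 | 0.80 |
| | 0.4–0.6 | 99 | 5.97 | 1780 | 6.42 | 0.93 |
| | 0.6–0.8 | 804 | 48.52 | 17,379 | 62.64 | 0.77 |
| | 0.8–0.9 | 749 | 45.20 | 8481 | 30.57 | 1.48 |
| Distance to roads (m) | 0–500 | 555 | 33.49 | 7619 | 17.13 | 1.95 |
| | 500–1000 | 336 | 20.28 | 6362 | 14.31 | 1.42 |
| | 1000–1500 | 215 | 12.98 | 5783 | 13.00 | 1.00 |
| | 1500–2000 | 158 | 9.54 | 4910 | 11.04 | 0.86 |
| | 2000–2500 | 104 | 6.28 | 4142 | 9.31 | 0.67 |
| | >2500 | 263 | 15.87 | 15,653 | 35.20 | 0.45 |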
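The columns of Table 3 suggest that each frequency ratio is the share of landslides falling in a class divided by the share of the domain occupied by that class. A minimal Python sketch of that calculation, using the elevation rows of Table 3, is given below; the function name `frequency_ratios` and the data layout are illustrative assumptions.

```python
# Elevation classes from Table 3: (class, landslide count, pixel count).
ELEVATION = [
    ("<1000", 200, 1_391_135),
    ("1000-1500", 703, 9_570_770),
    ("1500-2000", 659, 13_299_098),
    ("2000-2500", 92, 6_947_401),
    ("2500-3000", 3, 3_066_317),
    ("3000-3500", 0, 694_982),
    (">3500", 0, 99_015),
]

def frequency_ratios(rows):
    """FR = share of landslides in a class / share of the domain in that class."""
    total_slides = sum(n for _, n, _ in rows)
    total_pixels = sum(p for _, _, p in rows)
    return {
        label: round((n / total_slides) / (p / total_pixels), 2)
        for label, n, p in rows
    }

elevation_fr = frequency_ratios(ELEVATION)
print(elevation_fr)
```

A ratio above 1 means a class holds more landslides than its share of the area would suggest, which is why a small class can have a high ratio even when most landslides occur elsewhere.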
Table 4. Percentage and area in each susceptibility class.
| Model | Very Low (%) | Low (%) | Moderate (%) | High (%) | Very High (%) |
|---|---|---|---|---|---|
| LR | 22.48 | 24.70 | 21.94 | 19.04 | 11.84 |
| DT | 15.52 | 17.73 | 16.59 | 18.38 | 31.78 |
| RF | 27.43 | 23.24 | 19.61 | 17.42 | 12.31 |
Table 5. Logistic regression (LR) model statistics.
| Factor | Coef. | SD | t-Value | p-Value | Sig. |
|---|---|---|---|---|---|
| Distance to rivers | 0.00004 | 0.00005 | 0.87 | 0.383 | |
| Distance to roads | −0.0002 | 0.00004 | −5.76 | <0.001 | *** |
| Distance to faults | −0.00001 | 0.000006 | −1.92 | 0.054 | |
| Slope | 0.0039 | 0.0055 | 0.70 | 0.483 | |
| Aspect | 0.0002 | 0.0005 | 0.43 | 0.665 | |
| Profile curvature | 0.0003 | 0.0098 | 0.03 | 0.976 | |
| Plan curvature | −0.0016 | 0.0027 | −0.60 | 0.548 | |
| Elevation | −0.0023 | 0.0002 | −13.26 | <0.001 | *** |
| NDVI | −5.1500 | 0.7412 | −6.95 | <0.001 | *** |
| Rainfall | −0.0015 | 0.0007 | −2.07 | 0.039 | * |
| Soil types | 0.0117 | 0.0200 | 0.58 | 0.560 | |
| Land cover | −0.1001 | 0.0239 | −4.19 | <0.001 | *** |
| Constant | 9.6650 | 0.6556 | 14.73 | <0.001 | *** |
| Akaike crit. (AIC) | 2497.9 | | | | |

*** p < 0.001, * p < 0.05.
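To illustrate how coefficients such as those in Table 5 turn into a susceptibility value, the sketch below applies the standard logistic function to a single grid cell. It assumes the coefficients apply directly to raw factor values, which this section does not confirm; the cell values, and in particular the numeric codes used for soil types and land cover, are invented for illustration.

```python
import math

# Coefficients from Table 5 (logistic regression model).
COEF = {
    "distance_to_rivers": 0.00004, "distance_to_roads": -0.0002,
    "distance_to_faults": -0.00001, "slope": 0.0039, "aspect": 0.0002,
    "profile_curvature": 0.0003, "plan_curvature": -0.0016,
    "elevation": -0.0023, "ndvi": -5.1500, "rainfall": -0.0015,
    "soil_types": 0.0117, "land_cover": -0.1001,
}
INTERCEPT = 9.6650

def landslide_probability(cell):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = INTERCEPT + sum(COEF[name] * cell[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical grid cell; the codes for soil types and land cover are invented.
cell = {
    "distance_to_rivers": 400.0, "distance_to_roads": 300.0,
    "distance_to_faults": 2000.0, "slope": 12.0, "aspect": 180.0,
    "profile_curvature": 5.0, "plan_curvature": 15.0,
    "elevation": 1200.0, "ndvi": 0.7, "rainfall": 600.0,
    "soil_types": 3.0, "land_cover": 3.0,
}
p_low = landslide_probability(cell)
# Same cell moved to a higher elevation; the negative coefficient lowers p.
p_high = landslide_probability({**cell, "elevation": 2500.0})
print(round(p_low, 3), round(p_high, 3))
```

Because the elevation coefficient is negative, raising the elevation of the hypothetical cell lowers the predicted probability, in line with the sign reported in Table 5.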
Table 6. Predictive performance of three models.
| Model | Sensitivity | Specificity | Accuracy | AUC Values |
|---|---|---|---|---|
| LR | 0.71 | 0.77 | 0.73 | 0.81 |
| DT | 0.65 | 0.83 | 0.74 | 0.77 |
| RF | 0.71 | 0.81 | 0.76 | 0.83 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Gao, J.; Shi, X.; Li, L.; Zhou, Z.; Wang, J. Assessment of Landslide Susceptibility Using Different Machine Learning Methods in Longnan City, China. Sustainability 2022, 14, 16716. https://0-doi-org.brum.beds.ac.uk/10.3390/su142416716
