A novel GIS-based ensemble technique for flood susceptibility mapping using evidential belief function and support vector machine: Brisbane, Australia

Mahyat Shafapour Tehrany; Lalit Kumar; Farzin Shabani

doi:10.7717/peerj.7653

Mahyat Shafapour Tehrany^1,2, Lalit Kumar^1, Farzin Shabani^1,3,4

^1 School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia

^2 Geospatial Science, School of Science, RMIT University, Melbourne, Australia

^3 ARC Centre of Excellence for Australian Biodiversity and Heritage, Global Ecology, College of Science and Engineering, Flinders University of South Australia, Adelaide, Australia

^4 Department of Biological Sciences, Macquarie University, Sydney, NSW, Australia

Published: 2019-10-09
Accepted: 2019-08-09
Received: 2019-06-02

Academic Editor: Marco Cavalli

Subject Areas: Natural Resource Management, Environmental Impacts
Keywords: Flood susceptibility mapping, Support vector machine, Evidential belief function, Ensemble modeling

Copyright: © 2019 Shafapour Tehrany et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Shafapour Tehrany M, Kumar L, Shabani F. 2019. A novel GIS-based ensemble technique for flood susceptibility mapping using evidential belief function and support vector machine: Brisbane, Australia. PeerJ 7:e7653 https://doi.org/10.7717/peerj.7653

Abstract

In this study, we propose and test a novel ensemble method that combines the evidential belief function (EBF) and support vector machine (SVM) to improve on the accuracy of each individual method in flood susceptibility mapping. The outcome of the proposed method was compared with the results of each individual method. The proposed method was implemented four times using different SVM kernels; hence, the efficiency of each SVM kernel was also assessed. First, a bivariate statistical analysis using EBF was performed to assess the correlations between the classes of each flood conditioning factor and flooding. Subsequently, the outcome of the first stage was used in a multivariate statistical analysis performed by SVM. The highest prediction accuracy, 92.11%, was achieved by the ensemble EBF-SVM method with the radial basis function kernel; this accuracy was 7% and 3% higher than that offered by the individual EBF method and the individual SVM method, respectively. Among all the applied methods, the individual EBF and SVM methods achieved the lowest accuracies. The ensemble methods are more accurate because integrating the methods allows a more detailed assessment of the flooding and conditioning factors, thereby increasing the accuracy of the final map.

Introduction

Climate change and inevitable urbanization have increased the occurrence of floods (Kjeldsen, 2010). The direct consequences of flooding include loss of life, destruction of property, damage to crops, and deterioration of health conditions as a result of waterborne illnesses. Flooding can cause serious damage by dragging huge objects across the land over which the water flows (Fotovatikhah et al., 2018). Large floods can affect wildlife and decrease the level of biodiversity in inundated areas. A decrease in habitat potential and food availability in the affected areas can have long-term effects on the surviving wildlife. Population growth can result in increased construction on floodplains. Smaller dwellings can be built, which results in denser cities and an increased possibility of floods in such areas. More closely constructed dwellings increase the number of houses that are potentially exposed to flood damage. Therefore, the costs of flood damage are considerably high in terms of both damaged assets and human fatalities. It is more important to aim at preventing such disasters than at compensating for the damage. Preventive actions can minimize the possibly irreversible damage caused to buildings, farming, and transportation (Youssef, Pradhan & Sefry, 2016). The regions that are susceptible to floods must be identified to assist governments and agencies in avoiding as much destruction as possible. It is not easy to determine the impact of a flood because it is not tangible; the evaluation requires a considerable amount of time. Conversely, the loss and destruction caused by a flood can be measured more easily (Yi, Lee & Shim, 2010).

There is a need for more studies on floods and floodplain management strategies to improve the existing knowledge of how floods occur under varying climate and catchment conditions. Numerous studies have utilized flood susceptibility mapping (Merz, Thieken & Gocht, 2007; Pradhan, 2010; Pradhan & Youssef, 2011; Tehrany et al., 2014; Van Alphen et al., 2009). However, generating accurate flood forecasts and maps remains an unsolved problem. The rainfall-runoff modeling techniques WetSpa, HYDROTEL, and SWAT are some of the popular hydrological methods (Herder, 2013). Calibration and sensitivity analysis must be performed for these methods (Neitsch et al., 2002). Moreover, it is not easy for researchers with limited expertise in hydrology to implement these methods (Herder, 2013). Therefore, they are less applicable for real-time studies. Bivariate statistical analysis (BSA) and multivariate statistical analysis (MSA) are two forms of quantitative (statistical) techniques (Ayalew & Yamagishi, 2005). BSA analyzes the correlations between the flood inventory map and each conditioning factor (Althuwaynee, Pradhan & Lee, 2012). Each class of a particular conditioning factor is examined separately, and the final probability map is produced by summing the weighted factors. Frequency ratio (FR) and weight-of-evidence (WoE) are two examples of BSA methods. MSA methods such as logistic regression (LR) examine the associations between the different conditioning factors and the flood inventory map simultaneously (Carrara, Crosta & Frattini, 2003). FR and LR have been widely used in studies based on natural hazards (Lee & Pradhan, 2007; Park et al., 2013); both methods involve simple and linear calculation processes. On the one hand, BSA neglects the correlations among the different conditioning factors, which is considered a disadvantage. On the other hand, MSA neglects the influence of the classes of each conditioning factor on the occurrence of floods (Tehrany, Pradhan & Jebur, 2013). Generally, catchments cannot be accurately modeled using simple and linear techniques owing to their complex, dynamic, and non-linear structure. Other available techniques include the WoE, evidential belief function (EBF), artificial neural network (ANN), support vector machine (SVM), and decision tree (DT), which are more advanced and structurally complex. The WoE method, which is a BSA method, is based on Bayesian theory, and it is appropriate for solving decision-making problems under uncertainty. This technique has been applied in several studies based on natural hazards (Pourghasemi et al., 2013b). However, the WoE method shares the drawback of all BSA methods: it does not evaluate the correlation among the different conditioning factors.

The SVM, ANN, and DT methods are known as machine learning methods. They are trained using training datasets; then, the trained model is applied to the whole dataset. Machine learning methods have been used in a variety of applications (Qasem et al., 2019). Fotovatikhah et al. (2018) stated that the ANN method is one of the most popular computational intelligence (CI) methods; it was applied in flood mapping by Campolo, Soldati & Andreussi (2003), Shu & Burn (2004) and Seckin et al. (2013), among others. It can handle errors in the input dataset and gather information from incomplete or contradictory datasets. However, the accuracy of its outcomes decreases in cases where the validation data have values beyond the range of those used to train the model (Kia et al., 2012). In cases where a large number of factors are used in the analysis, the entire modeling process becomes time consuming (Ghalkhani et al., 2013). An adaptive neuro-fuzzy inference system (ANFIS) (Dehghani et al., 2019) is an integrated method created using the ANN and fuzzy inference system (FIS) methods. This method has better capability than the individual ANN method (Tehrany, Pradhan & Jebur, 2014).

The application of the DT method in flood susceptibility analysis has been evaluated by Tehrany, Pradhan & Jebur (2013) in Kelantan, Malaysia. The prediction accuracy of their results proved the proficiency of this method in flood studies. The drawback of using the DT method is the considerable amount of time required to produce the final tree. SVM is a powerful machine learning technique in probability analysis. Using this technique, pixels can be categorized even when the data are not linearly separable. Its processing speed varies based on the selected SVM kernel.

Similar to other natural disasters, floods cause costly and irrecoverable damages to the affected areas. Although it is almost impossible to prevent flooding, high-risk areas can be recognized, and the damages can be considerably reduced by proper management. Appropriate planning and management can be carried out by performing flood susceptibility, hazard, and risk analyses. The areas that are susceptible to floods must be detected; the accuracy of the outcomes is directly associated with the efficiency of the technique used and the accuracy of the dataset used.

Based on the aforementioned literature, there is a lack of optimized techniques for obtaining flood susceptibility maps. In addition, there are several methods such as EBF that have not yet been used for obtaining these maps. The idea of creating a more reliable method by combining two or more techniques may resolve the issues involved in the individual methods (Rokach, 2010). It has been shown by several researchers that an ensemble technique is more efficient in terms of prediction accuracy than individual methods (Lee & Oh, 2012). A recent study that was implemented by Choubin et al. (2019) has proved the proficiency of ensemble modeling in flood analysis. They used three methods for performing the analysis; these include multivariate discriminant analysis, classification and regression trees, and SVM. Tehrany et al. (2014) resolved the issues faced by FR and LR in flood susceptibility mapping by integrating both the methods. A similar method has been tested by Umar et al. (2014) to map landslide-susceptible regions in West Sumatera Province, Indonesia. Although several methods and their applications to flood susceptibility mapping have been examined, an ensemble analysis that includes the integration of EBF and SVM has not been tested for this purpose. The reasons that these methods were chosen are as follows:

Four SVM kernels (linear (LN), polynomial (PL), radial basis function (RBF), and sigmoid (SIG)) provide more detail in the assessment and reliability of the derived ensemble method. If all the kernels in the ensemble method provide a higher accuracy than the individual methods, this demonstrates the proficiency of the ensemble method. Not every method offers the opportunity to evaluate the outcomes using various internal factors in addition to an accuracy assessment technique.
The study conducted by Fotovatikhah et al. (2018) investigated more than a hundred articles on floods. According to their results, SVM methods exhibited lower error rates than other methods.
The EBF method has been rarely used in flood susceptibility mapping, but its application has been repeatedly examined in other natural hazard domains.
EBF is a robust method based on the Dempster-Shafer theory, and relative flexibility is one of its benefits. It is capable of generating reliable outcomes by integrating different factors to decrease the uncertainty (Thiam, 2005). This technique assesses the likelihood that a certain hypothesis is correct, and it estimates how closely the evidence supports that hypothesis. The degree of belief (Bel), degree of uncertainty (Unc), degree of disbelief (Dis), and degree of plausibility (Pls) are the main parameters of EBF, each of which extracts specific information through a different analysis of a dataset.
A combination of SVM and EBF is an integration of a powerful machine learning method and a strong statistical method, respectively.

The individual SVM method and its four kernels have been tested in flood susceptibility mapping by Tehrany et al. (2015). However, EBF is relatively new in the flood susceptibility domain. EBF is mostly used in mineral potential mapping (Carranza, 2009; Ford, Miller & Mol, 2015), landslide mapping (Althuwaynee, Pradhan & Lee, 2012; Lee, Hwang & Park, 2013), land subsidence mapping (Pradhan et al., 2014), forest fire susceptibility mapping (Pourghasemi, 2016), and groundwater mapping (Nampak, Pradhan & Manap, 2014; Pourghasemi & Beheshtirad, 2015). The aim of this study is to enhance the prediction accuracy of the individual SVM and EBF methods in flood susceptibility mapping by combining them. The SVM and EBF methods are also applied individually so that the performance of the new ensemble method can be compared with that of the individual ones. In addition, all the SVM kernel types are used in the ensemble modeling because each has a specific analysis process that produces a different outcome. A comparison between the SVM kernels can assist in identifying the most proficient method for natural hazard studies. Finally, the precision and reliability of the results are assessed using the area under the curve (AUC) method.

Study area and data

The Brisbane River Catchment in Queensland, Australia, was chosen as the study area. This catchment covers three sub-basins: the Bremer River, the Brisbane River, and a part of Lockyer Creek. The study area covers approximately 2,806 km² and is located between latitudes 27°22′12″S and 28°01′48″S and longitudes 152°22′12″E and 153°05′06″E (Fig. 1). In Queensland, the average yearly precipitation ranges from very low values in the southwest to very high values exceeding 2,000 mm around the coastal regions. Even in regions with low precipitation, considerably heavy rainfall takes place in some years, thereby causing floods. Scientists believe that long-term climate change may affect rainfall in this region (Partridge, 2001). Brisbane has a humid subtropical climate with very hot, humid summers and dry, reasonably warm winters. The average temperature is 20.3 °C, and the city receives nearly 1,168 mm of rainfall per year. A destructive flood occurred in Brisbane in 2011 and is used in this study as inventory data. The flood forced the evacuation of many people from towns and cities. Vast areas around the Brisbane River were inundated, and there was significant damage and loss of life.

Figure 1: Study area and inundated points used as inventory data in this research.

DOI: 10.7717/peerj.7653/fig-1

To perform flood susceptibility mapping, two sets of data are required. The first dataset represents the historical flood data that indicate the inundated regions (a flood inventory map). The second dataset consists of the flood contributing parameters, which are known as flood conditioning factors (Lee, Hwang & Park, 2013). Flood inventory data need to be assessed against flood conditioning factors to recognize their significance and impact on the occurrence of floods, because it is typically assumed that floods will occur under the same conditions as before (Fotovatikhah et al., 2018). Then, the inventory data must be divided into training and testing datasets to be used for the training and validation processes, respectively (Tsangaratos & Ilia, 2016). In flood modeling, there is no specific or pre-defined method for splitting the inventory data. The split is typically decided based on the accessibility and quality of the data. Space robustness and time robustness are two measures used for assessment (Althuwaynee et al., 2014a). In time robustness, flood inventory data are split into two periods: past incidences, which represent the training data, and future incidences, which represent the validation data. Multi-temporal data are required for this analysis, wherein each flood is associated with the precipitation data that caused it. In space robustness, flood inventory data are randomly split into two classes: training and testing. When comprehensive flood inventory data are available, integration of these methods is possible (Huabin et al., 2005). In this study, the space robustness technique was used to generate the training and testing datasets.

These training and testing datasets were also used later in the validation stage (Xu, Xu & Yu, 2012). The validation process was implemented by comparing the existing flood locations with the acquired flood susceptibility map. The AUC method, which is described in the methodology section (‘Validation’), was used for the validation. The success and prediction rates of the AUC method were measured using the training and testing datasets, respectively. The success rate represents how well the model fits the training dataset (Tehrany et al., 2015). The prediction capability of the model cannot be assessed by the success rate because it is measured using the flood locations that have already been used for constructing the model. The prediction rate is used instead to evaluate the prediction capability of the model. The prediction rates were measured by comparing the flood susceptibility maps with the flood testing dataset (Bui et al., 2012).
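The distinction between success rate and prediction rate can be illustrated with a small rank-based AUC calculation. The sketch below is a minimal illustration that assumes the ROC form of the AUC; the function name `auc_score` and all sample labels and scores are hypothetical and are not taken from the study.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Over all (flooded, non-flooded) pairs, counts how often the flooded
    point receives the higher susceptibility score (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flooded and one non-flooded point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility values sampled at training and testing points.
train_labels = [1, 1, 1, 0, 0, 0]
train_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
test_labels = [1, 1, 0, 0]
test_scores = [0.7, 0.3, 0.6, 0.2]

success_rate = auc_score(train_labels, train_scores)    # fit to training data
prediction_rate = auc_score(test_labels, test_scores)   # predictive capability
```

Because the success rate reuses the points that built the model, it is usually the more optimistic of the two figures.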

According to the literature, the percentages commonly used to divide the inventory dataset are 30% and 70% for the testing and training datasets, respectively (Abdulwahid & Pradhan, 2017; Chen et al., 2019; Pham et al., 2017a). In the study conducted by Kalantar et al. (2018), the impacts of training data selection on the susceptibility mapping have been evaluated.

Of the 159 flood locations, 106 were used for training, and the remaining 53 were used for validation (Fig. 1). With regard to the data configuration, specific data preparation was required. Two separate data layers were created for training and testing. The training flood locations (106 points) were selected randomly to produce the dependent data consisting of the values 1 and 0, which represent the existence and absence of flooding over a region, respectively. The same number of points (106) were selected as non-flooded areas, and the value 0 was assigned to them. Considering the non-flooded locations in the study area can enhance the accuracy of the results (Tehrany et al., 2015). The rest of the flood events (53 points) were used for testing. The same configuration was also used to create the testing data layer.
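The data configuration described above can be sketched as a random (space-robustness) split followed by pairing each flood point with a non-flooded point. This is a hedged illustration only: the helper names `split_inventory` and `build_layer`, the point identifiers, the pool of 1,000 candidate non-flooded points, and the seed are all assumptions.

```python
import random


def split_inventory(flood_points, n_train, seed=0):
    """Randomly split flood locations into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(flood_points)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def build_layer(flood_subset, non_flood_pool, rng):
    """Pair each flood point (label 1) with one sampled non-flood point (label 0)."""
    non_flood = rng.sample(non_flood_pool, len(flood_subset))
    return [(p, 1) for p in flood_subset] + [(p, 0) for p in non_flood]


# Hypothetical identifiers standing in for the 159 mapped flood locations.
flood_ids = [f"F{i}" for i in range(159)]
non_flood_ids = [f"N{i}" for i in range(1000)]

train_flood, test_flood = split_inventory(flood_ids, n_train=106, seed=42)
rng = random.Random(42)
train_layer = build_layer(train_flood, non_flood_ids, rng)

# Keep the testing layer independent of the non-flood points used for training.
used = {p for p, _ in train_layer}
remaining = [p for p in non_flood_ids if p not in used]
test_layer = build_layer(test_flood, remaining, rng)
```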

In terms of flood conditioning factors, the selection of the most influential parameters is essential. Precipitation is the most significant parameter in the occurrence of floods; flooding is initiated by rainfall but influenced by many other factors (Lawal et al., 2012). During rainfall in a drainage basin, the amount of rain that enters the rivers depends on the condition of the basin, mainly its extent, topography, and land use/land cover (LULC) types (Hölting & Coldewey, 2019). Some rainfall is retained by vegetation and soil, and the remaining rainfall reaches the rivers. Twelve flood conditioning factors (slope, aspect, elevation, curvature, topographic wetness index (TWI), geology, stream power index (SPI), soil, LULC, rainfall, distance from roads, and distance from rivers) were collected from different sources and converted into a raster format with a 5 × 5 m pixel size (Table 1). All the scale factors were classified using the quantile method, and they are presented in Fig. 2. The factors were classified because EBF is a BSA method that assesses the influence of each class of a conditioning factor on a specific event, which, in the current case, is flooding.
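Quantile classification places roughly the same number of pixels in each class. The following sketch shows one simple way to derive class limits and assign classes; the function names, the four-class setting, and the sample slope values are illustrative assumptions, and GIS packages may handle ties and break placement differently.

```python
def quantile_breaks(values, n_classes):
    """Upper class limits that put roughly equal counts in each class."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes + 1):
        # Index of the last value belonging to class k: ceil(k * n / n_classes) - 1.
        idx = (k * n + n_classes - 1) // n_classes - 1
        breaks.append(ordered[idx])
    return breaks


def classify(value, breaks):
    """Return the 1-based class whose upper limit first covers the value."""
    for cls, upper in enumerate(breaks, start=1):
        if value <= upper:
            return cls
    return len(breaks)


# Hypothetical slope values (degrees) for twelve pixels.
slopes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
breaks = quantile_breaks(slopes, n_classes=4)
classes = [classify(v, breaks) for v in slopes]
```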

Table 1: Spatial dataset and data sources.

Conditioning factor	Source	Impact
Altitude	Light Detection and Ranging (LiDAR) data from Australian Government/Geoscience Australia	High-elevation regions help water flow and connect to lower areas around the rivers, causing flooding.
Slope	Derived from DEM	Impact on the extent and velocity of runoff.
Aspect	Derived from DEM	Effect on the amount of precipitation and sunshine.
Curvature	Derived from DEM	Influence on surface infiltration.
SPI	Derived from DEM	Erosive power of the terrain.
TWI	Derived from DEM	Amount of the flow accumulation at any place in a catchment.
Soil	CSIRO website	Soil type and soil structure control the soil saturation and amount of water infiltration in soil.
Geology	Queensland Government website	Impact on rainfall penetration and water flow.
LULC	Classification of SPOT5 imagery; high spatial resolution orthophotography; scanned aerial photos; local expert knowledge	Each LULC type plays a specific role in flooding.
Rainfall	Bureau of Meteorology website	Floods occur after heavy precipitation.
Distance from river	Queensland government website (Wetlandinfo)	Areas closer to the rivers have higher chance of getting flooded.
Distance from road	Department of Transport and Main Roads	Impervious surfaces produce more flooding.

DOI: 10.7717/peerj.7653/table-1

Figure 2: Flood conditioning factors.
(A) Altitude, (B) slope, (C) aspect, (D) curvature, (E) stream power index (SPI), (F) topographic wetness index (TWI), (G) distance from rivers, (H) distance from roads, (I) rainfall, (J) soil types, (K) geology, (L) land use land cover (LULC).

DOI: 10.7717/peerj.7653/fig-2

Floods typically occur in regions with low elevation (Botzen, Aerts & Van den Bergh, 2013). Water moves down the hillsides of mountains and reaches the lower ground; this leads to flooding. Researchers consider altitude an amplifying parameter in the occurrence of floods because it has an impact on the amount and velocity of runoff (Kia et al., 2012). Altitude and its derivatives have vital roles in identifying areas that are susceptible to flooding. More reliable flood analysis can be expected when more accurate topographical data are used (Abdullah, Vojinovic & Rahman, 2013). A digital elevation model (DEM) with a spatial resolution of 5 m, produced from Light Detection and Ranging (LiDAR) data, was used to derive the other related parameters. The slope layer, another topographical parameter, was derived from the DEM and divided into 10 classes, with a maximum angle of 53°. The impact of slope on flooding is related to runoff speed: on steep slopes, water has less time to infiltrate, which increases the water flow. An aspect map with nine classes indicating the direction of the terrain (flat, north, northeast, east, southeast, south, southwest, west, and northwest) was also derived from the DEM. The curvature (slope shape) has three classes: concave (positive values (+)), convex (negative values (−)) and flat (value 0). The water-associated parameters TWI and SPI were also used in the analysis, and they were calculated using the following equations (Tehrany, Pradhan & Jebur, 2014):

(1) $TWI = \ln (A_{s} / \tan β)$

(2) $SPI = A_{s} \tan β$

where A_s is the area of catchment (m²) and β (radians) is the slope gradient.
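As a worked illustration of Eqs. (1) and (2), the sketch below evaluates both indices for two cells that share a catchment area but differ in slope. The function names, the sample values, and the small slope floor `MIN_SLOPE` (added so that perfectly flat cells do not divide by zero) are assumptions made for this example.

```python
import math

MIN_SLOPE = 1e-6  # assumed floor (radians) so flat cells do not divide by zero


def twi(catchment_area, slope_rad):
    """Topographic wetness index, Eq. (1): ln(A_s / tan(beta))."""
    return math.log(catchment_area / math.tan(max(slope_rad, MIN_SLOPE)))


def spi(catchment_area, slope_rad):
    """Stream power index, Eq. (2): A_s * tan(beta)."""
    return catchment_area * math.tan(slope_rad)


# Two hypothetical cells with the same catchment area but different slopes.
area = 500.0
gentle, steep = math.radians(2.0), math.radians(30.0)
print("TWI gentle/steep:", twi(area, gentle), twi(area, steep))
print("SPI gentle/steep:", spi(area, gentle), spi(area, steep))
```

For equal catchment area, the gentler cell has the higher TWI and the steeper cell has the higher SPI, which matches the qualitative behavior the text attributes to the two indices.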

Although both the TWI and SPI factors are derived from the catchment area and slope, each represents different terrain characteristics. SPI measures the erosive power of flowing water (Althuwaynee et al., 2014b). Flooding is expected to occur in the areas with the lowest SPI values, because most areas with high SPI values are sharp and steep lands. There, gravity increases the speed of water flow; consequently, its destructive power increases. On the other hand, the spatial distribution and zones of saturation of the source areas for runoff generation can be identified by measuring the TWI. The TWI is used to measure topographic control on hydrological processes (Chen & Yu, 2011). It shows the water penetration capability in a region and, thus, the areas with potential for floods. Logically, flat terrain absorbs more water than steep terrain, because gravity accelerates the water flowing down hilly slopes. Hence, the TWI in areas around rivers and on flat lands is greater than that in sloping areas. Higher TWI values are usually found in flooded areas.

The distance from a river and the distance from a road were determined using the Euclidean Distance tool, and ten classes were created for each parameter. Urbanization increases the area of impervious surfaces, which increases the hydraulic efficiency of urban basins. Hence, the terrain has less rainfall infiltration capacity, which increases the extent of runoff (Shuster et al., 2005). Owing to the significant role of LULC, this factor was also used in the analysis. Different soil conditions can affect the extent of runoff in the catchment area. Some soil types allow greater infiltration of precipitation than others, which leads to a smaller volume of runoff. Different types of geology can also affect the amount and speed of water flow.

Methodology

The process commenced by performing EBF using the flood training points. The correlation between each class of a conditioning factor and flood occurrence was assessed. All the factors were reclassified using the derived weights and used in SVM analysis as inputs. SVM analysis was performed using all the four kernels (LN, PL, SIG, and RBF) because each kernel has a different method of analysis. To clearly judge the performance of the ensemble methods, EBF and SVM were also applied individually. All six derived susceptibility maps were validated using the AUC technique and the flood testing dataset. The procedure is shown in Fig. 3.
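The hand-off between the two stages, in which each factor class is replaced by its EBF-derived weight before SVM training, can be sketched as a simple lookup. The example below is illustrative only: it assumes that the belief value (Bel) is the weight carried forward, shows just two of the twelve factors, and uses invented class labels and weights.

```python
# Two of the twelve factors, for brevity; names and weights are hypothetical.
FACTORS = ["slope", "lulc"]

bel_tables = {
    "slope": {1: 0.45, 2: 0.35, 3: 0.20},
    "lulc": {"urban": 0.60, "water": 0.30, "forest": 0.10},
}


def to_feature_vector(pixel, tables, factors=FACTORS):
    """Replace each factor's class label with its EBF weight, in a fixed order."""
    return [tables[f][pixel[f]] for f in factors]


# One training pixel described by its class in each factor, plus its label
# (1 = flooded, 0 = non-flooded).
pixel = {"slope": 1, "lulc": "forest"}
label = 1
features = to_feature_vector(pixel, bel_tables)
training_row = (features, label)
```

Rows of this form, built for all training points, would then be the numeric inputs to the SVM stage.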

Evidential belief function (EBF)

The Dempster–Shafer technique is a statistical procedure that is used to recognize spatial integration between dependent and independent factors (Smets, 1994). The Dempster-Shafer theory (DST) of evidence, developed by Dempster (2008), is a generalization of the Bayesian theory of subjective probability. Its major advantages are its relative flexibility in accepting uncertainty and the ability to combine beliefs from multiple sources of evidence (Tehrany et al., 2017).

Suppose that a set of flood conditioning factors C = (C_i, i = 1, 2, 3, …, n) that includes mutually exclusive and exhaustive factors C_i is used in this research. C is known as the frame of discernment. A basic probability assignment is a function $m: P(C) \to [0, 1]$, where $P(C)$ is the set of all subsets of C, including the empty set and C itself. This function is also called a mass function and satisfies $m(Φ) = 0$ and $\sum_{A \subseteq C} m(A) = 1$, where Φ is the empty set and A is any subset of C. $m(A)$ measures the degree to which the evidence supports A, and it is denoted by Bel(A), a belief function. The degree of belief (Bel), degree of uncertainty (Unc), degree of disbelief (Dis), and degree of plausibility (Pls) are the main parameters of EBF (Althuwaynee, Pradhan & Lee, 2012). The difference between Bel and Pls is represented by Unc, which represents ignorance. Dis is the degree of belief that the hypothesis is incorrect for certain evidence. The relationships between these parameters have been previously described, and they include Unc = Pls − Bel, Dis = 1 − Pls = 1 − Unc − Bel, and Bel + Unc + Dis = 1. Where a class C_ij has no flood event, Bel will be zero, and Dis will be reset to zero (Awasthi & Chauhan, 2011).

Both BSA and MSA can be performed using EBF (Carranza, Woldai & Chikambwe, 2005). The numbers of flooded and non-flooded pixels in each class of a flood conditioning factor are measured by overlaying the flood inventory layer on each flood conditioning factor. Assume that N(L) is the total number of inundated pixels, N(C) is the total number of pixels in the study area, C_ij is the jth class of the flood contributing factor C_i (i = 1, 2, 3, …, n), N(C_ij) is the number of pixels in class C_ij, and N(L∩C_ij) is the number of inundated pixels in C_ij. Then, EBF can be calculated as follows (Carranza & Hale, 2003):

(3) $Bel(C_{ij}) = \frac{W_{C_{ij}(Flood)}}{\sum_{j=1}^{n} W_{C_{ij}(Flood)}}$

(4) $W_{C_{ij}(Flood)} = \frac{N(L \cap C_{ij}) / N(L)}{[N(C_{ij}) - N(L \cap C_{ij})] / [N(C) - N(L)]}$

(5) $Dis(C_{ij}) = \frac{W_{C_{ij}(Non-flooded)}}{\sum_{j=1}^{n} W_{C_{ij}(Non-flooded)}}$

where

(6) $W_{C_{ij}(Non-flooded)} = \frac{[N(L) - N(L \cap C_{ij})] / N(L)}{[N(C) - N(L) - N(C_{ij}) + N(L \cap C_{ij})] / [N(C) - N(L)]}.$

The numerator in Eq. (4) is the proportion of flooded pixels in factor class C_ij; the numerator in Eq. (6) is the proportion of flooded pixels that do not occur in factor class C_ij; the denominator in Eq. (4) is the proportion of non-flooded pixels in factor class C_ij; the denominator in Eq. (6) is the proportion of non-flooded pixels in other attributes outside the factor class C_ij. Here, the weight of C_ij is represented by W_{C_ij(Flood)}, which supports the belief that floods are more likely to occur, and W_{C_ij(Non-flood)} represents the weight of C_ij that supports the belief that floods are less likely to occur.
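The weights described above can be computed per class from two pixel counts. The sketch below follows the verbal description of the numerators and denominators given in the preceding paragraph; the function name, the pixel counts, and the assumption that every class contains at least one non-flooded pixel are illustrative and not taken from the study.

```python
def ebf_class_weights(class_pixels, class_flood_pixels):
    """Bel, Dis, Unc and Pls for each class of one conditioning factor.

    class_pixels[j]       -> N(C_ij), pixels in class j
    class_flood_pixels[j] -> N(L ∩ C_ij), flooded pixels in class j
    """
    n_c = sum(class_pixels)          # N(C): all pixels
    n_l = sum(class_flood_pixels)    # N(L): all flooded pixels
    non_flooded = n_c - n_l
    w_flood, w_non = [], []
    for n_cij, n_lc in zip(class_pixels, class_flood_pixels):
        # Flooded share inside the class over non-flooded share inside the class.
        w_flood.append((n_lc / n_l) / ((n_cij - n_lc) / non_flooded))
        # Flooded share outside the class over non-flooded share outside the class.
        w_non.append(((n_l - n_lc) / n_l)
                     / ((non_flooded - n_cij + n_lc) / non_flooded))
    bel = [w / sum(w_flood) for w in w_flood]
    dis = [w / sum(w_non) for w in w_non]
    # A class without any flood pixel has Bel = 0; its Dis is reset to 0 as well.
    dis = [0.0 if n_lc == 0 else d for d, n_lc in zip(dis, class_flood_pixels)]
    unc = [1.0 - b - d for b, d in zip(bel, dis)]
    pls = [1.0 - d for d in dis]
    return {"Bel": bel, "Dis": dis, "Unc": unc, "Pls": pls}


# Hypothetical factor with four classes; flooding concentrates in class 1.
result = ebf_class_weights([1000, 2000, 3000, 4000], [40, 30, 20, 10])
```

In this invented example, belief falls and disbelief rises from class 1 to class 4, as expected when flooded pixels concentrate in the first class.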

Excel and ArcGIS software were used to measure the EBF. Subsequently, Dempster’s rule of combination was applied using a raster calculator in ArcGIS to obtain the four integrated EBFs. The formulae for combining two flood conditioning factors C₁ and C₂ are as follows:

(7) $Bel_{C_{1} C_{2}} = \frac{Bel_{C_{1}} Bel_{C_{2}} + Bel_{C_{1}} Unc_{C_{2}} + Bel_{C_{2}} Unc_{C_{1}}}{1 - Bel_{C_{1}} Dis_{C_{2}} - Dis_{C_{1}} Bel_{C_{2}}}$

(8) $Dis_{C_{1} C_{2}} = \frac{Dis_{C_{1}} Dis_{C_{2}} + Dis_{C_{1}} Unc_{C_{2}} + Dis_{C_{2}} Unc_{C_{1}}}{1 - Bel_{C_{1}} Dis_{C_{2}} - Dis_{C_{1}} Bel_{C_{2}}}$

(9) $Unc_{C_{1} C_{2}} = \frac{Unc_{C_{1}} Unc_{C_{2}}}{1 - Bel_{C_{1}} Dis_{C_{2}} - Dis_{C_{1}} Bel_{C_{2}}}$

Integrated EBF of the flood conditioning factors are implemented sequentially using Eqs. (7)–(9).
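The sequential integration can be sketched as a pairwise combination folded over the factors. In the example below, the belief and disbelief terms follow Eqs. (7) and (8); the uncertainty term uses the standard Dempster form (the product of the two uncertainties over the same normalizing factor), which is assumed here. The function name and the per-pixel values are hypothetical.

```python
from functools import reduce


def combine(e1, e2):
    """Dempster's rule for two evidence triples (Bel, Dis, Unc)."""
    bel1, dis1, unc1 = e1
    bel2, dis2, unc2 = e2
    # Normalizing factor: one minus the conflicting mass.
    k = 1.0 - bel1 * dis2 - dis1 * bel2
    bel = (bel1 * bel2 + bel1 * unc2 + bel2 * unc1) / k
    dis = (dis1 * dis2 + dis1 * unc2 + dis2 * unc1) / k
    unc = (unc1 * unc2) / k
    return bel, dis, unc


# Hypothetical (Bel, Dis, Unc) values of one pixel in three conditioning factors.
evidence = [(0.6, 0.1, 0.3), (0.5, 0.2, 0.3), (0.2, 0.4, 0.4)]

# Factors are integrated one after another: ((C1 + C2) + C3) and so on.
bel, dis, unc = reduce(combine, evidence)
pls = 1.0 - dis
```

Because each input triple sums to one, the combined triple also sums to one, so plausibility can still be read off as 1 − Dis after every step.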

Support vector machine (SVM)

Among the data-driven techniques, machine learning methods offer promising prospects in natural hazard mapping, and they are suitable for nonlinear multi-dimensional modeling problems (Yilmaz, 2010). SVM is based on the statistical learning concept. It contains a stage wherein the model is trained using a training dataset of related input and target output values. After the model is trained, it is used to assess the testing data. There are two main procedures underlying SVM for solving problems (Yao, Tham & Dai, 2008). First, a linear separating hyper-plane is created that splits the data based on their patterns. Second, mathematical functions (kernels) are used to transform the nonlinear data into a linearly separable format (Micheletti et al., 2011).

The method is based on forming a separating hyper-plane from a training dataset. The separating hyper-plane is generated in the original space of n coordinates (x_i: parameters of vector x) between the points of two distinct classes (Shao & Deng, 2012). Values of +1 and −1 are assigned to the pixels that are above and below the hyper-plane, respectively. The training pixels that are closest to the hyper-plane are called support vectors. The rest of the data can be modeled after the decision surface is derived (Pradhan, 2013). SVM finds the maximum margin of separation between the classes and builds the classification hyper-plane in the center of that margin.

Consider a training dataset of instance-label pairs (x_i, y_i) with x_i ∈ Rⁿ, y_i ∈ {1, −1} and i = 1, …, m. In this case study, x represents slope, aspect, elevation, curvature, TWI, geology, SPI, soil, LULC, rainfall, distance from roads, and distance from rivers. The classes of 1 and −1 show the flooded and non-flooded pixels, respectively. Finding the best hyper-plane is the goal of the SVM, which separates pixels into different classes, namely, flooded and non-flooded. A separating hyper-plane can be defined as:

(10) $y_{i} (w \cdot x_{i} + b) \geq 1 - ξ_{i},$

where w gives the orientation of the hyper-plane in the feature space, b is the offset of the hyper-plane from the origin, and ξ_i are the positive slack variables (Cortes & Vapnik, 1995). The optimal hyper-plane is determined by solving the following optimization problem using Lagrangian multipliers (Samui, 2008):

(11) $\text{Maximize} \sum_{i = 1}^{n} α_{i} - \frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} α_{i} α_{j} y_{i} y_{j} (x_{i} \cdot x_{j}),$

(12) $\text{subject to} \sum_{i = 1}^{n} α_{i} y_{i} = 0, \quad 0 \leq α_{i} \leq C,$

where α_i are Lagrange multipliers, C is the penalty, and the slack variables ξ_i allow the penalized constraint violation. Then, the decision function that is used to classify the new data can be written as:

(13) $g (x) = sign (\sum_{i = 1}^{n} y_{i} α_{i} (x_{i} \cdot x) + b) .$

Where the classes cannot be separated by a linear hyper-plane, the original input data may be mapped into a high-dimensional feature space through nonlinear kernel functions. Then, the classification decision function is written as (Pradhan, 2013):

(14) $g (x) = sign (\sum_{i = 1}^{n} y_{i} α_{i} K (x_{i}, x) + b)$

where K(x_i, x) is the kernel function evaluated between the training sample x_i and the sample x to be classified.
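The kernelised decision function can be sketched in a few lines of Python. The support vectors, labels, multipliers and offset below are hypothetical values chosen for illustration, not quantities fitted in this study, and a score of exactly zero is assigned to the +1 class by assumption.

```python
def linear_kernel(u, v):
    """Plain dot product, used here as the default kernel."""
    return sum(a * b for a, b in zip(u, v))


def decision(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Return +1 (flooded) or -1 (non-flooded) for sample x."""
    score = sum(
        y * alpha * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    ) + b
    # Assumption: a score of exactly zero is treated as the +1 class.
    return 1 if score >= 0 else -1


# Hypothetical two-feature example.
svs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [0.5, 0.5]
print(decision((2.0, 1.0), svs, ys, alphas, b=0.0))
print(decision((-2.0, 0.0), svs, ys, alphas, b=0.0))
```

Passing a different function as `kernel` changes the decision surface without changing the rest of the computation.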

All of the conditioning factors were reclassified using the obtained EBF weights and entered into SPSS Modeler to implement the SVM modeling. Selection of the kernel function is very important in SVM modeling (Pradhan, 2013). SPSS Modeler offers four types of SVM kernels: linear (LN), polynomial (PL), radial basis function (RBF), and sigmoid (SIG). RBF is the most popular kernel because it works well in most cases (Yao, Tham & Dai, 2008). RBF has high interpolation capability but lower extrapolation capability (Kavzoglu & Colkesen, 2009). PL shows the opposite behavior, with better extrapolation capability than RBF. SIG and RBF perform in a similar manner for certain parameters; however, RBF offers higher accuracy (Song et al., 2011). The LN kernel is less popular because it is based on a linear assumption. Using different kernels results in different outcomes. Therefore, in this study, all the kernels were used in ensemble modeling to find the optimal results and compare the outputs. The mathematical representation of each kernel is listed in Table 2 (Pourghasemi et al., 2013a).

In Table 2, γ (gamma) is a common parameter for all kernels except LN; d is the polynomial degree in the polynomial kernel function; and r is the bias term in the polynomial and sigmoid kernel functions. The parameters γ, d and r are defined by the user. The choice of these parameters directly influences the reliability and correctness of the SVM outcomes (Ballabio & Sterlacchini, 2012). The two other SVM parameters, the cost of constraint violation (C) and epsilon (ε), were considered constant throughout the analysis.

Because the selection of the kernel parameters listed in Table 2 is so important, a cross-validation method was used instead of trial and error (Zhuang & Dai, 2006). This process commenced with the division of the flood inventory into n folds: one fold was kept for accuracy assessment, and the rest were used to train the model (Yao, Tham & Dai, 2008). In this study, the dataset was split into five random folds, each with an equal number of flood points (Table 3). The average of each parameter across the folds was used for the final training.

Table 2:

Different SVM kernel types, their equations and required parameters.

Kernel	Equation	Kernel parameters
RBF	$K (x_{i}, x_{j}) = \exp (- γ ∥ x_{i} - x_{j} ∥^{2})$	γ
LN	$K (x_{i}, x_{j}) = x_{i}^{T} x_{j}$	–
PL	$K (x_{i}, x_{j}) = {(γ x_{i}^{T} x_{j} + r)}^{d}$	γ, d
SIG	$K (x_{i}, x_{j}) = \tanh (γ x_{i}^{T} x_{j} + r)$	γ

DOI: 10.7717/peerj.7653/table-2
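For readers who want to experiment with the four kernels, the following sketch writes them in their standard textbook forms in plain Python. The function and argument names are illustrative assumptions and do not correspond to SPSS Modeler settings; the parameter values in the usage lines are arbitrary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, gamma, r, d):
    return (gamma * dot(xi, xj) + r) ** d


def rbf_kernel(xi, xj, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(xi, xj, gamma, r):
    return math.tanh(gamma * dot(xi, xj) + r)


# Arbitrary two-feature samples and parameter values.
u, v = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, gamma=1.0, r=1.0, d=2))
print(rbf_kernel(u, u, gamma=0.18))
print(sigmoid_kernel(u, v, gamma=0.1, r=0.0))
```

The RBF kernel depends only on the distance between samples, which is consistent with its strong interpolation behavior near the training data, whereas the polynomial kernel grows with the dot product.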

Table 3:

Cross-Validation results.

EBF & RBF-SVM Model	Training fold	Testing fold	γ	C
1	2, 3, 4, 5	1	0.1	20
2	1, 3, 4, 5	2	0.2	10
3	1, 2, 4, 5	3	0.1	10
4	1, 2, 3, 5	4	0.3	12
5	1, 2, 3, 4	5	0.2	15
Mean			0.18	13.5
EBF & SIG-SVM Model	Training fold	Testing fold	γ	C
1	2, 3, 4, 5	1	2	20
2	1, 3, 4, 5	2	1.5	10
3	1, 2, 4, 5	3	1	10
4	1, 2, 3, 5	4	2	10
5	1, 2, 3, 4	5	3	10
Mean			1.9	12
EBF & LN-SVM Model	Training fold	Testing fold	C
1	2, 3, 4, 5	1	10
2	1, 3, 4, 5	2	11
3	1, 2, 4, 5	3	14
4	1, 2, 3, 5	4	15
5	1, 2, 3, 4	5	15
Mean			13
EBF & PL-SVM Model	Training fold	Testing fold	γ	d	C
1	2, 3, 4, 5	1	1	3	10
2	1, 3, 4, 5	2	1	3	20
3	1, 2, 4, 5	3	5	5	20
4	1, 2, 3, 5	4	5	1	10
5	1, 2, 3, 4	5	10	7	10
Mean			4.4	3.8	14

DOI: 10.7717/peerj.7653/table-3
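The averaging step can be reproduced with a short sketch. It uses the per-fold values reported in Table 3 for the EBF and PL-SVM model; the variable names are illustrative, and the snippet only averages already-tuned values rather than performing the tuning itself.

```python
from statistics import mean

# Per-fold parameters for the EBF & PL-SVM model, as listed in Table 3.
pl_folds = [
    {"gamma": 1, "d": 3, "C": 10},
    {"gamma": 1, "d": 3, "C": 20},
    {"gamma": 5, "d": 5, "C": 20},
    {"gamma": 5, "d": 1, "C": 10},
    {"gamma": 10, "d": 7, "C": 10},
]

# Average each parameter over the five folds for the final training run.
final_params = {
    name: mean(fold[name] for fold in pl_folds)
    for name in ("gamma", "d", "C")
}
print(final_params)
```

Note that an averaged polynomial degree need not be an integer, so a real implementation would have to decide how to treat a value such as 3.8.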

Ensemble modeling

To perform the ensemble modeling, all the flood conditioning factors were reclassified based on the acquired EBF weight of C_ij. This stage represents the BSA. The next stage denotes the MSA by reclassifying the conditioning factors using the derived weights from EBF and using them in the SVM analysis. The ensemble method was applied using all four SVM kernels and the parameters obtained from the cross-validation. Consequently, four flood probability indices were derived. In addition, another flood probability map was generated using an individual SVM and an RBF kernel. A stand-alone SVM analysis was performed using the original flood conditioning factors that were not classified by the EBF results. The conditioning factors used in the individual SVM modeling were unclassified and were all in a continuous data format. The reason for this was to examine whether the data format or the use of classified factors can reduce the data variability (Sajedi Hosseini et al., 2018). In addition, individual EBF results were modeled, and a flood probability index was calculated.
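The reclassification step that links the two stages can be sketched as a simple lookup. The example below uses the elevation class breaks and Bel values from Table 4; the function name, the use of Bel alone as the weight, and the treatment of values exactly on a class boundary (assigned to the lower class) are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper bounds (m) of the ten elevation classes and their Bel values (Table 4).
ELEVATION_UPPER_BOUNDS = [23.75, 39.43, 51.19, 66.87, 82.56,
                          98.24, 121.76, 149.21, 204.10, 1000.01]
ELEVATION_BEL = [77, 11, 3, 0, 3, 3, 0, 0, 0, 0]


def reclassify(value, upper_bounds, weights):
    """Replace a continuous value with the EBF weight of the class it falls in."""
    index = bisect_left(upper_bounds, value)
    # Values beyond the last bound are clamped into the last class.
    index = min(index, len(weights) - 1)
    return weights[index]


elevations = [10.0, 30.0, 45.0, 500.0]
weights = [reclassify(e, ELEVATION_UPPER_BOUNDS, ELEVATION_BEL) for e in elevations]
print(weights)  # [77, 11, 3, 0]
```

Applying the same lookup to every conditioning factor yields the weighted layers that serve as SVM inputs in the ensemble, whereas the stand-alone SVM would use the continuous values directly.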

Spatial sensitivity analysis

Uncertainty is an unavoidable factor in every analysis (Rahmati, Pourghasemi & Melesse, 2016). Considering these uncertainties helps in obtaining better interpretations of the model outcomes. Although it is not possible to achieve 100% accuracy, there are several approaches that can be implemented to reduce the uncertainty (Refsgaard et al., 2007). These include uncertainty engine (Brown & Heuvelink, 2007), inverse modeling (predictive uncertainty) (Friedel, 2005), and Monte Carlo analysis (Yang, 2011). Sensitivity analysis (SA) evaluates the impact of conditioning factor variations on model outputs, thereby allowing the quantitative assessment of the relative importance of uncertainty sources (Chen, Yu & Khan, 2010). In this study, the Jackknife test was used to assess the uncertainty among the conditioning factor datasets. This SA technique examines the impact of repeatedly removing every conditioning factor from the dataset on the final outcomes. This means that by using this process, the conditioning factor contribution in the analysis can be recognized. In addition, the percentage of relative decrease (PRD) of the AUC values was measured to investigate the dependency of the model output on the influence of conditioning factors using the following equation:

(15) $PRD = \frac{{AUC}_{all} - {AUC}_{i}}{{AUC}_{i}} \times 100$

where AUC_all indicates the AUC value derived using the full conditioning factor dataset, and AUC_i shows the prediction power of the method when the i-th conditioning factor has been excluded from the dataset.
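As a worked example of Eq. (15), the sketch below computes the PRD for a single excluded factor. The baseline of 92.11 is an assumption (the prediction rate reported for the ensemble EBF-SVM–RBF method), and the drop of 8.23 is the decrease listed for slope in Table 5; the function name is illustrative.

```python
def percent_relative_decrease(auc_all, auc_i):
    """PRD of the AUC when one conditioning factor is left out, Eq. (15)."""
    return (auc_all - auc_i) / auc_i * 100.0


auc_all = 92.11            # assumed baseline AUC (%) with all twelve factors
auc_without_slope = auc_all - 8.23
prd_slope = percent_relative_decrease(auc_all, auc_without_slope)
print(round(prd_slope, 2))  # 9.81
```

Because the denominator is AUC_i rather than AUC_all, the PRD is always slightly larger than the absolute decrease in AUC.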

Validation

To evaluate the efficiency and reliability of the analytical outcomes, the AUC method was used. AUC is a popular assessment technique in natural hazard analysis because it provides an understandable and comprehensive way of performing validation (Beguería, 2006; Hand & Till, 2001). First, the probability index is arranged in descending order. Second, the probability index is classified into one hundred categories on the y-axis with cumulative 1% breaks. Then, the flood occurrence in each class is examined, and the prediction and success rates are derived. The prediction rate is the accuracy achieved using the flood testing points. It shows how successful the applied technique was in predicting the flood-prone areas that were already inundated. Conversely, the success rate is produced using the flood training points, and it shows how well the model fits the data (Tehrany et al., 2015). The AUC ranges between zero and one: a value of 1 represents the maximum accuracy, and 0 indicates the failure of the analysis. In this study, 106 flood locations were used for training and 53 locations were used for testing purposes.
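The procedure can be sketched as follows: sort the pixels by probability index in descending order, record the cumulative share of flood points captured at each break, and integrate the resulting curve with the trapezoidal rule. This is a simplified illustration with hypothetical inputs and function names, not the GIS workflow used in the study.

```python
def rate_curve_auc(index_values, is_flood, n_bins=100):
    """Area under the cumulative flood-capture curve (success or prediction rate)."""
    pairs = sorted(zip(index_values, is_flood), key=lambda p: -p[0])
    total_floods = sum(1 for _, flooded in pairs if flooded)
    if total_floods == 0:
        raise ValueError("at least one flood location is required")
    n = len(pairs)
    curve = [0.0]          # cumulative share of floods captured, starting at 0%
    captured = 0
    position = 0
    for k in range(1, n_bins + 1):
        cut = (k * n) // n_bins   # number of pixels in the top k/n_bins of the map
        while position < cut:
            if pairs[position][1]:
                captured += 1
            position += 1
        curve.append(captured / total_floods)
    width = 1.0 / n_bins
    # Trapezoidal rule over equally spaced breaks.
    return sum((a + b) / 2.0 * width for a, b in zip(curve, curve[1:]))


# Hypothetical four-pixel map, using four breaks instead of one hundred.
index = [0.9, 0.8, 0.2, 0.1]
good = rate_curve_auc(index, [True, True, False, False], n_bins=4)
bad = rate_curve_auc(index, [False, False, True, True], n_bins=4)
print(good, bad)  # 0.75 0.25
```

Running the function with training points gives a success rate, and running it with the held-out testing points gives a prediction rate.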

Results and Discussion

Analyzing the weights derived from each method

EBF was applied, and the weight for each class of the flood conditioning factors was determined. The areas with high values of Bel and low values of Dis are the most susceptible to floods. Table 4 lists the EBF values calculated for the twelve flood conditioning factors. A range of 0.22 to 23.75 m in altitude received the highest Bel (77) and lowest Dis (2) values, thereby indicating the highest susceptibility to floods. Of the remaining altitude classes, all except the second (Bel = 11) had considerably low Bel values, thereby indicating low susceptibility to floods. The EBF results acquired for altitude confirmed that most flooding occurred at low altitudes because the water flowed to and met in the lower areas, thereby indicating that flooding of areas at higher altitudes is almost impossible. The correlation between flood occurrence and slope shows that steep slopes accelerate water flow. A range of 0–0.21° in the slope map attained the highest Bel value of 29 and a low Dis value of 8, followed by the slope range 0.62–1.25°. In the aspect map, the flat class received the highest Bel value of 52 and the lowest Dis value of 8; this shows that floods occur in flat areas because water cannot infiltrate the saturated soil. Moisture preservation and vegetation density are affected by aspect, which therefore also influences flood occurrence. The morphology of the topography is indicated by curvature, which has three categories: concave, convex, and flat. A pixel with a negative curvature value denotes upward concave ground; a pixel with a positive curvature value denotes upward convex ground. A pixel with value zero represents flat ground. The Bel values for the convex and concave categories in the curvature map were lower than that of the flat class, implying lower flood potential. The flat class received Bel and Dis values of 54 and 17, respectively.

Table 4:

Results of EBF in the case of each factor.

Layer	Classes	Pixels in class	Pixels in domain	Bel	Dis
Elevation (m)	0–23.75	1,214,340	81	77	2
	23.75–39.43	1,221,749	12	11	9
	39.43–51.19	1,427,269	4	3	10
	51.19–66.87	1,474,786	1	0	11
	66.87–82.56	1,480,835	4	3	10
	82.56–98.24	1,010,034	3	3	10
	98.24–121.76	1,384,453	1	0	11
	121.76–149.21	1,049,967	0	0	10
	149.21–204.10	1,151,800	0	0	11
	204.10–1000.01	1,056,388	0	0	10
Slope	0–0.21	726,676	19	29	8
	0.21–0.62	1,343,717	18	15	9
	0.62–1.25	1,571,205	30	21	8
	1.25–2.09	1,449,461	18	14	9
	2.09–3.13	1,345,964	10	8	10
	3.13–4.39	1,240,403	3	2	10
	4.39–6.27	1,289,769	5	4	10
	6.27–9.41	1,214,967	1	0	10
	9.41–15.05	1,147,790	1	0	10
	15.05–53.32	1,141,669	1	1	10
Aspect	Flat	498,560	29	52	8
	North	1,593,563	12	6	11
	Northeast	1,708,517	10	5	11
	East	1,813,190	10	4	11
	Southeast	1,527,816	9	5	11
	South	1,216,894	9	6	11
	Southwest	1,147,405	10	7	11
	West	1,391,379	7	4	11
	Northwest	1,574,297	10	5	11
Curvature	Convex	141,143	1	45	40
	Flat	12,201,477	105	54	17
	Concave	129,001	0	0	41
SPI	0	2,479	0	0	11
	0–157700.42	2,479	0	0	11
	157700.42–315400.84	12,450,355	106	100	0
	315400.84–473101.27	10,969	0	0	11
	473101.27–630801.69	3,631	0	0	11
	630801.69–946202.54	1,609	0	0	11
	946202.54–1419303.81	1,030	0	0	11
	1419303.81–2207805.92	595	0	0	11
	2207805.92–4257911.43	367	0	0	11
	4257911.43–40213608	311	0	0	11
TWI	2.595171–5.410969	1,110,528	1	1	10
	5.410969–6.137627	1,289,883	0	0	11
	6.137627–6.773453	1,280,133	1	0	11
	6.773453–7.409278	1,399,032	6	4	10
	7.409278–8.045104	1,374,035	3	2	10
	8.045104–8.680929	1,199,716	9	8	10
	8.680929–9.498419	1,238,009	14	12	9
	9.498419–10.588406	1,214,072	22	20	8
	10.588406–12.314218	1,215,006	18	16	9
	12.314218–25.757385	1,151,207	32	31	7
Soil	Metasediments and phyllites	896,047	1	1	6
	Hard acidic yellow and red mottled soils	2,458,265	65	40	2
	Sandstone, cracking clays and shales	1,860,005	16	13	5
	Leached sands and siliceous sands	537,366	0	0	6
	Porous loamy soils, clay and friable earth	34,892	0	0	5
	Sandstones, hard acidic yellow and red soils	2,639,304	18	10	6
	Clays and loamy soils	803,627	2	3	6
	Shallow and stony leached loams	2,754	0	0	5
	Hard acidic mottled soils with leached sands	157,001	3	29	5
	Sandy or loamy red earths	106,873	0	0	5
	Moderate and shallow dark cracking clays	250,975	0	0	6
	Sandstones	1,589,702	1	0	6
	Shallow dark cracking clays	538,555	0	0	6
	Dark cracking clays	348,651	0	0	6
	Red and brown friable porous earth	1	0	0	5
	Rock outcrops and friable soils	83,011	0	0	5
	Loamy soils with clay	164,592	0	0	5
Geology	Phyllite and greywacke	1,835,300	42	7	5
	Sandstone, siltstone, shale and conglomerate	2,882,371	14	1	8
	Sand, silt, mud and gravel	929,694	7	2	7
	Granite, granodiorite, tonalite, diorite and gabbro	127,866	0	0	7
	Shale, conglomerate, sandstone, coal, siltstone, basalt and tuff	811,388	19	8	6
	Basaltic lavas with local rhyolite	927,150	0	0	8
	Andesitic to rhyolitic flows and volcaniclastic rocks	52,795	11	71	6
	Andesite	19,650	0	0	7
	Sandstone, mudstone and conglomerate	676,567	5	2	7
	Sandstone, siltstone, mudstone, coal and conglomerate	3,362,012	4	0	10
	Poorly lithified sandstone, conglomerate and mudstone	3,362,012	4	0	10
	Ferricrete and silcrete	263,124	3	3	7
	Basalt to gabbro plugs	192,005	1	1	7
LULC	Reservoir/dam	48,873	1	12	3
	Waste treatment and disposal	7,155	0	0	3
	Lake	9,314	0	0	3
	Marsh/wetland	978	0	0	3
	River	81,631	0	0	3
	Channel/aqueduct	779	0	0	3
	Nature conservation	767,825	0	0	4
	Managed resource protection	16,674	0	0	3
	Other minimal use	959,487	8	4	3
	Livestock grazing	6,937,978	24	2	6
	Production forestry	4,793	0	0	3
	Plantation forestry	105,294	0	0	3
	Grazing modified pastures	14,728	0	0	3
	Cropping	8,177	0	0	3
	Perennial horticulture	7,073	0	0	3
	Land in transition	226	0	0	3
	Irrigated modified pastures	177,342	0	0	3
	Irrigated cropping	420,877	3	4	3
	Irrigated perennial horticulture	11,025	0	0	3
	Irrigated seasonal horticulture	103,824	0	0	3
	Intensive horticulture	1,397	0	0	3
	Intensive animal production	90,621	2	13	3
	Manufacturing and industrial	159,058	3	11	3
	Residential	1,873,566	27	8	3
	Services	471,135	35	43	2
	Utilities	13,453	0	0	3
	Transport and communication	21,731	0	0	3
	Mining	156,607	3	11	3
Distance from Roads (m)	0	401,007	8	26	9
	0–117.16	2,280,545	53	31	6
	117.16–351.48	1,932,712	20	13	9
	351.48–585.81	1,213,919	6	6	10
	585.81–937.29	1,343,246	6	6	10
	937.29–1405.93	1,314,392	7	7	10
	1405.93–1991.74	1,097,365	2	2	10
	1991.74–2811.87	1,031,440	2	2	10
	2811.87–4100.64	943,365	0	0	10
	4100.64–29876.13	913,630	2	2	10
Distance from Rivers (m)	0–489.65	1,241,245	31	31	7
	489.65–1305.74	1,482,252	49	42	6
	1305.74–2285.05	1,296,926	13	12	9
	2285.05–3590.81	1,313,062	4	3	10
	3590.81–5059.76	1,281,742	1	0	11
	5059.76–6691.94	1,234,808	0	0	11
	6691.94–9140.22	1,205,804	4	4	10
	9140.22–12894.24	1,142,553	4	4	10
	12894.24–18606.88	1,135,295	0	0	11
	18606.88–41620.6	1,137,934	0	0	11
Rainfall (mm/day)	1.86–2.81	1,245,402	3	2	10
	2.81–2.86	1,214,009	0	0	11
	2.86–2.92	1,278,634	9	8	10
	2.92–2.98	1,205,609	15	14	9
	2.98–3.04	1,459,866	13	10	9
	3.04–3.09	1,397,625	9	7	10
	3.09–3.17	1,344,633	6	5	10
	3.17–3.31	1,142,961	4	3	10
	3.31–3.42	1,049,373	18	19	9

DOI: 10.7717/peerj.7653/table-4

The SPI range 157700.42–315400.84 had the highest flood susceptibility with a Bel value of 100. With regard to TWI, the highest flood potential was observed in the range 12.31–25.76 because this range showed the highest Bel (31) and lowest Dis (7) values. As the value of TWI increases, the water infiltration decreases, which can cause floods. Soil and geology also control water penetration and infiltration. The class of “Hard acidic yellow and red mottled soils” in soil and the class of “Andesitic to rhyolitic flows and volcaniclastic rocks” in geology received the highest Bel values of 40 and 71, respectively. The first three classes of distance from river, which were 0–489.65 m, 489.65–1305.74 m, and 1305.74–2285.05 m, received the highest Bel values. River proximity is one of the main factors in flood studies. The results showed that the areas closer to the river had higher chances of inundation. Heavy precipitation causes the ground to quickly become saturated and flood. This was confirmed by the acquired weight from EBF for the rainfall map. The highest rainfall class, 3.31–3.42, received the highest Bel value of 19 and a low Dis value of 9.

As described in ‘Ensemble Modeling’, every conditioning factor was reclassified based on the derived EBF weights and used in SVM analysis to implement ensemble modeling. The ensemble method was applied using all four SVM kernels. The kernel parameters were derived from the cross-validation (Table 3). The final step involved the derivation of four flood probability indices. In addition, another flood probability map was generated using an individual SVM and an RBF kernel. The stand-alone SVM was undertaken using the original flood conditioning factors, which were not classified by the EBF results. Figure 4 illustrates the six flood probability index maps.

Figure 4: Flood probability index maps derived from: (A) individual EBF, (B) individual SVM, (C) ensemble EBF and SVM-RBF, (D) ensemble EBF and SVM-LN, (E) ensemble EBF and SVM-PL and (F) ensemble EBF and SVM-SIG.

DOI: 10.7717/peerj.7653/fig-4

Creation of flood susceptibility maps

To produce the flood susceptibility maps, the flood probability index has to be classified into different zones of susceptibility (Pradhan, 2013; Tehrany, Jones & Shabani, 2019). Natural break, equal interval, and quantile are some of the most commonly used methods in natural hazard probability index classification (Ayalew & Yamagishi, 2005). Two factors, the nature of the data and its intended application, influence the choice of classification method (Tehrany, Jones & Shabani, 2019). For instance, quantile is a technique that, without affecting the data, groups the pixels into same-size classes. This means that it groups equal numbers of pixels (area) into each susceptibility zone (Nampak, Pradhan & Manap, 2014). By contrast, natural break and equal interval might lead to one class with a large number of pixels and another with only a few (Chung & Fabbri, 2003). Therefore, quantile appears to be the most suitable method for classifying the flood probability index. To facilitate a reliable assessment of the impact of each class of a flood conditioning factor on flood occurrence, we attempted, where possible, to reduce the influence of the classification algorithm on the classes of the conditioning factor. The flood susceptibility maps were produced by dividing each flood probability index into five susceptibility classes of very low, low, moderate, high, and very high using the quantile method, as seen in Fig. 5. The selected number of classes was based on the literature (Pham et al., 2017b; Termeh et al., 2018).
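A rank-based quantile classification into five zones can be sketched in plain Python. The function name and the sample values are hypothetical, and tied index values are split arbitrarily by sort order in this simplified version; GIS software may handle ties and class breaks differently.

```python
from collections import Counter

ZONES = ("very low", "low", "moderate", "high", "very high")


def quantile_classes(values, labels=ZONES):
    """Assign each value to a zone so that zones hold equal numbers of pixels."""
    n = len(values)
    k = len(labels)
    # Indices of the pixels ordered from lowest to highest probability index.
    order = sorted(range(n), key=lambda i: values[i])
    classes = [None] * n
    for rank, i in enumerate(order):
        classes[i] = labels[rank * k // n]
    return classes


# Hypothetical flood probability index for ten pixels.
index = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0]
zones = quantile_classes(index)
print(zones[0], zones[9])   # very high very low
print(Counter(zones))
```

Every zone receives the same number of pixels regardless of how the index values are distributed, which is the property that motivates the quantile choice in the text.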

Figure 5: Flood susceptibility maps derived from: (A) individual EBF, (B) individual SVM, (C) ensemble EBF and SVM-RBF, (D) ensemble EBF and SVM-LN, (E) ensemble EBF and SVM-PL and (F) ensemble EBF and SVM-SIG.

DOI: 10.7717/peerj.7653/fig-5

Accuracy assessment

To evaluate the reliability of the derived susceptibility maps, an accuracy assessment was performed using the AUC method (Fig. 6). The AUC results showed that the highest prediction (92.11%) and success (94.32%) rates were achieved by the ensemble EBF-SVM–RBF method. The individual methods produced lower accuracies (EBF: 82.60% success rate and 89.56% prediction rate; SVM: 86.91% success rate and 83.53% prediction rate) compared to all the ensemble methods except the ensemble EBF-SVM–LN method (81.21% success rate and 74.70% prediction rate). The reason is that the linear kernel is not appropriate for use in non-linear phenomena such as flooding. Based on the achieved accuracies, the ensemble EBF and SVM method can be used instead of the individual methods to improve the accuracy of the final maps. This can help planners to recognize the most susceptible areas with higher certainty. Using the ensemble technique improved the success rate by 12% and 7% and the prediction rate by 3% and 9% over the individual EBF and SVM methods, respectively.
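The quoted improvements can be checked directly from the reported rates. The sketch below simply subtracts the individual-method rates from those of the ensemble EBF-SVM–RBF method; the gains are differences in percentage points, and the dictionary layout is an illustrative choice.

```python
# Success and prediction rates (%) as reported in the text.
rates = {
    "EBF": {"success": 82.60, "prediction": 89.56},
    "SVM": {"success": 86.91, "prediction": 83.53},
    "EBF-SVM-RBF": {"success": 94.32, "prediction": 92.11},
}

# Gain of the ensemble over each individual method, in percentage points.
gains = {
    (metric, baseline): round(rates["EBF-SVM-RBF"][metric] - rates[baseline][metric], 2)
    for metric in ("success", "prediction")
    for baseline in ("EBF", "SVM")
}
for key, value in gains.items():
    print(key, value)
```

Rounded to whole numbers, these differences give the 12 and 7 point gains in success rate and the 3 and 9 point gains in prediction rate stated above.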

According to the results obtained in this study, ensemble modeling provided considerable advantages compared to the traditional methods. For example, the processing time for SVM was significantly reduced due to the pre-analysis of the flood conditioning factors. Specifically, the factors were assessed and reclassified based on the EBF analysis and then used as an input for SVM, which sped up the machine learning process. In terms of cost, there are no direct differences among the methods; however, reducing the processing time in a large-scale analysis may speed up the management process, thereby reducing the damage costs in hazardous areas.

Figure 6: (A) Success rate, (B) prediction rate of flood susceptibility derived from (1) individual EBF, (2) individual SVM, (3) ensemble EBF and SVM-RBF, (4) ensemble EBF and SVM-LN, (5) ensemble EBF and SVM-PL and (6) ensemble EBF and SVM-SIG.

DOI: 10.7717/peerj.7653/fig-6

Sensitivity analysis

As described earlier in the methodology section, every dataset includes an inevitable amount of uncertainty. The SA in this study was performed using the Jackknife test, and its outcomes are summarized in Table 5. The highest loss of performance (AUC decrease ≈ 8.23, PRD ≈ 9.81) occurred when slope was omitted from the conditioning factor dataset. This was followed by SPI (AUC decrease ≈ 8.11, PRD ≈ 9.65) and geology (AUC decrease ≈ 7.33, PRD ≈ 8.65). A higher PRD indicates that the conditioning factor provides specific information to the model that cannot be found in other factors. In contrast, some of the conditioning factors did not contribute strongly to the spatial prediction of flood occurrence, such as distance from road (PRD ≈ 0.24), soil (PRD ≈ 0.71), and distance from river (PRD ≈ 0.78). These outcomes show that flood susceptibility mapping is highly sensitive to slope, SPI, geology, altitude, and LULC. Such an SA assists researchers in recognizing the most influential parameters in flood analysis. It is important to consider that these factors might be different in each study area.

Table 5:

The Jackknife test results of variables when each conditioning factor is excluded in ensemble model.

Excluded factor	Decrease of AUC	Percent of relative decrease (PRD) of AUC
Slope	8.23	9.81
SPI	8.11	9.65
Geology	7.33	8.65
Altitude	6.98	8.20
LULC	6.16	7.17
Aspect	3.54	4.00
TWI	2.77	3.10
Curvature	0.87	0.95
Rainfall	0.73	0.80
Distance from river	0.71	0.78
Soil	0.65	0.71
Distance from road	0.22	0.24

DOI: 10.7717/peerj.7653/table-5

Conclusion

Proper and reliable techniques and strategies are required to assist governments and planners in identifying areas that are susceptible to floods and avoiding future urban development plans in these areas. Therefore, advancements in flood studies and in the available techniques are required to enhance our understanding of the occurrence of floods under varied climate and catchment conditions. To overcome the weaknesses of the stand-alone EBF and SVM methods, the more sophisticated ensemble methods can be used. In this study, a novel ensemble EBF-SVM method was developed, applied, and examined for the assessment of flood susceptibility mapping of the Brisbane Catchment, Australia, using GIS and SPSS Clementine V.14.2. Each of these methods is considered an efficient and powerful statistical technique. However, to enhance their performance, they were ensembled and used in this study. EBF and SVM were used to perform BSA and MSA, respectively. All four SVM kernels and their impacts were also considered. The ensemble method was applied four times using different kernels to identify the most proficient SVM kernel type. In addition, both EBF and SVM were used individually to obtain flood probability indices. The success rate and prediction rate of the AUC method were used to examine the strength and prediction capabilities of all the applied methods. The best accuracy was achieved by using the ensemble EBF-SVM–RBF method, with AUC of 94.32% and 92.11% for success and prediction rates, respectively. These values were approximately 6% higher than those obtained with the stand-alone models. The identified ensemble method offered the best fit for reasonable automatic flood conditioning parameter classification without any expert knowledge requirement. The performances of individual methods were enhanced by their integration. SVM offers different kernel types that can be selected based on the objective and data availability of each study. Each kernel is suitable for specific conditions, and each produces considerably different outcomes. Although the improvement in prediction was only approximately 3% and 9% compared to the current individual EBF and SVM methods, respectively, the improvement is significant. Any increase in prediction accuracy can have a significant impact on flood mitigation planning, and the relevant method should be tested under different scenarios and implemented where possible.

Supplemental Information

Spatial data GIS file

DOI: 10.7717/peerj.7653/supp-1

[1] Abdullah AF, Vojinovic Z, Rahman AA. 2013. A methodology for processing raw LiDAR data to support urban flood modelling framework: case study—Kuala Lumpur Malaysia. In: Rahman AA, Boguslawski P, Gold C, Said M, eds. Developments in multidimensional spatial data models. Berlin: Springer. 49-68

[2] Abdulwahid WM, Pradhan B. 2017. Landslide vulnerability and risk assessment for multi-hazard scenarios using airborne laser scanning data (LiDAR). Landslides 14(3):1057-1076

[3] Althuwaynee OF, Pradhan B, Lee S. 2012. Application of an evidential belief function model in landslide susceptibility mapping. Computers & Geosciences 44:120-135

[4] Althuwaynee OF, Pradhan B, Park HJ, Lee JH. 2014a. A novel ensemble bivariate statistical evidential belief function with knowledge-based analytical hierarchy process and multivariate statistical logistic regression for landslide susceptibility mapping. Catena 114:21-36

[5] Althuwaynee OF, Pradhan B, Park HJ, Lee JH. 2014b. A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping. Landslides 11(6):1063-1078

[6] Awasthi A, Chauhan SS. 2011. Using AHP and Dempster–Shafer theory for evaluating sustainable transport solutions. Environmental Modelling & Software 26(6):787-796

[7] Ayalew L, Yamagishi H. 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65(1):15-31

[8] Ballabio C, Sterlacchini S. 2012. Support vector machines for landslide susceptibility mapping: the Staffora River Basin case study, Italy. Mathematical Geosciences 44(1):47-70

[9] Beguería S. 2006. Validation and evaluation of predictive models in hazard assessment and risk management. Natural Hazards 37(3):315-329

[10] Botzen W, Aerts J, Van den Bergh J. 2013. Individual preferences for reducing flood risk to near zero through elevation. Mitigation and Adaptation Strategies for Global Change 18(2):229-244

[11] Brown JD, Heuvelink GB. 2007. The Data Uncertainty Engine (DUE): a software tool for assessing and simulating uncertain environmental variables. Computers & Geosciences 33(2):172-190

[12] Bui DT, Pradhan B, Lofman O, Revhaug I, Dick OB. 2012. Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. Catena 96:28-40

[13] Campolo M, Soldati A, Andreussi P. 2003. Artificial neural network approach to flood forecasting in the River Arno. Hydrological Sciences Journal 48(3):381-398

[14] Carranza EJM. 2009. Controls on mineral deposit occurrence inferred from analysis of their spatial pattern and spatial association with geological features. Ore Geology Reviews 35(3):383-400

[15] Carranza EJM, Hale M. 2003. Evidential belief functions for data-driven geologically constrained mapping of gold potential, Baguio district, Philippines. Ore Geology Reviews 22(1):117-132

[16] Carranza E, Woldai T, Chikambwe E. 2005. Application of data-driven evidential belief functions to prospectivity mapping for aquamarine-bearing pegmatites, Lundazi district, Zambia. Natural Resources Research 14(1):47-63

[17] Carrara A, Crosta G, Frattini P. 2003. Geomorphological and historical data in assessing landslide hazard. Earth Surface Processes and Landforms 28(10):1125-1142

[18] Chen W, Yan X, Zhao Z, Hong H, Bui DT, Pradhan B. 2019. Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China). Bulletin of Engineering Geology and the Environment 78(1):247-266

[19] Chen CY, Yu FC. 2011. Morphometric analysis of debris flows and their source areas using GIS. Geomorphology 129(3):387-397

[20] Chen Y, Yu J, Khan S. 2010. Spatial sensitivity analysis of multi-criteria weights in GIS-based land suitability evaluation. Environmental Modelling & Software 25(12):1582-1591

[21] Choubin B, Moradi E, Golshan M, Adamowski J, Sajedi-Hosseini F, Mosavi A. 2019. An Ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Science of the Total Environment 651:2087-2096

[22] Chung C-JF, Fabbri AG. 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30(3):451-472

[23] Cortes C, Vapnik V. 1995. Support-vector networks. Machine Learning 20(3):273-297

[24] Dehghani M, Riahi-Madvar H, Hooshyaripor F, Mosavi A, Shamshirband S, Zavadskas EK, Chau K.-W. 2019. Prediction of hydropower generation using grey wolf optimization adaptive neuro-fuzzy inference system. Energies 12(2):289-309

[25] Dempster A. 2008. Upper and lower probabilities induced by a multivalued mapping. In: Classic works of the dempster-shafer theory of belief functions. Berlin, Heidelberg: Springer. 57-72

[26] Ford A, Miller JM, Mol AG. 2015. A comparative analysis of weights of evidence, evidential belief functions, and fuzzy logic for mineral potential mapping using incomplete data at the scale of investigation. Natural Resources Research 25(1):19-33

[27] Fotovatikhah F, Herrera M, Shamshirband S, Chau K-W, Faizollahzadeh Ardabili S, Piran MJ. 2018. Survey of computational intelligence as basis to big flood management: Challenges, research directions and future work. Engineering Applications of Computational Fluid Mechanics 12(1):411-437

[28] Friedel MJ. 2005. Coupled inverse modeling of vadose zone water, heat, and solute transport: calibration constraints, parameter nonuniqueness, and predictive uncertainty. Journal of Hydrology 312(1–4):148-175

[29] Ghalkhani H, Golian S, Saghafian B, Farokhnia A, Shamseldin A. 2013. Application of surrogate artificial intelligent models for real-time flood routing. Water and Environment Journal 27(4):535-548

[30] Hand DJ, Till RJ. 2001. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45(2):171-186

[31] Herder C. 2013. Impacts of land use changes on the hydrology of Wondo Genet catchment in Ethiopia.

[32] Hölting B, Coldewey WG. 2019. Surface water infiltration. In: Hydrogeology. Berlin, Heidelberg: Springer. 33-37

[33] Huabin W, Gangjun L, Weiya X, Gonghui W. 2005. GIS-based landslide hazard assessment: an overview. Progress in Physical Geography 29(4):548-567

[34] Kalantar B, Pradhan B, Naghibi SA, Motevalli A, Mansor S. 2018. Assessment of the effects of training data selection on the landslide susceptibility mapping: a comparison between support vector machine (SVM), logistic regression (LR) and artificial neural networks (ANN). Geomatics, Natural Hazards and Risk 9(1):49-69

[35] Kavzoglu T, Colkesen I. 2009. A kernel functions analysis for support vector machines for land cover classification. International Journal of Applied Earth Observation and Geoinformation 11(5):352-359

[36] Kia MB, Pirasteh S, Pradhan B, Mahmud AR, Sulaiman WNA, Moradi A. 2012. An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environmental Earth Sciences 67(1):251-264

[37] Kjeldsen TR. 2010. Modelling the impact of urbanization on flood frequency relationships in the UK. Hydrology Research 41(5):391-405

[38] Lawal D, Matori A, Hashim A, Yusof K, Chandio I. 2012. Detecting flood susceptible areas using GIS-based analytic hierarchy process. In: Paper presented at the international conference on future environment and energy.

[39] Lee S, Hwang J, Park I. 2013. Application of data-driven evidential belief functions to landslide susceptibility mapping in Jinbu, Korea. Catena 100:15-30

[40] Lee S, Oh HJ. 2012. Ensemble-based landslide susceptibility maps in Jinbu area, Korea. In: Terrigenous mass movements. Heidelberg, New York, Dordrecht, London: Springer. 193-220

[41] Lee S, Pradhan B. 2007. Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides 4(1):33-41

[42] Merz B, Thieken A, Gocht M. 2007. Flood risk mapping at the local scale: concepts and challenges. In: Begum S, Stive MJF, Hall JW, eds. Flood risk management in Europe. Advances in natural and technological hazards research, vol. 25. Dordrecht: Springer. 231-251

[43] Micheletti N, Foresti L, Kanevski M, Pedrazzini A, Jaboyedoff M. 2011. Landslide susceptibility mapping using adaptive support vector machines and feature selection. Master’s thesis, University of Lausanne, Faculty of Geosciences and Environment, Lausanne, Switzerland

[44] Nampak H, Pradhan B, Manap MA. 2014. Application of GIS based data driven evidential belief function model to predict groundwater potential zonation. Journal of Hydrology 513:283-300

[45] Neitsch SL, Arnold JG, Kiniry JR, Williams JR. 2002. Soil and water assessment tool, theoretical documentation, version 2000. College Station, Texas, USA: Texas Water Resources Institute.

[46] Park S, Choi C, Kim B, Kim J. 2013. Landslide susceptibility mapping using frequency ratio, analytic hierarchy process, logistic regression, and artificial neural network methods at the Inje area, Korea. Environmental Earth Sciences 68(5):1443-1464

[47] Partridge I. 2001. Will it rain? The effects of the Southern oscillation and El Niño on Australia (Second Edition). Brisbane: Queensland Department of Primary Industries.

[48] Pham BT, Bui DT, Pham HV, Le HQ, Prakash I, Dholakia M. 2017a. Landslide hazard assessment using random subspace fuzzy rules based classifier ensemble and probability analysis of rainfall data: a case study at Mu Cang Chai District, Yen Bai Province (Viet Nam). Journal of the Indian Society of Remote Sensing 45(4):673-683

[49] Pham BT, Bui DT, Prakash I, Dholakia M. 2017b. Hybrid integration of Multilayer Perceptron Neural Networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena 149:52-63

[50] Pourghasemi HR. 2016. GIS-based forest fire susceptibility mapping in Iran: a comparison between evidential belief function and binary logistic regression models. Scandinavian Journal of Forest Research 31(1):80-98

[51] Pourghasemi HR, Beheshtirad M. 2015. Assessment of a data-driven evidential belief function model and GIS for groundwater potential mapping in the Koohrang Watershed, Iran. Geocarto International 30(6):662-685

[52] Pourghasemi HR, Jirandeh AG, Pradhan B, Xu C, Gokceoglu C. 2013a. Landslide susceptibility mapping using support vector machine and GIS at the Golestan Province, Iran. Journal of Earth System Science 122(2):349-369

[53] Pourghasemi HR, Pradhan B, Gokceoglu C, Mohammadi M, Moradi HR. 2013b. Application of weights-of-evidence and certainty factor models and their comparison in landslide susceptibility mapping at Haraz watershed, Iran. Arabian Journal of Geosciences 6(7):2351-2365

[54] Pradhan B. 2010. Flood susceptible mapping and risk area delineation using logistic regression, GIS and remote sensing. Journal of Spatial Hydrology 9(2):1-7

[55] Pradhan B. 2013. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Computers & Geosciences 51:350-365

[56] Pradhan B, Abokharima MH, Jebur MN, Tehrany MS. 2014. Land subsidence susceptibility mapping at Kinta Valley (Malaysia) using the evidential belief function model in GIS. Natural Hazards 73(2):1019-1042

[57] Pradhan B, Youssef A. 2011. A 100-year maximum flood susceptibility mapping using integrated hydrological and hydrodynamic models: Kelantan River Corridor, Malaysia. Journal of Flood Risk Management 4(3):189-202

[58] Qasem SN, Samadianfard S, Nahand HS, Mosavi A, Shamshirband S, Chau K-W. 2019. Estimating daily dew point temperature using machine learning algorithms. Water 11(3):582-595

[59] Rahmati O, Pourghasemi HR, Melesse AM. 2016. Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran Region, Iran. Catena 137:360-372

[60] Refsgaard JC, Van der Sluijs JP, Højberg AL, Vanrolleghem PA. 2007. Uncertainty in the environmental modelling process—a framework and guidance. Environmental Modelling & Software 22(11):1543-1556

[61] Rokach L. 2010. Ensemble-based classifiers. Artificial Intelligence Review 33(1–2):1-39

[62] Sajedi Hosseini F, Choubin B, Solaimani K, Cerdà A, Kavian A. 2018. Spatial prediction of soil erosion susceptibility using a fuzzy analytical network process: application of the fuzzy decision making trial and evaluation laboratory approach. Land Degradation & Development 29(9):3092-3103

[63] Samui P. 2008. Slope stability analysis: a support vector machine approach. Environmental Geology 56(2):255-267

[64] Seckin N, Cobaner M, Yurtal R, Haktanir T. 2013. Comparison of artificial neural network methods with L-moments for estimating flood flow at ungauged sites: the case of East Mediterranean River Basin, Turkey. Water Resources Management 27(7):2103-2124

[65] Shao Y-H, Deng N-Y. 2012. A coordinate descent margin based-twin support vector machine for classification. Neural Networks 25:114-121

[66] Shu C, Burn DH. 2004. Artificial neural network ensembles and their application in pooled flood frequency analysis. Water Resources Research 40(9):1-10

[67] Shuster W, Bonta J, Thurston H, Warnemuende E, Smith D. 2005. Impacts of impervious surface on watershed hydrology: a review. Urban Water Journal 2(4):263-275

[68] Smets P. 1994. What is Dempster-Shafer’s model. In: Advances in the Dempster-Shafer theory of evidence. Hoboken: John Wiley and Sons. 5-34

[69] Song S, Zhan Z, Long Z, Zhang J, Yao L. 2011. Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data. PLOS ONE 6(2):e17191

[70] Tehrany MS, Jones S, Shabani F. 2019. Identifying the essential flood conditioning factors for flood prone area mapping using machine learning techniques. Catena 175:174-192

[71] Tehrany MS, Lee M-J, Pradhan B, Jebur MN, Lee S. 2014. Flood susceptibility mapping using integrated bivariate and multivariate statistical models. Environmental Earth Sciences 72(10):4001-4015

[72] Tehrany MS, Pradhan B, Jebur MN. 2013. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. Journal of Hydrology 504:69-79

[73] Tehrany MS, Pradhan B, Jebur MN. 2014. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. Journal of Hydrology 512:332-343

[74] Tehrany MS, Pradhan B, Mansor S, Ahmad N. 2015. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena 125:91-101

[75] Tehrany MS, Shabani F, Javier DN, Kumar L. 2017. Soil erosion susceptibility mapping for current and 2100 climate conditions using evidential belief function and frequency ratio. Geomatics, Natural Hazards and Risk 8(2):1695-1714

[76] Termeh SVR, Kornejady A, Pourghasemi HR, Keesstra S. 2018. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Science of the Total Environment 615:438-451

[77] Thiam AK. 2005. An evidential reasoning approach to land degradation evaluation: Dempster-Shafer theory of evidence. Transactions in GIS 9(4):507-520

[78] Tsangaratos P, Ilia I. 2016. Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. Catena 145:164-179

[79] Umar Z, Pradhan B, Ahmad A, Jebur MN, Tehrany MS. 2014. Earthquake induced landslide susceptibility mapping using an integrated ensemble frequency ratio and logistic regression models in West Sumatera Province, Indonesia. Catena 118:124-135

[80] Van Alphen J, Martini F, Loat R, Slomp R, Passchier R. 2009. Flood risk mapping in Europe, experiences and best practices. Journal of Flood Risk Management 2(4):285-292

[81] Xu C, Xu X, Yu G. 2012. Earthquake triggered landslide hazard mapping and validation related with the 2010 Port-au-Prince, Haiti earthquake. Disaster Advances 5(4):1297-1304

[82] Yang J. 2011. Convergence and uncertainty analyses in Monte-Carlo based sensitivity analysis. Environmental Modelling & Software 26(4):444-457

[83] Yao X, Tham L, Dai F. 2008. Landslide susceptibility mapping based on support vector machine: a case study on natural slopes of Hong Kong, China. Geomorphology 101(4):572-582