Prediction Success of Machine Learning Methods for Flash Flood Susceptibility Mapping in the Tafresh Watershed, Iran

Janizadeh, Saeid; Avand, Mohammadtaghi; Jaafari, Abolfazl; Phong, Tran Van; Bayat, Mahmoud; Ahmadisharaf, Ebrahim; Prakash, Indra; Pham, Binh Thai; Lee, Saro

doi:10.3390/su11195426

¹ Department of Watershed Management Engineering, College of Natural Resources, Tarbiat Modares University, Tehran, P.O. Box 14115-111, Iran
² Research Institute of Forests and Rangelands, Agricultural Research, Education, and Extension Organization (AREEO), Tehran 13185-116, Iran
³ Institute of Geological Sciences, Vietnam Academy of Sciences and Technology, 84 Chua Lang Street, Dong da, Hanoi, 100000, Viet Nam
⁴ DHI, Lakewood, CO 80228, USA
⁵ Department of Science & Technology, Bhaskarcharya Institute for Space Applications and Geo-Informatics (BISAG), Government of Gujarat, Gandhinagar 382007, India
⁶ Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam
⁷ Geoscience Platform Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124, Gwahak-ro Yuseong-gu, Daejeon 34132, Korea
⁸ Department of Geophysical Exploration, Korea University of Science and Technology, 217 Gajeong-ro Yuseong-gu, Daejeon 34113, Korea
* Authors to whom correspondence should be addressed.

Sustainability 2019, 11(19), 5426; https://doi.org/10.3390/su11195426

Submission received: 7 September 2019 / Revised: 25 September 2019 / Accepted: 29 September 2019 / Published: 30 September 2019

(This article belongs to the Special Issue Sustainable Applications of Remote Sensing and Geospatial Information Systems to Earth Observations)

Abstract

Floods are among the most destructive disasters worldwide. Developing management plans requires a deep understanding of the likelihood and magnitude of future flood events. The purpose of this research was to estimate flash flood susceptibility in the Tafresh watershed, Iran, using five machine learning methods, i.e., alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), multilayer perceptron (MLP), and quadratic discriminant analysis (QDA). A geospatial database including 320 historical flood events was constructed, and eight geo-environmental variables (elevation, slope, slope aspect, distance from rivers, average annual rainfall, land use, soil type, and lithology) were used as flood influencing factors. Based on a variety of performance metrics, the ADT method outperformed the other methods. The FT method ranked second, followed by KLR, MLP, and QDA. Given the small differences in goodness-of-fit and prediction success among the methods, we concluded that all five machine-learning-based models are applicable to flood susceptibility mapping in other areas to protect societies from devastating floods.

Keywords: alternating decision tree; data mining; spatial modeling; susceptibility mapping; GIS

1. Introduction

Floods are destructive disasters that endanger human life and cause global economic losses of about 60 billion USD annually [1]. In general, floods are divided into five types based on their locations and causes, including riverine flooding, urban drainage, ground failures, fluctuating lake levels and coastal flooding and erosion [2]. Flash floods are common types of riverine flooding that occur when a large amount of water is discharged within a few minutes or hours (three to six hours) of excessive rainfall, the collapse of natural ice, or a dam failure [3]. Identifying the areas susceptible to flooding supports well-informed flood management and helps in reducing risks and losses [4,5,6]. Development of flood susceptibility maps is also key in comprehensive watershed management. However, accurate prediction of areas susceptible to flooding and production of reliable susceptibility maps is multifaceted and time-consuming due to the complexity of flood occurrences, interaction of various geo-environmental and anthropogenic factors, and the paucity of accurate data [5,6,7,8,9].

Advances in computational techniques and geographic information systems (GIS) [10] have led to the development of various spatially explicit modeling approaches that rely on physically based hydraulic models and data-driven methods; these can be used for flood susceptibility modeling and mapping [5,6,7,8,9,11]. Hydraulic models have been extensively used to simulate flood characteristics [12,13,14]. Recently, there has been growing interest in data-driven methods as suitable alternatives to hydraulic models. These methods are especially advantageous when: (i) a relatively large amount of data is available; (ii) the study watershed is not subject to substantial changes over the analysis period; and (iii) implementation of complex computational hydraulic models is difficult [15]. Compared to hydraulic models, data-driven methods allow for rapid flood modeling as they are less sensitive to input data, easier to implement and more computationally efficient [16,17]. Notable methods include statistical and probabilistic models [7,18], multi-criteria decision making [19], logistic regression (LR) [18,20], decision trees [8,21,22], artificial neural networks (ANN) [6,23], extreme learning machines [6], support vector machines (SVM) [6,22] and low-complexity tools such as AutoRoute and height above the nearest drainage (HAND) [24,25]. Despite these applications, other data-driven techniques have rarely been explored for their capability in flood modeling.

This study explored the prediction success of five machine learning techniques, namely alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), multilayer perceptron (MLP) and quadratic discriminant analysis (QDA), for flood susceptibility mapping in the Tafresh watershed, Iran. Our specific objectives were to: (i) explore and compare the efficiency of these five techniques in producing flood susceptibility maps; and (ii) generate a flood susceptibility map for the study area.

2. Study Area

The Tafresh watershed is located in Markazi Province, Iran (Figure 1). The 1605 km² watershed is characterized by mountainous topography, cold winters and relatively moderate summers. Mean annual rainfall and evaporation are 304 and 1921 mm, respectively. The average temperatures in summer and winter were recorded as 19.2 and 6.4 °C, respectively, with 73 freezing days per year [26]. The major rivers of the city of Tafresh are: Abkamer, which originates from the southern highlands of Tafresh and joins the Qara Chai River about 4 km northwest of Tafresh; Farminin, which originates from Rudbar village and, after irrigating parts of the city, flows into the Salt Lake; and Qara Chai, which irrigates parts of the north of the city and then flows into the Salt Lake of Qom [27]. Due to heavy rainfall in winter, the discharge of the rivers increases and causes severe overbank flooding [27]. In addition, due to the Mediterranean climate of the province, which brings heavy rainfall during spring and autumn, flash floods occur frequently during these seasons [26].

3. Methodology

The flood susceptibility mapping methodology consists of four main steps (Figure 2): (i) construction of a geospatial database for influential factors and historical flood events; (ii) development of the machine learning models; (iii) model validation against historical flood events; and (iv) generation of flood susceptibility maps. A detailed description of each step is presented in the following subsections.

3.1. Geospatial Database

3.1.1. Inventory Map of Historical Floods

The Tafresh watershed is flood-prone because of its topographic and climatic characteristics. Numerous floods have occurred in the past and caused severe damage to human life and buildings. One of the major floods in the watershed occurred in 2017, with a 24-hour rainfall depth of 90 mm, resulting in over one million USD in damage (Figure 3).

Here, the geographic locations of 320 historical floods were obtained from the Regional Water Organization of Markazi Province to develop a flood inventory map. These floods were then divided into two groups: the first, comprising 70% of the flood data, was used as the training dataset, and the second, comprising the remaining 30%, was used as the validation dataset.

3.1.2. Flood Influencing Factors

Based on an extensive review of the literature [3,5,6,7,8,9,11,18,19,20,21,22,28], the characteristics of the historical floods in the study watershed and multiple field observations, we selected eight influential factors—elevation, slope, slope aspect, distance from rivers, average annual rainfall, land use, soil type and lithology—for flood susceptibility mapping.

Elevation is an important variable in flood occurrence [5,6,18]. In general, flooding and elevation have an inverse relationship, as low-lying areas are more prone to flooding [8,11]. Here, the elevation map of the study area was extracted from a digital elevation model (DEM) with a 12.5 m pixel size that was obtained from the ALOS PALSAR sensor (https://earth.esa.int/web/guest/home) (Figure 4a). Slope is one of the influential factors in the occurrence of floods due to its direct impact on surface runoff and infiltration potential [8]. Flood-prone areas are often located within flat landscapes [6,22], where floods tend to last longer and the resulting water stagnation poses an environmental hazard. The slope map was derived from the DEM and classified into five classes (Figure 4b). Aspect is another influencing factor in flood occurrence because it is directly associated with the convergence and direction of water flow [18,22,29]. The aspect map of the research area was extracted from the DEM and classified into nine classes (Figure 4c). Distance from rivers has a significant effect on the probability and magnitude of flooding because terrestrial water storage is highly associated with flood events [6,7,18,22]. The map of this factor was prepared based on the Euclidean distance and divided into six classes (Figure 4d). Rainfall depth is a key factor that could have the greatest effect on flooding [6,7,18,22]. The spatial distribution of average annual rainfall in the Tafresh watershed was prepared using meteorological data for the period 1993–2018 [26] (Figure 4e). Land use has a significant effect on flood susceptibility [30,31]. The land-use map was derived from Landsat OLI satellite imagery (https://landsat.gsfc.nasa.gov/operational-land-imager-oli/) acquired on 24 June 2017, using supervised classification with the maximum likelihood algorithm in the ENVI software [32] (Figure 4f). Soil type was another influencing factor because it controls infiltration and runoff [7,21]. The soil type map was obtained from the Natural Resources Office of the Markazi Province, Iran (Figure 4g). The last factor was lithology, which represents units of rocks and soils that affect infiltration and runoff [6,7,33]. We obtained the lithology map from the Administrative Office of the Natural Resources of Markazi Province, Iran (Figure 4h, Table 1).

3.2. Training and Validation Datasets

The geospatial database constructed in the first step of the modeling methodology was used to generate training and validation datasets for the modeling process. To this end, the flood inventory map was randomly divided into two sets: one set, with 70% of the historical flood locations, was used for training, and the other set, with the remaining 30% of flood locations, was used for validation [6,7,33]. Similar to the flood locations, 360 unflooded samples were selected from the unflooded portions of the study area and used to complete the training and validation datasets. The flooded and unflooded samples were then combined to generate the final training and validation datasets.
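The random 70/30 partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the point identifiers, the `split_points` helper and the fixed seed are hypothetical and are not taken from the study's workflow.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Randomly split sample points into training and validation subsets."""
    rng = random.Random(seed)   # fixed seed (an assumption) for reproducibility
    shuffled = list(points)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 320 mapped flood locations.
flood_ids = [f"flood_{i}" for i in range(320)]
train_flood, valid_flood = split_points(flood_ids)

# Attach the binary target used later by the classifiers (1 = flooded).
train_samples = [(point_id, 1) for point_id in train_flood]
valid_samples = [(point_id, 1) for point_id in valid_flood]
print(len(train_samples), len(valid_samples))
```

The same helper could be applied to the unflooded samples (labelled 0) before the two groups are merged, so that both classes are split in the same proportion.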

3.3. Spatial Relationship

Using the frequency ratio (FR) method, we investigated the spatial relationship between the components of the historical floods and each of the eight influencing factors. For each class of the influencing factors, the FR was calculated using the following equation [7,34,35]:

FR = \frac{a / b}{c / d}

(1)

where a: number of flood pixels within the class i of a given factor; b: total number of flood pixels in the domain; c: number of pixels in class i of a given factor; d: total number of pixels in the domain.
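As a worked illustration of Equation (1), the sketch below computes the FR for each class of a single factor. The pixel counts are invented for the example and the function name `frequency_ratio` is hypothetical; the sketch also assumes that every class contains at least one pixel.

```python
def frequency_ratio(flood_pixels_by_class, pixels_by_class):
    """Return FR = (a / b) / (c / d) for every class of one factor."""
    b = sum(flood_pixels_by_class.values())  # total flood pixels in the domain
    d = sum(pixels_by_class.values())        # total pixels in the domain
    ratios = {}
    for cls, c in pixels_by_class.items():
        a = flood_pixels_by_class.get(cls, 0)  # flood pixels within this class
        ratios[cls] = (a / b) / (c / d)
    return ratios


# Hypothetical pixel counts for three slope classes (not the study's data).
flood_counts = {"0-3": 60, "3-10": 30, "10+": 10}
class_counts = {"0-3": 2000, "3-10": 3000, "10+": 5000}
fr = frequency_ratio(flood_counts, class_counts)
for cls, value in fr.items():
    print(cls, round(value, 2))
```

An FR above 1 means that a class holds a larger share of the flood pixels than of the total area, i.e., a positive association with flooding; an FR below 1 indicates the opposite.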

3.4. Machine Learning Methods

Here, we briefly describe the five machine learning methods used in this study. A full description of each method can be found in the literature [36,37,38].

3.4.1. Alternating Decision Tree (ADT)

ADT integrates decision trees with boosting to increase the prediction accuracy of binary classification problems [36]. The method alternates between decision nodes, which specify a predicate condition, and prediction nodes, which contain a single number. An ADT is grown using a boosting algorithm for numeric prediction, in which a decision node and its two prediction nodes are constructed at each boosting iteration [39]. Each prediction node is assigned a weight that represents the contribution of the node to the final prediction score. The sum of the weights of all prediction nodes reached by a sample yields its final prediction score. This procedure differs from other decision-tree-based methods such as classification and regression tree (CART) or C4.5, in which a sample follows only one path through the tree [40].
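The way an ADT scores a sample can be sketched as follows. This is a minimal illustration, not the WEKA implementation: the two rules, their thresholds and their weights are invented, and the tree is represented as a flat list of (precondition, condition, value-if-true, value-if-false) tuples purely for brevity.

```python
def adt_score(sample, root_value, rules):
    """Sum the values of every prediction node that the sample reaches."""
    score = root_value
    for precondition, condition, value_if_true, value_if_false in rules:
        if precondition(sample):  # the decision node is reachable
            score += value_if_true if condition(sample) else value_if_false
    return score


# Hypothetical two-rule tree; thresholds and weights are illustrative only.
rules = [
    # Decision node attached to the root prediction node.
    (lambda s: True, lambda s: s["distance_to_river_m"] < 200, 0.8, -0.6),
    # Decision node attached below the "near river" prediction node.
    (lambda s: s["distance_to_river_m"] < 200, lambda s: s["slope_deg"] < 3, 0.5, -0.2),
]

near_flat = {"distance_to_river_m": 120, "slope_deg": 1.5}
far_steep = {"distance_to_river_m": 900, "slope_deg": 12.0}
score_near = adt_score(near_flat, 0.1, rules)
score_far = adt_score(far_steep, 0.1, rules)
print(score_near > 0, score_far > 0)
```

In this sketch a positive score is read as "flooded" and a negative score as "unflooded", with the magnitude acting as a confidence measure. Unlike in a CART-style tree, a sample can contribute through several branches at once.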

3.4.2. Functional Tree (FT)

FT is a multivariate decision tree that uses a combination of traits in leaves and/or in internal nodes to develop a hierarchical framework for handling classification problems [37]. For these problems, FT utilizes a logistic regression function for splitting the functional inner nodes and for prediction at the functional leaves. This is the main advantage of the FT model over conventional hierarchical models that only use the input data.

3.4.3. Kernel Logistic Regression (KLR)

KLR is a classification method based on minimizing the negative log-likelihood function; the Broyden–Fletcher–Goldfarb–Shanno algorithm is used to estimate the probabilistic outcomes. In contrast to LR, KLR can handle linearly inseparable problems by mapping the input features to a higher-dimensional space through a kernel [41]. KLR requires solving only an unconstrained optimization problem, provides class probabilities and extends straightforwardly to multi-class classification problems. With proper parameter tuning, KLR is a computationally efficient method. We used the statistical software R to implement KLR and tuned its parameters by trial and error.

3.4.4. Multilayer Perceptron (MLP)

MLP is a layered structure that consists of an input layer, an output layer and one or more intermediate (hidden) layers, which are not directly connected to the input data or the outputs. The input layer units are only responsible for distributing the inputs to the next layer, and the output layer provides the response signals. In the input and output layers, the number of neurons equals the number of inputs and outputs, respectively, while the hidden layers mediate the relationship between the input and output layers. In MLP, there is no definite algorithm for determining the number of hidden layers and neurons, and this is often done by trial and error [42,43].

3.4.5. Quadratic Discriminant Analysis (QDA)

QDA is a conventional classification technique with a quadratic decision surface. In QDA, the measurements in each class are assumed to be normally distributed. An advantage of this method is that it does not assume the same covariance for every class and can therefore cope with class-specific covariance matrices. QDA is an easy-to-use and attractive classification technique because it requires no parameter tuning. This method has been successfully applied in various modeling practices [44,45].

3.5. Performance Metrics

3.5.1. Receiver Operating Characteristic (ROC) Curve

The ROC curve is one of the most commonly used procedures for checking the performance of predictive models [46,47]. It is a two-dimensional curve that plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis. The performance of a model is quantified using the area under the ROC curve (AUC), which ranges from 0.5 (no better than random) to 1.0 (perfect discrimination). A higher AUC indicates better model performance [48,49].
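For readers who want to reproduce an AUC value by hand, the sketch below uses the rank-based interpretation of the AUC: the probability that a randomly chosen flooded sample receives a higher susceptibility score than a randomly chosen unflooded sample, with ties counted as one half. This is equivalent to the area under the ROC curve. The labels and scores are made up for illustration, and the function name `auc` is an assumption.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = flooded, 0 = unflooded) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc_value = auc(labels, scores)
print(round(auc_value, 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred points; sorting-based implementations are preferable for larger datasets.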

3.5.2. Statistical Indices

Seven statistical indices were used for further assessment of the model performances: positive predictive value (PPV), negative predictive value (NPV), sensitivity (SST), specificity (SPF), accuracy (ACC), Kappa and root-mean-square error (RMSE). Given the nature of a flood modeling problem, which is treated as a binary pattern recognition problem (0 = unflooded, 1 = flooded), these indices are calculated as [50,51]:

PPV = \frac{A}{A + B}

(2)

NPV = \frac{C}{C + D}

(3)

SST = \frac{A}{A + D}

(4)

SPF = \frac{C}{C + B}

(5)

ACC = \frac{A + C}{A + C + B + D}

(6)

where A, B, C and D are the numbers of true positives, false positives, true negatives and false negatives, respectively.

K = \frac{P_{o} - P_{est}}{1 - P_{est}}

(7)

P_{o} = \frac{A + C}{n}

(8)

P_{est} = \frac{(A + B)(A + D) + (C + D)(C + B)}{n^{2}}

(9)

where n = A + B + C + D is the total number of samples, P_o is the relative observed agreement among raters (here, predictions and observations) and P_est is the hypothetical probability of chance agreement.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (\bar{X}_{i} - X_{i})^{2}}

(10)

where n is the number of samples, and \bar{X}_{i} and X_{i} are the actual and predicted values of the outputs, respectively.
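The indices in this subsection can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only: the counts and predicted values are invented, the function names are hypothetical, and Kappa is computed in the standard Cohen form, with the observed and chance agreement expressed as proportions of the total sample size n = A + B + C + D.

```python
import math


def confusion_metrics(A, B, C, D):
    """A = true positives, B = false positives, C = true negatives, D = false negatives."""
    n = A + B + C + D
    p_o = (A + C) / n                                        # observed agreement
    p_est = ((A + B) * (A + D) + (C + D) * (C + B)) / n**2   # chance agreement
    return {
        "PPV": A / (A + B),
        "NPV": C / (C + D),
        "SST": A / (A + D),
        "SPF": C / (C + B),
        "ACC": p_o,
        "Kappa": (p_o - p_est) / (1 - p_est),
    }


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted outputs."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts and predictions (not the study's results).
metrics = confusion_metrics(A=40, B=10, C=45, D=5)
error = rmse([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])
for name, value in metrics.items():
    print(name, round(value, 3))
print("RMSE", round(error, 3))
```

Note that PPV and NPV are conditioned on the prediction (how often a predicted class is right), whereas SST and SPF are conditioned on the observation (how many observed cases of a class are recovered).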

4. Results

4.1. Spatial Relationship

The spatial relationship between historical floods and influencing factors (measured via the FR method) revealed that the most flood-prone portions of the Tafresh watershed are located on the orchards (FR = 7.8), the lithology groups 7 and 11 (FR = 7.2 and 2.62), distance from rivers of 0–200 m (FR = 5.61 and 2.48), slope degree 0.9–3 (FR = 1.99), the residential areas (FR = 1.79), elevation of 1296–1823 m (FR = 1.49) and on the northwest slopes (FR = 1.48). Conversely, those portions of the study area with an elevation >2500 m and lithology groups 3, 8, 9, 12, 13 and 16 were identified as the low-susceptibility areas in the watershed (Figure 5).

4.2. Model Performance

Based on the performance metrics (Section 3.5), all models were found to be powerful for recognizing the general pattern of flood susceptibility (i.e., training performance) in the study area. The ADT method, with the lowest RMSE (0.247; Figure 6) and the greatest PPV (90.4%), NPV (95.2%), SST (95%), SPF (90.9%), ACC (92.8%) (Figure 7) and Kappa (0.856) (Table 2) indices, had the best performance. With this method, 90.4% of the cells predicted as flooded and 95.2% of the cells predicted as unflooded matched the observations, indicating an excellent agreement between predicted and observed flood events.

In the validation phase, however, the performance of the methods was inconsistent. For example, while the QDA model achieved the lowest RMSE (0.28) and the greatest PPV (94.4%), SPF (93.8%), ACC (90.1%) and Kappa (0.803) values, the MLP model had the highest NPV (94.6%) and SST (94.2%). The FT model, with the greatest RMSE (0.309) and with the lowest NPV (80.3%), SST (82.9%) and ACC (88%), was ranked as the least effective model, although this model was successful at classifying 95% of the unflooded cells (SPF) and 95.8% of all pixels in the flooded class (PPV), with an excellent agreement between predictions and observations (Kappa = 0.761). The AUC values for the training and validation phases (Figure 8) demonstrated the superiority of the ADT model (AUC_training = 0.981; AUC_validation = 0.972) for modeling flood susceptibility, followed by the MLP (AUC_training = 0.963; AUC_validation = 0.959), QDA (AUC_training = 0.954; AUC_validation = 0.956), KLR (AUC_training = 0.953; AUC_validation = 0.95) and FT (AUC_training = 0.95; AUC_validation = 0.95) models, sequentially.

4.3. Flood Susceptibility Maps

We applied the validated models to estimate flood susceptibility values in the study watershed. The flood susceptibility values were then reclassified into five susceptibility classes (very low, low, moderate, high and very high) using the geometrical intervals classification scheme. This resulted in five flood susceptibility maps (Figure 9), one for each machine learning method. Among the five methods, QDA assigned the greatest portion (26.1%) of the watershed to the very high susceptibility class, whereas ADT assigned the smallest portion (12.9%) to this class (Figure 9a). Despite the differences in model performance, all the models suggested that the low-lying areas along the rivers, orchards and the residential areas (western part of the watershed) are the most flood-prone portions of the study watershed. Overall, nearly 30% of the Tafresh watershed falls within the high and very high flood susceptibility classes, indicating that mitigation strategies (e.g., warning systems) and monitoring plans should primarily focus on these portions of the watershed.
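The reclassification step can be sketched as a lookup against four ascending break values. The sketch does not derive geometrical-interval breaks, which GIS software computes from the data distribution; the break values, the sample susceptibility values and the `reclassify` helper are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def reclassify(value, breaks):
    """Map a susceptibility value to one of five classes using four ascending breaks."""
    # A value equal to a break is assigned to the upper class.
    return CLASS_NAMES[bisect_right(breaks, value)]


# Hypothetical break values and pixel susceptibilities (not the study's data).
breaks = [0.1, 0.25, 0.5, 0.75]
pixel_values = [0.05, 0.2, 0.3, 0.6, 0.9, 0.95]
classes = [reclassify(v, breaks) for v in pixel_values]

# Share of pixels falling in each susceptibility class.
counts = Counter(classes)
shares = {name: counts[name] / len(classes) for name in CLASS_NAMES}
print(shares)
```

Computing the class shares in this way is how area percentages such as those reported for the very high class could be tabulated once each pixel has been assigned a class.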

A further analysis of the susceptibility maps showed that in each map, the greatest number (Figure 10b) and the highest FR (Figure 10c) of the flood pixels belong to the very high susceptibility class, followed by the high, moderate, low and very low classes, sequentially. This indicated that the models performed satisfactorily in demarcating various levels of flood susceptibility across the study watershed.

5. Discussion and Conclusions

Identifying and zoning flood-prone areas is an important measure for developing mitigation plans and allocating resources properly in response to future flood events. Despite the widespread application of machine learning techniques for flood prediction, generating a reliable flood susceptibility map is still a challenging task. In this study, we applied five machine learning methods (ADT, FT, KLR, MLP and QDA) and compared their predictive performance in the Tafresh watershed, Iran. Eight flood influencing factors were used in flood susceptibility mapping. Our results demonstrated that the ADT method was dominant over the other four methods in terms of overall training and validation performance. This finding is in agreement with past flood susceptibility mapping studies. For example, Khosravi et al. [52] demonstrated the superiority of the ADT model over the logistic model tree (LMT), reduced error pruning tree and Naïve Bayes tree models for flood prediction in the Haraz watershed, Iran. In a recent study, Costache [53] found that the most accurate flood susceptibility map for the center of Romania was derived from the ADT model, which outperformed the weights of evidence and LMT models. Further, our results are supported by previous findings that machine learning methods yield predictive flood models with high capability and reliability [54]. For example, using multivariate discriminant analysis (MDA), classification and regression trees (CART), SVM [22], genetic algorithm rule-set production (GARP), quick unbiased efficient statistical tree (QUEST) [28], ANN [23], adaptive neuro fuzzy inference system (ANFIS) [55] and boosted regression trees (BRT) [9], researchers successfully predicted and mapped the distribution of flood susceptibility within different regions of Iran. Similar results have also been reported from the USA [56], Australia [57], China [58], Vietnam [6] and Romania [59]. Additionally, the literature contains several successful applications of machine learning methods to the prediction of landslides [39], wildfires [50] and gully erosion [60].

ADT is robust against the potential errors of a modeling process and provides a significant improvement in classification error [36]. In addition to a robust classification scheme, ADT provides a measure of confidence, known as the classification margin, which helps the model to easily learn alternating trees from the training dataset [50]. The overall advantage of the ADT model is ease of implementation, because the method has few hyper-parameters to tune and modelers deal only with the number of boosting iterations [52]. Nonetheless, in line with previous works that acknowledged the efficiency of these five machine learning methods for modeling different types of natural hazards [39,50,60], we found that all five methods are fairly straightforward and easy to implement within the open-source WEKA software [61] for flood susceptibility mapping. We suggest that these methods could be applied in many different landscapes for flood susceptibility mapping.

Our results revealed that the high susceptibility classes of flood occurrence are associated with those portions of the research area that are characterized by human activities, such as orchards and residential areas. The extensive levee systems within the study area have significantly reduced the land area available for floodwater storage, suggesting the need to redesign the levee system to increase floodwater storage while providing the population and infrastructure with widespread flood protection. It is noteworthy that, due to substantial human activities in the Tafresh watershed, the probability of floods in the present study does not seem to be constant over a long period of time, highlighting the need for periodic assessments of flood susceptibility to adopt better-informed flood mitigation strategies.

Our results have implications for regional flood planning and watershed protection. In the study watershed, stream gauges with adequate observations are not available, which makes it impossible to develop reliable hydraulic models. The five data-mining methods could be effective alternatives for flood susceptibility mapping in these situations (limited hydrologic and geomorphologic observations) and in ungauged watersheds. The flood susceptibility maps generated here could be used to identify areas that need mitigation actions. In addition to the pre-flood management stage that was illustrated here, the computationally efficient data mining methods could be applied for flood-related studies where fast computation is crucial. A potential application could be real-time flood forecasting, where the application of hydraulic models would require a massive amount of time. The data-driven methods presented here could be effectively used for emergency management to guide evacuation plans, which are directly linked to public safety.

6. Summary

In this study, we evaluated five machine learning methods for flash flood susceptibility mapping in the Tafresh watershed, Iran. The methods showed only small differences in performance; we therefore found that all of them are suitable for flood susceptibility mapping. Since the study watershed lacks stream gauges with adequate observations, it is not possible to develop reliable hydraulic models. The five data-driven methods used here could thus be effective alternatives for flood susceptibility mapping.

Modeling frameworks such as the one presented here are useful for optimal flood management and for the sustainable protection of human society. They also help planners, managers and engineers to review their conservation plans in response to future floods. The results are, however, only valid for the study watershed and cannot be extrapolated to other areas. Various sources of uncertainty also exist in this study. These include the selection of influencing factors, the subjective classification of flood influencing factors, the spatial resolution of datasets, the training and validation datasets and the choice of performance metrics. Each of these requires further research to show how these uncertainties affect the ultimate flood susceptibility maps and subsequent decision making. Future work should investigate the impact of these uncertainties by selecting other flood factors such as daily or sub-daily rainfall, HAND, stream power index and topographic wetness index, classifying the flood factors together with stakeholders [62,63], performing a sensitivity analysis on the partitioning of the observed dataset (other than 70% and 30% for training and validation) and evaluating the efficiency of the five methods via alternate goodness-of-fit measures [64].

Author Contributions

Conceptualization, S.J., M.A., A.J., T.V.P. and B.T.P.; data curation, S.J. and M.A.; methodology, A.J., T.V.P. and B.T.P.; writing (original draft preparation), all authors; writing (review and editing), A.J. and E.A.; supervision, E.A., B.T.P. and S.L.; funding acquisition, B.T.P. and S.L.

Funding

This research was supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM) funded by the Minister of Science and ICT, and Universiti Teknologi Malaysia (UTM) based on Research University Grant (Q.J130000.2527.17H84).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Convertino, M.; Annis, A.; Nardi, F. Information-theoretic portfolio decision model for optimal flood management. Environ. Model. Softw. 2019, 119, 258–274.
2. Wright, J.M. Floodplain Management: Principles and Current Practices; The University of Tennessee–Knoxville: Knoxville, TN, USA, 2008.
3. Wang, X.; Kinsland, G.; Poudel, D.; Fenech, A. Urban flood prediction under heavy precipitation. J. Hydrol. 2019, 577, 123984.
4. Dawson, C.W.; Abrahart, R.J.; Shamseldin, A.Y.; Wilby, R.L. Flood estimation at ungauged sites using artificial neural networks. J. Hydrol. 2006, 319, 391–409.
5. Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. 2014, 512, 332–343.
6. Bui, D.T.; Ngo, P.T.T.; Pham, T.D.; Jaafari, A.; Minh, N.Q.; Hoa, P.V.; Samui, P. A novel hybrid approach based on a swarm intelligence optimized extreme learning machine for flash flood susceptibility mapping. Catena 2019, 179, 184–196.
7. Rahmati, O.; Pourghasemi, H.R.; Zeinivand, H. Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto Int. 2016, 31, 42–70.
8. Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 2017, 95, 229–245.
9. Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11.
10. Amade, N.; Painho, M.; Oliveira, T. Geographic information technology usage in developing countries—A case study in Mozambique. Geo Spat. Inf. Sci. 2018, 21, 331–345.
11. Chen, W.; Hong, H.; Li, S.; Shahabi, H.; Wang, Y.; Wang, X.; Ahmad, B.B. Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles. J. Hydrol. 2019, 575, 864–873.
12. Ahmadisharaf, E.; Kalyanapu, A.J.; Bates, P.D. A probabilistic framework for floodplain mapping using hydrological modeling and unsteady hydraulic modeling. Hydrol. Sci. J. 2018, 63, 1759–1775.
13. Aronica, G.; Franza, F.; Bates, P.; Neal, J. Probabilistic evaluation of flood hazard in urban areas using Monte Carlo simulation. Hydrol. Process. 2012, 26, 3962–3972.
14. Bates, P.D.; Horritt, M.S.; Aronica, G.; Beven, K. Bayesian updating of flood inundation likelihoods conditioned on flood extent data. Hydrol. Process. 2004, 18, 3347–3370.
15. Solomatine, D.P.; Ostfeld, A. Data-driven modelling: Some past experiences and new approaches. J. Hydroinform. 2008, 10, 3–22.
16. Moayedi, H.; Tien Bui, D.; Gör, M.; Pradhan, B.; Jaafari, A. The feasibility of three prediction techniques of the artificial neural network, adaptive neuro-fuzzy inference system, and hybrid particle swarm optimization for assessing the safety factor of cohesive slopes. ISPRS Int. J. Geo Inf. 2019, 8, 391.
17. Liu, W.K.; Karniakis, G.; Tang, S.; Yvonnet, J. A Computational Mechanics Special Issue on: Data-Driven Modeling and Simulation—Theory, Methods, and Applications; Springer: Berlin, Germany, 2019.
18. Shafapour Tehrany, M.; Kumar, L.; Neamah Jebur, M.; Shabani, F. Evaluating the application of the statistical index method in flood susceptibility mapping and its comparison with frequency ratio and logistic regression methods. Geomat. Nat. Hazards Risk 2019, 10, 79–101.
19. Vojtek, M.; Vojteková, J. Flood Susceptibility Mapping on a National Scale in Slovakia Using the Analytical Hierarchy Process. Water 2019, 11, 364.
20. Nandi, A.; Mandal, A.; Wilson, M.; Smith, D. Flood hazard mapping in Jamaica using principal component analysis and logistic regression. Environ. Earth Sci. 2016, 75, 465.
Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.B.; Gróf, G.; Ho, H.L.; et al. A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. 2019, 573, 311–323. [Google Scholar] [CrossRef]
Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An Ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. 2019, 651, 2087–2096. [Google Scholar] [CrossRef]
Kia, M.B.; Pirasteh, S.; Pradhan, B.; Mahmud, A.R.; Sulaiman, W.N.A.; Moradi, A. An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environ. Earth Sci. 2012, 67, 251–264. [Google Scholar] [CrossRef]
Jafarzadegan, K.; Merwade, V. Probabilistic floodplain mapping using HAND-based statistical approach. Geomorphology 2019, 324, 48–61. [Google Scholar] [CrossRef]
Afshari, S.; Tavakoly, A.A.; Rajib, M.A.; Zheng, X.; Follum, M.L.; Omranian, E.; Fekete, B.M. Comparison of new generation low-complexity flood inundation mapping tools with a hydrodynamic model. J. Hydrol. 2018, 556, 539–556. [Google Scholar] [CrossRef]
Islamic Republic of Iran Meteorological Organization (IRIMO). 2019. Available online: http://irimo.ir/english/monthly&annual/r25.asp (accessed on 17 February 2019).
Razi, H.A.; Rad, A.D.; Mardean, M.; Bayat, R. Preparation a corrective-Supplementary Pattern of Watershed Management Programs to Sediment Rate reduce in the Haftan Watershed, Tafresh. Geogr. Environ. Plan. 2016, 61, 1–14. [Google Scholar]
Darabi, H.; Choubin, B.; Rahmati, O.; Torabi Haghighi, A.; Pradhan, B.; Kløve, B. Urban flood risk mapping using the GARP and QUEST models: A comparative study of machine learning techniques. J. Hydrol. 2019, 569, 142–154. [Google Scholar] [CrossRef]
Jaafari, A.; Najafi, A.; Rezaeian, J.; Sattarian, A. Modeling erosion and sediment delivery from unpaved roads in the north mountainous forest of Iran. GEM Int. J. Geomath. 2015, 6, 343–356. [Google Scholar] [CrossRef]
Benito, G.; Rico, M.; Sánchez-Moya, Y.; Sopeña, A.; Thorndycraft, V.; Barriendos, M. The impact of late Holocene climatic variability and land use change on the flood hydrology of the Guadalentín River, southeast Spain. Glob. Planet. Chang. 2010, 70, 53–63. [Google Scholar] [CrossRef]
Mind′je, R.; Li, L.; Amanambu, A.C.; Nahayo, L.; Nsengiyumva, J.B.; Gasirabo, A.; Mindje, M. Flood susceptibility modeling and hazard perception in Rwanda. Int. J. Disaster Risk Reduct. 2019, 38, 101211. [Google Scholar] [CrossRef]
Mafi-Gholami, D.; Zenner, E.K.; Jaafari, A.; Ward, R.D. Modeling multi-decadal mangrove leaf area index in response to drought along the semi-arid southern coasts of Iran. Sci. Total Environ. 2019, 656, 1326–1336. [Google Scholar] [CrossRef]
Bui, D.T.; Tsangaratos, P.; Ngo, P.T.T.; Pham, T.D.; Pham, B.T. Flash flood susceptibility modeling using an optimized fuzzy rule based feature selection technique and tree based ensemble methods. Sci. Total Environ. 2019, 668, 1038–1054. [Google Scholar] [CrossRef]
Jaafari, A.; Najafi, A.; Pourghasemi, H.R.; Rezaeian, J.; Sattarian, A. GIS-based frequency ratio and index of entropy models for landslide susceptibility assessment in the Caspian forest, northern Iran. Int. J. Environ. Sci. Technol. 2014, 11, 909–926. [Google Scholar] [CrossRef] [Green Version]
Jaafari, A.; Razavi Termeh, S.V.; Bui, D.T. Genetic and firefly metaheuristic algorithms for an optimized neuro-fuzzy prediction modeling of wildfire probability. J. Environ. Manag. 2019, 243, 358–369. [Google Scholar] [CrossRef] [PubMed]
Freund, Y.; Mason, L. The alternating decision tree learning algorithm. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML′99), Bled, Slovenia, 27–30 June 1999; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1999; pp. 124–133. [Google Scholar]
Gama, J. Functional trees. Mach. Learn. 2004, 55, 219–250. [Google Scholar] [CrossRef]
Maalouf, M.; Trafalis, T.B.; Adrianto, I. Kernel logistic regression using truncated Newton method. Comput. Manag. Sci. 2011, 8, 415–428. [Google Scholar] [CrossRef]
Hong, H.; Pradhan, B.; Xu, C.; Tien Bui, D. Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena 2015, 133, 266–281. [Google Scholar] [CrossRef]
Chen, W.; Pourghasemi, H.R.; Kornejady, A.; Xie, X. GIS-based landslide susceptibility evaluation using certainty factor and index of entropy ensembled with alternating decision tree models. In Natural Hazards GIS-Based Spatial Modeling Using Data Mining Techniques; Springer: Heidelberg, Germany, 2019; pp. 225–251. [Google Scholar]
Shawe-Taylor, J.; Cristianini, N. Kernel Methods for Pattern Analysis; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
Bayat, M.; Ghorbanpour, M.; Zare, R.; Jaafari, A.; Thai Pham, B. Application of artificial neural networks for predicting tree survival and mortality in the Hyrcanian forest of Iran. Comput. Electron. Agric. 2019, 164, 104929. [Google Scholar] [CrossRef]
Tien Bui, D.; Moayedi, H.; Gör, M.; Jaafari, A.; Kok Foong, L. Predicting slope stability failure through machine learning paradigms. ISPRS Int. J. Geo Inf. 2019, 8, 395. [Google Scholar]
Naghibi, S.A.; Moradi Dashtpagerdi, M. Evaluation of four supervised learning methods for groundwater spring potential mapping in Khalkhal region (Iran) using GIS-based features. Hydrogeol. J. 2017, 25, 169–189. [Google Scholar] [CrossRef]
Pourghasemi, H.R.; Rahmati, O. Prediction of the landslide susceptibility: Which algorithm, which precision? Catena 2018, 162, 177–192. [Google Scholar] [CrossRef]
Jaafari, A.; Mafi-Gholami, D.; Pham, B.T.; Tien Bui, D. Wildfire probability mapping: Bivariate vs. multivariate statistics. Remote Sens. 2019, 11, 618. [Google Scholar] [CrossRef]
Hong, H.; Jaafari, A.; Zenner, E.K. Predicting spatial patterns of wildfire susceptibility in the Huichang County, China: An integrated model to analysis of landscape indicators. Ecol. Indic. 2019, 101, 878–891. [Google Scholar] [CrossRef]
Jaafari, A.; Panahi, M.; Pham, B.T.; Shahabi, H.; Bui, D.T.; Rezaie, F.; Lee, S. Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility. Catena 2019, 175, 430–445. [Google Scholar] [CrossRef]
Jaafari, A. LiDAR-supported prediction of slope failures using an integrated ensemble weights-of-evidence and analytical hierarchy process. Environ. Earth Sci. 2018, 77, 42. [Google Scholar] [CrossRef]
Jaafari, A.; Zenner, E.K.; Pham, B.T. Wildfire spatial pattern analysis in the Zagros Mountains, Iran: A comparative study of decision tree based classifiers. Ecol. Inf. 2018, 43, 200–211. [Google Scholar] [CrossRef]
Pham, B.T.; Jaafari, A.; Prakash, I.; Singh, S.K.; Quoc, N.K.; Bui, D.T. Hybrid computational intelligence models for groundwater potential mapping. Catena 2019, 182, 104101. [Google Scholar] [CrossRef]
Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Tien Bui, D. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 2018, 627, 744–755. [Google Scholar] [CrossRef] [PubMed]
Costache, R. Flash-flood Potential Index mapping using weights of evidence, decision Trees models and their novel hybrid integration. Stoch. Environ. Res. Risk Assess. 2019, 33, 1375–1402. [Google Scholar] [CrossRef]
Mosavi, A.; Ozturk, P.; Chau, K.-W. Flood prediction using machine learning models: Literature review. Water 2018, 10, 1536. [Google Scholar] [CrossRef]
Ahmadlou, M.; Karimi, M.; Alizadeh, S.; Shirzadi, A.; Parvinnejhad, D.; Shahabi, H.; Panahi, M. Flood susceptibility assessment using integration of adaptive network-based fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and BAT algorithms (BA). Geocarto Int. 2019, 34, 1252–1272. [Google Scholar] [CrossRef]
Tsakiri, K.; Marsellos, A.; Kapetanakis, S. Artificial Neural Network and Multiple Linear Regression for Flood Prediction in Mohawk River, New York. Water 2018, 10, 1158. [Google Scholar] [CrossRef]
Tehrany, M.S.; Jones, S.; Shabani, F. Identifying the essential flood conditioning factors for flood prone area mapping using machine learning techniques. Catena 2019, 175, 174–192. [Google Scholar] [CrossRef]
Ma, M.; Liu, C.; Zhao, G.; Xie, H.; Jia, P.; Wang, D.; Wang, H.; Hong, Y. Flash Flood Risk Analysis Based on Machine Learning Techniques in the Yunnan Province, China. Remote Sens. 2019, 11, 170. [Google Scholar] [CrossRef]
Costache, R.; Bui, D.T. Spatial prediction of flood potential using new ensembles of bivariate statistics and artificial intelligence: A case study at the Putna river catchment of Romania. Sci. Total Environ. 2019, 691, 1098–1118. [Google Scholar] [CrossRef]
Bui, D.T.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Omidavr, E.; Pham, B.T.; Asl, D.T.; Khaledian, H.; Pradhan, B.; Panahi, M.; et al. A novel ensemble artificial intelligence approach for gully erosion mapping in a semi-arid watershed (Iran). Sensors 2019, 19, 2444. [Google Scholar] [CrossRef]
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18. [Google Scholar] [CrossRef]
Ahmadisharaf, E.; Kalyanapu, A.; Chung, E.-S. Sustainability-based flood hazard mapping of the Swannanoa River watershed. Sustainability 2017, 9, 1735. [Google Scholar] [CrossRef]
de Brito, M.M.; Evers, M.; Almoradie, A.D.S. Participatory flood vulnerability assessment: A multi-criteria approach. Hydrol. Earth Syst. Sci. 2018, 22, 373–390. [Google Scholar] [CrossRef]
Ahmadisharaf, E.; Camacho, R.A.; Zhang, H.X.; Hantush, M.M.; Mohamoud, Y.M. Calibration and validation of watershed models and advances in uncertainty analysis in TMDL studies. J. Hydrol. Eng. 2019, 24, 03119001. [Google Scholar] [CrossRef]

Figure 1. Location of the study area.

Figure 2. Flowchart of the modeling methodology.

Figure 3. Examples of past flood events in the study area.

Figure 4. Influencing factors for flood susceptibility mapping: (a) elevation; (b) slope; (c) slope aspect; (d) distance from river; (e) average annual rainfall; (f) land use; (g) soil; and (h) lithology.

Figure 5. Frequency ratio of each class of the influencing factors.

Figure 6. Error of the machine learning methods. ADT, alternating decision tree; FT, functional tree; KLR, kernel logistic regression; MLP, multilayer perceptron; QDA, quadratic discriminant analysis; and RMSE, root-mean-square error.

Figure 7. Performance of the machine learning models in training and validation phases. ADT, alternating decision tree; FT, functional tree; KLR, kernel logistic regression; MLP, multilayer perceptron; QDA, quadratic discriminant analysis; PPV, positive predictive value; NPV, negative predictive value; SST, sensitivity; SPF, specificity; and ACC, accuracy.

Figure 8. Receiver operating characteristic (ROC) curves and AUC values of the models in (a) training and (b) validation phases. AUC, area under curve; ADT, alternating decision tree; FT, functional tree; KLR, kernel logistic regression; MLP, multilayer perceptron; and QDA, quadratic discriminant analysis.

Figure 9. Flood susceptibility maps derived from the (a) ADT; (b) FT; (c) KLR; (d) MLP; and (e) QDA methods. ADT, alternating decision tree; FT, functional tree; KLR, kernel logistic regression; MLP, multilayer perceptron; and QDA, quadratic discriminant analysis.

Figure 10. Quantitative analysis of the flood susceptibility maps. ADT, alternating decision tree; FT, functional tree; KLR, kernel logistic regression; MLP, multilayer perceptron; and QDA, quadratic discriminant analysis.

Table 1. Lithology units in the Tafresh watershed.

No.	Geo-unit	Description	Age
1	OMsm	Limestone, marl, gypsiferous marl, sandy marl and sandstone (Qom Fm)	Oligocene–Miocene
2	URig	Red marl, gypsiferous marl, sandstone and conglomerate (Upper Red Fm)	Miocene
3	Plc	Polymictic conglomerate and sandstone	Pliocene
4	Etvai	Dacitic to andesitic volcano sediment	Eocene
5	OMbcq	Basal conglomerate and sandstone	Oligocene
6	Etlig	Andesitic to basaltic volcanic tuff	Eocene
7	Ek	Well-bedded green tuff and tuffaceous shale (Karaj Fm)	Eocene
8	EKgy	Gypsum	Late Eocene
9	Jiiv	Upper Jurassic diorite	Late Jurassic
10	E1c	Pale-red, polygenic conglomerate and sandstone	Paleocene–Eocene
11	K1l	Thick-bedded to massive, white to pinkish orbitolina-bearing limestone (Tizkuh Fm)	Early Cretaceous
12	K2l	Hyporite-bearing limestone (Senonian)	Late Cretaceous
13	K2shm	Shale, calcareous shale and sandstone with intercalations of limestone	Late Cretaceous
14	Qt2	Low level piedmont fan and valley terrace deposits	Quaternary
15	TRn	Sandstone, quartz arenite, shale and fossiliferous limestone (Naiband Fm)	Mesozoic
16	Js	Dark grey shale and sandstone (Shemshak Fm)	Triassic–Jurassic

Table 2. Kappa values of the five machine learning models in the training and validation phases.

Model	Training	Validation
ADT	0.856	0.761
FT	0.802	0.761
KLR	0.772	0.775
MLP	0.820	0.775
QDA	0.766	0.803

ADT, alternating decision tree; FT, functional tree; KLR, kernel logistic regression; MLP, multilayer perceptron; QDA, quadratic discriminant analysis.
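
Kappa values such as those in Table 2, together with the PPV, NPV, sensitivity, specificity, and accuracy indices named in Figure 7, are commonly computed from a binary confusion matrix. The following is a minimal sketch under the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute common binary-classification indices from confusion-matrix counts.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives, and false negatives (illustrative inputs only).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Expected agreement by chance: product of the marginal proportions
    # for the "flood" class plus the same product for the "non-flood" class.
    p_flood = ((tp + fp) / total) * ((tp + fn) / total)
    p_non_flood = ((tn + fn) / total) * ((tn + fp) / total)
    p_expected = p_flood + p_non_flood

    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "PPV": tp / (tp + fp),          # positive predictive value
        "NPV": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=45, fp=5, tn=45, fn=5)
```

With balanced classes, as in this illustration, the chance agreement is 0.5, so kappa rescales the accuracy to the range above chance.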

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Janizadeh, S.; Avand, M.; Jaafari, A.; Phong, T.V.; Bayat, M.; Ahmadisharaf, E.; Prakash, I.; Pham, B.T.; Lee, S. Prediction Success of Machine Learning Methods for Flash Flood Susceptibility Mapping in the Tafresh Watershed, Iran. Sustainability 2019, 11, 5426. https://0-doi-org.brum.beds.ac.uk/10.3390/su11195426

