Shallow Landslide Susceptibility Mapping by Random Forest Base Classifier and Its Ensembles in a Semi-Arid Region of Iran

Nhu, Viet-Ha; Shirzadi, Ataollah; Shahabi, Himan; Chen, Wei; Clague, John J; Geertsema, Marten; Jaafari, Abolfazl; Avand, Mohammadtaghi; Miraki, Shaghayegh; Talebpour Asl, Davood; Pham, Binh Thai; Ahmad, Baharin Bin; Lee, Saro

doi:10.3390/f11040421



by Viet-Ha Nhu^1,2, Ataollah Shirzadi^3, Himan Shahabi^4,5, Wei Chen^6,7, John J Clague^8, Marten Geertsema^9, Abolfazl Jaafari^10, Mohammadtaghi Avand^11, Shaghayegh Miraki^12, Davood Talebpour Asl^4, Binh Thai Pham^13,*, Baharin Bin Ahmad^14 and Saro Lee^15,16,*

^1 Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
^2 Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam
^3 Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
^4 Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran
^5 Board Member of Department of Zrebar Lake Environmental Research, Kurdistan Studies Institute, University of Kurdistan, Sanandaj 66177-15175, Iran
^6 College of Geology & Environment, Xi’an University of Science and Technology, Xi’an 710054, China
^7 Key Laboratory of Coal Resources Exploration and Comprehensive Utilization, Ministry of Natural Resources, Xi’an 710021, Shaanxi, China
^8 Department of Earth Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
^9 British Columbia Ministry of Forests, Lands, Natural Resource Operations and Rural Development, Prince George, BC V2L 1R5, Canada
^10 Research Institute of Forests and Rangelands, Agricultural Research, Education, and Extension Organization (AREEO), Tehran 13185-116, Iran
^11 Department of Watershed Management Engineering and Sciences, Faculty of Natural Resources and Marine Science, Tarbiat Modares University, Tehran 14115-111, Iran
^12 Department of Watershed Sciences Engineering, Faculty of Natural Resources, University of Agricultural Science and Natural Resources of Sari, Mazandaran 48181-68984, Iran
^13 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
^14 Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Malaysia
^15 Geoscience Platform Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124 Gwahak-ro, Yuseong-gu, Daejeon 34132, Korea
^16 Department of Geophysical Exploration, Korea University of Science and Technology, 217 Gajeong-ro, Yuseong-gu, Daejeon 34113, Korea


* Authors to whom correspondence should be addressed.

Forests 2020, 11(4), 421; https://doi.org/10.3390/f11040421

Submission received: 8 February 2020 / Revised: 27 March 2020 / Accepted: 4 April 2020 / Published: 9 April 2020

(This article belongs to the Special Issue Trees: Recorders of Past Soil Erosion and Landslide Events)


Abstract

We generated high-quality shallow landslide susceptibility maps for Bijar County, Kurdistan Province, Iran, using Random Forest (RAF), an ensemble computational intelligence method, and three meta classifiers: Bagging (BA, BA-RAF), Random Subspace (RS, RS-RAF), and Rotation Forest (RF, RF-RAF). Modeling and validation were done on 111 shallow landslide locations using 20 conditioning factors tested by the Information Gain Ratio (IGR) technique. We assessed model performance with statistically based indexes, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). All four machine learning models that we tested yielded excellent goodness-of-fit and prediction accuracy, but the RF-RAF ensemble model (AUC = 0.936) outperformed the BA-RAF, RS-RAF (AUC = 0.907), and RAF (AUC = 0.812) models. The results also show that the Rotation Forest model significantly improved the predictive capability of the RAF-based classifier and, therefore, can be considered a useful and effective tool in regional shallow landslide susceptibility mapping.

Keywords:

shallow landslide; machine learning; goodness-of-fit; over-fitting; GIS; Iran

1. Introduction

Landslide susceptibility mapping is an important step in mitigating the damage and injury inflicted by landslides. Landslide susceptibility maps provide effective guidance and reference for decision-makers, managers, and engineers taking such action [1,2]. However, selecting proper methods/models for generating the most accurate and informative landslide susceptibility maps remains challenging [3]. Several comparative studies, in which researchers used a variety of GIS-aided methods to develop landslide susceptibility maps, have focused on this issue. For example, Yesilnacar and Topal [4] showed that an artificial neural network (ANN) performed better than logistic regression (LR) in predicting future landslides and producing landslide susceptibility maps. Yilmaz [5] reported that a susceptibility map produced by ANN was more accurate than the maps generated by LR and the frequency ratio (FR). Pradhan [6] analyzed the predictive performance of the support vector machine (SVM), decision tree, and adaptive neuro-fuzzy inference system (ANFIS) models and reported that ANFIS outperformed the other models. Hong et al. [7] presented a comparative study of SVM, kernel LR, and alternating decision tree (ADT) models and demonstrated the superiority of ADT over the other two models. Razavizadeh et al. [8] concluded that FR outperformed the statistical index (SI) and weights of evidence (WOE) for landslide prediction, and Juliev et al. [9] showed that SI performed better than the certainty factor (CF) and FR. Finally, Van Dao et al. [3] and Nhu et al. [10] explored the potential application of deep learning ANN for landslide modeling and prediction, and showed its superiority over several other machine learning methods.

A review of the literature indicates that statistical/probabilistic methods (e.g., FR, SI, WOE, and CF) can provide a quick estimate of landslide susceptibility with minimal computation time. However, these methods represent reality in simplified form, which is a drawback given the complexity of the factors that lead to slope instability [11,12]. Machine learning methods (e.g., ANN, SVM, ANFIS, decision trees) may provide a more reliable estimate of landslide susceptibility because they involve fewer simplifications, but they require more computation time [13,14,15,16,17]. For example, Van Dao et al. [3] reported that deep learning ANN requires much more computational time than conventional methods. Despite this disadvantage, machine learning methods can handle large volumes of non-linear and complex data derived from different sources and reported at a variety of scales. They have therefore been applied in many fields, especially in studies of natural hazards such as floods [18,19,20,21,22,23,24,25,26,27,28], wildfires [29,30], sinkholes [31], drought [32,33], earthquakes [34,35], gully erosion [36,37], and land/ground subsidence [38,39].

Recent advances in data processing techniques have shown that the performance of machine learning methods can be further improved, and their limitations alleviated, using ensemble learning methods [40,41,42,43,44], which were first introduced in the early 1990s [45,46]. Ensemble methods have outperformed single-model methods in terms of accuracy and robustness [40]. Examples of the integrated application of machine learning and ensemble learning methods for landslide susceptibility mapping include: ANN with AdaBoost, Bagging, Dagging, MultiBoost, Rotation Forest, and Random Subspace [47]; Reduced Error Pruning Trees with Bagging, MultiBoost, Rotation Forest, and Random Subspace [48]; SVM with MultiBoost [49]; J48 decision tree with AdaBoost, Bagging, and Rotation Forest [50]; Naïve Bayes Trees (NBT) with Random Subspace [51] and Rotation Forest [52]; and Bayesian Logistic Regression (BLR) with Random Subspace, AdaBoost, MultiBoost, and Bagging [53]. Such ensemble models have also been used in many other fields, for example, groundwater potential mapping [46,54], gully erosion distribution modeling [55], and flood prediction [56].

To the best of our knowledge, Random Forest (RAF) has not yet been tested as a base classifier for these ensemble learners in the context of landslide modeling. To fill this gap, we model landslide susceptibility in Bijar County, Iran, using an ensemble learning framework based on the RAF method coupled with Bagging (BA), Rotation Forest (RF), and Random Subspace (RS) techniques. We first use a geospatial dataset for our study area to develop three ensemble models, namely BA-RAF, RF-RAF, and RS-RAF. We next evaluate and compare the performances of the models, and lastly derive a map of landslide susceptibility from each model.

2. Study Area

Our ~600 km² study area is located in hilly Bijar County, Kurdistan Province, Iran (Figure 1). It lies between 35°48′25″ N and 35°59′50″ N latitude and 47°28′50″ E and 47°46′44″ E longitude, and ranges in elevation from 1570 to 2550 m asl. Almost 80% of the area is dry-farmed; the remainder is irrigated land, woodland, pasture, residential areas, and barren land. The climate of the study area is classified as type D (cold) in the Köppen climate classification system. Mean annual precipitation (1987–2014) is 338 mm. Mean daily maximum and minimum temperatures are 13.4 and 4.4 °C, respectively. There is an average of 261 frost-free days and 35 days with snow per year [51].

Bedrock is exposed at the surface over most of the study area. The dominant rock types are sandstone and conglomerate (250 km², 42%) and shale and slate (140 km², 23%). Almost 200 km² (32%) of the area is covered by unconsolidated Quaternary colluvial and alluvial sediments of various textures. The southern part of the study area is crossed by numerous NW–SE-trending faults. Shallow landslides occur in the study area every year.

3. Data Preparation

3.1. Landslide Inventory Map

A requirement for modeling landslide susceptibility is preparation of a landslide inventory map [57]. We assume, like Guzzetti et al. [58], that future landslides will happen under conditions similar to those of the past. In this study, we use 111 landslides previously mapped by the Forest, Rangeland and Watershed Management Organization of Iran [59]. We validated their locations through a combination of field examination and the use of Google Earth imagery [60]. We classified the 111 landslides as rotational slides (70.6%), complex slides (22.4%), and falls (6.3%). Landslides range in length from 70 to 280 m and in width from 7 to 293 m [48].

3.2. Landslide Conditioning Factors

We selected 20 landslide conditioning factors (LCFs) to develop landslide susceptibility prediction models (Table 1). The choice of factors was based on expert opinion, previous research, and field observations. We use seven topographic factors (slope, aspect, elevation, curvature, profile curvature, plan curvature, and sediment transport index), a crucial triggering factor (rainfall), annual solar radiation, four hydrological factors (stream power index, topographic wetness index, distance to river, and river density), three geological factors (lithology, distance to faults, and fault density), two land cover factors (land use and normalized difference vegetation index (NDVI)), and two anthropogenic factors (distance to roads and road density). For data analysis and processing, all conditioning factors were sampled on a grid of 20 × 20 m pixels. We normalized the dataset for use in modeling and validation.

We used the ASTER DEM (30 × 30 m), resampled to a 20 × 20 m grid, to produce slope and aspect layers. Slope expresses the change in elevation over distance and is given in degrees in this study. All other things being equal, steeper slopes are more susceptible to landslides; thus, slope is an important conditioning factor in landslide susceptibility prediction models [51,61,62,63,64,65]. Slope was binned into eight categories: (1) 0–5, (2) 5–10, (3) 10–15, (4) 15–20, (5) 20–25, (6) 25–30, (7) 30–45, and (8) >45°. Slope aspect is the compass direction that a slope faces, measured clockwise from north (0°) [66]. It is related to evapotranspiration in hilly areas and is therefore considered important in landslide susceptibility mapping [67,68]. In the present study, slope aspect was divided into nine classes: (1) flat, (2) north, (3) northeast, (4) east, (5) southeast, (6) south, (7) southwest, (8) west, and (9) northwest (Table 1).
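The slope and aspect reclassification can be sketched in Python as follows. This is a minimal illustration, not the authors' GIS workflow. The function names are hypothetical. Values lying exactly on a class boundary are assumed to fall in the higher class. Flat cells are assumed to carry the ArcGIS Aspect convention of −1.

```python
from bisect import bisect_right

# Upper limits (degrees) of slope classes 1-7 from the text; class 8 is >45
SLOPE_LIMITS = [5, 10, 15, 20, 25, 30, 45]

ASPECT_CLASSES = ["flat", "north", "northeast", "east", "southeast",
                  "south", "southwest", "west", "northwest"]


def slope_class(slope_deg):
    """Return slope class 1-8; a value on a boundary goes to the higher class."""
    return bisect_right(SLOPE_LIMITS, slope_deg) + 1


def aspect_class(aspect_deg):
    """Return aspect class 1-9 (1 = flat, 2 = north, ..., 9 = northwest).

    Assumes flat cells are coded as -1 (ArcGIS Aspect convention). Each
    compass sector spans 45 degrees centred on its direction, so north
    covers 337.5-22.5 degrees.
    """
    if aspect_deg < 0:
        return 1
    return int(((aspect_deg + 22.5) % 360) // 45) + 2


# A hypothetical cell with a 12-degree slope facing 200 degrees
print(slope_class(12.0), ASPECT_CLASSES[aspect_class(200.0) - 1])  # prints: 3 south
```

The same lookup pattern applies to the other manually classified layers. Only the list of class limits changes.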

Elevation influences moisture and temperature regimes [69]. Both temperature and precipitation affect soil moisture and commonly change with elevation. Lower elevations also may be preferentially used for roads, the construction of which could trigger landslides in hilly or mountainous areas [70]. We defined nine elevation classes: (1) 1573–1700, (2) 1700–1800, (3) 1800–1900, (4) 1900–2000, (5) 2000–2100, (6) 2100–2200, (7) 2200–2300, (8) 2300–2400, and (9) >2400 m.

The curvature of a slope (concave, convex, or flat, expressed as positive, negative, or zero values) may be related to slope stability [65]. In this study, slope curvature was divided into six classes: (1) [(−12.5)–(−1.4)], (2) [(−1.4)–(−0.4)], (3) [(−0.4)–(−0.2)], (4) [(−0.2)–0.9], (5) [0.9–2.5], and (6) [2.5–15.6] m⁻¹.

Plan curvature is defined as the curvature of a contour line formed by the intersection of a horizontal plane with the ground surface [63], measured perpendicular to the direction of maximum slope [71]. It affects erosion by causing water to converge or diverge as it flows downslope [72,73]. We defined six plan curvature classes: (1) [(−6.7)–(−0.8)], (2) [(−0.8)–(−0.2)], (3) [(−0.2)–0], (4) [0–0.4], (5) [0.4–1.1], and (6) [1.1–10.4] m⁻¹.

Profile curvature is the curvature of the surface in the direction of maximum slope. It may be associated with landslides because it controls changes in the velocity of water or sediment flowing down a slope [74,75]. It provides a measure of the concavity or convexity of a slope: negative values correspond to concave slopes, positive values to convex slopes, and zero values to flat slopes [76]. We defined six profile curvature classes: (1) [(−10.7)–(−1.7)], (2) [(−1.7)–(−0.7)], (3) [(−0.7)–(−0.2)], (4) [(−0.2)–0.2], (5) [0.2–0.9], and (6) [0.9–7.5] m⁻¹.

The sediment transport index (STI) is a hydrological measure of the amount of sediment transported by overland flow. The index is based on catchment evolution erosion theories and on transport capacity, which limits sediment flux. It is calculated using the following formula [77]:

STI = \left(\frac{A_{s}}{22.13}\right)^{0.6} \left(\frac{\sin \beta}{0.0896}\right)^{1.3}

(1)

where A_s is the specific catchment area (m²) and β is the slope gradient (radians). In this study, we divided STI into six classes: (1) 0–7, (2) 7–14, (3) 14–21, (4) 21–28, (5) 28–35, and (6) 35–42.
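Equation (1) translates directly into code. The sketch below assumes that the specific catchment area and the slope are already available for each cell, for example from a flow-accumulation grid and the DEM-derived slope. The function name is hypothetical.

```python
import math


def sediment_transport_index(specific_catchment_area, slope_rad):
    """Sediment transport index, Eq. (1).

    specific_catchment_area: A_s (m^2, as defined in the text)
    slope_rad: slope gradient beta in radians, assumed >= 0
    """
    return ((specific_catchment_area / 22.13) ** 0.6
            * (math.sin(slope_rad) / 0.0896) ** 1.3)


# Hypothetical cell: A_s = 500 m^2 on a 15-degree slope
print(round(sediment_transport_index(500.0, math.radians(15.0)), 2))
```

The constants 22.13 and 0.0896 normalize the index so that it equals 1 for those reference values of A_s and sin β.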

Amounts and intensities of rainfall may be positively correlated with the incidence of landslides, but the relationship is strongly controlled by topography. Rainfall on well-drained, relatively flat terrain may have less impact on slope stability than rainfall on steep slopes in hilly areas [5]. We constructed a rainfall map based on mean annual rainfall over the period 1980–2016 from records at nine rain gauge stations inside and outside the study area. Rainfall was divided into seven classes using the natural break classification method: (1) 263–270, (2) 270–300, (3) 300–330, (4) 330–360, (5) 360–390, (6) 390–420, and (7) 420–450 mm.
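Natural break classification is used for rainfall here and for several of the layers below. The sketch below computes exact natural breaks by dynamic programming. It chooses the class boundaries that minimize the total within-class sum of squared deviations (the Fisher-Jenks criterion). This is an illustration only: ArcGIS's own implementation may differ, for example by sampling large rasters. The function name is hypothetical, and the O(k·n²) cost restricts it to small samples.

```python
def natural_breaks(values, n_classes):
    """Exact natural breaks: minimise total within-class squared deviation.

    Returns [min, upper_1, ..., upper_k], i.e. n_classes + 1 break values.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the squared deviation of any run data[i:j] in O(1)
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation when data[:j] is split into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):          # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i
    # Walk back through the stored class starts to recover upper limits
    uppers, j = [], n
    for k in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = start[k][j]
    return [data[0]] + uppers[::-1]


# Hypothetical mean annual rainfall values (mm) at a few cells
print(natural_breaks([265, 268, 290, 295, 330, 335, 410, 440], 3))
```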

Annual solar radiation (h) is defined as the mean solar radiation received at a given pixel in one year [78,79], and may have an indirect impact on landslide susceptibility [70]. Solar radiation directly affects evapotranspiration and is itself influenced by topography. We computed solar radiation from aspect and slope values using the “Area Solar Radiation” tool in ArcGIS 10.3. We then defined seven classes using the natural break classification method: (1) 3.015–6.563, (2) 6.563–6.747, (3) 6.747–6.849, (4) 6.849–6.930, (5) 6.930–7.073, (6) 7.073–7.236, and (7) 7.236–8.215 h.

The stream power index (SPI) provides a measure of the intensity and erosive power of surface runoff, and is calculated as [80]:

SPI = A_{s} \times \tan \beta

(2)

where A_s is the specific catchment area (m²) and β is the local slope gradient (radians). SPI is affected by the characteristics of the underlying soil or rock. In general, high values of SPI indicate a higher potential for landsliding. The SPI layer comprises six classes: (1) 0–998, (2) 998–6986, (3) 6986–19961, (4) 19961–45911, (5) 45911–101803, and (6) 101803–255505. About 67% of landslide pixels lie within the 0–998 class.
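Equation (2) can be computed per cell as sketched below. The sketch assumes the same per-cell inputs as the STI, namely the specific catchment area and the slope in radians. The function name is hypothetical.

```python
import math


def stream_power_index(specific_catchment_area, slope_rad):
    """Stream power index, Eq. (2): SPI = A_s * tan(beta), beta in radians."""
    return specific_catchment_area * math.tan(slope_rad)


# Hypothetical cell: A_s = 2000 m^2 on a 10-degree slope
print(round(stream_power_index(2000.0, math.radians(10.0)), 1))
```

Because SPI grows linearly with A_s, values along drainage lines dominate the upper classes. This is consistent with the wide class ranges reported above.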

Topographic wetness index (TWI) is a measure of the tendency of runoff to converge on a slope [81]. TWI values were calculated as:

TWI = \ln \left(\frac{A_{s}}{\tan \beta}\right)

(3)

where A_s is the cumulative upslope area draining through a point (m²) and β is the slope angle (radians) at that point. As TWI increases, landslide susceptibility may also increase. We calculated TWI from the DEM in SAGA software and then divided it into six groups using the natural break classification method: (1) 1–3, (2) 3–4, (3) 4–6, (4) 6–8, (5) 8–9, and (6) 9–11.
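Equation (3) is undefined on perfectly flat cells, because tan 0 = 0. The sketch below handles this by clamping the slope to a small minimum. That floor (`min_slope_rad`) is a hypothetical choice, not a value from the paper, and SAGA applies its own treatment of flat areas. The function name is also hypothetical.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_rad, min_slope_rad=1e-4):
    """Topographic wetness index, Eq. (3): ln(A_s / tan(beta)).

    Flat cells would divide by zero, so beta is clamped to min_slope_rad;
    this floor is an assumption, not a value from the paper.
    """
    beta = max(slope_rad, min_slope_rad)
    return math.log(specific_catchment_area / math.tan(beta))


# Hypothetical cells: same catchment area, gentle versus steep slope
print(topographic_wetness_index(5000.0, math.radians(5)),
      topographic_wetness_index(5000.0, math.radians(30)))
```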

Distance to rivers has been shown to be an important conditioning factor for landslides. A river might, for example, undercut a side slope, increasing the likelihood of a landslide [51]. In this study, distance to river was extracted by establishing buffers around rivers and using the “Euclidean distance” tool in ArcGIS. It was divided into five classes using the natural break classification method: (1) 0–50, (2) 50–100, (3) 100–150, (4) 150–200, and (5) >200 m.

River density is an important factor in landslide occurrence, especially in mountainous regions [82], in part through its effects on groundwater recharge. The flow network in this study was derived from the DEM using the Arc-Hydrology extension. River density was then calculated in ArcGIS 10.3 using the “Line density” tool. Values range from 0 to 13.2 km/km² and were binned into seven classes using the natural break classification method: (1) 0–1.9, (2) 1.9–3.2, (3) 3.2–4.2, (4) 4.2–5.2, (5) 5.2–6.3, (6) 6.3–7.8, and (7) 7.8–13.2 km/km².

Geology affects rock strength, soil porosity, and permeability and thus is an important factor in landsliding [83]. We extracted lithology data from a 1:100,000-scale geological map produced by the Geological Survey of Iran and verified them through field work and aerial photo interpretation. We defined three units: Quaternary, Tertiary, and Cretaceous.

The probability that a slope will fail could increase if the slope is near a major fault [84]. We constructed a distance-to-fault layer from the geological map of the study area using “Euclidean distance” in ArcGIS 10.3. We defined six classes: (1) 0–200, (2) 200–400, (3) 400–600, (4) 600–800, (5) 800–1000, and (6) >1000 m.

Fault density provides a measure of bedrock fracturing, which may contribute to the occurrence of landslides [85]. A fault density layer was produced from the geological map in ArcGIS 10.3 using the “Line density” tool. It has seven classes: (1) 0–0.3, (2) 0.3–0.8, (3) 0.8–1.2, (4) 1.2–1.7, (5) 1.7–2.1, (6) 2.1–2.5, (7) 2.5–3.2 km/km².

Land use is a significant factor in slope stability analyses because the development and utilization of land affect infiltration, surface runoff, and vegetation [86]. The land-use layer was prepared from Landsat 7 (ETM+) imagery acquired on April 25, 2008, by supervised classification in PCI Geomatica 9.1. Five land-use classes were identified: (1) residential area, (2) arable land (dry farming and cultivated lands), (3) woodland, (4) grassland, and (5) barren land.

The normalized difference vegetation index (NDVI) provides a quantitative measure of vegetation cover on surfaces, which could be related to slope failures [87]. NDVI is calculated from reflectance measurements in the red and near-infrared portions of the electromagnetic spectrum as [88]:

NDVI = \frac{NIR\,(\text{Band } 4) - Red\,(\text{Band } 3)}{NIR\,(\text{Band } 4) + Red\,(\text{Band } 3)}

(4)

where Red and NIR are the spectral reflectance values in the red and near-infrared bands, respectively. NDVI values range from −0.23 to 0.73 and were divided into seven classes using OLI sensor images from Landsat 8 in ENVI 5.1: (1) [(−0.23)–(−0.061)], (2) [(−0.061)–(−0.0081)], (3) [(−0.0081)–0.060], (4) [0.060–0.14], (5) [0.14–0.24], (6) [0.24–0.41], and (7) [0.41–0.73].
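Equation (4) is sketched below for paired reflectance values. Band numbers differ by sensor. For Landsat 7 ETM+ (as written in Eq. 4), NIR and red are Bands 4 and 3. For Landsat 8 OLI, also mentioned in the text, they are Bands 5 and 4. The function name and the NaN convention for zero denominators are hypothetical choices.

```python
def ndvi(nir, red):
    """NDVI, Eq. (4), for paired NIR and red reflectance sequences.

    Pixels where NIR + Red == 0 (e.g., no-data) are returned as NaN.
    """
    result = []
    for n_val, r_val in zip(nir, red):
        total = n_val + r_val
        result.append((n_val - r_val) / total if total != 0 else float("nan"))
    return result


# Hypothetical reflectances: dense vegetation, bare soil, water
print(ndvi([0.45, 0.25, 0.02], [0.08, 0.20, 0.05]))
```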

Road construction can increase the likelihood of landslides in hilly and mountainous areas by reducing the strength of rock and sediment [89,90]. Cuts and fills along roads, as well as poor drainage, are implicated in most road-related landslides. We used two metrics, distance to roads and road density, to provide a measure of the cumulative impacts of road construction on the occurrence of landslides. We created a distance-to-road layer in ArcGIS 10.3 with five classes using the manual classification method: (1) 0–50, (2) 50–100, (3) 100–150, (4) 150–200, and (5) >200 m. We define road density as the cumulative length of road per unit area [4]. The road density map was constructed in ArcGIS 10.3 using the natural break classification method, with seven classes: (1) 0–0.0013, (2) 0.0013–0.0027, (3) 0.0027–0.0041, (4) 0.0041–0.0055, (5) 0.0055–0.0069, (6) 0.0069–0.0083, and (7) 0.0083–0.0097 km/km².

4. Machine Learning Models

4.1. Random Forest Decision Tree-Base Classifier

Random Forest (RAF) is a modern, tree-based machine learning method that builds a multitude of classification and regression trees [91]. It is a nonparametric method for modeling continuous and discrete data. The main problem with individual decision trees is that their results fluctuate from tree to tree. To reduce these fluctuations and the variance of the estimates, we used a Random Forest approach [92]. This approach combines many decision trees, each grown on a bootstrap sample of the data using randomly chosen input variables [93]. Following Micheletti et al. [94], we drew a large number of bootstrap samples from the initial observational dataset. During sampling, about one-third of the data was left out of each sample (the out-of-bag data) for use in validation. We then grew a tree on each bootstrap sample. At each node, rather than considering all M independent variables, we randomly chose m of them as candidates for partitioning. For regression, m is commonly set to one-third of M; for classification, m = √M [95]. After constructing all the trees, we combined their outputs. We averaged the outcomes and calculated the final output of the model by considering the empirical distribution of the outputs, the percentile values, and the range of uncertainty.
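The RAF procedure described above (bootstrap samples, m = √M candidate variables per split, out-of-bag validation, and averaged votes) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the implementation used in the study:

- Labels are 0/1, with 1 denoting a landslide.
- Trees use Gini splits and are depth-limited to 5 for brevity, whereas Breiman's RAF grows unpruned trees.
- All function names and the demo data are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, m, rng, depth=0, max_depth=5):
    """Grow a CART-style tree; at each node only m random variables are candidates."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)                           # leaf
    best = None
    for f in rng.sample(range(len(X[0])), m):
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:                                 # sampled variables are constant here
        return majority(y)
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], m, rng, depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], m, rng, depth + 1, max_depth))


def predict_tree(node, row):
    while isinstance(node, tuple):                   # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def random_forest(X, y, n_trees=25, seed=0):
    """Bootstrap trees with m = sqrt(M) candidates per split; also returns OOB accuracy."""
    rng = random.Random(seed)
    n, M = len(X), len(X[0])
    m = max(1, int(math.sqrt(M)))
    forest, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], m, rng)
        forest.append(tree)
        for i in set(range(n)) - set(idx):           # out-of-bag rows (~1/3)
            oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(oob_votes) if v]
    oob_acc = sum(p == t for p, t in scored) / len(scored) if scored else float("nan")

    def predict_proba(row):
        """Fraction of trees voting 'landslide' (label 1)."""
        return sum(predict_tree(t, row) == 1 for t in forest) / len(forest)

    return predict_proba, oob_acc


# Hypothetical data: feature 0 is informative, feature 1 is noise
X_demo = [[i, (i * 7) % 5] for i in range(20)]
y_demo = [0 if i < 10 else 1 for i in range(20)]
predict_proba, oob_acc = random_forest(X_demo, y_demo)
print(predict_proba([2, 0]), predict_proba([17, 0]), oob_acc)
```

In practice, a library implementation would normally be used. For example, scikit-learn's `RandomForestClassifier` with `max_features="sqrt"` and `oob_score=True` follows the same recipe.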

4.2. Ensemble Models

Although there are numerous effective classification algorithms, none has emerged as superior to the others. We therefore used ensemble models that combine a number of simple classifiers rather than a single complex classifier (Figure 2). Ensemble models fall into two groups: the first fuses the outputs of all classifiers, whereas the second selects classifiers dynamically. In the first group, inputs are applied simultaneously to all classifiers, and all outputs contribute to the final answer [96,97]. In the second group, the classifiers are treated as complementary. The input data are segmented, and the best classifier for each segment is determined using training or validation data. To classify a new sample, the sample is first assigned to a segment, and the output of the best classifier for that segment is taken as the final answer.

The most important methods of the first group are Boosting, Bagging, and Rotation Forest [98].

4.2.1. Bagging

The Bagging (Bootstrap Aggregating, BA) method was proposed by Breiman [99]. The algorithm creates several classifiers, H_m, m = 1, …, M, each trained on a modified (bootstrap-resampled) version of the training dataset. It then combines them into a single classifier, which is a weighted vote of the individual classifiers according to the following equation:

H(d_{i}) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_{m} H_{m}(d_{i})\right)

(5)

The method can thus be thought of as voting. The weights α_m, m = 1, …, M, are determined such that more accurate classifiers have a greater impact on the final prediction than less accurate ones. The accuracy of the base classifiers H_m is only slightly higher than that of random classification, so they are called weak classifiers [100,101].
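Equation (5) can be illustrated with the sketch below, which uses labels in {−1, +1}. Two substitutions keep it short. First, a one-split decision stump stands in for the RAF base classifier. Second, each weight α_m is set to the member's accuracy on the full training set. Both are hypothetical choices, and in Breiman's original Bagging all α_m are equal (a plain majority vote). All names are illustrative.

```python
import random


def train_stump(X, y):
    """Fit a one-split decision stump (a stand-in weak learner) by minimising errors."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(1 for row, t in zip(X, y)
                          if (sign if row[f] > thr else -sign) != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    _, f, thr, sign = best
    return lambda row: sign if row[f] > thr else -sign


def bagging(X, y, n_models=11, seed=0):
    """Train stumps on bootstrap replicates and combine them as in Eq. (5)."""
    rng = random.Random(seed)
    n = len(X)
    models, alphas = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap replicate
        h = train_stump([X[i] for i in idx], [y[i] for i in idx])
        models.append(h)
        # Hypothetical alpha_m: accuracy of this member on the full training set
        alphas.append(sum(h(r) == t for r, t in zip(X, y)) / n)

    def predict(row):
        score = sum(a * h(row) for a, h in zip(alphas, models))
        return 1 if score >= 0 else -1                      # sign(...), ties -> +1

    return predict


# Hypothetical one-factor data: -1 = non-landslide, +1 = landslide
X_demo = [[0], [1], [2], [3], [10], [11], [12], [13]]
y_demo = [-1, -1, -1, -1, 1, 1, 1, 1]
bagged = bagging(X_demo, y_demo)
print(bagged([-1]), bagged([20]))
```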

4.2.2. Random Subspace

Another algorithm that we used in our study is Random Subspace (RS), which is rooted in the theory of stochastic discrimination [102]. Identifying the most relevant features in a high-dimensional feature space can be difficult. To deal with this problem, Ho [103] proposed the Random Subspace method. In this method, the high-dimensional feature vector is randomly sampled to produce low-dimensional subspaces, and a classifier is trained in each subspace. The final decision is made by combining the classifiers built in the random subspaces [104]. RS has been applied to topics as diverse as floods, groundwater, medicine, and computing. Because it selects features randomly, the algorithm can increase accuracy and reduce error.
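The Random Subspace idea can be sketched in a few lines. Each member sees only a random subset of the conditioning factors, and the members are combined by majority vote. To keep the example short, a nearest-centroid classifier stands in for the RAF base classifier used in the paper. The subspace size, the number of members, and all names are hypothetical.

```python
import random
from collections import Counter


def nearest_centroid(X, y):
    """Stand-in base classifier: assign a row to the class with the closest mean."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(row):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return predict


def random_subspace(X, y, n_models=15, subspace_size=2, seed=0):
    """Train each member on a random feature subset; combine by majority vote."""
    rng = random.Random(seed)
    n_features = len(X[0])
    members = []
    for _ in range(n_models):
        feats = rng.sample(range(n_features), subspace_size)
        members.append((feats, nearest_centroid([[row[j] for j in feats] for row in X], y)))

    def predict(row):
        votes = Counter(h([row[j] for j in feats]) for feats, h in members)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical three-factor data: 0 = non-landslide, 1 = landslide
X_demo = [[0, 1, 0], [1, 0, 1], [0, 0, 1], [9, 10, 10], [10, 9, 11], [11, 10, 9]]
y_demo = [0, 0, 0, 1, 1, 1]
rs_model = random_subspace(X_demo, y_demo)
print(rs_model([0.5, 0.5, 0.5]), rs_model([10, 10, 10]))
```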

4.2.3. Rotation Forest

Rotation Forest (RF) is an ensemble approach introduced by Rodriguez et al. [105]. It is a decision tree-based classification method. It randomly splits the feature set into several subsets and applies principal component analysis (PCA) to a bootstrap sample of the data for each subset. Each decision tree is then trained on the resulting rotated (extracted) features. This approach increases both the accuracy and the diversity of the ensemble members [105]. Decision trees are used as base classifiers in RF because they are sensitive to rotations of the feature space, so differences in the feature space lead to greater diversity among the trees [105].
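The rotation step at the heart of Rotation Forest is sketched below under simplifying assumptions. Features are split into random subsets of size 2, so that PCA has a closed form (the principal-axis angle of a 2 × 2 covariance matrix). Each subset's PCA is fitted on a bootstrap sample of 75% of the rows, a common choice. The per-subset bases are assembled into one block-diagonal rotation matrix R. The general method allows larger subsets, and each rotated dataset X·R would then be passed to a decision tree, which is omitted here. All names are hypothetical.

```python
import math
import random


def pca_2d_basis(points):
    """Closed-form PCA for two features: rows are the two unit principal axes."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)     # direction of maximum variance
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def rotation_matrix(X, rng, sample_frac=0.75):
    """Block-diagonal rotation matrix R (M x M) built from 2-feature subsets."""
    M = len(X[0])
    features = list(range(M))
    rng.shuffle(features)                            # random split of the feature set
    R = [[0.0] * M for _ in range(M)]
    for k in range(0, M, 2):
        subset = features[k:k + 2]
        if len(subset) == 1:                         # odd feature out: leave unrotated
            R[subset[0]][subset[0]] = 1.0
            continue
        size = max(2, int(sample_frac * len(X)))
        sample = [X[rng.randrange(len(X))] for _ in range(size)]   # bootstrap sample
        basis = pca_2d_basis([[row[subset[0]], row[subset[1]]] for row in sample])
        for a, fa in enumerate(subset):
            for b, fb in enumerate(subset):
                R[fa][fb] = basis[b][a]              # column fb holds principal axis b
    return R


def rotate(X, R):
    """Return the rotated data X R that a base decision tree would be trained on."""
    M = len(R)
    return [[sum(row[i] * R[i][j] for i in range(M)) for j in range(M)] for row in X]


# Hypothetical 3-factor dataset (e.g., normalised slope, TWI, plan curvature)
_rng = random.Random(1)
X_demo = [[_rng.random() for _ in range(3)] for _ in range(20)]
R_demo = rotation_matrix(X_demo, random.Random(0))
Z_demo = rotate(X_demo, R_demo)
```

Because R is orthogonal, the rotation preserves all information in the data. Only the axes along which a tree can split change, and this is what diversifies the ensemble.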

4.3. Model Validation and Comparison

4.3.1. Statistical Metrics

Machine learning statistical metrics are of three types, specifically threshold, probability, and ranking metrics [106]. Threshold and ranking metrics are most often used and comprise three groups, as follows [107]:

(i): One group is used to evaluate the generalization ability of the trained classifier, that is, the performance of the trained classifier when tested with an unseen dataset.
(ii): A second group is employed in evaluating model selection. The aim is to select the optimum classifier among a variety of trained classifiers based on their performance using an unseen dataset.
(iii): A third group selects the optimum solution among all solutions generated during the classification training. Only the optimum solution obtained from the optimum model is tested with the unseen dataset.

In this section, we investigate how accurately the machine learning techniques used in this study make decisions when presented with a previously unseen dataset (first group). We judge the goodness-of-fit and prediction accuracy of the models using the four components of a 2 × 2 confusion matrix (a posteriori probabilities of each classification). These are true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP and TN are the numbers of landslides and non-landslides that are correctly classified as landslides and non-landslides, respectively. FP is the number of non-landslides incorrectly classified as landslides, and FN is the number of landslides incorrectly classified as non-landslides [41,51]. We selected six established threshold metrics to evaluate how well the different models predict landslides: sensitivity (SST), specificity (SPF), accuracy (ACC), kappa, mean absolute error (MAE), and root mean square error (RMSE). These indexes are calculated as follows:

SST = \frac{A}{A + D}

(6)

SPF = \frac{C}{C + B}

(7)

SST and SPF are the proportion of the landslides and non-landslides that are correctly classified as landslides and non-landslides, respectively.

ACC = \frac{A + C}{A + C + B + D}

(8)

where A is the number of true positives, B is the number of false positives, C is the number of true negatives, and D is the number of false negatives. ACC is the proportion of all landslides and non-landslides that are correctly classified [88,108].

kappa = \frac{P_{a} - P_{est}}{1 - P_{est}}

(9)

P_{a} = \frac{A + C}{N}; \quad P_{est} = \frac{(A + B)(A + D) + (C + D)(B + C)}{N^{2}}

(10)

where N = A + B + C + D is the total number of samples, P_a is the relative observed agreement among raters, and P_est is the hypothetical probability of chance agreement.
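The confusion-matrix metrics of Eqs. (6)–(9) can be computed as sketched below. The sketch uses A = TP, B = FP, C = TN, and D = FN, as defined above. For kappa it uses Cohen's standard chance-agreement term P_est = [(A + B)(A + D) + (C + D)(B + C)] / N² with P_a = (A + C) / N. The 0.5 probability threshold in the usage example and all names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (A, B, C, D) = (TP, FP, TN, FN) for 0/1 labels, 1 = landslide."""
    pairs = list(zip(y_true, y_pred))
    A = sum(1 for t, p in pairs if t == 1 and p == 1)
    B = sum(1 for t, p in pairs if t == 0 and p == 1)
    C = sum(1 for t, p in pairs if t == 0 and p == 0)
    D = sum(1 for t, p in pairs if t == 1 and p == 0)
    return A, B, C, D


def threshold_metrics(A, B, C, D):
    """SST, SPF, ACC (Eqs. 6-8) and Cohen's kappa (Eq. 9)."""
    N = A + B + C + D
    p_a = (A + C) / N                                        # observed agreement
    p_est = ((A + B) * (A + D) + (C + D) * (B + C)) / N**2   # chance agreement
    return {
        "SST": A / (A + D),
        "SPF": C / (C + B),
        "ACC": (A + C) / N,
        "kappa": (p_a - p_est) / (1 - p_est),
    }


# Hypothetical validation predictions, classified at a 0.5 probability threshold
probs = [0.92, 0.81, 0.35, 0.66, 0.12, 0.58, 0.05, 0.44]
truth = [1, 1, 1, 0, 0, 1, 0, 0]
print(threshold_metrics(*confusion_counts(truth, [int(p >= 0.5) for p in probs])))
```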

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(X_{model,i} - X_{act,i}\right)^{2}}

(11)

where X_model and X_act denote the modeled and actual (i.e., measured) values, respectively, and N is the number of samples.

MAE = \frac{1}{N} \sum_{i=1}^{N} \left| X_{model,i} - X_{act,i} \right|

(12)

where X_model,i and X_act,i are the modeled and actual values for sample i, and N is the number of samples.
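Equations (11) and (12) are sketched below. The sketch assumes, as is usual for susceptibility models, that the modeled values are predicted landslide probabilities and the actual values are the observed 0/1 labels. The function names are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean square error (Eq. 11)."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def mae(predicted, actual):
    """Mean absolute error (Eq. 12)."""
    n = len(predicted)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n


# Hypothetical predicted probabilities against observed labels
pred, obs = [0.9, 0.2, 0.6, 0.1], [1, 0, 1, 0]
print(rmse(pred, obs), mae(pred, obs))
```

RMSE squares each error before averaging, so it penalizes confident wrong predictions (e.g., 0.6 for an observed landslide) more heavily than MAE does.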

4.3.2. ROC and AUC

We evaluated the accuracy of the final landslide susceptibility maps using receiver operating characteristic (ROC) curves. A ROC curve describes the performance of a classifier for a binary (Boolean) outcome across all classification thresholds [109]. The vertical axis of the ROC curve is the true positive rate (sensitivity), and the horizontal axis is the false positive rate (1 − specificity) [110,111]. AUC is the area under the ROC curve; the larger the value, the better the performance of the model. AUC normally ranges from 0.5 (no better than random) to 1.0 (perfect prediction), and values near 1.0 indicate a highly accurate prediction model [112].
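The ROC curve and AUC can be computed in two equivalent ways, both sketched below. The first builds the ROC points by lowering the threshold through every score and integrates them with the trapezoidal rule. The second counts, over all landslide/non-landslide pairs, how often the landslide receives the higher score (the Mann-Whitney form, with ties counted as 0.5). The scores and all names are hypothetical.

```python
def auc_rank(scores, labels):
    """AUC as the probability that a random landslide outscores a random non-landslide."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through every score."""
    P = sum(labels)
    N = len(labels) - P
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab == 0)
        points.append((fp / N, tp / P))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # hypothetical susceptibility values
labels = [1, 1, 0, 1, 0, 0]
print(auc_rank(scores, labels), auc_trapezoid(roc_points(scores, labels)))  # both about 0.889 (8/9)
```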

4.3.3. Friedman and Wilcoxon Signed-Rank Tests

The Friedman test is a non-parametric statistical test that compares groups based on their mean ranks [113]. It is a rank-based, non-parametric counterpart of two-way analysis of variance. The null hypothesis is that the distribution of observations is the same across repeated measurements, that is, there is no difference between the distributions produced by repeated measurements of one group or between groups. The Wilcoxon signed-rank test [114] uses the ranks of the differences between paired observations to assess whether those differences are significant. It is suitable for analyzing one sample under two different conditions or two matched samples. It is a non-parametric alternative to the paired t-test when the t-test assumptions are not met, and the samples must be paired.
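Both test statistics can be computed from ranks, as sketched below. The Friedman statistic is χ²_F = 12/(n·k·(k + 1)) · Σ R_j² − 3n(k + 1), where R_j is the rank sum of model j over n blocks. The sketch omits tie correction. The Wilcoxon statistic sums the ranks of the absolute paired differences separately for positive and negative differences, after dropping zero differences. Only the statistics are computed here. P-values would come from the chi-square distribution with k − 1 degrees of freedom (Friedman) or from exact or normal-approximation tables (Wilcoxon), e.g., via `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon`. Using validation folds as blocks, the example AUC values, and all function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square (no tie correction); table[b][j] = score of model j in block b."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-) for paired samples, discarding zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical AUCs of three models (columns) on four validation folds (rows)
aucs = [[0.81, 0.90, 0.93], [0.80, 0.91, 0.94], [0.82, 0.89, 0.95], [0.79, 0.92, 0.96]]
print(friedman_statistic(aucs))  # prints 8.0: every fold ranks the models identically
```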

4.4. Factor Selection Using the Information Gain Ratio Technique

Feature selection is an important problem in machine learning and in statistical pattern recognition. It matters in many applications, including classification, because many features in these applications are either useless or provide little information. Numerous solutions and algorithms have been proposed to address the feature selection problem. In this study, we used the Information Gain Ratio (IGR) technique to select the most important factors for landsliding. This method was introduced by Hunt et al. [115] based on information theory and has since been developed further by other researchers [116,117]. Average merit (AM) is a feature selection metric that quantitatively assesses the importance and ranking of factors. In this study, AM is the weight obtained by the IGR feature selection technique, and factors are prioritized by AM in descending order. The information gain of term t is defined as follows:

I G (t) = - \sum_{i = 1}^{| C |} P (C_{i}) l o g P_{(C_{i)}} + P (t) \sum_{i = 1}^{| C |} P_{(C_{i} / t)} \log P_{(C_{i} / t)} + P (\bar{t}) \times \sum_{i = 1}^{| C |} P_{(C_{i} / \bar{t})} \log P_{(C_{i} / \bar{t})}

(13)

where C_{i} denotes the i-th category, |C| is the number of categories, P(C_{i}) is the probability of the i-th category, P(t) and P(\bar{t}) are the probabilities that the term t does and does not appear, P(C_{i} \mid t) is the conditional probability of the i-th category given that the term t appears, and P(C_{i} \mid \bar{t}) is the conditional probability of the i-th category given that the term t does not appear.
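Eq. (13) is the entropy of the categories minus the expected entropy after observing whether t is present, i.e. H(C) − P(t)H(C | t) − P(t̄)H(C | t̄). A gain ratio divides this gain by the split information, the entropy of the split itself (Quinlan's standard normalization); we assume that is the sense of "ratio" here, since the formula is not given in this section. The sketch below reads the categories as landslide/non-landslide labels and t as a binary "pixel falls in a given factor class" indicator, both illustrative assumptions, and uses base-2 logarithms. It does not reproduce the cross-validated AM values reported in Section 5.1.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: -sum_i P(C_i) log2 P(C_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(term_present: Sequence[bool], labels: Sequence[Hashable]) -> float:
    """Eq. (13): H(C) - P(t) H(C | t) - P(not t) H(C | not t)."""
    n = len(labels)
    gain = entropy(labels)
    for flag in (True, False):
        subset = [y for t, y in zip(term_present, labels) if t == flag]
        if subset:  # an empty branch contributes nothing
            gain -= len(subset) / n * entropy(subset)
    return gain


def gain_ratio(term_present: Sequence[bool], labels: Sequence[Hashable]) -> float:
    """Information gain normalized by the split information (entropy of the split)."""
    split_info = entropy(term_present)
    return information_gain(term_present, labels) / split_info if split_info > 0 else 0.0
```

A perfectly informative indicator on a balanced sample gives a gain ratio of 1, and an indicator independent of the labels gives 0.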

5. Analysis and Results

5.1. Factor Selection in Modeling Landslides

We analyzed the predictive power of the conditioning factors using the average merit (AM) of the IGR technique with 10-fold cross-validation. By identifying and removing low-weight factors during the modeling phase, the technique reduces noise and over-fitting in the training dataset and thus enhances predictive power. In our study, the most important landslide predictor is slope angle (AM = 87.078) (Figure 3). It is followed, in order, by TWI (AM = 85.955), plan curvature (AM = 73.033), LS (AM = 69.101), curvature (AM = 64.606), land use (AM = 63.483), SPI (AM = 61.797), profile curvature (AM = 61.236), solar radiation (AM = 55.618), elevation (AM = 53.932), aspect (AM = 52.247), and rainfall (AM = 51.123). Of the 20 conditioning factors we considered in this study, only these 12 were effective and were therefore selected for further modeling. The eight ineffective factors (AM = 0), which introduced noise and decreased overall accuracy, are distance to river, distance to fault, distance to road, river density, fault density, road density, lithology, and NDVI.
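The selection rule itself is simple: keep factors with AM greater than zero and rank them in descending order of AM. The sketch below encodes the AM values reported above; it does not recompute them, and the zero threshold is our reading of the text rather than a documented parameter.

```python
from __future__ import annotations

# AM values reported above: 12 retained factors plus the eight factors with AM = 0.
AVERAGE_MERIT = {
    "slope angle": 87.078, "TWI": 85.955, "plan curvature": 73.033, "LS": 69.101,
    "curvature": 64.606, "land use": 63.483, "SPI": 61.797, "profile curvature": 61.236,
    "solar radiation": 55.618, "elevation": 53.932, "aspect": 52.247, "rainfall": 51.123,
    "distance to river": 0.0, "distance to fault": 0.0, "distance to road": 0.0,
    "river density": 0.0, "fault density": 0.0, "road density": 0.0,
    "lithology": 0.0, "NDVI": 0.0,
}


def select_factors(merit: dict[str, float], threshold: float = 0.0) -> list[str]:
    """Keep factors whose AM exceeds the threshold, ordered by descending AM."""
    kept = [name for name, am in merit.items() if am > threshold]
    return sorted(kept, key=merit.__getitem__, reverse=True)


selected = select_factors(AVERAGE_MERIT)
print(len(selected), selected[:2])  # 12 ['slope angle', 'TWI']
```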

5.2. Modeling Process and Evaluations

After selecting the most important conditioning factors, we built the models with the training dataset and evaluated them with the validation dataset. We selected the optimal parameters, namely the number of iterations (10) and the number of seeds (20), by trial and error according to RMSE and AUC. Results are shown in Table 2 and Figure 4. Optimal values for the Rotation Forest ensemble model, based on the lowest RMSE and the highest AUC, are 16 iterations and seven seeds. RMSE values for the training and validation datasets are, respectively, 0.274 and 0.310; AUC values for the two datasets are 0.970 and 0.958. Corresponding values for the Bagging ensemble model, based on eight seeds and 12 iterations, are 0.281 and 0.310 (RMSE) and 0.976 and 0.948 (AUC). For the Random Subspace model, values based on nine seeds and 10 iterations are 0.311 and 0.337 (RMSE) and 0.939 and 0.933 (AUC).
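The trial-and-error tuning relies on two quantities: RMSE between observed 0/1 labels and predicted probabilities, and AUC. The sketch below shows RMSE and one plausible selection rule, lowest RMSE with ties broken by highest AUC; that tie-breaking rule is our assumption about how the two criteria were combined, and the trial numbers are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def rmse(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Root-mean-square error between 0/1 labels and predicted landslide probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def pick_best(trials: Sequence[dict]) -> dict:
    """Trial with the lowest RMSE; ties broken by the highest AUC (assumed rule)."""
    return min(trials, key=lambda t: (t["rmse"], -t["auc"]))


# Hypothetical trial-and-error results for one ensemble (illustrative numbers only).
trials = [
    {"iterations": 10, "seeds": 5, "rmse": 0.32, "auc": 0.95},
    {"iterations": 16, "seeds": 7, "rmse": 0.31, "auc": 0.96},
    {"iterations": 20, "seeds": 9, "rmse": 0.31, "auc": 0.94},
]
best = pick_best(trials)  # -> the 16-iteration, 7-seed trial
```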

For each model, we assessed goodness-of-fit with the training dataset and prediction accuracy with the validation dataset. Values of the statistical measures for the training dataset are shown in Table 3. The numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for the RAF base classifier model are, respectively, 73, 82, 16, and 7. The sensitivity of the model is 0.913, indicating that 91.3% of landslides are correctly classified. However, only 83.7% of non-landslide locations are correctly classified (specificity = 0.837). The accuracy is 0.871, meaning that 87.1% of landslide and non-landslide locations in the study area are correctly classified. Kappa, RMSE, and AUC values are, respectively, 0.741, 0.333, and 0.871.
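The statistics above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy, and Cohen's kappa), which we assume match those used here; applied to the RAF counts, it reproduces the reported values to within rounding.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # landslides predicted as landslides
    tn: int  # non-landslides predicted as non-landslides
    fp: int  # non-landslides predicted as landslides
    fn: int  # landslides predicted as non-landslides

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def kappa(self) -> float:
        """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
        n = self.total
        p_e = ((self.tp + self.fp) * (self.tp + self.fn)
               + (self.tn + self.fn) * (self.tn + self.fp)) / n ** 2
        return (self.accuracy() - p_e) / (1 - p_e)


raf = ConfusionCounts(tp=73, tn=82, fp=16, fn=7)
# Agrees with the reported 0.913, 0.837, 0.871 and 0.741 to within 0.001.
print(raf.sensitivity(), raf.specificity(), raf.accuracy(), raf.kappa())
```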

The Rotation Forest ensemble model yielded 85 TP, 83 TN, 4 FP, and 6 FN. Its sensitivity, specificity, accuracy, kappa, RMSE, and AUC are, respectively, 0.934, 0.954, 0.944, 0.832, 0.274, and 0.976. The sensitivity of the Bagging ensemble model is 0.928, indicating that 92.8% of landslides are correctly classified as landslides. Its specificity is 0.874, meaning that 87.4% of non-landslide locations are correctly classified. Bagging accuracy is 0.899, showing that 89.9% of landslide and non-landslide locations are correctly classified. Kappa, RMSE, and AUC are, respectively, 0.805, 0.281, and 0.970. TP, TN, FP, and FN for the Random Subspace ensemble model are, respectively, 77, 83, 12, and 7. Its sensitivity, specificity, accuracy, kappa, RMSE, and AUC are, respectively, 0.917, 0.874, 0.894, 0.865, 0.311, and 0.933.

In summary, the highest sensitivity (0.934) was achieved by the RF ensemble model, followed by the BA (0.928), RS (0.917), and RAF (0.913) models. The highest specificity (0.954) was achieved by the RF model, followed by the RS and BA models (both 0.874) and the RAF model (0.837). The RF model also yielded the highest accuracy (0.944), and the RAF model had the lowest (0.871); accuracies of the BA and RS models are, respectively, 0.899 and 0.894. The highest kappa value (0.865) was achieved by the RS model, followed by the RF (0.805), BA (0.805), and RAF (0.741) models. The RF model yielded the lowest RMSE (0.274), followed by the BA (0.281), RS (0.311), and RAF (0.333) models. The highest AUC in the modeling phase was achieved by RF (0.976), followed by BA (0.970), RS (0.933), and RAF (0.871). All three ensemble models thus improve the goodness-of-fit of the RAF base classifier model.

The next step in the analysis involved testing the prediction accuracy of the models (Table 4). Results show that prediction accuracy is greatest for the RF-RAF ensemble model, followed closely by the BA-RAF model. Sensitivity and specificity for the RF-RAF ensemble model are 0.913 and 0.952, indicating that, respectively, 91.3% of landslide locations and 95.2% of non-landslide locations are correctly classified. All ensemble models outperformed the RAF base classifier, and the RF-RAF ensemble model substantially enhanced the prediction accuracy of the RAF base classifier.

5.3. Preparation of Landslide Susceptibility Maps

After testing the ensemble models and selecting the best parameters, we applied each model, trained on the training dataset, to determine the probability of landslide occurrence for each map pixel. We compared the natural break (NB), quantile (Q), and geometric interval (GI) classification methods and selected, for each model, the method that placed the largest number of landslides in the higher susceptibility classes (Figure 5). Based on these results, we chose the NB method for the RAF base classifier model and the GI method for the RF-RAF, BA-RAF, and RS-RAF ensemble models to produce shallow landslide susceptibility maps.

Each landslide susceptibility map has five susceptibility classes, ranging from very low susceptibility (VLS) to very high susceptibility (VHS) (Figure 6). For the RAF base classifier model, the classes are as follows: very low susceptibility (VLS), probability range 0.000–0.049; low susceptibility (LS), 0.049–0.139; moderate susceptibility (MS), 0.139–0.239; high susceptibility (HS), 0.239–0.359; and very high susceptibility (VHS), 0.359–0.500. For the RF-RAF and BA-RAF ensemble models, the probability ranges are 0.000–0.009 (VLS), 0.009–0.032 (LS), 0.032–0.085 (MS), 0.085–0.208 (HS), and 0.208–0.500 (VHS). Finally, for the RS-RAF ensemble model, the ranges are 0.000–0.017 (VLS), 0.017–0.051 (LS), 0.051–0.116 (MS), 0.116–0.246 (HS), and 0.246–0.500 (VHS).
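The reclassification and the scheme-selection criterion can be sketched together. The code below maps a pixel probability to a class using the break values reported above, and it scores candidate break schemes by how many landslide pixels they place in HS or VHS. Only the quantile scheme is implemented (via Python's `statistics.quantiles`); natural breaks and geometric interval are GIS-specific algorithms not reproduced here, but their break values can be passed in. Assigning a value that equals a break to the upper class is our assumption, as the reported ranges share endpoints.

```python
from __future__ import annotations

from bisect import bisect_right
from statistics import quantiles
from typing import Sequence

CLASS_LABELS = ("VLS", "LS", "MS", "HS", "VHS")
HIGH_CLASSES = {"HS", "VHS"}

# Interior class breaks reported above for each model.
REPORTED_BREAKS = {
    "RAF": (0.049, 0.139, 0.239, 0.359),
    "RF-RAF": (0.009, 0.032, 0.085, 0.208),
    "BA-RAF": (0.009, 0.032, 0.085, 0.208),
    "RS-RAF": (0.017, 0.051, 0.116, 0.246),
}


def classify(prob: float, breaks: Sequence[float]) -> str:
    """Susceptibility class of a pixel probability; a value equal to a break goes to the upper class."""
    return CLASS_LABELS[bisect_right(breaks, prob)]


def quantile_breaks(pixel_probs: Sequence[float], n_classes: int = 5) -> list[float]:
    """Quantile (Q) scheme: interior breaks giving roughly equal pixel counts per class."""
    return quantiles(pixel_probs, n=n_classes)


def landslides_in_high_classes(landslide_probs: Sequence[float], breaks: Sequence[float]) -> int:
    """Number of landslide pixels that fall in the HS or VHS class."""
    return sum(classify(p, breaks) in HIGH_CLASSES for p in landslide_probs)


def choose_scheme(landslide_probs: Sequence[float], schemes: dict[str, Sequence[float]]) -> str:
    """Name of the break scheme that places the most landslide pixels in HS or VHS."""
    return max(schemes, key=lambda name: landslides_in_high_classes(landslide_probs, schemes[name]))


for model, breaks in REPORTED_BREAKS.items():
    print(model, classify(0.1, breaks))  # RAF LS, RF-RAF HS, BA-RAF HS, RS-RAF MS
```

The loop illustrates why the maps must be read per model: the same probability of 0.1 falls in three different classes depending on the model's break values.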

Similar landslide susceptibility patterns are evident in all the maps produced by the machine learning models (Figure 6). Susceptibility increases from the plains (green) to the mountains (red). Most slopes rated as high or very high susceptibility consist of bedrock covered by a thin layer of soil and colluvium.

5.4. Verification of Landslide Susceptibility Maps

ROC Curve and AUC

We assessed the performance and prediction accuracy of the landslide susceptibility maps with the ROC curve and AUC (Figure 7). Our RAF base classifier has an AUC of 0.817 for the training dataset (Figure 7a) and an AUC of 0.812 for the validation dataset (Figure 7b). The ensemble models outperformed the RAF base model. Among the ensemble models, the highest AUC for the training dataset was obtained by the RF-RAF model (0.945), followed by the BA-RAF (0.930) and RS-RAF (0.924) models (Figure 7c). Figure 7d shows the prediction accuracy of the ensemble models for the validation dataset. Here too, the RF-RAF model (AUC = 0.936) yielded the best result, followed by the BA-RAF and RS-RAF models (AUC = 0.907 for both). Although all models yielded reasonable results (the lowest AUC is 0.812), the RF-RAF ensemble model outperformed the others.
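AUC values like those above can be computed without tracing the ROC curve: the area under the ROC curve equals the probability that a randomly chosen landslide location receives a higher predicted probability than a randomly chosen non-landslide location, with ties counted as one half. The sketch below uses that pairwise form, which is quadratic in sample size and adequate for a few hundred points; encoding landslides as 1 and non-landslides as 0 is an assumption.

```python
from __future__ import annotations

from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of landslide (1)
    and non-landslide (0) scores; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both landslide (1) and non-landslide (0) samples")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` is 0.75: three of the four landslide/non-landslide pairs are ordered correctly.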

We also checked the performance of the models using the Friedman (Table 5) and Wilcoxon (Table 6) statistical tests at the 5% significance level. The Friedman test evaluates overall differences among the models without pairwise comparisons. Based on the Friedman test, the models differ significantly in their landslide prediction performance. To assess differences between pairs of models, we employed the Wilcoxon signed-rank test. Here too, the p-values and z-values indicate significant differences between all pairs of models except the BA-RAF and RS-RAF ensemble models (Table 6).
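The pairwise Wilcoxon comparisons can be sketched as follows. As with the Friedman example, we assume (the text does not specify) that each model contributes its predicted probabilities at the same validation locations, so the samples are paired by location. The sketch uses the large-sample normal approximation for z and the two-sided p-value, drops zero differences, gives tied absolute differences average ranks, and applies no tie correction to the variance; statistical packages may differ in these details.

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Iterator, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Return (T, z, two-sided p) for paired samples using the normal approximation."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    t = min(w_plus, n * (n + 1) / 2 - w_plus)  # smaller of W+ and W-
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return t, z, p


def pairwise_wilcoxon(scores: dict[str, Sequence[float]], alpha: float = 0.05
                      ) -> Iterator[tuple[str, str, float, float, bool]]:
    """Yield (model_a, model_b, z, p, significant) for every pair of models."""
    for a, b in combinations(scores, 2):
        _, z, p = wilcoxon_signed_rank(scores[a], scores[b])
        yield a, b, z, p, p < alpha
```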

6. Discussion

Landslide susceptibility maps are important tools for evaluating and managing landslide-prone areas. Susceptibility models with high predictive performance can be used by government agencies to develop landslide prevention and mitigation strategies [60]. However, producing standardized and accurate landslide susceptibility maps at catchment and regional scales is not a straightforward task. Accuracy can be assessed with a variety of metrics [118], and the number of algorithms that evaluate accuracy is constantly increasing [119] as technology improves computational efficiency. Numerous modeling techniques have been employed over the past three decades to generate increasingly accurate landslide susceptibility maps, and several landslide researchers are seeking an optimal method or workflow through comparative studies [41,120] and machine learning algorithms [121]. Advances are also being made in other scientific disciplines; the geomorphological community, for example, has invested significant effort in improving and testing models [50,122,123], especially ensemble models. It is in the context of these efforts and this broad scientific interest that we investigated the performance of three ensemble methods (Bagging, Rotation Forest, and Random Subspace), each used in tandem with a Random Forest machine learning model. Such combinations are rare in the literature and have yet to be tested in different environments around the world.

Our study confirms that model performance improves when meta-classifiers are combined with Random Forest (RAF). The simpler RAF benchmark produced a success rate curve (SRC) AUC of 0.817 and a prediction rate curve (PRC) AUC of 0.812, whereas the three ensembles significantly improved performance, with SRC and PRC values of, respectively, 0.930 and 0.907 for Bagging, 0.924 and 0.907 for Random Subspace, and 0.945 and 0.936 for Rotation Forest. All three ensemble landslide models thus perform much better (SRC and PRC > 0.9) than the RAF benchmark model. Of the three ensembles, the combination of Rotation Forest and Random Forest has the highest prediction capability, followed by Bagging and Random Subspace. This may be due to the greater ability of Rotation Forest to reduce model bias, variance, and over-fitting compared with other ensemble methods [44]. Nevertheless, all of the ensemble models we tested yielded good results, with significantly reduced over-fitting compared with the benchmark model. This result points to the value and efficiency of ensembling workflows for improving model performance [37,40,124,125].

Flexibility is the capacity of a model built on a specific dataset to retain the ability to explain unknown landslide conditions. In our case, the flexibility we achieved is comparable to that of other recent ensemble studies [37,51,60,62,122,126,127]. For example, Shirzadi et al. [51] reported that the Random Subspace model improved the predictive power of the Naïve Bayes Tree (NBTree) for landslide susceptibility modeling. Similarly, Pham et al. [127] showed that Random Subspace improves the performance of the Classification and Regression Tree (CART) algorithm, and Bui et al. [128] confirmed that the combined use of a functional algorithm and a meta-classifier prevents over-fitting, reduces noise, and enhances the predictive power of the individual Stochastic Gradient Descent (SGD) algorithm for the spatial prediction of landslides. More generally, He et al. [129] showed that combining meta models with a decision tree classifier enhances the prediction power of a single landslide model.

A major advantage of the three ensemble machine learning algorithms is that they automate the randomization process, enabling the researcher to examine several databases and collect valuable information. Rather than relying on a single classifier, they iteratively generate a large number of classifiers while randomly varying the input data. These individual models are then combined into one, yielding a more adaptive predictive rule. Adaptation to varying conditions, for example different environmental settings, is often sought to support planning decisions, especially in large regions with complex geology, topography, and climate. For this reason, providing a large set of predictors is commonly good practice [130], especially for machine learning models, which tend to over-fit more than statistical methods [131].
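The generate-then-combine idea behind Bagging and Random Subspace can be sketched in a few lines. This is a toy illustration, not the implementation used in this study: a hypothetical single-threshold "stump" stands in for the Random Forest base classifier, Bagging trains each member on a bootstrap resample of the rows, Random Subspace trains each member on all rows but a random subset of factors, and members are combined by simple majority vote (an assumption; software implementations may average class probabilities instead). Rotation Forest, which additionally applies principal component analysis to random subsets of features to build a rotated feature space for each member, is not shown.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Callable, Optional, Sequence

Row = Sequence[float]
Classifier = Callable[[Row], int]


def fit_stump(X: list[Row], y: list[int], features: Sequence[int]) -> Classifier:
    """Toy base learner (hypothetical stand-in for Random Forest): the single
    feature/threshold/direction that classifies the most training rows correctly."""
    best = (-1, 0, 0.0, 1)  # (n_correct, feature, threshold, direction)
    for f in features:
        for thr in {row[f] for row in X}:
            for sign in (1, -1):
                n_correct = sum(
                    int(sign * (row[f] - thr) >= 0) == label for row, label in zip(X, y)
                )
                if n_correct > best[0]:
                    best = (n_correct, f, thr, sign)
    _, f, thr, sign = best
    return lambda row: int(sign * (row[f] - thr) >= 0)


def build_ensemble(X: list[Row], y: list[int], n_models: int = 10, seed: int = 0,
                   subspace: Optional[int] = None) -> Classifier:
    """Bagging (bootstrap resamples of the rows) when subspace is None; otherwise
    Random Subspace (all rows, a random subset of `subspace` features per member).
    Members are combined by majority vote."""
    rng = random.Random(seed)
    n_rows, n_feats = len(X), len(X[0])
    members: list[Classifier] = []
    for _ in range(n_models):
        if subspace is None:
            idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # sample rows with replacement
            feats: Sequence[int] = range(n_feats)
        else:
            idx = list(range(n_rows))
            feats = rng.sample(range(n_feats), subspace)  # features without replacement
        members.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: Counter(m(row) for m in members).most_common(1)[0][0]
```

Each call returns a single classifier function, so the combined ensemble can be applied to every map pixel exactly like a single model.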

An important step in any environmental modeling process is determining the most significant conditioning factors to decrease over-fitting and noise problems [48]. In accordance with the most recent literature, we initially selected 20 candidate conditioning factors. Factor selection techniques are used not only to improve model prediction, but also to support interpretation of the results and understanding of the underlying geomorphological processes. Redundant information and over-complexity can affect model outcomes, so it is necessary to purge non-informative factors. To achieve this objective, we used the Information Gain Ratio technique [41]. Based on the IGR results, we selected 12 factors (slope, TWI, plan curvature, LS, curvature, land use, SPI, profile curvature, solar radiation, elevation, aspect, and rainfall) for modeling. Of these 12 factors, slope and TWI contributed most to the models, whereas rainfall and slope aspect contributed least. These results are in line with other studies [132,133], especially those done in Iranian environments [48,60,134]. For example, Arabameri et al. [134] used a "Landslide Numerical Risk Factor" (LNRF) bivariate model and its ensemble, together with Linear Multivariate Regression (LMR) and Boosted Regression Trees (BRT) models, in landslide susceptibility mapping and found that slope, NDVI, and land use play a crucial role in landslide occurrence in their study area. Yet despite the size of our study area, our conclusions are still site-specific or, at best, representative of the general climatic and geomorphic settings of the region. Thus, if we are ever to produce and agree on a standardized landslide susceptibility model, or even a meta-analytic generalization of such a model as suggested by Lombardo and Mai [135], investigations must be carried out in different terrains and geological contexts.
By extending the variability of the environmental conditions included in the dataset, whether topographic or hydrologic, the ensembling procedure presented in this study may achieve similar predictive performance at a much larger scale. Overall, the combined use of remote sensing, GIS, and data mining tools is well suited to producing reliable landslide susceptibility maps and other land use planning analyses [136] that can assist decision-makers in planning and development.

7. Conclusions

We provide a successful application of Random Forest and its ensembles for shallow landslide susceptibility mapping in a semi-arid region of Iran. Our study was done in three stages. First, we prepared shallow landslide inventory maps for Bijar County, Iran. Next, we mapped shallow landslide susceptibility in the study area using machine learning ensemble models that combine Random Forest with three meta-classifiers. Finally, we validated the landslide susceptibility maps. Our dataset comprises 111 shallow landslide locations, split 80/20 into training and validation datasets. We considered 20 landslide conditioning factors and found that 12 of them best explain landslide susceptibility in the study area; the most important are slope angle and TWI. All data were processed in a GIS environment, and the conditioning factors were tested with the Information Gain Ratio technique. We computed landslide susceptibility indexes for each model and incorporated these values in the GIS to produce shallow landslide susceptibility maps. Our proposed RF-RAF model outperformed the BA-RAF, RS-RAF, and RAF models, with an overall accuracy and prediction power of, respectively, 94.4% and 93.2%. Although this model performed well in our study area, it needs to be tested in other semi-arid, hilly, and mountainous areas with thin soil and colluvial layers on bedrock. The main limitation of our work at the modeling stage was selecting the best non-landslide locations. Because these locations are randomly selected, the same trial-and-error approach should be applied to both non-landslide and landslide locations. In this study, we repeated this process about ten times to achieve a reasonable combination of training and validation datasets. In future studies, researchers should try to streamline the non-landslide selection process at the modeling stage.
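The repeated random selection described above can be sketched as follows. With 111 landslides, an 80/20 split gives 89 training and 22 validation landslides; pairing each with one non-landslide point yields 178 training samples, consistent with the confusion-matrix totals reported for the RAF and Rotation Forest models. The 1:1 ratio, the candidate pool of point IDs, and the function name are illustrative assumptions rather than the authors' documented procedure.

```python
from __future__ import annotations

import random
from typing import Sequence


def split_inventory(landslide_ids: Sequence[int], non_landslide_pool: Sequence[int],
                    train_frac: float = 0.8, seed: int = 0):
    """Hypothetical sketch: random train/validation split of landslide points, each
    subset paired with an equal number of randomly drawn non-landslide points."""
    rng = random.Random(seed)
    landslides = list(landslide_ids)
    rng.shuffle(landslides)
    n_train = round(train_frac * len(landslides))
    non_landslides = rng.sample(list(non_landslide_pool), len(landslides))  # 1:1 balance
    train = (landslides[:n_train], non_landslides[:n_train])
    valid = (landslides[n_train:], non_landslides[n_train:])
    return train, valid


# Ten repetitions with different seeds, mirroring the roughly ten trials described above.
candidate_splits = [split_inventory(range(111), range(1000, 5000), seed=s) for s in range(10)]
```

Each candidate split would then be modeled and evaluated, and the most reasonable combination retained.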
Overall, we propose that the RF-RAF ensemble model be considered a useful and effective tool in regional shallow landslide susceptibility assessment, as it may help decision-makers, planners, managers, and government agencies mitigate losses associated with landslides.

Author Contributions

V.-H.N., A.S., H.S., W.C., J.J.C., M.G., A.J., M.A., S.M., D.T.A., B.T.P., B.B.A., and S.L. contributed equally to the work. A.S. and H.S. collected field data and conducted the landslide mapping and analysis. A.S., H.S., W.C., A.J., M.A., S.M., D.T.A., and B.B.A. wrote the manuscript. V.-H.N., J.J.C., B.T.P., B.B.A. and S.L. provided critical comments in planning this paper and edited the manuscript. All the authors discussed the results and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM), which is funded by the Ministry of Science and ICT.

Conflicts of Interest

The authors declare no conflict of interest.

References

Corominas, J.; Moya, J. A review of assessing landslide frequency for hazard zoning purposes. Eng. Geol. 2008, 102, 193–213. [Google Scholar] [CrossRef]
Piciullo, L.; Calvello, M.; Cepeda, J.M. Territorial early warning systems for rainfall-induced landslides. Earth-Sci. Rev. 2018, 179, 228–247. [Google Scholar] [CrossRef]
Dao, D.V.; Jaafari, A.; Bayat, M.; Mafi-Gholami, D.; Qi, C.; Moayedi, H.; Phong, T.V.; Ly, H.-B.; Le, T.-T.; Trinh, P.T.; et al. A spatially explicit deep learning neural network model for the prediction of landslide susceptibility. Catena 2020, 188, 104451. [Google Scholar] [CrossRef]
Yesilnacar, E.; Topal, T. Landslide susceptibility mapping: A comparison of logistic regression and neural networks methods in a medium scale study, hendek region (turkey). Eng. Geol. 2005, 79, 251–266. [Google Scholar] [CrossRef]
Yilmaz, I. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from kat landslides (tokat—turkey). Comput. Geosci. 2009, 35, 1125–1138. [Google Scholar] [CrossRef]
Pradhan, B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using gis. Comput. Geosci. 2013, 51, 350–365. [Google Scholar] [CrossRef]
Hong, H.; Pradhan, B.; Xu, C.; Tien Bui, D. Spatial prediction of landslide hazard at the yihuang area (china) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena 2015, 133, 266–281. [Google Scholar] [CrossRef]
Razavizadeh, S.; Solaimani, K.; Massironi, M.; Kavian, A. Mapping landslide susceptibility with frequency ratio, statistical index, and weights of evidence models: A case study in northern iran. Environ. Earth Sci. 2017, 76, 499. [Google Scholar] [CrossRef]
Juliev, M.; Mergili, M.; Mondal, I.; Nurtaev, B.; Pulatov, A.; Hübl, J. Comparative analysis of statistical methods for landslide susceptibility mapping in the bostanlik district, uzbekistan. Sci. Total Environ. 2019, 653, 801–814. [Google Scholar] [CrossRef]
Nhu, V.-H.; Hoang, N.-D.; Nguyen, H.; Ngo, P.T.T.; Bui, T.T.; Hoa, P.V.; Samui, P.; Bui, D.T. Effectiveness assessment of keras based deep learning with different robust optimization algorithms for shallow landslide susceptibility mapping at tropical area. Catena 2020, 188, 104458. [Google Scholar] [CrossRef]
Thanh, D.Q.; Nguyen, D.H.; Prakash, I.; Jaafari, A.; Nguyen, V.-T.; Van Phong, T.; Pham, B.T. Gis based frequency ratio method for landslide susceptibility mapping at da lat city, lam dong province, vietnam. Vietnam J. Earth Sci. 2020, 42, 55–66. [Google Scholar] [CrossRef] [Green Version]
Jaafari, A. Lidar-supported prediction of slope failures using an integrated ensemble weights-of-evidence and analytical hierarchy process. Environ. Earth Sci. 2018, 77, 1–42. [Google Scholar] [CrossRef]
He, Q.; Shahabi, H.; Shirzadi, A.; Li, S.; Chen, W.; Wang, N.; Chai, H.; Bian, H.; Ma, J.; Chen, Y. Landslide spatial modelling using novel bivariate statistical based naïve bayes, rbf classifier, and rbf network machine learning algorithms. Sci. Total Environ. 2019, 663, 1–15. [Google Scholar] [CrossRef] [PubMed]
Tien Bui, D.; Moayedi, H.; Gör, M.; Jaafari, A.; Foong, L.K. Predicting slope stability failure through machine learning paradigms. ISPRS Int. J. Geo-Inf. 2019, 8, 395. [Google Scholar]
Moayedi, H.; Tien Bui, D.; Gör, M.; Pradhan, B.; Jaafari, A. The feasibility of three prediction techniques of the artificial neural network, adaptive neuro-fuzzy inference system, and hybrid particle swarm optimization for assessing the safety factor of cohesive slopes. ISPRS Int. J. Geo-Inf. 2019, 8, 391. [Google Scholar] [CrossRef] [Green Version]
Pham, B.T.; Prakash, I.; Jaafari, A.; Bui, D.T. Spatial prediction of rainfall-induced landslides using aggregating one-dependence estimators classifier. J. Indian Soc. Remote Sens. 2018, 46, 1457–1470. [Google Scholar] [CrossRef]
Jaafari, A.; Rezaeian, J.; Omrani, M.S. Spatial prediction of slope failures in support of forestry operations safety. Croat. J. For. Eng. 2017, 38, 107–118. [Google Scholar]
Wang, Y.; Hong, H.; Chen, W.; Li, S.; Panahi, M.; Khosravi, K.; Shirzadi, A.; Shahabi, H.; Panahi, S.; Costache, R. Flood susceptibility mapping in dingnan county (china) using adaptive neuro-fuzzy inference system with biogeography based optimization and imperialistic competitive algorithm. J. Environ. Manag. 2019, 247, 712–729. [Google Scholar] [CrossRef]
Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.-B.; Gróf, G.; Ho, H.L. A comparative assessment of flood susceptibility modeling using multi-criteria decision-making analysis and machine learning methods. J. Hydrol. 2019, 573, 311–323. [Google Scholar] [CrossRef]
Chen, W.; Hong, H.; Li, S.; Shahabi, H.; Wang, Y.; Wang, X.; Ahmad, B.B. Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles. J. Hydrol. 2019, 575, 864–873. [Google Scholar] [CrossRef]
Pham, B.T.; Nguyen, M.D.; Dao, D.V.; Prakash, I.; Ly, H.B.; Le, T.T.; Ho, L.S.; Nguyen, K.T.; Ngo, T.Q.; Hoang, V.; et al. Development of artificial intelligence models for the prediction of compression coefficient of soil: An application of monte carlo sensitivity analysis. Sci. Total Environ. 2019, 679, 172–184. [Google Scholar] [CrossRef]
Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Hoang, N.-D.; Pham, B.; Bui, Q.-T.; Tran, C.-T.; Panahi, M.; Bin Ahamd, B. A novel integrated approach of relevance vector machine optimized by imperialist competitive algorithm for spatial modeling of shallow landslides. Remote Sens. 2018, 10, 1538. [Google Scholar] [CrossRef] [Green Version]
Tien Bui, D.; Khosravi, K.; Li, S.; Shahabi, H.; Panahi, M.; Singh, V.; Chapi, K.; Shirzadi, A.; Panahi, S.; Chen, W. New hybrids of anfis with several optimization algorithms for flood susceptibility modeling. Water 2018, 10, 1210. [Google Scholar] [CrossRef] [Green Version]
Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 2017, 95, 229–245. [Google Scholar] [CrossRef]
Chen, W.; Li, Y.; Xue, W.; Shahabi, H.; Li, S.; Hong, H.; Wang, X.; Bian, H.; Zhang, S.; Pradhan, B. Modeling flood susceptibility using data-driven approaches of naïve bayes tree, alternating decision tree, and random forest methods. Sci. Total Environ. 2020, 701, 134979. [Google Scholar] [CrossRef] [PubMed]
Shahabi, H.; Shirzadi, A.; Ghaderi, K.; Omidvar, E.; Al-Ansari, N.; Clague, J.J.; Geertsema, M.; Khosravi, K.; Amini, A.; Bahrami, S. Flood detection and susceptibility mapping using sentinel-1 remote sensing data and a machine learning approach: Hybrid intelligence of bagging ensemble based on k-nearest neighbor classifier. Remote Sens. 2020, 12, 266. [Google Scholar] [CrossRef] [Green Version]
Khosravi, K.; Melesse, A.M.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Hong, H. Flood susceptibility mapping at ningdu catchment, china using bivariate and data mining techniques. In Extreme Hydrology and Climate Variability; Elsevier: Amsterdam, The Netherlands, 2019; pp. 419–434. [Google Scholar]
Jaafari, A.; Zenner, E.K.; Panahi, M.; Shahabi, H. Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability. Agric. For. Meteorol. 2019, 266, 198–207. [Google Scholar] [CrossRef]
Jaafari, A.; Pourghasemi, H.R. Factors influencing regional-scale wildfire probability in iran: An application of random forest and support vector machine. In Spatial Modeling in Gis and R for Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2019; pp. 607–619. [Google Scholar]
Taheri, K.; Shahabi, H.; Chapi, K.; Shirzadi, A.; Gutiérrez, F.; Khosravi, K. Sinkhole susceptibility mapping: A comparison between bayes-based machine learning algorithms. Land Degrad. Dev. 2019, 30, 730–745. [Google Scholar] [CrossRef]
Roodposhti, M.S.; Safarrad, T.; Shahabi, H. Drought sensitivity mapping using two one-class support vector machine algorithms. Atmos. Res. 2017, 193, 73–82. [Google Scholar] [CrossRef]
Choubin, B.; Soleimani, F.; Pirnia, A.; Sajedi-Hosseini, F.; Alilou, H.; Rahmati, O.; Melesse, A.M.; Singh, V.P.; Shahabi, H. Effects of drought on vegetative cover changes: Investigating spatiotemporal patterns. In Extreme Hydrology and Climate Variability; Elsevier: Amsterdam, The Netherlands, 2019; pp. 213–222. [Google Scholar]
Lee, S.; Panahi, M.; Pourghasemi, H.R.; Shahabi, H.; Alizadeh, M.; Shirzadi, A.; Khosravi, K.; Melesse, A.M.; Yekrangnia, M.; Rezaie, F. Sevucas: A novel gis-based machine learning software for seismic vulnerability assessment. Appl. Sci. 2019, 9, 3495. [Google Scholar] [CrossRef] [Green Version]
Alizadeh, M.; Alizadeh, E.; Asadollahpour Kotenaee, S.; Shahabi, H.; Beiranvand Pour, A.; Panahi, M.; Bin Ahmad, B.; Saro, L. Social vulnerability assessment using artificial neural network (ann) model for earthquake hazard in tabriz city, iran. Sustainability 2018, 10, 3376. [Google Scholar] [CrossRef] [Green Version]
Azareh, A.; Rahmati, O.; Rafiei-Sardooi, E.; Sankey, J.B.; Lee, S.; Shahabi, H.; Ahmad, B.B. Modelling gully-erosion susceptibility in a semi-arid region, iran: Investigation of applicability of certainty factor and maximum entropy models. Sci. Total Environ. 2019, 655, 684–696. [Google Scholar] [CrossRef] [PubMed]
Tien Bui, D.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Omidavr, E.; Pham, B.T.; Talebpour Asl, D.; Khaledian, H.; Pradhan, B.; Panahi, M. A novel ensemble artificial intelligence approach for gully erosion mapping in a semi-arid watershed (iran). Sensors 2019, 19, 2444. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Pradhan, B.; Chen, W.; Khosravi, K.; Panahi, M.; Bin Ahmad, B.; Saro, L. Land subsidence susceptibility mapping in south korea using machine learning algorithms. Sensors 2018, 18, 2464. [Google Scholar] [CrossRef] [Green Version]
Rahmati, O.; Samadi, M.; Shahabi, H.; Azareh, A.; Rafiei-Sardooi, E.; Alilou, H.; Melesse, A.M.; Pradhan, B.; Chapi, K.; Shirzadi, A. Swpt: An automated gis-based tool for prioritization of sub-watersheds based on morphometric and topo-hydrological factors. Geosci. Front. 2019, 10, 2167–2175. [Google Scholar] [CrossRef]
Pham, B.T.; Jaafari, A.; Prakash, I.; Singh, S.K.; Quoc, N.K.; Bui, D.T. Hybrid computational intelligence models for groundwater potential mapping. Catena 2019, 182, 104101. [Google Scholar] [CrossRef]
Tien Bui, D.; Shahabi, H.; Omidvar, E.; Shirzadi, A.; Geertsema, M.; Clague, J.J.; Khosravi, K.; Pradhan, B.; Pham, B.T.; Chapi, K. Shallow landslide prediction using a novel hybrid functional machine learning algorithm. Remote Sens. 2019, 11, 931.
Pham, B.T.; Prakash, I.; Dou, J.; Singh, S.K.; Trinh, P.T.; Tran, H.T.; Le, T.M.; Van Phong, T.; Khoi, D.K.; Shirzadi, A. A novel hybrid approach of landslide susceptibility modelling using rotation forest ensemble and different base classifiers. Geocarto Int. 2019.
Nguyen, V.V.; Pham, B.T.; Vu, B.T.; Prakash, I.; Jha, S.; Shahabi, H.; Shirzadi, A.; Ba, D.N.; Kumar, R.; Chatterjee, J.M. Hybrid machine learning approaches for landslide susceptibility modeling. Forests 2019, 10, 157.
Nguyen, P.T.; Tuyen, T.T.; Shirzadi, A.; Pham, B.T.; Shahabi, H.; Omidvar, E.; Amini, A.; Entezami, H.; Prakash, I.; Phong, T.V. Development of a novel hybrid intelligence approach for landslide spatial prediction. Appl. Sci. 2019, 9, 2824.
Shirzadi, A.; Solaimani, K.; Roshan, M.H.; Kavian, A.; Chapi, K.; Shahabi, H.; Keesstra, S.; Ahmad, B.B.; Bui, D.T. Uncertainties of prediction accuracy in shallow landslide modeling: Sample size and raster resolution. Catena 2019, 178, 172–188.
Nhu, V.H.; Rahmati, O.; Falah, F.; Shojaei, S.; Al-Ansari, N.; Shahabi, H.; Shirzadi, A.; Górski, K.; Nguyen, H.; Ahmad, B.B. Mapping of Groundwater Spring Potential in Karst Aquifer System Using Novel Ensemble Bivariate and Multivariate Models. Water 2020, 12, 985.
Pham, B.T.; Tien Bui, D.; Prakash, I.; Dholakia, M.B. Hybrid integration of multilayer perceptron neural networks and machine learning ensembles for landslide susceptibility assessment at himalayan area (india) using gis. Catena 2017, 149, 52–63.
Pham, B.T.; Shirzadi, A.; Shahabi, H.; Omidvar, E.; Singh, S.K.; Sahana, M.; Asl, D.T.; Ahmad, B.B.; Quoc, N.K.; Lee, S. Landslide susceptibility assessment by novel hybrid machine learning algorithms. Sustainability 2019, 11, 4386.
Pham, B.T.; Jaafari, A.; Prakash, I.; Bui, D.T. A novel hybrid intelligent model of support vector machines and the multiboost ensemble for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2019, 78, 2865–2886.
Hong, H.; Liu, J.; Bui, D.T.; Pradhan, B.; Acharya, T.D.; Pham, B.T.; Zhu, A.-X.; Chen, W.; Ahmad, B.B. Landslide susceptibility mapping using j48 decision tree with adaboost, bagging and rotation forest ensembles in the guangchang area (china). Catena 2018, 163, 399–413.
Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60.
Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Zhang, S.; Hong, H.; Zhang, N. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve bayes tree classifiers for a landslide susceptibility assessment in langao county, china. Geomat. Nat. Hazards Risk 2017, 8, 1955–1977.
Abedini, M.; Ghasemian, B.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Pham, B.T.; Bin Ahmad, B.; Tien Bui, D. A novel hybrid approach of bayesian logistic regression and its ensembles for landslide susceptibility assessment. Geocarto Int. 2018, 34, 1427–1457.
Miraki, S.; Zanganeh, S.H.; Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Pham, B.T. Mapping groundwater potential using a novel hybrid intelligence approach. Water Resour. Manag. 2019, 33, 281–302.
Arabameri, A.; Chen, W.; Blaschke, T.; Tiefenbacher, J.P.; Pradhan, B.; Tien Bui, D. Gully head-cut distribution modeling using machine learning methods—A case study of nw iran. Water 2020, 12, 16.
Ahmadlou, M.; Karimi, M.; Alizadeh, S.; Shirzadi, A.; Parvinnejhad, D.; Shahabi, H.; Panahi, M. Flood susceptibility assessment using integration of adaptive network-based fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and BAT algorithms (BA). Geocarto Int. 2019, 34, 1252–1272.
Galli, M.; Ardizzone, F.; Cardinali, M.; Guzzetti, F.; Reichenbach, P. Comparing landslide inventory maps. Geomorphology 2008, 94, 268–289.
Guzzetti, F.; Carrara, A.; Cardinali, M.; Reichenbach, P. Landslide hazard evaluation: A review of current techniques and their application in a multi-scale study, central italy. Geomorphology 1999, 31, 181–216.
Shirzadi, A.; Chapi, K.; Shahabi, H.; Solaimani, K.; Kavian, A.; Ahmad, B.B. Rock fall susceptibility assessment along a mountainous road: An evaluation of bivariate statistic, analytical hierarchy process and frequency ratio. Environ. Earth Sci. 2017, 76, 152.
Shirzadi, A.; Soliamani, K.; Habibnejhad, M.; Kavian, A.; Chapi, K.; Shahabi, H.; Chen, W.; Khosravi, K.; Thai Pham, B.; Pradhan, B. Novel gis based machine learning algorithms for shallow landslide susceptibility mapping. Sensors 2018, 18, 3777.
Chen, W.; Zhao, X.; Shahabi, H.; Shirzadi, A.; Khosravi, K.; Chai, H.; Zhang, S.; Zhang, L.; Ma, J.; Chen, Y.; et al. Spatial prediction of landslide susceptibility by combining evidential belief function, logistic regression and logistic model tree. Geocarto Int. 2019, 34, 1177–1201.
Pham, B.T.; Prakash, I.; Singh, S.K.; Shirzadi, A.; Shahabi, H.; Bui, D.T. Landslide susceptibility modeling using reduced error pruning trees and different ensemble techniques: Hybrid machine learning approaches. Catena 2019, 175, 203–218.
Dehnavi, A.; Aghdam, I.N.; Pradhan, B.; Varzandeh, M.H.M. A new hybrid model using step-wise weight assessment ratio analysis (swara) technique and adaptive neuro-fuzzy inference system (anfis) for regional landslide hazard assessment in iran. Catena 2015, 135, 122–148.
Froude, M.J.; Petley, D. Global fatal landslide occurrence from 2004 to 2016. Nat. Hazards Earth Syst. Sci. 2018, 18, 2161–2181.
Nefeslioglu, H.A.; Duman, T.Y.; Durmaz, S. Landslide susceptibility mapping for a part of tectonic kelkit valley (eastern black sea region of turkey). Geomorphology 2008, 94, 401–418.
Kavzoglu, T.; Sahin, E.K.; Colkesen, I. An assessment of multivariate and bivariate approaches in landslide susceptibility mapping: A case study of duzkoy district. Nat. Hazards 2015, 76, 471–496.
Pham, B.T.; Bui, D.T.; Pourghasemi, H.R.; Indra, P.; Dholakia, M. Landslide susceptibility assesssment in the uttarakhand area (india) using gis: A comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees methods. Theor. Appl. Climatol. 2017, 128, 255–273.
Pham, B.T.; Pradhan, B.; Bui, D.T.; Prakash, I.; Dholakia, M. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of uttarakhand area (india). Environ. Model. Softw. 2016, 84, 240–250.
He, S.; Ouyang, C.; Luo, Y. Seismic stability analysis of soil nail reinforced slope using kinematic approach of limit analysis. Environ. Earth Sci. 2012, 66, 319–326.
Gorsevski, P.V.; Jankowski, P. An optimized solution of multi-criteria evaluation analysis of landslide susceptibility using fuzzy sets and kalman filter. Comput. Geosci. 2010, 36, 1005–1020.
Oh, H.-J.; Pradhan, B. Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Comput. Geosci. 2011, 37, 1264–1276.
Ercanoglu, M.; Gokceoglu, C. Assessment of landslide susceptibility for a landslide-prone area (north of yenice, nw turkey) by fuzzy approach. Environ. Geol. 2002, 41, 720–730.
Atkinson, P.M.; Massari, R. Autologistic modelling of susceptibility to landsliding in the central apennines, italy. Geomorphology 2011, 130, 55–64.
Hengl, T.; Gruber, S.; Shrestha, D. Digital Terrain Analysis in Ilwis; International Institute for Geo-Information Science and Earth Observation: Enschede, The Netherlands, 2003; p. 62.
Talebi, A.; Uijlenhoet, R.; Troch, P.A. Soil moisture storage and hillslope stability. Nat. Hazards Earth Syst. Sci. 2007, 7, 523–534.
Pourghasemi, H.R.; Jirandeh, A.G.; Pradhan, B.; Xu, C.; Gokceoglu, C. Landslide susceptibility mapping using support vector machine and gis at the golestan province, iran. J. Earth Syst. Sci. 2013, 122, 349–369.
Moore, I.; Burch, G. Sediment transport capacity of sheet and rill flow: Application of unit stream power theory. Water Resour. Res. 1986, 22, 1350–1360.
Iqbal, M. An Introduction to Solar Radiation; Elsevier: Amsterdam, The Netherlands, 2012.
Regmi, N.R.; Giardino, J.R.; McDonald, E.V.; Vitek, J.D. A comparison of logistic regression-based models of susceptibility to landslides in western colorado, USA. Landslides 2014, 11, 247–262.
Moore, I.D.; Grayson, R.; Ladson, A. Digital terrain modelling: A review of hydrological, geomorphological, and biological applications. Hydrol. Process. 1991, 5, 3–30.
Wilson, J.P.; Gallant, J.C. Terrain analysis: Principles and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2000.
Park, S.; Choi, C.; Kim, B.; Kim, J. Landslide susceptibility mapping using frequency ratio, analytic hierarchy process, logistic regression, and artificial neural network methods at the inje area, korea. Environ. Earth Sci. 2013, 68, 1443–1464.
Chowdhury, R.; Flentje, P.; Bhattacharya, G. Geotechnics in the Twenty-First Century, Uncertainties and Other Challenges: With Particular Reference to Landslide Hazard and Risk Assessment. In Proceedings of the International Symposium on Engineering under Uncertainty: Safety Assessment and Management (ISEUSAM-2012); Springer: New Delhi, India, 2013; pp. 27–53.
Cevik, E.; Topal, T. Gis-based landslide susceptibility mapping for a problematic segment of the natural gas pipeline, hendek (turkey). Environ. Geol. 2003, 44, 949–962.
Nampak, H.; Pradhan, B.; Manap, M.A. Application of gis based data driven evidential belief function model to predict groundwater potential zonation. J. Hydrol. 2014, 513, 283–300.
Barlow, J.; Martin, Y.; Franklin, S. Detecting translational landslide scars using segmentation of landsat etm+ and dem data in the northern cascade mountains, british columbia. Can. J. Remote Sens. 2003, 29, 510–517.
Yang, W.; Wang, M.; Shi, P. Using modis ndvi time series to identify geographic patterns of landslides in vegetated regions. IEEE Geosci. Remote Sens. Lett. 2012, 10, 707–710.
Hong, H.; Shahabi, H.; Shirzadi, A.; Chen, W.; Chapi, K.; Ahmad, B.B.; Roodposhti, M.S.; Hesar, A.Y.; Tian, Y.; Bui, D.T. Landslide susceptibility assessment at the wuning area, china: A comparison between multi-criteria decision making, bivariate statistical and machine learning methods. Nat. Hazards 2019, 96, 173–212.
Demir, G.; Aytekin, M.; Akgun, A. Landslide susceptibility mapping by frequency ratio and logistic regression methods: An example from niksar–resadiye (tokat, turkey). Arab. J. Geosci. 2015, 8, 1801–1812.
Donati, L.; Turrini, M.C. An objective method to rank the importance of the factors predisposing to landslides with the gis methodology: Application to an area of the apennines (valnerina; perugia, italy). Eng. Geol. 2002, 63, 277–289.
Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984; Volume 37, pp. 237–251.
Kim, J.-C.; Lee, S.; Jung, H.-S.; Lee, S. Landslide susceptibility mapping using random forest and boosted tree models in pyeong-chang, korea. Geocarto Int. 2018, 33, 1000–1015.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Micheletti, N.; Foresti, L.; Robert, S.; Leuenberger, M.; Pedrazzini, A.; Jaboyedoff, M.; Kanevski, M. Machine learning feature selection methods for landslide susceptibility mapping. Math. Geosci. 2014, 46, 33–57.
Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and qsar modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958.
Polikar, R. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 2006, 6, 21–45.
Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39.
Opitz, D.; Maclin, R. Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 1999, 11, 169–198.
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
Breiman, L. Arcing the Edge; Technical Report 486; Statistics Department, University of California at Berkeley: Berkeley, CA, USA, 1997.
Schapire, R.E.; Freund, Y.; Bartlett, P.; Lee, W.S. Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Stat. 1998, 26, 1651–1686.
Kleinberg, E.M. On the algorithmic implementation of stochastic discrimination. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 473–490.
Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
Wang, X.; Tang, X. Random sampling for subspace face recognition. Int. J. Comput. Vis. 2006, 70, 91–104.
Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
Caruana, R.; Niculescu-Mizil, A. Data Mining in Metric Space: An Empirical Analysis of Supervised Learning Performance Criteria. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 69–78.
Lavesson, N.; Davidsson, P. Generic Methods for Multi-Criteria Evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, Atlanta, GA, USA, 24–26 April 2008; pp. 541–546.
Tien Bui, D.; Shirzadi, A.; Shahabi, H.; Geertsema, M.; Omidvar, E.; Clague, J.J.; Thai Pham, B.; Dou, J.; Talebpour Asl, D.; Bin Ahmad, B. New ensemble models for shallow landslide susceptibility modeling in a semi-arid watershed. Forests 2019, 10, 743.
Kumar, R.; Indrayan, A. Receiver operating characteristic (roc) curve for medical researchers. Indian Pediatrics 2011, 48, 277–287.
Akobeng, A.K. Understanding diagnostic tests 3: Receiver operating characteristic curves. Acta Paediatr. 2007, 96, 644–647.
Wang, G.; Lei, X.; Chen, W.; Shahabi, H.; Shirzadi, A. Hybrid computational intelligence methods for landslide susceptibility mapping. Symmetry 2020, 12, 325.
DeLeo, J.M. Receiver Operating Characteristic Laboratory (roclab): Software for Developing Decision Strategies that Account for Uncertainty. In Proceedings of the 1993 (2nd) International Symposium on Uncertainty Modeling and Analysis, College Park, MD, USA, 25–28 April 1993; pp. 318–325.
Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937, 32, 675–701.
Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 196–202.
Hunt, E.B.; Marin, J.; Stone, P.J. Experiments in Induction; Academic Press: New York, NY, USA, 1966.
Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
Peirolo, R. Information gain as a score for probabilistic forecasts. Meteorol. Appl. 2011, 18, 9–17.
Rahmati, O.; Kornejady, A.; Samadi, M.; Deo, R.C.; Conoscenti, C.; Lombardo, L.; Dayal, K.; Taghizadeh-Mehrjardi, R.; Pourghasemi, H.R.; Kumar, S. Pmt: New analytical framework for automated evaluation of geo-environmental modelling approaches. Sci. Total Environ. 2019, 664, 296–311.
Lombardo, L.; Fubelli, G.; Amato, G.; Bonasera, M. Presence-only approach to assess landslide triggering-thickness susceptibility: A test for the mili catchment (north-eastern sicily, italy). Nat. Hazards 2016, 84, 565–588.
Bout, B.; Lombardo, L.; van Westen, C.J.; Jetten, V.G. Integration of two-phase solid fluid equations in a catchment model for flashfloods, debris flows and shallow slope failures. Environ. Model. Softw. 2018, 105, 1–16.
Chen, W.; Shahabi, H.; Shirzadi, A.; Hong, H.; Akgun, A.; Tian, Y.; Liu, J.; Zhu, A.-X.; Li, S. Novel hybrid artificial intelligence approach of bivariate statistical-methods-based kernel logistic regression classifier for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2019, 78, 4397–4419.
Hong, H.; Panahi, M.; Shirzadi, A.; Ma, T.; Liu, J.; Zhu, A.-X.; Chen, W.; Kougias, I.; Kazakis, N. Flood susceptibility assessment in hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. 2018, 621, 1124–1141.
Lombardo, L.; Opitz, T.; Huser, R. Point process-based modeling of multiple debris flow landslides using inla: An application to the 2009 messina disaster. Stoch. Environ. Res. Risk Assess. 2018, 32, 2179–2198.
Kuncheva, L. Combining Pattern Classifiers: Methods and Algorithms; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2004.
Bui, D.T.; Pradhan, B.; Nampak, H.; Bui, Q.-T.; Tran, Q.-A.; Nguyen, Q.-P. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibilitgy modeling in a high-frequency tropical cyclone area using gis. J. Hydrol. 2016, 540, 317–330.
Kadavi, P.; Lee, C.-W.; Lee, S. Application of ensemble-based machine learning models to landslide susceptibility mapping. Remote Sens. 2018, 10, 1252.
Pham, B.T.; Prakash, I.; Bui, D.T. Spatial prediction of landslides using a hybrid machine learning approach based on random subspace and classification and regression trees. Geomorphology 2018, 303, 256–270.
Bui, D.T.; Ho, T.-C.; Pradhan, B.; Pham, B.-T.; Nhu, V.-H.; Revhaug, I. Gis-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with adaboost, bagging, and multiboost ensemble frameworks. Environ. Earth Sci. 2016, 75, 1101.
He, Q.; Xu, Z.; Li, S.; Li, R.; Zhang, S.; Wang, N.; Pham, B.T.; Chen, W. Novel entropy and rotation forest-based credal decision tree classifier for landslide susceptibility modeling. Entropy 2019, 21, 106.
Camilo, D.C.; Lombardo, L.; Mai, P.M.; Dou, J.; Huser, R. Handling high predictor dimensionality in slope-unit-based landslide susceptibility models through lasso-penalized generalized linear model. Environ. Model. Softw. 2017, 97, 145–156.
Lombardo, L.; Cama, M.; Conoscenti, C.; Märker, M.; Rotigliano, E. Binary logistic regression versus stochastic gradient boosted decision trees in assessing landslide susceptibility for multiple-occurring landslide events: Application to the 2009 storm event in messina (sicily, southern italy). Nat. Hazards 2015, 79, 1621–1648.
Guo, C.; Montgomery, D.R.; Zhang, Y.; Wang, K.; Yang, Z. Quantitative assessment of landslide susceptibility along the xianshuihe fault zone, tibetan plateau, china. Geomorphology 2015, 248, 93–110.
Youssef, A.M.; Pradhan, B.; Jebur, M.N.; El-Harbi, H.M. Landslide susceptibility mapping using ensemble bivariate and multivariate statistical models in fayfa area, saudi arabia. Environ. Earth Sci. 2015, 73, 3745–3761.
Arabameri, A.; Pradhan, B.; Rezaei, K.; Sohrabi, M.; Kalantari, Z. Gis-based landslide susceptibility mapping using numerical risk factor bivariate model and its ensemble with linear multivariate regression and boosted regression tree algorithms. J. Mt. Sci. 2019, 16, 595–618.
Lombardo, L.; Mai, P.M. Presenting logistic regression-based landslide susceptibility results. Eng. Geol. 2018, 244, 14–24.
Nasiri, V.; Darvishsefat, A.A.; Rafiee, R.; Shirvany, A.; Hemat, M.A. Land use change modeling through an integrated multi-layer perceptron neural network and Markov chain analysis (Case study: Arasbaran region, Iran). J. For. Res. 2019, 30, 943–957.

Figure 1. Study area and locations of landslides in Bijar County, Kurdistan, Iran.

Figure 2. Landslide susceptibility mapping workflow.

Figure 3. The most significant landslide conditioning factors determined in the modeling phase.

Figure 4. Graphical display of optimal parameters for different numbers of iterations and seeds in the modeling phase (training dataset, panels a-f) and the validation phase (validation dataset, panels g-l). (a) Random Subspace (RS) model based on number of iterations and root mean square error (RMSE). (b) RS model based on number of iterations and area under the receiver operating characteristic curve (AUC). (c) Rotation Forest (RF) model based on number of iterations and RMSE. (d) RF model based on number of iterations and AUC. (e) Bootstrap Aggregating (BA) model based on number of iterations and RMSE. (f) BA model based on number of iterations and AUC. (g) RS model based on number of seeds and RMSE. (h) RS model based on number of seeds and AUC. (i) RF model based on number of seeds and RMSE. (j) RF model based on number of seeds and AUC. (k) BA model based on number of seeds and RMSE. (l) BA model based on number of seeds and AUC.

Figure 5. Histograms of all models used to prepare the landslide susceptibility maps using three classification methods (Natural breaks, Quantile, Geometrical interval): (a) RF, (b) RS-RF, (c) Rot-RF, (d) Bag-RF.
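Figure 5 compares three ways of slicing the continuous susceptibility index into map classes. As a rough illustration of the simplest of them, the quantile method, the sketch below uses Python's standard `statistics.quantiles` to place the class breaks so that each class holds roughly the same number of cells. The helper name `quantile_classify`, the five-class default, and the toy scores are assumptions for illustration, not the authors' settings; natural breaks (Jenks) and geometrical interval classification, usually run inside GIS software, are not reproduced here.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_classify(values, n_classes=5):
    """Assign each susceptibility value to a quantile class (1 = lowest).

    Hypothetical helper: the breaks are the n_classes-quantiles of the data
    (statistics.quantiles, default 'exclusive' method), so each class holds
    roughly the same number of values.
    """
    breaks = quantiles(values, n=n_classes)  # n_classes - 1 cut points
    return [bisect_right(breaks, v) + 1 for v in values], breaks


# Toy susceptibility scores for ten map cells (illustrative only).
scores = [0.05, 0.12, 0.18, 0.25, 0.33, 0.41, 0.58, 0.66, 0.81, 0.93]
classes, breaks = quantile_classify(scores)
print(classes)  # → [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Equal-count classes make the histogram shape irrelevant to class size, which is one reason the three methods in Figure 5 can yield visibly different maps from the same scores.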

Figure 6. Shallow landslide susceptibility maps produced using the (a) RAF model, (b) RF-RAF model, (c) BA-RAF model, and (d) RS-RAF model.

Figure 7. Receiver operating characteristic (ROC) curves and AUC values of the machine learning models. (a) RAF model, training dataset. (b) RAF model, validation dataset. (c) Ensemble models, training dataset. (d) Ensemble models, validation dataset.

Table 1. Landslide conditioning factors of the study area.

Factor Group	Conditioning Factors	Classes
Topographic factors	Slope (°)	(1) 0–5, (2) 5–10, (3) 10–15, (4) 15–20, (5) 20–25, (6) 25–30, (7) 30–45, (8) >45
	Aspect	(1) flat, (2) north, (3) northeast, (4) east, (5) southeast, (6) south, (7) southwest, (8) west, (9) northwest
	Elevation (m)	(1) 1573–1700, (2) 1700–1800, (3) 1800–1900, (4) 1900–2000, (5) 2000–2100, (6) 2100–2200, (7) 2200–2300, (8) 2300–2400, (9) >2400
	Curvature (m⁻¹)	(1) [(−12.5)–(−1.4)], (2) [(−1.4)–(−0.4)], (3) [(−0.4)–(−0.2)], (4) [(−0.2)–0.9], (5) [0.9–2.5], (6) [2.5–15.6]
	Plan curvature (m⁻¹)	(1) [(−6.7)–(−0.8)], (2) [(−0.8)–(−0.2)], (3) [(−0.2)–0], (4) [0–0.4], (5) [0.4–1.1], (6) [1.1–10.4]
	Profile curvature (m⁻¹)	(1) [(−10.7)–(−1.7)], (2) [(−1.7)–(−0.7)], (3) [(−0.7)–(−0.2)], (4) [(−0.2)–0.2], (5) [0.2–0.9], (6) [0.9–7.5]
	LS/STI	(1) 0–7, (2) 7–14, (3) 14–21, (4) 21–28, (5) 28–35, (6) 35–42
	Annual solar radiation (hr)	(1) 3.015–6.563, (2) 6.563–6.747, (3) 6.747–6.849, (4) 6.849–6.930, (5) 6.930–7.073, (6) 7.073–7.236, (7) 7.236–8.215
Triggering factor	Rainfall (mm)	(1) 263–270, (2) 270–300, (3) 300–330, (4) 330–360, (5) 360–390, (6) 390–420, (7) 420–450
Hydrological factors	SPI	(1) 0–998, (2) 998–6986, (3) 6986–19961, (4) 19961–45911, (5) 45911–101803, (6) 101803–255505
	TWI	(1) 1–3, (2) 3–4, (3) 4–6, (4) 6–8, (5) 8–9, (6) 9–11
	Distance to rivers (m)	(1) 0–50, (2) 50–100, (3) 100–150, (4) 150–200, (5) >200
	River density (km/km²)	(1) 0–1.9, (2) 1.9–3.2, (3) 3.2–4.2, (4) 4.2–5.2, (5) 5.2–6.3, (6) 6.3–7.8, (7) 7.8–13.2
Geologic factors	Lithology	(1) Quaternary, (2) Tertiary, (3) Cretaceous
	Distance to faults (m)	(1) 0–200, (2) 200–400, (3) 400–600, (4) 600–800, (5) 800–1000, (6) >1000
	Fault density (km/km²)	(1) 0–0.3, (2) 0.3–0.8, (3) 0.8–1.2, (4) 1.2–1.7, (5) 1.7–2.1, (6) 2.1–2.5, (7) 2.5–3.2
Land cover factors	Land use	(1) residential area, (2) arable land (dry farming and cultivated lands), (3) woodland, (4) grassland, (5) barren land
	NDVI	(1) [(−0.23)–(−0.061)], (2) [(−0.061)–(−0.0081)], (3) [(−0.0081)–0.060], (4) [0.060–0.14], (5) [0.14–0.24], (6) [0.24–0.41], (7) [0.41–0.73]
Man-made factors	Distance to roads (m)	(1) 0–50, (2) 50–100, (3) 100–150, (4) 150–200, (5) >200
	Road density (km/km²)	(1) 0–0.0013, (2) 0.0013–0.0027, (3) 0.0027–0.0041, (4) 0.0041–0.0055, (5) 0.0055–0.0069, (6) 0.0069–0.0083, (7) 0.0083–0.0097
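Each conditioning factor in Table 1 is reclassified from a continuous raster into the listed classes before modeling. A minimal sketch of that lookup for the slope factor follows. It assumes each class interval includes its upper bound (so 5° falls in class 1 and only values above 45° reach class 8), a convention the table does not state; the function and constant names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds (degrees) of slope classes 1-7 in Table 1; class 8 is >45.
SLOPE_UPPER_BOUNDS = [5, 10, 15, 20, 25, 30, 45]


def slope_class(slope_deg: float) -> int:
    """Map a slope angle in degrees to its Table 1 class (1-8).

    Assumption: intervals are closed on the upper bound, i.e.
    [0, 5] -> 1, (5, 10] -> 2, ..., (30, 45] -> 7, > 45 -> 8.
    """
    if slope_deg < 0:
        raise ValueError("slope angle must be non-negative")
    # bisect_left returns how many bounds are strictly below slope_deg.
    return bisect_left(SLOPE_UPPER_BOUNDS, slope_deg) + 1


print([slope_class(s) for s in (0.0, 5.0, 27.3, 45.0, 60.0)])  # → [1, 1, 6, 7, 8]
```

The same pattern applies to any factor in the table with ordered numeric breaks; only the list of upper bounds changes.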

Table 2. Optimal parameters of shallow landslide models for the study area.

Model	Seeds	Iterations	RMSE (training)	RMSE (validation)	AUC (training)	AUC (validation)
RF	7	16	0.274	0.307	0.970	0.958
BA	8	12	0.281	0.310	0.976	0.948
RS	9	10	0.311	0.337	0.939	0.933

Table 3. Model performance (goodness-of-fit) for the training dataset.

	RAF	RF-RAF	BA-RAF	RS-RAF
TP	73	85	82	77
TN	82	83	83	83
FP	16	4	7	12
FN	7	6	6	7
Sensitivity	0.913	0.934	0.928	0.917
Specificity	0.837	0.954	0.874	0.874
Accuracy	0.871	0.944	0.899	0.894
Kappa	0.741	0.832	0.805	0.865
RMSE	0.333	0.274	0.281	0.311
AUC	0.871	0.976	0.970	0.933
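The threshold-dependent rows of Tables 3 and 4 follow from the four confusion-matrix counts. The sketch below assumes the usual definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N, and Cohen's kappa comparing observed agreement with the agreement expected by chance from the row and column totals. Applied to the RAF training counts it gives about 0.9125, 0.8367, 0.8708 and 0.7416, which agree with the sensitivity, specificity, accuracy and kappa reported for that column to within about 0.001. The function name is hypothetical.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Threshold-based metrics from a binary confusion matrix."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)  # share of landslide cells correctly classified
    specificity = tn / (tn + fp)  # share of non-landslide cells correctly classified
    accuracy = (tp + tn) / n
    # Cohen's kappa: agreement beyond what the marginal totals imply by chance.
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa}


# RAF column of Table 3 (training dataset).
raf_train = confusion_metrics(tp=73, tn=82, fp=16, fn=7)
for name, value in raf_train.items():
    print(f"{name}: {value:.4f}")
```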

Table 4. Model performance (prediction accuracy) for the validation dataset.

	RAF	RF-RAF	BA-RAF	RS-RAF
TP	17	21	18	17
TN	18	20	19	19
FP	5	1	4	5
FN	4	2	3	3
Sensitivity	0.810	0.913	0.857	0.850
Specificity	0.783	0.952	0.826	0.792
Accuracy	0.795	0.932	0.841	0.818
Kappa	0.727	0.767	0.743	0.731
RMSE	0.410	0.307	0.310	0.337
AUC	0.864	0.958	0.948	0.933
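The RMSE and AUC rows are threshold-free and need per-cell predicted probabilities rather than counts. A small sketch follows, assuming RMSE is taken between predicted landslide probabilities and the observed 0/1 labels, and computing AUC as the probability that a randomly chosen landslide cell scores higher than a randomly chosen non-landslide cell (ties count one half), which equals the area under the ROC curve. The labels and probabilities are made-up toy values, and the function names are hypothetical.

```python
import math


def rmse(labels, probs):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels))


def auc(labels, probs):
    """ROC AUC via all positive/negative pairs (Mann-Whitney formulation)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(f"AUC = {auc(labels, probs):.3f}, RMSE = {rmse(labels, probs):.3f}")
# → AUC = 0.889, RMSE = 0.344
```

The pairwise loop is quadratic in the number of cells; for full rasters a rank-based or library implementation would be the practical choice.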

Table 5. Performance of the shallow landslide models based on the Friedman’s test.

No	Shallow Landslide Models	Mean Ranks	χ²	Significance
1	RAF	1.03	62.157	0.000
2	RF-RAF	1.23
3	BA-RAF	2.48
4	RS-RAF	1.17
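Table 5 summarizes a Friedman test, which ranks the competing models within each block of paired observations and asks whether their mean ranks differ more than chance would allow. The sketch below computes the mean ranks and the statistic χ²_F = 12N/(k(k+1)) · Σ_j R̄_j² − 3N(k+1) for N blocks and k models, without a tie correction; turning χ²_F into a p-value (chi-square with k − 1 degrees of freedom) is left to a statistics library. The table does not say what a block is in this study, so the score matrix below (rows = blocks, columns = models) and the helper names are purely hypothetical.

```python
def average_ranks(row):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman(blocks):
    """Return (mean rank per model, Friedman chi-square) for an N x k table."""
    n, k = len(blocks), len(blocks[0])
    rank_rows = [average_ranks(row) for row in blocks]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks) - 3 * n * (k + 1)
    return mean_ranks, chi2


# Hypothetical scores: 3 blocks (rows) x 3 models (columns).
scores = [[0.90, 0.80, 0.70],
          [0.85, 0.80, 0.75],
          [0.60, 0.90, 0.50]]
mean_ranks, chi2 = friedman(scores)
print([round(r, 3) for r in mean_ranks], round(chi2, 3))  # → [2.667, 2.333, 1.0] 4.667
```

Because every block contributes ranks 1 to k, the mean ranks always sum to k(k+1)/2, and χ²_F is the same whether ranks are assigned ascending or descending.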

Table 6. Performance of the shallow landslide models based on the Wilcoxon signed-rank test.

No	Pair-Wise Comparison	NPD	NND	z-Value	p-Value	Significance
1	RF-RAF vs. RAF	50	61	−2.016	0.000	Yes
2	BA-RAF vs. RAF	75	86	−1.240	0.000	Yes
3	RS-RAF vs. RAF	64	58	−1.029	0.013	Yes
4	RF-RAF vs. BA-RAF	86	45	−3.734	0.000	Yes
5	RF-RAF vs. RS-RAF	73	58	−3.237	0.000	Yes
6	BA-RAF vs. RS-RAF	82	63	−1.581	0.075	No

Notes: Significance threshold: p = 0.05; NPD: number of positive differences; NND: number of negative differences.
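Table 6 applies the Wilcoxon signed-rank test to pairs of models. A minimal sketch of the large-sample version follows: zero differences are dropped, absolute differences are ranked (ties share their mean rank), T is the smaller of the positive and negative rank sums, z = (T − n(n+1)/4) / sqrt(n(n+1)(2n+1)/24), and the two-sided p-value comes from the normal distribution. It applies no tie or continuity correction, and the paired integer scores and helper names are hypothetical. It also returns the counts of positive and negative differences, which correspond to the NPD and NND columns; because T is the smaller rank sum, z is never positive.

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(a, b):
    """Return (NPD, NND, z, two-sided p) using the normal approximation."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    npd = sum(1 for d in diffs if d > 0)
    t = min(w_plus, w_minus)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean_t) / sd_t
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return npd, n - npd, z, p


# Hypothetical paired scores of two models on the same six samples.
model_a = [10, 12, 9, 15, 11, 14]
model_b = [8, 12, 10, 11, 9, 13]
npd, nnd, z, p = wilcoxon_signed_rank(model_a, model_b)
print(npd, nnd, round(z, 3), round(p, 3))
```

With only five non-zero differences, as in this toy case, the normal approximation is crude; exact tables or a library routine are preferable for samples that small.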

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Nhu, V.-H.; Shirzadi, A.; Shahabi, H.; Chen, W.; Clague, J.J.; Geertsema, M.; Jaafari, A.; Avand, M.; Miraki, S.; Talebpour Asl, D.; et al. Shallow Landslide Susceptibility Mapping by Random Forest Base Classifier and Its Ensembles in a Semi-Arid Region of Iran. Forests 2020, 11, 421. https://doi.org/10.3390/f11040421

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.