New Ensemble Models for Shallow Landslide Susceptibility Modeling in a Semi-Arid Watershed

Tien Bui, Dieu; Shirzadi, Ataollah; Shahabi, Himan; Geertsema, Marten; Omidvar, Ebrahim; Clague, John J.; Thai Pham, Binh; Dou, Jie; Talebpour Asl, Dawood; Bin Ahmad, Baharin; Lee, Saro

doi:10.3390/f10090743



by Dieu Tien Bui ¹,², Ataollah Shirzadi ³, Himan Shahabi ⁴,*, Marten Geertsema ⁵, Ebrahim Omidvar ⁶, John J. Clague ⁷, Binh Thai Pham ⁸, Jie Dou ⁹, Dawood Talebpour Asl ⁴, Baharin Bin Ahmad ¹⁰ and Saro Lee ¹¹,¹²,*

¹ Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam

² Faculty of Environment and Labor Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam

³ Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁴ Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁵ British Columbia Ministry of Forests, Lands, Natural Resource Operations and Rural Development, Prince George, BC V2L 1R5, Canada

⁶ Department of Rangeland and Watershed Management, Faculty of Natural Resources and Earth Sciences, University of Kashan, Kashan 87317-53153, Iran

⁷ Department of Earth Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada

⁸ Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam

⁹ Department of Civil and Environmental Engineering, Nagaoka University of Technology, 1603-1, Kami-Tomioka, Nagaoka, Niigata 940-2188, Japan

¹⁰ Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Malaysia

¹¹ Geoscience Platform Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124 Gwahak-ro, Yuseong-gu, Daejeon 34132, Korea

¹² Department of Geophysical Exploration, Korea University of Science and Technology, 217 Gajeong-ro, Yuseong-gu, Daejeon 34113, Korea


* Authors to whom correspondence should be addressed.

Forests 2019, 10(9), 743; https://doi.org/10.3390/f10090743

Submission received: 26 June 2019 / Revised: 21 August 2019 / Accepted: 24 August 2019 / Published: 28 August 2019

(This article belongs to the Special Issue Watershed Scale Forest Restoration and Sustainable Development)


Abstract:

We prepared a landslide susceptibility map for the Sarkhoon watershed, Chaharmahal-w-bakhtiari, Iran, using novel ensemble artificial intelligence approaches. A support vector machine (SVM) was employed as the base classifier. Four meta/ensemble classifiers, AdaBoost (AB), bagging (BA), rotation forest (RF), and random subspace (RS), were used to construct new ensemble models. SVM has been used previously to spatially predict landslides, but not together with these ensembles. We selected 20 conditioning factors and randomly partitioned 98 landslide locations into training (70%) and validation (30%) groups. Several statistical metrics were used for model comparison and validation: sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). Using the One-R Attribute Evaluation (ORAE) technique, we found that all 20 conditioning factors were significant in identifying landslide locations, and that "distance to road" was the most important. The RS (AUC = 0.837) and RF (AUC = 0.834) ensembles significantly improved the goodness-of-fit and prediction accuracy of the SVM (AUC = 0.810), whereas the BA (AUC = 0.807) and AB (AUC = 0.779) ensembles did not. The random subspace-based support vector machine (RSSVM) model is a promising technique for helping to better manage land in landslide-prone areas.

Keywords:

shallow landslide; machine learning; goodness-of-fit; factor selection; GIS; Iran

1. Introduction

Landsliding is a complex natural phenomenon responsible for considerable loss of life and damage to engineered infrastructure worldwide [1]. Considerable efforts are being made to understand the causes and triggers of landslides in order to reduce these losses [2], and much research has been carried out to identify and classify terrain that is susceptible to landslides. A common key strategy is to prepare a landslide susceptibility map showing the areas prone to landslides within a region of interest. Such maps provide understandable and useful information that helps planners better manage susceptible areas [3,4,5]. As yet, there is no universal approach for preparing landslide susceptibility maps; thus, landslide researchers continue to experiment with new methods and techniques [6]. Novel quantitative approaches proposed in recent years include:

- logistic regression [7,8,9]
- multivariate regression [10,11,12]
- discriminant analysis [13,14]
- certainty factor [15,16]
- index of entropy (IOE) [17,18]
- spatial multi-criteria evaluation (SMCE) [19]
- statistical index [20,21]
- analytic hierarchy processes [22,23]

Recently, machine learning (ML) techniques have become popular for spatial predictions of natural hazards. These include wildfire [24], sinkhole development [25], flooding [26,27,28,29,30,31,32,33,34,35,36,37], drought [38], gully erosion [39,40], earthquakes [41], land/ground subsidence [42], and landslides [4,43,44,45,46]. ML is a subfield of artificial intelligence (AI) in which computer algorithms learn from training data in order to analyze and forecast information.

ML algorithms that have been used for landslide susceptibility mapping include:

- support vector machine [17,21,47,48,49]
- artificial neural network [50,51]
- decision trees such as naïve Bayes tree (NBT) [5,52]
- radial basis function (RBF) [53]
- kernel logistic regression (KLR) [54,55]
- Bayes’ net (BN) [56]
- bivariate statistical index (SI) [57]
- stochastic gradient descent (SGD) [58]
- particle swarm optimization (PSO) [59]
- best-first decision tree (BFDT) [60]
- random subspace-based support vector machines (RSSVM) [61]
- logistic model tree (LMT) [17]

Ensemble models have been used in landslide susceptibility mapping because of their novelty and their ability to comprehensively assess landslide-related parameters for discrete classes of independent factors [26,28,34,44,62,63,64,65,66]. Ensemble models can improve precision and predictive accuracy relative to standard ML models. In particular, ensemble techniques such as AdaBoost, MultiBoost, Bagging, and Rotation Forest greatly improve the performance of single machine learning methods [44]. Ensemble methods combine multiple base learners, or different ML methods, into hybrid models that produce better results than their individual components [29,67,68,69].

Machine learning models differ from hydrological models. Hydrological models are based on the hydro-geomorphological conditions within a single watershed and must be recalibrated for use in other watersheds. Machine learning algorithms are not subject to this limitation; rather, they can be applied in any setting. The algorithms have different distribution functions, which result in different performance. This characteristic has led to the development of many algorithms, the best of which are chosen to prepare susceptibility maps for specific regions. The distribution function of a particular algorithm is fixed. Differences in the performance of machine learning algorithms therefore stem from differences in the inputs (conditioning factors) to the model. Conditioning factors differ regionally, so the performance of a particular algorithm also differs from region to region. Important differences relate to climate and land use, including earthworks (cuts and fills), urbanization (buildings and roads), and altered drainage [70,71]. Climate change can alter the frequency and intensity of extreme precipitation events, and hence may affect the occurrence of rainfall-triggered landslides [72,73,74,75,76].

Accurately predicting the spatial distribution of landslides is one of the most debated research subjects among landslide researchers. The more accurate the method, the lower the damage from landslides will be. With reliable and timely landslide prediction, fatalities can be reduced and infrastructure better protected. Many new landslide prediction models have been proposed around the world. However, it is widely acknowledged that model uncertainties preclude the universal application of a single model. Therefore, for a given case study, powerful and flexible models should be tested and evaluated to generate the most accurate landslide susceptibility map possible. The primary objective of our research is to advance landslide susceptibility mapping by introducing novel machine learning ensemble approaches. These combine the SVM algorithm, as a base classifier, with the AB, BA, RF, and RS ensemble methods. The SVM is a powerful and functional algorithm that can serve as a benchmark model when a new model is developed and tested. Although SVM has been widely applied in landslide susceptibility mapping, SVM ensembles have not previously been used for this purpose. We assess the performance of the proposed ensemble models with a variety of statistical metrics and the AUC.

2. The Study Area

The study area is the Sarkhoon watershed in Chahar Mahaal and Bakhtiari province, Iran (50°25.4′ E–50°38.45′ E, 31°42.05′ N–31°52.05′ N) (Figure 1). The watershed has an area of about 187 km² and ranges in elevation from 1370 m to 3376 m above sea level. The mean slope angle is 22.6 degrees, but some slopes are nearly vertical. The Sarkhoon watershed is one of several catchments that drain to the Persian Gulf via the Karoon River. Precipitation data provided by the regional water company of Chaharmahal-Bakhtiari province show that average annual rainfall is about 523 mm at the semi-arid mouth of the catchment and 1360 mm in its headwaters. The average for the entire watershed is 848 mm/year. Nearly 77% of the watershed is pasture land. The remainder includes dense forest (13.6%), semi-dense forest (3.4%), dry-farming land (2.4%), and barren land (0.7%).

The area is dominated by sedimentary rocks of Cretaceous age (Table 1):

- marl of the Mishan Formation
- sandstone and marl of the Aghajari Formation
- conglomerate and sandstone of the Bakhtiari Formation
- dolomite of the Jahrum Formation
- limestone of the Tarbur Formation
- limestone of the Sarvak-Illam Formation
- marl and limestone of the Gurpi Formation

Quaternary alluvium is found on terraces and along floodplains and channels [58]. Our inventory of 98 landslide points includes 55 translational slides, 22 complex landslides, and 21 rotational slides, ranging in size from 100 to 60,000 m² (Figure 2).

3. Materials and Methods

3.1. Data Collection and Processing

3.1.1. Datasets

We prepared a landslide inventory map to determine relationships between the spatial pattern of landslides and different conditioning factors [52]. We identified 27 landslide polygons on archived 1:20,000-scale aerial photographs of the study area from the National Cartographic Center of Iran (NCC) (Figure 1). We then determined their exact locations in the field with a global positioning system (GPS). The landslide polygons were converted to points using the "feature to point" tool in ArcGIS 10.2, with the points placed at the center of the landslide scarps. Natural resource managers employed by Chaharmahal and Bakhtiary province had previously identified an additional 71 landslides, which were located through field surveys. We digitized all 98 landslide points to produce the landslide inventory map (Figure 1). Fifty-five (56.1%) are translational slides, 21 (21.4%) are slumps, and 22 (22.4%) are compound landslides. The largest landslide has an area of about 60,000 m² (384 pixels at 12.5-m resolution) and is located in the eastern part of the watershed. The examples in Figure 2 illustrate the character and sizes of landslides in the study area.

In addition to landslide locations, a sufficient number of non-landslide locations are required to produce a landslide susceptibility map. We employed a random sampling strategy [77,78,79] to select a total of 98 points without landslides. These points include stable slopes, plateaus, and flood plains with different characteristics that are distributed uniformly in the Sarkhoon watershed. We next divided the landslide and non-landslide locations into a training dataset (70% of the sites) and a validation dataset (30% of the sites).
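The sampling and partitioning steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure, and it makes three assumptions:

- The study area is a raster with a boolean landslide mask.
- Non-landslide cells are drawn uniformly at random from all unmapped cells. The paper does not state any exclusion buffer or slope filter.
- Each class is split separately so that the training and validation sets stay balanced.

The function names and seeds are hypothetical.

```python
import numpy as np

def sample_non_landslide_cells(landslide_mask, n, seed=0):
    """Draw n distinct raster cells uniformly at random from cells with no mapped landslide."""
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(~landslide_mask)           # flat indices of non-landslide cells
    chosen = rng.choice(candidates, size=n, replace=False)
    return np.unravel_index(chosen, landslide_mask.shape)  # (row indices, column indices)

def split_train_validation(n_points, train_frac=0.7, seed=0):
    """Randomly partition point indices 0..n_points-1 into training and validation subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_points)
    n_train = int(round(train_frac * n_points))
    return order[:n_train], order[n_train:]

# 98 landslide points and 98 non-landslide points, each class split 70/30 on its own.
ls_train, ls_val = split_train_validation(98, seed=1)
nls_train, nls_val = split_train_validation(98, seed=2)
print(len(ls_train) + len(nls_train), len(ls_val) + len(nls_val))  # 138 58
```

With 98 points per class, 0.7 × 98 = 68.6 rounds to 69. Each class therefore gets 69 training and 29 validation points, for 138 training and 58 validation samples in this sketch.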

3.1.2. Landslide Conditioning Factors

We chose 20 conditioning factors for modeling based on our knowledge of the study area, a literature review, and expert opinion:

- land cover/use
- lithology
- rainfall
- elevation
- slope angle
- aspect
- sediment transport index (STI), or the slope length and steepness (LS) factor of the USLE
- general curvature
- profile curvature
- plan curvature
- longitudinal curvature
- tangential curvature
- solar radiation
- stream power index (SPI)
- topographic position index (TPI)
- topographic wetness index (TWI)
- terrain roughness index (TRI)
- distance to rivers
- distance to roads
- distance to faults

The land cover/use map has seven classes: dry farm land, semi-dense forest, dense forest, semi-dense pasture, dense pasture, and residential areas. It was derived from a Landsat 8 image (25 April 2016) using a supervised support vector machine (SVM) classification. We constructed the lithology and fault maps from 1:100,000-scale geology maps prepared by the Geological Survey and Mineral Explorations of Iran (Ardales and Dehdez sheets). Ten geologic units are included on the lithology map (Table 1). A 50-m buffer was added to mapped faults to prepare the distance-to-fault map. The annual rainfall map was constructed from 42 years of rainfall data (1972–2014) from nine meteorological stations, supplied by the regional water company of Chaharmahal-Bakhtiari province. We extrapolated the data spatially based on the elevations of the meteorological stations. The geomorphological factor maps were derived from a 12.5-m Digital Terrain Model (DTM) extracted from ALOS PALSAR data provided by the Alaska Satellite Facility (https://vertex.daac.asf.alaska.edu/#). The following maps were constructed from the DTM using ArcGIS 10.2 and SAGA 6.0.0 software: elevation, slope angle, aspect, STI/LS, the general, profile, plan, longitudinal, and tangential curvatures, solar radiation, SPI, TPI, TWI, TRI, and distance to river.
The distance-to-road map was constructed from the road network provided by the Iranian National Cartographic Center (INCC) in DGN format and at a scale of 1:25,000 (Figure 3).
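The paper derives STI/LS, SPI, and TWI with ArcGIS and SAGA but does not give their formulas. As a reference point, the sketch below uses the widely used definitions, where a is the specific catchment area and β is the slope angle:

- TWI = ln(a / tan β)
- SPI = a · tan β
- LS = (a / 22.13)^0.6 · (sin β / 0.0896)^1.3 (the Moore and Burch form)

The sketch assumes the specific-catchment-area and slope grids have already been computed, for example from flow accumulation on the 12.5-m DTM. SAGA's own tools may use different flow routing and flat-cell handling, so values need not match exactly. The function name and the `min_tan` floor are hypothetical.

```python
import numpy as np

def terrain_indices(specific_area, slope_deg, min_tan=1e-6):
    """Return (TWI, SPI, LS/STI) grids from specific catchment area and slope.

    specific_area: upslope contributing area per unit contour width (m); must be > 0.
    slope_deg: local slope angle in degrees.
    min_tan: floor on tan(slope) so flat cells do not cause division by zero.
    """
    beta = np.radians(np.asarray(slope_deg, dtype=float))
    a = np.asarray(specific_area, dtype=float)
    tan_b = np.maximum(np.tan(beta), min_tan)
    twi = np.log(a / tan_b)                                   # topographic wetness index
    spi = a * tan_b                                           # stream power index
    ls = (a / 22.13) ** 0.6 * (np.sin(beta) / 0.0896) ** 1.3  # LS factor / sediment transport index
    return twi, spi, ls

# Hypothetical 1 x 2 grid: a gentle cell with a large catchment and a steep cell with a small one.
twi, spi, ls = terrain_indices(np.array([[500.0, 20.0]]), np.array([[5.0, 35.0]]))
```

All three indices rise with catchment area. Steeper slopes lower TWI but raise SPI and LS, which is why the factors carry partly independent information.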

3.1.3. Factor Selection Using the One-R Attribute Evaluation (ORAE) Technique

Factor or feature selection is an important part of machine learning modeling. Feature selection techniques affect model performance by increasing data quality, which in turn makes the results of machine learning algorithms easier to interpret [80]. These techniques reduce the dimensionality of the feature space, eliminate redundancy, and reduce noise and over-fitting problems [81].

Wrapper and filter methods are available for selecting the most important features or factors for modeling. Wrapper methods use a machine learning algorithm itself to estimate the accuracy obtained with candidate factors. Filter methods instead use the statistical correlation between a set of factors and the target factor to select the most important factors. In general, filter methods are faster and more scalable than wrapper techniques and have lower computational complexity [81].

We used a filter technique, One-R Attribute Evaluation (ORAE), in this study. ORAE is based on One-R, a simple classification algorithm first proposed by Holte [82]. For each factor, a single rule (one-R) is constructed separately from the training dataset: each value of the factor is assigned the class that occurs most often with that value. We computed the error metric of each factor's rule over all of its values. ORAE then ranks the factors independently by this error, and the factor with the smallest error ranks highest [83,84]. Figure 4 shows the code of the One-R algorithm for factor selection.
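The rule-building and ranking steps can be sketched in Python as follows. This is an illustrative sketch, not the Weka implementation used in the study. It makes three simplifications:

- Each factor is assumed to be already discretized into classes.
- Each rule is scored on the training data, rather than by the 10-fold cross-validation reported in the Discussion.
- Ties between classes are broken arbitrarily.

The merit returned is the percentage of samples the rule classifies correctly, so a smaller error gives a higher rank. The toy data are hypothetical.

```python
from collections import Counter, defaultdict

def one_r_merit(factor_values, labels):
    """Accuracy (%) of the single rule built from one discretized factor.

    For each factor value, the rule predicts the majority class among the samples
    having that value; the merit is the percentage of samples the rule gets right.
    """
    counts = defaultdict(Counter)
    for value, label in zip(factor_values, labels):
        counts[value][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return 100.0 * correct / len(labels)

def rank_factors(factors, labels):
    """Sort (factor name, merit) pairs from highest to lowest One-R merit."""
    merits = {name: one_r_merit(values, labels) for name, values in factors.items()}
    return sorted(merits.items(), key=lambda kv: kv[1], reverse=True)

labels = [1, 1, 1, 1, 0, 0, 0, 0]                      # 1 = landslide, 0 = non-landslide
factors = {
    "aspect":           ["N", "S", "N", "S", "N", "S", "N", "S"],
    "distance_to_road": ["near", "near", "near", "far", "far", "far", "far", "near"],
}
print(rank_factors(factors, labels))   # [('distance_to_road', 75.0), ('aspect', 50.0)]
```

In this toy example the road rule ("near" → landslide, "far" → non-landslide) misclassifies two of eight samples. The aspect rule is no better than chance.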

3.2. Modeling Process

3.2.1. Support Vector Machine (SVM) Algorithm

Support Vector Machine (SVM), introduced by Vapnik and Guyon [85,86], is a machine learning classifier that has been used to solve many real-world problems. These include landslide prediction [49,87], flood prediction [88], and forest fire prediction [89]. It is based on the structural risk minimization principle of statistical learning theory, which seeks to minimize both the training error and the complexity of the model. The SVM constructs an optimal hyperplane that separates the two classes with the maximum margin. One class, labeled "1", lies above the hyperplane, and the other, labeled "0", lies below it.
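To make the hyperplane idea concrete, here is a minimal linear soft-margin SVM trained by stochastic sub-gradient descent on the hinge loss (the Pegasos scheme). It is a stand-in for illustration only. The study used Weka 3.9, and this section does not state the kernel or parameter values, so the following are assumptions:

- the linear kernel
- the regularization constant `lam`
- the number of epochs
- the toy data

Labels 1 (landslide) and 0 (non-landslide) are mapped to +1 and −1 internally. For simplicity, the bias is regularized along with the weights.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear soft-margin SVM via Pegasos-style stochastic sub-gradient descent.

    X: (n_samples, n_factors) array; y: array of labels in {0, 1} (1 = landslide).
    Returns a weight vector whose last element is the bias.
    """
    rng = np.random.default_rng(seed)
    s = np.where(y == 1, 1.0, -1.0)                # class "1" -> +1, class "0" -> -1
    Xb = np.hstack([X, np.ones((len(X), 1))])      # constant column carries the bias
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                  # decreasing step size
            margin = s[i] * (Xb[i] @ w)
            w *= 1.0 - eta * lam                   # shrink w: gradient step on the L2 penalty
            if margin < 1.0:                       # hinge loss active: move toward correct side
                w += eta * s[i] * Xb[i]
    return w

def predict_svm(w, X):
    """1 if the sample lies on the positive side of the hyperplane w.x + b = 0, else 0."""
    return (X @ w[:-1] + w[-1] >= 0).astype(int)

# Hypothetical two-factor data: one cloud of non-landslide points, one of landslide points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
w = train_linear_svm(X, y)
print("training accuracy:", (predict_svm(w, X) == y).mean())
```

A kernel SVM replaces the dot product `X @ w` with a kernel expansion over support vectors. The margin-and-hinge logic stays the same.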

3.2.2. Ensemble/Meta Classifier Algorithms

We used the following meta-classifier algorithms (Figure 5): AdaBoost (AB), Bagging (BA), Random Subspace (RS), and Rotation Forest (RF). In each case, the SVM served as the base classifier, which the meta-classifier combined into an ensemble. Model development and evaluation were done with ArcGIS 10.2 and Weka 3.9 software. Figure 5 summarizes the workflow of our study schematically. The major steps are (1) data collection and processing (input), (2) modeling, and (3) evaluation.

AdaBoost is a popular boosting algorithm in which a series of individual classifiers in an ensemble is trained iteratively [90]. It can provide solutions for two-class, multi-class single-label, multi-class multi-label, single-label category, and regression problems [91]. We followed the approach of Hong et al. [19] and used AdaBoost to adaptively resample the data when choosing training samples. In successive iterations, samples misclassified in the previous iteration receive greater weight in the next one. The final prediction is a weighted sum of the predictions of the ensemble members.
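The reweighting loop can be sketched as discrete AdaBoost. Two simplifications are assumptions of this sketch, not the study's setup:

- A one-level decision stump stands in for the SVM base classifier, to keep the code short.
- Sample weights enter the base learner's error directly (reweighting), instead of through resampling of the training set as in the approach followed above.

Each member's vote weight `alpha` grows as its weighted error falls. Misclassified samples are multiplied by `exp(alpha)` before the next round.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: the (feature, threshold, polarity) with the lowest weighted error."""
    best_err, best = np.inf, (0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = (polarity * (X[:, j] - thr) >= 0).astype(int)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best

def stump_predict(stump, X):
    j, thr, polarity = stump
    return (polarity * (X[:, j] - thr) >= 0).astype(int)

def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {0, 1}; returns a list of (alpha, stump) members."""
    w = np.full(len(y), 1.0 / len(y))                  # start with uniform sample weights
    members = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        miss = stump_predict(stump, X) != y
        err = w[miss].sum()
        if err >= 0.5:                                 # no better than chance: stop boosting
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        members.append((alpha, stump))
        if err == 0:                                   # perfect member: nothing left to reweight
            break
        w = w * np.exp(np.where(miss, alpha, -alpha))  # up-weight mistakes, down-weight hits
        w /= w.sum()
    return members

def adaboost_predict(members, X):
    """Weighted vote: sum of alpha * (+1 or -1) over members, thresholded at zero."""
    score = np.zeros(len(X))
    for alpha, stump in members:
        score += alpha * np.where(stump_predict(stump, X) == 1, 1.0, -1.0)
    return (score > 0).astype(int)
```

Swapping in a different base learner only requires a fit function that honors the sample weights, or one trained on a weighted resample of the data.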

Bagging is a bootstrap sampling technique used with ensemble methods [92]. We used BA to randomly sample the training data with replacement, generating multiple training subsets. Each subset was used to train a base classifier, and the resulting classifiers were then combined into the final model. We used this technique to improve classification accuracy by decreasing the variance of classification errors.
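A minimal bagging sketch follows. To keep it short and self-contained, a nearest-centroid classifier stands in for the SVM base classifier: it assigns each sample to the class whose mean factor vector is closer. This substitution, the ensemble size, the majority-vote rule, and the toy data are assumptions of the sketch. Each member sees a bootstrap sample of the same size as the training set, drawn with replacement. The ensemble also reports the fraction of members voting "landslide".

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid base learner: the mean factor vector of each class."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict_centroids(model, X):
    c0, c1 = model
    d0 = ((X - c0) ** 2).sum(axis=1)        # squared distance to the non-landslide centroid
    d1 = ((X - c1) ** 2).sum(axis=1)        # squared distance to the landslide centroid
    return (d1 < d0).astype(int)

def bagging(X, y, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    models = []
    while len(models) < n_models:
        idx = rng.integers(0, len(y), size=len(y))
        if np.unique(y[idx]).size < 2:      # a resample lacking one class cannot define both centroids
            continue
        models.append(fit_centroids(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Return (class labels by majority vote, fraction of members voting 'landslide')."""
    votes = np.mean([predict_centroids(m, X) for m in models], axis=0)
    return (votes >= 0.5).astype(int), votes

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-3.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
labels, votes = bagging_predict(bagging(X, y), X)
```

Averaging over bootstrap replicates is what reduces the variance of the classification errors. The vote fraction can also serve as a susceptibility score.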

We used the random subspace algorithm as our third classifier. In the random subspace method, each base classifier is trained on all training samples but on only a randomly selected subset of the conditioning factors. The outputs of the classifiers are then combined. Boot [93] reported that averaging forecasts constructed from many randomly selected fixed-size subsets of predictors substantially lowers the mean squared forecast error. Random projection regression takes a different route: rather than selecting subsets of the available predictors, it creates a low-dimensional subspace by averaging over predictors using random weights drawn from a normal distribution. In this study, we followed the methodology of Pham et al. [94], who used the RS method to assess landslide hazards in Vietnam.
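The random subspace idea can be sketched in the same spirit. A nearest-centroid classifier again stands in for the SVM base learner. The following are assumptions of this sketch rather than settings reported in the paper:

- the subspace size (half of the factors per member)
- the ensemble size
- majority voting
- the 20-factor toy data

Unlike bagging, every member is trained on all samples but sees only its own random subset of factor columns.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid stand-in for the base classifier."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict_centroids(model, X):
    c0, c1 = model
    return (((X - c1) ** 2).sum(axis=1) < ((X - c0) ** 2).sum(axis=1)).astype(int)

def random_subspace(X, y, n_models=25, subspace_frac=0.5, seed=0):
    """Each member: all samples, but only a random subset of the factor columns."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(subspace_frac * X.shape[1])))
    ensemble = []
    for _ in range(n_models):
        cols = rng.choice(X.shape[1], size=k, replace=False)   # this member's factors
        ensemble.append((cols, fit_centroids(X[:, cols], y)))
    return ensemble

def random_subspace_predict(ensemble, X):
    """Majority vote of the members, each applied to its own factor subset."""
    votes = np.mean([predict_centroids(m, X[:, cols]) for cols, m in ensemble], axis=0)
    return (votes >= 0.5).astype(int)

# Hypothetical data with 20 conditioning factors, as in the study.
rng = np.random.default_rng(11)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 20)), rng.normal(1.0, 1.0, (100, 20))])
y = np.array([0] * 100 + [1] * 100)
ensemble = random_subspace(X, y)
```

Because each member ignores half of the factors, a single noisy factor can sway at most the members that happen to include it.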

Rotation Forest is an ensemble technique developed by Rodriguez et al. [95] to improve the predictive capability of single classifiers. We trained the RF model by sequentially: (1) deriving subsets after dividing the attribute sets; (2) resampling the sample subsets and transforming features on the generated subsets; (3) realigning the rotation matrix according to the sequence of original attribute sets; (4) training base classifiers using the rotated sample subsets; and (5) integrating the results of different base classifiers on different sample subsets to arrive at the final outcome (see Equation (1) in Rodriguez et al. [95] for the matrix equation that we used). The flowchart for the landslide susceptibility modeling and analysis of spatial data of the watershed is shown in Figure 5.
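A simplified version of steps (1) to (5) is sketched below. It is not Rodriguez et al.'s full algorithm, and it makes these assumptions:

- The class-subset selection of the original is omitted.
- Each factor subset's principal components are computed from a bootstrap sample of 75% of the training samples.
- A decision stump stands in both for the decision trees of the original method and for the SVM used in this study. A stump is used because its axis-aligned splits, unlike a distance-based learner, actually change when the factors are rotated.
- The number of factor subsets, the ensemble size, and the voting rule are also assumptions.

```python
import numpy as np

def fit_stump(X, y):
    """Decision stump: the (feature, threshold, polarity) with the fewest training errors."""
    best_err, best = np.inf, (0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = (polarity * (X[:, j] - thr) >= 0).astype(int)
                err = np.sum(pred != y)
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best

def stump_predict(stump, X):
    j, thr, polarity = stump
    return (polarity * (X[:, j] - thr) >= 0).astype(int)

def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Block rotation built from PCA of random, disjoint factor subsets."""
    n_features = X.shape[1]
    subsets = np.array_split(rng.permutation(n_features), n_subsets)   # (1) split the factors
    R = np.zeros((n_features, n_features))
    for cols in subsets:
        idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=True)  # (2) resample
        cov = np.atleast_2d(np.cov(X[np.ix_(idx, cols)], rowvar=False))
        _, vecs = np.linalg.eigh(cov)          # principal axes of this factor subset
        R[np.ix_(cols, cols)] = vecs           # (3) place them at the original factor positions
    return R

def rotation_forest(X, y, n_models=10, n_subsets=2, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_models):
        R = rotation_matrix(X, n_subsets, rng)
        ensemble.append((R, fit_stump(X @ R, y)))   # (4) train on the rotated factors
    return ensemble

def rotation_forest_predict(ensemble, X):
    votes = np.mean([stump_predict(s, X @ R) for R, s in ensemble], axis=0)   # (5) combine by vote
    return (votes >= 0.5).astype(int)

# Hypothetical two-factor data whose class boundary is diagonal to both axes.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
ensemble = rotation_forest(X, y, n_subsets=1)   # one subset: PCA on both factors together
```

With the diagonal boundary above, the rotation aligns one axis with the direction that separates the classes. A single axis-aligned split then suffices.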

3.3. Performance and Evaluation of the Landslide Models

Model validation is essential to confirm the validity of results. The training dataset should be used to check model goodness-of-fit, and the validation dataset to check prediction accuracy [68]. In this study, statistical metrics and the area under the receiver operating characteristic curve (AUC) were used to evaluate the performance of the SVM algorithm and its ensembles.

3.3.1. Statistical Metrics

We used sensitivity, specificity, accuracy, kappa, and RMSE to evaluate and test the performance of our machine learning algorithms. The first four metrics are based on four possible outcomes [44]:

- True positive (TP): landslide pixels correctly classified as landslide.
- False negative (FN): landslide pixels incorrectly classified as non-landslide.
- True negative (TN): non-landslide pixels correctly classified as non-landslide.
- False positive (FP): non-landslide pixels incorrectly classified as landslide.

The higher the values of sensitivity, specificity, and accuracy (Equations (1)–(3)), or the lower the RMSE, the better the model performs [45]. A kappa index value of 1 indicates a perfect model. Values of zero or less indicate agreement no better than chance, that is, a non-reliable model. Sensitivity, specificity, accuracy, kappa, and RMSE are formulated as follows:

Sensitivity = \frac{TP}{TP + FN}

(1)

Specificity = \frac{TN}{TN + FP}

(2)

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(3)

Kappa index (K) = \frac{A - B}{1 - B}

(4)

where A = \frac{TP + TN}{TP + TN + FN + FP} and

B = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{(TP + TN + FN + FP)^{2}}

(5)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (X_{P} - X_{A})^{2}}

(6)

where:

- n is the total number of samples (landslide and non-landslide points) in the training or validation dataset;
- X_{P} is the probability predicted by the landslide susceptibility model for each sample;
- X_{A} is the actual (observed) value of each sample (1 for landslide, 0 for non-landslide);
- RMSE is the root mean square error.
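These metrics translate directly into code. The sketch below uses the standard definitions: sensitivity = TP/(TP + FN), and Cohen's kappa with the chance agreement B computed from the class totals divided by the squared sample count. It works from a confusion matrix and from predicted probabilities. The counts in the example are hypothetical and chosen so the results are easy to check by hand. The function names are illustrative.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels (1 = landslide, 0 = non-landslide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Cohen's kappa from confusion-matrix counts."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n                                      # observed agreement A
    b = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2  # chance agreement B
    kappa = (accuracy - b) / (1 - b)
    return sensitivity, specificity, accuracy, kappa

def rmse(predicted, actual):
    """Root mean square error between predicted probabilities and observed 0/1 values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical validation result: 10 landslide and 10 non-landslide points.
print(classification_metrics(tp=8, tn=7, fp=3, fn=2))   # (0.8, 0.7, 0.75, 0.5)
```

When the landslide and non-landslide sets are the same size, as in this study's balanced sampling, B always equals 0.5. Kappa then reduces to 2 × accuracy − 1.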

3.3.2. ROC Curve

The receiver operating characteristic (ROC) curve is a two-dimensional plot used to evaluate the overall performance of models. It has been used in most landslide modeling studies [5,44,45,46]. The x axis of the plot is 1 − specificity (the false positive rate), and the y axis is sensitivity (the true positive rate) [17,96,97]. The area under the ROC curve (AUC) provides a measure of the performance of a landslide model [45,98]. The AUC of a perfect model is 1, whereas that of a model no better than random guessing is 0.5 [26,34,46]. If P and N are the total numbers of landslides and non-landslides, respectively, the AUC can be expressed as follows:

AUC = \frac{\sum TP + \sum TN}{P + N}

(7)
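In practice, the area under the ROC curve is usually computed in three steps:

1. Sweep a threshold over the susceptibility values.
2. Plot sensitivity against 1 − specificity.
3. Integrate with the trapezoidal rule.

The result equals the probability that a randomly chosen landslide point receives a higher susceptibility value than a randomly chosen non-landslide point, with ties counted as one half. The sketch below computes both quantities on a small hypothetical example as a cross-check. It is illustrative only, not the Weka routine used in the study, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """P(random landslide scores higher than random non-landslide), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs as the threshold sweeps from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical susceptibility values for three landslide (1) and three non-landslide (0) points.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels), trapezoid_area(roc_points(scores, labels)))
```

Both calls give 8/9 here. Eight of the nine landslide/non-landslide pairs are ranked correctly; only the pair (0.6, 0.7) is inverted.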

4. Results and Analysis

4.1. Factor Selection

The most important factors for landslide occurrence are shown in Figure 6. The highest and lowest average merit (AM) values for conditioning factors determined by the ORAE method are, respectively, 72.46 and 46.38. This result indicates that all 20 factors had significance in the landslide susceptibility modeling (AM > 0). Distance to road had the highest predictive capability (AM = 72.46), because most of the observed landslides in the Sarkhoon watershed are located near roads. The remaining factors rank as follows (Figure 6):

- rainfall (AM = 65.94)
- land use (65.94)
- aspect (64.49)
- lithology (63.77)
- elevation (61.59)
- solar radiation (60.14)
- general curvature (57.97)
- slope (56.52)
- TWI (55.07)
- longitudinal curvature (53.62)
- plan curvature (52.90)
- distance to fault (52.90)
- SPI (50.72)
- TPI (48.55)
- LS (48.55)
- tangential curvature (48.55)
- profile curvature (47.82)
- TRI (47.10)
- distance to river (46.38)

The low predictive capability of distance to river is consistent with the fact that most landslides are far from river channels.

4.2. Shallow Landslide Modeling Process

The SVM and its ensemble models were built with the training dataset and evaluated with both the training and validation datasets. The goodness-of-fit analysis indicates that all models predicted the spatial distribution of landslides well (Tables 2 and 3). The SVM model had the highest sensitivity in the training process (0.853), whereas the RS model had the highest sensitivity in the validation process (0.655). The RF and RS models had the highest specificity for the training and validation datasets, 0.884 and 0.931, respectively. In other words, 88% and 93% of non-landslide locations were correctly classified as non-landslides. The RF model performed best in the training process based on accuracy (0.855), kappa (0.710), RMSE (0.318), and AUC (0.944). The RS model performed best in the validation process based on accuracy (0.793), kappa (0.561), RMSE (0.386), and AUC (0.886).

In contrast, the AB model performed most poorly in the training process, with accuracy, kappa, RMSE, and AUC values of, respectively, 0.797, 0.666, 0.349, and 0.915. It also performed poorly in the validation process, with the lowest AUC (0.821) and the highest RMSE (0.427). The SVM model was the weakest in the validation process, with the lowest values of specificity (0.793), accuracy (0.707), and kappa (0.392).

Finally, the AUC values determined in the validation process indicate that the RS (AUC = 0.886), RF (AUC = 0.878), and AB (AUC = 0.856) models significantly improved the goodness-of-fit and prediction accuracy of the SVM model (AUC = 0.841), but the BA model (AUC = 0.821) did not.

4.3. Development of Landslide Susceptibility Maps

We constructed the landslide susceptibility maps using SVM and its ensemble models. We established five susceptibility classes: very low (VLS), low (LS), moderate (MS), high (HS), and very high (VHS) susceptibility. We tested several classification methods, including natural breaks (NB), quantile (Q), and geometrical interval (GI). To select the best classification method, we calculated, for each model, the percentage of observed landslides falling in each susceptibility class. The results are shown as histograms for all models (Figure 7) and indicate that most landslides are located in the VHS class.

For the AB susceptibility map, the three classification methods gave the following VHS class:

| Classification method | Watershed pixels in VHS class | Observed landslide pixels in VHS class |
|---|---|---|
| Natural breaks | 23.76% | 72.45% |
| Quantile | 19.74% | 65.31% |
| Geometrical interval | 25.78% | 75.51% |

The geometrical interval method placed the largest share of observed landslides in the VHS class. We therefore selected it as the most appropriate method for classifying landslide susceptibility for the AB model, and we used the same method for the BA, RS, and RF susceptibility maps. We used the natural breaks method to classify the SVM output and prepare its susceptibility map.

The class areas in the final maps were as follows (Figure 7):

- AB model: the VLS, LS, MS, HS, and VHS classes covered, respectively, 18.77%, 19.74%, 15.66%, 20.04%, and 25.78% of the study area.
- BA model: the VHS class had the largest area (30.39%), followed by the LS (21.36%), MS (18.88%), HS (15.30%), and VLS (14.07%) classes.
- RS model: the corresponding values are 13.27%, 25.77%, 19.02%, 16.28%, and 25.66%.
- RF model: the corresponding values are 14.73%, 25.50%, 18.24%, 15.59%, and 25.94%.
- SVM model: the VHS class has the largest area (26.77%), followed by the VLS (20.35%), LS (19.00%), HS (17.88%), and MS (16.0%) classes.
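The selection rule above prefers the classification whose VHS class captures the largest share of observed landslides. It can be computed as follows. This sketch assumes the susceptibility values and a boolean landslide mask are available as arrays of the same shape. It implements only quantile breaks, because natural breaks (Jenks) and geometrical interval need longer code. Breaks from any method can be passed in directly, such as the natural-break thresholds reported for the maps. The function names are hypothetical.

```python
import numpy as np

CLASS_NAMES = ("VLS", "LS", "MS", "HS", "VHS")

def quantile_breaks(susceptibility, n_classes=5):
    """Interior class boundaries that put roughly equal numbers of cells in each class."""
    return np.quantile(susceptibility, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])

def class_shares(susceptibility, landslide_mask, breaks):
    """Percent of all cells and percent of landslide cells in each susceptibility class.

    breaks: increasing interior boundaries; a value equal to a boundary goes to the upper class.
    """
    classes = np.digitize(susceptibility, breaks)          # 0 = VLS ... len(breaks) = VHS
    n_classes = len(breaks) + 1
    area_pct = 100.0 * np.bincount(classes.ravel(), minlength=n_classes) / classes.size
    ls_classes = classes[landslide_mask]
    ls_pct = 100.0 * np.bincount(ls_classes, minlength=n_classes) / ls_classes.size
    return area_pct, ls_pct

# Natural-break thresholds reported in this study for one of the maps.
nb_breaks = [0.2003, 0.3963, 0.6080, 0.8157]

# Hypothetical 100-cell map in which the ten highest-valued cells hold the landslides.
s = np.arange(100) / 100.0
mask = s >= 0.9
area_q, ls_q = class_shares(s, mask, quantile_breaks(s))
area_nb, ls_nb = class_shares(s, mask, nb_breaks)
```

Running `class_shares` for each candidate set of breaks and comparing the last entry of the landslide percentages gives the VHS comparison reported above for the AB model.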

Figure 8 shows the landslide susceptibility maps derived with SVM and its ensemble models, using the selected classification methods. For example, the classes calculated with the natural breaks method are (Figure 7):

- VLS (0.0004–0.2003)
- LS (0.2003–0.3963)
- MS (0.3963–0.6080)
- HS (0.6080–0.8157)
- VHS (0.8157–1.000)

The maps show that the northeastern, middle, eastern, southeastern, and southern parts of the Sarkhoon watershed are more susceptible to landslides than the northern and western parts. The VHS areas are mostly located along roads. The VLS areas are in the highlands, where human activity is limited and there are no roads.

4.4. Validation and Comparison of Landslide Susceptibility Maps

The performance of the new ensemble SVM models was compared to that of the SVM model using the AUC for both the training and validation datasets. Figure 9a shows the ROC curves of the AB, BA, RF, RS, and SVM susceptibility maps in the training step. The SVM and all of its ensemble models perform well in landslide susceptibility mapping. The RS model is the best performer, with an AUC of 0.873. This means that, in 87.3% of landslide/non-landslide pairs, the landslide location receives the higher susceptibility value. It is followed by the RF (0.871), SVM (0.869), BA (0.857), and AB (0.851) models.

Figure 9b shows the ROC curves for the validation dataset. The RS model performed best, with an AUC of 0.837, followed by the RF (AUC = 0.834), SVM (0.810), AB (0.807), and BA (0.779) models. All models yielded lower AUC values for the validation dataset than for the training dataset. This is expected, because in the training step performance was assessed with the same landslides that were used to build the models.

5. Discussion

In recent years, machine learning algorithms have outperformed expert knowledge and analytic methods in landslide susceptibility mapping [99,100]. A large number of machine learning methods and techniques have been developed, but there is, as yet, no standard method accepted by researchers around the world. Previous research has shown that the relations between landslide locations and conditioning factors are complex and non-linear, and that they differ from one region to another and through time [101]. Our objective in this study was to demonstrate the power of SVM and its AdaBoost, bagging, random subspace, and rotation forest ensembles for landslide susceptibility mapping in the Sarkhoon watershed in Iran. The SVM algorithm has been widely employed to map landslide susceptibility [98,102,103,104,105,106,107], but not with the ensembles used in this study.

Landslides are affected by a variety of geo-environmental factors, some of which are more important than others. Strategies for factor and feature selection are used in machine learning to improve the performance of models. The most important factors are objectively and preferentially selected, and factors that lead to noise and over-fitting problems are removed to enhance the predictive capability of the model [108]. Using the One-R Attribute Evaluation (ORAE) technique with 10-fold cross-validation, we showed that all 20 of our conditioning factors were significant, so all of them were used in modeling and evaluation. Distance to road, however, was the most important factor for landslide occurrence. Roads disturb the natural environment of slopes, leading to a greater likelihood of landslides in hilly areas (e.g., [69,94,98,109,110]).

SVM is a powerful nonlinear machine learning algorithm for classification problems. It is solidly based on statistical learning theory and can find the optimal decision function from a training dataset [111]. Several studies have concluded that the SVM method is more efficient than, and outperforms, other expert-knowledge and machine learning techniques. Most recently, Tien Bui et al. [46] compared SVM with index of entropy, decision tree, and naïve Bayes models. They concluded that SVM has higher prediction capability and is more powerful than the others. Likewise, Pham et al. [100] compared SVM with logistic regression, Fisher’s linear discriminant analysis, Bayesian network, and naïve Bayes models, and reached a similar conclusion. The SVM method is superior to the artificial neural network (ANN) method because ANN training is difficult and tends to become trapped in local optima. Unlike decision tree algorithms, for example, the SVM method does not require interpretation based on rules. Compared with logistic regression, SVM requires more memory and processing time, but these issues are not critical for spatial predictions of landslides [100].

Combined with the SVM algorithm, the RS and RF classifiers provided the best goodness-of-fit on the training dataset and the best prediction accuracy on the validation dataset. The BA and AB ensemble methods failed to improve the performance or predictive power of the SVM. The RS and RF classifiers were more effective than the BA and AB methods at reducing the variance, bias, and noise of the training dataset, and they prevented over-fitting during the modeling process.
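Rotation forest (RF) diversifies the training data for each base classifier by rotating the feature space rather than only resampling rows. The sketch below, assuming the factor matrix is a NumPy array and following the general recipe of Rodriguez et al. (random feature groups, PCA on a bootstrap sample for each group, all components retained), builds a single rotation matrix. It omits the random removal of classes used in the original algorithm, and the function name, parameters, and data are hypothetical; in an RF-SVM ensemble this step would be repeated for each member, with an SVM trained on each rotated dataset.

```python
import numpy as np


def rotation_matrix(X, n_subsets=2, sample_frac=0.75, seed=None):
    """Build one rotation-forest-style rotation matrix for feature matrix X.

    Features are split at random into `n_subsets` disjoint groups; for each group,
    PCA (eigenvectors of the covariance matrix) is computed on a bootstrap sample
    of `sample_frac` of the rows, keeping all components. The per-group loadings
    are placed into a d x d matrix so that X @ R gives the rotated features.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    groups = np.array_split(rng.permutation(d), n_subsets)
    R = np.zeros((d, d))
    for feats in groups:
        rows = rng.choice(n, size=max(2, int(sample_frac * n)), replace=True)
        cov = np.atleast_2d(np.cov(X[np.ix_(rows, feats)], rowvar=False))
        _, vecs = np.linalg.eigh(cov)  # orthonormal principal axes of this group
        R[np.ix_(feats, feats)] = vecs
    return R


# Hypothetical data: 40 pixels x 5 conditioning factors.
X = np.random.default_rng(0).normal(size=(40, 5))
R = rotation_matrix(X, n_subsets=2, seed=1)
Z = X @ R  # a base classifier (e.g., an SVM) would be trained on Z
print(np.allclose(R.T @ R, np.eye(5)))  # prints True: R is orthogonal
```

Because R is orthogonal, the rotation keeps all the information in the factors while presenting each ensemble member with a differently oriented feature space, which is one plausible route to the variance reduction discussed above.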

Similar studies [5,44,52,69,91,98] have shown that the RS classifier can enhance the goodness-of-fit and prediction accuracy of the naïve Bayes tree classifier for spatial prediction of landslides. Pham et al. [98] reported that RF combined with the Fuzzy Unordered Rules Induction Algorithm (FURIA) is more powerful than FURIA alone, the AB-FURIA and BA-FURIA ensemble models, and SVM. Pham et al. [69] concluded that the RF ensemble of the Radial Basis Function neural network (RFRBF) outperforms the RBF neural network, logistic regression, multi-layer perceptron neural networks (MLP Neural Nets), and naïve Bayes models. Shirzadi et al. [44] prepared landslide susceptibility maps for different sample sizes and raster resolutions, using an alternating decision tree (ADTree) algorithm as the base classifier. Their ensemble models included Multiboost, BA, RF, and RS algorithms. They concluded that the RS-ADTree (RSADT) model with a 60/40% sample size and a 10-m raster resolution of 70/30% had the highest prediction accuracy, whereas the Multiboost-ADTree (MBADT) model performed better with an 80/20% sample size and a 20-m raster resolution of 90/10%. Some researchers have concluded that the BA classifier performs better than other ensemble and machine learning base classifiers. For example, Bui et al. [112] concluded that BA combined with a decision tree (DT) model has the highest prediction capability and outperforms the DT and ABDT models. Bui et al. [113] combined the functional tree (FT) algorithm with BA, AB, and Multiboost ensembles and concluded that the BA-based model had higher prediction capability than the other models.
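Random subspace (RS) ensembles, as used in the studies above, train each member on all samples but only a random subset of the conditioning factors, and combine the members by voting. A minimal pure-Python sketch follows; to keep it self-contained, a nearest-centroid classifier stands in for the base classifier (SVM, naïve Bayes tree, or ADTree in the cited work), and all names, the subspace fraction, and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y, feats):
    """Stand-in base learner: per-class mean vector on the chosen features."""
    cents = {}
    for label in sorted(set(y)):
        rows = [[x[f] for f in feats] for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict_centroids(cents, feats, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    z = [x[f] for f in feats]
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(z, cents[c])))


def random_subspace_fit(X, y, n_members=11, subspace=0.5, seed=0):
    """Random subspace: every member sees ALL training rows but only a random
    subset of the features (bagging, by contrast, resamples rows)."""
    rng = random.Random(seed)
    d = len(X[0])
    k = max(1, round(subspace * d))
    members = []
    for _ in range(n_members):
        feats = sorted(rng.sample(range(d), k))
        members.append((feats, fit_centroids(X, y, feats)))
    return members


def random_subspace_predict(members, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_centroids(cents, feats, x) for feats, cents in members)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 4 factors, class 1 = landslide, class 0 = stable.
X = [[5, 5, 5, 5], [6, 5, 4, 5], [5, 6, 5, 6], [0, 0, 0, 0], [1, 0, -1, 0], [0, 1, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
ensemble = random_subspace_fit(X, y)
print(random_subspace_predict(ensemble, [4.5, 5, 5.5, 4]))  # prints 1
```

Using an odd number of members avoids tied votes in a two-class problem; with a real base classifier, each member would be trained exactly as the single model, only on fewer factors.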

Based on the discussion above, we conclude that the quality of the results depends largely on the type of base classifier used. On the one hand, a weak base classifier, such as a decision tree algorithm, is likely to produce a weak ensemble model [114]. On the other hand, a strong base classifier such as SVM will not necessarily produce a powerful ensemble model. When an ensemble does improve the predictive power of a base classifier, it does so by effectively reducing the over-fitting, noise, and variance problems of the training dataset during modeling. We argue that base classifiers and their ensembles should be tested and evaluated before they are accepted and used for landslide susceptibility mapping. The RSSVM model is a promising technique for spatial prediction of landslides and can be used in our study area for land-use planning. We recognize, however, that our method must be tested in areas with different terrain and geology before it can be used more widely.
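One common way to test a base classifier against its ensembles is to compare the area under the ROC curve (AUC) on a held-out validation set. The sketch below computes AUC through its equivalence with the Mann-Whitney statistic described by Hanley and McNeil; the function name and the scores are invented for illustration, and the pairwise loop is written for clarity rather than for large rasters.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the probability
    that a randomly chosen landslide (label 1) receives a higher score than a
    randomly chosen non-landslide (label 0); ties count as one half."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation scores from a base model and from an ensemble.
labels = [1, 1, 1, 0, 0, 0]
base = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
ensemble = [0.9, 0.7, 0.8, 0.3, 0.2, 0.1]
print(round(auc(base, labels), 3), auc(ensemble, labels))  # prints 0.889 1.0
```

The same validation labels must be used for every model being compared; otherwise differences in AUC reflect the samples rather than the classifiers.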

6. Conclusions

In this study, we developed machine learning ensemble models to prepare landslide susceptibility maps, with the aim of better managing landslide-prone areas in the Sarkhoon watershed, Chahar Mahaal and Bakhtiari Province, Iran. We developed the models using 98 landslide locations (scarp or rupture-zone pixels) and 20 conditioning factors evaluated with the ORAE technique. This tool allowed us to identify the most important factors, reduce the over-fitting and noise problems inherent in the training dataset, and increase the reliability and prediction accuracy of the models. Most of the observed landslides in the study area fall within the high and very high susceptibility classes. The modeling results, based on the factors evaluated with ORAE, confirmed that SVM and its ensembles performed well in recognizing landslide-prone locations in the study area. The results indicate that, among the four meta-classifiers, rotation forest and random subspace are more powerful and flexible than AdaBoost and bagging for modeling landslides in the study area, and that they significantly enhanced the predictive power of SVM as a base classifier. The rotation forest model had the highest prediction accuracy in generating landslide susceptibility maps. Overall, the findings can be summarized as follows:

All 20 conditioning factors are significantly associated with landslide occurrence, and the most important factor is distance to road, followed by rainfall and land use. This suggests that human activities, including interference with runoff, are the main causes of landslides in the study area.
The proposed RF-SVM model is a promising technique for generating accurate and useful landslide susceptibility maps in other areas with similar geo-environmental characteristics.
We recommend combining rotation forest with SVM to build landslide susceptibility models.
The RF-SVM ensemble model outperformed the RS-SVM model, the BA-SVM model, and the AB-SVM model, and therefore is an appropriate and reasonable tool for landslide susceptibility mapping.
The ensemble model described in this paper is recommended as a robust model for landslide susceptibility assessment and for landslide hazard and risk management and reduction.
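The observation that most landslides fall within the high and very high susceptibility classes can be checked by tallying landslide pixels per class. The sketch below assumes five classes defined by quantile breaks on the susceptibility scores; this section does not state which classification scheme (for example, natural breaks) was used, so the break method, function names, and data are all hypothetical.

```python
def quantile_breaks(scores, n_classes=5):
    """Class boundaries that put roughly equal numbers of pixels in each class."""
    s = sorted(scores)
    return [s[len(s) * q // n_classes] for q in range(1, n_classes)]


def classify(score, breaks):
    """0 = very low ... len(breaks) = very high susceptibility."""
    return sum(score >= b for b in breaks)


def landslide_share_by_class(scores, is_landslide, n_classes=5):
    """Fraction of landslide pixels falling in each susceptibility class."""
    breaks = quantile_breaks(scores, n_classes)
    counts = [0] * n_classes
    for s, ls in zip(scores, is_landslide):
        if ls:
            counts[classify(s, breaks)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no landslide pixels supplied")
    return [c / total for c in counts]


# Hypothetical susceptibility scores for 10 pixels; the last three are landslides.
scores = [i / 10 for i in range(10)]
is_landslide = [False] * 7 + [True] * 3
print(landslide_share_by_class(scores, is_landslide))
```

In this toy case one landslide pixel falls in the high class and two in the very high class, so the two upper classes together hold all observed landslides.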

Author Contributions

D.T.B., A.S., H.S., M.G., E.O., J.J.C., B.T.P., J.D., D.T.A., B.B.A. and S.L. contributed equally to the work. A.S., H.S. and E.O. collected field data and conducted the landslide susceptibility mapping and analysis. D.T.B., A.S., H.S., E.O., J.D. and D.T.A. wrote the manuscript. D.T.B., M.G., J.J.C., B.T.P., B.B.A. and S.L. provided critical comments on planning this paper and edited the manuscript. All authors discussed the results and edited the manuscript.

Funding

This research was supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM), funded by the Ministry of Science and ICT, and by Universiti Teknologi Malaysia (UTM) through a Research University Grant (Q.J130000.2527.17H84).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Alimohammadlou, Y.; Najafi, A.; Yalcin, A. Landslide process and impacts: A proposed classification method. Catena 2013, 104, 29–32.
Shirzadi, A.; Solaimani, K.; Roshan, M.H.; Kavian, A.; Chapi, K.; Shahabi, H.; Keesstra, S.; Ahmad, B.B.; Bui, D.T. Uncertainties of prediction accuracy in shallow landslide modeling: Sample size and raster resolution. Catena 2019, 178, 172–188.
Camilo, D.C.; Lombardo, L.; Mai, P.M.; Dou, J.; Huser, R. Handling high predictor dimensionality in slope-unit-based landslide susceptibility models through lasso-penalized generalized linear model. Environ. Model. Softw. 2017, 97, 145–156.
Thai Pham, B.; Prakash, I.; Dou, J.; Singh, S.K.; Trinh, P.T.; Trung Tran, H.; Minh Le, T.; Tran, V.P.; Kim Khoi, D.; Shirzadi, A. A novel hybrid approach of landslide susceptibility modeling using rotation forest ensemble and different base classifiers. Geocarto Int. 2018, 14, 1–38.
Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60.
Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160.
Bai, S.-B.; Wang, J.; Lü, G.-N.; Zhou, P.-G.; Hou, S.-S.; Xu, S.-N. GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area, China. Geomorphology 2010, 115, 23–31.
Shirzadi, A.; Saro, L.; Joo, O.H.; Chapi, K. A GIS-based logistic regression model in rock-fall susceptibility mapping along a mountainous road: Salavat Abad case study, Kurdistan, Iran. Nat. Hazards 2012, 64, 1639–1656.
Mousavi, S.Z.; Kavian, A.; Soleimani, K.; Mousavi, S.R.; Shirzadi, A. GIS based spatial prediction of landslide susceptibility using logistic regression model. Geomat. Nat. Hazards Risk 2011, 2, 33–50.
Pradhan, B. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia. Adv. Space Res. 2010, 45, 1244–1256.
Nandi, A.; Shakoor, A. A GIS-based landslide susceptibility evaluation using bivariate and multivariate statistical analyses. Eng. Geol. 2010, 110, 11–20.
Conoscenti, C.; Ciaccio, M.; Caraballo-Arias, N.A.; Gómez-Gutiérrez, Á.; Rotigliano, E.; Agnesi, V. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: A case of the Belice River basin (western Sicily, Italy). Geomorphology 2015, 242, 49–64.
He, S.; Pan, P.; Dai, L.; Wang, H.; Liu, J. Application of kernel-based Fisher discriminant analysis to map landslide susceptibility in the Qinggan River delta, Three Gorges, China. Geomorphology 2012, 171, 30–41.
Dong, J.-J.; Tung, Y.-H.; Chen, C.-C.; Liao, J.-J.; Pan, Y.-W. Discriminant analysis of the geomorphic characteristics and stability of landslide dams. Geomorphology 2009, 110, 162–171.
Hong, H.; Chen, W.; Xu, C.; Youssef, A.M.; Pradhan, B.; Tien Bui, D. Rainfall-induced landslide susceptibility assessment at the Chongren area (China) using frequency ratio, certainty factor, and index of entropy. Geocarto Int. 2017, 32, 139–154.
Dou, J.; Oguchi, T.; Hayakawa, Y.S.; Uchiyama, S.; Saito, H.; Paudel, U. GIS-based landslide susceptibility mapping using a certainty factor model and its validation in the Chuetsu area, central Japan. In Landslide Science for a Safer Geoenvironment; Springer: Berlin, Germany, 2014; pp. 419–424.
Chen, W.; Shahabi, H.; Shirzadi, A.; Li, T.; Guo, C.; Hong, H.; Li, W.; Pan, D.; Hui, J.; Ma, M. A novel ensemble approach of bivariate statistical-based logistic model tree classifier for landslide susceptibility assessment. Geocarto Int. 2018, 1–23.
Zhang, T.; Han, L.; Chen, W.; Shahabi, H. Hybrid integration approach of entropy with logistic regression and support vector machine for landslide susceptibility modeling. Entropy 2018, 20, 884.
Hong, H.; Shahabi, H.; Shirzadi, A.; Chen, W.; Chapi, K.; Ahmad, B.B.; Roodposhti, M.S.; Hesar, A.Y.; Tian, Y.; Bui, D.T. Landslide susceptibility assessment at the Wuning area, China: A comparison between multi-criteria decision making, bivariate statistical and machine learning methods. Nat. Hazards 2018, 96, 1–40.
Bui, D.T.; Lofman, O.; Revhaug, I.; Dick, O. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Nat. Hazards 2011, 59, 1413.
Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Alizadeh, M.; Chen, W.; Mohammadi, A.; Ahmad, B.; Panahi, M.; Hong, H. Landslide detection and susceptibility mapping by Airsar data using support vector machine and index of entropy models in Cameron Highlands, Malaysia. Remote Sens. 2018, 10, 1527.
Myronidis, D.; Papageorgiou, C.; Theophanous, S. Landslide susceptibility mapping based on landslide history and analytic hierarchy process (AHP). Nat. Hazards 2016, 81, 245–263.
Shirzadi, A.; Chapi, K.; Shahabi, H.; Solaimani, K.; Kavian, A.; Ahmad, B.B. Rock fall susceptibility assessment along a mountainous road: An evaluation of bivariate statistic, analytical hierarchy process and frequency ratio. Environ. Earth Sci. 2017, 76, 152.
Jaafari, A.; Zenner, E.K.; Panahi, M.; Shahabi, H. Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability. Agric. For. Meteorol. 2019, 266, 198–207.
Taheri, K.; Shahabi, H.; Chapi, K.; Shirzadi, A.; Gutiérrez, F.; Khosravi, K. Sinkhole susceptibility mapping: A comparison between Bayes-based machine learning algorithms. Land Degrad. Dev. 2019, 30, 730–745.
Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 2017, 95, 229–245.
Hong, H.; Panahi, M.; Shirzadi, A.; Ma, T.; Liu, J.; Zhu, A.-X.; Chen, W.; Kougias, I.; Kazakis, N. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. 2018, 621, 1124–1141.
Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Bui, D.T. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 2018, 627, 744–755.
Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11.
Ahmadlou, M.; Karimi, M.; Alizadeh, S.; Shirzadi, A.; Parvinnejhad, D.; Shahabi, H.; Panahi, M. Flood susceptibility assessment using integration of adaptive network-based fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and bat algorithms (BA). Geocarto Int. 2018, 34, 1–21.
Bui, D.T.; Panahi, M.; Shahabi, H.; Singh, V.P.; Shirzadi, A.; Chapi, K.; Khosravi, K.; Chen, W.; Panahi, S.; Li, S. Novel hybrid evolutionary algorithms for spatial prediction of floods. Sci. Rep. 2018, 8, 15364.
Miraki, S.; Zanganeh, S.H.; Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Pham, B.T. Mapping groundwater potential using a novel hybrid intelligence approach. Water Resour. Manag. 2019, 33, 281–302.
Rahmati, O.; Naghibi, S.A.; Shahabi, H.; Bui, D.T.; Pradhan, B.; Azareh, A.; Rafiei-Sardooi, E.; Samani, A.N.; Melesse, A.M. Groundwater spring potential modelling: Comprising the capability and robustness of three different modeling approaches. J. Hydrol. 2018, 565, 248–261.
Tien Bui, D.; Khosravi, K.; Li, S.; Shahabi, H.; Panahi, M.; Singh, V.; Chapi, K.; Shirzadi, A.; Panahi, S.; Chen, W. New hybrids of ANFIS with several optimization algorithms for flood susceptibility modeling. Water 2018, 10, 1210.
Chen, W.; Hong, H.; Li, S.; Shahabi, H.; Wang, Y.; Wang, X.; Ahmad, B.B. Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles. J. Hydrol. 2019, 575, 864–873.
Rahmati, O.; Samadi, M.; Shahabi, H.; Azareh, A.; Rafiei-Sardooi, E.; Alilou, H.; Melesse, A.M.; Pradhan, B.; Chapi, K.; Shirzadi, A. SWPT: An automated GIS-based tool for prioritization of sub-watersheds based on morphometric and topo-hydrological factors. Geosci. Front. 2019.
Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.-B.; Gróf, G.; Ho, H.L.; et al. A comparative assessment of flood susceptibility modeling using multi-criteria decision-making analysis and machine learning methods. J. Hydrol. 2019, 573, 311–323.
Roodposhti, M.S.; Safarrad, T.; Shahabi, H. Drought sensitivity mapping using two one-class support vector machine algorithms. Atmos. Res. 2017, 193, 73–82.
Azareh, A.; Rahmati, O.; Rafiei-Sardooi, E.; Sankey, J.B.; Lee, S.; Shahabi, H.; Ahmad, B.B. Modelling gully-erosion susceptibility in a semi-arid region, Iran: Investigation of applicability of certainty factor and maximum entropy models. Sci. Total Environ. 2019, 655, 684–696.
Tien Bui, D.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Omidvar, E.; Pham, B.T.; Talebpour Asl, D.; Khaledian, H.; Pradhan, B.; Panahi, M. A novel ensemble artificial intelligence approach for gully erosion mapping in a semi-arid watershed (Iran). Sensors 2019, 19, 2444.
Alizadeh, M.; Alizadeh, E.; Asadollahpour Kotenaee, S.; Shahabi, H.; Beiranvand Pour, A.; Panahi, M.; Bin Ahmad, B.; Saro, L. Social vulnerability assessment using artificial neural network (ANN) model for earthquake hazard in Tabriz City, Iran. Sustainability 2018, 10, 3376.
Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Pradhan, B.; Chen, W.; Khosravi, K.; Panahi, M.; Bin Ahmad, B.; Saro, L. Land subsidence susceptibility mapping in South Korea using machine learning algorithms. Sensors 2018, 18, 2464.
Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.; Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 2015, 71, 804–818.
Shirzadi, A.; Solaimani, K.; Habibnejhad, M.; Kavian, A.; Chapi, K.; Shahabi, H.; Chen, W.; Khosravi, K.; Thai Pham, B.; Pradhan, B. Novel GIS based machine learning algorithms for shallow landslide susceptibility mapping. Sensors 2018, 18, 3777.
Pham, B.T.; Prakash, I.; Singh, S.K.; Shirzadi, A.; Shahabi, H.; Bui, D.T. Landslide susceptibility modeling using reduced error pruning trees and different ensemble techniques: Hybrid machine learning approaches. Catena 2019, 175, 203–218.
Tien Bui, D.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Hoang, N.-D.; Pham, B.; Bui, Q.-T.; Tran, C.-T.; Panahi, M.; Bin Ahmad, B.; et al. A novel integrated approach of relevance vector machine optimized by imperialist competitive algorithm for spatial modeling of shallow landslides. Remote Sens. 2019, 11, 57.
Kavzoglu, T.; Colkesen, I.; Sahin, E.K. Machine learning techniques in landslide susceptibility mapping: A survey and a case study. In Landslides: Theory, Practice and Modelling; Springer: Berlin, Germany, 2019; pp. 283–301.
Dou, J.; Paudel, U.; Oguchi, T.; Uchiyama, S.; Hayakavva, Y.S. Shallow and deep-seated landslide differentiation using support vector machines: A case study of the Chuetsu area, Japan. Terr. Atmos. Ocean. Sci. 2015, 26, 227–239.
Pham, B.T.; Prakash, I.; Khosravi, K.; Chapi, K.; Trinh, P.T.; Ngo, T.Q.; Hosseini, S.V.; Bui, D.T. A comparison of support vector machines and Bayesian algorithms for landslide susceptibility modelling. Geocarto Int. 2018, 11, 1–23.
Dou, J.; Yamagishi, H.; Zhu, Z.; Yunus, A.P.; Chen, C.W. Txt-tool 1.081-6.1; A comparative study of the binary logistic regression (BLR) and artificial neural network (ANN) models for GIS-based spatial predicting landslides at a regional scale. In Landslide Dynamics: Isdr-Icl Landslide Interactive Teaching Tools; Springer: Berlin, Germany, 2018; pp. 139–151.
Shirzadi, A.; Shahabi, H.; Chapi, K.; Bui, D.T.; Pham, B.T.; Shahedi, K.; Ahmad, B.B. A comparative study between popular statistical and machine learning methods for simulating volume of landslides. Catena 2017, 157, 213–226.
Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Zhang, S.; Hong, H.; Zhang, N. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. Geomat. Nat. Hazards Risk 2017, 8, 1955–1977.
He, Q.; Shahabi, H.; Shirzadi, A.; Li, S.; Chen, W.; Wang, N.; Chai, H.; Bian, H.; Ma, J.; Chen, Y. Landslide spatial modelling using novel bivariate statistical based naïve Bayes, RBF classifier, and RBF network machine learning algorithms. Sci. Total Environ. 2019, 663, 1–15.
Chen, W.; Shahabi, H.; Shirzadi, A.; Hong, H.; Akgun, A.; Tian, Y.; Liu, J.; Zhu, A.-X.; Li, S. Novel hybrid artificial intelligence approach of bivariate statistical-methods-based kernel logistic regression classifier for landslide susceptibility modeling. Bull. Eng. Geol. Environ. 2018, 78, 1–23.
Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378.
Chen, W.; Peng, J.; Hong, H.; Shahabi, H.; Pradhan, B.; Liu, J.; Zhu, A.-X.; Pei, X.; Duan, Z. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. Sci. Total Environ. 2018, 626, 1121–1135.
Chen, W.; Xie, X.; Peng, J.; Shahabi, H.; Hong, H.; Bui, D.T.; Duan, Z.; Li, S.; Zhu, A.-X. GIS-based landslide susceptibility evaluation using a novel hybrid integration approach of bivariate statistical based random forest method. Catena 2018, 164, 135–149.
Tien Bui, D.; Shahabi, H.; Omidvar, E.; Shirzadi, A.; Geertsema, M.; Clague, J.J.; Khosravi, K.; Pradhan, B.; Pham, B.T.; Chapi, K. Shallow landslide prediction using a novel hybrid functional machine learning algorithm. Remote Sens. 2019, 11, 931.
Chen, W.; Panahi, M.; Tsangaratos, P.; Shahabi, H.; Ilia, I.; Panahi, S.; Li, S.; Jaafari, A.; Ahmad, B.B. Applying population-based evolutionary algorithms and a neuro-fuzzy system for modeling landslide susceptibility. Catena 2019, 172, 212–231.
Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci. Total Environ. 2018, 644, 1006–1018.
Hong, H.; Liu, J.; Zhu, A.-X.; Shahabi, H.; Pham, B.T.; Chen, W.; Pradhan, B.; Bui, D.T. A novel hybrid integration model using support vector machines and random subspace for weather-triggered landslide susceptibility assessment in the Wuning area (China). Environ. Earth Sci. 2017, 76, 652.
Chang, K.-T.; Hwang, J.-T.; Liu, J.-K.; Wang, E.-H.; Wang, C.-I. Apply two hybrid methods on the rainfall-induced landslides interpretation. In Proceedings of the IEEE 19th International Conference on Geoinformatics, Shanghai, China, 24–26 June 2011; pp. 1–5.
Pham, B.T.; Bui, D.T.; Prakash, I. Bagging based support vector machines for spatial prediction of landslides. Environ. Earth Sci. 2018, 77, 146.
Jaafari, A.; Panahi, M.; Pham, B.T.; Shahabi, H.; Bui, D.T.; Rezaie, F.; Lee, S. Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility. Catena 2019, 175, 430–445.
Chen, W.; Zhao, X.; Shahabi, H.; Shirzadi, A.; Khosravi, K.; Chai, H.; Zhang, S.; Zhang, L.; Ma, J.; Chen, Y. Spatial prediction of landslide susceptibility by combining evidential belief function, logistic regression and logistic model tree. Geocarto Int. 2019, 34, 1–25.
Nguyen, V.V.; Pham, B.T.; Vu, B.T.; Prakash, I.; Jha, S.; Shahabi, H.; Shirzadi, A.; Ba, D.N.; Kumar, R.; Chatterjee, J.M. Hybrid machine learning approaches for landslide susceptibility modeling. Forests 2019, 10, 157.
Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. 2019, 651, 2087–2096.
Abedini, M.; Ghasemian, B.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Pham, B.T.; Bin Ahmad, B.; Tien Bui, D. A novel hybrid approach of Bayesian logistic regression and its ensembles for landslide susceptibility assessment. Geocarto Int. 2018, 30, 1–31.
Pham, B.T.; Shirzadi, A.; Bui, D.T.; Prakash, I.; Dholakia, M. A hybrid machine learning ensemble approach based on a radial basis function neural network and rotation forest for landslide susceptibility modeling: A case study in the Himalayan area, India. Int. J. Sediment Res. 2018, 33, 157–170.
Smyth, C.G.; Royle, S.A. Urban landslide hazards: Incidence and causative factors in Niterói, Rio de Janeiro State, Brazil. Appl. Geogr. 2000, 20, 95–118.
Almeida, S.; Holcombe, E.A.; Pianosi, F.; Wagener, T. Dealing with deep uncertainties in landslide modelling for disaster risk reduction under climate change. Nat. Hazards Earth Syst. Sci. 2017, 17, 225–241.
Ibsen, M.L.; Brunsden, D. The nature, use and problems of historical archives for the temporal occurrence of landslides, with specific reference to the south coast of Britain, Ventnor, Isle of Wight. Geomorphology 1996, 15, 241–258.
Goudie, A.; Ayala, I.A. Geomorphological Hazards and Disaster Prevention; Cambridge University Press: Cambridge, UK, 2010.
Crozier, M.J. Deciphering the effect of climate change on landslide activity: A review. Geomorphology 2010, 124, 260–267.
Kendon, E.J.; Roberts, N.M.; Fowler, H.J.; Roberts, M.J.; Chan, S.C.; Senior, C.A. Heavier summer downpours with climate change revealed by weather forecast resolution model. Nat. Clim. Chang. 2014, 4, 570.
Shahabi, H.; Hashim, M. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment. Sci. Rep. 2015, 5, 9899.
Dhakal, A.S.; Amada, T.; Aniya, M. Landslide hazard mapping and its evaluation using GIS: An investigation of sampling schemes for a grid-cell based quantitative method. Photogramm. Eng. Remote Sens. 2000, 66, 981–989.
Congalton, R.G.; Green, K. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices; CRC Press: Boca Raton, FL, USA, 2008.
Colkesen, I.; Sahin, E.K.; Kavzoglu, T. Susceptibility mapping of shallow landslides using kernel-based Gaussian process, support vector machines and logistic regression. J. Afr. Earth Sci. 2016, 118, 53–64.
Novaković, J. Toward optimal feature selection using ranking methods and classification algorithms. Yugosl. J. Oper. Res. 2016, 21, 119–135.
Yildirim, P. Filter based feature selection methods for prediction of risks in hepatitis disease. Int. J. Mach. Learn. Comput. 2015, 5, 258.
Holte, R.C. Very simple classification rules perform well on most commonly used datasets. Mach. Learn. 1993, 11, 63–90.
Morariu, D.; Cretulescu, R.; Breazu, M. Feature selection in document classification. In Proceedings of the Fourth International Conference in Romania of Information Science and Information Literacy, ISSN-L, Sibiu, Romania, 17–19 April 2013; pp. 2247–2255.
Selvi, C.; Ahuja, C.; Sivasankar, E. A comparative study of feature selection and machine learning methods for sentiment classification on movie data set. In Intelligent Computing and Applications; Springer: Berlin, Germany, 2015; pp. 367–379.
Vapnik, V.; Guyon, I.; Hastie, T. Support vector machines. Mach. Learn. 1995, 20, 273–297.
Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin, Germany, 2013.
Hong, H.; Pradhan, B.; Bui, D.T.; Xu, C.; Youssef, A.M.; Chen, W. Comparison of four kernel functions used in support vector machines for landslide susceptibility mapping: A case study at Suichuan area (China). Geomat. Nat. Hazards Risk 2017, 8, 544–569.
Tehrany, M.S.; Pradhan, B.; Mansor, S.; Ahmad, N. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena 2015, 125, 91–101.
Bui, D.T.; Bui, Q.-T.; Nguyen, Q.-P.; Pradhan, B.; Nampak, H.; Trinh, P.T. A hybrid artificial intelligence approach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest fire susceptibility modeling at a tropical area. Agric. For. Meteorol. 2017, 233, 32–44.
Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
Hong, H.; Liu, J.; Bui, D.T.; Pradhan, B.; Acharya, T.D.; Pham, B.T.; Zhu, A.-X.; Chen, W.; Ahmad, B.B. Landslide susceptibility mapping using J48 decision tree with AdaBoost, bagging and rotation forest ensembles in the Guangchang area (China). Catena 2018, 163, 399–413.
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
Boot, T.; Nibbering, D. Forecasting using random subspace methods. J. Econom. 2019, 209, 391–406.
Pham, B.T.; Bui, D.T.; Prakash, I.; Dholakia, M. Hybrid integration of multilayer perceptron neural networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using. Catena 2017, 149, 52–63.
Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
Mathew, J.; Jha, V.; Rawat, G. Landslide susceptibility zonation mapping and its validation in part of Garhwal Lesser Himalaya, India, using binary logistic regression analysis and receiver operating characteristic curve method. Landslides 2009, 6, 17–26.
Pham, B.T.; Bui, D.T.; Prakash, I.; Dholakia, M. Rotation forest fuzzy rule-based classifier ensemble for spatial prediction of landslides using GIS. Nat. Hazards 2016, 83, 97–127.
Pradhan, B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 2013, 51, 350–365.
Pham, B.T.; Pradhan, B.; Bui, D.T.; Prakash, I.; Dholakia, M. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of Uttarakhand area (India). Environ. Model. Softw. 2016, 84, 240–250.
Zhou, C.; Lee, C.; Li, J.; Xu, Z. On the spatial relationship between landslides and causative factors on Lantau Island, Hong Kong. Geomorphology 2002, 43, 197–207.
Yao, X.; Tham, L.; Dai, F. Landslide susceptibility mapping based on support vector machine: A case study on natural slopes of Hong Kong, China. Geomorphology 2008, 101, 572–582.
Marjanović, M.; Kovačević, M.; Bajat, B.; Voženílek, V. Landslide susceptibility assessment using SVM machine learning algorithm. Eng. Geol. 2011, 123, 225–234.
Tien Bui, D.; Pradhan, B.; Lofman, O.; Revhaug, I. Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naive Bayes models. Math. Probl. Eng. 2012, 2012, 974638.
Pourghasemi, H.R.; Jirandeh, A.G.; Pradhan, B.; Xu, C.; Gokceoglu, C. Landslide susceptibility mapping using support vector machine and GIS at the Golestan Province, Iran. J. Earth Syst. Sci. 2013, 122, 349–369.
Chen, W.; Chai, H.; Zhao, Z.; Wang, Q.; Hong, H. Landslide susceptibility mapping based on GIS and support vector machine models for the Qianyang County, China. Environ. Earth Sci. 2016, 75, 474.
Chen, W.; Pourghasemi, H.R.; Naghibi, S.A. A comparative study of landslide susceptibility maps produced using support vector machine with different kernel functions and entropy data mining models in China. Bull. Eng. Geol. Environ. 2018, 77, 647–664.
Ballabio, C.; Sterlacchini, S. Support vector machines for landslide susceptibility mapping: The Staffora River basin case study, Italy. Math. Geosci. 2012, 44, 47–70.
Ayalew, L.; Yamagishi, H. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, central Japan. Geomorphology 2005, 65, 15–31.
Pham, B.T.; Bui, D.T.; Pourghasemi, H.R.; Indra, P.; Dholakia, M. Landslide susceptibility assessment in the Uttarakhand area (India) using GIS: A comparison study of prediction capability of naïve Bayes, multilayer perceptron neural networks, and functional trees methods. Theor. Appl. Climatol. 2017, 128, 255–273.
Yan, W.; Shao, H. Application of support vector machine nonlinear classifier to fault diagnoses. In Proceedings of the IEEE 4th World Congress on Intelligent Control and Automation, Shanghai, China, 10–14 June 2002; pp. 2697–2700. [Google Scholar]
Bui, D.T.; Ho, T.C.; Revhaug, I.; Pradhan, B.; Nguyen, D.B. Landslide susceptibility mapping along the National Road 32 of Vietnam using GIS-based j48 decision tree classifier and its ensembles. In Cartography from Pole to Pole; Springer: Berlin, Germany, 2014; pp. 303–317. [Google Scholar]
Bui, D.T.; Ho, T.-C.; Pradhan, B.; Pham, B.-T.; Nhu, V.-H.; Revhaug, I. GIS-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with adaboost, bagging, and multiboost ensemble frameworks. Environ. Earth Sci. 2016, 75, 1101. [Google Scholar]
Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39. [Google Scholar] [CrossRef]

Figure 1. Geographical location of the study area: (a) Chahar Mahaal and Bakhtiari Province, Iran; (b) Sarkhoon watershed; (c) locations of some landslides in the watershed.

Figure 2. Examples of landslides in the study area [58].

Figure 3. Landslide conditioning factors and their classes: (a) slope; (b) aspect; (c) elevation; (d) solar radiation; (e) general curvature; (f) tangential curvature; (g) longitudinal curvature; (h) profile curvature; (i) plan curvature; (j) TWI; (k) SPI; (l) LS; (m) precipitation; (n) TRI; (o) TPI; (p) geology; (q) land use; (r) distance to river; (s) distance to road; (t) distance to fault.

Figure 4. Pseudocode of the One-R algorithm.
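Figure 4 itself is not reproduced in this text. As a rough stand-in, the sketch below follows the textbook One-R idea: for each conditioning factor, predict the majority class within every factor class, then score the factor by the training accuracy of that one-factor rule. The function names and toy data are hypothetical, and the study's own ORAE settings (for example, how continuous factors were discretized or whether scores were cross-validated) are not reflected here.

```python
from collections import Counter, defaultdict


def one_r_rule(values, labels):
    """One-R rule for a single factor: map each factor class to its majority label."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    # most_common(1) breaks ties in favour of the label seen first for that class
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def one_r_score(values, labels):
    """Training accuracy of the factor's One-R rule (fraction classified correctly)."""
    rule = one_r_rule(values, labels)
    correct = sum(rule[v] == y for v, y in zip(values, labels))
    return correct / len(labels)


def rank_factors(factors, labels):
    """Score every factor and return (name, score) pairs, highest score first."""
    scores = {name: one_r_score(vals, labels) for name, vals in factors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy sample: 1 = landslide, 0 = non-landslide
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    factors = {
        "slope_class": ["steep", "steep", "steep", "gentle",
                        "gentle", "gentle", "steep", "gentle"],
        "aspect_class": ["N", "S", "N", "S", "N", "S", "S", "N"],
    }
    for name, score in rank_factors(factors, labels):
        print(f"{name}: {score:.3f}")  # slope_class: 1.000, then aspect_class: 0.500
```

Factors whose single-factor rule separates landslide from non-landslide samples well score near 1, while uninformative factors fall toward the majority-class rate; low-scoring factors are the natural candidates for removal.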

Figure 5. Flowchart of the methodology used in this study.

Figure 6. Factor scores determined using the ORAE technique.

Figure 7. Histogram-based classification of the susceptibility values for all models: (a) AB-SVM ensemble model; (b) BA-SVM ensemble model; (c) RS-SVM ensemble model; (d) RF-SVM ensemble model; (e) SVM base classifier. NB, natural breaks; Q, quantile; GI, geometric interval.
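Of the three schemes compared in Figure 7, the quantile method is the easiest to state: breaks are placed so that each class holds roughly the same number of cells. A minimal sketch, assuming five classes labeled very low to very high (the class count and names are assumptions, not taken from the figure) and susceptibility values supplied as a flat list; natural breaks and geometric interval are not implemented here.

```python
import statistics
from bisect import bisect_right

# Assumed five-class scheme; names are illustrative only
CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def quantile_breaks(values, n_classes=5):
    """Return the n_classes - 1 interior breaks that split values into equal-count classes."""
    # statistics.quantiles returns n - 1 cut points for n equal-probability intervals
    return statistics.quantiles(values, n=n_classes, method="inclusive")


def classify(values, breaks):
    """Assign each value the index of its class (0 = lowest susceptibility)."""
    # A value equal to a break goes to the upper class
    return [bisect_right(breaks, v) for v in values]


if __name__ == "__main__":
    # Hypothetical susceptibility indices in [0, 1]
    values = [0.12, 0.35, 0.48, 0.51, 0.66, 0.71, 0.83, 0.90, 0.05, 0.27]
    breaks = quantile_breaks(values)
    for v, k in zip(values, classify(values, breaks)):
        print(f"{v:.2f} -> {CLASS_NAMES[k]}")
```

Because quantile breaks depend only on rank, each class receives about the same share of cells regardless of how skewed the histogram is, which is why the three schemes in Figure 7 can yield noticeably different maps from the same model output.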

Figure 8. Landslide susceptibility maps derived with SVM and its ensembles: (a) SVM; (b) RS-SVM; (c) RF-SVM; (d) AB-SVM; (e) BA-SVM.
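The ensembles mapped in Figure 8 differ mainly in how each member classifier sees the training data: bagging (BA) fits members on bootstrap samples of the rows, whereas the random subspace method (RS) fits them on random subsets of the conditioning factors. A minimal sketch of these two resampling schemes with majority voting; to stay self-contained it uses a hypothetical nearest-centroid learner as a stand-in for the SVM base classifier, and the member counts and subspace size are arbitrary. Rotation forest (RF) and AdaBoost (AB) are not shown.

```python
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical stand-in base learner (NOT an SVM): nearest class mean."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # Mean vector of each class present in this (re)sampled training set
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, row):
        def dist2(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))
        return min(self.centroids, key=lambda label: dist2(self.centroids[label]))


def bagging_members(X, y, n_members, rng):
    """Bagging: each member sees a bootstrap sample of rows (drawn with replacement)."""
    n, all_feats = len(X), list(range(len(X[0])))
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        model = NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx])
        members.append((model, all_feats))
    return members


def subspace_members(X, y, n_members, n_features, rng):
    """Random subspace: each member sees all rows but a random subset of factors."""
    members = []
    for _ in range(n_members):
        feats = sorted(rng.sample(range(len(X[0])), n_features))
        model = NearestCentroid().fit([[row[f] for f in feats] for row in X], y)
        members.append((model, feats))
    return members


def vote(members, row):
    """Majority vote; each member is shown only its own feature subset."""
    preds = [model.predict_one([row[f] for f in feats]) for model, feats in members]
    return Counter(preds).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy factor values (hypothetical), 1 = landslide, 0 = non-landslide
    X = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.1], [0.2, 0.1, 0.0],
         [1.0, 1.0, 1.0], [0.9, 1.1, 1.0], [1.1, 0.9, 1.0]]
    y = [0, 0, 0, 1, 1, 1]
    rng = random.Random(42)
    ba = bagging_members(X, y, n_members=11, rng=rng)
    rs = subspace_members(X, y, n_members=11, n_features=2, rng=rng)
    print(vote(ba, [0.95, 1.0, 1.0]), vote(rs, [0.05, 0.1, 0.0]))
```

Swapping the stand-in for a real SVM changes only `fit` and `predict_one`; the resampling and voting logic, which is what distinguishes BA-SVM from RS-SVM, stays the same.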

Figure 9. AUC results of the models for (a) the validation dataset and (b) the training dataset.
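The AUC values in Figure 9 and Tables 2 and 3 can be read, following Hanley and McNeil (1982), as the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample. A minimal sketch that computes AUC directly from that definition by comparing every positive/negative pair (ties count one half); the scores are made up, and this is not necessarily how the authors' software computed AUC.

```python
def auc_pairwise(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical susceptibility scores; 1 = landslide, 0 = non-landslide
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 1, 0, 0, 0]
    print(auc_pairwise(scores, labels))  # 0.875 (14 of 16 pairs ranked correctly)
```

The pairwise loop is quadratic, which is fine for validation sets of a few dozen points; a rank-based (Mann-Whitney) formulation gives the same value in O(n log n) for larger samples.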

Table 1. Geological units in the study area [58].

Description	Symbol	Formation
Olive, gray, green marl	Mmm	Mishan
Red sandstone and marl	MPlsma	Aghajari
Conglomerate and sandstone	PlCb	Bakhtiari
Active stream channel deposits	Qal	-
Quaternary terraces	Q2t	-
Quaternary low-level terraces	Q3t	-
Thick to medium-bedded grey dolomite	Edj	Jahrum
Thick to medium-bedded cream fossiliferous limestone	Klt	Tarbur
Interbedded bluish-grey marl and limestone	Kmg	Gurpi
Massive brownish grey limestone	KlSi	Sarvak-Ilam

Table 2. Results of the modeling process using the training dataset.

Metric	RF-SVM	RS-SVM	BA-SVM	AB-SVM	SVM
True positive (TP)	57	58	56	53	58
True negative (TN)	61	59	59	57	59
False positive (FP)	8	10	10	12	10
False negative (FN)	12	11	13	16	10
Sensitivity	0.826	0.841	0.812	0.768	0.853
Specificity	0.884	0.855	0.855	0.826	0.855
Accuracy	0.855	0.848	0.833	0.797	0.854
Kappa	0.710	0.696	0.696	0.666	0.696
RMSE	0.318	0.342	0.344	0.349	0.345
AUC	0.944	0.915	0.911	0.915	0.908
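The sensitivity, specificity and accuracy rows follow from the confusion-matrix counts. A minimal sketch assuming the standard definitions, applied to the RF column of Table 2; kappa uses Cohen's usual chance-agreement correction, and RMSE compares predicted probabilities with 0/1 labels. The paper's exact kappa and RMSE computations are not given in this section, so the sketch may not reproduce every table entry.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)   # share of landslide samples predicted as landslide
    specificity = tn / (tn + fp)   # share of non-landslide samples predicted as non-landslide
    accuracy = (tp + tn) / n       # observed agreement p_o
    # Chance agreement p_e from the predicted and observed class totals
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa}


def rmse(probabilities, labels):
    """Root-mean-square error between predicted probabilities and 0/1 labels."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels))


if __name__ == "__main__":
    # RF column of Table 2 (training dataset)
    m = confusion_metrics(tp=57, tn=61, fp=8, fn=12)
    print({k: round(v, 3) for k, v in m.items()})
    # {'sensitivity': 0.826, 'specificity': 0.884, 'accuracy': 0.855, 'kappa': 0.71}
```

For this column the predicted and observed class totals give a chance agreement of exactly 0.5, so kappa reduces to (accuracy - 0.5) / 0.5.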

Table 3. Evaluation and comparison of the models using the validation dataset.

Metric	RF-SVM	RS-SVM	BA-SVM	AB-SVM	SVM
True positive (TP)	18	19	18	18	18
True negative (TN)	25	27	24	24	23
False positive (FP)	4	2	5	5	6
False negative (FN)	11	10	11	11	11
Sensitivity	0.621	0.655	0.621	0.621	0.621
Specificity	0.862	0.931	0.828	0.828	0.793
Accuracy	0.741	0.793	0.724	0.724	0.707
Kappa	0.460	0.561	0.425	0.416	0.392
RMSE	0.410	0.386	0.400	0.427	0.416
AUC	0.878	0.886	0.856	0.821	0.841

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Tien Bui, D.; Shirzadi, A.; Shahabi, H.; Geertsema, M.; Omidvar, E.; Clague, J.J.; Thai Pham, B.; Dou, J.; Talebpour Asl, D.; Bin Ahmad, B.; et al. New Ensemble Models for Shallow Landslide Susceptibility Modeling in a Semi-Arid Watershed. Forests 2019, 10, 743. https://doi.org/10.3390/f10090743

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
