Combining Evolutionary Algorithms and Machine Learning Models in Landslide Susceptibility Assessments

Chen, Wei; Chen, Yunzhi; Tsangaratos, Paraskevas; Ilia, Ioanna; Wang, Xiaojing

doi:10.3390/rs12233854

¹ College of Geology and Environment, Xi’an University of Science and Technology, Xi’an 710054, China

² Key Laboratory of Coal Resources Exploration and Comprehensive Utilization, Ministry of Natural Resources, Xi’an 710021, China

³ Laboratory of Engineering Geology and Hydrogeology, Department of Geological Sciences, School of Mining and Metallurgical Engineering, National Technical University of Athens, 15780 Zografou, Greece

* Author to whom correspondence should be addressed.

Remote Sens. 2020, 12(23), 3854; https://0-doi-org.brum.beds.ac.uk/10.3390/rs12233854

Submission received: 14 October 2020 / Revised: 18 November 2020 / Accepted: 20 November 2020 / Published: 25 November 2020

(This article belongs to the Special Issue Spatial Modelling of Natural Hazards and Water Resources through Remote Sensing, GIS and Machine Learning Methods)

Abstract

The main objective of the present study is to introduce a novel predictive model that combines evolutionary algorithms and machine learning (ML) models in order to construct a landslide susceptibility map. Genetic algorithms (GA) are used as a feature selection method, whereas the particle swarm optimization (PSO) method is used to optimize the structural parameters of two ML models, support vector machines (SVM) and artificial neural networks (ANN). A well-defined spatial database, which included 335 landslides and twelve landslide-related variables (elevation, slope angle, slope aspect, curvature, plan curvature, profile curvature, topographic wetness index, stream power index, distance to faults, distance to river, lithology, and hydrological cover), is considered for the analysis in the Achaia Regional Unit, located in Northern Peloponnese, Greece. The outcome of the study illustrates that both ML models have an excellent performance, with the SVM model achieving the highest learning accuracy (0.977 area under the receiver operating characteristic curve value (AUC)), followed by the ANN model (0.969). However, the ANN model shows the highest prediction accuracy (0.800 AUC), followed by the SVM model (0.750 AUC). Overall, the proposed ML models highlight the necessity of feature selection and tuning procedures via evolutionary optimization algorithms and show that such approaches could be successfully used for landslide susceptibility mapping as an alternative investigation tool.

Keywords:

landslide susceptibility; feature selection; optimizing structural parameters; evolutionary algorithms; genetic algorithms; particle swarm optimization; support vector machines; artificial neural network

1. Introduction

Landslides are natural disasters, the evolution of which has significant adverse consequences on the natural and built environment and, in some cases, leads to unfortunate human losses [1,2,3]. An increase in landslides has been reported worldwide, which is mainly associated with the impact of human activities and also natural processes. Specifically, it is well documented that favorable geo-environmental settings, climate change (the manifestation of extreme rainfall), human activities (the construction of new settlements and infrastructures in landslide-prone areas), and also changes in land-use patterns increase the possibility of landslides, which is a trend that will likely continue in the future [3,4,5]. As a response to increasing stress, the international scientific community has focused its efforts on designing effective strategies for predicting and mitigating the negative effects of landslides [2]. Numerous studies can be found that estimate the probability of landslide occurrence, considering either the spatial component alone or both the spatial and temporal components. According to [6], the estimation of landslide-prone areas that takes into account only the spatial component, referred to as landslide susceptibility, is by far the most investigated topic and one of the most important components in a landslide risk study [7]. The prediction of landslides has been described mainly as a classification problem. The classification process is performed based on the complex and non-linear relation that exists between the spatial distribution of landslides and the landslide-related parameters [8,9].

Despite the large number of landslide studies, there is no guideline as to which technique or method should be followed during the estimation of the landslide susceptibility. The choice of the appropriate method depends on a set of parameters related to the quality and quantity of available data, the desired scale of analysis, and the geo-environmental conditions of the research area [10,11,12]. The available methods can be categorized into qualitative (knowledge-driven) and quantitative (data-driven) methods [13,14]. Qualitative methods are based on the judgment of experts, whereas quantitative methods involve mainly the implementation of statistical, probabilistic, and deterministic models [2,15]. With technological enhancements related mainly to hardware, and also improvements in remote sensing and geographic information system technology, there has been an analogous increase in the development of intelligent methods and techniques. In recent years, machine learning (ML) methods, which depend on algorithms to learn the relation between landslides and landslide-related parameters, have been utilized in landslide assessments [16,17,18,19]. It has been documented that ML methods produce more accurate predictions than statistical techniques, especially in cases when they assess parameters with different statistical distributions and a complex feature space [17,20,21,22]. ML methods employ learning procedures and show excellent performance in discovering hidden trends and unknown patterns from large databases. In contrast to statistical methods, ML can assess data of different types without adopting assumptions concerning the statistical distribution of data [23].
Notable examples of data-driven methods involving the assessment of landslide phenomena include fuzzy logic and neural-fuzzy logic [24,25,26,27,28,29], support vector machines (SVM) [30,31,32], tree-based models, such as decision trees, classification and regression trees, random forest (RF), bagging, and boosting [33,34,35,36,37,38,39,40], and artificial neural network-based models, such as feed-forward, radial basis function, Kohonen self-organizing, recurrent, convolutional, and deep learning networks [4,5,10,38,41,42,43,44,45,46].

ML methods come with some constraints, which are mainly associated with computational complexity, problems arising from the effects of the curse of dimensionality, and also over-fitting problems [47]. In addition to the above, the process of tuning the structural parameters used by the predictive models has been reported as a major drawback [20,44,48]. Tien Bui et al. [48] report the benefit of selecting the optimal ML model and using a proper conditioning selection method for landslide susceptibility assessments. The two most common approaches for tuning the structural parameters of the prediction models are the trial and error approach and the grid search technique. However, such approaches need considerable computational time and may not always provide the optimal solution [21,49,50]. Similar approaches are adopted for the selection of an optimal set of parameters. This process is considered a significant task and, when applied, may reduce the effect of the curse of dimensionality, providing in most cases more accurate predictions [51].

Several studies can also be found in the literature that combine optimization algorithms and ML models in landslide susceptibility assessments and provide a sufficient solution in the search for optimal parameters and the tuning of the structural parameters of the models [5,31]. Currently, the most frequent choices of optimization algorithms are those based on evolutionary algorithms, such as genetic algorithms (GA), particle swarm optimization (PSO), ant colony optimization (ACO), artificial bee colony (ABC), and gray wolf optimizer (GWO) [20,29,36,52,53]. With reference to the tuning process and the usage of evolutionary and swarm algorithms, several successful implementations of the Adaptive Neuro Fuzzy Inference System (ANFIS) optimized by PSO for landslide spatial prediction have been reported [36,54]. Another example of the usage of evolutionary algorithms in landslide susceptibility is the one reported by [31], which introduced a novel model using a geographically weighted regression (GWR) technique to segment study areas into a series of prediction regions with appropriate sizes and an SVM classifier optimized by the PSO algorithm. The authors report higher prediction accuracy than conventional methods. Likewise, Nguyen et al. [55] applied the PSO and ABC algorithms to tune the structural parameters of an ANN. The outcomes of their study confirmed a significant increase in prediction accuracy from nearly 77% to around 86%. Chen, Panahi, and Pourghasemi [20] used three optimization techniques (PSO, GA, and differential evolution (DE)) to tune an ANFIS model. The authors highlighted that all three ensembles obtained satisfactory prediction accuracy, with the ANFIS-DE being the most promising ensemble model. In similar studies concerning flood susceptibility maps, Tien Bui et al. [56] applied two meta-heuristic algorithms (PSO and EG) in order to tune an ANFIS model.
The comparison of the results obtained by the proposed approach with the outcomes of a series of ML models (J48, DT, RF, MLP, Neural Nets, and SVM models) revealed the higher accuracy achieved by the proposed approach.

Although many studies in the literature use evolutionary algorithms in landslide susceptibility mapping, few combine two different evolutionary algorithms for the feature selection and tuning processes. In this context, the present study illustrates the integration of a feature selection evolutionary algorithm and an evolutionary optimization algorithm with two ML models used in landslide susceptibility assessments. The usage of the GA algorithm generated prediction models with reduced complexity and high generalization ability, whereas the model parameter tuning process via the PSO ensured more accurate predictions for both models. The remainder of the study is organized as follows: Section 2 gives a short description of the study area. Section 3 introduces the methodology, including a description of the algorithms and ML models used during the study, and presents the data, whereas Section 4 lists the outcomes of the analysis. Section 5 covers the discussion, whereas the last section includes the conclusion of the present study.

2. Study Area

The Regional Unit of Achaia (RUA) covers the NW of Peloponnese, Greece, stretching over an area of 3274 km², between longitudes 260,000 and 360,000 and latitudes 4,180,000 and 4,250,000 (Coordinate system, EPSG:2100, GGRS87), and has a population of 300,000 inhabitants (Figure 1). The RUA is one of the most mountainous regions of Greece, covered by massive rocky limestone ridges and high peaks [24]. The geomorphological settings appear to be controlled mainly by geology, tectonic activity, and weathering and erosion processes [57]. The elevation ranges between 0 and 2310 m, with over 53% of the total area having elevation values above 1000 m.

The climate of the area is of the Mediterranean type, with warm, dry summers and mild, wet winters. The wet season extends from October to May, and its total rainfall accounts for 93% of the annual rainfall. The wettest month is December, in which the mean rainfall is 128.9 mm, followed by November (124.7 mm). August is the driest month with a mean rainfall of 7.0 mm, followed by July (8.8 mm). The highest mean annual temperature is 24.3 °C, whereas the lowest is 8.0 °C. The mean annual temperature reaches 13.7 °C [58]. Concerning the geological structure and the geotectonic settings, the study area is characterized by the presence of three geotectonic zones (Ionian, Olonos-Pindos, and Gavrovo-Tripolis) and dominant tectonic fracturing that is responsible for the occurrence of Plio-Pleistocene sediments [59]. Specifically, within the study area, the following geological units are identified: fine-grained to coarse-grained loose Quaternary formations, coarse-grained coherent Quaternary formations, fine-grained to coarse-grained Plio-Pleistocene sediments, flysch formations, Cretaceous, Triassic, and Jurassic limestones, dolomites, schist-chert formations, sandstones, semi-metamorphic formations, and formations of volcanic origin [60,61] (Figure 2).

3. Materials and Methods

3.1. Methodology

The proposed methodology follows a five-phase procedure, which involves (i) data processing, (ii) multi-collinearity analysis, (iii) feature selection, (iv) optimizing the SVM and ANN models, and (v) constructing the landslide susceptibility map, then testing and comparing (linear regression analysis) the two models. For the implementation of the evolutionary algorithms and ML models, the “caret” R package [62] was used, whereas ArcGIS 10.3.1 [63] was used for accessing the spatial data and constructing the landslide susceptibility maps. Figure 3 illustrates the five-phase methodology, whereas a brief description of each phase is given in the following paragraphs.

3.1.1. First Phase

The first phase involved, as a first action, the identification of the landslide and non-landslide areas and the construction of the inventory map. Landslide areas were located through field trips from 2017 to 2018, previous reports and studies that cover a time span of 67 years from 1950, and the interpretation of aerial photographs, whereas non-landslide areas were randomly selected from the landslide-free space by applying Create Random Points, which is a geo-processing tool installed in the Data Management Tool of the ArcGIS platform [63].

A second action involved the selection of the landslide-related parameters and the classification and weighting of each parameter. Twelve parameters (elevation, slope angle, slope aspect, profile curvature, plan curvature, curvature, topographic wetness index (TWI), stream power index (SPI), lithology, hydrolithology cover, distance to faults, and distance to river network) were selected as landslide-related parameters based on the experience gained from previous studies conducted in the same area and in areas with similar geo-environmental settings [57,64]. The resolution of the input data has an impact on the accuracy of the produced landslide map, as a larger grid size leads to lost information [65]. In our case, each parameter was transformed into raster format with a grid size of 30 m. The grid size was determined by the input data, the DEM file (30 m), and the scale of the geological and topographic maps (1:50,000). The process of classification and weighting is mandatory in landslide susceptibility assessments. The classification of the continuous parameters, such as the geo-morphological parameters, was based on expert knowledge and previous studies with similar settings, whereas the weighting was based on the Weight of Evidence (WofE) method [66]. In the literature, several weighting methods have been used, including expert-based methods, with the most common being the Analytic Hierarchy Process, and data-driven methods, with the Frequency ratio being the most common [65]. In our case, the WofE method was used in order to capture the different probability of landslide occurrence of each variable and class.

WofE has been extensively used in studies involving natural hazard analysis, including floods and landslides, and is characterized as a data-driven approach. The theoretical background of the WofE model involves the implementation of the Bayes theorem and the concepts of prior and posterior probability [66]. In our case, each class of each parameter is assigned a positive and a negative weight (W+ and W-), estimated based on the percentage of landslides in each class. The weights correspond to a positive or a negative spatial correlation that may exist between the classes of the landslide-related parameters and landslides. The measure of the spatial association is provided through the magnitude of the contrast (C), the difference between W+ and W-. A positive value of C means that a positive correlation exists, while a negative value implies a negative spatial association [66,67,68,69].
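As an illustration of the weighting step, the following is a minimal sketch of the commonly used WofE formulation, in which W+ and W- are log-ratios of conditional probabilities and C is their difference. The function name and the cell counts are hypothetical and are not taken from the study; the exact implementation used by the authors is not described in the text.

```python
import math


def weights_of_evidence(ls_in, ls_out, non_in, non_out):
    """Return (W+, W-, C) for one class of one landslide-related parameter.

    ls_in / ls_out   : landslide cells inside / outside the class
    non_in / non_out : non-landslide cells inside / outside the class
    All four counts must be positive, otherwise a logarithm is undefined.
    """
    p_class_given_ls = ls_in / (ls_in + ls_out)
    p_class_given_non = non_in / (non_in + non_out)
    w_plus = math.log(p_class_given_ls / p_class_given_non)
    w_minus = math.log((1.0 - p_class_given_ls) / (1.0 - p_class_given_non))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts, not taken from the study.
w_plus, w_minus, contrast = weights_of_evidence(40, 60, 200, 800)
print(f"W+ = {w_plus:.4f}, W- = {w_minus:.4f}, C = {contrast:.4f}")
```

In this hypothetical case the class holds 40% of the landslide cells but only 20% of the non-landslide cells, so C is positive and the class is positively associated with landslides.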

The following action in the first phase was normalizing the parameters using the max–min procedure, with new values ranging between 0.1 and 0.9 [70]. The final action involved the construction of the training and testing subsets. This action had two parts: the first assigned to each data point the values that correspond to each parameter, and the second partitioned the data into training and testing data. For the first part, the Spatial Analyst geoprocessing tool Extract Multi Values to Points was applied, whereas for the second part, the geoprocessing tool Create Subsets found in the Geostatistical Toolbox was used [63]. The training subset included 70% of the landslide and non-landslide areas, whereas the testing subset included the remaining 30%.
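The max–min normalization to the range 0.1 to 0.9 can be sketched as follows. This is a generic linear rescaling under the assumption that the minimum and maximum are taken over all values of one parameter; the function name and the input values are hypothetical.

```python
def rescale(values, low=0.1, high=0.9):
    """Linearly map values so that min(values) -> low and max(values) -> high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("all values are identical; rescaling is undefined")
    return [low + (high - low) * (v - v_min) / span for v in values]


# Hypothetical weights for the classes of one parameter.
print(rescale([-1.0, 0.0, 1.0]))
```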

3.1.2. Second Phase

The second phase involved a multi-collinearity analysis using the training subset, so as to find potential correlation among the landslide-related parameters and exclude the correlated parameters from the next phase of analysis [48]. The analysis involves the calculation of two metrics, the Variance Inflation Factor (VIF) and the Tolerance index (TOL). Severe multi-collinearity is indicated when the VIF index is higher than 10 and the TOL index is lower than 0.1 [71,72].
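A minimal sketch of how the two metrics can be computed is given below. It relies on the standard identity that the VIF of parameter j equals the j-th diagonal element of the inverse correlation matrix, with TOL being its reciprocal. The function names and the two toy columns are hypothetical, and the code is written with the standard library only for illustration.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tol(columns):
    """Return (VIF list, TOL list), one entry per parameter."""
    inverse = invert(correlation_matrix(columns))
    vif = [inverse[j][j] for j in range(len(columns))]
    return vif, [1.0 / v for v in vif]


# Two hypothetical parameters with a correlation of 0.8.
vif, tol = vif_and_tol([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
flagged = [j for j in range(len(vif)) if vif[j] > 10 or tol[j] < 0.1]
print(vif, tol, flagged)
```

For the toy data the correlation is 0.8, so VIF = 1/(1 - 0.64), roughly 2.78, and TOL = 0.36; neither parameter is flagged.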

3.1.3. Third Phase

The third phase involved the feature selection procedure. As mentioned earlier, GA was the main evolutionary algorithm used for this purpose. GA, which was proposed by Holland [73], was developed based on the concepts of natural selection and biological operators (mutation, crossover, and selection). GAs are mainly used in optimization and search problems. The basic idea behind GA is that, starting from an initial set of candidate solutions, which constitute the population, a repeated evolutionary process generates solutions that are better than the initial ones. Solutions, which are referred to as individuals, correspond to specific fitness values, where in our case, higher values are considered better. The solutions that have the best fitness values are selected and undergo a crossover and random mutation process, which is repeated several times, producing several generations, which in principle leads toward optimal solutions. In our case, where GA was used for feature selection, the individuals are subsets of landslide-related parameters that are encoded as binary strings (1 and 0). The fitness values are measures of model performance, specifically the classification accuracy achieved by the base learner, an RF model, using a 10-fold cross-validation procedure. The feature selection routine was conducted using the R package “caret” [62].
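The binary encoding and the selection, crossover, and mutation loop can be sketched as below. This is an illustrative toy implementation and not the caret routine used in the study: the tournament selection, the per-bit mutation, and the toy fitness function are assumptions, whereas in the study the fitness is the cross-validated accuracy of an RF model. The default settings follow those reported in Section 4.3.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=100,
                         p_crossover=0.8, p_mutation=0.1, seed=0):
    """Search for a 0/1 mask over features that maximizes `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Assumed selection scheme: the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament()[:], tournament()[:]
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for i in range(n_features):
                    if rng.random() < p_mutation:  # per-bit flip (assumption)
                        child[i] = 1 - child[i]
                children.append(child)
        population = children[:pop_size]  # no elitism in the population
        best = max(population + [best], key=fitness)  # bookkeeping only
    return best


# Toy fitness (hypothetical): reward four "informative" features and
# penalize every other selected feature.
INFORMATIVE = {0, 1, 2, 5}


def toy_fitness(mask):
    return sum(1.0 if i in INFORMATIVE else -0.5
               for i, bit in enumerate(mask) if bit)


selected = ga_feature_selection(12, toy_fitness)
print(selected, toy_fitness(selected))
```

Replacing `toy_fitness` with a function that trains a classifier on the masked parameters and returns its cross-validated accuracy would bring the sketch closer to the procedure described above.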

3.1.4. Fourth Phase

The fourth phase involved the implementation of the two machine learning models (SVM and ANN) and the usage of PSO as a tuning procedure. The PSO algorithm is based on the behavior of bird flocking and fish schooling in the real world (swarm intelligence) [17] and was first introduced by Kennedy and Eberhart [74]. Similar to GA, the PSO analyzes an initial group of random solutions (particles), which form a population (swarm), and updates the generations in order to achieve an optimal solution. The movement of each particle is influenced by the position and velocity of the surrounding particles, which is controlled by a fitness function. The objective of the PSO algorithm is to alter the velocity and position of each particle in order to maximize or minimize the fitness function. The fitness function was the same as the one used in the feature selection process, maximizing the classification accuracy.
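The velocity and position update can be sketched with the standard global-best PSO variant shown below. The inertia and acceleration coefficients, the search bounds, and the smooth toy objective are assumptions made for illustration; in the study the objective would be the classification accuracy of the SVM or ANN model for a given set of structural parameters.

```python
import random


def pso_maximize(objective, bounds, n_particles=20, iterations=100,
                 inertia=0.729, c_personal=1.494, c_global=1.494, seed=0):
    """Global-best PSO; returns (best position, best objective value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical smooth objective with a single peak at (2.0, 0.5).
def toy_objective(p):
    return -((p[0] - 2.0) ** 2 + (p[1] - 0.5) ** 2)


best_pos, best_val = pso_maximize(toy_objective, [(0.0, 10.0), (0.0, 1.0)])
print(best_pos, best_val)
```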

Concerning the two models, SVM and ANN, both are quite popular in landslide susceptibility assessments. SVM are characterized as non-parametric kernel-based techniques [75] that are efficient in solving linear and non-linear classification and regression problems [76]. SVM are capable of defining complex decision functions that can separate two classes of data samples in an optimal way [77]. Specifically, the objective is to construct a function f(x) that deviates from the known outcomes by a value no greater than ε (epsilon) for each training point x and, at the same time, is as flat as possible. The main structural parameters used during the implementation of the SVM model were C (cost), the cost of predicting a sample, γ (gamma), the precision parameter for the radial basis function, and ε (epsilon), the epsilon in the SVM insensitive loss function.
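The roles of γ and ε can be illustrated with the standard definitions of the radial basis function kernel and the ε-insensitive loss, sketched below. The function names and the sample points are hypothetical; the values of gamma and epsilon are the tuned ones reported in Section 4.4.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


GAMMA, EPSILON = 0.32, 0.47  # values reported in Section 4.4

# Hypothetical normalized feature vectors.
print(rbf_kernel([0.1, 0.9], [0.1, 0.9], GAMMA))   # identical points
print(rbf_kernel([0.1, 0.9], [0.9, 0.1], GAMMA))   # distant points
print(epsilon_insensitive_loss(1.0, 0.8, EPSILON))  # inside the tube
print(epsilon_insensitive_loss(1.0, 0.0, EPSILON))  # outside the tube
```

A larger γ makes the kernel decay faster with distance, whereas a larger ε widens the tube within which deviations are not penalized.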

ANN are defined mainly as supervised ML models; they are able to classify unknown data into possible classes based on known data points and a set of inputs. ANN are considered information processing systems that can provide knowledge based on a learning mechanism that is similar to the way humans learn [41,43]. An ANN learns by processing known examples, storing probability-weighted correlations within the data structure of the developed network. The most common architecture of an ANN involves layers of neurons interconnected with a set of correlation weights, which transform the input data into certain outputs by using a complex non-linear function. In our study, the ANN model, a feedforward MultiLayer Perceptron neural network, included three layers of neurons: an input layer, a hidden layer, and an output layer. The input layer had one neuron for each input landslide-related parameter. The hidden layer included a number of neurons that permitted complexities to develop among the input neurons. The output layer included only one neuron that corresponded to the outcome of the process (in our case, classifying an unknown area as landslide or non-landslide). The learning process that an ANN follows involves the estimation of the difference between the prediction of the ANN and the known target outcome. The ANN model adjusts the weighted correlations of each neuron based on this difference and a learning rule. After a number of successive adjustments, the outcomes of the ANN model will be similar to the known target outcome. The learning procedure terminates when a certain accuracy criterion is achieved. The “nnet” R package [78] was used, whereas the main structural parameters that the PSO algorithm tuned were the number of neurons in the hidden layer (size), the weight decay (decay), and the number of training iterations (epochs).
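The three-layer architecture can be illustrated by the forward pass sketched below. The logistic activation, the random initial weights, and the input vector are assumptions made for illustration, and no training is performed; the nine inputs and 22 hidden neurons follow the values reported in Sections 4.3 and 4.4.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Random weights; the last entry of each weight list is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Return the output neuron's value, interpreted as a susceptibility score."""
    hidden, output = network
    h = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
         for ws in hidden]
    return sigmoid(sum(w * v for w, v in zip(output[:-1], h)) + output[-1])


network = init_network(n_inputs=9, n_hidden=22)
sample = [0.5] * 9  # hypothetical normalized parameter values
score = forward(network, sample)
print(score)
```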

3.1.5. Fifth Phase

The first action of the fifth phase involved the implementation of the optimized SVM and ANN models for the construction of the landslide susceptibility maps. The landslide susceptibility maps illustrate the spatial distribution of the probability of an area to manifest landslides. The two maps were reclassified into a five-level map (very low susceptibility, low susceptibility, moderate susceptibility, high susceptibility, and very high susceptibility) using the natural break classification scheme [79]. The next action involved testing and comparing the two models by conducting a receiver operating characteristic (ROC) curve analysis and estimating the area under the ROC curve value (AUC) [11,80]. The final action was the implementation of the Wilcoxon signed-rank test in order to evaluate whether the models produce statistically significantly different outcomes by estimating the p and z values [81]. The null hypothesis was that the two models produced similar outcomes, and it was rejected when the p value was less than the significance level (0.05) and the z value was outside the range of the critical values of z (−1.96 and +1.96) [82,83,84].
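The decision rule of the Wilcoxon signed-rank test can be sketched with the normal approximation shown below. The function names and the paired differences are hypothetical, and the sketch omits the tie and continuity corrections that statistical packages may apply.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def wilcoxon_signed_rank(differences):
    """Return (z, two-sided p) using the normal approximation."""
    d = [x for x in differences if x != 0]  # zero differences are dropped
    n = len(d)
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical paired differences between the outputs of two models.
z, p = wilcoxon_signed_rank([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
reject_null = p < 0.05 and abs(z) > 1.96
print(z, p, reject_null)
```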

3.2. Data

The landslide database included 335 landslides, which according to the Varnes Classification System [85] were classified into three types: rockfalls, translational landslides, and rotational landslides [57,59,86]. The majority were translational and rotational landslides, which mainly occurred in areas that are covered by a thick weathering mantle and within the upper fragmentation zone of the geological formations [57] (Figure 4). Rockfalls were mainly associated with areas that are characterized by steep slopes and are covered by limestone, schist, and chert formations [57]. Based on the results of previous studies concerning the area, in most cases, the combined action of intense rainfall, seismic–tectonic activity, and human activity was identified as the main triggering factor [59,87,88,89]. However, in our study, triggering factors are not included in the analysis, since the purpose is to provide a series of landslide susceptibility maps that by default do not consider a triggering factor. The 335 data point samples were extracted to represent the centroid of each landslide. Only in a few cases of very large landslides was there a need to introduce additional points in order to represent the overall settings of the area. The 335 non-landslide samples were selected following the procedure described in the previous section (Section 3.1). Thus, a final database of 670 landslide and non-landslide samples was formed.

The geo-morphological parameters of elevation, slope angle, slope aspect, plan curvature, profile curvature, curvature, TWI, and SPI were derived using specific geo-processing tools found in the ArcGIS suite from an ASTER GDEM with 30 m resolution [63,84,90]. The lithological cover, hydrological cover, distance from faults, and distance from river network were digitized from the geological and topographic map sheets from IGME and National Topographic Maps that covered the area [91,92]. It is well known that elevation, slope angle, and slope aspect are three geo-morphological parameters that have a significant impact on landslides [93,94] (Figure 5a–c). It is also well observed that areas of high altitude and smooth surface are less prone to landslides than abrupt slopes [95,96]. In the present study, elevation was classified into 5 classes using the natural break classification scheme, whereas slope angles were classified into 5 classes based on expert knowledge. Slope aspect was classified into 8 classes, namely: north (N), northeast (NE), east (E), southeast (SE), south (S), southwest (SW), west (W), and northwest (NW). Curvature, profile curvature, plan curvature, and secondary geo-morphological parameters are also considered significant parameters, since they express local surface relief and complexities, which in most cases have an impact on the erosion process and landslides [93,97] (Figure 5d, Figure 6a,b). In the present study, the three curvature parameters were classified into three categories that represent negative, near zero, and positive values. According to Nefeslioglu et al. [38,98], the SPI parameter expresses the erosive power of surface flow and also the capability of sediment transportation, both processes influencing the landslide manifestation, with high values being evidence for increased erosion risk (Figure 6d) [99].
The TWI parameter can provide the spatial distribution of saturated source areas of runoff generation [100,101] based on the topography of the area (Figure 6c). SPI and TWI values were classified into 5 classes using the natural break method. As already mentioned, seismo-tectonic activity plays an important role in the manifestation of landslides [59,88]. In the present study, the distance from faults was classified into five classes: <150 m, 151–300 m, 301–450 m, 451–600 m, and >601 m (Figure 7a). The spatial distribution of the river network influences the surface runoff and degree of ground water infiltration, which are processes that may create the appropriate conditions to cause landslides [102]. The distance from the river network was classified into five classes: <200 m, 201–400 m, 401–600 m, 601–800 m, and >801 m (Figure 7b). Finally, the hydrological cover was classified into three categories, namely: permeable formations, semi-permeable formations, and impermeable formations (Figure 7c).

4. Results

4.1. First Phase—WofE Analysis

Based on the results of the weighting procedure performed by the WofE method, the lithological cover had the highest correlation, followed by the slope angle (Figure 8). The highest correlation (C = 1.0623) was estimated for the class Volcanic rock and for the 4th class of the slope angle (>31°). Considering the slope angle, a clear increase in the occurrence of landslides has been recorded with increasing slope angle.

A strong influence has also been recorded for the parameter distance from faults (C = 0.6724), for the classes that characterize areas between 201 and 600 m away from faults. A similar influence has been observed for the 4th class (C = 0.6285) of the SPI parameter, whereas areas with elevations between 506 and 804 m exhibit a high value of C (0.5877).

The class of the profile curvature that characterizes upwardly convex surfaces, where the flow of water is decelerated, was positively correlated with the occurrence of landslides to a higher degree (0.2809) than the class that characterizes upwardly concave surfaces, where the flow of water is accelerated (0.1517). In the case of plan curvature, both classes, which characterize surfaces that are laterally convex and laterally concave, show a positive correlation (0.3269 and 0.3152, respectively). The SPI parameter has its highest correlation with the occurrence of landslides in the range of 10–20.

4.2. Second Phase—Multi-Collinearity Analysis

According to the outcomes of the multi-collinearity analysis, there are no multi-collinearity issues among the twelve parameters; thus, the whole set of parameters could be included for further analysis in the next phase (Table 1). Slope angle was the parameter with the smallest TOL value (0.3053) and the highest VIF value (3.2750); however, these values were outside the limits that are considered an indication of collinearity (TOL < 0.1 and VIF > 10).
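As a quick consistency check of the two reported values for slope angle, the standard relation between the two indices can be written as follows, where R_j² denotes the coefficient of determination obtained when parameter j is regressed on the remaining parameters (a symbol introduced here for illustration):

```latex
\mathrm{TOL}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j},
\qquad
\frac{1}{3.2750} \approx 0.3053
```

The reported TOL and VIF values for slope angle are therefore mutually consistent.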

4.3. Third Phase—Feature Selection by GA

Concerning the feature selection procedure based on GA, the maximum number of generations was set to 100 with a population size of 20 individuals, the crossover probability was set to 0.8, the mutation probability was set to 0.1, and elitism was set to 0. These were the default values. During the analysis, the top five selected variables were elevation, distance from river network, slope angle, distance from faults, and lithological cover. On average, seven variables were selected, whereas in the final search using the entire training set, nine parameters were selected at iteration 25. The external performance at this iteration was 0.7508. The nine parameters were elevation, slope angle, profile curvature, plan curvature, TWI, SPI, distance from river network, distance from faults, and lithology, whereas hydrological cover, curvature, and slope aspect were found to be the three least predictive parameters. Overall, approximately 4367 s were needed to give an outcome using a desktop PC with an Intel® Core™ i5-4460 CPU 3.20 GHz processor and 8 GB RAM.

4.4. Fourth Phase—Optimizing SVM and ANN by PSO for Landslide Susceptibility Mapping

The next phase included the optimization procedure based on PSO for the two models, using only the features selected by the previous feature selection procedure. The optimal parameters were as follows: for the SVM model, cost = 5.48, gamma = 0.32, and epsilon = 0.47; for the ANN model, size = 22, decay = 0.089, and maximum iterations = 120. The computational time needed for the optimization procedures was 828 s and 1323 s, respectively.
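The tuning step can be illustrated with a minimal particle swarm optimizer. The sketch below is not the study's implementation: the inertia weight, the acceleration coefficients, the search bounds, and the objective `toy_error` (a smooth surface with an arbitrary minimum, standing in for a resampled model error over two parameters) are all assumptions made for the example.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_error(params):
    """Hypothetical error surface with an arbitrary minimum at (2.0, 0.5)."""
    cost, gamma = params
    return (cost - 2.0) ** 2 / 100.0 + (gamma - 0.5) ** 2

best, err = pso(toy_error, bounds=[(0.01, 10.0), (0.001, 1.0)])
print(best, err)
```

In a real tuning run, the objective would train the model with the candidate parameters on the selected features and return a validation error, which is what makes each evaluation expensive.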

Figure 9 and Figure 10 illustrate the landslide susceptibility maps produced by the implementation of ANN and SVM optimized by GA and PSO. Both maps were classified into a five-level scheme (very low, low, medium, high, and very high) using the Natural Break classification method; details are given in Section 3.1. From a visual inspection, the two maps show a similar spatial distribution; however, the ANN model presents a slightly higher coverage in the very high susceptibility class.

4.5. Fifth Phase—Evaluating the Performance of the SVM and ANN

Concerning the testing phase and the ROC analysis, the learning and predictive performance of both models was evaluated based on the training and testing databases, respectively (Figure 11, Table 2). Based on the training database, the SVM model had a slightly higher AUC value (0.977) than the ANN model (0.969). Based on the testing database, a quite different pattern was recorded: the ANN model presented the highest AUC value (0.800), followed by the SVM model (0.750).
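For readers who want to reproduce the metric, the following minimal sketch computes the AUC through its rank interpretation: the probability that a randomly chosen landslide location receives a higher score than a randomly chosen non-landslide location. The labels and scores are hypothetical; only the four AUC values in the last lines are taken from the results above.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores: 1 = landslide, 0 = non-landslide
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
print(auc(labels, scores))

# Gap between learning and prediction AUC, using the reported values
gap = {"SVM": 0.977 - 0.750, "ANN": 0.969 - 0.800}
print(gap)
```

The gap between training and testing AUC is about 0.23 for SVM and about 0.17 for ANN, which quantifies the smaller loss of performance of the ANN model on unseen data.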

Figure 12a–d illustrates four histograms with the relative frequency and normal distribution of the values produced by the two models for landslide and non-landslide areas, based on the testing dataset. The ANN model showed the best ability to identify potential landslides, with almost 60% of the landslide cases having values over 0.9. On the other hand, SVM gave the best results in the case of non-landslide areas: slightly more than 60% of the cases had values lower than 0.3, indicating non-landslide areas, whereas the ANN model classified about 18% of the non-landslide areas as landslides.

Although the two susceptibility maps seem quite similar on visual inspection, the Wilcoxon signed-rank test showed that the two models gave statistically significantly different results. At the 5% significance level (95% confidence level), the p value was 0.044 (less than 0.05), whereas the z value was 2.01 (outside the range of critical values −1.96 and +1.96). Thus, the difference between the outcomes of the two models is unlikely to be due to chance.
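The relation between the reported z and p values can be checked with a short sketch. It assumes the large-sample normal approximation of the Wilcoxon signed-rank statistic, without tie or continuity correction in the variance; the function names and the paired example values are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def wilcoxon_signed_rank_z(a, b):
    """z statistic of the signed-rank test (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Group tied absolute differences and give them their average rank
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return (w_plus - mean_w) / sd_w

# The reported z value of 2.01 corresponds to a two-sided p of about 0.044.
print(round(two_sided_p(2.01), 3))

# Hypothetical paired susceptibility values from two models
z = wilcoxon_signed_rank_z([0.9, 0.7, 0.6, 0.8], [0.5, 0.5, 0.5, 0.5])
print(z, two_sided_p(z))
```

Under this approximation, |z| > 1.96 and p < 0.05 are equivalent statements, so the two reported criteria express the same decision.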

5. Discussion

Despite the large number of studies concerning landslide susceptibility and hazard modeling assessments, there are still no guidelines about which method or technique is the most appropriate [2,17,93]. According to Van Westen et al. [103], every study area has its own unique set of landslide-related parameters, which affect the probability of a landslide to different degrees. Furthermore, no framework exists for the selection of the most appropriate parameters, making the whole process of landslide assessment a very difficult and complex task. Based on the above, no safe conclusion with universal validity can be drawn as to the effectiveness of a method or technique. However, there is increasing confidence that ML methods provide models with higher predictive performance than other statistical or knowledge-based methods. Two main advantages of ML models are that they efficiently minimize uncertainties and have a greater tolerance for noisy or incomplete data [17,104].

Turning to our study, although the multi-collinearity analysis showed that all parameters could be used for analysis (since the parameters showed TOL values greater than 0.1 and VIF values less than 10), the feature selection procedure excluded three parameters: slope aspect, curvature, and hydrological cover. This may be attributed to the more even spatial distribution of landslide locations within the generated classes of the three parameters, which gives these parameters a lower discriminative ability. Another probable explanation could be the presence of spatial autocorrelation; in our case, landslide and non-landslide locations have similar values in those three parameters. Our findings are in accordance with previous studies that indicate that the exclusion of certain parameters may be due to spatial autocorrelation and data redundancy among the landslide-related parameters [4]. However, it must be mentioned that slope aspect, hydrological cover, and curvature are significant parameters that may have a great impact on the evolution of landslides [2,93]. Slope aspect, for example, defines the impact of sun exposure and rainfall, indirectly controlling surface erosion and slope stability, which can vary greatly even at distances shorter than the resolution of the model (30 m). Although feature selection procedures assist in producing more accurate prediction models, they should be used with caution, and one must keep in mind that transferring the knowledge gained in one study area to another may not be feasible in all cases.

Based on the WofE method, concerning the lithological cover, the highest values were found in areas covered by volcanic rocks, fine-grained deposits, and flysch formations. Similar findings have been reported by other researchers [57,59]. It seems that the fine-grained sediments, composed of mixed layers of clayey marls, marls, silty sands, and weak sandstones, are more prone to landslides. This could be attributed to the heterogeneous structure and the high degree of looseness that characterize this type of formation [59]. Similarly, the flysch formations that are found within the research area are prone to rotational and translational slides. Flysch formations are characterized by a varying and anisotropic geotechnical behavior, intensely folded sediments, and a thick weathering mantle, which are characteristics that significantly influence the evolution of landslides [87,89]. It is also well documented that slope angle is one of the most critical parameters for slope stability [33]. Slope angle influences the shear and normal stresses that develop on the surfaces of discontinuities, with a steeper slope characterized by a higher shear stress and a lower safety factor [50]. In our study, higher slope angles (greater than 30°) showed higher C values; therefore, they contributed to a much higher degree to landslide susceptibility.

Although the two models gave similar results and their difference in predictive performance is about 5%, the final map that should be used is the one with the highest performance. As several previous studies have reported, even a 1 or 2% increment of the prediction accuracy could significantly affect the resulting landslide susceptibility zones [11,16,105], making it necessary to predict these zones with a high-performance model. In addition, our study highlighted the difference in the way each model classified the non-landslide areas, a variation that may be attributed to the learning and prediction procedures each model follows but also to the identification of the non-landslide areas. Similar findings are reported by other researchers, who highlight the important task of identifying non-landslide areas. In our case, non-landslide areas may involve data characterized by some degree of “noise” that is more evenly distributed within the classes of each parameter, and the SVM model may be more tolerant of such data. Comparing the performance of each model, the ANN model shows a more stable performance, with a lower variation between learning and predictive accuracy and a higher prediction accuracy. A similar performance has been reported by several researchers in the field of natural hazard assessments, who assume that ANN is more capable of solving non-linear and complex problems in comparison to other ML models, such as SVM and tree-based models [106,107]. Tien Bui et al. [82] used a Multi-Layer Perceptron model (a neural-based model) to construct a landslide susceptibility map, and it outperformed several other ML models, among which was an SVM model, based precisely on this ability. Overall, more studies that use other ML methods are required under different settings and conditions so that one can draw safe conclusions about the superiority of one ML method over another.

Another issue that has a significant impact on landslide assessments is that during the selection of a predictive model, there is a need to understand its abilities and limitations. In our case, despite its very good performance, the optimized ANN model was not capable of providing an estimate of the contribution each parameter had on the classification process. In general, ANN models are known as black-box models, which provide little information concerning the learning and predictive process or how conclusions can be obtained from the model’s outcomes [108,109]. The implementation of our methodology comes with some drawbacks and limitations. First of all, it required high computational power and a significant amount of computational time to obtain the final result. Both may be a serious drawback when results are needed immediately. Rewriting parts of the code or dividing the area into smaller parts and applying the methods in each part may help mitigate some of the above limitations. Another issue that may have an impact on the overall performance of the applied methods is the selection of the appropriate values for the structural parameters of the evolutionary algorithms that are used for feature selection and tuning. The default values defined by the development teams of the various R packages that were used in our study were applied. However, these may influence the outcomes. Specifically, the parameter elitism in GA, which allows the best individual in the overall optimization to survive until the end of the evolution, was set to 0 in our study. As a result, there is a chance that the best individual at a given generation may be replaced by a new offspring or may be selected for mutation; hence, it can be lost. The usage of grid search techniques may help with the important task of setting the appropriate structural parameters of the evolutionary algorithms. This could be an interesting topic for future work.
Finally, neither of the two models gave any explanation concerning the influence that each parameter had on the overall landslide susceptibility. The implementation of methods that provide such information may help establish a better understanding of the relation between the landslide parameters and landslide susceptibility. However, a clear importance value would probably be useful only for our study area, since the underlying mechanism responsible for landslide manifestation is rather complicated and site specific.

6. Conclusions

In the present study, an advanced approach for landslide susceptibility mapping was developed that involved the usage of evolutionary algorithms for feature selection and for tuning the structural parameters of ML models. Two ML models were evaluated, SVM and ANN. The results of our study indicate that both models performed very well, achieving high learning and predictive accuracy, with the ANN model having a more stable performance and a higher predictive accuracy (0.800 AUC against 0.750 AUC for SVM). The results illustrate a difference in the way that the non-landslide areas were classified, which may be attributed to the different ways in which SVM and ANN learn and predict, but also to the identification procedure that was followed concerning the non-landslide areas. The ANN model classified more unknown sites as landslides than SVM. Finally, it must be highlighted that the knowledge gained regarding the methodological approach and the application of the method in the Achaia Regional Unit may assist both the scientific community and the local and government authorities. The scientific community may adopt the approach, process the raw data in a specific probabilistic manner, introduce a feature selection procedure, and optimize and tune the predictive models, whereas the local and government authorities could include the findings of our study in any forthcoming landslide management plan.

Author Contributions

All authors have read and agreed to the published version of the manuscript. W.C., Y.C., P.T., I.I. and X.W. contributed equally to this work. P.T. and I.I. collected field data, conducted the modeling and wrote the manuscript. W.C., Y.C. and X.W. provided critical comments in planning of this paper and edited the manuscript. W.C., Y.C., P.T., I.I. and X.W. contributed to the revision of the manuscript. All authors discussed the results and edited the manuscript.

Funding

This research was funded by the Innovation Capability Support Program of Shaanxi (Program No. 2020KJXX-005).

Acknowledgments

We would like to express our thanks to anonymous reviewers.

Conflicts of Interest

The authors declare no conflict of interest.

References

Feizizadeh, B.; Blaschke, T. An uncertainty and sensitivity analysis approach for GIS-based multicriteria landslide susceptibility mapping. Int. J. Geogr. Inf. Sci. 2014, 28, 610–638.
Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth Sci. Rev. 2018, 180, 60–91.
Kirschbaum, D.; Stanley, T.; Zhou, Y. Spatial and temporal analysis of a global landslide catalog. Geomorphology 2015, 249, 4–15.
Zare, M.; Pourghasemi, H.R.; Vafakhah, M.; Pradhan, B. Landslide susceptibility mapping at Vaz Watershed (Iran) using an artificial neural network model: A comparison between multilayer perceptron (MLP) and radial basic function (RBF) algorithms. Arab. J. Geosci. 2013, 6, 2873–2888.
Hong, H.; Tsangaratos, P.; Ilia, I.; Loupasakis, C.; Wang, Y. Introducing a novel multi-layer perceptron network based on stochastic gradient descent optimized by a meta-heuristic algorithm for landslide susceptibility mapping. Sci. Total Environ. 2020, 742, 140549.
Korup, O.; Stolle, A. Landslide prediction from machine learning. Geol. Today 2014, 30, 26–33.
Pham, B.T.; Bui, D.T.; Prakash, I. Bagging based Support Vector Machines for spatial prediction of landslides. Environ. Earth Sci. 2018, 77, 146.
Ayalew, L.; Yamagishi, H. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 2005, 65, 15–31.
Pradhan, B.; Lee, S. Landslide susceptibility assessment and factor effect analysis: Backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environ. Model. Softw. 2010, 25, 747–759.
Pradhan, B. Landslide susceptibility mapping of a catchment area using frequency ratio, fuzzy logic and multivariate logistic regression approaches. J. Indian Soc. Remote Sens. 2010, 38, 301–320.
Pham, B.T.; Pradhan, B.; Bui, D.T.; Prakash, I.; Dholakia, M. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of Uttarakhand area (India). Environ. Model. Softw. 2016, 84, 240–250.
Zhou, C.; Yin, K.; Cao, Y.; Ahmed, B.; Li, Y.; Catani, F.; Pourghasemi, H.R. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China. Comput. Geosci. 2018, 112, 23–37.
Zêzere, J.L.; Pereira, S.; Melo, R.; Oliveira, S.C.; Garcia, R.A.C. Mapping landslide susceptibility using data-driven methods. Sci. Total Environ. 2017, 589, 250–267.
Pham, B.T.; Prakash, I.; Singh, S.K.; Shirzadi, A.; Shahabi, H.; Bui, D.T. Landslide susceptibility modeling using Reduced Error Pruning Trees and different ensemble techniques: Hybrid machine learning approaches. Catena 2019, 175, 203–218.
Fell, R.; Corominas, J.; Bonnard, C.; Cascini, L.; Leroi, E.; Savage, W.Z. Guidelines for landslide susceptibility, hazard and risk zoning for land use planning. Eng. Geol. 2008, 102, 85–98.
Pourghasemi, H.R.; Bui, D.T. Prediction of the landslide susceptibility: Which algorithm, which precision? Catena 2018, 162, 177–192.
Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth Sci. Rev. 2020, 207, 103225.
Chen, X.; Chen, W. GIS-based landslide susceptibility assessment using optimized hybrid machine learning methods. Catena 2021, 196, 104833.
Zhao, X.; Chen, W. Optimization of Computational Intelligence Models for Landslide Susceptibility Evaluation. Remote Sens. 2020, 12, 2180.
Chen, W.; Panahi, M.; Pourghasemi, H.R. Performance evaluation of GIS-based new ensemble data mining techniques of adaptive neuro-fuzzy inference system (ANFIS) with genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO) for landslide spatial modelling. Catena 2017, 157, 310–324.
Wang, Y.; Feng, L.; Li, S.; Ren, F.; Du, Q. A hybrid model considering spatial heterogeneity for landslide susceptibility mapping in Zhejiang Province, China. Catena 2020, 188, 104425.
Lai, J.-S.; Tsai, F. Improving GIS-based Landslide Susceptibility Assessments with Multi-temporal Remote Sensing and Machine Learning. Sensors 2019, 19, 3717.
Goetz, J.N.; Brenning, A.; Petschko, H.; Leopold, P. Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Comput. Geosci. 2015, 81, 1–11.
Chalkias, C.; Ferentinou, M.; Polykretis, C. GIS Supported Landslide Susceptibility Modeling at Regional Scale: An Expert-Based Fuzzy Weighting Method. ISPRS Int. J. Geo-Inf. 2014, 3, 523–539.
Ghorbanzadeh, O.; Rostamzadeh, H.; Blaschke, T.; Gholaminia, K.; Aryal, J. A new GIS-based data mining technique using an adaptive neuro-fuzzy inference system (ANFIS) and k-fold cross-validation approach for land subsidence susceptibility mapping. Nat. Hazards 2018, 94, 497–517.
Tsangaratos, P.; Loupasakis, C.; Nikolakopoulos, K.; Angelitsa, V.; Ilia, I. Developing a landslide susceptibility map based on remote sensing, fuzzy logic and expert knowledge of the Island of Lefkada, Greece. Environ. Earth Sci. 2018, 77, 363.
Roodposhti, M.S.; Aryal, J.; Shahabi, H.; Safarrad, T. Fuzzy Shannon Entropy: A Hybrid GIS-Based Landslide Susceptibility Mapping Method. Entropy 2016, 18, 343.
Moharrami, M.; Naboureh, A.; Nachappa, T.G.; Ghorbanzadeh, O.; Guan, X.; Blaschke, T. National-Scale Landslide Susceptibility Mapping in Austria Using Fuzzy Best-Worst Multi-Criteria Decision-Making. ISPRS Int. J. Geo-Inf. 2020, 9, 393.
Mehrabi, M.; Pradhan, B.; Moayedi, H.; Alamri, A. Optimizing an Adaptive Neuro-Fuzzy Inference System for Spatial Prediction of Landslide Susceptibility Using Four State-of-the-art Metaheuristic Techniques. Sensors 2020, 20, 1723.
Tan, X.-H.; Bi, W.-H.; Hou, X.-L.; Wang, W. Reliability analysis using radial basis function networks and support vector machines. Comput. Geotech. 2011, 38, 178–186.
Yu, X.; Wang, Y.; Niu, R.; Hu, Y. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China. Int. J. Environ. Res. Public Health 2016, 13, 487.
Pourghasemi, H.R.; Jirandeh, A.G.; Pradhan, B.; Xu, C.; Gokceoglu, C. Landslide susceptibility mapping using support vector machine and GIS at the Golestan Province, Iran. J. Earth Syst. Sci. 2013, 122, 349–369.
Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856.
Tsangaratos, P.; Ilia, I. Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Perfection, Greece. Landslides 2016, 13, 305–320.
Shirzadi, A.; Bui, D.T.; Pham, B.T.; Solaimani, K.; Chapi, K.; Kavian, A.; Shahabi, H.; Revhaug, I. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 2017, 76, 60.
Nguyen, V.V.; Pham, B.T.; Vu, B.T.; Prakash, I.; Jha, S.; Shahabi, H.; Shirzadi, A.; Ba, D.N.; Kumar, R.; Chatterjee, J.M.; et al. Hybrid Machine Learning Approaches for Landslide Susceptibility Modeling. Forests 2019, 10, 157.
Dou, J.; Yunus, A.P.; Bui, D.T.; Merghadi, A.; Sahana, M.; Zhu, Z.; Chen, C.-W.; Han, Z.; Pham, B.T. Improved landslide assessment using support vector machine with bagging, boosting, and stacking ensemble machine learning framework in a mountainous watershed, Japan. Landslides 2020, 17, 641–658.
Nefeslioglu, H.A.; Gokceoglu, C.; Sonmez, H. An assessment on the use of logistic regression and artificial neural networks with different sampling strategies for the preparation of landslide susceptibility maps. Eng. Geol. 2008, 97, 171–191.
Zhao, X.; Chen, W. GIS-Based Evaluation of Landslide Susceptibility Models Using Certainty Factors and Functional Trees-Based Ensemble Techniques. Appl. Sci. 2019, 10, 16.
Li, Y.; Chen, W. Landslide Susceptibility Evaluation Using Hybrid Integration of Evidential Belief Function and Machine Learning Techniques. Water 2020, 12, 113.
Yilmaz, I. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: Conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ. Earth Sci. 2009, 61, 821–836.
Huang, F.; Yin, K.; Huang, J.; Gui, L.; Wang, P. Landslide susceptibility mapping based on self-organizing-map network and extreme learning machine. Eng. Geol. 2017, 223, 11–22.
Shahabi, H.; Yamagishi, H.; Zhu, Z.; Yunus, A.P.; Chen, C.-W. A Comparative Study of the Binary Logistic Regression (BLR) and Artificial Neural Network (ANN) Models for GIS-Based Spatial Predicting Landslides at a Regional Scale—TXT-tool 1.081-6.1. In Landslide Dynamics: ISDR-ICL Landslide Interactive Teaching Tools; Springer: Berlin/Heidelberg, Germany, 2018; Volume 1, pp. 139–151.
Mutlu, B.; Nefeslioglu, H.A.; Sezer, E.A.; Akcayol, M.A.; Gokceoglu, C. An Experimental Research on the Use of Recurrent Neural Networks in Landslide Susceptibility Mapping. ISPRS Int. J. Geo-Inf. 2019, 8, 578.
Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.R.; Tiede, D.; Aryal, J. Evaluation of Different Machine Learning Methods and Deep-Learning Convolutional Neural Networks for Landslide Detection. Remote Sens. 2019, 11, 196.
Wang, G.; Lei, X.; Chen, W.; Shahabi, H.; Shirzadi, A. Hybrid Computational Intelligence Methods for Landslide Susceptibility Mapping. Symmetry 2020, 12, 325.
Bach, F. Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res. 2017, 18, 629–681.
Bui, D.T.; Pham, B.T.; Nguyen, Q.P.; Hoang, N.-D. Spatial prediction of rainfall-induced shallow landslides using hybrid integration approach of Least-Squares Support Vector Machines and differential evolution optimization: A case study in Central Vietnam. Int. J. Digit. Earth 2016, 9, 1077–1097.
Gu, J.; Zhu, M.; Jiang, L. Housing price forecasting based on genetic algorithm and support vector machine. Expert Syst. Appl. 2011, 38, 3383–3386.
Nourani, V.; Pradhan, B.; Ghaffari, H.; Sharifi, S.S. Landslide susceptibility mapping at Zonouz Plain, Iran using genetic programming and comparison with frequency ratio, logistic regression, and artificial neural network models. Nat. Hazards 2013, 71, 523–547.
Nguyen, H.; Mehrabi, M.; Kalantar, B.; Moayedi, H.; Abdullahi, M.M. Potential of hybrid evolutionary approaches for assessment of geo-hazard landslide susceptibility mapping. Geomat. Nat. Hazards Risk 2019, 10, 1667–1693.
Jaafari, A.; Panahi, M.; Pham, B.T.; Shahabi, H.; Bui, D.T.; Rezaie, F.; Lee, S. Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility. Catena 2019, 175, 430–445.
Li, L.; Liu, R.; Pirasteh, S.; Chen, X.; He, L.; Li, J. A novel genetic algorithm for optimization of conditioning factors in shallow translational landslides and susceptibility mapping. Arab. J. Geosci. 2017, 10, 209.
Paryani, S.; Neshat, A.; Javadi, S.; Pradhan, B. Comparative performance of new hybrid ANFIS models in landslide susceptibility mapping. Nat. Hazards 2020, 103, 1961–1988.
Kavzoglu, T.; Sahin, E.K.; Colkesen, I. Selecting optimal conditioning factors in shallow translational landslide susceptibility mapping using genetic algorithm. Eng. Geol. 2015, 192, 101–112.
Bui, D.T.; Pradhan, B.; Nampak, H.; Bui, Q.-T.; Tran, Q.-A.; Nguyen, Q.-P. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. 2016, 540, 317–330.
Tsangaratos, P.; Ilia, I. Applying Machine Learning Algorithms in Landslide Susceptibility Assessments. In Handbook of Neural Computation; Elsevier: Amsterdam, The Netherlands, 2017; pp. 433–457.
Jones, P.; Harris, I. Climatic Research Unit (CRU) Time-Series Datasets of Variations in Climate with Variations in Other Phenomena; NCAS British Atmospheric Data Centre: Leeds, UK, 2008; p. 15.
Rozos, D. Engineering-Geological Conditions in the Achaia County. Geomechanical Characteristics of the Plio-Pleistocene Sediments. Ph.D. Thesis, University of Patras, Patras, Greece, 1989.
Brunn, J.H. Contribution à L’étude Géologique du Pinde Septentrional et D’une Partie de la Macédoine Occidentale. Ann. Géolog. Pays Helléniques 1956, 7, 1–358.
Aubouin, J. Contribution à L’étude Géologique de la Grèce Septentrionale: Les Confins de L’epire et de la Thessalie. Ann. Géolog. Pays Helléniques 1959, 10, 1–483.
Kuhn, M.; Wing, J.; Weston, S. Caret: Classification and Regression Training. R Package Version 6.0-77. 2017. Available online: https://cran.microsoft.com/snapshot/2017-09-17/web/packages/caret/index.html (accessed on 14 October 2020).
Ryden, K. Environmental Systems Research Institute Mapping. Am. Cartogr. 1987, 14, 261–263.
Kavoura, K.; Konstantopoulou, M.; Depountis, N.; Sabatakakis, N. Slow-moving landslides: Kinematic analysis and movement evolution modeling. Environ. Earth Sci. 2020, 79, 1–11.
Lee, S.; Choi, J.; Woo, I. The effect of spatial resolution on the accuracy of landslide susceptibility mapping: A case study in Boun, Korea. Geosci. J. 2004, 8, 51–60.
Regmi, N.R.; Giardino, J.R.; Vitek, J.D. Modeling susceptibility to landslides using the weight of evidence approach: Western Colorado, USA. Geomorphology 2010, 115, 172–187.
Ozdemir, A.; Altural, T. A comparative study of frequency ratio, weights of evidence and logistic regression methods for landslide susceptibility mapping: Sultan Mountains, SW Turkey. J. Asian Earth Sci. 2013, 64, 180–197.
Bonham-Carter, G.F.; Agterberg, F.P.; Wright, D.F. Weights of evidence modelling: A new approach to mapping mineral potential. Stat. Appl. Earth Sci. 1989, 171–183.
Agterberg, F.; Bonham-Carter, G. Weights of Evidence Modeling And Weighted Logistic Regression For Mineral Potential Mapping. Comput. Geol. 1993, 25, 13–32.
Zou, Z.; Yang, Y.; Fan, Z.; Tang, H.; Zou, M.; Hu, X.; Xiong, C.; Ma, J. Suitability of data preprocessing methods for landslide displacement forecasting. Stoch. Environ. Res. Risk Assess. 2020, 34, 1105–1119.
Guns, M.; Vanacker, V. Logistic regression applied to natural hazards: Rare event logistic regression with replications. Nat. Hazards Earth Syst. Sci. 2012, 12, 1937–1947.
Lei, X.; Chen, W.; Avand, M.; Janizadeh, S.; Kariminejad, N.; Shahabi, H.; Costache, R.-D.; Shahabi, H.; Shirzadi, A.; Mosavi, A. GIS-Based Machine Learning Algorithms for Gully Erosion Susceptibility Mapping in a Semi-Arid Region of Iran. Remote Sens. 2020, 12, 2478.
Holland, J.H. Genetic Algorithms and Adaptation. In Adaptive Control of Ill-Defined Systems; Springer: Berlin/Heidelberg, Germany, 1984; pp. 317–333.
Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948.
Moguerza, J.M.; Muñoz, A. Support Vector Machines with Applications. Stat. Sci. 2006, 21, 322–336.
Cherkassky, V.; Mulier, F.M. Learning from Data: Concepts, Theory, and Methods; John Wiley & Sons: Hoboken, NJ, USA, 2007.
Tan, P.L.; Tan, S.C.; Lim, C.P.; Khor, S.E. A Modified Two-Stage Svm-Rfe Model for Cancer Classification Using Microarray Data. In Proceedings of the International Conference on Neural Information Processing, Shanghai, China, 14–17 November 2011; pp. 668–675.
Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S; Springer: Berlin/Heidelberg, Germany, 2002.
Chen, W.; Chen, X.; Peng, J.; Panahi, M.; Lee, S. Landslide susceptibility modeling based on ANFIS with teaching-learning-based optimization and Satin bowerbird optimizer. Geosci. Front. 2021, 12, 93–107.
Chen, W.; Zhao, X.; Tsangaratos, P.; Dou, J.; Ilia, I.; Xue, W.; Wang, X.; Bin Ahmad, B. Evaluating the usage of tree-based ensemble methods in groundwater spring potential mapping. J. Hydrol. 2020, 583, 124602.
Wilcoxon, F. Individual Comparisons by Ranking Methods. Biom. Bull. 1945, 1, 80. [Google Scholar] [CrossRef]
Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378. [Google Scholar] [CrossRef]
Chen, W.; Li, Y. GIS-based evaluation of landslide susceptibility using hybrid computational intelligence models. Catena 2020, 195, 104777. [Google Scholar] [CrossRef]
Lei, X.; Chen, W.; Pham, B.T. Performance Evaluation of GIS-Based Artificial Intelligence Approaches for Landslide Susceptibility Modeling and Spatial Patterns Analysis. ISPRS Int. J. Geo-Inf. 2020, 9, 443. [Google Scholar] [CrossRef]
Varnes, D. IAEG Commission on Landslides and Other Mass Movements, Landslide Hazard Zonation: A Review of Principles and Practice; The UNESCO Press: Paris, France, 1984; p. 63. [Google Scholar]
Polykretis, C.; Chalkias, C. Comparison and evaluation of landslide susceptibility maps obtained from weight of evidence, logistic regression, and artificial neural network models. Nat. Hazards 2018, 93, 249–274. [Google Scholar] [CrossRef]
Koukis, G.; Rozos, D.; Hadzinakos, I. Relationship Between Rainfall and Landslides in the Formations of Achaia County, Greece. In Engineering Geology and the Environment; CRC Press: Boca Raton, FL, USA, 1997; pp. 793–798. [Google Scholar]
Koukouvelas, I.; Doutsos, T. The Effects of Active Faults on the Generation of Landslides in NW Peloponnese, Greece. In Engineering Geology and the Environment; CRC Press: Boca Raton, FL, USA, 1997; pp. 799–804.
Tsagas, D. Geomorphological Investigation and Mass Movements in Northern Peloponnese: Area of Xylokastro-Diakofto. Ph.D. Thesis, University of Athens, Athens, Greece, 2011; p. 361.
NASA; Japan Space Systems; US/Japan Aster Science Team. ASTER Global Digital Elevation Model V003; Data Set; NASA: Washington, DC, USA, 2009.
IGME. Geological Map of Greece, at a Scale of 1:50,000; Patras Sheet; IGME: Athens, Greece, 1980; Available online: https://shop.geospatial.com/product/03-GRAC-Greece-50000-Geological-Maps (accessed on 14 October 2020).
IGME. Geological Map of Greece, at a Scale of 1:50,000; Aigion Sheet; IGME: Athens, Greece, 2005; Available online: https://shop.geospatial.com/product/03-GRAC-Greece-50000-Geological-Maps (accessed on 14 October 2020).
Pourghasemi, H.R.; Yansari, Z.T.; Panagos, P.; Pradhan, B. Analysis and evaluation of landslide susceptibility: A review on articles published during 2005–2016 (periods of 2005–2012 and 2013–2016). Arab. J. Geosci. 2018, 11, 193.
Pham, B.T.; Bui, D.T.; Pourghasemi, H.R.; Indra, P.; Dholakia, M.B. Landslide susceptibility assessment in the Uttarakhand area (India) using GIS: A comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees methods. Theor. Appl. Climatol. 2017, 128, 255–273.
Dai, F.; Lee, C.; Ngai, Y. Landslide risk assessment and management: An overview. Eng. Geol. 2002, 64, 65–87.
Dehnavi, A.; Aghdam, I.N.; Pradhan, B.; Varzandeh, M.H.M. A new hybrid model using step-wise weight assessment ratio analysis (SWARA) technique and adaptive neuro-fuzzy inference system (ANFIS) for regional landslide hazard assessment in Iran. Catena 2015, 135, 122–148.
Oh, H.-J.; Pradhan, B. Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Comput. Geosci. 2011, 37, 1264–1276.
Nefeslioglu, H.A.; Sezer, E.; Gokceoglu, C.; Bozkir, A.S.; Duman, T.Y. Assessment of Landslide Susceptibility by Decision Trees in the Metropolitan Area of Istanbul, Turkey. Math. Probl. Eng. 2010, 2010, 1–15.
Conoscenti, C.; Ciaccio, M.; Caraballo-Arias, N.A.; Gómez-Gutiérrez, Á.; Rotigliano, E.; Agnesi, V. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: A case of the Belice River basin (western Sicily, Italy). Geomorphology 2015, 242, 49–64.
Moore, I.D.; Grayson, R.B.; Ladson, A.R. Digital terrain modelling: A review of hydrological, geomorphological, and biological applications. Hydrol. Process. 1991, 5, 3–30.
Wilson, J.; Gallant, J. Digital Terrain Analysis. In Terrain Analysis: Principles and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2000; Volume 479, pp. 1–27.
Wang, G.; Chen, X.; Chen, W. Spatial Prediction of Landslide Susceptibility Based on GIS and Discriminant Functions. ISPRS Int. J. Geo-Inf. 2020, 9, 144.
Van Westen, C.; Van Asch, T.W.; Soeters, R. Landslide hazard and risk zonation—Why is it still so difficult? Bull. Eng. Geol. Environ. 2006, 65, 167–184.
Arabameri, A.; Saha, S.; Roy, J.; Tiefenbacher, J.P.; Cerdà, A.; Biggs, T.W.; Pradhan, B.; Pham, T.-D.; Collins, A.L. A novel ensemble computational intelligence approach for the spatial prediction of land subsidence susceptibility. Sci. Total Environ. 2020, 726, 138595.
Jebur, M.N.; Pradhan, B.; Tehrany, M.S. Optimization of landslide conditioning factors using very high-resolution airborne laser scanning (LiDAR) data at catchment scale. Remote Sens. Environ. 2014, 152, 150–165.
Feng, H.; Yu, J.; Zheng, J.; Tang, X.; Peng, C. Evaluation of different models in rainfall-triggered landslide susceptibility mapping: A case study in Chunan, southeast China. Environ. Earth Sci. 2016, 75, 1399.
Luo, X.; Lin, F.; Zhu, S.; Yu, M.; Zhang, Z.; Meng, L.; Peng, J. Mine landslide susceptibility assessment using IVM, ANN and SVM models considering the contribution of affecting factors. PLoS ONE 2019, 14, e0215134.
Gómez, H.; Kavzoglu, T. Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng. Geol. 2005, 78, 11–27.
García-Rodríguez, M.J.; Malpica, J.A. Assessment of earthquake-triggered landslide susceptibility in El Salvador based on an Artificial Neural Network model. Nat. Hazards Earth Syst. Sci. 2010, 10, 1307–1315.

Figure 1. Study area.

Figure 2. Spatial distribution of geological units within the study area.

Figure 3. Flowchart of the applied approach.

Figure 4. Characteristic landslides in the study area (a) translational landslide, (b) rotational landslide—weathering mantle.

Figure 5. Landslide-related parameters: (a) Elevation; (b) Slope angle; (c) Aspect; (d) Curvature.

Figure 6. Landslide-related parameters: (a) Plan curvature; (b) Profile curvature; (c) Topographic wetness index (TWI); (d) Stream power index (SPI).

Figure 7. Landslide-related parameters: (a) Distance from faults; (b) Distance from river network; (c) Hydrological cover.

Figure 8. Weights of each class of each parameter by Weight of Evidence (WofE). Higher weights indicate greater susceptibility and are colored red; lower weights indicate lower susceptibility and are colored green.

Figure 9. Landslide susceptibility estimated by the artificial neural network (ANN) model optimized by genetic algorithm (GA) and particle swarm optimization (PSO).

Figure 10. Landslide susceptibility estimated by the support vector machine (SVM) model optimized by GA and PSO.

Figure 11. Receiver operating characteristic (ROC) analysis and ROC curves for the: (a) ANN model based on the training subset; (b) ANN model based on the test subset; (c) SVM model based on the training subset; (d) SVM model based on the test subset.

Figure 12. Percentage of relative frequency of landslide susceptibility values based on the testing subset. (a) ANN model for the landslide locations; (b) ANN model for the non-landslide locations; (c) SVM model for the landslide locations; (d) SVM model for the non-landslide locations.

Table 1. Multi-collinearity analysis, tolerance and variance inflation factor.

Landslide Related Parameters	Tolerance (TOF)	Variance Inflation Factor (VIF)
Elevation	0.6526	1.5321
Slope angle	0.3053	3.2750
Slope aspect	0.9742	1.0264
Curvature	0.6027	1.6591
Plan curvature	0.7013	1.4257
Profile curvature	0.7065	1.4152
TWI	0.5408	1.8490
SPI	0.5909	1.6920
Distance from river network	0.9646	1.0366
Distance from faults	0.8808	1.1352
Lithological cover	0.8622	1.1597
Hydrological cover	0.9606	1.0409
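
The two columns of Table 1 are tied by the standard relation VIF = 1/tolerance. The short Python sketch below recomputes VIF from the tabulated tolerance values and screens the parameters against cut-offs. The cut-offs used here (tolerance below 0.1 or VIF above 10) are a commonly quoted rule of thumb and are an assumption of this sketch, as are the names `vif_from_tolerance`, `TOL_LIMIT`, and `VIF_LIMIT`.

```python
# Tolerance values copied from Table 1.
tolerance = {
    "Elevation": 0.6526,
    "Slope angle": 0.3053,
    "Slope aspect": 0.9742,
    "Curvature": 0.6027,
    "Plan curvature": 0.7013,
    "Profile curvature": 0.7065,
    "TWI": 0.5408,
    "SPI": 0.5909,
    "Distance from river network": 0.9646,
    "Distance from faults": 0.8808,
    "Lithological cover": 0.8622,
    "Hydrological cover": 0.9606,
}

# Assumed rule-of-thumb cut-offs (not taken from the table itself).
TOL_LIMIT = 0.1
VIF_LIMIT = 10.0


def vif_from_tolerance(tol: float) -> float:
    """Return the variance inflation factor for a tolerance value (VIF = 1 / TOL)."""
    return 1.0 / tol


vif = {name: vif_from_tolerance(tol) for name, tol in tolerance.items()}

# Parameters that would be flagged as collinear under the assumed cut-offs.
flagged = [
    name
    for name, tol in tolerance.items()
    if tol < TOL_LIMIT or vif[name] > VIF_LIMIT
]

# Print parameters from the highest to the lowest VIF.
for name, value in sorted(vif.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<28} TOL={tolerance[name]:.4f}  VIF={value:.4f}")
print("Flagged parameters:", flagged)
```

The recomputed values agree with the tabulated VIF column to about three decimal places; small differences come from the tolerances being rounded to four digits. Under the assumed cut-offs no parameter is flagged, and slope angle carries the highest VIF.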

Table 2. Training and testing statistical metrics.

	ANN (Training)	ANN (Testing)	SVM (Training)	SVM (Testing)
Area under the ROC curve (AUC)	0.969	0.800	0.977	0.750
Standard Error	0.0067	0.0316	0.0069	0.0351
95% Confidence Interval	0.949–0.983	0.738–0.853	0.959–0.988	0.684–0.808
z statistic	69.583	9.507	68.497	7.125
Significance level p (Area = 0.5)	<0.0001	<0.0001	<0.0001	<0.0001
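
The z statistic and significance level in Table 2 can be approximated from the AUC and its standard error. The sketch below assumes a normal-approximation test of the AUC against the null value of 0.5, z = (AUC − 0.5)/SE, with a two-sided p-value; the software and exact test behind the table are not stated here, so this is an illustrative check only. The function names `z_statistic` and `two_sided_p` are hypothetical.

```python
import math

# (AUC, standard error, z statistic as printed) copied from Table 2.
table2 = {
    "ANN (Training)": (0.969, 0.0067, 69.583),
    "ANN (Testing)": (0.800, 0.0316, 9.507),
    "SVM (Training)": (0.977, 0.0069, 68.497),
    "SVM (Testing)": (0.750, 0.0351, 7.125),
}


def z_statistic(auc: float, se: float, null_auc: float = 0.5) -> float:
    """z = (AUC - null AUC) / SE, assuming a normal approximation."""
    return (auc - null_auc) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value of z under a standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


results = {}
for name, (auc, se, z_printed) in table2.items():
    z = z_statistic(auc, se)
    results[name] = (z, two_sided_p(z))
    print(f"{name:<15} z={z:7.3f} (table: {z_printed:7.3f})  p={results[name][1]:.2e}")
```

The recomputed z values differ from the printed ones by about 1% or less, which is consistent with the AUC and standard error being rounded in the table. In all four cases the p-value falls far below 0.0001.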

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
