Combination of Metaheuristic Optimization Algorithms and Machine Learning Methods for Groundwater Potential Mapping

AlAyyash, Saad; Al-Fugara, A’kif; Shatnawi, Rania; Al-Shabeeb, Abdel Rahman; Al-Adamat, Rida; Al-Amoush, Hani

doi:10.3390/su15032499


by Saad AlAyyash ¹,*, A’kif Al-Fugara ², Rania Shatnawi ³, Abdel Rahman Al-Shabeeb ⁴, Rida Al-Adamat ⁴ and Hani Al-Amoush ⁵

¹ Department of Civil Engineering, Faculty of Engineering, Al al-Bayt University, Mafraq 25113, Jordan

² Department of Surveying Engineering, Faculty of Engineering, Al al-Bayt University, Mafraq 25113, Jordan

³ Department of Civil Engineering, School of Built Environment Engineering, Al-Hussein Technical University, Amman 11822, Jordan

⁴ Department of Geographic Information Systems & Remote Sensing, Faculty of Earth and Environmental Sciences, Al al-Bayt University, Mafraq 25113, Jordan

⁵ Department of Applied Earth and Environmental Sciences, Faculty of Earth and Environmental Sciences, Al al-Bayt University, Mafraq 25113, Jordan

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(3), 2499; https://0-doi-org.brum.beds.ac.uk/10.3390/su15032499

Submission received: 22 December 2022 / Revised: 17 January 2023 / Accepted: 23 January 2023 / Published: 30 January 2023

(This article belongs to the Special Issue Groundwater Recharge and Sustainable Use of Groundwater)


Abstract: The groundwater contained in aquifers is among the most important water supply resources, especially in semi-arid and arid regions worldwide. This study aims to evaluate and compare the prediction capability of two well-known models, support vector machine (SVM) and adaptive neuro-fuzzy inference system (ANFIS), combined with the genetic algorithm (GA), invasive weed optimization (IWO), and teaching–learning-based optimization (TLBO) algorithms, for groundwater potential mapping (GPM) of the Azraq Basin in Jordan. The hybridization of the SVM and ANFIS models with the GA, IWO, and TLBO algorithms results in six models: SVM–GA, SVM–IWO, SVM–TLBO, ANFIS–GA, ANFIS–IWO, and ANFIS–TLBO. A database of 464 wells with 12 predictive factors was developed for the GPM of the study area. Of the 464 well locations, 70% (325 locations) were assigned to the training set and the rest (139 locations) to the validation set. The correlation between the 12 predictive factors and the well locations was analyzed using the frequency ratio (FR) statistical model. The area under the receiver operating characteristic curve (AUROC) was used to evaluate and compare the models. According to the results, the SVM-based hybrid models outperformed the ANFIS-based hybrid models in the learning (training) and validation phases. The SVM–GA and SVM–TLBO hybrid models showed AUROC values of 0.984 and 0.971, respectively, in the training and validation phases. Moreover, the ANFIS–GA and ANFIS–TLBO hybrid models showed an AUROC of 0.979 and 0.984 in the training phase and an AUROC of 0.973 and 0.984 in the validation phase, respectively. The SVM–IWO and ANFIS–IWO hybrid models showed the lowest AUROC. This study demonstrated that the SVM-based hybrid models were more efficient than the ANFIS-based hybrid models in terms of accuracy and modeling speed.

Keywords: Azraq Basin; Jordan; groundwater potential mapping; ANFIS; SVM; GA; TLBO; IWO

1. Introduction

Water is essential for the survival and livelihood of all living beings on the earth. Groundwater is the most efficient and sustainable source of water that is not affected by the fluctuation in climate conditions of a particular area. About 1.5 billion people worldwide depend upon groundwater as the only source of clean drinking water, and about 38% of cultivated lands rely on groundwater for irrigation [1,2]. The importance of groundwater as the water source for wide communities encourages the regular monitoring and evaluation of groundwater quantity and quality and sustainable management and utilization, mainly in arid and semi-arid regions [3]. Groundwater quality can be studied through the chemical analysis of the water available from wells, aquifers, ponds, and sometimes streams, while quantity can be estimated by measuring the water table and saturated zone thickness. After industrialization, the multifold increment in population has caused excess groundwater utilization, which leads to scarcity, especially in major parts of developing countries with arid climatic conditions. Thus, the use of the advanced technology of remote sensing (RS) and geographic information systems (GIS) is being initiated for the identification of groundwater potential zones through mapping; several studies have been performed worldwide and are still being studied using advanced statistical tools [4,5,6,7,8,9,10]. Satellite-based RS techniques allow for more extensive ground surface coverage than possible through terrestrial observations [11]. They provide extensive, neutral, accurate, and readily available information about the location and the dynamics of changes worldwide [12]. On the other hand, the abilities of GIS help with the large size of geospatial data processing, and the delivery of reliable information using query-based calculation becomes easier [13].

Groundwater occurrence depends on various controlling factors, mainly recharge, which is in turn influenced by factors such as rainfall, geology, drainage density, lithology, soil texture, slope, and elevation [14,15,16]. Different conditioning factors carry different properties and help identify potential groundwater locations. The most important factor is rainfall; the excess water stored in the rainy season helps recharge groundwater. A drainage system is an indicator of shallow groundwater availability; high permeability is found at the periphery of a drainage basin [17]. Other informative indicators are faults and fractures found in rocks; hence, lithology is counted as a conditioning factor. A region’s soil also determines groundwater availability; more permeable soil offers a greater chance of groundwater availability. The shape, grain size, arrangement, porosity, void ratio, and degree of saturation are some of the most influential factors determining the permeability of the soil [18]. The slope of the surface directly influences the infiltration of surface water; a steep slope is less favorable to infiltration than a flat surface. A flat surface gives surface water more time to infiltrate before the remaining water evaporates. The geology of an area provides a descriptive layout of soil and rock layers, including their porosity and permeability properties. Geomorphology (a combined dataset comprising different geomorphic units), geomorphological units (single units of geomorphology), or geomorphological proxies (data produced by using either geomorphology or geomorphological units), on the other hand, help us understand the evolution of a landform, which is further useful in understanding porous and permeable zones [19]. Another essential controlling factor is the land use/land cover dataset, which is divided into different categories (classes). The important classes in this factor are vegetation cover, water bodies, forests, and settlements. Land use/land cover drastically affects the volume of groundwater and surface water by influencing the infiltration process, surface runoff, and groundwater utilization. Studies have found more success in delineating groundwater potential zones (GPZs) when researchers use knowledge-driven controlling factors as layers (input datasets) and analyze them with RS and GIS tools to map GPZs. The analysis methods for mapping are varied, ranging from bivariate statistical methods to machine learning (artificial neural networks) and multicriteria decision-making models (analytical hierarchy processes, evidential belief function, TOPSIS, VIKOR, etc.) [20,21,22,23,24,25,26].

The simplest prediction models are bivariate statistical methods such as the frequency ratio (FR) model, which correlates the predictive variables with well locations. Jothibasu A. and Anbazhagan S. [26] used this model to map groundwater potential zones in Tamil Nadu with an area under the receiver operating characteristic curve (AUROC) equal to 0.789. Falah F. et al. [25] compared three statistical models, namely, frequency ratio (FR), statistical index (SI), and weight-of-evidence (WOE), to develop a groundwater spring potential map in Iran. The results showed that the accuracy of SI was the highest (AUROC = 0.854), while the FR and WOE accuracies were 0.837 and 0.763, respectively.

Multicriteria decision-making (MCDM) methods, such as the analytical hierarchy process (AHP), are also widely used to map groundwater potential zones. Adiat K.A.N. et al. [20,21] used the MCDM-AHP model to predict GPZs in part of Malaysia; the prediction accuracy ranged between 80 and 81.25%. Similarly, Akinlalu A.A. et al. [22] used the MCDM-AHP model to map GPZs in Nigeria with an accuracy of 70%.

Using optimization algorithms with MCDM can enhance prediction accuracy. Using random forest (RF) with MCDM can yield high accuracy, such as AUROC = 0.9572 in [23]. Duan H. et al. [24] combined MCDM with the C5.0 algorithm to predict GPZs in southwest China with more than 90% accuracy.

In search of better model performance, several studies have been accomplished by utilizing two or more models and their ensembles to compare them for mapping GPZs [3,23,27,28]. Furthermore, many studies have been conducted using an artificial neural network (ANN) and its advanced form, the adaptive neuro-fuzzy inference system (ANFIS), for the modeling of hydrological systems and the prediction of various hazards [29]. With a self-learning ability and decisions based on fuzzy logic, ANFIS produces a more consistent structure for finding a better solution [30,31]. The ANFIS works more accurately than fuzzy logic or ANN models. ANFIS architecture can be presented by two fuzzy if–then rules based on the first order of the Sugeno model [32]. Many studies have been completed by using ANFIS and other advanced models to make ensembles for GPZ mapping, but very few of them have compared ANFIS-based ensembles with any other machine learning-based ensemble models [4,27,33,34]. Hence, this study focused on filling this research gap and conducting such an analysis.

Machine learning models are used to predict GPZs. These models can be used alone or with optimization algorithms. Lee S. et al. [28] compared the performance of an artificial neural network (ANN) and a support vector machine (SVM) in predicting GPZs in Boryeong City in Korea and found that the accuracies (AUROC) of the two models were 0.8357 and 0.8083, respectively. Combining optimization algorithms with machine learning models can enhance the prediction accuracy of these models. Khosravi K. et al. [27] combined ANFIS with five algorithms (invasive weed optimization (IWO), differential evolution (DE), firefly algorithm (FA), particle swarm optimization (PSO), and the bees algorithm (BA)) to map GPZs in western Iran and found that ANFIS-DE provided the highest accuracy (AUROC = 0.875). At the same time, the least accurate model was ANFIS-BA, with an AUROC equal to 0.839.

Other environmental applications of machine learning models include the prediction of groundwater levels [29,33,34] and the prediction of floods [30,31], where the models produced flood susceptibility maps with an AUROC of 0.8 or more.

This study will compare the GPZ prediction capabilities of two machine learning models, namely, support vector machine (SVM) and adaptive neuro-fuzzy inference system (ANFIS), when combined with three optimizing algorithms: teaching–learning-based optimization (TLBO), genetic algorithm (GA), and invasive weed optimization (IWO). The combination will result in six models: ANFIS-TLBO, ANFIS-GA, ANFIS-IWO, SVM-TLBO, SVM-GA, and SVM-IWO. A total of 12 predictive variables were used in the prediction, including geologic, topographic, geomorphologic, and climatic factors. First, the predictive variables were classified into classes, the frequency ratio (FR) of each class was calculated, and the probability weight for each class was assigned. The study’s next step was introducing the predictive variables’ classes and their weights into the models in two sets: a training set and a validation (test) set. In the final step, the prediction capabilities of the six models were compared and evaluated using runtime, mean square error (MSE), root mean square error (RMSE), and area under the receiver operating characteristic curve (AUROC).

2. Study Area and Data

2.1. Study Area

The study area covers about 60% of the central parts of the Azraq Basin in central Jordan. Azraq Basin is one of Jordan’s major groundwater basins, covering an area of about 12,000 km². In total, 94% of the Azraq Basin area is within Jordan, 5% is in Syria to the north, and about 1% is in Saudi Arabia to the south (Figure 1). The study area selected is in the central part of the basin. In this area, most of the groundwater-utilizing wells are located in the basin’s populated area, with about 60,000 people living in 32 settlements [35].

The Azraq Basin is part of the arid lands of Jordan, which cover more than 80% of the country’s area. Like most of Jordan’s arid land, the Azraq Basin has hot and dry summers and cold and wet winters, with the rainfall season extending between October and May. The annual rainfall in the basin ranges from less than 50 mm in the southeastern parts to more than 500 mm in the northwestern parts [35]. Groundwater in the Azraq Basin is found in three aquifer systems: a shallow system in alluvium, basalt, and Um-Rijam (B4) geologic formations; a middle aquifer system found in Upper Cretaceous calcareous limestone formations (B2/A7); and a deep aquifer system found in a deep sandstone formation. The recharge for these aquifer systems varies from local recharge, mainly to the upper aquifer, to recharge from the northern highlands (Jabal Al-Arab in Syria), the western highlands of Amman and Madaba, and the southern highlands of Karak and Tafila, mainly to the middle and deep aquifers [35,36].

2.2. Dataset

A total of 12 factors affecting groundwater potential mapping (GPM) are considered in this study: elevation, slope, aspect, length of the slope, plan curvature, soil type, geology, lithology, rainfall, distance from drainage, topographic wetness index (TWI), and stream power index (SPI). In the GIS environment, these variables were classified into several classes. The topography-dependent variables, namely, elevation, slope, length of the slope, aspect, and plan curvature, were used for the GPM of the Azraq Basin. Land elevation and topography affect the rate of water infiltration into the ground. The high elevation of a region decreases the infiltration rate and causes an increase in the runoff [37]; in contrast, with decreasing elevation, more surface water infiltrates the ground [37].

Furthermore, the elevation of a region influences the direction and velocity of surface runoff. An elevation map of the study area was prepared in a GIS environment in five classes based on the digital elevation model (DEM) with a cell size of 30 m. The minimum and maximum elevations of the study area are 518 and 977 m, respectively. These elevation differences caused a surface slope leading to surface water infiltration and the distribution of groundwater [19,38]. The presence of a reliable groundwater aquifer is largely dependent on the land slope [39]. A slope map of the study area was prepared in seven classes based on the DEM. Using the DEM, an aspect map was prepared in GIS in nine classes of cardinal, ordinal, and flat directions.

The lands on the northern and eastern slopes receive less solar radiation than those on the southern and western slopes. This affects the vegetation on the northern and eastern slopes, where denser vegetation increases the infiltration of runoff and the recharge of groundwater. Surface curvature plays a key role in environmental analysis and in runoff and infiltration rates. Accordingly, the plan curvature map was prepared in three classes, convex, concave, and flat, based on the DEM. The topographic wetness index (TWI) is another factor used to find groundwater resource potential. According to this index, the available moisture decreases with an increasing slope because surface water drains away more rapidly. In contrast, more moisture is available in regions with a lower slope. This index represents the relationship between the surface slope and moisture content on the ground surface. The TWI is obtained from Equation (1) [40]:

TWI = \ln \left( \frac{A}{\tan α} \right)

(1)

where A is the cumulative upslope area (m²), and α is the slope gradient (in degrees). The TWI map was prepared and classified into three classes.

The stream power index (SPI) is another topographic index related to the stream weight and local ground slope. The SPI characterizes the potential of flowing water to cause water movement and soil erosion. It is proportional to the accumulation area and local ground slope. The SPI can be calculated using Equation (2) [40]:

SPI = A \tan α

(2)

where A is the cumulative upslope area (m²), and α is the slope gradient (in degrees). The SPI map was prepared and classified into four classes.
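Equations (1) and (2) can be evaluated per cell as in the following minimal sketch. The function names and the `min_slope_deg` floor, which avoids a division by zero on perfectly flat cells, are assumptions made for illustration and are not taken from the study.

```python
import math


def twi(upslope_area_m2, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, Equation (1): ln(A / tan(alpha)).

    min_slope_deg is a hypothetical floor so that flat cells (alpha = 0)
    do not cause a division by zero.
    """
    alpha = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area_m2 / math.tan(alpha))


def spi(upslope_area_m2, slope_deg):
    """Stream power index, Equation (2): A * tan(alpha)."""
    return upslope_area_m2 * math.tan(math.radians(slope_deg))
```

For a fixed upslope area, `twi` decreases and `spi` increases as the slope steepens, which matches the interpretation given in the text.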

Soil texture is a key factor influencing the water infiltration rate [41,42]; depending on the texture, the infiltration rate can increase or decrease significantly. A soil map of the study area was prepared in three classes. In addition, different geological formations have different porosities depending on the type of rocks and their geological ages [43]. The infiltration rate and recharge of the groundwater increase with increasing porosity. A lithological map of the study area was prepared in four classes using 1:1000 maps developed by the Natural Resources Authority (NRA) of Jordan (Table 1). Geology is also important since it influences surface runoff and surface geomorphology and controls infiltration to the subsurface groundwater aquifers [44]. Table 2 lists the main surface geological formations in the study area.

Drainage lines can be considered the weak zones of formations developed by dissolution and turned into today’s shape. The drainage lines are inversely correlated with the water penetration rate. The density of the drainage network represents water transfer and a reduced infiltration rate. The distance to the watercourse map was prepared based on the digital layer of the drainage network. Figure 2 shows the classes of the 12 predictive factors.

3. Methodology

This study used hybrid models consisting of two well-known machine learning algorithms, SVM and ANFIS, in combination with the IWO, GA, and TLBO optimization algorithms for GPM. Figure 3 shows the flowchart of the adopted research methodology. Once the factors effective for GPM and the inventory map were prepared, the FR method was used for correlation analysis between each factor class and the well locations. The six hybrid models were then employed to produce the groundwater potential maps. In the final step, all model outputs were evaluated using MSE and AUROC.

3.1. Support Vector Machine (SVM)

SVM is a widely used machine learning algorithm for solving classification and regression problems [45]. This algorithm searches for special linear models in which the margin of the hyper-plane is maximized, consequently maximizing the separation between the considered classes [46]. The training points closest to the maximum-margin hyperplane are referred to as the support vectors and are used to identify the boundaries between the classes [46]. Assuming the training points are selected using Equation (3),

D = {\{(x_{i}, y_{i})\}}_{i = 1}^{n}

(3)

where x_{i} is the input vector, y_{i} is the label associated with the i-th training sample, and n is the total number of samples. If the data are linearly separable, the classification function is expressed by Equation (4) [46]:

y = \operatorname{sign} \left( \sum_{i = 1}^{n} y_{i} a_{i} (X \cdot X_{i}) + b \right)

(4)

where the parameters a_{i} and b determine the hyperplane equation. However, if the data are not linearly separable, Equation (4) is transformed into Equation (5), as follows [46]:

y = \operatorname{sign} \left( \sum_{i = 1}^{n} y_{i} a_{i} K (X, X_{i}) + b \right)

(5)

where K (X, X_{i}) is the kernel function transferring the training samples to a higher-dimension space where the data can be linearly separated [46]. The radial basis function (RBF) kernel has been frequently reported in the literature as the best kernel function [47,48,49]. Hence, the present study used this kernel for modeling. The equation used for the RBF kernel is presented in Equation (6).

K (x, y) = \exp (- γ ‖ x - y ‖^{2})

(6)
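Equations (5) and (6) translate directly into code. The sketch below is only illustrative: the function names are hypothetical, and the support vectors, coefficients a_{i}, and bias b would come from a trained SVM rather than being set by hand.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel, Equation (6): exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, labels, alphas, b, gamma):
    """Kernel classifier, Equation (5): sign(sum_i y_i a_i K(x, x_i) + b)."""
    total = sum(
        y_i * a_i * rbf_kernel(x, x_i, gamma)
        for x_i, y_i, a_i in zip(support_vectors, labels, alphas)
    )
    return 1 if total + b >= 0 else -1
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood; this is why the choice of γ matters for the resulting model.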

Finding an optimal value for γ in the RBF kernel is very important in achieving an efficient SVM. Moreover, the error function for the SVM model is defined in Equation (7) [46]:

\frac{1}{2} w^{T} w + C \sum_{i = 1}^{N} (ξ_{i} + ξ_{i}^{*})

(7)

In Equation (7), C is the regularization parameter, w is the weight vector, and ξ_{i} and ξ_{i}^{*} are the slack variables. Finding the appropriate value for C is another important factor in achieving a reliable SVM model. Hence, the present study employed the GA, IWO, and TLBO algorithms to find the proper values for C and γ.

3.2. Adaptive Neuro-Fuzzy Inference System (ANFIS)

Proposed in 1993 by Jang et al. [50], the ANFIS combines neural networks and fuzzy logic to increase the efficiency of models. This model uses a set of input–output data to build an inference system. The ANFIS has five layers: a fuzzification layer, a product layer, a normalization layer, a defuzzification layer, and a single summation node [50]. In the training stage, the output values approach the real ones as the membership function parameters are modified until an acceptable error level is reached [50].

The ANFIS employs neural networks and fuzzy logic to design the nonlinear mapping between input and output spaces. As an advantage, this algorithm allows for the extraction of fuzzy rules from numerical information or expert knowledge and creates a rule base. The learning rule in this method is based on the error backpropagation algorithm, aimed at minimizing the mean squared error between the network output and the real output.

Achieving a suitable model using the ANFIS depends on how the membership function and the FIS parameters are determined, for which various methods have been proposed in different studies [31,51,52,53]. Metaheuristic algorithms are among the most widely used methods in this regard. In order to incorporate these algorithms in the structure of the ANFIS model, the initial FIS is built based on the considered dataset. Subsequently, the metaheuristic algorithms are used to find the optimal values for the considered parameters based on an objective function, for which most studies use RMSE [31,51,52]. The main objective is to minimize the RMSE function. The termination criterion is commonly based on the number of iterations. Afterward, the optimal output values for the ANFIS model are calculated.
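The five-layer structure and the RMSE objective described above can be sketched as follows. This is a minimal illustration, assuming Gaussian membership functions and first-order Sugeno consequents; the rule layout, the function names, and all parameter values are hypothetical and are not taken from the study.

```python
import math


def gaussian_mf(x, center, sigma):
    """Layer 1 (fuzzification): Gaussian membership degree (an assumed shape)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(inputs, rules):
    """Forward pass of a first-order Sugeno ANFIS.

    Each rule is a dict with 'centers' and 'sigmas' (one per input) and a
    'consequent' list [p_1, ..., p_d, bias].
    """
    # Layer 2 (product): firing strength of each rule.
    strengths = []
    for rule in rules:
        w = 1.0
        for x, c, s in zip(inputs, rule["centers"], rule["sigmas"]):
            w *= gaussian_mf(x, c, s)
        strengths.append(w)
    total = sum(strengths)
    if total == 0.0:  # no rule fires (numerical underflow)
        return 0.0
    output = 0.0
    for w, rule in zip(strengths, rules):
        w_bar = w / total  # Layer 3 (normalization)
        *coeffs, bias = rule["consequent"]
        f = sum(p * x for p, x in zip(coeffs, inputs)) + bias
        output += w_bar * f  # Layer 4 (defuzzification) and Layer 5 (summation)
    return output


def rmse_objective(rules, samples):
    """RMSE over (inputs, target) pairs; a metaheuristic would minimize this."""
    squared = [(anfis_forward(x, rules) - t) ** 2 for x, t in samples]
    return math.sqrt(sum(squared) / len(squared))
```

In a hybrid model of this kind, the optimizer would propose values for the centers, widths, and consequent coefficients, and `rmse_objective` would score each proposal.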

3.3. Teaching–Learning-Based Optimization (TLBO)

TLBO is a metaheuristic algorithm inspired by classroom teaching and learning processes [54]. Similar to students expanding their knowledge by learning from a teacher or exchanging information with other students, this algorithm attempts to improve the solutions in two phases, namely, the teacher and learner phases [54]. The problem includes n_{p} students taking D different courses, corresponding to a total of n_{p} \times D feasible solutions (dimensions), where n_{p} denotes the population size, and D is the number of design variables. The solutions are randomly initialized, and then, the teacher and learner phases are executed.

3.3.1. Teacher Phase

In the first phase, teachers attempt to transfer their knowledge to the students to raise the students’ knowledge level to theirs. M_{i} represents the average score of students in the i-th position. The teachers attempt to increase this score through their skills and knowledge. The new scores are calculated as follows [54]:

X_{new, i} = X_{old, i} + r_{i} (M_{new} - T_{F} M_{i})

(8)

where r_{i} is a random number in the range (0, 1), T_{F} is the teaching factor, and M_{new} is the new mean, i.e., the level of the teacher (the best current solution). The solutions move toward the best feasible solution, which improves the students’ positions. The current solution is replaced by the new solution if the latter is superior (i.e., its objective function value is better).

3.3.2. Learner Phase

The students can gain new knowledge by exchanging information through group discussions, presentations, etc. In this phase, the following Algorithm 1 is executed [54]:

Algorithm 1
For i = 1 : n_{p}
    Randomly select two learners X_{i} and X_{j}, where i \neq j
    If f (X_{i}) < f (X_{j})
        X_{new, i} = X_{old, i} + r_{i} (X_{i} - X_{j})
    Else
        X_{new, i} = X_{old, i} + r_{i} (X_{j} - X_{i})
    End If
End For
Accept X_{new} if it gives a better function value.

Two solutions are randomly selected in this phase, and the better solution improves the other. The new solution replaces the current solution if superior.

The steps taken in the TLBO algorithm are as follows:

Step 1: Initialize the problem parameters, such as the number of iterations.

Step 2: Initialize the solutions randomly.

Step 3: Improve the solutions by executing the teacher phase.

Step 4: Improve the solutions by executing the learner phase.

Step 5: Repeat Steps 3 and 4 until the termination criterion (e.g., the maximum number of iterations) is met. The final solutions are then obtained.
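The steps above can be sketched as a short minimization routine. This is an illustrative sketch rather than the implementation used in the study: the function name `tlbo`, the bound clipping, the per-dimension random numbers, and the random choice of the teaching factor between 1 and 2 are assumptions. In the hybrid models, the objective would be a model error evaluated for candidate parameters such as C and γ; here it is any callable to be minimized.

```python
import random


def tlbo(objective, bounds, n_p=20, iterations=100, seed=0):
    """Minimize `objective` over box `bounds` with teacher and learner phases."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep candidates inside the search box (an assumed convention).
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Steps 1-2: random initialization of n_p learners.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_p)]
    fit = [objective(x) for x in pop]

    for _ in range(iterations):
        # Step 3: teacher phase, Equation (8), with the best learner as M_new.
        teacher = list(pop[min(range(n_p), key=fit.__getitem__)])
        mean = [sum(x[d] for x in pop) / n_p for d in range(dim)]
        for i in range(n_p):
            t_f = rng.randint(1, 2)  # teaching factor, assumed to be 1 or 2
            new = clip([pop[i][d] + rng.random() * (teacher[d] - t_f * mean[d])
                        for d in range(dim)])
            f_new = objective(new)
            if f_new < fit[i]:  # greedy acceptance
                pop[i], fit[i] = new, f_new

        # Step 4: learner phase, Algorithm 1.
        for i in range(n_p):
            j = rng.choice([k for k in range(n_p) if k != i])
            sign = 1.0 if fit[i] < fit[j] else -1.0
            new = clip([pop[i][d] + rng.random() * sign * (pop[i][d] - pop[j][d])
                        for d in range(dim)])
            f_new = objective(new)
            if f_new < fit[i]:
                pop[i], fit[i] = new, f_new

    best = min(range(n_p), key=fit.__getitem__)
    return pop[best], fit[best]
```

Because a new solution is accepted only when it improves the objective, the best value in the population never gets worse from one iteration to the next.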

3.4. Genetic Algorithm (GA)

A genetic algorithm (GA) is a heuristic algorithm inspired by genetic science for solving optimization problems [55]. The algorithm starts with an initial population composed of a number of chromosomes (solutions), each comprising a number of genes (variables). This population is randomly initialized in the first iteration of the algorithm. A new generation of solutions is then produced by selecting and combining the solutions with a higher chance of producing better solutions. This process uses three operators: selection, crossover, and mutation [55].

3.4.1. Selection

The selection operator assesses the chromosomes (solutions) using a fitness function and selects the better ones as the parents for producing the next generation [55].

3.4.2. Crossover

In this stage, two new chromosomes are produced from the parent chromosomes through crossover. Different methods are available for the crossover, the simplest of which is the single-point crossover, in which the parent chromosomes swap their genes on one side of the selected point [55].

3.4.3. Mutation

In this stage, a few genes are randomly altered. Although the mutation probability is low, this stage is important, as it increases the population diversity, preventing the algorithm from being trapped in local optima [55].
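The three operators can be combined as in the following minimal sketch of a real-valued GA that minimizes a fitness function. Tournament selection, random-reset mutation, and keeping the best chromosome unchanged (elitism) are assumptions chosen for brevity; the source does not specify which variants were used, and the function name and default values are hypothetical.

```python
import random


def genetic_algorithm(fitness, bounds, pop_size=30, generations=60,
                      mutation_rate=0.1, seed=0):
    """Minimize `fitness` with selection, single-point crossover, and mutation."""
    rng = random.Random(seed)
    n_genes = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    def select(scored):
        # Selection: binary tournament (an assumed scheme); lower score wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(c), c) for c in pop), key=lambda t: t[0])
        next_pop = [list(scored[0][1])]  # elitism (assumed): keep the best
        while len(next_pop) < pop_size:
            p1, p2 = select(scored), select(scored)
            # Crossover: swap the genes on one side of a single random point.
            point = rng.randint(1, n_genes - 1) if n_genes > 1 else 0
            children = (p1[:point] + p2[point:], p2[:point] + p1[point:])
            for child in children:
                # Mutation: re-sample a gene with low probability.
                for g, (lo, hi) in enumerate(bounds):
                    if rng.random() < mutation_rate:
                        child[g] = rng.uniform(lo, hi)
                if len(next_pop) < pop_size:
                    next_pop.append(child)
        pop = next_pop

    best = min(pop, key=fitness)
    return best, fitness(best)
```

With elitism, the best fitness in the population cannot deteriorate between generations, while mutation keeps introducing new gene values that help the search escape local optima.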

3.5. Invasive Weed Optimization (IWO)

Inspired by weeds in nature, IWO is a metaheuristic algorithm used in optimization problems [56]. The four main steps in this algorithm are as follows:

Initialization: The initial solutions are dispersed with random positions across the d-dimensional search space.

Reproduction: The population members can produce seeds, the number of which is calculated based on Equation (9) [56]:

weed_{n} = \frac{f - f_{min}}{f_{max} - f_{min}} (s_{max} - s_{min}) + s_{min}

(9)

where f is the fitness value of a considered member (weed), f_{max} and f_{min} denote the maximum and minimum values for the population fitness, and s_{max} and s_{min} are the maximum and minimum numbers of seeds a weed can produce, respectively. Based on this formula, a weed with a smaller fitness value produces fewer seeds, and vice versa [56].

Spatial dispersal: The produced seeds are dispersed randomly based on a normal distribution, with zero mean and variable variance in the d-dimensional space [56]. This allows the seeds to be dispersed randomly but remain around the parent seeds. The standard deviation in each iteration is calculated based on Equation (10) [56]:

σ_{iter} = \frac{(iter_{max} - iter)^{n}}{(iter_{max})^{n}} (σ_{initial} - σ_{final}) + σ_{final}

(10)

where iter_{max} represents the iteration threshold, and n indicates the nonlinear modulation index. This formula has been designed to reduce the standard deviation with each iteration, consequently allowing the fitter weeds to gather closer and eliminate the inappropriate ones.

Competitive exclusion: Since the population size rapidly increases through reproduction, the maximum size, p_{\max}, is reached after a few iterations, in which case, a competitive mechanism is applied to all members (both the initial members and those produced through reproduction) to eliminate those with a low fitness value and let the better members remain [56]. In fact, by eliminating the weak members, this mechanism provides a chance for the better members to produce new seeds and, therefore, produce better solutions for the problem. This cycle continues until the maximum number of iterations is reached or another termination criterion is met.

The algorithm can be summarized in six steps, as follows [56]:

Step 1: Initial weeds are initialized randomly.

Step 2: The fitness of population members is evaluated.

Step 3: Each population member produces a number of seeds so those with a higher fitness can reproduce more.

Step 4: The seeds are dispersed in the problem’s search space around their parents using the standard deviation obtained from Equation (10) (spatial dispersal).

Step 5: Once the population size reaches its maximum value, p_{\max}, those with a lower fitness are eliminated so that the better members produce new seeds (competitive exclusion).

Step 6: Steps 2 to 5 are repeated until the termination criterion is reached; then, the member with the highest fitness is selected as the optimal solution, and the algorithm terminates.
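The six steps, together with Equations (9) and (10), can be sketched as follows. The sketch treats fitness as a quantity to be maximized, as in the description above; to minimize an error, one would pass its negative. The function names, the rounding of the seed count down to an integer, the bound clipping, and all default values are assumptions for illustration.

```python
import math
import random


def seed_count(f, f_min, f_max, s_min, s_max):
    """Equation (9): number of seeds for a weed with fitness f (higher is better)."""
    if f_max == f_min:  # all weeds equally fit (assumed convention)
        return s_max
    return math.floor((f - f_min) / (f_max - f_min) * (s_max - s_min) + s_min)


def sigma_at(iteration, iter_max, sigma_initial, sigma_final, n=2):
    """Equation (10): standard deviation of seed dispersal at a given iteration."""
    decay = (iter_max - iteration) ** n / iter_max ** n
    return decay * (sigma_initial - sigma_final) + sigma_final


def iwo(fitness, bounds, p_init=5, p_max=15, iter_max=50, s_min=0, s_max=5,
        sigma_initial=1.0, sigma_final=0.01, n=2, seed=0):
    """Maximize `fitness` over box `bounds` with invasive weed optimization."""
    rng = random.Random(seed)
    # Step 1: random initial weeds.
    weeds = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(p_init)]
    for it in range(iter_max):
        # Step 2: evaluate fitness.
        fits = [fitness(w) for w in weeds]
        f_min, f_max = min(fits), max(fits)
        sigma = sigma_at(it, iter_max, sigma_initial, sigma_final, n)
        seeds = []
        for weed, f in zip(weeds, fits):
            # Step 3: fitter weeds produce more seeds.
            for _ in range(seed_count(f, f_min, f_max, s_min, s_max)):
                # Step 4: normally distributed dispersal around the parent.
                seeds.append([min(max(rng.gauss(x, sigma), lo), hi)
                              for x, (lo, hi) in zip(weed, bounds)])
        # Step 5: competitive exclusion keeps the p_max fittest members.
        weeds = sorted(weeds + seeds, key=fitness, reverse=True)[:p_max]
    # Step 6: the fittest member is returned as the solution.
    return weeds[0], fitness(weeds[0])
```

Because parents and seeds compete together, the fittest member survives every iteration, while the shrinking standard deviation gradually turns a broad search into a local refinement.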

3.6. Statistical Analysis

The first step of statistical analysis is finding the correlation between the predictive factors’ classes and the location of the wells. Frequency ratio (FR) statistical analysis was used to find the quantitative correlation between the defined classes of the 12 predictive factors and the wells’ locations [47].
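The source does not spell out the FR formula. The sketch below assumes the commonly used definition, in which the FR of a class is the share of wells falling in that class divided by the share of the study area occupied by that class; the function name and the example counts are hypothetical.

```python
def frequency_ratio(wells_per_class, pixels_per_class):
    """FR per class: (wells in class / all wells) / (class pixels / all pixels)."""
    total_wells = sum(wells_per_class.values())
    total_pixels = sum(pixels_per_class.values())
    return {
        cls: (wells_per_class.get(cls, 0) / total_wells)
             / (pixels_per_class[cls] / total_pixels)
        for cls in pixels_per_class
    }


# Hypothetical counts for a two-class factor (not data from the study).
example = frequency_ratio({"flat": 60, "steep": 40}, {"flat": 300, "steep": 700})
```

Under this definition, a value above 1 indicates that wells are over-represented in a class relative to its area, and a value below 1 indicates the opposite.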

The second stage of statistical analysis is assessing the goodness of the proposed model. There are different statistical methods to evaluate a model’s performance. This study used mean square error (MSE) and root mean square error (RMSE).

The other method used to assess a model’s performance is the receiver operating characteristic (ROC) curve. The closer the ROC curve is to the upper left corner, the better the model performance. Statistically, the area under the ROC curve (AUROC) is evaluated to compare the performance of a set of models [47].
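The three measures can be computed as in the following minimal sketch. The AUROC is obtained here from its rank interpretation, i.e., the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half; this is one standard way to compute it and is not necessarily the procedure used in the study. The function names are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean square error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 (positive, e.g., well) or 0 (negative, e.g., non-well).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 corresponds to random ranking, and a value of 1 corresponds to a perfect separation of the two classes.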

4. Results

4.1. Comparison of Class Factors’ Correlation with Well Locations

The FR analysis quantified the correlation between the classes of the 12 predictive factors and the well locations. The results of this analysis are listed in Table 3. Accordingly, the greatest correlations came from elevations below 571 m, the flat class of the aspect factor, 0–1.3° slopes, the flat class of the plan curvature factor, distances less than 200 m from drainage, TWIs above 14.55, the group 1 class of the lithology factor, the group 2 class of the geology factor, precipitation between 50 and 100 mm, sandy loam soil, SPI values above 300, and slope lengths of 0–0.43.

4.2. Comparison of the Runtimes of the Algorithms

The runtimes of the SVM-GA, SVM-IWO, and SVM-TLBO models were 280, 60, and 4896 s, respectively, for 100 iterations. Therefore, SVM-IWO and SVM-TLBO were the fastest and the slowest of the SVM-based algorithms, respectively. The runtimes of the ANFIS-GA, ANFIS-IWO, and ANFIS-TLBO models were 92, 56, and 23,992 s, respectively. Among the ANFIS-based hybrid algorithms, ANFIS-IWO was the fastest model, whereas ANFIS-TLBO was the slowest. Overall, the IWO-based algorithms (SVM-IWO and ANFIS-IWO) had the shortest runtimes, whereas the TLBO-based algorithms (SVM-TLBO and ANFIS-TLBO) had the longest.

4.3. Comparison of the Accuracy of the Algorithms

During the execution of the ANFIS-based hybrid algorithms, measures such as MSE, RMSE, mean, and standard deviation were calculated. Figures 4, 5, and 6 show these values for the ANFIS-GA, ANFIS-TLBO, and ANFIS-IWO models, respectively. As shown in the figures, the MSE values of the three models in the training step were 0.054894, 0.04734, and 0.072158, respectively, so ANFIS-TLBO was more accurate than the other two models. In the test step, the MSE values were 0.055683, 0.050795, and 0.082486, respectively, and ANFIS-TLBO again outperformed the other two models.

The MSE values for the SVM-GA, SVM-TLBO, and SVM-IWO models in the training step were 0.051009, 0.05199, and 0.1024, respectively, so SVM-GA was more accurate than the other two models. The MSE values in the test step for the three SVM-based models were 0.050538, 0.050612, and 0.20491, respectively, and SVM-GA again exhibited a lower MSE value than the other two models.

In terms of the MSE measure, therefore, ANFIS-TLBO was the most accurate of the ANFIS-based hybrid models, and SVM-GA was the most accurate of the SVM-based models. Across all six models, ANFIS-TLBO had the lowest MSE in the training step, and SVM-GA had the lowest MSE in the test step. The RMSE measure gave the same ranking: ANFIS-TLBO was more accurate than ANFIS-GA and ANFIS-IWO, and SVM-GA was the most accurate SVM-based model, with RMSE values of 0.22585 and 0.22481 in the training and test steps, respectively.

Figure 7 shows the ROC curves and AUROC values for the six hybrid models in the training and test steps. SVM-GA and SVM-TLBO, each with an AUROC of 0.984, were the most accurate models in the training step, whereas ANFIS-TLBO was the most accurate in the test step, with an AUROC of 0.975.

5. Discussion

While many methods have been presented in various studies for GPM, few have assessed and compared hybrid models based on the well-known, versatile algorithms SVM and ANFIS. This research combined the GA, TLBO, and IWO optimization algorithms with these two models, resulting in six hybrids (SVM-GA, SVM-TLBO, SVM-IWO, ANFIS-GA, ANFIS-TLBO, and ANFIS-IWO), to obtain GPMs in the Azraq region in Jordan. Furthermore, the FR model was used for the correlation analysis of the examined factors and the well locations.

The models were assessed and compared from two perspectives: (1) algorithm runtime and (2) model accuracy. The performance indicators for the models are listed in Table 4.

As can be seen in Table 4, the results demonstrated that the SVM-based hybrid models were faster than those based on ANFIS. Moreover, the shortest runtimes among the six hybrid models were exhibited by the IWO-based models, i.e., SVM-IWO and ANFIS-IWO, and the longest by the TLBO-based models, i.e., SVM-TLBO and ANFIS-TLBO. The hybrid models that combine GA with the SVM and ANFIS algorithms exhibited intermediate runtimes.

In terms of overall performance, the IWO-based hybrid algorithms, i.e., SVM-IWO and ANFIS-IWO, were fast but less accurate than the other hybrid models. Those based on TLBO, i.e., SVM-TLBO and ANFIS-TLBO, achieved high accuracy despite their long runtimes. The runtimes of the GA-based models lay between those of the IWO-based and TLBO-based hybrids.

Comparing the performance of the different sets of models using the MSE, ANFIS-TLBO had the lowest MSE in the training set, making it the most accurate model there, while exhibiting the longest runtime (about 6.7 h). In the test set, on the other hand, SVM-GA had the lowest MSE with a relatively short runtime (about 5 min).

Looking at the AUROC values for the six models, it can be seen that all models performed well with AUROC values close to 1. In the training set, SVM-GA and SVM-TLBO had the best performance, while ANFIS-TLBO had the highest performance in the test set.

Figure 8 shows groundwater potential maps for the six models used in this paper. The visual comparison of the six maps in Figure 8 reveals that most of the wells are located in the high and very high zones for all models and that all the models are suitable for mapping areas of groundwater potential.

6. Conclusions

In this study, groundwater potential maps were investigated in the central parts of the Azraq Basin in central Jordan using six hybrid models: SVM–GA, SVM–IWO, SVM–TLBO, ANFIS–GA, ANFIS–IWO, and ANFIS–TLBO. The groundwater potential maps were developed using 12 predictor variables (elevation, slope, aspect, length of the slope, plan curvature, soil type, geology, lithology, rainfall, distance from drainage, topographic wetness index (TWI), and stream power index (SPI)), and the correlation between these variables and the wells’ locations was analyzed using the frequency ratio (FR) statistical model. The area under the receiver operating characteristic curve (AUROC) was used for model evaluation and comparison.

Based on the findings of this research, it is concluded that, regarding the time efficiency of the six hybrid models, the IWO-based algorithms, i.e., SVM-IWO and ANFIS-IWO, exhibited the shortest runtimes, and those based on TLBO, i.e., SVM-TLBO and ANFIS-TLBO, exhibited the longest. The runtimes of SVM-GA and ANFIS-GA were found to lie between those of the IWO-based and TLBO-based models. Furthermore, the algorithms with intermediate and long runtimes obtained better AUC values than those with short runtimes. However, all six models provided acceptable AUC values, and a manager or planner could select the appropriate algorithm given the specific data size and the tradeoff between runtime and accuracy. Furthermore, the SVM-based hybrid algorithms were faster than those based on the ANFIS. In terms of accuracy, the models of the two classes obtained relatively close results in the learning and test steps. Hence, the groundwater potential maps obtained in this study can help water resource and environmental managers make managerial decisions regarding the preservation and correct utilization of groundwater resources.

As a future research direction, it is recommended that the models used in this research be compared to other machine learning algorithms, such as decision trees, with parameters tuned using heuristic and metaheuristic algorithms.

Author Contributions

Conceptualization, S.A., A.A.-F. and R.S.; Methodology, R.A.-A., H.A.-A.; Software, A.R.A.-S. and A.A.-F.; Investigation, S.A., A.R.A.-S.; Resources, A.R.A.-S.; Writing—original draft, S.A. and A.A.-F.; Writing—review & editing, S.A., R.S., R.A.-A. and H.A.-A.; Supervision, R.A.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, S.A., upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Al-Abadi, A.M.; Pourghasemi, H.R.; Shahid, S.; Ghalib, H.B. Spatial Mapping of Groundwater Potential Using Entropy Weighted Linear Aggregate Novel Approach and GIS. Arab. J. Sci. Eng. 2016, 42, 1185–1199.
2. Siebert, S.; Burke, J.; Faures, J.M.; Frenken, K.; Hoogeveen, J.; Döll, P.; Portmann, F.T. Groundwater use for irrigation—A global inventory. Hydrol. Earth Syst. Sci. 2010, 14, 1863–1880.
3. Rahmati, O.; Melesse, A.M. Application of Dempster–Shafer theory, spatial analysis and remote sensing for groundwater potentiality and nitrate pollution analysis in the semi-arid region of Khuzestan, Iran. Sci. Total Environ. 2016, 568, 1110–1123.
4. Chen, W.; Panahi, M.; Khosravi, K.; Pourghasemi, H.R.; Rezaie, F.; Parvinnezhad, D. Spatial prediction of groundwater potentiality using ANFIS ensembled with teaching-learning-based and biogeography-based optimization. J. Hydrol. 2019, 572, 435–448.
5. Ghorbani Nejad, S.; Falah, F.; Daneshfar, M.; Haghizadeh, A.; Rahmati, O. Delineation of groundwater potential zones using remote sensing and GIS-based data-driven models. Geocarto Int. 2017, 32, 167–187.
6. Kumar, T.; Gautam, A.K.; Kumar, T. Appraising the accuracy of GIS-based multi-criteria decision making technique for delineation of groundwater potential zones. Water Resour. Manag. 2014, 28, 4449–4466.
7. Manap, M.A.; Nampak, H.; Pradhan, B.; Lee, S.; Sulaiman, W.N.A.; Ramli, M.F. Application of probabilistic-based frequency ratio model in groundwater potential mapping using remote sensing data and GIS. Arab. J. Geosci. 2012, 7, 711–724.
8. Mogaji, K.A.; Lim, H.S. Application of a GIS-/remote sensing-based approach for predicting groundwater potential zones using a multi-criteria data mining methodology. Environ. Monit. Assess. 2017, 189, 321.
9. Pinto, D.; Shrestha, S.; Babel, M.S.; Ninsawat, S. Delineation of groundwater potential zones in the Comoro watershed, Timor Leste using GIS, remote sensing and analytic hierarchy process (AHP) technique. Appl. Water Sci. 2015, 7, 503–519.
10. Şener, E.; Davraz, A.; Ozcelik, M. An integration of GIS and remote sensing in groundwater investigations: A case study in Burdur, Turkey. Hydrogeol. J. 2004, 13, 826–834.
11. Jha, M.K.; Chowdhury, A.; Chowdary, V.M.; Peiffer, S. Groundwater management and development by integrated remote sensing and geographic information systems: Prospects and constraints. Water Resour. Manag. 2006, 21, 427–467.
12. Mallick, J.; Khan, R.A.; Ahmed, M.; Alqadhi, S.D.; Alsubih, M.; Falqi, I.; Hasan, M.A. Modeling Groundwater Potential Zone in a Semi-Arid Region of Aseer Using Fuzzy-AHP and Geoinformation Techniques. Water 2019, 11, 2656.
13. Lillesand, T.; Kiefer, R.W.; Chipman, J. Remote Sensing and Image Interpretation, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2004; ISBN 0471152277.
14. Das, S. Comparison among influencing factor, frequency ratio, and analytical hierarchy process techniques for groundwater potential zonation in Vaitarna basin, Maharashtra, India. Groundw. Sustain. Dev. 2019, 8, 617–629.
15. Mallick, J.; Al-Wadi, H.; Rahman, A.; Ahmed, M. Landscape dynamic characteristics using satellite data for a mountainous watershed of Abha, Kingdom of Saudi Arabia. Environ. Earth Sci. 2014, 72, 4973–4984.
16. Oh, H.-J.; Kim, Y.-S.; Choi, J.-K.; Park, E.; Lee, S. GIS mapping of regional probabilistic groundwater potential in the area of Pohang City, Korea. J. Hydrol. 2011, 399, 158–172.
17. Krishnamurthy, J.; Mani, A.; Jayaraman, V.; Manivel, M. Groundwater resources development in hard rock terrain—An approach using remote sensing and GIS techniques. Int. J. Appl. Earth Obs. Geoinf. 2000, 2, 204–215.
18. Punmia, B.; Jain, A.K.; Jain, A.K. Soil Mechanics and Foundations; Laxmi Publications: New Delhi, India, 2005; ISBN 8170087910.
19. Senanayake, I.; Dissanayake, D.; Mayadunna, B.; Weerasekera, W. An approach to delineate groundwater recharge potential sites in Ambalantota, Sri Lanka using GIS techniques. Geosci. Front. 2016, 7, 115–124.
20. Adiat, K.; Nawawi, M.; Abdullah, K. Assessing the accuracy of GIS-based elementary multi criteria decision analysis as a spatial prediction tool—A case of predicting potential zones of sustainable groundwater resources. J. Hydrol. 2012, 440, 75–89.
21. Adiat, K.; Nawawi, M.; Abdullah, K. Application of multi-criteria decision analysis to geoelectric and geologic parameters for spatial prediction of groundwater resources potential and aquifer evaluation. Pure Appl. Geophys. 2013, 170, 453–471.
22. Akinlalu, A.; Adegbuyiro, A.; Adiat, K.; Akeredolu, B.; Lateef, W. Application of multi-criteria decision analysis in prediction of groundwater resources potential: A case of Oke-ana, Ilesa area southwestern, Nigeria. NRIAG J. Astron. Geophys. 2017, 6, 184–200.
23. Chen, W.; Tsangaratos, P.; Ilia, I.; Duan, Z.; Chen, X. Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods. Sci. Total Environ. 2019, 684, 31–49.
24. Duan, H.; Deng, Z.; Deng, F.; Wang, D. Assessment of Groundwater Potential Based on Multicriteria Decision Making Model and Decision Tree Algorithms. Math. Probl. Eng. 2016, 2016, 2064575.
25. Falah, F.; Ghorbani Nejad, S.; Rahmati, O.; Daneshfar, M.; Zeinivand, H. Applicability of generalized additive model in groundwater potential modelling and comparison its performance by bivariate statistical methods. Geocarto Int. 2017, 32, 1069–1089.
26. Jothibasu, A.; Anbazhagan, S. Spatial mapping of groundwater potential in Ponnaiyar River basin using probabilistic-based frequency ratio model. Model. Earth Syst. Environ. 2017, 3, 33.
27. Khosravi, K.; Panahi, M.; Tien Bui, D. Spatial prediction of groundwater spring potential mapping based on an adaptive neuro-fuzzy inference system and metaheuristic optimization. Hydrol. Earth Syst. Sci. 2018, 22, 4771–4792.
28. Lee, S.; Hong, S.-M.; Jung, H.-S. GIS-based groundwater potential mapping using artificial neural network and support vector machine models: The case of Boryeong city in Korea. Geocarto Int. 2017, 33, 847–861.
29. Emamgholizadeh, S.; Moslemi, K.; Karami, G. Prediction the Groundwater Level of Bastam Plain (Iran) by Artificial Neural Network (ANN) and Adaptive Neuro-Fuzzy Inference System (ANFIS). Water Resour. Manag. 2014, 28, 5433–5446.
30. Bui, D.T.; Panahi, M.; Shahabi, H.; Singh, V.P.; Shirzadi, A.; Chapi, K.; Khosravi, K.; Chen, W.; Panahi, S.; Li, S.; et al. Novel Hybrid Evolutionary Algorithms for Spatial Prediction of Floods. Sci. Rep. 2018, 8, 15364.
31. Hong, H.; Panahi, M.; Shirzadi, A.; Ma, T.; Liu, J.; Zhu, A.-X.; Chen, W.; Kougias, I.; Kazakis, N. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. 2018, 621, 1124–1141.
32. Takagi, T.; Sugeno, M. Fuzzy Identification of Systems and Its Applications to Modeling and Control. IEEE Trans. Syst. Man Cybern. 1985, 15, 116–132.
33. Moosavi, V.; Vafakhah, M.; Shirmohammadi, B.; Behnia, N. A Wavelet-ANFIS Hybrid Model for Groundwater Level Forecasting for Different Prediction Periods. Water Resour. Manag. 2013, 27, 1301–1321.
34. Sreekanth, P.D.; Sreedevi, P.D.; Ahmed, S.; Geethanjali, N. Comparison of FFNN and ANFIS models for estimating groundwater level. Environ. Earth Sci. 2010, 62, 1301–1310.
35. Al-shabeeb, A.A.R.; Al-Adamat, R.; Al-Amoush, H.; Alayyash, S. Delineating groundwater potential zones within the Azraq Basin of central Jordan using multi-criteria GIS analysis. Groundw. Sustain. Dev. 2018, 7, 82–90.
36. Salamah, I.; Bannayan, H. Water Resources of Jordan: Present Status and Future Potentials; Friedrich Ebert Stiftung: Berlin, Germany, 1993.
37. Wirmvem, M.J.; Mimba, M.E.; Kamtchueng, B.T.; Wotany, E.R.; Bafon, T.G.; Asaah, A.N.E.; Fantong, W.Y.; Ayonghe, S.N.; Ohba, T. Shallow groundwater recharge mechanism and apparent age in the Ndop plain, northwest Cameroon. Appl. Water Sci. 2015, 7, 489–502.
38. Yeh, H.-F.; Lee, C.-H.; Hsu, K.-C.; Chang, P.-H. GIS for the assessment of the groundwater recharge potential zone. Environ. Geol. 2009, 58, 185–195.
39. Abdalla, F. Mapping of groundwater prospective zones using remote sensing and GIS techniques: A case study from the Central Eastern Desert, Egypt. J. Afr. Earth Sci. 2012, 70, 8–17.
40. Al-Fugara, A.K.; Ahmadlou, M.; Al-Shabeeb, A.R.; Alayyash, S.; Al-Amoush, H.; Al-Adamat, R. Spatial mapping of groundwater springs potentiality using grid search-based and genetic algorithm-based support vector regression. Geocarto Int. 2022, 37, 284–303.
41. Pebesma, E.J.; de Kwaadsteniet, J. Mapping groundwater quality in the Netherlands. J. Hydrol. 1997, 200, 364–386.
42. Van Stempvoort, D.; Ewert, L.; Wassenaar, L. Aquifer Vulnerability Index: A GIS-Compatible Method for Groundwater Vulnerability Mapping. Can. Water Resour. J. 1993, 18, 25–37.
43. Campana, M.; Mahin, D. Model-derived estimates of groundwater mean ages, recharge rates, effective porosities and storage in a limestone aquifer. J. Hydrol. 1985, 76, 247–264.
44. Bogena, H.; Kunkel, R.; Schöbel, T.; Schrey, H.; Wendland, F. Distributed modeling of groundwater recharge at the macroscale. Ecol. Model. 2005, 187, 15–26.
45. Suthaharan, S. Support Vector Machine. In Machine Learning Models and Algorithms for Big Data Classification; Springer: Berlin/Heidelberg, Germany, 2016; ISBN 1489978526.
46. Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10, 988–999.
47. Al-Fugara, A.K.; Ahmadlou, M.; Shatnawi, R.; Alayyash, S.; Al-Adamat, R.; Al-Shabeeb, A.A.; Soni, S. Novel hybrid models combining meta-heuristic algorithms with support vector regression (SVR) for groundwater potential mapping. Geocarto Int. 2020, 37, 2627–2646.
48. Debnath, R.; Takahashi, H. Kernel selection for the support vector machine. IEICE Trans. Inf. Syst. 2004, 87, 2903–2904.
49. Shafizadeh-Moghadam, H.; Tayyebi, A.; Ahmadlou, M.; Delavar, M.R.; Hasanlou, M. Integration of genetic algorithm and multiple kernel support vector regression for modeling urban growth. Comput. Environ. Urban Syst. 2017, 65, 28–40.
50. Jang, J.-S.R. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
51. Ahmadlou, M.; Karimi, M.; Alizadeh, S.; Shirzadi, A.; Parvinnejhad, D.; Shahabi, H.; Panahi, M. Flood susceptibility assessment using integration of adaptive network-based fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and bat algorithms (BA). Geocarto Int. 2019, 34, 1252–1272.
52. Bui, D.T.; Khosravi, K.; Li, S.; Shahabi, H.; Panahi, M.; Singh, V.P.; Chapi, K.; Shirzadi, A.; Panahi, S.; Chen, W. New hybrids of ANFIS with several optimization algorithms for flood susceptibility modeling. Water 2018, 10, 1210.
53. Bui, D.T.; Pradhan, B.; Nampak, H.; Bui, Q.-T.; Tran, Q.-A.; Nguyen, Q.-P. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. 2016, 540, 317–330.
54. Rao, R.V.; Savsani, V.J.; Vakharia, D. Teaching–learning-based optimization: A novel method for constrained mechanical design optimization problems. Comput.-Aided Des. 2011, 43, 303–315.
55. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992; ISBN 0262581116.
56. Mehrabian, A.; Lucas, C. A novel numerical optimization algorithm inspired from weed colonization. Ecol. Inform. 2006, 1, 355–366.

Figure 1. Location map of the study area.

Figure 2. Explanatory variables for groundwater potential mapping (GPM).

Figure 3. Research methodology flowchart.

Figure 4. ANFIS-GA model: (a) observed and predicted ANFIS-GA values of training data; (b) observed and predicted ANFIS-GA values of testing data; (c) MSE and RMSE values of training data; (d) frequency errors of training data; (e) MSE and RMSE values of testing data; (f) frequency errors of testing data.

Figure 5. ANFIS-TLBO model: (a) observed and predicted ANFIS-TLBO values of training data; (b) observed and predicted ANFIS-TLBO values of testing data; (c) MSE and RMSE values of training data; (d) frequency errors of training data; (e) MSE and RMSE values of testing data; (f) frequency errors of testing data.

Figure 6. ANFIS-IWO model: (a) observed and predicted ANFIS-IWO values of training data; (b) observed and predicted ANFIS-IWO values of testing data; (c) MSE and RMSE values of training data; (d) frequency errors of training data; (e) MSE and RMSE values of testing data; (f) frequency errors of testing data.

Figure 7. ROC curves for the SVM-GA, SVM-TLBO, SVM-IWO, ANFIS-GA, ANFIS-TLBO, and ANFIS-IWO models.

Figure 8. Groundwater potential maps from the six hybrid models.

Table 1. Lithological units of the study area (NRA open files).

Lithology	Type
Group 1	Alluvium
Group 2	Mudflat
Group 3	Basalt
Group 4	Volcano

Table 2. Geological formation units outcropped in the study area.

Geology	Type
Group 1	Different basalt flows in northeast Jordan
Group 2	Pelitic sediments in mud flats
Group 3	Chalk, marl bituminous limestone, phosphorite
Group 4	Limestone with chert layers
Group 5	Terrestrial, fluviatile, and lacustrine sediments
Group 6	Fluviatile gravel, lacustrine limestone
Group 7	Limestone with chert concretions
Group 8	Sandstone, conglomerate, marl, and evaporite
Group 9	Other

Table 3. The correlation analysis by FR.

Conditioning Factors	Classes	No. of Pixels	No. of Wells	FR
Altitude (m)	<571	2,339,888	312	3.09
	571–642	2,438,111	3	0.03
	642–725	1,166,062	1	0.02
	725–819	969,833	8	0.19
	819<	666,794	3	0.10
Aspect	Flat	1,621,144	278	3.98
	North	788,156	4	0.12
	Northeast	848,123	7	0.19
	East	868,080	9	0.24
	Southeast	775,815	1	0.03
	South	770,679	12	0.36
	Southwest	651,062	6	0.21
	West	635,448	5	0.18
	Northwest	622,181	5	0.19
Slope	0–1.3	3,532,823	296	1.94
	1.3–3.2	2,713,767	23	0.20
	3.2–5.9	1,057,097	7	0.15
	5.9<	277,001	1	0.08
Plan Curvature	Convex	2,432,776	24	0.23
	Flat	2,851,341	288	2.34
	Concave	2,296,571	15	0.15
Distance from Drainage (m)	0–200	2,576,117	147	1.32
	200–400	2,055,061	101	1.14
	400–600	1,456,195	47	0.75
	600–900	1,085,977	20	0.43
	900<	407,338	12	0.68
TWI	0–0.01	2,277,965	12	0.12
	0.01–11.45	2,379,073	28	0.27
	11.45–14.55	1,843,491	37	0.47
	14.55<	1,080,159	250	5.37
Lithology	Group 1	426,318	74	4.02
	Group 2	6,265,017	215	0.80
	Group 3	888,580	38	0.99
	Group 4	773	0	0
Geology	Group 1	865,369	31	0.83
	Group 2	153,348	11	1.66
	Group 3	135,175	1	0.17
	Group 4	2,430,691	12	0.11
	Group 5	1,772,489	168	2.20
	Group 6	1,863,636	97	1.21
	Group 7	121,284	7	1.34
	Group 8	236,275	0	0
	Other	2421	0	0
Rainfall (mm)	50–100	6,110,110	326	1.24
	150–200	59,511	0	0
	100–150	628,850	1	0.04
	50>	782,217	0	0
Soil Type	Loam	1,499,445	48	0.74
	Sandy Loam	6,042,993	279	1.07
	Silty Clay Loam	38,250	0	0
SPI	0–100	2,337,509	12	0.12
	100–200	941,212	15	0.37
	200–300	484,199	5	0.24
	300<	3,817,768	295	1.79
LS	0–0.43	7,511,462	326	1.01
	0.43–1.96	59,277	1	0.39
	1.96–4.78	8442	0	0
	4.78<	1507	0	0

Table 4. Performance indicators for the different models used in this study.

Model	Runtime (s)	MSE (Training Set)	MSE (Test Set)	AUROC (Training Set)	AUROC (Test Set)
SVM-GA	280	0.051009	0.050538	0.984	0.971
SVM-TLBO	4896	0.051990	0.050612	0.984	0.971
SVM-IWO	60	0.102400	0.204910	0.963	0.958
ANFIS-GA	92	0.054894	0.055683	0.979	0.972
ANFIS-TLBO	23,992	0.047340	0.050795	0.982	0.975
ANFIS-IWO	56	0.072158	0.082486	0.963	0.945

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
