Article

Novel Approach to Predicting Soil Permeability Coefficient Using Gaussian Process Regression

by Mahmood Ahmad 1,*, Suraparb Keawsawasvong 2, Mohd Rasdan Bin Ibrahim 3, Muhammad Waseem 4, Kazem Reza Kashyzadeh 5,* and Mohanad Muayad Sabri Sabri 6
1 Department of Civil Engineering, University of Engineering and Technology Peshawar (Bannu Campus), Bannu 28100, Pakistan
2 Department of Civil Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani 12120, Thailand
3 Center for Transportation Research, Department of Civil Engineering, Engineering Faculty, Universiti Malaya, Kuala Lumpur 50603, Malaysia
4 Department of Civil Engineering, University of Engineering and Technology Peshawar, Peshawar 25000, Pakistan
5 Department of Transport, Academy of Engineering, Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Street, 117198 Moscow, Russia
6 Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
* Authors to whom correspondence should be addressed.
Sustainability 2022, 14(14), 8781; https://0-doi-org.brum.beds.ac.uk/10.3390/su14148781
Submission received: 21 May 2022 / Revised: 4 July 2022 / Accepted: 13 July 2022 / Published: 18 July 2022

Abstract

In the design stage of construction projects, determining the soil permeability coefficient is one of the most important steps in assessing groundwater, infiltration, runoff, and drainage. In this study, Gaussian process regression models based on various kernel functions were developed to estimate the soil permeability coefficient from six input parameters: liquid limit, plastic limit, clay content, void ratio, natural water content, and specific density. Data from a total of 84 soil samples, reported in the literature from the detailed design-stage investigations of the Da Nang–Quang Ngai national road project in Vietnam, were used to develop and validate the models. The models’ performance was evaluated and compared using statistical error indicators (root mean square error and mean absolute error) as well as the determination coefficient and correlation coefficient. The performance measures show that the Gaussian process regression model based on the Pearson universal kernel achieved comparatively better and more reliable results and, thus, should be encouraged in further research.

1. Introduction

Permeability is one of the most essential variables governing the fluid-flow characteristics of soil. The importance of determining the soil permeability coefficient is widely acknowledged; the coefficient is affected by a variety of parameters, including mineralogy, soil density, soil structure, water content, void ratio, and others [1]. Ganjidoost et al. [2] reported that three categories of factors markedly affect the soil permeability coefficient, namely, permeable soil parameters (density, clay content, viscosity, etc.), inherent soil parameters (Atterberg limits, particle size distribution, etc.), and compacted soil factors (porosity, water content, density, etc.). Most of these factors are closely related to each other. It was reported that the soil permeability coefficient decreased by more than 100 times when the percentage passing through sieve No. 100 increased from 0 to 7% [3]. After conducting several experiments with different percentages of granular and low-plasticity marine soils, Shakoor and Cook [4] concluded that the soil permeability coefficient increased noticeably as the percentage of granular material increased. The coefficient of soil permeability is used to solve a variety of geotechnical problems, including slope stability and structural collapse due to ground settlement, seepage, and leakage. As a result, several authors have attempted to establish empirical correlations between the influencing factors and the permeability coefficient [5,6].
Field or laboratory tests can be used to determine the soil permeability coefficient. It has been shown that determining the soil permeability coefficient in the field is expensive, complicated, time-consuming, and tedious [7,8,9]. Laboratory measurement has its own difficulty: obtaining undisturbed samples is problematic, and laboratory samples are frequently reconstituted to match those collected in the field. Because the soil fabric is disturbed during sampling, laboratory test results may not reflect the true value of soil permeability in the field [10]. Due to the specific advantages and disadvantages of each test, the soil permeability coefficient is calculated using a combination of field and laboratory data [7,10]. To assess soil permeability, several researchers proposed a regression that takes into account porosity, clay percentage, and sand particle size [11]. Several other researchers calculated soil permeability based on particle shape, grain size, and bulk density [12,13]. Soil permeability is greatly influenced by particle size distribution; nevertheless, this is not true for all soils [9,14]. According to the study of Pham et al. [1], these empirical relationships have limitations and uncertainties.
Machine-learning (ML) algorithms have recently been successful in solving real-world problems in a variety of fields, including civil and environmental engineering [15] and geotechnical engineering [16,17,18,19,20,21]. Several studies have used ML methods to predict the soil permeability coefficient, such as the adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), and hybrid optimization model of genetic algorithm-ANFIS (GA-ANFIS) [2,9,22,23]. Sezer et al. [24] used an ANFIS to estimate granular soil permeability and found that the algorithm is effective when grain size distribution and particle shape are taken into account [22]. The hybrid GA-ANFIS model outperformed the single ANN, the ANFIS model, and the hybrid GA-ANN model in terms of prediction accuracy [2]. Soft computing-based models, in general, are excellent techniques for predicting soil parameters; for instance, random forest (RF) has been effectively used to predict soil properties including shear strength and permeability coefficient [25,26]. In geotechnical research, the permeability coefficient (k) of soil is an important parameter for designing civil-engineering structures on soil. Estimating “k” from other soil engineering parameters using an empirical equation may not be accurate [5,6,27]. Therefore, the aims of this study are (1) to develop new improved prediction models based on Gaussian process regression (GPR) for the soil of the Da Nang–Quang Ngai expressway development project site, using six soil parameters as inputs: liquid limit LL (%), plastic limit PL (%), clay content CC (%), void ratio e, natural water content w (%), and specific density γ (g/cm3); (2) to divide the data into training and testing datasets with due attention to statistical aspects such as the minimum, maximum, mean, and standard deviation of the datasets, so that the predictive ability and generalization performance of the developed models can be assessed and the models better evaluated; (3) to compare the proposed models to the reference models used in the published literature; and (4) to investigate the importance and impact of each input parameter on the soil permeability coefficient.

2. Methodology

2.1. Data Catalog

The dataset comprises 84 soil samples obtained from the detailed design-stage investigations of the Da Nang–Quang Ngai expressway development project near Da Nang, central Vietnam (Figure 1), and is reported in the research work of Pham et al. [28] (see Appendix A for the complete dataset). Further details about the collection, testing, and type of soils can be found in Pham et al. [28]. Previous studies show that the coefficient of soil permeability is a function of the liquid limit LL (%), plastic limit PL (%), clay content CC (%), void ratio e, natural water content w (%), and specific density γ (g/cm3) [1,28]. The input factors selected by Pham et al. [1,28] are widely accepted among researchers as a complete and suitable set for estimating “k”. These input variables were therefore used to create the GPR models that estimate the “k” (×10−9 cm/s) of soil in the current study. Researchers have used different percentages of the available data as the training and testing sets for different problems. For instance, Pham et al. [29] used 60%, Liang et al. [30] used 70%, and Ahmad et al. [31] used 80% of the data for training. In this study, the dataset was divided into training (70%) and testing (30%) sets based on statistical consistency. The statistical consistency of the training and testing datasets has a substantial impact on the results when using soft computing techniques; it improves the performance of the models and helps in evaluating them better. Figure 2 depicts the cumulative percentage and frequency distributions for all of the input and output parameters of the database used in modeling the soil permeability coefficient. The data points of every input parameter are distributed over its range. The statistical analysis, i.e., minimum (Min), maximum (Max), mean, and standard deviation (Std. Dev), of the training and testing datasets is presented in Table 1.
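The text does not state how statistical consistency between the two subsets was obtained. As one possible illustration only, the sketch below draws several random 70/30 splits and keeps the one whose training and testing subsets have the closest mean and standard deviation. The score `mismatch`, the number of seeds tried, and the use of only the first 20 k values of Table A1 as toy data are assumptions made here, not the authors' procedure.

```python
import random
import statistics

# k values (x10^-9 cm/s) of samples 1-20 in Table A1, used only as toy data.
k = [0.029, 0.010, 0.003, 0.007, 0.026, 0.010, 0.014, 0.003, 0.008, 0.035,
     0.039, 0.061, 0.003, 0.055, 0.010, 0.003, 0.007, 0.030, 0.006, 0.029]


def summary(values):
    """Return (min, max, mean, sample std. dev.), the statistics of Table 1."""
    return (min(values), max(values),
            statistics.mean(values), statistics.stdev(values))


def split_once(values, train_fraction, seed):
    """Shuffle the indices with a fixed seed and cut at the training fraction."""
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(values))
    return [values[i] for i in idx[:cut]], [values[i] for i in idx[cut:]]


def mismatch(train, test):
    """Hypothetical consistency score: gap in mean plus gap in std. dev."""
    _, _, mean_tr, std_tr = summary(train)
    _, _, mean_te, std_te = summary(test)
    return abs(mean_tr - mean_te) + abs(std_tr - std_te)


# Try a handful of seeds and keep the most statistically similar split.
best_seed = min(range(50), key=lambda s: mismatch(*split_once(k, 0.7, s)))
train, test = split_once(k, 0.7, best_seed)

print(len(train), len(test))
print("train:", summary(train))
print("test: ", summary(test))
```

In practice the same comparison would be repeated for every input parameter as well as for k, as Table 1 does.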

2.2. Gaussian Process Regression

Gaussian process regression (GPR) is a probabilistic, non-parametric supervised learning method for generalizing the nonlinear and complicated function mappings hidden in datasets. The GPR model is based on Rasmussen and Williams’ [32] assumption that adjacent observations should communicate information about each other; it is a means of describing a prior directly over function space. The mean and covariance of a Gaussian distribution are a vector and a matrix, respectively, whereas a Gaussian process is a distribution over functions. For a given test input, the GPR model provides a predictive distribution rather than a single point estimate. A Gaussian process is a collection of random variables, any finite number of which have a joint multivariate Gaussian distribution. Let 𝜒 × N denote the input and output domains, respectively, from which n pairs (Mi, Ni) are drawn independently and identically distributed. For regression, let N ⊆ ℜ; then, a Gaussian process on 𝜒 is defined by a mean function μ: 𝜒→ℜ and a covariance function K: 𝜒 × 𝜒→ℜ. Kuss [33] is recommended for more information on GPR and other covariance functions.
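To make the role of the mean and covariance functions concrete, the following minimal sketch computes the standard GPR posterior mean, k*ᵀ(K + noise·I)⁻¹y, for a single test input. It assumes a zero prior mean, an RBF covariance, and a small noise term, and uses four (e, k) pairs from Table A1 as toy data. It is not the WEKA implementation used in this study; the function names and default values are illustrative.

```python
import math


def rbf(a, b, lam=1.0):
    """RBF covariance between two input vectors."""
    return math.exp(-lam * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gpr_predict_mean(X_train, y_train, x_new, noise=1e-8, lam=1.0):
    """Posterior mean k_*^T (K + noise*I)^-1 y under a zero-mean GP prior."""
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], lam) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_train)
    k_star = [rbf(x_new, xi, lam) for xi in X_train]
    return sum(ks * a for ks, a in zip(k_star, alpha))


# Toy data: void ratio e -> k (x10^-9 cm/s), samples 1, 2, 7 and 12 of Table A1.
X = [[2.453], [0.639], [2.275], [1.966]]
y = [0.029, 0.010, 0.014, 0.061]

print(gpr_predict_mean(X, y, [2.0]))
```

With a near-zero noise term the posterior mean passes through the training targets, and far from the data it falls back to the prior mean of zero, which is one reason practical implementations usually standardize the target first.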

Details of Kernel Functions

The kernel function is used in the GPR design process. In the literature, several kernels have been discussed [34,35,36]. The following three kernel functions are used in this study:
1. Polynomial (Poly)
$$K(M, N) = \left(1 + (M, N)\right)^{d}$$
2. Radial basis function (RBF)
$$K(M, N) = e^{-\lambda |M - N|^{2}}$$
3. Pearson universal kernel (PUK)
$$K(M, N) = \frac{1}{\left[1 + \left(2\sqrt{\|M - N\|^{2}}\,\sqrt{2^{(1/\omega)} - 1}\,/\,\sigma\right)^{2}\right]^{\omega}}$$
Here, M and N denote two input vectors, (M, N) is their inner product, and d is the degree of the polynomial. The kernel width (λ) in the RBF kernel, and the parameters σ (controls the Pearson width) and ω (tailing factor of the peak) in the PUK, need to be established based on the precision in prediction.
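The three kernels listed above can be written as plain functions of two input vectors. The sketch below is a direct transcription of the formulas under the assumption that (M, N) is the ordinary dot product; the default parameter values are placeholders, not the tuned values of Table 2.

```python
import math


def dot(m, n):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(m, n))


def sq_dist(m, n):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(m, n))


def poly_kernel(m, n, d=2):
    """Polynomial kernel (1 + <m, n>)^d."""
    return (1.0 + dot(m, n)) ** d


def rbf_kernel(m, n, lam=0.5):
    """Radial basis function kernel exp(-lambda * |m - n|^2)."""
    return math.exp(-lam * sq_dist(m, n))


def puk_kernel(m, n, sigma=1.0, omega=1.0):
    """Pearson universal kernel with width sigma and tailing factor omega."""
    scaled = (2.0 * math.sqrt(sq_dist(m, n))
              * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma)
    return 1.0 / (1.0 + scaled ** 2) ** omega


a, b = [0.0], [0.5]
print(poly_kernel(a, b), rbf_kernel(a, b), puk_kernel(a, b))
```

One property that follows directly from the PUK formula: at a distance of σ/2 the kernel equals 0.5 for any ω, so σ acts as the full width at half maximum of the peak, while ω only changes how quickly the tails decay.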

2.3. Performance Metrics and Evaluation

To examine the performance of GPR modeling, the coefficient of determination (R2), correlation coefficient (R), mean absolute error (MAE), and root mean square error (RMSE) were utilized. The following formulas can be used to compute these parameters:
$$R = \frac{\sum_{i=1}^{n} \left(y_i^{o} - \bar{y}^{o}\right)\left(y_i^{p} - \bar{y}^{p}\right)}{\sqrt{\sum_{i=1}^{n} \left(y_i^{o} - \bar{y}^{o}\right)^{2} \sum_{i=1}^{n} \left(y_i^{p} - \bar{y}^{p}\right)^{2}}}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(y_i^{p} - y_i^{o}\right)^{2}}{\sum_{i=1}^{n} \left(y_i^{o} - \bar{y}^{o}\right)^{2}}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left|y_i^{o} - y_i^{p}\right|$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i^{o} - y_i^{p}\right)^{2}}$$
where $y_i^{o}$ and $y_i^{p}$ represent the actual and estimated coefficient of soil permeability values, respectively; $\bar{y}^{o}$ and $\bar{y}^{p}$ are the averages of the actual and estimated values, respectively; and n is the number of data points.
R2 and R are used to express the degree of collinearity between estimated and actual data. The correlation coefficient, which ranges from −1 to 1, indicates how closely the actual and estimated data are related. If R is equal to 0, there is no linear relationship. If R = 1 or −1, there is a perfect positive or negative linear relationship. R2 indicates the proportion of the variance in the actual data that the model can explain. R2 typically ranges from 0 to 1, with higher values indicating less error variance and values over 0.5 considered acceptable [37,38]. The MAE is the mean of the absolute differences between the estimated and actual values. The closer the MAE is to 0, the more accurately the prediction model describes the data [39]. The RMSE aggregates the magnitudes of the prediction errors for all observations into a single measure of predictive power. The RMSE is greater than or equal to 0, with 0 signifying a perfect fit to the observed data. Thus, the lower the MAE and RMSE values, the better the model. Visual representations such as scatter plots were also employed to compare the performance of the established models. The flowchart of the methodology of the present study is shown in Figure 3.
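The four indicators can be computed in a few lines. The sketch below follows the definitions in this section; the observed values are the k values of samples 1 to 5 in Table A1, while the predicted values are invented purely for illustration and are not model outputs from this study.

```python
import math


def metrics(y_obs, y_pred):
    """Return R, R2, MAE and RMSE for paired observed and predicted values."""
    n = len(y_obs)
    mean_o = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    ss_o = sum((o - mean_o) ** 2 for o in y_obs)    # spread of observations
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)   # spread of predictions
    sse = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return {
        "R": cov / math.sqrt(ss_o * ss_p),
        "R2": 1.0 - sse / ss_o,
        "MAE": sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / n,
        "RMSE": math.sqrt(sse / n),
    }


y_obs = [0.029, 0.010, 0.003, 0.007, 0.026]    # k of samples 1-5, Table A1
y_pred = [0.027, 0.012, 0.004, 0.006, 0.024]   # hypothetical predictions

for name, value in metrics(y_obs, y_pred).items():
    print(f"{name}: {value:.4f}")
```

Because the RMSE squares the errors before averaging, it is never smaller than the MAE on the same data and reacts more strongly to a few large errors.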

3. Results and Discussion

Once the GPR models have been developed, they must be evaluated. The outcomes of the evaluation show whether the models have practical value, that is, whether they can accurately estimate the soil permeability coefficient. As previously stated, 70% and 30% of the total dataset records were used as training and testing sets, respectively, for modeling using the GPR approach.
The Waikato Environment for Knowledge Analysis (WEKA) software was used to implement a number of kernel-function-based Gaussian process regressions in this paper. The WEKA is a collection of machine-learning algorithms for data-mining jobs that is available as open-source software. Hyperparameters must be adjusted in most machine-learning algorithms. Table 2 depicts how the GPR-RBF, GPR-Poly, and GPR-PUK models’ essential hyperparameters were adjusted in this study. First, the models’ tuning parameters were set, and then the trials were repeated until the best fitness measures in Table 2 were obtained.
Table 3 lists the results of the developed models and compares their performance with other models reported in the literature. According to the results, the top-ranked model was GPR-PUK. For the training dataset, the R values were 0.9901, 0.964, and 0.9548; the R2 values 0.980, 0.929, and 0.912; the MAE values 0.0023, 0.0028, and 0.0031; and the RMSE values 0.0038, 0.0047, and 0.0048 for the GPR-PUK, GPR-Poly, and GPR-RBF models, respectively. The GPR-PUK outputs were therefore the most consistent with the actual coefficient of soil permeability values, followed by GPR-Poly, which also showed a high level of accuracy. Similarly, for the testing dataset, GPR-PUK has the highest values of R (0.9754) and R2 (0.951), followed by GPR-Poly (R = 0.9624; R2 = 0.926) and GPR-RBF (R = 0.9387; R2 = 0.881). GPR-Poly, on the other hand, has the lowest MAE (0.0034), followed by GPR-PUK (0.0037) and GPR-RBF (0.0223), and GPR-RBF has the lowest RMSE (0.0047), followed by GPR-PUK (0.0062) and GPR-Poly (0.0634).
Figure 4a–c and Figure 5a–c show the graphical correlation between measured (x-axis) and estimated (y-axis) coefficients of soil permeability for the training and testing datasets, respectively. The values estimated by GPR-PUK in the training and test sets are highly consistent with the actual/experimental values and show fewer error points, as illustrated in Figure 4a and Figure 5a. Comparing the regression lines in Figure 4 and Figure 5 shows that the GPR-PUK predictions lie closest to the trend line (see Figure 4a and Figure 5a) in the training phase (R2 = 0.980) and testing phase (R2 = 0.951), respectively. As a result, the GPR-PUK model proposed in this study can be utilized to calculate the soil permeability coefficient, as the predicted value agrees well with the actual value, indicating that this approach can accurately and effectively estimate the coefficient of soil permeability.

4. Comparison of Performance with Other Methods

In this section, the proposed GPR models are compared with other prediction models reported in the literature, i.e., RF, ANN, SVM, and M5P (M5Prime), and with the CatBoost regression model implemented in Orange software. The user-defined CatBoost parameters identified from various runs are the number of trees (100), maximum depth (10), and learning rate (0.042). Compared with the findings published by Pham et al. [1] and with the CatBoost model, in the training dataset GPR (PUK) has the highest value of R (0.9901), followed by RF (0.972), GPR (Poly) (0.964), CatBoost (0.960), GPR (RBF) (0.9548), ANN (0.948), and SVM (0.861). GPR (PUK) and RF have the lowest MAE (0.0023), followed by ANN (0.0027), GPR (Poly) (0.0028), GPR (RBF) and CatBoost (0.0031), M5P (0.004), and SVM (0.0056). In contrast, RF has a lower RMSE (0.0035) than GPR (PUK) (0.0038). Similarly, in the testing dataset, GPR (PUK) has the highest R value (0.9754), followed by GPR (Poly) (0.9624), CatBoost (0.958), GPR (RBF) (0.9387), RF (0.851), ANN (0.845), and SVM (0.844). The ANN, on the other hand, has the lowest RMSE (0.001), and CatBoost has the lowest MAE (0.0013). The M5P model reported by Pham et al. [28] has an R2 of 0.766, an RMSE of 0.0064, and an MAE of 0.004 for the training dataset. For the testing dataset, the M5P model’s error values are RMSE = 0.0081 and MAE = 0.0045, with a determination coefficient of R2 = 0.766, indicating good agreement between actual and estimated values. In general, the proposed GPR-PUK model (R = 0.9754 for the testing dataset) has better prediction ability and the highest goodness of fit with the training and testing data when compared to the other models in this study.

5. Sensitivity Analysis

The influence of the input factors on the coefficient of soil permeability was evaluated using Yang and Zang’s [40] sensitivity analysis. This approach has been employed in a number of research investigations [31,34,41,42,43,44].
$$r_{ij} = \frac{\sum_{m=1}^{n} \left(y_{im} \times y_{om}\right)}{\sqrt{\sum_{m=1}^{n} y_{im}^{2} \sum_{m=1}^{n} y_{om}^{2}}}$$
where n is the number of data values, and $y_{im}$ and $y_{om}$ are the input and output parameters, respectively. For each input parameter, the $r_{ij}$ value varies from zero to one, with the highest value indicating the input parameter that has the greatest influence on the output parameter (which was k in this study). The closer $r_{ij}$ is to 1, the stronger the relationship between the input and output variables. Figure 6 depicts the relative importance of the input factors based on the experimental (actual) and predicted coefficient of permeability values. As can be seen, the relative importance of the parameters is ranked as follows: w > e > LL > PL > CC > γ. In other words, w is the most significant factor for estimating the coefficient of soil permeability, while γ is the least important parameter.
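The strength measure above is straightforward to compute for one input at a time. The sketch below applies it to the first six samples of Table A1 for two of the inputs (w and γ); the subset is chosen only to keep the example short, so the printed values are not the ones behind Figure 6.

```python
import math


def cosine_amplitude(x, y):
    """Strength of relation r_ij between one input series and the output."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Samples 1-6 of Table A1 (toy subset, not the full 84-sample dataset).
w = [93.73, 20.71, 20.98, 18.02, 95.58, 22.71]   # natural water content (%)
gamma = [2.59, 2.72, 2.73, 2.68, 2.60, 2.69]     # specific density (g/cm3)
k = [0.029, 0.010, 0.003, 0.007, 0.026, 0.010]   # permeability (x10^-9 cm/s)

strengths = {"w": cosine_amplitude(w, k), "gamma": cosine_amplitude(gamma, k)}
for name, value in strengths.items():
    print(f"r({name}, k) = {value:.3f}")
```

The measure does not change when an input is rescaled, so parameters reported in different units (%, g/cm3) can be compared directly.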

6. Conclusions

In this study, the GPR modeling method was used to estimate the coefficient of the permeability of soil with six input parameters: liquid limit LL (%), plastic limit PL (%), clay content CC (%), void ratio e, natural water content w (%), and specific density γ (g/cm3). The available data is divided into two parts: training set (70%) and testing set (30%). The following is a summary of the findings of this study:
  • Comparing GPR models’ performance reveals that the GPR-PUK model gives more accurate prediction results with the coefficient of determination being 0.951, achieved from the correlation between experimental and estimated values of k.
  • The GPR-PUK model’s estimation of the soil permeability coefficient was found to be more reliable than that of the ANN, SVM, RF, and M5P models reported in the literature.
  • The findings of the sensitivity analysis demonstrate that different input factors have varying degrees of significance on the coefficient of soil permeability as w > e > LL > PL > CC > γ.
Development and improvement of the performance of models are a continuous process. The GPR-PUK model can accurately estimate the permeability coefficient of the soil using limited soil parameters, according to the findings of this study, but more research at different sites is needed to prove its wider application. It is evident that the proposed models are open to further modification, and that more data will result in much improved prediction capacity.

Author Contributions

Conceptualization, M.A., S.K.; methodology, M.A., K.R.K. and M.M.S.S.; software, M.A. and S.K.; validation, M.A., M.W. and M.R.B.I.; formal analysis, M.R.B.I. and M.A.; investigation, M.A., K.R.K., M.M.S.S. and M.W.; resources, M.M.S.S.; data curation, M.W.; writing—original draft preparation, M.A. and K.R.K.; writing—review and editing, M.A., M.R.B.I. and M.W.; visualization—review and editing, M.M.S.S.; supervision, M.A., M.R.B.I. and S.K.; project administration, M.M.S.S.; funding acquisition, M.M.S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The research is partially funded by the Ministry of Science and Higher Education of the Russian Federation under the strategic academic leadership program ‘Priority 2030’ (Agreement 075-15-2021-1333 dated 30 September 2021).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are included within the article.

Acknowledgments

This paper was supported by the RUDN University Strategic Academic Leadership Program (recipient K. Reza Kashyzadeh: methodology, investigation, and writing—original draft preparation).

Conflicts of Interest

The authors declare no conflict of interest.

Notation

ANN: Artificial neural network
RF: Random forest
SVM: Support vector machine
GPR: Gaussian process regression
MAE: Mean absolute error
M5P: M5Prime algorithm
RMSE: Root mean square error
PUK: Pearson universal kernel
RBF: Radial basis function
XGBoost: Extreme gradient boosting
R2: Coefficient of determination
R: Correlation coefficient
k: Soil permeability coefficient (×10−9 cm/s)
LL: Liquid limit (%)
PL: Plastic limit (%)
CC: Clay content (%)
e: Void ratio
w: Natural water content (%)
γ: Specific density (g/cm3)

Appendix A

Table A1. Dataset used in the present research.
S. No.  CC (%)  w (%)   LL (%)  PL (%)  γ (g/cm3)  e      k (×10−9 cm/s)
1       44      93.73   75.62   46.8    2.59       2.453  0.029
2       21.7    20.71   24.58   13.5    2.72       0.639  0.01
3       51.8    20.98   38.17   20.2    2.73       0.625  0.003
4       9.7     18.02   20.51   14.2    2.68       0.605  0.007
5       46.9    95.58   82.25   53      2.6        2.514  0.026
6       12.7    22.71   28.5    17.8    2.69       0.671  0.01
7       47.5    85.35   71.24   40.5    2.62       2.275  0.014
8       59.4    24.95   41.87   22.3    2.74       0.713  0.003
9       9.2     23.97   26.52   19.8    2.67       0.723  0.008
10      55.3    98.01   73.63   40.1    2.59       2.597  0.035
11      44.8    79.96   75.45   43.6    2.59       2.083  0.039
12      51.1    73.75   66.96   35.8    2.61       1.966  0.061
13      46.1    25.78   38.03   17.5    2.73       0.808  0.003
14      56.1    83.25   78.23   41.9    2.62       2.235  0.055
15      16.1    17.52   25.85   12.2    2.69       0.546  0.01
16      49      25.45   48.24   24.8    2.72       0.711  0.003
17      10.7    24.53   27.22   19.6    2.69       0.713  0.007
18      64      78.72   75.53   39.5    2.64       2.106  0.03
19      5.7     17.35   20.34   14.25   2.66       0.494  0.006
20      41.9    69.26   66.42   48.5    2.64       1.87   0.029
21      9.5     18.12   21.2    14.5    2.68       0.567  0.008
22      7.6     20.23   23.62   16.8    2.69       0.64   0.007
23      11      20.14   22.78   16.1    2.67       0.608  0.008
24      45      35.53   53.56   28.6    2.74       1.015  0.004
25      8.5     20.81   25.31   18.53   2.68       0.576  0.005
26      8.6     20.12   20.82   14.8    2.67       0.599  0.007
27      10.7    17.25   19.5    13.5    2.68       0.558  0.008
28      8.9     21.79   24.98   19      2.68       0.654  0.007
29      46.4    99.9    82.11   43.6    2.58       2.634  0.041
30      9.7     17.34   20.49   14.3    2.66       0.486  0.007
31      25.9    21.23   31.18   13.2    2.72       0.609  0.005
32      12.5    19.25   23.46   14.67   2.67       0.628  0.008
33      8.4     19.46   22.97   17.43   2.68       0.605  0.007
34      8.1     23.28   26.8    20.36   2.68       0.707  0.011
35      23.6    18.84   27.48   13.8    2.71       0.604  0.006
36      63.4    73.1    68.47   35      2.61       1.933  0.028
37      19      18.35   23.61   13.35   2.7        0.579  0.007
38      42.5    27.28   39.99   21.74   2.72       0.789  0.003
39      49.4    62.2    59.99   38.5    2.63       1.657  0.026
40      23.5    21.32   32.23   16.4    2.71       0.604  0.005
41      6.1     16.97   21.01   15.87   2.66       0.556  0.007
42      7.7     21.23   25.3    18.5    2.68       0.654  0.009
43      9.7     18.01   20.3    14.2    2.67       0.599  0.007
44      8.5     25.49   27.49   21.32   2.67       0.723  0.008
45      60.2    95.09   84.05   54.8    2.63       2.507  0.038
46      40.3    20.75   40.77   18.64   2.72       0.591  0.003
47      8.4     18.25   21.08   14.5    2.69       0.592  0.008
48      50.7    28.97   46.04   25.2    2.72       0.889  0.003
49      8.8     17.19   19.81   14.3    2.68       0.549  0.007
50      46.6    76.77   64.83   38.17   2.63       2.023  0.025
51      9.6     17.99   20.42   15      2.67       0.571  0.008
52      8.6     19.9    23      16.9    2.68       0.586  0.009
53      9.2     17.81   21      14.3    2.68       0.506  0.01
54      11.7    19.77   23.91   13.5    2.68       0.567  0.035
55      9.4     17.85   20.48   14.8    2.68       0.558  0.008
56      45.1    93.19   88.93   48      2.62       2.447  0.057
57      46.1    70.21   65.46   33.6    2.64       1.87   0.071
58      37.4    21.13   32.44   14.2    2.71       0.642  0.003
59      45.3    19.6    30.92   13.2    2.73       0.569  0.007
60      19      24.55   29.08   19.6    2.68       0.707  0.017
61      37.6    87.71   75.34   40.5    2.63       2.329  0.048
62      8       18.05   20.99   14.3    2.68       0.595  0.01
63      8.5     19.85   23.67   17.58   2.67       0.599  0.008
64      9.6     18.18   22.58   16      2.68       0.567  0.006
65      8.6     18.02   20.51   14.6    2.69       0.592  0.012
66      8.3     18.01   21      14.2    2.67       0.599  0.007
67      10.2    18.15   22.14   15.6    2.67       0.517  0.006
68      8.6     24.84   29.32   22      2.68       0.752  0.012
69      45.8    89.51   85.86   42.7    2.63       2.372  0.051
70      38.6    22.79   35.83   15.2    2.72       0.689  0.009
71      8.2     17.12   19.7    13.8    2.67       0.571  0.01
72      26.5    21.89   30.98   17.4    2.72       0.619  0.005
73      24.5    18.28   28.11   12.5    2.71       0.522  0.006
74      21      20.62   28.62   17.4    2.69       0.592  0.014
75      9.3     21.14   23.89   18.53   2.68       0.686  0.008
76      8.4     18.02   21.1    14.5    2.67       0.552  0.009
77      9.8     18.07   20.62   14.5    2.68       0.567  0.01
78      30.4    22.23   39.53   18.64   2.72       0.648  0.004
79      9.8     22.03   23.92   17.8    2.68       0.644  0.008
80      6.7     18.91   21.49   15      2.69       0.582  0.007
81      43.4    25.6    34.5    15.6    2.73       0.717  0.005
82      40.1    25.53   36.11   19.2    2.72       0.755  0.01
83      8.7     15.09   18.9    12.63   2.66       0.462  0.008
84      9.4     19.64   23.8    17.2    2.67       0.648  0.009

References

  1. Pham, B.T.; Nguyen, M.D.; Al-Ansari, N.; Tran, Q.A.; Ho, L.S.; Le, H.V.; Prakash, I. A Comparative Study of Soft Computing Models for Prediction of Permeability Coefficient of Soil. Math. Probl. Eng. 2021, 2021, 7631493.
  2. Ganjidoost, H.; Mousavi, S.J.; Soroush, A. Adaptive network-based fuzzy inference systems coupled with genetic algorithms for predicting soil permeability coefficient. Neural Process. Lett. 2016, 44, 53–79.
  3. Cedergren, H.R. Seepage, Drainage, and Flow Nets; Wiley: London, UK, 1988; Volume 3.
  4. Shakoor, A.; Cook, B.D. The effect of stone content, size, and shape on the engineering properties of a compacted silty clay. Bull. Assoc. Eng. Geol. 1990, 27, 245–253.
  5. Mitchell, J.K.; Hooper, D.R.; Campenella, R.G. Permeability of compacted clay. J. Soil Mech. Found. Div. 1965, 91, 41–65.
  6. Olson, R.E. Effective stress theory of soil compaction. J. Soil Mech. Found. Div. 1963, 89, 27–45.
  7. Vienken, T.; Dietrich, P. Field evaluation of methods for determining hydraulic conductivity from grain size data. J. Hydrol. 2011, 400, 58–71.
  8. Rehfeldt, K.R.; Boggs, J.M.; Gelhar, L.W. Field study of dispersion in a heterogeneous aquifer: 3. Geostatistical analysis of hydraulic conductivity. Water Resour. Res. 1992, 28, 3309–3324.
  9. Sinha, S.K.; Wang, M.C. Artificial neural network prediction models for soil compaction and permeability. Geotech. Geol. Eng. 2008, 26, 47–64.
  10. Elhakim, A.F. Estimation of soil permeability. Alex. Eng. J. 2016, 55, 2631–2638.
  11. Rawls, W.; Brakensiek, D. Estimation of soil water retention and hydraulic properties. In Unsaturated Flow in Hydrologic Modeling; Springer: Berlin/Heidelberg, Germany, 1989; pp. 275–300.
  12. Sperry, J.M.; Peirce, J.J. A model for estimating the hydraulic conductivity of granular material based on grain shape, grain size, and porosity. Groundwater 1995, 33, 892–898.
  13. Lebron, I.; Schaap, M.; Suarez, D. Saturated hydraulic conductivity prediction from microscopic pore geometry measurements and neural network analysis. Water Resour. Res. 1999, 35, 3149–3158.
  14. Hauser, V.L. Seepage control by particle size selection. Trans. ASAE 1978, 21, 691–0695.
  15. Froemelt, A.; Dürrenmatt, D.J.; Hellweg, S. Using data mining to assess environmental impacts of household consumption behaviors. Environ. Sci. Technol. 2018, 52, 8467–8478.
  16. Ahmad, M.; Tang, X.-W.; Qiu, J.-N.; Gu, W.-J.; Ahmad, F. A hybrid approach for evaluating CPT-based seismic soil liquefaction potential using Bayesian belief networks. J. Cent. South Univ. 2020, 27, 500–516.
  17. Ahmad, M.; Tang, X.-W.; Qiu, J.-N.; Ahmad, F. Evaluating Seismic Soil Liquefaction Potential Using Bayesian Belief Network and C4.5 Decision Tree Approaches. Appl. Sci. 2019, 9, 4226.
  18. Ahmad, M.; Tang, X.; Qiu, J.; Ahmad, F.; Gu, W. LLDV-a Comprehensive Framework for Assessing the Effects of Liquefaction Land Damage Potential. In Proceedings of the 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Dalian, China, 14–16 November 2019; pp. 527–533.
  19. Ahmad, M.; Tang, X.-W.; Qiu, J.-N.; Ahmad, F.; Gu, W.-J. A step forward towards a comprehensive framework for assessing liquefaction land damage vulnerability: Exploration from historical data. Front. Struct. Civ. Eng. 2020, 14, 1476–1491.
  20. Ahmad, M.; Tang, X.; Ahmad, F. Evaluation of Liquefaction-Induced Settlement Using Random Forest and REP Tree Models: Taking Pohang Earthquake as a Case of Illustration. In Natural Hazards-Impacts, Adjustments & Resilience; IntechOpen: London, UK, 2020.
  21. Ahmad, M.; Al-Shayea, N.A.; Tang, X.-W.; Jamal, A.; Al-Ahmadi, H.M.; Ahmad, F. Predicting the Pillar Stability of Underground Mines with Random Trees and C4.5 Decision Trees. Appl. Sci. 2020, 10, 6486.
  22. Yilmaz, I.; Marschalko, M.; Bednarik, M.; Kaynar, O.; Fojtova, L. Neural computing models for prediction of permeability coefficient of coarse-grained soils. Neural Comput. Appl. 2012, 21, 957–968.
  23. Park, H. Development of neural network model to estimate the permeability coefficient of soils. Mar. Georesources Geotechnol. 2011, 29, 267–278.
  24. Sezer, A.; Göktepe, A.B.; Altun, S. Estimation of the Permeability of Granular Soils Using Neuro-fuzzy System. In Proceedings of the AIAI Workshops, Thessaloniki, Greece, 23–25 April 2009; pp. 333–342.
  25. Pham, B.T.; Qi, C.; Ho, L.S.; Nguyen-Thoi, T.; Al-Ansari, N.; Nguyen, M.D.; Nguyen, H.D.; Ly, H.-B.; Le, H.V.; Prakash, I. A novel hybrid soft computing model using random forest and particle swarm optimization for estimation of undrained shear strength of soil. Sustainability 2020, 12, 2218.
  26. Singh, V.K.; Kumar, D.; Kashyap, P.; Singh, P.K.; Kumar, A.; Singh, S.K. Modelling of soil permeability using different data driven algorithms based on physical properties of soil. J. Hydrol. 2020, 580, 124223.
  27. Garcia-Bengochea, I.; Altschaeffl, A.G.; Lovell, C.W. Pore distribution and permeability of silty clays. J. Geotech. Eng. Div. 1979, 105, 839–856.
  28. Pham, B.T.; Ly, H.-B.; Al-Ansari, N.; Ho, L.S. A Comparison of Gaussian Process and M5P for Prediction of Soil Permeability Coefficient. Sci. Program. 2021, 2021, 3625289.
  29. Pham, T.A.; Tran, V.Q.; Vu, H.-L.T.; Ly, H.-B. Design deep neural network architecture using a genetic algorithm for estimation of pile bearing capacity. PLoS ONE 2020, 15, e0243030.
  30. Liang, W.; Luo, S.; Zhao, G.; Wu, H. Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms. Mathematics 2020, 8, 765.
  31. Ahmad, M.H.; Hu, J.-L.; Ahmad, F.; Tang, X.-W.; Amjad, M.; Iqbal, M.J.; Asim, M.; Farooq, A. Supervised Learning Methods for Modeling Concrete Compressive Strength Prediction at High Temperature. Materials 2021, 14, 1983.
  32. Rasmussen, C.; Williams, C. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, USA, 2006; Volume 38, pp. 715–719.
  33. Kuss, M. Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning. Ph.D. Thesis, Technische Universität Darmstadt, Darmstadt, Germany, 2006.
  34. Ahmad, M.; Amjad, M.; Al-Mansob, R.A.; Kamiński, P.; Olczak, P.; Khan, B.J.; Alguno, A.C. Prediction of Liquefaction-Induced Lateral Displacements Using Gaussian Process Regression. Appl. Sci. 2022, 12, 1977.
  35. Sihag, P.; Tiwari, N.; Ranjan, S. Modelling of infiltration of sandy soil using gaussian process regression. Modeling Earth Syst. Environ. 2017, 3, 1091–1100.
  36. Elbeltagi, A.; Azad, N.; Arshad, A.; Mohammed, S.; Mokhtar, A.; Pande, C.; Etedali, H.R.; Bhat, S.A.; Islam, A.R.M.T.; Deng, J. Applications of Gaussian process regression for predicting blue water footprint: Case study in Ad Daqahliyah, Egypt. Agric. Water Manag. 2021, 255, 107052.
  37. Santhi, C.; Arnold, J.G.; Williams, J.R.; Dugas, W.A.; Srinivasan, R.; Hauck, L.M. Validation of the swat model on a large rwer basin with point and nonpoint sources 1. JAWRA J. Am. Water Resour. Assoc. 2001, 37, 1169–1188. [Google Scholar] [CrossRef]
  38. Van Liew, M.; Arnold, J.; Garbrecht, J. Hydrologic simulation on agricultural watersheds: Choosing between two models. Trans. ASAE 2003, 46, 1539. [Google Scholar] [CrossRef]
  39. Lin, S.; Zheng, H.; Han, C.; Han, B.; Li, W. Evaluation and prediction of slope stability using machine learning approaches. Front. Struct. Civ. Eng. 2021, 15, 821–833. [Google Scholar] [CrossRef]
  40. Yang, Y.; Zhang, Q. A hierarchical analysis for rock engineering using artificial neural networks. Rock Mech. Rock Eng. 1997, 30, 207–222. [Google Scholar] [CrossRef]
  41. Faradonbeh, R.S.; Armaghani, D.J.; Abd Majid, M.; Tahir, M.M.; Murlidhar, B.R.; Monjezi, M.; Wong, H. Prediction of ground vibration due to quarry blasting based on gene expression programming: A new model for peak particle velocity prediction. Int. J. Environ. Sci. Technol. 2016, 13, 1453–1464. [Google Scholar] [CrossRef] [Green Version]
  42. Chen, W.; Hasanipanah, M.; Rad, H.N.; Armaghani, D.J.; Tahir, M. A new design of evolutionary hybrid optimization of SVR model in predicting the blast-induced ground vibration. Eng. Comput. 2019, 37, 1455–1471. [Google Scholar] [CrossRef]
  43. Rad, H.N.; Bakhshayeshi, I.; Jusoh, W.A.W.; Tahir, M.; Foong, L.K. Prediction of flyrock in mine blasting: A new computational intelligence approach. Nat. Resour. Res. 2020, 29, 609–623. [Google Scholar]
  44. Amjad, M.; Ahmad, I.; Ahmad, M.; Wróblewski, P.; Kamiński, P.; Amjad, U. Prediction of pile bearing capacity using XGBoost algorithm: Modeling and performance evaluation. Appl. Sci. 2022, 12, 2126. [Google Scholar] [CrossRef]
Figure 1. Da Nang-Quang Ngai expressway project location map.
Figure 2. Frequency distribution histograms of the input (in blue) and output (in green) parameters.
Figure 3. Flowchart of the proposed methodology.
Figure 4. Comparison of the predicted and actual results of various kernel-function-based GPR models in the training dataset: (a) GPR-PUK, (b) GPR-Poly, and (c) GPR-RBF.
Figure 5. Comparison of the predicted and actual results of various kernel-function-based GPR models in the testing dataset: (a) GPR-PUK, (b) GPR-Poly, and (c) GPR-RBF.
Figure 6. Sensitivity analysis of the input parameters.
Table 1. Statistical analysis of the study’s inputs and output.
| Dataset | Statistic | Clay Content, CC (%) | Water Content, w (%) | Liquid Limit, LL | Plastic Limit, PL | Specific Density, γ (g/cm³) | Void Ratio, e | Permeability Coefficient, k (×10⁻⁹ cm/s) |
|---|---|---|---|---|---|---|---|---|
| Training | Min | 5.7 | 16.97 | 19.5 | 12.2 | 2.58 | 0.486 | 0.003 |
| Training | Mean | 28.056 | 37.82 | 40.219 | 23.882 | 2.6715 | 1.0576 | 0.016 |
| Training | Max | 64 | 99.9 | 88.93 | 54.8 | 2.74 | 2.634 | 0.071 |
| Training | Std. Dev | 19.761 | 28.62 | 22.228 | 12.347 | 0.0413 | 0.7234 | 0.016 |
| Testing | Min | 6.7 | 15.09 | 18.9 | 12.5 | 2.63 | 0.462 | 0.004 |
| Testing | Mean | 18.36 | 25.75 | 30.304 | 18.279 | 2.6836 | 0.7553 | 0.012 |
| Testing | Max | 45.8 | 89.51 | 85.86 | 42.7 | 2.73 | 2.372 | 0.051 |
| Testing | Std. Dev | 13.337 | 19.13 | 16.272 | 7.3879 | 0.0256 | 0.4856 | 0.012 |
Table 2. The optimal tuning parameters for various regression models.
| Model | Optimal Tuning Parameters |
|---|---|
| PUK kernel | {noise = 0.6, ω = 0.1, σ = 0.1} |
| Poly kernel | {noise = 0.02} |
| RBF kernel | {noise = 0.04, λ = 0.6} |
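To make the roles of ω and σ in Table 2 concrete, the following is a minimal sketch of the Pearson VII universal kernel (PUK) in its commonly cited form, K(x, y) = 1 / [1 + (2·‖x − y‖·√(2^(1/ω) − 1) / σ)²]^ω. This form is an assumption here, since this part of the article does not restate the kernel equation, and the function name `puk_kernel` and the sample vectors are hypothetical. The defaults simply reuse the Table 2 values.

```python
import math


def puk_kernel(x, y, omega=0.1, sigma=0.1):
    """Pearson VII universal kernel between two feature vectors.

    Assumed form (not restated in this section of the article):
    K = 1 / (1 + (2 * d * sqrt(2**(1/omega) - 1) / sigma)**2)**omega,
    where d is the Euclidean distance between x and y.
    """
    # Euclidean distance between the two input vectors
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Width factor controlled by omega (tailing) and sigma (peak width)
    scale = 2.0 * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + (scale * dist) ** 2) ** omega


# Hypothetical, already-scaled six-feature samples (illustrative values only)
sample_a = [0.20, 0.35, 0.40, 0.30, 0.55, 0.25]
sample_b = [0.22, 0.30, 0.45, 0.28, 0.50, 0.27]
similarity = puk_kernel(sample_a, sample_b)
```

The kernel equals 1 for identical inputs and decays towards 0 as the distance grows. The noise value in Table 2 is a separate regression setting and does not enter the kernel function itself.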
Table 3. Comparative performance of the GPR method and previously existing models.
| Model | Training R | Training R² | Training MAE | Training RMSE | Testing R | Testing R² | Testing MAE | Testing RMSE | Reference |
|---|---|---|---|---|---|---|---|---|---|
| RF | 0.972 | - | 0.0023 | 0.0035 | 0.851 | - | 0.0049 | 0.0084 | [1] |
| ANN | 0.948 | - | 0.0027 | 0.0047 | 0.845 | - | 0.005 | 0.001 | |
| SVM | 0.861 | - | 0.0056 | 0.0078 | 0.844 | - | 0.0064 | 0.0098 | |
| M5P | - | 0.792 | 0.004 | 0.0064 | - | 0.766 | 0.0045 | 0.0081 | [28] |
| GPR (PUK) | 0.9901 | 0.980 | 0.0023 | 0.0038 | 0.9754 | 0.951 | 0.0037 | 0.0062 | Present study |
| GPR (Poly kernel) | 0.964 | 0.929 | 0.0028 | 0.0047 | 0.9624 | 0.926 | 0.0223 | 0.0634 | |
| GPR (RBF) | 0.9548 | 0.912 | 0.0031 | 0.0048 | 0.9387 | 0.881 | 0.0034 | 0.0047 | |
| CatBoost | 0.960 | 0.922 | 0.0031 | 0.0052 | 0.958 | 0.9178 | 0.0013 | 0.0031 | |
"-": the respective performance measure value is not reported in the reference.
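The four measures compared in Table 3 can be computed as in the sketch below. It assumes the standard definitions of the Pearson correlation coefficient, MAE, and RMSE, and takes R² as the square of R, which agrees with the tabulated GPR rows (for example, 0.9901² ≈ 0.980). The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(actual, predicted):
    """Return R, R2, MAE and RMSE for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Pearson correlation coefficient R
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)
    # Mean absolute error and root mean square error
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    # R2 is taken as the squared correlation (assumption stated in the lead-in)
    return {"R": r, "R2": r * r, "MAE": mae, "RMSE": rmse}


# Hypothetical permeability values (x10^-9 cm/s), for illustration only
actual_k = [0.004, 0.010, 0.020, 0.051]
predicted_k = [0.005, 0.009, 0.022, 0.048]
metrics = regression_metrics(actual_k, predicted_k)
```

For any set of predictions, RMSE is never smaller than MAE, which gives a quick consistency check when reading a comparison table of this kind.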
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ahmad, M.; Keawsawasvong, S.; Bin Ibrahim, M.R.; Waseem, M.; Kashyzadeh, K.R.; Sabri, M.M.S. Novel Approach to Predicting Soil Permeability Coefficient Using Gaussian Process Regression. Sustainability 2022, 14, 8781. https://0-doi-org.brum.beds.ac.uk/10.3390/su14148781
