Article

Predicting Subgrade Resistance Value of Hydrated Lime-Activated Rice Husk Ash-Treated Expansive Soil: A Comparison between M5P, Support Vector Machine, and Gaussian Process Regression Algorithms

1 Department of Civil Engineering, University of Engineering and Technology Peshawar (Bannu Campus), Bannu 28100, Pakistan
2 Department of Civil Engineering, College of Engineering and Islamic Architecture, Umm Al-Qura University, Makkah 24382, Saudi Arabia
3 Department of Civil Engineering, Faculty of Engineering, International Islamic University Malaysia, Jalan Gombak 50728, Selangor, Malaysia
4 Department of Civil Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani 12120, Thailand
5 Department of Building and Construction Techniques Engineering, Al-Mustaqbal University College, Hilla 51001, Iraq
6 State Key Laboratory of Coastal and Offshore Engineering, Dalian University of Technology, Dalian 116024, China
* Authors to whom correspondence should be addressed.
Submission received: 3 August 2022 / Revised: 11 September 2022 / Accepted: 18 September 2022 / Published: 21 September 2022

Abstract

Resistance value (R-value) is one of the basic subgrade stiffness characterizations that express a material’s resistance to deformation. In this paper, artificial intelligence (AI)-based models—namely M5P, support vector machine (SVM), and Gaussian process regression (GPR) algorithms—are built for R-value evaluation that meets the high precision and rapidity requirements of highway engineering. The dataset of this study comprises seven parameters: hydrated lime-activated rice husk ash content, liquid limit, plastic limit, plasticity index, optimum moisture content, clay activity, and maximum dry density. The available data are divided into three parts: training set (70%), test set (15%), and validation set (15%). The output (i.e., R-value) of the developed models is evaluated using the performance measures coefficient of determination (R2), mean absolute error (MAE), relative squared error (RSE), root mean square error (RMSE), relative root mean square error (RRMSE), performance indicator (ρ), and a visual framework (Taylor diagram). GPR is concluded to be the best-performing model (R2, MAE, RSE, RMSE, RRMSE, and ρ equal to 0.9996, 0.0258, 0.0032, 0.0325, 0.0012, and 0.0006, respectively, in the validation phase), followed very closely by SVM and M5P. The aforementioned approaches for predicting the R-value are also compared with a recently developed artificial neural network model from the literature. The analysis of performance measures for the R-value dataset demonstrates that all the AI-based models achieved comparatively better and more reliable results and should thus be encouraged in further research. Sensitivity analysis suggests that all the input parameters have a significant influence on the output, with maximum dry density having the greatest effect.

1. Introduction

A flexible pavement structure is composed of several layers, the lowest of which is known as the pavement foundation or subgrade. The strength and stiffness of the subgrade are critical for pavement design, construction, and performance. California bearing ratio, resistance value (R-value), and resilient modulus are all terms used to describe subgrade strength and stiffness. In this study, the R-value is used to represent subgrade strength and/or weakness. A soil’s deformation resistance is expressed as a function of the ratio of applied vertical pressure to lateral pressure. The R-value, which ranges from 0 to 100, represents soil strength and stiffness, with 100 representing the highest strength [1]. If a low subgrade R-value is used in design although the actual subgrade is not weak, the resulting pavement may be overdesigned; if a high R-value is utilized in the design while the actual subgrade is weak, the resulting pavement structure may be too thin to protect the weaker subgrade soil from traffic stresses [2,3]. The R-value is an important parameter in pavement design because the thickness of each layer is determined by the R-value and the traffic index (expected level of traffic loading). It is measured in the laboratory through a particular test called the stabilometer test, which measures the resistance of a soil/aggregate sample to a vertically applied load under specified conditions; foundation material typically deforms laterally under such vertical loading [4,5]. Tarefder et al. [6] demonstrated from elastic analysis that increasing subgrade R-values can significantly reduce compressive strain at the top of the subgrade. Subgrade treatment can help to reduce stresses and strains in a weak subgrade.
This test was first used by Caltrans (the California Department of Transportation) for pavement design, replacing the California bearing ratio test. In some areas, R-value testing is used for road aggregate and subgrade soil. The R-value of a soil stratum can be defined as the ability of a soil medium to resist lateral spreading due to an applied vertical load; in the case of pavement, the tire load acts as the vertical load. Well-graded, densely graded crushed stone used in base courses attains R-values of 80 or more, while silts attain 15–30. For robust pavement design, accurate prediction of the mechanical indices of geomaterials is necessary [7]. Pavement design depends on several parameters, one of which is the R-value of the soil [5]. The R-value of a material is determined when the material is saturated to the point where water will be expelled from the compacted test specimen when a 16.8 kN (2.07 MPa) load is applied. Because it is not always possible to prepare a test specimen that will exude water at the specified load, a series of specimens prepared at different moisture contents must be tested [6]. The R-value can be measured using a laboratory stabilometer following ASTM D 2844, AASHTO T 190, or California Test CT 301 and the following formula [6]:
R = 100 - \dfrac{100}{\left( \dfrac{2.5}{D} \right) \left( \dfrac{P_v}{P_h} - 1 \right) + 1}
where D denotes the displacement of stabilometer fluid required to increase the horizontal pressure from 5 to 100 psi (34.5 to 689.5 kPa), Pv denotes the applied vertical pressure of 160 psi (1103 kPa), Ph denotes the horizontal pressure transmitted at that vertical pressure, and R denotes the resistance value. The R-value can also be estimated empirically using soil classification and index properties.
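As a quick numeric illustration of the formula above, the short sketch below evaluates it in Python; the displacement and pressure values are hypothetical, chosen only to show the arithmetic, and are not taken from the study.

```python
# Worked check of the stabilometer R-value formula.
# D, Pv, and Ph below are illustrative numbers, not values from the study.
def r_value(D, Pv, Ph):
    """R = 100 - 100 / ((2.5 / D) * (Pv / Ph - 1) + 1)."""
    return 100 - 100 / ((2.5 / D) * (Pv / Ph - 1) + 1)

# Example: D = 4.0 units of fluid displacement, Pv = 160 psi, Ph = 40 psi.
print(round(r_value(4.0, 160.0, 40.0), 1))  # -> 65.2
```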
The New Mexico Department of Transportation (NMDOT) uses a field empirical method to estimate the R-value, which involves first determining the American Association of State Highway and Transportation Officials (AASHTO) soil classification and the plasticity index, and then referencing the R-value from a standard table of estimated values [8]. Determining the R-value in the laboratory is a very complex and time-consuming process [9]. Similar complications arise with expansive soil, which requires stabilization for strength improvement [5,9,10]. Onyelowe et al. [11] used 121 data points of expansive soil treated with recycled and activated composites of rice husk ash to predict soil-strength parameters, such as the R-value, using evolutionary hybrid algorithms of an artificial neural network (ANN). The results demonstrated that the aforementioned model can reproduce the measured R-value with acceptable performance.
In the last 10 years, artificial intelligence (AI) techniques have been widely used to solve real-world problems, particularly in civil engineering. AI techniques have been successfully applied to a wide range of real-world scenarios, paving the way for a number of promising opportunities in civil engineering and other fields, such as environmental [12], geotechnical and geological [13,14,15,16,17,18,19,20,21,22,23,24,25,26], and other sciences [27,28,29,30], including R-value prediction [11,31]. These studies provided new concepts and methods for predicting the R-value; however, this field is still being researched. This paper focuses on the use of AI-based models—namely M5P, support vector machine, and Gaussian process regression—to predict the R-value of hydrated lime-activated rice husk ash (HARHA)-treated expansive soil, emphasizing accuracy and efficiency as well as the ability of each technique to deal with experimental data. This study uses seven input parameters—HARHA, liquid limit (LL), plastic limit (PL), plasticity index (PI), optimum moisture content (OMC), clay activity (AC), and maximum dry density (MDD)—assessed for the estimation of the output R-value. The accuracy of the models was compared with previously developed models in the literature using the performance indices coefficient of determination, mean absolute error, relative squared error, root mean square error, relative root mean square error, and performance indicator.
The first section of this research paper contains the introduction; the second section details the dataset, Pearson’s correlation analysis, and a short literature review on soft-computing techniques for estimating the R-value. The third section thoroughly discusses the results, and the last section summarizes the whole study and presents the conclusions and future prospects.

2. Materials and Methods

2.1. Data Collection

The dataset acquired from Onyelowe et al. [11] was organized into three parts: training, testing, and validation. The total dataset was divided such that the training dataset comprises 70% (85 records) of the total, and the remaining 30% (36 records) was equally divided between testing and validation. The complete database is available in Table A1 in Appendix A. In the Onyelowe et al. [11] study, a number of tests were conducted on prepared expansive soil; both treated and untreated soil specimens were used to generate the dataset. The seven input parameters—HARHA (X1), LL (X2), PL (X3), PI (X4), OMC (X5), AC (X6), and MDD (X7)—were assessed for estimation of the output R-value (Y). HARHA, a hybrid geomaterial binder, was prepared by blending rice husk ash with 5% hydrated lime (Ca(OH)2), and the mix was then left for 24 h for the activation reaction to complete. The hydrated lime acted as an alkali activator, while the rice husk—an agro-industrial waste—was obtained from rice-processing mills and turned into rice husk ash (RHA) by direct combustion [32]. HARHA was used to treat the soil in varying proportions ranging from 0.1% to 12%. These seven input parameters were therefore used to develop the proposed models for comparison. The descriptive statistics of each input and output are listed in Table 1 and the histograms are presented in Figure 1.
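A minimal sketch of the 70/15/15 partition described above is given below. It assumes the records are held in a hypothetical CSV file ("r_value_dataset.csv") with one column per parameter; the paper does not state whether the split was random, so scikit-learn's random splitter is used purely for illustration.

```python
# Sketch of the 70/15/15 train/test/validation split, under the assumptions above.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("r_value_dataset.csv")  # hypothetical file holding the Table A1 records
X = df[["X1", "X2", "X3", "X4", "X5", "X6", "X7"]]
y = df["R_value"]

# Carve off 70% for training, then halve the remainder into test and validation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.70, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.50, random_state=42)
print(len(X_train), len(X_test), len(X_val))  # roughly 85/18/18 on 121 records
```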

2.2. Pearson’s Correlation Analysis

In order to determine the linear relationship between input and output parameters, this study employed Pearson’s correlation coefficient [33]. The Pearson correlation coefficient (ρ) was used to determine the correlations between each pairwise variable. For random variables (p, q), the following equation was used:
\rho(p, q) = \dfrac{\operatorname{cov}(p, q)}{\sigma_p \, \sigma_q}
where \sigma_p and \sigma_q are the standard deviations of p and q, respectively, and cov represents the covariance.
Table 2 summarizes the correlation of all parameters according to ρ (absolute value), where |ρ| > 0.8 represents a strong relationship between a pairwise variable, values between 0.3 and 0.8 represent a medium relationship, and |ρ| < 0.30 represents a weak relationship [34]. It can readily be seen from Table 2 that HARHA (X1) is strongly correlated with the R-value (|ρ| = 0.9844), while the optimum moisture content (X5) shows the weakest, medium-range correlation with the R-value (|ρ| = 0.3639).
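The correlations in Table 2 can be reproduced directly with pandas, as sketched below using the same hypothetical file layout as in the split example above.

```python
# Pearson correlation of each input with the target, per the equation above.
import pandas as pd

df = pd.read_csv("r_value_dataset.csv")  # hypothetical file holding Table A1
corr = df[["X1", "X2", "X3", "X4", "X5", "X6", "X7", "R_value"]].corr(method="pearson")
print(corr["R_value"].round(4))  # X1 should come out near 0.9844 and X5 near 0.3639
```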

2.3. Machine-Learning Algorithms

2.3.1. Gaussian Process Regression (GPR)

GPR is a purely probabilistic method used for regression and classification problems [35,36]. It has attracted the interest of many researchers in different scientific areas [37,38] and gives good results for nonlinear data modeling, with the nonlinearity captured through kernel functions [39]. In addition to regression estimates, it provides a very effective response to the input parameters [40].
For the computation and assessment of the training data, the input is assumed to be D = \{(x_i, y_i) \mid i = 1, \ldots, n\} with X \in \mathbb{R}^{D \times n}, and the desired output is y \in \mathbb{R}^n. In GPR, the output is computed from the function y = f(x) + \varepsilon, where \varepsilon \sim N(0, \sigma_n^2) is the noise term, with variance assumed equal for all samples x_i [40].
In GPR, the n observations y = \{y_1, \ldots, y_n\} are treated as a single sample from a multivariate Gaussian distribution, which is assumed to have zero mean. Covariance is a function that shows the dependence of one observation on another. To approximate functions, the squared exponential covariance function is commonly used, which is as follows [35]:
k(x, x') = \sigma_f^2 \exp\left( -\dfrac{(x - x')^2}{2 l^2} \right) + \sigma_n^2 \, \delta(x, x')
When the values of x and x' are very close to each other, k(x, x') approaches the maximum allowable covariance, and as a result the values of f(x) and f(x') are approximately the same. l is the length-scale of the kernel function. In addition, \delta(x, x') is the Kronecker delta, which is defined as follows:
\delta_{ij} = 1 \ \text{if} \ i = j \quad \text{and} \quad \delta_{ij} = 0 \ \text{if} \ i \neq j
In the case of the training dataset, for a new input pattern x_*, the prediction of the output value y_* is the final objective. To obtain this, three covariance matrices need to be developed:
K = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{bmatrix}
K_* = \begin{bmatrix} k(x_*, x_1) & k(x_*, x_2) & \cdots & k(x_*, x_n) \end{bmatrix}, \qquad K_{**} = k(x_*, x_*)
The data and the prediction are assumed to follow a joint multivariate Gaussian distribution:
\begin{bmatrix} y \\ y_* \end{bmatrix} \sim N\left( 0, \begin{bmatrix} K & K_*^{T} \\ K_* & K_{**} \end{bmatrix} \right)
From this multivariate Gaussian distribution, the mean and variance of the predicted output are obtained:
E(y_*) = K_* K^{-1} y, \qquad \operatorname{var}(y_*) = K_{**} - K_* K^{-1} K_*^{T}
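The sketch below illustrates the posterior mean and variance given above using scikit-learn. The library has no Pearson VII (PUK) kernel, so a squared exponential (RBF) kernel plus a white-noise term—matching the covariance function above in form—is used as a stand-in; the five training points are taken from Table A1, but the overall setup is an illustrative sketch, not the paper's configuration.

```python
# Minimal GPR sketch: posterior mean E(y*) and variance var(y*) at a new input x*.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_train = np.array([[0.0], [3.0], [6.0], [9.0], [12.0]])  # HARHA dosage (%), Table A1
y_train = np.array([11.7, 17.3, 20.9, 24.0, 27.0])        # corresponding measured R-values

# sigma_f^2 * exp(-(x - x')^2 / (2 l^2)) + sigma_n^2 * delta(x, x'); the
# length-scale and noise level here are assumptions for the sketch.
kernel = 1.0 * RBF(length_scale=2.0) + WhiteKernel(noise_level=0.3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

mean, std = gpr.predict(np.array([[7.5]]), return_std=True)
print(mean[0], std[0])  # predictive mean and standard deviation at x* = 7.5%
```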

2.3.2. M5P

Wang and Witten [41] rebuilt the M5 algorithm initially proposed by Quinlan [42] and presented it as the M5P algorithm, adding a linear regression function at the leaf nodes.
M5P can perform better than M5 by trimming the tree size. Typically, M5P consists of the following three steps:
(i)
Creating a model tree: using the dividing criterion, the input space is divided into numerous subspaces. The expected error at each node is reduced by applying the standard deviation reduction (SDR) factor, given as follows:
\mathrm{SDR} = sd(H) - \sum_i \dfrac{|H_i|}{|H|} \, sd(H_i), \qquad sd(H) = \sqrt{ \dfrac{ \sum_{i=1}^{N} (H_i - \bar{H})^2 }{ N - 1 } }, \qquad \bar{H} = \dfrac{1}{N} \sum_{i=1}^{N} H_i
where H_i is the subset obtained by splitting the node according to a given attribute, sd is the standard deviation, and H is the set of instances that reaches the node.
(ii)
Pruning the tree: a regression function is fitted in each subspace, and the tree grown in step (i) may overfit the training data. Pruning overcomes this problem and improves generalization performance by removing errors learned from the training data.
(iii)
Smoothing step: the neighboring linear models may exhibit abrupt discontinuities as a result of pruning the tree. To tackle this problem, all of the leaf models are integrated from the leaf to the root to create the final model, and the predicted value is filtered all the way back to the root. At each node, the value coming from below is combined with the value expected from that node’s linear regression, smoothing the final value with the following equation:
T' = \dfrac{N t + K A}{N + K}
where T′ is the predicted value passed up to the next higher node, t is the predicted value passed from the lower node to the present node, N is the number of training instances that reach the lower node, A is the value predicted by the linear model at this node, and K is a smoothing constant.
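A small worked example of the smoothing rule above follows. The instance count and predictions are illustrative, and K = 15 is the conventional default in WEKA's M5P implementation rather than a value reported in this paper.

```python
# Worked example of M5P smoothing: T' = (N*t + K*A) / (N + K).
N, K = 20, 15        # instances reaching the lower node; smoothing constant (assumed WEKA default)
t, A = 21.4, 20.9    # prediction passed up from below; this node's linear-model prediction
T_prime = (N * t + K * A) / (N + K)
print(round(T_prime, 3))  # -> 21.186, the child's estimate pulled toward the parent's
```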

2.3.3. Support Vector Machine (SVM)

SVM can be used for regression analysis but is typically employed for classification tasks. SVM maps the data points onto a plane and defines a hyperplane between groups that maximizes the margin between the two classes, resulting in fewer errors near the boundary. When the training data are not linearly separable [43], a nonlinear separating boundary must be constructed: the original space is mapped to a high-dimensional feature space in which a linear boundary can be found, and how the input space is mapped is determined by a kernel function. For the model’s optimization, a penalty factor (C) for misclassification is added; the overall penalty is calculated by summing the contributions of the individual misclassifications. The groundwater and hydrological engineering fields have identified several beneficial applications of the SVM approach [44,45,46].
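For regression, the same machinery is used in its support vector regression (SVR) form. The sketch below uses the penalty factor C = 0.52 from Table 3 but substitutes an RBF kernel for PUK, which scikit-learn does not provide; it is a sketch of the approach, not the paper's exact model.

```python
# Minimal SVR sketch with the regularization parameter C from Table 3.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X_train = np.array([[0.0], [3.0], [6.0], [9.0], [12.0]])  # HARHA dosage (%), Table A1
y_train = np.array([11.7, 17.3, 20.9, 24.0, 27.0])

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=0.52, epsilon=0.01))
svr.fit(X_train, y_train)
print(svr.predict(np.array([[7.5]])))  # predicted R-value at a 7.5% dosage
```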

2.4. Model Evaluation and Comparison

To validate and compare the models, the six quantitative statistics—coefficient of determination (R2), mean absolute error (MAE), relative squared error (RSE), root mean square error (RMSE), relative root mean square error (RRMSE), and performance indicator (ρ)—were used. The mathematical expressions of the statistical measures are presented in Equations (9)–(14).
R^2 = \left[ \dfrac{ \sum_{i=1}^{n} (e_i - \bar{e})(m_i - \bar{m}) }{ \sqrt{ \sum_{i=1}^{n} (e_i - \bar{e})^2 \sum_{i=1}^{n} (m_i - \bar{m})^2 } } \right]^2
\mathrm{MAE} = \dfrac{ \sum_{i=1}^{n} \left| e_i - m_i \right| }{ n }
\mathrm{RSE} = \dfrac{ \sum_{i=1}^{n} (m_i - e_i)^2 }{ \sum_{i=1}^{n} (\bar{e} - e_i)^2 }
\mathrm{RMSE} = \sqrt{ \dfrac{ \sum_{i=1}^{n} (e_i - m_i)^2 }{ n } }
\mathrm{RRMSE} = \dfrac{1}{\bar{e}} \sqrt{ \dfrac{ \sum_{i=1}^{n} (e_i - m_i)^2 }{ n } }
\rho = \dfrac{ \mathrm{RRMSE} }{ 1 + r }
where e_i and m_i are the measured and predicted outputs of the i-th sample, respectively, and \bar{e} and \bar{m} represent the average values of the measured and predicted outputs. The number of total datasets is represented by n. The R2 value ranges from 0 to 1, and a higher R2 value indicates a more efficient model; a model is considered effective if its R2 value is greater than 0.8 and close to 1 [47]. RMSE measures the mean squared difference between the predicted outputs and the targets, while MAE measures the mean magnitude of the error. RRMSE is calculated by dividing RMSE by the average value of the measured data; model accuracy is considered excellent when RRMSE < 10%, good if 10% < RRMSE < 20%, fair if 20% < RRMSE < 30%, and poor if RRMSE > 30% [48]. Relative squared error (RSE) normalizes the total squared error of the predicted values by the total squared error of the measured values; the RSE index ranges from 0 to infinity, with 0 being ideal. Lower RMSE, RRMSE, and RSE values indicate better model calibration. The performance indicator (ρ) assesses the model’s performance as a function of both RRMSE and the coefficient of correlation (r) [49]; its value ranges from 0 to positive infinity, with a lower value (close to zero) indicating better model performance. A schematic diagram of the methodology for applying machine-learning (ML) algorithms to predict the R-value is illustrated in Figure 2.
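The six statistics of Equations (9)–(14) can be written out directly, as in the sketch below, so the definitions above can be checked against library implementations; the three sample pairs are measured R-values and GPR predictions taken from Table A1.

```python
# Direct implementation of the six performance measures in Equations (9)-(14).
import numpy as np

def evaluate(e, m):
    """e: measured values, m: predicted values."""
    e, m = np.asarray(e, float), np.asarray(m, float)
    e_bar = e.mean()
    r = np.corrcoef(e, m)[0, 1]                  # coefficient of correlation
    rmse = np.sqrt(np.mean((e - m) ** 2))
    rrmse = rmse / e_bar
    return {
        "R2": r ** 2,
        "MAE": np.mean(np.abs(e - m)),
        "RSE": np.sum((m - e) ** 2) / np.sum((e_bar - e) ** 2),
        "RMSE": rmse,
        "RRMSE": rrmse,
        "rho": rrmse / (1 + r),
    }

# Measured R-values vs. GPR predictions for three Table A1 records.
print(evaluate([11.7, 17.3, 20.9], [11.9, 17.3, 20.9]))
```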

3. Results and Analysis

3.1. Hyperparameter Optimization

In the field of ML, hyperparameters control the performance of the model and are optimized over multiple runs on the training data. Smaller RMSE and MAE values indicate that the modeled values are closer to the experimental data, while larger R2 values indicate that the model predictions better match the trends in the experimental data. This procedure identifies the optimal settings for the proposed models, i.e., GPR, SVM, and M5P, to achieve the best possible prediction. Implementing the SVM and GPR techniques requires the selection of an appropriate kernel function, which works internally to map the given data to a high-dimensional feature space for processing. For a fair comparison between the SVM and GPR models, a regularization parameter of C = 0.52 in SVM and a Gaussian noise of 0.3 in GPR were established, and the Pearson VII universal kernel (PUK) was chosen based on modeling performance. The parameter σ controls the Pearson width and ω is a tailing factor of the peak in PUK; both must be determined based on prediction precision, and the SVM and GPR models were tuned for these kernel-specific parameters. Calibration of the M5P model was accomplished by changing the number of instances allowed at each node. The selected primary hyperparameters of the modeling approaches are presented in Table 3.
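The multiple-runs tuning described above can be organized as a grid search, as sketched below; RBF again stands in for the PUK kernel (unavailable in scikit-learn), and the thirteen points are the integer HARHA dosages and measured R-values from Table A1, so this illustrates the procedure rather than reproducing the paper's tuning.

```python
# Grid-search sketch for SVM hyperparameter tuning on cross-validated RMSE.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X = np.arange(0, 13, dtype=float).reshape(-1, 1)  # HARHA dosage 0-12% (Table A1)
y = np.array([11.7, 13.1, 15.3, 17.3, 19.4, 20.4, 20.9,
              22.0, 22.9, 24.0, 24.9, 26.8, 27.0])

grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [0.1, 0.52, 1.0, 10.0], "gamma": ["scale", 0.1, 1.0]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)  # setting with the smallest CV RMSE
```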

3.2. Model Performance Evaluation

In the field of ML, models must be evaluated in order to validate the performance of the learned models, and different models employ various evaluation methodologies. Following the development of the ML models for R-value prediction, the next critical step is to assess their predictive ability. In this work, the accuracy of the applied ML models was confirmed by comparing the predicted and measured R-values. Figure 3 depicts this comparison for the training, testing, and validation sets. All proposed models demonstrated very good predictive potential (R2 > 0.9) with little dispersion in the training, testing, and validation stages, with the exception of the ANN model developed in the literature, which displayed a slightly worse result (i.e., R2 = 0.87) in the training stage.
Figure 4 shows line graphs of the accuracy of the predicted R-value of the different models in the training, testing, and validation sets. The predicted R-values lie in the range 11.72–26.98, and the predicted and actual values fit well overall, with only a few points showing larger errors in the test and validation sets; for example, one measured R-value of about 11.70 was predicted as high as 11.92. However, such minor differences in individual data points have no effect on the GPR model’s overall predictive ability, i.e., the GPR model can accurately estimate the R-value. The results show that all the models predicted the R-value with a high level of accuracy.
Table 4 presents the performance measures of the developed models and the ANN model. In the training stage, all three models performed almost equally well: SVM led slightly (R2 = 0.9999, MAE = 0.0364, RSE = 0.0002, RMSE = 0.0560, RRMSE = 0.0030, and ρ = 0.0015), followed by M5P (R2 = 0.9999, MAE = 0.0453, RSE = 0.0002, RMSE = 0.0562, RRMSE = 0.0031, and ρ = 0.0015) and GPR (R2 = 0.9999, MAE = 0.0399, RSE = 0.0003, RMSE = 0.0592, RRMSE = 0.0032, and ρ = 0.0016). Similarly, in the testing stage, the SVM model holds the best results (R2 = 0.9998, MAE = 0.0056, RSE = 0.0004, RMSE = 0.0100, RRMSE = 0.0004, and ρ = 0.0002), outclassing GPR (R2 = 0.9998, MAE = 0.0221, RSE = 0.0035, RMSE = 0.0287, RRMSE = 0.0012, and ρ = 0.0006) and M5P (R2 = 0.9947, MAE = 0.0374, RSE = 0.0106, RMSE = 0.0496, RRMSE = 0.0020, and ρ = 0.0010). In the validation stage, the GPR performance measures had the best characteristics: its R2 value was the highest, while its MAE, RSE, RMSE, RRMSE, and ρ values were the lowest.
In general, it can be stated that the proposed models are good for predicting the R-value, but the GPR model outperforms the SVM, M5P, and ANN models in the validation stage. The advantage of GPR over the other models is that it has few parameters to tune and can thus be trained with a small training dataset [50]. The primary benefit of SVM is its ability to deal with overlearning and high dimensionality, which in other methods can lead to computational complexity and local extrema [51]. Furthermore, only a few hyperparameters must be tuned or adjusted, giving SVM a simple structure and ease of implementation [52]. However, because SVMs are sensitive to noise, their predictive ability suffers when the dataset used is significantly noisy. Pruning in M5P can reduce tree size and the risk of overfitting [53]. The ANN, in turn, is computationally expensive and relies heavily on the capabilities of the available hardware [54]. In comparison with the previously published ANN model results [11], the results of this modeling study show that some data points are associated with very large model prediction errors (Figure 4), which can be attributed to the large deviations and high correlation between the input and output variables of the data used (Table 2). As a result, it is suggested that these models (GPR, SVM, and M5P) be tested and validated on larger datasets with lower variable correlation for improved performance.

3.3. Rank Analysis

Rank analysis is the simplest and most popular technique for evaluating the efficacy of constructed models and comparing their robustness. In this study, the score values are determined against the ideal values of the statistical parameters, which serve as the benchmark, and the approach can be applied to one or more models. The model with the best performance receives the highest score, and vice versa; two models with similar results may receive similar ranking scores. The comparison of the validation-stage results for the performance measures R2, MAE, RSE, RMSE, RRMSE, and ρ is shown in Table 5, where the ranking procedure proposed by Zorlu et al. [55] was used. The ranking system is very easy to understand: the most accurate performance index receives the highest rank. According to Table 5, the model with the highest accuracy was GPR, which also had the highest total rank value (i.e., 24). The SVM was the second most accurate model, with a total rank value of 18, followed by M5P, with a total rank value of 11. The least accurate was the ANN model, with a total rank value of 9.
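The scoring in Table 5 can be reproduced with a few lines of pandas, as sketched below using the validation-set values from Table 4. GPR and SVM come out at 24 and 18 as published; ANN totals 8 here versus the 9 printed in Table 5, which assigns ANN an R2 rank of 3.

```python
# Zorlu-style rank scoring: rank each metric, best performance = highest score.
import pandas as pd

val = pd.DataFrame({
    "R2":    [0.9996, 0.9955, 0.9939, 0.9900],
    "MAE":   [0.0258, 0.0268, 0.0325, 0.0380],
    "RSE":   [0.0032, 0.0092, 0.0122, 0.0097],
    "RMSE":  [0.0325, 0.0551, 0.0636, 1.1900],
    "RRMSE": [0.0012, 0.0021, 0.0024, 0.0400],
    "rho":   [0.0006, 0.0010, 0.0012, 0.0200],
}, index=["GPR", "SVM", "M5P", "ANN"])

scores = val["R2"].rank()                     # larger R2 is better
for col in ["MAE", "RSE", "RMSE", "RRMSE", "rho"]:
    scores += val[col].rank(ascending=False)  # smaller error is better
print(scores.sort_values(ascending=False))
```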

3.4. Sensitivity Analysis

Sensitivity analysis is used to identify each input parameter’s individual impact on the output parameter (i.e., the R-value). In this work, it was used to determine which of the seven input parameters—HARHA (X1), LL (X2), PL (X3), PI (X4), OMC (X5), AC (X6), and MDD (X7)—has the greatest impact on the predicted R-value (Y). The cosine amplitude method was used to perform the sensitivity analysis [56,57]; it has been utilized in a number of studies [58,59,60,61,62,63] and is stated as follows:
R_{ij} = \dfrac{ \sum_{t=1}^{n} y_{it} \, y_{ot} }{ \sqrt{ \sum_{t=1}^{n} y_{it}^2 \, \sum_{t=1}^{n} y_{ot}^2 } }
where n is the number of data values (85 in this study), and y_{it} and y_{ot} are the input and output parameters, respectively. The R_{ij} value ranges from zero to one for each input parameter, with a higher R_{ij} value indicating a greater effect of that particular input variable on the output (the R-value in this study). The findings of the sensitivity analysis, shown in Figure 5, indicate that the maximum and minimum relative strength effects on the R-value of the soil were obtained for X7 (0.9964) and X4 (0.9375), respectively, with the remaining input parameters having a moderate effect. Hence, the maximum dry density, MDD (X7), has the greatest influence, while the plasticity index, PI (X4), has the least influence in predicting the R-value.
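The cosine amplitude calculation above is a one-liner per input column; the sketch below applies it with NumPy, again assuming the hypothetical CSV layout used in the earlier examples.

```python
# Cosine amplitude method: strength of relation R_ij between each input and the output.
import numpy as np
import pandas as pd

df = pd.read_csv("r_value_dataset.csv")  # hypothetical file holding Table A1
y_out = df["R_value"].to_numpy()

for col in ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]:
    y_in = df[col].to_numpy()
    r_ij = np.sum(y_in * y_out) / np.sqrt(np.sum(y_in**2) * np.sum(y_out**2))
    print(col, round(r_ij, 4))  # X7 (MDD) should come out highest, per Figure 5
```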
Taylor [64] introduced Taylor diagrams to visually demonstrate how closely an estimated output (or set of estimated outputs) matches observations; they are constructed from the correlation and standard deviation of the estimated and observed datasets. They are particularly helpful when assessing multiple aspects of complex models or comparing the relative skill of numerous models (e.g., [65]). A model’s efficiency can therefore be further evaluated by constructing a Taylor diagram (see Figure 6). The red point is the reference point for all developed models: the closer a developed model is to the reference point, the better its performance. In our case, all the models lie close to the reference point, but more significantly, the GPR and SVM models almost coincide with it, indicating the best prediction capabilities.
The proposed GPR, SVM, and M5P models can efficiently predict the R-value of the subgrade. This is, however, strictly limited to the range of parameter values used to train and develop the models, and the predictive accuracy also depends significantly on the distribution of the parameter values used for training and development. For instance, the proportion of HARHA must be limited to 0–12%. In simple terms, this defines the domain over which the developed models are accurate.

4. Conclusions

In this study, three machine-learning models (i.e., GPR, M5P, and SVM) were used to predict the R-value. The predictive capabilities of the aforementioned models were evaluated using the statistical criteria R2, MAE, RSE, RMSE, RRMSE, and ρ. To assess robustness, the proposed models were also compared with an ANN model from the literature. The ML models developed in this research performed well in the validation stage, but the GPR model (R2 = 0.9996, MAE = 0.0258, RSE = 0.0032, RMSE = 0.0325, RRMSE = 0.0012, and ρ = 0.0006) was the most viable when compared with the other machine-learning models. Statistical analysis revealed that the GPR model enhances model accuracy by minimizing the error difference between measured and predicted values. Sensitivity analysis revealed that the maximum dry density, MDD (X7), has the highest R_{ij} value, showing that this input parameter has the greatest effect in predicting the R-value. It can therefore be inferred that the GPR model is a promising method for predicting the R-value, and it can be extended to predict other significant basic subgrade stiffness/strength characterizations, such as the California bearing ratio or the resilient modulus. The application of GPR to predicting subgrade physical properties is thus appropriate and can be seen as an alternative and suitable approach. For a better understanding and prediction of the R-value, other artificial intelligence techniques, such as fuzzy logic, recurrent neural networks, and response surface methodology, may also be used. Furthermore, more experimental data should be gathered in future studies in order to enhance the performance of the prediction models.

Author Contributions

Conceptualization, M.A., R.A.A.-M. and S.L.I.; Formal analysis, M.A., S.K., A.M. and F.A.; Funding acquisition, S.L.I.; Investigation, M.A., R.A.A.-M., S.K. and A.M.; Methodology, M.A., R.A.A.-M., S.L.I. and S.K.; Project administration, R.A.A.-M. and S.L.I.; Resources, R.A.A.-M. and S.K.; Software, M.A., S.L.I., A.M. and F.A.; Supervision, R.A.A.-M.; Validation, F.A.; Visualization, S.K. and A.M.; Writing—original draft, M.A. and F.A.; Writing—review & editing, B.T.A., S.K. and A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of this study are included within the article.

Acknowledgments

The authors wish to express their appreciation to the International Islamic University Malaysia for supporting this study and making it possible. The authors would also like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: 22UQU4390001DSR05.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Data for prediction modeling of R-value.
S. No. | X1 (%) | X2 (%) | X3 (%) | X4 (%) | X5 (%) | X6 | X7 (g/cm3) | R-value | GPR | M5P | SVM
1 | 0 | 66 | 21 | 45 | 16 | 2 | 1.25 | 11.7 | 11.9 | 11.7 | 11.7
2 | 0.1 | 66 | 21 | 45 | 16 | 1.98 | 1.25 | 11.7 | 11.8 | 11.7 | 11.7
3 | 0.2 | 65.7 | 20.9 | 44.8 | 16.1 | 1.96 | 1.27 | 11.7 | 11.7 | 11.7 | 11.7
4 | 0.3 | 65.6 | 20.9 | 44.7 | 16.3 | 1.96 | 1.27 | 11.7 | 11.8 | 11.7 | 11.7
5 | 0.4 | 65.3 | 20.8 | 44.5 | 16.3 | 1.93 | 1.28 | 11.8 | 11.9 | 11.8 | 11.8
6 | 0.5 | 65 | 21 | 44 | 16.4 | 1.9 | 1.30 | 12 | 12.1 | 12 | 12
7 | 0.6 | 64.8 | 20.8 | 44 | 16.4 | 1.88 | 1.31 | 12.2 | 12.2 | 12.1 | 12.2
8 | 0.7 | 64.5 | 20.8 | 43.7 | 16.45 | 1.88 | 1.31 | 12.2 | 12.2 | 12.3 | 12.2
9 | 0.8 | 64.1 | 20.8 | 43.3 | 16.47 | 1.87 | 1.33 | 12.3 | 12.4 | 12.4 | 12.4
10 | 0.9 | 63.5 | 20.9 | 42.6 | 16.49 | 1.85 | 1.33 | 12.6 | 12.6 | 12.5 | 12.6
11 | 1 | 63 | 21 | 42 | 16.5 | 1.8 | 1.35 | 13.1 | 13.2 | 13.1 | 13.1
12 | 1.1 | 62.5 | 20.6 | 41.9 | 16.6 | 1.8 | 1.35 | 13.3 | 13.3 | 13.3 | 13.3
13 | 1.2 | 62.1 | 20.3 | 41.8 | 16.7 | 1.81 | 1.36 | 13.5 | 13.5 | 13.5 | 13.5
14 | 1.3 | 61.9 | 20.2 | 41.7 | 16.8 | 1.8 | 1.37 | 13.6 | 13.7 | 13.6 | 13.7
15 | 1.4 | 61.7 | 20.1 | 41.6 | 17 | 1.81 | 1.38 | 13.8 | 13.9 | 13.8 | 13.9
16 | 1.5 | 61.5 | 20 | 41.5 | 17.2 | 1.8 | 1.38 | 14.2 | 14.3 | 14.2 | 14.3
17 | 1.6 | 61.4 | 20 | 41.4 | 17.2 | 1.8 | 1.39 | 14.4 | 14.4 | 14.5 | 14.4
18 | 1.7 | 61.3 | 20 | 41.3 | 17.3 | 1.79 | 1.39 | 14.8 | 14.7 | 14.7 | 14.6
19 | 1.8 | 61.3 | 20.1 | 41.2 | 17.5 | 1.81 | 1.40 | 14.8 | 14.9 | 14.9 | 14.8
20 | 1.9 | 61.2 | 20.1 | 41.1 | 17.7 | 1.8 | 1.41 | 15 | 15.1 | 15.1 | 15.1
21 | 2 | 61 | 20 | 41 | 17.8 | 1.8 | 1.41 | 15.3 | 15.3 | 15.3 | 15.3
22 | 2.1 | 60.9 | 19.9 | 41 | 17.9 | 1.8 | 1.42 | 15.6 | 15.6 | 15.5 | 15.5
23 | 2.2 | 60.7 | 19.7 | 41 | 17.9 | 1.8 | 1.42 | 15.7 | 15.7 | 15.7 | 15.6
24 | 2.3 | 60.6 | 19.6 | 41 | 18 | 1.8 | 1.43 | 15.8 | 15.8 | 15.9 | 15.8
25 | 2.4 | 60.4 | 19.4 | 41 | 18.2 | 1.8 | 1.43 | 16 | 16 | 16.1 | 16.1
26 | 2.5 | 60 | 19 | 41 | 18.3 | 1.8 | 1.43 | 16.2 | 16.3 | 16.3 | 16.3
27 | 2.6 | 59.8 | 19 | 40.8 | 18.35 | 1.79 | 1.44 | 16.5 | 16.5 | 16.5 | 16.5
28 | 2.7 | 59.7 | 19.1 | 40.6 | 18.4 | 1.77 | 1.45 | 16.8 | 16.8 | 16.7 | 16.7
29 | 2.8 | 59.5 | 19.1 | 40.4 | 18.45 | 1.75 | 1.46 | 17 | 17 | 16.9 | 16.9
30 | 2.9 | 59.2 | 19 | 40.2 | 18.5 | 1.72 | 1.46 | 17.1 | 17.2 | 17.1 | 17.1
31 | 3 | 59 | 19 | 40 | 18.5 | 1.7 | 1.46 | 17.3 | 17.3 | 17.3 | 17.3
32 | 3.1 | 58.8 | 19.2 | 39.6 | 18.55 | 1.7 | 1.47 | 17.4 | 17.4 | 17.5 | 17.5
33 | 3.2 | 58.4 | 18.9 | 39.5 | 18.6 | 1.7 | 1.48 | 17.7 | 17.7 | 17.7 | 17.7
34 | 3.3 | 57.9 | 19.1 | 38.8 | 18.7 | 1.71 | 1.48 | 18 | 18 | 18.1 | 18
35 | 3.4 | 57.4 | 19 | 38.4 | 18.75 | 1.69 | 1.48 | 18.3 | 18.3 | 18.3 | 18.3
36 | 3.5 | 57 | 19 | 38 | 18.8 | 1.7 | 1.49 | 18.5 | 18.5 | 18.5 | 18.5
37 | 3.6 | 56.8 | 18.9 | 37.9 | 18.85 | 1.69 | 1.50 | 18.7 | 18.7 | 18.7 | 18.7
38 | 3.7 | 56.7 | 19 | 37.7 | 18.9 | 1.65 | 1.51 | 18.9 | 18.9 | 18.9 | 18.9
39 | 3.8 | 56.5 | 18.9 | 37.6 | 18.93 | 1.64 | 1.51 | 19.1 | 19.1 | 19.1 | 19.1
40 | 3.9 | 56.3 | 19 | 37.3 | 18.98 | 1.61 | 1.52 | 19.2 | 19.2 | 19.3 | 19.3
41 | 4 | 56 | 19 | 37 | 19 | 1.6 | 1.52 | 19.4 | 19.4 | 19.4 | 19.4
42 | 4.1 | 55.7 | 19 | 36.7 | 19 | 1.59 | 1.53 | 19.5 | 19.5 | 19.5 | 19.5
43 | 4.2 | 54.9 | 18.7 | 36.2 | 19 | 1.57 | 1.54 | 19.6 | 19.6 | 19.6 | 19.6
44 | 4.3 | 54.1 | 18.5 | 35.6 | 19 | 1.55 | 1.55 | 19.7 | 19.7 | 19.7 | 19.7
45 | 4.4 | 53.6 | 18.4 | 35.2 | 19 | 1.52 | 1.56 | 19.7 | 19.7 | 19.8 | 19.7
46 | 4.5 | 53 | 18 | 35 | 19 | 1.5 | 1.57 | 19.8 | 19.8 | 19.8 | 19.8
47 | 4.6 | 52.8 | 18 | 34.8 | 18.98 | 1.5 | 1.58 | 20 | 19.9 | 19.9 | 19.9
48 | 4.7 | 52.7 | 18 | 34.7 | 18.96 | 1.5 | 1.59 | 20 | 20 | 20 | 20
49 | 4.8 | 52.6 | 18.1 | 34.5 | 18.93 | 1.5 | 1.60 | 20.1 | 20.1 | 20.1 | 20.1
50 | 4.9 | 52.3 | 18 | 34.3 | 18.91 | 1.5 | 1.61 | 20.2 | 20.2 | 20.3 | 20.2
51 | 5 | 52 | 18 | 34 | 18.9 | 1.5 | 1.61 | 20.4 | 20.3 | 20.3 | 20.3
52 | 5.1 | 51.5 | 17.7 | 33.8 | 18.88 | 1.48 | 1.62 | 20.4 | 20.4 | 20.4 | 20.4
53 | 5.2 | 51.1 | 17.7 | 33.4 | 18.86 | 1.46 | 1.63 | 20.5 | 20.5 | 20.5 | 20.5
54 | 5.3 | 50.8 | 18.1 | 32.7 | 18.84 | 1.43 | 1.64 | 20.5 | 20.5 | 20.5 | 20.5
55 | 5.4 | 50.3 | 18 | 32.3 | 18.82 | 1.41 | 1.65 | 20.6 | 20.6 | 20.6 | 20.6
56 | 5.5 | 50 | 18 | 32 | 18.8 | 1.4 | 1.65 | 20.6 | 20.6 | 20.6 | 20.6
57 | 5.6 | 49.9 | 18 | 31.9 | 18.78 | 1.4 | 1.66 | 20.7 | 20.7 | 20.7 | 20.7
58 | 5.7 | 49.6 | 17.9 | 31.7 | 18.75 | 1.41 | 1.67 | 20.8 | 20.8 | 20.8 | 20.8
59 | 5.8 | 49.4 | 17.9 | 31.5 | 18.71 | 1.42 | 1.67 | 20.8 | 20.8 | 20.8 | 20.8
60 | 5.9 | 49.1 | 17.7 | 31.4 | 18.65 | 1.41 | 1.68 | 20.9 | 20.9 | 20.9 | 20.9
61 | 6 | 49 | 18 | 31 | 18.6 | 1.4 | 1.69 | 20.9 | 20.9 | 21 | 20.9
62 | 6.1 | 48.6 | 17.8 | 30.8 | 18.55 | 1.38 | 1.70 | 21 | 21 | 21 | 21
63 | 6.2 | 48.3 | 17.6 | 30.7 | 18.48 | 1.37 | 1.71 | 21.1 | 21.1 | 21.1 | 21.1
64 | 6.3 | 47.7 | 17.3 | 30.4 | 18.6 | 1.35 | 1.72 | 21.2 | 21.2 | 21.1 | 21.2
65 | 6.4 | 47.2 | 17 | 30.2 | 18.44 | 1.33 | 1.73 | 21.4 | 21.4 | 21.5 | 21.4
66 | 6.5 | 47 | 17 | 30 | 18.4 | 1.3 | 1.74 | 21.5 | 21.5 | 21.6 | 21.5
67 | 6.6 | 46.8 | 17.1 | 29.7 | 18.4 | 1.31 | 1.75 | 21.6 | 21.6 | 21.7 | 21.6
68 | 6.7 | 46.5 | 16.8 | 29.7 | 18.41 | 1.31 | 1.76 | 21.8 | 21.7 | 21.8 | 21.7
69 | 6.8 | 45.6 | 15.9 | 29.7 | 18.4 | 1.3 | 1.77 | 21.9 | 21.8 | 21.9 | 21.9
70 | 6.9 | 45.2 | 15.9 | 29.3 | 18.41 | 1.3 | 1.78 | 22 | 22 | 21.9 | 22
71 | 7 | 45 | 16 | 29 | 18.4 | 1.3 | 1.78 | 22 | 22 | 22 | 22.1
72 | 7.1 | 44.8 | 16.3 | 28.5 | 18.39 | 1.29 | 1.79 | 22.1 | 22.1 | 22.1 | 22.2
73 | 7.2 | 44.3 | 16.1 | 28.2 | 18.37 | 1.27 | 1.80 | 22.3 | 22.3 | 22.2 | 22.3
74 | 7.3 | 43.7 | 15.9 | 27.8 | 18.35 | 1.26 | 1.81 | 22.4 | 22.4 | 22.3 | 22.4
75 | 7.4 | 43.4 | 16 | 27.4 | 18.32 | 1.23 | 1.83 | 22.5 | 22.5 | 22.4 | 22.5
76 | 7.5 | 43 | 16 | 27 | 18.3 | 1.2 | 1.84 | 22.6 | 22.6 | 22.5 | 22.6
77 | 7.6 | 42.8 | 15.9 | 26.9 | 18.29 | 1.19 | 1.85 | 22.7 | 22.7 | 22.6 | 22.7
78 | 7.7 | 42.4 | 16 | 26.4 | 18.28 | 1.18 | 1.86 | 22.8 | 22.7 | 22.7 | 22.8
79 | 7.8 | 41.8 | 15.4 | 26.4 | 18.26 | 1.16 | 1.87 | 22.8 | 22.8 | 22.8 | 22.8
80 | 7.9 | 41.5 | 15.4 | 26.1 | 18.23 | 1.14 | 1.87 | 22.9 | 22.9 | 22.9 | 22.9
81 | 8 | 41 | 15 | 26 | 18.2 | 1.13 | 1.88 | 22.9 | 22.9 | 23 | 22.9
82 | 8.1 | 40.7 | 14.9 | 25.8 | 18.2 | 1.12 | 1.88 | 23 | 22.9 | 23.1 | 23
83 | 8.2 | 40.3 | 15 | 25.3 | 18.2 | 1.11 | 1.89 | 23.2 | 23.2 | 23.2 | 23.2
84 | 8.3 | 39.8 | 15.1 | 24.7 | 18.2 | 1.11 | 1.90 | 23.3 | 23.3 | 23.3 | 23.3
85 | 8.4 | 39.3 | 15 | 24.3 | 18.21 | 1.1 | 1.90 | 23.5 | 23.2 | 23.4 | 23.4
86 | 8.5 | 39 | 15 | 24 | 18.2 | 1 | 1.91 | 23.6 | 23.6 | 23.6 | 23.6
87 | 8.6 | 38.8 | 15 | 23.8 | 18.2 | 1 | 1.92 | 23.7 | 23.7 | 23.7 | 23.7
88 | 8.7 | 38.3 | 14.9 | 23.4 | 18.2 | 1 | 1.93 | 23.8 | 23.8 | 23.8 | 23.8
89 | 8.8 | 37.9 | 15.2 | 22.7 | 18.2 | 1 | 1.94 | 23.9 | 23.9 | 23.9 | 23.9
90 | 8.9 | 37.5 | 15.2 | 22.3 | 18.2 | 1 | 1.95 | 24 | 24 | 24 | 24
91 | 9 | 37 | 15 | 22 | 18.2 | 1 | 1.96 | 24 | 24 | 24 | 24
92 | 9.1 | 37 | 15 | 22 | 18.19 | 1 | 1.96 | 24.1 | 24.1 | 24.1 | 24.1
93 | 9.2 | 37 | 15 | 22 | 18.18 | 1 | 1.96 | 24.2 | 24.2 | 24.2 | 24.2
94 | 9.3 | 37 | 15 | 22 | 18.16 | 1 | 1.97 | 24.3 | 24.3 | 24.3 | 24.3
95 | 9.4 | 37 | 15 | 22 | 18.13 | 1 | 1.97 | 24.4 | 24.4 | 24.4 | 24.4
96 | 9.5 | 37 | 15 | 22 | 18.1 | 1 | 1.97 | 24.5 | 24.5 | 24.5 | 24.5
97 | 9.6 | 36.8 | 15.1 | 21.7 | 18 | 0.99 | 1.97 | 24.6 | 24.6 | 24.6 | 24.6
98 | 9.7 | 36.7 | 15.1 | 21.6 | 17.92 | 0.98 | 1.97 | 24.7 | 24.7 | 24.7 | 24.7
99 | 9.8 | 36.5 | 15.1 | 21.4 | 17.93 | 0.97 | 1.98 | 24.8 | 24.8 | 24.8 | 24.8
100 | 9.9 | 36.3 | 15.2 | 21.1 | 17.91 | 0.94 | 1.98 | 24.8 | 24.8 | 24.9 | 24.8
101 | 10 | 36 | 15 | 21 | 17.9 | 0.9 | 1.98 | 24.9 | 24.9 | 25 | 24.9
102 | 10.1 | 35.7 | 14.9 | 20.8 | 17.88 | 0.88 | 1.98 | 25.1 | 25.1 | 25.1 | 25.1
103 | 10.2 | 35.5 | 15.1 | 20.4 | 17.84 | 0.86 | 1.98 | 25.3 | 25.2 | 25.2 | 25.3
104 | 10.3 | 34.6 | 14.9 | 19.7 | 17.79 | 0.84 | 1.98 | 25.4 | 25.5 | 25.3 | 25.4
105 | 10.4 | 33.3 | 14 | 19.3 | 17.73 | 0.82 | 1.99 | 25.4 | 25.5 | 25.5 | 25.4
106 | 10.5 | 33 | 14 | 19 | 17.7 | 0.8 | 1.99 | 25.5 | 25.6 | 25.7 | 25.6
107 | 10.6 | 32.8 | 14 | 18.8 | 17.7 | 0.79 | 1.99 | 25.8 | 25.8 | 25.9 | 25.8
108 | 10.7 | 32.4 | 13.9 | 18.5 | 17.71 | 0.78 | 1.99 | 26.2 | 26.2 | 26.1 | 26
109 | 10.8 | 31.5 | 13.9 | 17.6 | 17.71 | 0.75 | 1.99 | 26.3 | 26.3 | 26.3 | 26.3
110 | 10.9 | 31.1 | 14 | 17.1 | 17.7 | 0.72 | 1.99 | 26.5 | 26.5 | 26.5 | 26.5
111 | 11 | 31 | 14 | 17 | 17.7 | 0.7 | 1.99 | 26.8 | 26.8 | 26.8 | 26.7
112 | 11.1 | 30.7 | 13.9 | 16.8 | 17.68 | 0.7 | 1.99 | 26.8 | 26.8 | 26.8 | 26.8
113 | 11.2 | 30.3 | 13.7 | 16.6 | 17.63 | 0.71 | 1.99 | 26.8 | 26.8 | 26.8 | 26.8
114 | 11.3 | 29.8 | 13.4 | 16.4 | 17.57 | 0.71 | 1.99 | 26.9 | 26.9 | 26.9 | 26.9
115 | 11.4 | 29.4 | 13.2 | 16.2 | 17.53 | 0.71 | 1.98 | 26.9 | 26.9 | 26.9 | 26.9
116 | 11.5 | 29 | 13 | 16 | 17.5 | 0.7 | 1.97 | 26.9 | 26.9 | 26.9 | 26.9
117 | 11.6 | 28.7 | 12.8 | 15.9 | 17.5 | 0.69 | 1.97 | 26.9 | 26.9 | 26.9 | 26.9
118 | 11.7 | 28.5 | 13 | 15.5 | 17.4 | 0.67 | 1.96 | 27 | 27 | 27 | 27
119 | 11.8 | 27.8 | 13 | 14.8 | 17.3 | 0.65 | 1.96 | 27 | 27 | 27 | 27
120 | 11.9 | 27.6 | 13.2 | 14.4 | 17.2 | 0.62 | 1.95 | 27 | 27 | 27 | 27
121 | 12 | 27 | 13 | 14 | 17.1 | 0.6 | 1.95 | 27 | 27 | 27 | 27

References

  1. American Association of State Highway and Transportation Officials (AASHTO). Standard Method of Test for Resistance R-Value and Expansion Pressure of Compacted Soils; Transportation Research Board: Washington, DC, USA, 2002. [Google Scholar]
  2. Bandara, N.; Rowe, G.M. Design subgrade resilient modulus for Florida subgrade soils. In Resilient Modulus Testing for Pavement Components; ASTM International: West Conshohocken, PA, USA, 2003. [Google Scholar]
  3. Khazanovich, L.; Celauro, C.; Chadbourn, B.; Zollars, J.; Dai, S. Evaluation of subgrade resilient modulus predictive model for use in mechanistic–empirical pavement design guide. Transp. Res. Rec. 2006, 1947, 155–166. [Google Scholar] [CrossRef]
  4. Onyelowe, K.C.; Onyia, M.E.; Onyelowe, F.D.A.; Van, D.B.; Salahudeen, A.B.; Eberemu, A.O.; Osinubi, K.J.; Amadi, A.A.; Onukwugha, E.; Odumade, A.O. Critical state desiccation induced shrinkage of biomass treated compacted soil as pavement foundation. Építöanyag 2020, 72, 40–47. [Google Scholar] [CrossRef]
  5. Onyelowe, K.C.; Bui Van, D.; Ubachukwu, O.; Ezugwu, C.; Salahudeen, B.; Nguyen Van, M.; Ikeagwuani, C.; Amhadi, T.; Sosa, F.; Wu, W. Recycling and reuse of solid wastes; a hub for ecofriendly, ecoefficient and sustainable soil, concrete, wastewater and pavement reengineering. Int. J. Low-Carbon Technol. 2019, 14, 440–451. [Google Scholar] [CrossRef]
  6. Tarefder, R.A.; Saha, N.; Hall, J.W.; Ng, P.T. Evaluating weak subgrade for pavement design and performance prediction: A case study of US 550. J. GeoEngineer. 2008, 3, 13–24. [Google Scholar]
  7. Rehman, Z.; Khalid, U.; Farooq, K.; Mujtaba, H. Prediction of CBR value from index properties of different soils. Technol. J. Univ. Eng. Technol. (UET) 2017, 22, 17–26. [Google Scholar]
  8. Officials, T. New Mexico Department of Transportation (NMDOT), Standard Specifications for Highway and Bridge Construction. Section 200–600, New Mexico DOT; Aashto: Washington, DC, USA, 2004. [Google Scholar]
  9. Kişi, Ö.; Uncuoğlu, E. Comparison of three back-propagation training algorithms for two case studies. Indian J. Eng. Mater. Sci. 2005, 12, 434–442. [Google Scholar]
  10. Van, D.B.; Onyelowe, K.C.; Van-Nguyen, M. Capillary rise, suction (absorption) and the strength development of HBM treated with QD base geopolymer. Int. J. Pavement Res. Technol. 2018, 4, 759–765. [Google Scholar]
  11. Onyelowe, K.C.; Iqbal, M.; Jalal, F.E.; Onyia, M.E.; Onuoha, I.C. Application of 3-algorithm ANN programming to predict the strength performance of hydrated-lime activated rice husk ash treated soil. Multiscale Multidiscip. Modeling Exp. Des. 2021, 4, 259–274. [Google Scholar] [CrossRef]
  12. Froemelt, A.; Dürrenmatt, D.J.; Hellweg, S. Using data mining to assess environmental impacts of household consumption behaviors. Environ. Sci. Technol. 2018, 52, 8467–8478. [Google Scholar] [CrossRef]
  13. Ahmad, M.; Tang, X.-W.; Qiu, J.-N.; Gu, W.-J.; Ahmad, F. A hybrid approach for evaluating CPT-based seismic soil liquefaction potential using Bayesian belief networks. J. Cent. South Univ. 2020, 27, 500–516. [Google Scholar]
  14. Ahmad, M.; Tang, X.-W.; Qiu, J.-N.; Ahmad, F. Evaluating seismic soil liquefaction potential using bayesian belief network and C4.5 decision tree approaches. Appl. Sci. 2019, 9, 4226. [Google Scholar] [CrossRef]
  15. Ahmad, M.; Tang, X.; Qiu, J.; Ahmad, F.; Gu, W. LLDV-a Comprehensive Framework for Assessing the Effects of Liquefaction Land Damage Potential. In Proceedings of the 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Dalian, China, 14–16 November 2019; pp. 527–533. [Google Scholar]
  16. Ahmad, M.; Tang, X.-W.; Qiu, J.-N.; Ahmad, F.; Gu, W.-J. A step forward towards a comprehensive framework for assessing liquefaction land damage vulnerability: Exploration from historical data. Front. Struct. Civ. Eng. 2020, 14, 1476–1491. [Google Scholar] [CrossRef]
  17. Ahmad, M.; Tang, X.; Ahmad, F. Evaluation of Liquefaction-Induced Settlement Using Random Forest and REP Tree Models: Taking Pohang Earthquake as a Case of Illustration. In Natural Hazards-Impacts, Adjustments & Resilience; IntechOpen: London, UK, 2020. [Google Scholar]
  18. Ahmad, M.; Al-Shayea, N.A.; Tang, X.-W.; Jamal, A.; M Al-Ahmadi, H.; Ahmad, F. Predicting the Pillar Stability of Underground Mines with Random Trees and C4.5 Decision Trees. Appl. Sci. 2020, 10, 6486. [Google Scholar] [CrossRef]
  19. Ahmad, M.; Kamiński, P.; Olczak, P.; Alam, M.; Iqbal, M.J.; Ahmad, F.; Sasui, S.; Khan, B.J. Development of Prediction Models for Shear Strength of Rockfill Material Using Machine Learning Techniques. Appl. Sci. 2021, 11, 6167. [Google Scholar] [CrossRef]
  20. Noori, A.M.; Mikaeil, R.; Mokhtarian, M.; Haghshenas, S.S.; Foroughi, M. Feasibility of intelligent models for prediction of utilization factor of TBM. Geotech. Geol. Eng. 2020, 38, 3125–3143. [Google Scholar] [CrossRef]
  21. Dormishi, A.; Ataei, M.; Mikaeil, R.; Khalokakaei, R.; Haghshenas, S.S. Evaluation of gang saws’ performance in the carbonate rock cutting process using feasibility of intelligent approaches. Eng. Sci. Technol. Int. J. 2019, 22, 990–1000. [Google Scholar] [CrossRef]
  22. Mikaeil, R.; Haghshenas, S.S.; Hoseinie, S.H. Rock penetrability classification using artificial bee colony (ABC) algorithm and self-organizing map. Geotech. Geol. Eng. 2018, 36, 1309–1318. [Google Scholar] [CrossRef]
  23. Mikaeil, R.; Haghshenas, S.S.; Ozcelik, Y.; Gharehgheshlagh, H.H. Performance evaluation of adaptive neuro-fuzzy inference system and group method of data handling-type neural network for estimating wear rate of diamond wire saw. Geotech. Geol. Eng. 2018, 36, 3779–3791. [Google Scholar] [CrossRef]
  24. Momeni, E.; Nazir, R.; Armaghani, D.J.; Maizir, H. Prediction of pile bearing capacity using a hybrid genetic algorithm-based ANN. Measurement 2014, 57, 122–131. [Google Scholar] [CrossRef]
  25. Xie, C.; Nguyen, H.; Choi, Y.; Armaghani, D.J. Optimized functional linked neural network for predicting diaphragm wall deflection induced by braced excavations in clays. Geosci. Front. 2022, 13, 101313. [Google Scholar] [CrossRef]
  26. Armaghani, D.J.; Mohamad, E.T.; Narayanasamy, M.S.; Narita, N.; Yagiz, S. Development of hybrid intelligent models for predicting TBM penetration rate in hard rock condition. Tunnel. Undergr. Space Technol. 2017, 63, 29–43. [Google Scholar] [CrossRef]
  27. Guido, G.; Haghshenas, S.S.; Haghshenas, S.S.; Vitale, A.; Gallelli, V.; Astarita, V. Development of a binary classification model to assess safety in transportation systems using GMDH-type neural network algorithm. Sustainability 2020, 12, 6735. [Google Scholar] [CrossRef]
  28. Morosini, A.F.; Haghshenas, S.S.; Haghshenas, S.S.; Choi, D.Y.; Geem, Z.W. Sensitivity Analysis for Performance Evaluation of a Real Water Distribution System by a Pressure Driven Analysis Approach and Artificial Intelligence Method. Water 2021, 13, 1116. [Google Scholar] [CrossRef]
  29. Asteris, P.G.; Lourenço, P.B.; Roussis, P.C.; Adami, C.E.; Armaghani, D.J.; Cavaleri, L.; Chalioris, C.E.; Hajihassani, M.; Lemonis, M.E.; Mohammed, A.S. Revealing the nature of metakaolin-based concrete materials using artificial intelligence techniques. Constr. Build. Mater. 2022, 322, 126500. [Google Scholar] [CrossRef]
  30. Hajihassani, M.; Armaghani, D.J.; Sohaei, H.; Mohamad, E.T.; Marto, A. Prediction of airblast-overpressure induced by blasting using a hybrid artificial neural network and particle swarm optimization. Appl. Acoust. 2014, 80, 57–67. [Google Scholar] [CrossRef]
  31. Onyelowe, K.C.; Jalal, F.E.; Onyia, M.E.; Onuoha, I.C.; Alaneme, G.U. Application of gene expression programming to evaluate strength characteristics of hydrated-lime-activated rice husk ash-treated expansive soil. Appl. Comput. Intell. Soft Comput. 2021, 2021, 6686347. [Google Scholar] [CrossRef]
  32. Onyelowe, K.; Salahudeen, A.B.; Eberemu, A.; Ezugwu, C.; Amhadi, T.; Alaneme, G. Oxides of carbon entrapment for environmental friendly geomaterials ash derivation. In International Congress and Exhibition “Sustainable Civil Infrastructures”; Springer: Berlin/Heidelberg, Germany, 2019; pp. 58–67. [Google Scholar]
  33. Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson correlation coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4. [Google Scholar]
  34. van Vuren, T. Modeling of transport demand–analyzing, calculating, and forecasting transport demand: By V. A. Profillidis and G. N. Botzoris, Amsterdam, Elsevier, 2018, 472 pp., $125 (paperback and ebook), eBook ISBN: 9780128115145, Paperback ISBN: 9780128115138. Transp. Rev. 2019, 40, 1–2. [Google Scholar] [CrossRef]
  35. Black, W. A method of estimating the California bearing ratio of cohesive soils from plasticity data. Geotechnique 1962, 12, 271–282. [Google Scholar] [CrossRef]
  36. Wang, J. An intuitive tutorial to Gaussian processes regression. arXiv 2020, arXiv:2009.10862. [Google Scholar]
  37. Cheng, M.-Y.; Huang, C.-C.; Roy, A.F.V. Predicting project success in construction using an evolutionary Gaussian process inference model. J. Civ. Eng. Manag. 2013, 19, S202–S211. [Google Scholar] [CrossRef]
  38. Chou, J.-S.; Chiu, C.-K.; Farfoura, M.; Al-Taharwa, I. Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques. J. Comput. Civ. Eng. 2011, 25, 242–253. [Google Scholar] [CrossRef]
  39. Mahesh, P.; Deswal, S. Modelling pile capacity using gaussian process regression. Comput. Geotech. 2010, 37, 942–947. [Google Scholar]
  40. Rasmussen, C.E. Gaussian processes in machine learning. In Summer School on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2003; pp. 63–71. [Google Scholar]
  41. Wang, Y.; Witten, I.H. Induction of Model Trees for Predicting Continuous Classes; University of Waikato, Department of Computer Science: Hamilton, New Zealand, 1996. [Google Scholar]
  42. Quinlan, J.R. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Ai’92, Hobart, Australia, 16–18 November 1992; World Scientific: Singapore, 1992; pp. 343–348. [Google Scholar]
  43. Tong, S.; Koller, D. Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2001, 2, 45–66. [Google Scholar]
  44. Asefa, T.; Kemblowski, M.; Urroz, G.; McKee, M. Support vector machines (SVMs) for monitoring network design. Groundwater 2005, 43, 413–422. [Google Scholar] [CrossRef]
  45. Deka, P.C. Support vector machine applications in the field of hydrology: A review. Appl. Soft Comput. 2014, 19, 372–386. [Google Scholar]
  46. Nguyen, L. Tutorial on support vector machine. Appl. Comput. Math. 2017, 6, 1–15. [Google Scholar]
  47. Ahmad, M.; Ahmad, F.; Wróblewski, P.; Al-Mansob, R.A.; Olczak, P.; Kamiński, P.; Safdar, M.; Rai, P. Prediction of ultimate bearing capacity of shallow foundations on cohesionless soils: A gaussian process regression approach. Appl. Sci. 2021, 11, 10317. [Google Scholar] [CrossRef]
  48. Despotovic, M.; Nedic, V.; Despotovic, D.; Cvetanovic, S. Evaluation of empirical models for predicting monthly mean horizontal diffuse solar radiation. Renew. Sustain. Energy Rev. 2016, 56, 246–260. [Google Scholar] [CrossRef]
  49. Gandomi, A.H.; Roke, D.A. Assessment of artificial neural network and genetic programming as predictive tools. Adv. Eng. Softw. 2015, 88, 63–72. [Google Scholar] [CrossRef]
  50. Faul, S.; Gregorcic, G.; Boylan, G.; Marnane, W.; Lightbody, G.; Connolly, S. Gaussian process modeling of EEG for the detection of neonatal seizures. IEEE Trans. Biomed. Eng. 2007, 54, 2151–2162. [Google Scholar] [CrossRef]
  51. Tao, Z.; Huiling, L.; Wenwen, W.; Xia, Y. GA-SVM based feature selection and parameter optimization in hospitalization expense modeling. Appl. Soft Comput. 2019, 75, 323–332. [Google Scholar] [CrossRef]
  52. Yin, X.; Hou, Y.; Yin, J.; Li, C. A novel SVM parameter tuning method based on advanced whale optimization algorithm. J. Phys. Conf. Ser. 2019, 1237, 022140. [Google Scholar] [CrossRef]
  53. Ma, J.; Kim, H.M. Continuous preference trend mining for optimal product design with multiple profit cycles. J. Mech. Des. 2014, 136, 061002. [Google Scholar] [CrossRef]
  54. Mijwel, M.M. Artificial Neural Networks Advantages and Disadvantages. 2018. Available online: https://www.linkedin.com/pulse/artificial-NeuralNetwork (accessed on 15 May 2022).
  55. Zorlu, K.; Gokceoglu, C.; Ocakoglu, F.; Nefeslioglu, H.; Acikalin, S. Prediction of uniaxial compressive strength of sandstones using petrography-based models. Eng. Geol. 2008, 96, 141–158. [Google Scholar] [CrossRef]
  56. Wu, X.; Kumar, V. The Top Ten Algorithms in Data Mining; CRC Press: Boca Raton, FL, USA, 2009. [Google Scholar]
  57. Momeni, E.; Armaghani, D.J.; Hajihassani, M.; Amin, M.F.M. Prediction of uniaxial compressive strength of rock samples using hybrid particle swarm optimization-based artificial neural networks. Measurement 2015, 60, 50–63. [Google Scholar] [CrossRef]
  58. Faradonbeh, R.S.; Armaghani, D.J.; Abd Majid, M.; Tahir, M.M.; Murlidhar, B.R.; Monjezi, M.; Wong, H. Prediction of ground vibration due to quarry blasting based on gene expression programming: A new model for peak particle velocity prediction. Int. J. Environ. Sci. Technol. 2016, 13, 1453–1464. [Google Scholar] [CrossRef]
  59. Chen, W.; Hasanipanah, M.; Rad, H.N.; Armaghani, D.J.; Tahir, M. A new design of evolutionary hybrid optimization of SVR model in predicting the blast-induced ground vibration. Eng. Comput. 2019, 37, 1455–1471. [Google Scholar] [CrossRef]
  60. Rad, H.N.; Bakhshayeshi, I.; Jusoh, W.A.W.; Tahir, M.; Foong, L.K. Prediction of flyrock in mine blasting: A new computational intelligence approach. Nat. Resour. Res. 2020, 29, 609–623. [Google Scholar]
  61. Ahmad, M.H.J.-L.; Ahmad, F.; Tang, X.-W.; Amjad, M.; Iqbal, M.J.; Asim, M.; Farooq, A. Supervised Learning Methods for Modeling Concrete Compressive Strength Prediction at High Temperature. Materials 2021, 14, 1983. [Google Scholar] [CrossRef]
  62. Ahmad, M.; Amjad, M.; Al-Mansob, R.A.; Kamiński, P.; Olczak, P.; Khan, B.J.; Alguno, A.C. Prediction of Liquefaction-Induced Lateral Displacements Using Gaussian Process Regression. Appl. Sci. 2022, 12, 1977. [Google Scholar] [CrossRef]
  63. Amjad, M.; Ahmad, I.; Ahmad, M.; Wróblewski, P.; Kamiński, P.; Amjad, U. Prediction of pile bearing capacity using XGBoost algorithm: Modeling and performance evaluation. Appl. Sci. 2022, 12, 2126. [Google Scholar] [CrossRef]
  64. Taylor, K.E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 2001, 106, 7183–7192. [Google Scholar] [CrossRef]
  65. Houghton, J.T.; Ding, Y.; Griggs, D.J.; Noguer, M.; van der Linden, P.J.; Dai, X.; Maskell, K.; Johnson, C. Climate Change 2001: The Scientific Basis: Contribution of Working Group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press: Cambridge, UK, 2001. [Google Scholar]
Figure 1. Histograms of the input (in green) and output (in yellow) variables used in this study.
Figure 2. Flowchart for applying ML algorithms to predict the R-value.
Figure 3. Comparison of measured R-value and predicted R-value in (a) training set, (b) testing set, and (c) validation set.
Figure 4. Line graphs showing accuracy of predicted R-value of different models in (a) training set, (b) testing set, and (c) validation set.
Figure 5. Relative strength effect of input parameters in predicting R-value.
Figure 6. Taylor diagram of GPR, SVM, and M5P predicted models.
Table 1. Statistical analysis of the input and output parameters.
Dataset | Statistical Parameter | X1 (%) | X2 (%) | X3 (%) | X4 | X5 (%) | X6 | X7 (g/cm3) | Y
Training | Minimum | 0.00 | 39.30 | 14.90 | 24.30 | 16.00 | 1.10 | 1.25 | 11.70
Training | Maximum | 8.40 | 66.00 | 21.00 | 45.00 | 19.00 | 2.00 | 1.90 | 23.50
Training | Mean | 4.20 | 54.02 | 18.38 | 35.65 | 18.11 | 1.56 | 1.57 | 18.41
Training | SD | 2.47 | 7.79 | 1.76 | 6.07 | 0.88 | 0.25 | 0.19 | 3.63
Training | COV | 58.77 | 14.43 | 9.57 | 17.03 | 4.87 | 16.33 | 12.01 | 19.74
Testing | Minimum | 8.50 | 35.50 | 14.90 | 20.40 | 17.84 | 0.86 | 1.91 | 23.60
Testing | Maximum | 10.20 | 39.00 | 15.20 | 24.00 | 18.20 | 1.00 | 1.98 | 25.30
Testing | Mean | 9.35 | 37.06 | 15.04 | 22.01 | 18.07 | 0.97 | 1.96 | 24.37
Testing | SD | 0.53 | 0.96 | 0.09 | 0.98 | 0.14 | 0.05 | 0.02 | 0.50
Testing | COV | 5.71 | 2.60 | 0.61 | 4.43 | 0.77 | 4.74 | 1.10 | 2.04
Validation | Minimum | 10.30 | 27.00 | 12.80 | 14.00 | 17.10 | 0.60 | 1.95 | 25.40
Validation | Maximum | 12.00 | 34.60 | 14.90 | 19.70 | 17.79 | 0.84 | 1.99 | 27.00
Validation | Mean | 11.15 | 30.47 | 13.61 | 16.87 | 17.56 | 0.72 | 1.98 | 26.51
Validation | SD | 0.53 | 2.18 | 0.55 | 1.69 | 0.20 | 0.07 | 0.01 | 0.59
Validation | COV | 4.79 | 7.15 | 4.07 | 10.00 | 1.13 | 9.19 | 0.75 | 2.23
SD = standard deviation, COV = coefficient of variation.
Table 2. Pearson correlation coefficients for inputs and the target output.
Parameter | X1 | X2 | X3 | X4 | X5 | X6 | X7 | Y
X1 | 1
X2 | −0.9972 | 1
X3 | −0.9893 | 0.9915 | 1
X4 | −0.9965 | 0.9994 | 0.9865 | 1
X5 | 0.2014 | −0.1435 | −0.1749 | −0.1348 | 1
X6 | −0.9939 | 0.9975 | 0.9846 | 0.9981 | −0.1204 | 1
X7 | 0.9858 | −0.9818 | −0.9770 | −0.9803 | 0.2394 | −0.9742 | 1
Y | 0.9844 | −0.9721 | −0.9695 | −0.9700 | 0.3639 | −0.9659 | 0.9728 | 1
Table 3. Optimized hyperparameters.
Algorithm | Function | Hyperparameter | Optimal Value
GPR | PUK kernel | Noise | 0.3
GPR | PUK kernel | σ | 0.4
GPR | PUK kernel | ω | 0.4
SVM | PUK kernel | C | 0.52
SVM | PUK kernel | σ | 1
SVM | PUK kernel | ω | 1
M5P | - | Instances | 4
Table 4. Comparison of statistical parameters for performance evaluation of the GPR, SVM, M5P, and ANN models.
Dataset | Model | R2 | MAE | RSE | RMSE | RRMSE | ρ | Reference
Training | GPR | 0.9999 | 0.0399 | 0.0003 | 0.0592 | 0.0032 | 0.0016 | Present study
Training | SVM | 0.9999 | 0.0364 | 0.0002 | 0.0560 | 0.0030 | 0.0015 | Present study
Training | M5P | 0.9999 | 0.0453 | 0.0002 | 0.0562 | 0.0031 | 0.0015 | Present study
Training | ANN | 0.8700 | 0.5700 | 0.0000 | 4.3200 | 0.2300 | 0.1200 | [11]
Testing | GPR | 0.9998 | 0.0221 | 0.0035 | 0.0287 | 0.0012 | 0.0006 | Present study
Testing | SVM | 0.9998 | 0.0056 | 0.0004 | 0.0100 | 0.0004 | 0.0002 | Present study
Testing | M5P | 0.9947 | 0.0374 | 0.0106 | 0.0496 | 0.0020 | 0.0010 | Present study
Testing | ANN | 0.9900 | 0.3500 | 0.0096 | 4.9300 | 0.2000 | 0.1000 | [11]
Validation | GPR | 0.9996 | 0.0258 | 0.0032 | 0.0325 | 0.0012 | 0.0006 | Present study
Validation | SVM | 0.9955 | 0.0268 | 0.0092 | 0.0551 | 0.0021 | 0.0010 | Present study
Validation | M5P | 0.9939 | 0.0325 | 0.0122 | 0.0636 | 0.0024 | 0.0012 | Present study
Validation | ANN | 0.9900 | 0.0380 | 0.0097 | 1.1900 | 0.0400 | 0.0200 | [11]
Table 5. Modeling results for the validation set of the GPR, M5P, SVM, and ANN models.
Model | R2 | MAE | RSE | RMSE | RRMSE | ρ | Rank R2 | Rank MAE | Rank RSE | Rank RMSE | Rank RRMSE | Rank ρ | Total | Reference
GPR | 0.9996 | 0.0258 | 0.0032 | 0.0325 | 0.0012 | 0.0006 | 4 | 4 | 4 | 4 | 4 | 4 | 24 | Present study
SVM | 0.9955 | 0.0268 | 0.0092 | 0.0551 | 0.0021 | 0.0010 | 3 | 3 | 3 | 3 | 3 | 3 | 18 | Present study
M5P | 0.9939 | 0.0325 | 0.0122 | 0.0636 | 0.0024 | 0.0012 | 2 | 2 | 1 | 2 | 2 | 2 | 11 | Present study
ANN | 0.9900 | 0.0380 | 0.0097 | 1.1900 | 0.0400 | 0.0200 | 3 | 1 | 2 | 1 | 1 | 1 | 9 | [11]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite as: Ahmad, M.; Alsulami, B.T.; Al-Mansob, R.A.; Ibrahim, S.L.; Keawsawasvong, S.; Majdi, A.; Ahmad, F. Predicting Subgrade Resistance Value of Hydrated Lime-Activated Rice Husk Ash-Treated Expansive Soil: A Comparison between M5P, Support Vector Machine, and Gaussian Process Regression Algorithms. Mathematics 2022, 10, 3432. https://0-doi-org.brum.beds.ac.uk/10.3390/math10193432
