1. Introduction
Since the second half of the 19th century, considerable attention has been paid to the use of coarse-grained soil in the construction of embankment dams. Subsequently, improvements in compaction methods and in the use of coarse-grained soil enabled engineers to design and construct higher rockfill dams [1]. To monitor the performance of dams, their behaviour must be examined during construction, and considerable data need to be gathered. Observations reveal that, during the first impoundment or heavy rain, not only long-term deformation (creep) [2] but also sudden settlements occur [3]. Sudden settlement during the first impoundment is a salient feature of central clay core dams with rockfill shells. This sudden settlement, referred to as collapse settlement, is ascribed to the inundation (wetting) of the upstream rockfill [4]. Collapse settlement is a dangerous geotechnical phenomenon that can damage dams and their equipment. However, collapse is difficult to predict because it depends on several factors, such as soil deformability, the degree of saturation, and variable loading conditions [5].
Previous studies have indicated that most soils can undergo collapse [6,7,8,9]. In other words, if a poorly graded unsaturated soil becomes saturated while under load, it is likely to collapse. For this to happen, however, the bonds between the grains must be weak enough to be destroyed by saturation, which consequently weakens the soil [10]. Destruction of these bonds reduces the shear strength of the soil and finally leads to significant deformations. Studies have demonstrated that factors such as particle size, Atterberg limits, and the moisture content of fine-grained particles affect collapse [11].
To investigate the causes of collapse and the considerable deformations of rockfill dams, many researchers conducted experiments on soil in the 1960s and 1970s [2,12]. One of these was the one-dimensional oedometer test, which is widely used to estimate the collapse potential of soils and can also be used to calculate soil collapse settlement [5,13,14]. The results of these experiments revealed that the fracture of coarse grains under high stress leads to the rearrangement of soil grains and to collapse settlement. Comprehensive studies were undertaken in this regard through several three-dimensional and one-dimensional tests [15]. Sieve analyses performed before and after the tests indicated that soil grains were crushed during testing, and it was concluded that wetting reduces the strength of the particles. Various mechanisms, such as crack propagation in grains [16], suction reduction, and reduction of the energy level of minerals [17], have been suggested to explain grain fracture [18]. In response to a reduction in matric suction, collapsible soils undergo three distinct phases: the pre-collapse phase, the collapse phase, and the post-collapse phase. In the pre-collapse phase, the soil structure remains intact, and collapsible soils may expand, maintain a constant volume, or collapse slightly. In the post-collapse phase, the soil may continue to collapse at the same rate as in the collapse phase, slow down, or stop collapsing [19].
Previous studies have revealed that various parameters, such as moisture content, normal and shear stress levels, sand content, clay content, and relative density, can influence the collapse phenomenon [10,20,21,22]. Collapse criteria have been developed based on two parameters: dry density and degree of saturation [5]. At a fixed moisture content (fixed matric suction), the collapse potential of soils increases as the dry density decreases [5,14,23,24]. Further, at a fixed dry unit weight, the collapse potential decreases as the moisture content increases [5,14,24,25]. However, higher collapse potential has been observed for samples with higher matric suction, regardless of their dry density and moisture content [24]. It has also been observed that the collapse potential of different soils increases as their porosity increases [26]. Previous studies have also shown that the initial moisture content of the soil has the strongest effect on collapse settlement, with lower initial moisture contents leading to larger collapse settlements.
To investigate the stress and strain paths during collapse settlement, the use of the collapse settlement (ΔH) and the coefficient of stress release (CSR) has been proposed [20]. The CSR is typically defined as the ratio of the shear stress in saturated conditions ($\tau_{sat}$) to the shear stress in dry conditions ($\tau_{dry}$):

$$\mathrm{CSR} = \frac{\tau_{sat}}{\tau_{dry}} \quad (1)$$

Therefore, the shear stress of the soil after collapse can be determined by multiplying the CSR by the shear stress of the soil in dry conditions. Measuring ΔH and CSR requires complicated laboratory tests and expensive equipment. Hence, developing predictive models is a promising alternative for dealing with such a complicated problem.
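As a minimal illustration of Equation (1), the Python sketch below computes the CSR from a saturated and a dry shear stress and then recovers the post-collapse shear stress by multiplying the CSR by the dry shear stress. The function names and the stress values (in kPa) are hypothetical and are not data from this study.

```python
def coefficient_of_stress_release(tau_sat: float, tau_dry: float) -> float:
    """CSR as the ratio of saturated to dry shear stress (Equation (1))."""
    if tau_dry <= 0:
        raise ValueError("tau_dry must be positive")
    return tau_sat / tau_dry


def shear_stress_after_collapse(csr: float, tau_dry: float) -> float:
    """Rearranged Equation (1): post-collapse shear stress = CSR * tau_dry."""
    return csr * tau_dry


if __name__ == "__main__":
    # Made-up shear stresses in kPa, for illustration only (not data from this study)
    tau_dry, tau_sat = 250.0, 200.0
    csr = coefficient_of_stress_release(tau_sat, tau_dry)
    print(f"CSR = {csr:.2f}")
    print(f"Shear stress after collapse = {shear_stress_after_collapse(csr, tau_dry):.1f} kPa")
```

In practice the direction of use is reversed: a model predicts the CSR, and the second function converts it into a post-collapse shear stress.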
Soft computing approaches have been applied effectively to model and predict mechanical behaviour and material strength in civil engineering [27–45]. Hasanzadehshooiili et al. [46] modelled the collapse settlement of sandy gravel using an artificial neural network (ANN). To this end, 180 data points obtained from large-scale direct shear tests were used as the dataset. Sand content, shear stress level, normal stress, and relative density were considered as independent variables, and the collapse settlement of the sandy gravel soil and the coefficient of stress release were selected as dependent variables. The results showed that the proposed model could predict the collapse settlement and the coefficient of stress release with R² values of 0.9828 and 0.9806, respectively. Soleimani et al. [47] proposed models based on multi-gene genetic programming (MGGP) and multi-variable least squares regression (MLSR) to predict the collapse settlement and the coefficient of stress release, using a dataset of 180 experimental samples. Their models estimate the collapse settlement and the coefficient of stress release from the sand content, shear stress level, normal stress, and relative density. The MGGP model predicted the collapse settlement and the coefficient of stress release with R² values of 0.958 and 0.982, respectively, and was more accurate than the MLSR model, whose corresponding R² values were 0.857 and 0.942. Najemalden et al. (2020) used an ANN to predict the collapse potential of gypsiferous sandy soils [48]. Sandy soils taken from four zones in Iraq were used to prepare 180 samples with varying properties, and the collapse potential of each sample was measured with an oedometer. Seven soil parameters, namely gypsum content, specific gravity, initial dry unit weight, initial degree of saturation, initial void ratio, initial moisture content, and the percentage passing the No. 200 sieve (0.074 mm), were used as input variables, and the collapse potential of the gypsiferous sandy soils was taken as the output. The results showed that the ANN estimated the collapse potential satisfactorily. Uysal (2020) presented a gene expression programming (GEP) model to predict the collapse potential of soil [49]. The inputs were the uniformity coefficient, initial moisture content, initial dry unit weight, and wetting pressure. A comparison with empirical and regression-based relations showed that the GEP model estimated the collapse potential of soil more precisely. Zhang (2020) used the multivariate adaptive regression splines (MARS) method to develop models for predicting the collapse potential of dense soils [50]. A dataset of 330 data points was used, with the initial moisture content, initial dry unit weight, and wetting pressure as independent variables. The MARS method and an ANN were then compared in terms of prediction accuracy, computation time, and model interpretability; the MARS model performed better, with R² values of 0.948 and 0.926 for the training and testing sets, respectively. Mawlood (2021) modelled the collapse potential of gypseous soils using linear and nonlinear regression [51], developing the models from 220 data points collected from various studies. The models predict the collapse potential from the gypsum content, initial moisture content, void ratio, liquid limit, plasticity index, total unit weight, and dry unit weight. Based on the coefficient of determination (R²) and the root mean square error (RMSE), the developed models could predict the collapse potential of gypseous soils well.
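R² and RMSE are the accuracy measures quoted throughout this review. The sketch below computes both, assuming R² is the coefficient of determination, 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation, so the exact definition used in each cited work is an assumption here. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]   # made-up measurements
    pred = [1.1, 1.9, 3.2, 3.8]  # made-up model predictions
    print(f"R2 = {r_squared(obs, pred):.3f}, RMSE = {rmse(obs, pred):.3f}")
```

R² is scale-free, whereas RMSE is expressed in the units of the output, which is why the two are usually reported together.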
As this review indicates, two intelligent models, ANN and MGGP, have been developed to predict the collapse settlement and coefficient of stress release from the sand content (SC), normal stress (σn), shear stress level (SL), and relative density (Dr). Models based on artificial neural networks have several disadvantages, such as their “black box” nature, greater computational load, and potential for over-fitting [52]. Furthermore, making predictions with an ANN model requires matrix calculations to simulate the network, which cannot be carried out easily by hand. GP uses an evolutionary approach to determine the mathematical form of the model, but the constant coefficients of the model are generated randomly and are not tuned. Therefore, the values obtained for the constant coefficients are not necessarily optimal, and these inaccurate coefficients can also distort the selected model structure. Bloating may also occur when modelling with GP: the model expressions grow excessively without any significant improvement in overall performance, which can hinder the evolutionary process of GP [53,54]. Evolutionary polynomial regression (EPR) is a hybrid machine-learning method that was developed to overcome some of these drawbacks of GP. EPR requires only a small number of constants to build the final model, which helps reduce the potential for over-fitting, especially for small datasets. It uses the least squares method to estimate the coefficients, which yields a unique solution when the inverse problem is well-conditioned [53].
In this study, the EPR technique was used to obtain simple equations for predicting the collapse settlement and coefficient of stress release of sandy gravel soil. The developed equations are accurate and generalizable, and their simple form allows them to be used easily in manual calculations. After the accuracy of each model was evaluated, the optimal model in terms of accuracy and simplicity was selected, and a sensitivity analysis was performed to determine the relative importance of each input in predicting the collapse settlement and coefficient of stress release. A parametric study was also undertaken to identify the effect of changes in each input parameter on the outputs. Finally, the model was compared with previously developed models.
2. Evolutionary Polynomial Regression (EPR)
EPR is a data-driven regression method that automatically searches for the best model describing the relationship between the input and output variables. Its main advantage is that the form of the nonlinear regression model does not need to be specified in advance. The algorithm combines a genetic algorithm (GA) with regression analysis [55].
EPR works by creating candidate relations between the input and output data through an evolutionary process driven by the GA. The resulting relations depend on the number of data points, the type of relationship between the input and output variables, the user-specified set of candidate exponents, and the number of terms allowed in the final relation. Equation (2) shows the general scheme of EPR [33]:

$$y = \sum_{j=1}^{m} F\big(\mathbf{X}, f(\mathbf{X}), a_j\big) + a_0 \quad (2)$$

where y is the output vector, $a_j$ is a constant coefficient ($a_0$ being an optional bias term), $\mathbf{X}$ is the matrix of input variables, and m is the number of terms in the suggested relation. Furthermore, F is a relation constructed by the process, and $f(\mathbf{X})$ is a function defined by the user.
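The GA is used to develop Equation (2). However, $f(\mathbf{X})$ and m are determined by the user, based on an understanding of the physical phenomenon under study or by trial and error. The first step in finding the relation between the input and output data is to rewrite Equation (2) in vector form, as shown in Equation (3) [53]:

$$\mathbf{Y}_{N\times 1}(\boldsymbol{\theta}, \mathbf{Z}) = \left[\mathbf{I}_{N\times 1}\;\; \mathbf{Z}^{1}_{N\times 1}\;\; \cdots\;\; \mathbf{Z}^{m}_{N\times 1}\right] \times \left[a_0\;\; a_1\;\; \cdots\;\; a_m\right]^{T} = \mathbf{Z}_{N\times d}\,\boldsymbol{\theta}^{T}_{d\times 1} \quad (3)$$

where $\mathbf{Y}_{N\times 1}(\boldsymbol{\theta}, \mathbf{Z})$ is the vector of least-squares estimates of the N target values, and $\boldsymbol{\theta}_{1\times d}$ is the vector of the $d = m + 1$ parameters $a_j$ and $a_0$. Finally, $\mathbf{Z}_{N\times d}$ is a matrix composed of a column vector of ones, $\mathbf{I}_{N\times 1}$, for the bias $a_0$ and the m variable vectors $\mathbf{Z}^j$. For a fixed value of j, $\mathbf{Z}^j$ is a product of vectors of the independent variables [53]. Equation (4) presents the input data matrix $\mathbf{X}$ [53]:

$$\mathbf{X} = \left[\mathbf{X}_1\;\; \mathbf{X}_2\;\; \cdots\;\; \mathbf{X}_k\right] \quad (4)$$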
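where the kth column of $\mathbf{X}$ represents the candidate variables for the jth term of Equation (3). Thus, $\mathbf{Z}^j$ in Equation (3) can be written as Equation (5) [56]:

$$\mathbf{Z}^{j}_{N\times 1} = (\mathbf{X}_1)^{ES(j,1)} \cdot (\mathbf{X}_2)^{ES(j,2)} \cdots (\mathbf{X}_k)^{ES(j,k)}, \quad j = 1, \ldots, m \quad (5)$$

where $\mathbf{Z}^j$ is the jth column vector, whose elements are products of the candidate independent inputs (powers and products are taken element-wise), ES is the m × k matrix of exponents, and k is the number of independent variables.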
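To make Equations (3) and (5) concrete, the following Python sketch builds each term $\mathbf{Z}^j$ as an element-wise product of powers of the input columns, using one row of an exponent matrix ES per term, and then prepends the column of ones for the bias $a_0$. The ES matrix and the two input samples are arbitrary examples chosen for illustration; they are not necessarily the values of Equation (6) or data from this study, and the function names are hypothetical.

```python
from typing import List, Sequence


def build_terms(X: Sequence[Sequence[float]], ES: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the N x m matrix whose j-th column is Z^j = X_1^ES[j][0] * ... * X_k^ES[j][k-1] (Equation (5))."""
    k = len(ES[0])
    if any(len(row) != k for row in ES) or any(len(x) != k for x in X):
        raise ValueError("every row of ES and X must have k entries")
    Z = []
    for x in X:              # one sample (row of X) at a time
        z_row = []
        for exps in ES:      # one candidate term per row of ES
            term = 1.0
            for xi, e in zip(x, exps):
                term *= xi ** e  # raise each input to its exponent, then multiply
            z_row.append(term)
        Z.append(z_row)
    return Z


def design_matrix(Z: Sequence[Sequence[float]]) -> List[List[float]]:
    """Prepend the column of ones I_{N x 1} for the bias a0, giving Z_{N x d} of Equation (3)."""
    return [[1.0] + list(row) for row in Z]


if __name__ == "__main__":
    # Arbitrary exponent matrix for m = 4 terms and k = 3 inputs (illustrative only)
    ES = [[0, 1, 2], [1, 2, 0], [2, 1, 1], [0, 0, 1]]
    X = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.5]]  # two made-up samples
    for row in design_matrix(build_terms(X, ES)):
        print(row)
```

An exponent of 0 simply drops that input from a term, which is how EPR expresses terms that involve only a subset of the variables.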
To illustrate how the final equation is developed, suppose that the user defines a vector of candidate exponent values, that the number of terms m (excluding the bias) is 4, and that the number of independent variables used in the analysis (k) is 3. The ES matrix then has 3 columns (one per independent variable) and 4 rows (one per term). For example, the exponents in ES could be as follows:
Applying the matrix in Equation (6) to Equation (5) yields the four terms given in Equations (7)–(10):
Therefore, Equation (3) becomes:
The least-squares method is used to determine the constant coefficients of Equation (11) [53], while the GA is used to find the best mathematical expression from the suggested exponents [53]. The genetic algorithm (GA) is an evolutionary optimization algorithm inspired by natural selection. It starts by creating a random initial population of candidate solutions, each encoded as the chromosome of an individual. The fitness of each individual is evaluated according to its performance in the environment (i.e., how well it solves the problem), and a new population is then generated through selection, crossover, and mutation [57,58].
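The interplay between the GA and least squares can be sketched as follows. This is a deliberately simplified, hypothetical implementation: the GA searches over exponent matrices ES drawn from an assumed candidate set {0, 1, 2}, the coefficients of each candidate are fitted by least squares (solved here via the normal equations for brevity; a QR-based solver is numerically preferable), and the fitness is the sum of squared errors (SSE). Truncation selection with elitism, uniform crossover, and per-gene mutation are assumptions of this sketch; published EPR implementations may use different operators, fitness measures, and solvers.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
ExpMatrix = List[List[int]]

CANDIDATE_EXPONENTS = (0, 1, 2)  # assumed user-defined set of candidate exponents


def design_matrix(X: Sequence[Sequence[float]], ES: ExpMatrix) -> Matrix:
    """Rows [1, Z^1, ..., Z^m] with Z^j = prod_i x_i ** ES[j][i] (Equations (3) and (5))."""
    rows = []
    for x in X:
        row = [1.0]  # column of ones for the bias a0
        for exps in ES:
            term = 1.0
            for xi, e in zip(x, exps):
                term *= xi ** e
            row.append(term)
        rows.append(row)
    return rows


def least_squares_sse(A: Matrix, y: Sequence[float]) -> float:
    """Fit theta by least squares (normal equations) and return the sum of squared errors."""
    d = len(A[0])
    # Augmented normal-equation system [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(d)] + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(d)]
    for c in range(d):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-10:
            return float("inf")  # linearly dependent terms: reject this candidate
        M[c], M[p] = M[p], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    theta = [M[i][d] / M[i][i] for i in range(d)]
    return sum((sum(t * a for t, a in zip(theta, row)) - yi) ** 2 for row, yi in zip(A, y))


def evolve_exponents(X: Sequence[Sequence[float]], y: Sequence[float], m: int, pop_size: int = 30,
                     generations: int = 40, p_mut: float = 0.1, seed: int = 0) -> Tuple[ExpMatrix, float]:
    """Simplified GA over m x k exponent matrices ES; fitness = least-squares SSE."""
    rng = random.Random(seed)
    k = len(X[0])

    def fitness(es: ExpMatrix) -> float:
        return least_squares_sse(design_matrix(X, es), y)

    pop = [[[rng.choice(CANDIDATE_EXPONENTS) for _ in range(k)] for _ in range(m)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)           # lower SSE = fitter individual
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover: each exponent is inherited from either parent
            child = [[rng.choice((a, b)) for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]
            # Mutation: each exponent is resampled with probability p_mut
            child = [[rng.choice(CANDIDATE_EXPONENTS) if rng.random() < p_mut else g for g in row]
                     for row in child]
            children.append(child)
        pop = parents + children
    best = min(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Synthetic data generated from y = 2 + 3 * x1 * x2**2 (illustrative only)
    X = [[x1, x2] for x1 in (1.0, 2.0, 3.0) for x2 in (0.5, 1.0, 1.5, 2.0)]
    y = [2.0 + 3.0 * x1 * x2 ** 2 for x1, x2 in X]
    best, err = evolve_exponents(X, y, m=1)
    print("best ES:", best, "SSE:", err)
```

Candidates whose terms are linearly dependent (for example, an all-zero exponent row that duplicates the bias column) are rejected by assigning them an infinite error, which mirrors the requirement that the least-squares problem be well-conditioned.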
Figure 1 shows the EPR flowchart.
7. Parametric Study
Cost and time restrictions, as well as limited access to applicable equipment, are typically the main problems in laboratory studies. In most cases, examining the effects of each input variable on the output variables requires preparing several samples, which is costly and time consuming. One merit of modelling is that developed models can be used for parametric studies and evaluation of the impact of each input variable on the model output.
As noted earlier, the input parameters in this study were SC, σn, Dr, and SL. The optimal EPR model was used to examine the interaction effects of SC and Dr, Dr and σn, Dr and SL, and SL and σn on the collapse settlement and coefficient of stress release. To this end, the variables under study were varied between their minimum and maximum values, while the other variables were held at their mean values. The collapse settlement and coefficient of stress release were then computed with the EPR model. The results are shown in Figure 16 and Figure 17.
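The procedure above (varying two inputs over their observed ranges while holding the others at their means) can be sketched as below. Because the fitted EPR equations are not reproduced in this section, `placeholder_model` is a hypothetical stand-in and the input ranges are made up; any fitted model that takes a dictionary of SC, sigma_n, SL, and Dr values could be passed in its place.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Mapping, Sequence, Tuple


def interaction_grid(
    model: Callable[[Dict[str, float]], float],
    data: Mapping[str, Sequence[float]],
    var_a: str,
    var_b: str,
    steps: int = 5,
) -> List[Tuple[float, float, float]]:
    """Vary var_a and var_b from their minimum to maximum; hold every other input at its mean."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    base = {name: float(mean(values)) for name, values in data.items()}

    def levels(name: str) -> List[float]:
        lo, hi = min(data[name]), max(data[name])
        return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

    results = []
    for a, b in product(levels(var_a), levels(var_b)):
        point = dict(base)               # all inputs at their mean values
        point[var_a], point[var_b] = a, b  # overwrite the two inputs under study
        results.append((a, b, model(point)))
    return results


def placeholder_model(p: Dict[str, float]) -> float:
    """Hypothetical stand-in for a fitted EPR equation (NOT the model of this study)."""
    return 0.05 * p["SC"] + 2.0 * p["SL"] + 0.001 * p["sigma_n"] - 0.01 * p["Dr"]


if __name__ == "__main__":
    # Made-up input ranges, for illustration only
    data = {"SC": [0, 25, 50], "sigma_n": [100, 200, 300], "SL": [0.3, 0.6, 0.9], "Dr": [40, 60, 80]}
    for sc, dr, out in interaction_grid(placeholder_model, data, "SC", "Dr", steps=3):
        print(f"SC={sc:5.1f}  Dr={dr:5.1f}  output={out:.3f}")
```

Evaluating the grid once per output (collapse settlement and CSR) gives the data behind surface plots such as those described for Figures 16 and 17.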
The interaction of σn and SL indicates that increasing SL increased the collapse settlement and reduced the coefficient of stress release, whereas increasing σn increased both the collapse settlement and the coefficient of stress release. The interaction of SC and Dr showed that SC had a strong effect, increasing the collapse settlement and decreasing the coefficient of stress release, whereas Dr had only a minor effect, slightly increasing the coefficient of stress release and decreasing the collapse settlement. The interaction of Dr and σn likewise revealed that increasing σn increased both the collapse settlement and the coefficient of stress release, while increasing Dr increased the coefficient of stress release and decreased the collapse settlement. Finally, the interaction of SL and Dr showed a marked effect of SL, which increased the collapse settlement and decreased the coefficient of stress release, while increasing Dr again increased the coefficient of stress release and decreased the collapse settlement.
Table 12 compares the parametric study results of the present model with those of the models by Hasanzadehshooiili et al. [46] and Soleimani et al. [47].
As Table 12 shows, the effects of the input variables on the collapse settlement and coefficient of stress release in the EPR model follow trends similar to those reported by Hasanzadehshooiili et al. [46] and Soleimani et al. [47]. This agreement supports the physical consistency of the developed model.
8. Conclusions
In this study, the collapse settlement and coefficient of stress release of sandy gravel soil were examined. To develop the prediction models, a dataset of 180 samples from large-scale direct shear tests was used. Using sand content (SC), normal stress (σn), shear stress level (SL), and relative density (Dr) as inputs, the developed models predict the collapse settlement and the coefficient of stress release. The main findings of the present study can be summarized as follows:
(1) EPR models developed with an exponential function were selected as the optimal models. For the collapse settlement, the R² values of the model were 0.9759, 0.9759, and 0.9759 for the training, testing, and all data, respectively; for the coefficient of stress release, the corresponding R² values were 0.9833, 0.9820, and 0.9833;
(2) The EPR models outperformed the MGGP model in predicting both the collapse settlement and the coefficient of stress release. The ANN model was more accurate than the EPR model in predicting the collapse settlement, whereas the EPR model predicted the coefficient of stress release more precisely than the ANN model;
(3) The sensitivity analysis revealed that SC was the most important and Dr the least important parameter in predicting the collapse settlement, whereas Dr and SC were the most and least important parameters, respectively, in predicting the coefficient of stress release;
(4) The parametric study showed that increases in SL and SC led to an increase in collapse settlement and a decrease in the coefficient of stress release. Increasing σn increased both the collapse settlement and the coefficient of stress release. Finally, increasing Dr reduced the collapse settlement and increased the coefficient of stress release;
(5) To continue this research and achieve higher accuracy in predicting the collapse settlement and coefficient of stress release, other machine learning methods could be used.