
Construction Project Cost Prediction Method Based on Improved BiLSTM

School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
* Author to whom correspondence should be addressed.
Submission received: 27 November 2023 / Revised: 21 January 2024 / Accepted: 22 January 2024 / Published: 23 January 2024
(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)

Abstract

In construction project management, accurate cost forecasting is critical for informed decision making. This article proposes a construction cost prediction method based on an improved bidirectional long short-term memory (BiLSTM) network to address the high interactivity among construction cost data and the difficulty of feature extraction. First, the correlation between the cost-influencing factors and the unilateral cost is calculated via grey relational analysis to select the characteristic indicators. Second, a BiLSTM network is used to capture the temporal interactions in the cost data at a deep level, and a hybrid attention mechanism is incorporated to enhance the model's feature extraction capability and comprehensively capture the interactions among the features of the cost data. Finally, a hyperparameter optimisation method based on an improved particle swarm optimisation algorithm is proposed, using the prediction accuracy as the fitness function of the algorithm. The MAE, RMSE, MPE, MAPE, and coefficient of determination of the simulated prediction results of the proposed method on the dataset are 7.487, 8.936, 0.236%, 0.393%, and 0.996, respectively; the positive MPE avoids the serious consequences of underestimating the cost. Compared with the unimproved BiLSTM, the MAE and RMSE are reduced by 15.271 and 18.193, and the MAPE by 0.784 percentage points, which reflects the superiority and effectiveness of the method and can provide technical support for project cost estimation in the construction field.

1. Introduction

The scale and complexity of construction work continue to rise amid the global construction boom. Engineering cost is one of the most critical factors affecting the success of engineering projects. In the bidding stage of a construction project, accurate cost prediction is the basis for assessing the feasibility of the project and selecting design options; it directly affects how reasonable the bidding price is and the probability of winning the bid, which in turn affect the economic feasibility and overall quality of the project. Consequently, an accurate method of predicting construction cost is indispensable.
In previous studies, linear models such as the autoregressive integrated moving average (ARIMA) model [1] were utilised owing to their simple structure and robustness to data size and noise levels. However, the ARIMA model is unsuitable for capturing the nonlinearities of engineering cost time series [2], which negatively affects prediction accuracy. To solve this problem, the support vector machine (SVM) [3,4,5], the backpropagation (BP) neural network [6,7,8], and other machine learning models have been applied to cost prediction. Although such methods can handle nonlinear problems effectively, the SVM has limitations in processing correlated data and suffers from slow processing speeds, and the BP neural network readily loses time series information and falls into local minima. With the continuous development of deep learning, long short-term memory (LSTM) networks have achieved good results in engineering cost prediction. Dong et al. [9] compared an LSTM network with an SVM in construction cost prediction experiments. The results revealed that the LSTM model handles high-dimensional feature vectors and the selective recording of historical information better than an SVM, while also having advantages in prediction accuracy and parameter adjustment. Cao et al. [10] used LSTM to predict the construction cost of highway projects; the results showed that LSTM provides more accurate predictions than the traditional model in short-, medium-, and long-term scenarios. However, LSTM still falls short in dealing with the strong interactions between the features of cost data. A bidirectional LSTM (BiLSTM) network [11,12,13,14] is a variant of the LSTM network with an additional layer providing an inverse structure that can mine more information from the data. Siami-Namini et al. [15] showed that BiLSTM achieves higher prediction accuracy than LSTM on time series problems and suggested using BiLSTM instead of LSTM for issues related to time series analysis. A single model often fails to achieve optimal performance, and combined models can deliver better performance and more accurate predictions by combining the strengths of multiple underlying models [16]. Zhang et al. [17] incorporated an attention mechanism (AM) into a BiLSTM model for feature weight adjustment, which further improved the prediction effect of the model. Vaziri et al. [18] used a particle swarm optimisation (PSO) algorithm to optimise the hyperparameters of a BiLSTM model and reduce its prediction error.
Implicit temporal patterns exist in construction cost data owing to factors such as material prices, labour, and variable project durations. In addition, the data contain nonlinear relationships and multi-level dependencies that give rise to complex interactions between different cost characteristic indicators. However, existing studies have yet to consider these effects fully, so the resulting models are deficient in generalisation ability and prediction accuracy. To address these problems, this paper introduces the BiLSTM network into the field of engineering cost prediction. Grey relational analysis (GRA) is performed to calculate the correlation between indicators and cost for feature indicator selection. The inter-temporal and inter-feature interactions in the cost data are captured by fusing a hybrid attention mechanism (HAM) with the BiLSTM network in the prediction component of the model. To address the shortcomings of the traditional particle swarm optimisation algorithm, an improved particle swarm optimisation algorithm is proposed to find the optimal hyperparameter configuration of the model, avoiding the tedium and uncertainty of manual parameter tuning. Experiments were conducted on 156 completed residential building projects in Jiangsu Province from 2019 to 2023, and the results show that the method has significant advantages in mean absolute error, root-mean-square error, and mean absolute percentage error.

2. Methodology for the Selection of Characteristic Indicators

Considering the specificity and diversity of construction projects, the factors affecting project cost are complex and varied. However, not all factors carry the same weight and importance for project cost [19]. Therefore, when selecting indicators, the principle of moderation should be applied to screen out indicators that effectively describe project characteristics or have a significant impact on project cost.
GRA is a data analysis method based on grey system theory that studies the interrelatedness of multiple indicators [20]. The data associated with each indicator are aggregated to determine the relative degree of each indicator, and the grey correlation between indicators is calculated to determine each indicator's degree of influence on the problem. Compared with traditional correlation analysis methods, GRA more accurately reflects the correlations between indicators and their degrees of influence. Additionally, GRA offers a simple model, small data requirements, and interpretable results, making it suitable for selecting construction cost indicators. The steps of the grey correlation calculation are as follows.
  • Raw data standardisation: the values of indicators in the raw data are mapped to intervals that ensure the same scale and range of variation among different variables.
  • Absolute difference matrix construction: we take the absolute value of the difference between the target indicator and each candidate indicator to form an absolute difference matrix:
    $$\Delta_i(k) = \left| x_0(k) - x_i(k) \right|$$
    where $x_0$ and $x_i$ denote the standardised target and candidate indicators, respectively, $k$ denotes the $k$th sample, and $i$ denotes the $i$th candidate indicator.
  • Correlation matrix construction: each element of the absolute difference matrix is combined with the minimum and maximum values of that matrix to obtain the correlation coefficients:
    $$\zeta_i(k) = \frac{\Delta_{\min} + \rho\,\Delta_{\max}}{\Delta_i(k) + \rho\,\Delta_{\max}}$$
    where $\Delta_{\min}$ and $\Delta_{\max}$ denote the minimum and maximum values of the matrix, respectively, and $\rho$ denotes the resolution factor, set to 0.5 in this study.
  • The grey correlation is calculated as:
    $$r_{0,i} = \frac{1}{n}\sum_{k=1}^{n}\zeta_i(k)$$
In this study, GRA was used to select characteristic indicators. By analysing the frequency of influencing factors and the difficulty of obtaining them in previous literature [21,22,23,24,25,26], we selected 18 indicators that have significant impacts on construction cost as candidate indicators: foundation type, structural type, fortification intensity, management level, façade material, aboveground floor area, underground floor area, number of aboveground floors, number of underground floors, aboveground floor height, underground floor height, interior wall decoration, types of doors and windows, number of elevators, roof type, concrete prices, reinforcing steel prices, and duration. The correlations between the candidate indicators and the unilateral cost were calculated via GRA, and the characteristic indicators were selected according to the resulting correlation value.
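For concreteness, the sketch below implements the four GRA steps above in NumPy; the random data, indicator count, and the 0.9 selection threshold echo Section 6.1.1 but are placeholders, not the paper's dataset.

```python
# Sketch of grey relational analysis (GRA) for indicator selection.
import numpy as np

def grey_relational_grades(X, y, rho=0.5):
    """X: (m_indicators, n_samples) candidates; y: (n_samples,) target. One grade per indicator."""
    # Step 1: min-max standardisation of each series to [0, 1].
    def scale(a):
        return (a - a.min(axis=-1, keepdims=True)) / (a.max(axis=-1, keepdims=True) - a.min(axis=-1, keepdims=True))
    X, y = scale(X), scale(y)
    # Step 2: absolute difference matrix between target and candidates.
    delta = np.abs(y[None, :] - X)                                     # shape (m, n)
    # Step 3: correlation coefficients with resolution factor rho = 0.5.
    zeta = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    # Step 4: grey relational grade = mean coefficient over samples.
    return zeta.mean(axis=1)

# Usage: keep indicators whose grade exceeds the 0.9 threshold of Section 6.1.1.
X = np.random.rand(18, 156)   # 18 candidate indicators, 156 samples (illustrative)
y = np.random.rand(156)
grades = grey_relational_grades(X, y)
selected = np.where(grades > 0.9)[0]
```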

3. Predictive Modelling Design

The structure of the prediction model is shown in Figure 1, which consists of four parts: an input layer, a BiLSTM layer, an attention layer, and an output layer.
In this design, the input layer pre-processes the cost data, transforming it into a shape the BiLSTM can process. The BiLSTM layer captures the inter-temporal interactions in the cost data, and the attention layer captures the inter-feature interactions. The output layer contains a dropout layer and a dense layer. During training, the dropout layer randomly switches off neurons with probability p, setting their outputs to zero and thereby reducing the risk of overfitting; at test time, the weights are scaled by the retention probability to compensate for the random switch-off during training, which improves the generalisation and robustness of the model. In the dense layer, each neuron is connected to every neuron in the previous layer, each connection has a corresponding weight, and each feature of the input is passed through these connections to the dense layer, where it is weighted and summed in the neuron and a nonlinear mapping is applied to produce the neuron's output.

3.1. Capturing Inter-Temporal Interactions

An LSTM network is a variant structure of a recurrent neural network (RNN) that solves the problem of gradient vanishing that occurs in RNNs when handling long sequential data. The structure of an LSTM network is illustrated in Figure 2.
An LSTM unit contains three gate structures: the forget gate, the input gate, and the output gate. The forget gate determines which information $f_t$ to discard based on the hidden state $h_{t-1}$ from the previous moment and the input $x_t$ at the current moment. The input gate produces the information update value $i_t$ and the candidate cell state $\hat{c}_t$ from $h_{t-1}$ and $x_t$. Multiplying $f_t$ by the previous cell state $c_{t-1}$ and adding the cell update yields the new cell state $c_t$. The output gate computes $o_t$ from $h_{t-1}$ and $x_t$, and the current output $h_t$ is obtained from $c_t$ and $o_t$. The corresponding formulas are defined as follows:
$$f_t = \sigma\left(w_f h_{t-1} + w_f x_t + b_f\right)$$
$$i_t = \sigma\left(w_i h_{t-1} + w_i x_t + b_i\right)$$
$$\hat{c}_t = \tanh\left(w_c h_{t-1} + w_c x_t + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t$$
$$o_t = \sigma\left(w_o h_{t-1} + w_o x_t + b_o\right)$$
$$h_t = o_t \odot \tanh\left(c_t\right)$$
where $w_f$, $w_i$, $w_c$, and $w_o$ represent the weight matrices of the different gates; $b_f$, $b_i$, $b_c$, and $b_o$ represent the corresponding bias vectors; and $\sigma$ and $\tanh$ represent the activation functions.
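A minimal NumPy sketch of a single LSTM step following the gate equations above is given below; note that, as in most implementations, separate weight matrices are kept for the hidden-state and input paths, and all dimensions are illustrative.

```python
# One LSTM step: forget/input/output gates plus cell-state update.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    f_t = sigmoid(p["w_fh"] @ h_prev + p["w_fx"] @ x_t + p["b_f"])    # forget gate
    i_t = sigmoid(p["w_ih"] @ h_prev + p["w_ix"] @ x_t + p["b_i"])    # input gate
    c_hat = np.tanh(p["w_ch"] @ h_prev + p["w_cx"] @ x_t + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                                  # new cell state
    o_t = sigmoid(p["w_oh"] @ h_prev + p["w_ox"] @ x_t + p["b_o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                          # hidden state
    return h_t, c_t

# Illustrative dimensions: 12 input features (Section 6.1.1), 32 hidden units.
rng = np.random.default_rng(0)
d_in, d_h = 12, 32
p = {f"w_{g}h": rng.normal(0, 0.1, (d_h, d_h)) for g in "fico"}
p.update({f"w_{g}x": rng.normal(0, 0.1, (d_h, d_in)) for g in "fico"})
p.update({f"b_{g}": np.zeros(d_h) for g in "fico"})
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), p)
```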
In contrast to traditional LSTM, a BiLSTM network consists of two LSTM layers for the forward and backward directions and can fully account for both historical and future information. The structure of BiLSTM is illustrated in Figure 3.
The BiLSTM layer captures inter-temporal interactions by performing both forward and backward passes. During the forward pass, the data are processed step by step, starting from the first time step of the series; for each time step, the BiLSTM computes and updates the hidden state $h_t^f$ of the current time step based on the current input $x_t$ and the hidden state $h_{t-1}^f$ from the previous time step. In the backward pass, by contrast, the BiLSTM starts from the last time step and processes the series in reverse; for each time step, it computes and updates the hidden state $h_t^b$ based on the current input $x_t$ and the hidden state $h_{t+1}^b$ from the subsequent time step. Because past and future information in the cost data carry different degrees of importance, a more comprehensive representation $o_t$ is obtained by combining $h_t^f$ and $h_t^b$ through adaptively assigned weights, capturing the inter-temporal interactions at a deep level, as follows:
$$h_t^f = \mathrm{LSTM}\left(x_t, h_{t-1}^f\right)$$
$$h_t^b = \mathrm{LSTM}\left(x_t, h_{t+1}^b\right)$$
$$o_t = w^f h_t^f + w^b h_t^b + b$$
where $w^f$ and $w^b$ represent the weight matrices of the forward and backward LSTM layers, respectively, and $b$ represents the bias vector.
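As an illustration, the Keras sketch below builds this bidirectional combination from two ordinary LSTM layers: one reads the sequence forwards, one backwards, and a dense layer realises the weighted fusion $w^f h_t^f + w^b h_t^b + b$. Layer sizes are assumptions, not the paper's tuned values.

```python
# Bidirectional combination of two LSTM passes with a learned linear fusion.
import tensorflow as tf

T, d_in, d_h = 10, 12, 32                       # time steps, features, units (illustrative)
inputs = tf.keras.Input(shape=(T, d_in))
fwd = tf.keras.layers.LSTM(d_h, return_sequences=True)(inputs)              # h_t^f
bwd = tf.keras.layers.LSTM(d_h, return_sequences=True, go_backwards=True)(inputs)
bwd = tf.keras.layers.Lambda(lambda s: tf.reverse(s, axis=[1]))(bwd)        # re-align h_t^b to input order
o = tf.keras.layers.Dense(d_h)(tf.keras.layers.Concatenate()([fwd, bwd]))   # w^f h^f + w^b h^b + b
model = tf.keras.Model(inputs, o)
```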

3.2. Capturing Inter-Feature Interactions

The attention mechanism is a powerful tool in deep learning, and its design is inspired by the information processing mechanism of the human brain [27,28]. The attention mechanism makes the deep learning model pay more attention to the critical parts of the input data while ignoring the unimportant parts. The core idea is to assign different weights dynamically to different inputs so that the model can effectively learn and utilise important information related to the task and adapt more flexibly to the complex relationships in the data, thus improving the performance and generalisation of the model.
Given the complex spatial-temporal interactions between features of cost data, this article proposes a hybrid attention mechanism (HAM) to capture the inter-feature interactions, aiming to focus on both the temporal and feature dimensions of the data and to improve the ability of the model to capture multivariate time-series information. HAM contains two modules: channel attention and spatial attention. The structure is shown in Figure 4.
Here, the channel attention module aggregates information along the feature dimension through average pooling and maximum pooling to obtain a deep representation of the feature dimension, which a multilayer perceptron then maps into attention weights over the features; the sigmoid activation function is applied to the summed outputs to obtain weights $M_C$ between zero and one. Unlike channel attention, the spatial attention module targets the temporal dimension: the pooled outputs are concatenated and fed into a convolutional layer to extract features with high attention weights in the temporal dimension, the convolution operation emphasises the significance of the different time steps, and the sigmoid activation function is applied to obtain weights $M_S$ between zero and one. Connecting the channel attention and spatial attention modules in parallel lets the model consider the features of both the channel and spatial dimensions while avoiding their mutual interference, enabling a better understanding of multidimensional time series data. This hybrid design helps the model capture key information in the data more comprehensively, which in turn improves its performance and generalisation, as calculated in the following formulas:
$$M_c(F) = \sigma\left(\mathrm{MLP}\left(\mathrm{AvePool}(F)\right) + \mathrm{MLP}\left(\mathrm{MaxPool}(F)\right)\right)$$
$$M_s(F) = \sigma\left(\mathrm{Conv}\left(\mathrm{Conn}\left(\mathrm{AvePool}(F), \mathrm{MaxPool}(F)\right)\right)\right)$$
$$F' = M_c(F) \otimes F \otimes M_s(F)$$
where $\sigma$ represents the sigmoid function, $\mathrm{MLP}$ is a multilayer perceptron, $\mathrm{AvePool}$ and $\mathrm{MaxPool}$ denote average pooling and maximum pooling, $\mathrm{Conv}$ denotes convolution, $\mathrm{Conn}$ denotes concatenation, and $M_c$ and $M_s$ represent the outputs of channel attention and spatial attention, respectively.
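A hedged TensorFlow sketch of HAM following these formulas is shown below, patterned on CBAM-style channel and spatial attention applied to (time, feature) tensors; the MLP hidden size and convolution kernel width are assumptions, as the paper does not report them.

```python
# Hybrid attention: channel weights over features, spatial weights over time steps.
import tensorflow as tf
from tensorflow.keras import layers

def ham(F, mlp_hidden=8, kernel=3):
    """F: (batch, T, C). Returns F' = M_c(F) * F * M_s(F)."""
    C = F.shape[-1]
    # Channel attention: pool over time, share one MLP, sum, sigmoid -> (batch, 1, C).
    mlp = tf.keras.Sequential([layers.Dense(mlp_hidden, activation="relu"), layers.Dense(C)])
    avg_c = tf.reduce_mean(F, axis=1, keepdims=True)
    max_c = tf.reduce_max(F, axis=1, keepdims=True)
    m_c = tf.sigmoid(mlp(avg_c) + mlp(max_c))
    # Spatial (temporal) attention: pool over features, concatenate, convolve -> (batch, T, 1).
    avg_s = tf.reduce_mean(F, axis=-1, keepdims=True)
    max_s = tf.reduce_max(F, axis=-1, keepdims=True)
    m_s = tf.sigmoid(layers.Conv1D(1, kernel, padding="same")(tf.concat([avg_s, max_s], axis=-1)))
    return m_c * F * m_s   # the two weight maps are applied in parallel

x = tf.random.normal((4, 10, 12))   # batch of 4, 10 time steps, 12 features
y = ham(x)
```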

3.3. Loss Function Selection

During the training of the prediction model, the Adam optimisation algorithm was selected to update the model parameters based on the losses. Adam is a gradient descent-based optimisation algorithm that combines the advantages of momentum gradient descent and root-mean-square propagation (RMSProp) [29]. It can make model training more effective, converge to optimal solutions faster, and iteratively update the weights and biases of a neural network based on the training data to optimise the value of the loss function. The loss function of the model was the MSE, calculated as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}_i\right)^2$$
where $n$ denotes the number of samples, $y_i$ denotes the actual value, and $\bar{y}_i$ denotes the model output value.
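In a Keras-style implementation, this training setup reduces to compiling the model with the Adam optimiser and an MSE loss, as in the minimal sketch below; the learning rate is an assumption, since the paper does not report one.

```python
# Minimal sketch: Adam optimiser with MSE loss, as described in Section 3.3.
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])   # stand-in for the full model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
```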

4. Hyperparametric Optimisation Methods

In the formulation of the cost prediction task, the accuracy of the HAM-BiLSTM model not only depends on the comprehensiveness of feature extraction, but is also affected by the combination of hyperparameters used by the model. In this study, the improved particle swarm optimisation (IPSO) algorithm was used to search for optimal hyperparameter combinations automatically to avoid the tedium and instability of manual parameter tuning and further improve model performance.

4.1. PSO Algorithm

The PSO algorithm [30,31,32] is an optimisation algorithm based on swarm intelligence, inspired by the collective behaviour of bird flocks and fish schools. In the PSO algorithm, individuals are called particles; each particle represents a potential solution, and the particles find the optimal solution by adjusting their velocity and position. The inertia weight is an important parameter for balancing the global and local search abilities of the algorithm. The trajectory of a particle depends on its individual and social experiences, which are controlled by learning factors. The velocity and position update formulas for a particle are defined as follows:
$$V_{i,t+1} = w V_{i,t} + c_1 r_1 \left(pbest_i - X_{i,t}\right) + c_2 r_2 \left(gbest_t - X_{i,t}\right)$$
$$X_{i,t+1} = X_{i,t} + \lambda V_{i,t+1}$$
where $V_{i,t}$ denotes the velocity of particle $i$ after $t$ iterations, $w$ denotes the inertia weight, $c_1$ and $c_2$ denote the individual and social learning factors, $r_1$ and $r_2$ denote random numbers in $[0, 1]$, $X_{i,t}$ denotes the position of particle $i$ after $t$ iterations, and $\lambda$ denotes the velocity coefficient.
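The sketch below implements this canonical update loop in NumPy on a toy sphere objective; the swarm size, bounds, and coefficient values are illustrative choices.

```python
# Canonical PSO: velocity and position updates with personal/global bests.
import numpy as np

rng = np.random.default_rng(1)
n, d, T = 30, 5, 100                     # particles, dimensions, iterations
w, c1, c2, lam = 0.7, 2.0, 2.0, 1.0      # inertia, learning factors, velocity coefficient
X = rng.uniform(-5, 5, (n, d))
V = np.zeros((n, d))
fit = lambda x: np.sum(x**2, axis=-1)    # objective to minimise (sphere)

pbest, pbest_f = X.copy(), fit(X)
gbest = pbest[pbest_f.argmin()].copy()
for t in range(T):
    r1, r2 = rng.random((n, d)), rng.random((n, d))
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)   # velocity update
    X = X + lam * V                                              # position update
    f = fit(X)
    better = f < pbest_f
    pbest[better], pbest_f[better] = X[better], f[better]        # personal bests
    gbest = pbest[pbest_f.argmin()].copy()                       # global best
```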

4.2. Phase Adjustment of Inertia Weights and Learning Factors

In traditional PSO algorithms, the inertia weight and learning factors are typically set to fixed values, which limits the global optimisation capability and convergence speed. Reference [33] dynamically adjusted the inertia weight and learning factors according to changes in fitness, reference [34] dynamically adjusted the inertia weight according to the number of iterations, and reference [35] dynamically adjusted the learning factor according to the number of iterations. These schemes all yielded significant improvements in algorithm performance. However, considering the different states of particles in different periods, it is difficult to maximise PSO performance based on a single factor. Therefore, we combine the number of iterations with the fitness level and propose adjusting the inertia weight and learning factors in stages according to the following formulas:
$$w = w_{\min} + \left(w_{\max} - w_{\min}\right)\left(\frac{f_{i,t} - f_{best}}{f_{i,t}}\right)^m$$
$$c_1 = c_{1,\min} + \left(c_{1,\max} - c_{1,\min}\right)\left(\frac{f_{i,t} - f_{best}}{f_{i,t}}\right)^m$$
$$c_2 = c_{1,\min} + c_{1,\max} - c_1$$
$$m = \frac{t^2}{T^2}$$
where $w_{\max}$ and $w_{\min}$ are the maximum and minimum values of the inertia weight $w$, respectively; $f_{i,t}$ and $f_{best}$ are the fitness of particle $i$ in iteration $t$ and the fitness of the globally optimal particle, respectively; $c_{1,\max}$ and $c_{1,\min}$ denote the maximum and minimum values of the individual learning factor $c_1$, respectively; $c_2$ denotes the social learning factor; and $t$ and $T$ denote the current iteration and the maximum number of iterations, respectively. Additionally, $w \in [0.4, 0.9]$ and $c_1 \in [1, 4]$.
Stage regulation divides the iterative process of the algorithm into three phases.
  • Exploration phase: at the beginning of the iteration, when the value of m is small, the inertia weights and learning factors are relatively less affected by adaptation, which reduces the magnitude of changes in the weight and learning factors, thereby ensuring that the particles have larger inertia weights and stronger individual learning abilities, promoting global exploration and escape from local optima.
  • Equilibrium phase: in the middle of the iteration, when the value of m is moderate, the inertia weights and learning factors become smoother through adaptation, which improves the social learning ability of the particles and helps them better utilise collective experiences for refined searching and adjustment.
  • Convergence phase: in later iterations, when the value of m is large, the inertia weights and learning factors are more sensitive to the influence of adaptation, which enhances the social cognitive ability of the particles, accelerates the convergence of the algorithm and helps guide particles to converge to the globally optimal solution more quickly.
As the value of $m$ varies in the interval $(0, 1]$, it constrains the particles while adjusting their behaviour, enabling them to strike a balance between exploration and convergence and thereby optimising the performance of the algorithm.
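A small sketch of this staged adjustment is given below; it shows how $m = t^2/T^2$ damps the fitness ratio early in the run and lets it dominate later. The sample fitness values are illustrative.

```python
# Staged adjustment of inertia weight and learning factors via m = t^2 / T^2.
def staged_params(f_i, f_best, t, T,
                  w_min=0.4, w_max=0.9, c1_min=1.0, c1_max=4.0):
    m = (t / T) ** 2                                  # stage exponent in (0, 1]
    ratio = (f_i - f_best) / f_i                      # fitness gap of particle i
    w = w_min + (w_max - w_min) * ratio ** m
    c1 = c1_min + (c1_max - c1_min) * ratio ** m
    c2 = c1_min + c1_max - c1                         # complementary social factor
    return w, c1, c2

# Early iterations (small m): ratio**m is near 1, so w stays large (exploration).
# Late iterations (m near 1): w and c1 closely track the fitness gap (convergence).
print(staged_params(f_i=2.0, f_best=1.0, t=10, T=100))
print(staged_params(f_i=2.0, f_best=1.0, t=90, T=100))
```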

4.3. Negative Selection Evolutionary Strategy

In the iterative process of the algorithm, when the fitness values of the particles gradually become close to one another, the algorithm may be converging or stagnating. At this point, the particles are classified into advantaged and disadvantaged particles according to the average fitness value. In traditional PSO, the disadvantaged particles simply converge towards the advantaged particles as the iteration progresses, which prevents the algorithm from finding a better solution in the solution space.
Inspired by biological evolution, each particle is considered an individual and each dimension of a particle a gene segment. While retaining the superior particles in the population, a gene crossover operation breaks the gene structures of inferior individuals to promote their evolution. First, the ratio $C$ of the global optimal fitness to the average fitness of the particles is calculated using the following formulas:
$$C = \frac{f_{best}}{f_{ave}}$$
$$f_{ave} = \frac{1}{n}\sum_{i=1}^{n} f_{i,t}$$
where $f_{best}$, $f_{ave}$, and $f_{i,t}$ are the global optimal fitness, the average fitness of the particles, and the fitness of particle $i$ after $t$ iterations, respectively, and $n$ is the number of particles.
Second, a critical point $K$ is defined by observing the magnitude of the change in $C$. When $C$ is greater than $K$, inferior particles are selected as bi-parental samples (e.g., particles A and B). Then, $x$ gene segments at the same positions are randomly selected from the two parents and interchanged, where $x \in [1, n-1]$, producing new particles A1 and B1. This maintains the diversity of the population and prompts the algorithm to continue exploring the solution space. The negative selection evolutionary strategy is illustrated in Figure 5.
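The sketch below gives one plausible NumPy reading of this strategy for a minimisation problem: when $C = f_{best}/f_{ave}$ exceeds the critical point $K$, pairs of inferior particles exchange randomly chosen gene segments. The value of $K$ and the pairing scheme are assumptions.

```python
# Negative selection evolutionary strategy: crossover among inferior particles.
import numpy as np

rng = np.random.default_rng(2)

def negative_selection(X, fitness, K=0.9):
    """X: (n, d) particle positions; fitness: (n,) values (lower is better)."""
    f_ave, f_best = fitness.mean(), fitness.min()
    if f_best / f_ave <= K:                   # swarm still diverse: do nothing
        return X
    inferior = np.where(fitness > f_ave)[0]   # disadvantaged particles
    rng.shuffle(inferior)
    d = X.shape[1]
    for a, b in zip(inferior[::2], inferior[1::2]):        # bi-parental pairs
        genes = rng.choice(d, size=rng.integers(1, d), replace=False)
        X[a, genes], X[b, genes] = X[b, genes].copy(), X[a, genes].copy()
    return X
```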

4.4. Hyperparameter Optimisation Process

The hyperparameter optimisation process is shown in Figure 6.
The specific steps are as follows:
  • The batch size, number of LSTM layer cells, dropout probability, number of dense layer cells, and number of iterations in the HAM-BiLSTM model are selected as the hyperparameters to be optimised, and their respective search ranges are set.
  • We defined the population size N , maximum number of iterations T , inertia weight w search range, learning factor c 1 search range, and particle dimension d of the IPSO algorithm and initialised the particle velocity V and particle position X .
  • The MSE of the true and predicted values was used as the fitness function.
  • The performance was evaluated by training the HAM-BiLSTM model, calculating the fitness of each particle, and recording the average fitness f a v e and global optimal fitness f b e s t of the current particle population.
  • We determined whether $f_{best}$ was less than the predefined threshold $g_{best}$ or whether the current iteration count $t$ had reached $T$. If so, we output the optimal solution and stopped; otherwise, we continued with the following steps.
  • The individual and population extremes were updated based on particle fitness.
  • We updated w , c 1 , and c 2 according to Equations (18)–(21).
  • We determined whether the ratio C of f b e s t to f a v e was greater than the critical value K according to Equations (22) and (23). If so, the negative selection evolution strategy was executed.
  • The V and X values of the current particle were updated according to Equations (7) and (8) before returning to step four.
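A skeleton of the first three steps is sketched below: each particle encodes the five hyperparameters, and its fitness is the validation MSE of a freshly trained model. The helper build_and_train_model and the data dictionary are hypothetical stand-ins for the paper's training pipeline, not code from the paper.

```python
# Skeleton: hyperparameter encoding, search bounds, and fitness for IPSO.
import numpy as np

BOUNDS = np.array([[1, 30],      # batch size
                   [1, 100],     # LSTM layer cells
                   [0.1, 0.5],   # dropout probability
                   [1, 100],     # dense layer cells
                   [200, 200]])  # epochs (fixed, per Table 5)

def build_and_train_model(batch, lstm_units, dropout, dense_units, epochs, data):
    """Hypothetical stand-in: train a HAM-BiLSTM and return the fitted model."""
    raise NotImplementedError  # plug in the real training pipeline here

def fitness(position, data):
    """Fitness of one particle = validation MSE of the model it encodes."""
    batch, lstm_u, drop, dense_u, epochs = position
    model = build_and_train_model(int(batch), int(lstm_u), float(drop),
                                  int(dense_u), int(epochs), data)
    return model.evaluate(data["X_val"], data["y_val"], verbose=0)

def clip(X):
    """Keep particle positions inside the Table 5 search ranges."""
    return np.clip(X, BOUNDS[:, 0], BOUNDS[:, 1])
```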

5. Forecasting

The prediction process is shown in Figure 7.
The model prediction steps are as follows.
  • Data pre-processing: the sample data were divided into training and testing sets and normalised.
  • Model training: the model was trained using the training set, and the IPSO algorithm was utilised to determine the optimal combination of hyperparameters for the model.
  • Model prediction: the trained model performed predictions on the testing set by employing an optimal combination of hyperparameters.
  • Output results: the predictions of the model were inversely normalised to the original data range and outputted.
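A condensed sketch of this workflow is shown below, assuming an 8:2 chronological split and scikit-learn's MinMaxScaler for the max-min normalisation (the paper does not name a library); the tuned HAM-BiLSTM model itself is elided.

```python
# Data split, normalisation, and inverse-normalised prediction (sketch).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = np.random.rand(156, 12), np.random.rand(156, 1)     # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()         # max-min normalisation
X_tr_s, X_te_s = x_scaler.fit_transform(X_tr), x_scaler.transform(X_te)
y_tr_s = y_scaler.fit_transform(y_tr)

# model = ...  train the HAM-BiLSTM on (X_tr_s, y_tr_s) with IPSO-tuned hyperparameters
# y_pred = y_scaler.inverse_transform(model.predict(X_te_s))  # back to cost units
```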

6. Experimental Results and Analysis

6.1. Data Preprocessing

This study is based on residential cost data provided by the Guanglianda Indicator Network and material price data provided by the Guangcai Network. Sample balance must be considered during collection: different areas have different land costs and code standards, high-rise buildings require more complex and robust structural design and engineering than mid-rise buildings, and different forms of delivery represent different levels of finishing, all of which affect the balance of the sample. Therefore, we restricted the region to Jiangsu Province, the number of building floors to mid-rise residential buildings (18 floors and below), and the form of residential delivery to simple. We initially screened 167 records from 2019 to 2023 and, after excluding invalid samples with missing data, ultimately obtained 156 valid samples.

6.1.1. Relevance Analysis

The correlation values of the candidate indicators are listed in Table 1, and the candidate indicators with correlation values greater than 0.9 are designated as feature indicators. The foundation type ($x_1$), structure type ($x_2$), fortification intensity ($x_3$), management level ($x_4$), aboveground floor area ($x_5$), number of aboveground floors ($x_6$), number of underground floors ($x_7$), aboveground floor height ($x_8$), underground floor height ($x_9$), concrete prices ($x_{10}$), reinforcing steel prices ($x_{11}$), and duration ($x_{12}$) were selected as the model inputs, and the unilateral cost ($y$) was defined as the model output.

6.1.2. Data Quantification

For qualitative indicators to be used as inputs for the model, they must be quantified. These indicators include foundation type, structure type, fortification intensity, and management level. The detailed quantification process is defined in Table 2.

6.1.3. Data Normalisation

The sample data were divided at a ratio of 8:2 into training and test sets. The model inputs were the 12 feature indicators selected above, and the output was the unilateral cost. Given the differences in magnitude between feature indicators, and to eliminate their impact on model training, shorten the training time, and accelerate convergence, we adopted the max-min method to normalise all sample data, mapping the original data to the interval $[0, 1]$ so that all feature indicators share the same scale range. This process is formulated as follows:
$$y_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \quad i = 1, 2, \ldots, n$$
where $x_{\min}$ and $x_{\max}$ denote the minimum and maximum values of indicator $x$ in the data, respectively.
Some of the data obtained after normalising the sample data are presented in Table 3.

6.2. IPSO Algorithm Performance Test

To verify the superiority of the IPSO algorithm, the PSO and IPSO algorithms were executed on eight benchmark test functions. For both algorithms, the population size was set to 50, the number of iterations to 100, the dimensionality to 30, and the number of independent runs to 20. As shown in Table 4, the minima, maxima, averages, and standard deviations obtained by the IPSO algorithm were better than those obtained by the PSO algorithm and closer to the global optimum. Figure 8 compares the convergence curves of the PSO and IPSO algorithms: the IPSO algorithm performs better in terms of convergence accuracy and gathers the particles more stably in the vicinity of the global optimum, thereby improving the ability of the PSO algorithm to escape local extrema.

6.3. Hyperparametric Optimisation Results

The hyperparameters to be optimised in the model are the batch size $N_{batch}$, the number of LSTM layer cells $N_{LSTM}$, the dropout probability $N_{Dropout}$, the number of dense layer cells $N_{Dense}$, and the number of iterations $N_{epoch}$. The hyperparameter search ranges were set as shown in Table 5.
The results of IPSO and traditional PSO optimisation searches were compared. The number of iterations of the optimisation algorithm was set to 50, and the population was set to 30. The variation of fitness values during the iteration process of the PSO and IPSO algorithms is shown in Figure 9.
One can see that the IPSO algorithm has a better ability to find the optimum compared to the PSO algorithm. The optimisation results are summarised in Table 6.

6.4. Evaluation Indicators

In this study, five evaluation indices were used to evaluate the model: the mean absolute error (MAE), root-mean-square error (RMSE), mean percentage error (MPE), mean absolute percentage error (MAPE), and coefficient of determination (R²). The smaller the values of MAE, RMSE, and MAPE, the better the performance of the model; the larger the value of R², the better the model's fitting ability. In assessing the feasibility of a project, the consequences of underestimating costs are more serious than those of overestimating them, so the MPE was used to measure the directionality of the forecast error. The specific formulas are as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}$$
$$\mathrm{MPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\hat{y}_i - y_i}{y_i}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, and $n$ is the number of data points.
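The five indices follow directly from these definitions; the sketch below computes them in NumPy on illustrative values.

```python
# The five evaluation indices computed from their definitions above.
import numpy as np

def metrics(y_true, y_pred):
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err**2))
    mpe = 100 * np.mean((y_pred - y_true) / y_true)   # sign indicates over/under-estimation
    mape = 100 * np.mean(np.abs(err) / y_true)
    r2 = 1 - np.sum(err**2) / np.sum((y_true - y_true.mean())**2)
    return mae, rmse, mpe, mape, r2

y_true = np.array([100.0, 120.0, 90.0])               # illustrative values
y_pred = np.array([101.0, 118.0, 92.0])
print(metrics(y_true, y_pred))
```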

6.5. Comparative Experiments and Analyses

Figure 10 compares the predicted and real value curves of the proposed model on the test set for the unilateral cost of construction projects. The predicted and real value curves fit well together, which shows that the model has good prediction performance for the unilateral cost.
To verify the superiority of BiLSTM in capturing the interactions within the data, BiLSTM was compared with the traditional benchmark models BP, SVM, and LSTM. The prediction results and errors are shown in Figure 11. The prediction results demonstrate that BiLSTM fits the true values best, and the error curves show that the overall error variation of BiLSTM is relatively smooth.
Table 7 shows the results of the comparison of each single model on the five evaluation indicators. For the construction cost data, the average errors of LSTM and BiLSTM with time series capturing capability are relatively small, and the MAE, RMSE, and MAPE of BiLSTM are reduced by 2.536, 3.107, and 0.142%, respectively, compared with those of LSTM, which verifies the superiority of the BiLSTM bi-directional processing mechanism in capturing the inter-temporal interactions of the data.
To validate the effectiveness of the model components further, the model was compared with BiLSTM, AM-BiLSTM, HAM-BiLSTM, and PSO-HAM-BiLSTM. The prediction results and error comparisons are shown in Figure 12.
As can be seen from the figure, although the prediction results of each model follow trends similar to the true value curve, the error curves differ considerably. Comparing the error curves of HAM-BiLSTM and AM-BiLSTM reveals that the number of errors greater than 20 decreases from 13 to 8, reflecting that HAM improves the model's ability to capture inter-feature interactions relative to the traditional AM. The HAM-BiLSTM with hyperparameters optimised by IPSO shows a smoother error curve and a smaller range of error fluctuations than the HAM-BiLSTM optimised by the conventional PSO, reflecting that IPSO can further improve the accuracy of the model.
Table 8 shows the comparison results of the models on the five evaluation metrics. The mean percentage error of all five models is positive, avoiding the serious consequences of underestimating the cost, and all the other error metrics of HAM-BiLSTM with IPSO-optimised hyperparameters are significantly reduced. In particular, compared with BiLSTM and AM-BiLSTM, respectively, the MAE of HAM-BiLSTM is reduced by 6.711 and 4.252, the RMSE by 7.757 and 3.670, and the MAPE by 0.325 and 0.237 percentage points, further demonstrating the effectiveness of HAM in capturing the interactions between features and enhancing the feature extraction capability of the model. Compared with HAM-BiLSTM and PSO-HAM-BiLSTM, respectively, the MAE of the IPSO-optimised HAM-BiLSTM is reduced by 8.560 and 6.236, the RMSE by 10.436 and 7.658, and the MAPE by 0.463 and 0.324 percentage points, indicating that the improved particle swarm optimisation algorithm has stronger search capability, better overcomes the problem of falling into local optima, and thus searches the hyperparameter space more comprehensively to find better hyperparameter combinations.

7. Conclusions

In this study, a new construction cost prediction method was developed to address the strong interactions among construction cost data. Building on the BiLSTM network from the field of deep learning, the temporal interactions in the data are captured at a deep level, and the HAM module is added to the network to enhance its ability to capture the interactions among features, yielding an effective construction project cost prediction model. In addition, to select the feature indicators with the greatest impact on cost as model inputs, GRA was used to assess the importance of the cost-influencing factors, and the most valuable feature indicators were screened according to the principle of moderation. To avoid the tedium and uncertainty of manual parameter tuning, an improved particle swarm optimisation algorithm was designed to find the optimal hyperparameter combination for the model automatically. Comparative experiments and analyses verified the performance of the method in construction cost prediction.
Even though the results obtained with the improved model are very positive compared with the unimproved BiLSTM network, its limitations must be recognised. First, because HAM contains two attention modules, its powerful feature extraction capability comes at the cost of additional computational resources. Second, IPSO is devoted to improving the algorithm's ability to find the globally optimal solution; it does not reduce the number of iterations or the running time relative to the original algorithm.
In future research, we will work to overcome these limitations, focusing on improved methods to simplify the model structure for feature extraction and on hyperparameter optimisation methods that balance accuracy and speed. In addition, the construction cost prediction problem requires not only an accurate prediction model; the selection of characteristic indicators is equally important. We will analyse the factors influencing cost in greater depth to contribute more value to the development of this field.

Author Contributions

Conceptualisation, C.W.; methodology, J.Q.; validation, J.Q.; investigation, J.Q.; data curation, J.Q.; writing—original draft preparation, J.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62072363) and the Natural Science Foundation of Shaanxi Province (No. 2019JM-167).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available at (www.gldzb.com, accessed on 18 January 2024) and (www.gldjc.com, accessed on 18 January 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Choi, C.Y.; Ryu, K.R.; Shahandashti, M. Predicting City-Level Construction Cost Index Using Linear Forecasting Models. J. Constr. Eng. Manag. 2021, 147, 04020158. [Google Scholar] [CrossRef]
  2. Kim, S.; Choi, C.Y.; Shahandashti, M.; Ryu, K.R. Improving Accuracy in Predicting City-Level Construction Cost Indices by Combining Linear Arima and Nonlinear ANNs. J. Manag. Eng. 2022, 38, 04021093. [Google Scholar] [CrossRef]
  3. Petruseva, S.; Zileska-Pancovska, V.; Žujo, V.; Brkan-Vejzović, A. Construction Costs Forecasting: Comparison of the Accuracy of Linear Regression and Support Vector Machine Models. Tech. Gaz. 2017, 24, 1431–1438. [Google Scholar]
  4. Ali, Z.H.; Burhan, A.M.; Kassim, M.; Al-Khafaji, Z. Developing an Integrative Data Intelligence Model for Construction Cost Estimation. Complexity 2022, 2022, 4285328. [Google Scholar] [CrossRef]
  5. Li, L. Dynamic Cost Estimation of Reconstruction Project Based on Particle Swarm Optimization Algorithm. Informatica 2023, 47, 173–182. [Google Scholar] [CrossRef]
  6. Wang, X. Forecasting Construction Project Cost Based on BP Neural Network. In Proceedings of the 10th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Changsha, China, 10–11 February 2018; IEEE Publications: New York, NY, USA, 2018; Volume 2018, pp. 420–423. [Google Scholar] [CrossRef]
  7. Ye, D. An Algorithm for Construction Project Cost Forecast Based on Particle Swarm Optimization-guided BP Neural Network. Sci. Program. 2021, 2021, 4309495. [Google Scholar] [CrossRef]
  8. Wang, B.; Dai, J. Discussion on the Prediction of Engineering Cost Based on Improved BP Neural Network Algorithm. J. Intell. Fuzzy Syst. 2019, 37, 6091–6098. [Google Scholar] [CrossRef]
  9. Dong, J.; Chen, Y.; Guan, G. Cost Index Predictions for Construction Engineering Based on LSTM Neural Networks. Adv. Civ. Eng. 2020, 2020, 6518147. [Google Scholar] [CrossRef]
  10. Cao, Y.; Ashuri, B. Predicting the Volatility of Highway Construction Cost Index Using Long Short-Term Memory. J. Manag. Eng. 2020, 36, 04020020. [Google Scholar] [CrossRef]
  11. Joseph, L.P.; Deo, R.C.; Prasad, R.; Salcedo-Sanz, S.; Raj, N.; Soar, J. Near Real-Time Wind Speed Forecast Model with Bidirectional LSTM Networks. Renew. Energy 2023, 204, 39–58. [Google Scholar] [CrossRef]
  12. Li, X.; Pan, Y.; Zhang, L.; Chen, J. Dynamic and Explainable Deep Learning-Based Risk Prediction on Adjacent Buildings Induced by Deep Excavation. Tunn. Undergr. Space Technol. 2023, 140, 105243. [Google Scholar] [CrossRef]
  13. Atef, S.; Nakata, K.; Eltawil, A.B. A Deep Bi-directional Long-Short Term Memory Neural Network-Based Methodology to Enhance Short-Term Electricity Load Forecasting for Residential Applications. Comput. Ind. Eng. 2022, 170, 108364. [Google Scholar] [CrossRef]
  14. Niu, D.; Sun, L.; Yu, M.; Wang, K. Point and Interval Forecasting of Ultra-short-Term Wind Power Based on a Data-Driven Method and Hybrid Deep Learning Model. Energy 2022, 254, 124384. [Google Scholar] [CrossRef]
  15. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The Performance of LSTM and BiLSTM in Forecasting Time Series. In Proceedings of the IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE Publications: New York, NY, USA, 2019; Volume 2019, pp. 3285–3292. [Google Scholar] [CrossRef]
  16. Ribeiro, M.H.D.M.; dos Santos Coelho, L. Ensemble Approach Based on Bagging, Boosting and Stacking for Short-Term Prediction in Agribusiness Time Series. Appl. Soft Comput. 2020, 86, 105837. [Google Scholar] [CrossRef]
  17. Zhang, Q.; Wang, R.; Qi, Y.; Wen, F. A Watershed Water Quality Prediction Model Based on Attention Mechanism and Bi-LSTM. Environ. Sci. Pollut. Res. Int. 2022, 29, 75664–75680. [Google Scholar] [CrossRef]
  18. Vaziri, J.; Farid, D.; Nazemi Ardakani, M.; Hosseini Bamakan, S.M.; Shahlaei, M. A Time-Varying Stock Portfolio Selection Model Based on Optimised PSO-BiLSTM and Multi-objective Mathematical Programming Under Budget Constraints. Neural Comput. Appl. 2023, 35, 18445–18470. [Google Scholar] [CrossRef]
  19. Hatamleh, M.T.; Hiyassat, M.; Sweis, G.J.; Sweis, R.J. Factors Affecting the Accuracy of Cost Estimate: Case of Jordan. Eng. Constr. Archit. Manag. 2018, 25, 113–131. [Google Scholar] [CrossRef]
  20. Gai, R.; Guo, Z. A Water Quality Assessment Method Based on an Improved Grey Relational Analysis and Particle Swarm Optimisation Multi-classification Support Vector Machine. Front. Plant Sci. 2023, 14, 1099668. [Google Scholar] [CrossRef] [PubMed]
  21. Ahn, J.; Ji, S.H.; Ahn, S.J.; Park, M.; Lee, H.; Kwon, N.; Lee, E.; Kim, Y. Performance Evaluation of Normalization-Based CBR Models for Improving Construction Cost Estimation. Autom. Constr. 2020, 119, 103329. [Google Scholar] [CrossRef]
  22. Dursun, O.; Stoy, C. Conceptual Estimation of Construction Costs Using the Multistep Ahead Approach. J. Constr. Eng. Manag. 2016, 142, 04016038. [Google Scholar] [CrossRef]
  23. Xiao, X.; Skitmore, M.; Yao, W.; Ali, Y. Improving Robustness of Case-Based Reasoning for Early-Stage Construction Cost Estimation. Autom. Constr. 2023, 151, 104777. [Google Scholar] [CrossRef]
  24. Ahn, J.; Park, M.; Lee, H.S.; Ahn, S.J.; Ji, S.; Song, K.; Son, B. Covariance Effect Analysis of Similarity Measurement Methods for Early Construction Cost Estimation Using Case-Based Reasoning. Autom. Constr. 2017, 81, 254–266. [Google Scholar] [CrossRef]
  25. Ji, S.H.; Ahn, J.; Lee, H.S.; Han, K. Cost Estimation Model Using Modified Parameters for Construction Projects. Adv. Civ. Eng. 2019, 2019, 8290935. [Google Scholar] [CrossRef]
  26. Hu, W.; Chang, Y.; He, X. Influencing Factors and Prediction Model of Construction Project Duration. Civ. Eng. 2018, 51, 103–112. [Google Scholar]
  27. Brauwers, G.; Frasincar, F. A General Survey on Attention Mechanisms in Deep Learning. IEEE Trans. Knowl. Data Eng. 2021, 35, 3279–3298. [Google Scholar] [CrossRef]
  28. Niu, Z.; Zhong, G.; Yu, H. A Review on the Attention Mechanism of Deep Learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  29. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  30. Jain, M.; Saihjpal, V.; Singh, N.; Singh, S.B. An Overview of Variants and Advancements of PSO Algorithm. Appl. Sci. 2022, 12, 8392. [Google Scholar] [CrossRef]
  31. Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle Swarm Optimization: A Comprehensive Survey. IEEE Access 2022, 10, 10031–10061. [Google Scholar] [CrossRef]
  32. Bonyadi, M.R. A Theoretical Guideline for Designing an Effective Adaptive Particle Swarm. IEEE Trans. Evol. Comput. 2019, 24, 57–68. [Google Scholar] [CrossRef]
  33. Zhao, G.; Jiang, D.; Liu, X.; Tong, X.; Sun, Y.; Tao, B.; Kong, J.; Yun, J.; Liu, Y.; Fang, Z. A Tandem Robotic Arm Inverse Kinematic Solution Based on an Improved Particle Swarm Algorithm. Front. Bioeng. Biotechnol. 2022, 10, 832829. [Google Scholar] [CrossRef]
  34. Wang, H.; Peng, M.J.; Wesley Hines, J.W.; Zheng, G.Y.; Liu, Y.K.; Upadhyaya, B.R. A Hybrid Fault Diagnosis Methodology with Support Vector Machine and Improved Particle Swarm Optimization for Nuclear Power Plants. ISA Trans. 2019, 95, 358–371. [Google Scholar] [CrossRef]
  35. Yu, H. Evaluation of Cloud Computing Resource Scheduling Based on Improved Optimization Algorithm. Complex Intell. Syst. 2021, 7, 1817–1822. [Google Scholar] [CrossRef]
Figure 1. Forecasting model structure.
Figure 2. LSTM network structure.
Figure 3. BiLSTM network structure.
Figure 4. Structure of the HAM.
Figure 5. Negative selection evolutionary strategy.
Figure 6. Hyperparameter optimisation process.
Figure 7. Model prediction process.
Figure 8. Comparison of convergence curves for eight test functions.
Figure 9. Hyperparametric optimisation convergence curves.
Figure 10. Comparison of real and predicted values.
Figure 11. Prediction results and errors for single-model comparisons.
Figure 12. Prediction results and errors for combined-model comparisons.
Table 1. Candidate indicator correlation values.

| Indicator Name | Relatedness | Indicator Name | Relatedness |
|---|---|---|---|
| Foundation type | 0.9852 | Aboveground floor height | 0.9732 |
| Structure type | 0.9897 | Underground floor height | 0.9436 |
| Fortification intensity | 0.9603 | Interior wall decoration | 0.8763 |
| Management level | 0.9718 | Types of doors and windows | 0.8525 |
| Facade material | 0.8956 | Number of elevators | 0.8643 |
| Aboveground floor area | 0.9496 | Roof type | 0.8726 |
| Underground floor area | 0.8831 | Concrete prices | 0.9558 |
| Number of aboveground floors | 0.9764 | Reinforcing steel prices | 0.9623 |
| Number of underground floors | 0.9668 | Duration | 0.9311 |
Table 2. Quantitative table of qualitative indicators.

| Foundation Type | Value | Structure Type | Value | Fortification Intensity | Value | Management Level | Value |
|---|---|---|---|---|---|---|---|
| Raft slab foundation | 1 | Shear wall structure | 1 | 6 degrees | 1 | Excellent | 1 |
| Pile foundation | 2 | Framework structure | 2 | 7 degrees | 2 | Good | 2 |
| Mantenna foundation | 3 | Frame shear construction | 3 | 8 degrees | 3 | Bad | 3 |
| | | | | 9 degrees | 4 | | |
Table 3. Normalised data.

| Number | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 0.00 | 1.00 | 1.00 | 0.54 | 0.91 | 1.00 | 0.00 | 0.78 | 0.34 | 0.47 | 0.98 | 0.83 |
| 2 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 0.86 | 1.00 | 0.00 | 0.69 | 0.34 | 0.47 | 0.37 | 1.00 |
| 3 | 1.00 | 0.50 | 0.50 | 1.00 | 0.20 | 0.96 | 0.67 | 0.00 | 0.89 | 0.36 | 0.47 | 0.22 | 0.63 |
| 4 | 0.00 | 0.00 | 1.00 | 0.50 | 0.18 | 0.91 | 0.67 | 1.00 | 0.78 | 0.36 | 0.47 | 0.61 | 0.18 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 155 | 1.00 | 0.50 | 0.50 | 0.50 | 0.02 | 1.00 | 0.18 | 0.33 | 0.52 | 0.94 | 0.36 | 0.27 | 0.23 |
| 156 | 1.00 | 1.00 | 0.50 | 0.00 | 0.06 | 1.00 | 0.27 | 0.33 | 0.71 | 0.04 | 0.71 | 0.86 | 0.59 |
Table 4. Comparison of PSO and IPSO algorithm test results.

| Function | Algorithm | Min | Max | Average | STD |
|---|---|---|---|---|---|
| Sphere | PSO | 2.23 × 10⁻¹ | 1.12 | 5.53 × 10⁻¹ | 2.24 × 10⁻¹ |
| | IPSO | 3.62 × 10⁻³⁰ | 3.32 × 10⁻²⁶ | 1.46 × 10⁻²⁶ | 1.70 × 10⁻²⁷ |
| Schwefel | PSO | 1.22 | 4.05 | 1.97 | 6.30 × 10⁻¹ |
| | IPSO | 1.28 × 10⁻¹⁵ | 3.61 × 10⁻¹³ | 3.64 × 10⁻¹⁴ | 9.27 × 10⁻¹³ |
| Sum squares | PSO | 5.67 | 2.20 × 10¹ | 1.16 × 10¹ | 5.06 |
| | IPSO | 6.48 × 10⁻³⁰ | 5.78 × 10⁻²⁶ | 2.89 × 10⁻²⁶ | 3.97 × 10⁻²⁷ |
| Rosenbrock | PSO | 6.18 × 10¹ | 2.68 × 10² | 1.26 × 10² | 5.16 × 10¹ |
| | IPSO | 2.87 × 10¹ | 2.90 × 10¹ | 2.89 × 10¹ | 3.76 × 10⁻² |
| Rastrigin | PSO | 4.73 × 10¹ | 1.01 × 10² | 7.36 × 10¹ | 1.45 × 10¹ |
| | IPSO | 1.82 × 10⁻¹ | 1.35 × 10¹ | 6.46 | 3.79 |
| Ackley | PSO | 7.91 × 10⁻¹ | 3.18 | 2.28 | 4.75 × 10⁻¹ |
| | IPSO | 8.88 × 10⁻¹⁶ | 8.88 × 10⁻¹⁶ | 8.88 × 10⁻¹⁶ | 0 |
| Griewank | PSO | 2.43 × 10⁻² | 1.04 × 10⁻¹ | 4.86 × 10⁻² | 1.68 × 10⁻² |
| | IPSO | 2.84 × 10⁻⁴ | 7.60 × 10⁻³ | 8.78 × 10⁻⁴ | 1.60 × 10⁻³ |
| Penalised | PSO | 1.71 × 10⁻² | 2.45 × 10⁻¹ | 9.51 × 10⁻² | 6.30 × 10⁻² |
| | IPSO | 2.79 × 10⁻⁵ | 2.62 × 10⁻⁴ | 1.53 × 10⁻⁴ | 5.61 × 10⁻⁵ |
Table 5. Hyperparameter search ranges.

| Hyperparameter | Search Range |
|---|---|
| $N_{batch}$ | 1–30 |
| $N_{LSTM}$ | 1–100 |
| $N_{Dropout}$ | 0.1–0.5 |
| $N_{Dense}$ | 1–100 |
| $N_{epoch}$ | 200 |
Table 6. Optimisation results.

| Algorithm | $N_{batch}$ | $N_{LSTM}$ | $N_{Dropout}$ | $N_{Dense}$ | $N_{epoch}$ |
|---|---|---|---|---|---|
| PSO | 5 | 26 | 0.2 | 22 | 200 |
| IPSO | 3 | 32 | 0.1 | 18 | 200 |
Table 7. Comparison of single-model performance evaluation.

| Model | MAE | RMSE | MPE/% | MAPE/% | R² |
|---|---|---|---|---|---|
| BP | 33.980 | 39.174 | −0.124 | 1.787 | 0.921 |
| SVM | 29.254 | 36.068 | −0.154 | 1.517 | 0.933 |
| LSTM | 25.294 | 30.236 | 0.472 | 1.329 | 0.953 |
| BiLSTM | 22.758 | 27.129 | 0.054 | 1.187 | 0.962 |
Table 8. Comparison of performance evaluation of combined models.

| Model | MAE | RMSE | MPE/% | MAPE/% | R² |
|---|---|---|---|---|---|
| BiLSTM | 22.758 | 27.129 | 0.054 | 1.181 | 0.962 |
| AM-BiLSTM | 20.299 | 23.042 | 0.838 | 1.093 | 0.973 |
| HAM-BiLSTM | 16.047 | 19.372 | 0.647 | 0.856 | 0.981 |
| PSO-HAM-BiLSTM | 13.723 | 16.594 | 0.161 | 0.717 | 0.986 |
| IPSO-HAM-BiLSTM | 7.487 | 8.936 | 0.236 | 0.393 | 0.996 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
