1. Introduction
According to the Global Alliance for Buildings and Construction (GABC) Global Status Report, buildings account for approximately 40% of global energy use and 38% of global greenhouse gas (GHG) emissions [1]. Commercial buildings play a crucial role as the primary contributors to energy consumption and GHG emissions in the sector. In line with this, the World Green Building Council (World GBC) has emphasized the urgency of adopting net-zero carbon buildings as standard practice in the commercial sector, with a target starting in 2030 [2]. Accurately predicting building energy consumption and developing building energy efficiency strategies (e.g., optimization of air conditioning and lighting systems) is among the most effective ways to reduce GHG emissions from buildings. It is therefore of great practical importance to study the mechanisms and patterns of building energy consumption and to develop accurate and effective building energy prediction models that can assess average annual energy consumption at the early stage of building design.
Current building energy consumption prediction methods fall into three main categories: physical modeling (white-box models), data-driven modeling (black-box models), and hybrid modeling (gray-box models) [3,4]. Physical modeling methods use thermodynamic principles for energy consumption modeling and analysis, e.g., thermodynamic models [5] and computational fluid dynamics (CFD) models [6]. Accurate simulations usually require very detailed building information as input, such as individual spatial characteristics and the thermal properties of building materials, which are often difficult to obtain [7]. In addition, physical modeling is often applied to individual buildings or small subsets of buildings and is not well suited to depicting the spatial variability of energy use in neighborhoods, blocks, or broader areas [8]. Compared with physical modeling approaches, data-driven approaches do not require extensive expertise in parametric mechanisms and internal building components; they use historical data for energy consumption prediction and deliver more accurate and faster predictions [9]. Hybrid approaches combine physical and data-driven methods, using the outputs of physical models as inputs to data-driven models [10]. These models aim to offset some of the limitations of physical modeling through the flexibility of statistical methods [11]. However, hybrid modeling requires more computational resources and incurs high runtimes and computational costs, as two models must be run simultaneously. Moreover, because two different types of models are involved, the predicted results may be more difficult to interpret and understand.
Energy consumption in buildings is influenced by a variety of factors, such as building type, climatic conditions, building structure, heating system, and human activities. Predicting building energy consumption is therefore a complex and variable problem, and machine learning models hold advantages over traditional statistical models for this task. For example, Arjunan [12] compared the performance of multiple linear regression (MLR), multiple linear regression with feature interactions (MLRi), and gradient boosted trees (GBT) and showed that GBT outperformed the two linear regression models. Chen et al. [13] developed a neural network model specifically for high-rise buildings, showcasing its accuracy, computational speed, and flexibility compared with conventional building simulation methods. This model enables designers to predict building performance during the early stages of design, thereby improving energy efficiency and occupant comfort. Fu et al. [14] applied a clustering method to study the factors influencing energy consumption in a school building, and the results showed that occupancy rate was the main influencing factor.
In addition to the above machine learning models, many scholars have begun to explore the effectiveness of deep learning for building energy consumption prediction and have achieved promising results. For example, Razak's study [15] compared the performance of eight machine learning models and one deep learning model in predicting the annual energy consumption of residential buildings. The results showed that the deep neural network (DNN) outperformed the other machine learning models with an R2 of 0.95 and an RMSE of 1.19, which motivates building designers to use it to make informed decisions and to manage and optimize their designs prior to construction. In Moisés' study [3], the hourly electricity consumption of a single-family home was predicted using a variety of machine learning and deep learning models, and the results showed that LSTM had the best prediction performance (nRMSE of 4.74%). Beyond residential energy consumption, LSTM also shows excellent results for other building types. For example, the LSTM models developed in the studies of Kim [16] and Li [17] achieved better performance in predicting the energy consumption of hospital buildings. The study of Dinh [18] also demonstrated the effectiveness of LSTM in predicting the energy consumption of a commercial building.
Although numerous studies [19,20,21] have shown that LSTM performs well in predicting building energy consumption, it is limited to short-term prediction (energy consumption at a daily or hourly time granularity) [18]. Moreover, most previous studies have been conducted on a specific building or a small number of buildings in one region, which provides limited guidance on energy efficiency for other building types and scales poorly. More generalized predictive models are lacking and, with the increased availability of data on actual energy consumption, there is now ample opportunity to apply advanced techniques to large datasets in order to build such models [22].
The Commercial Building Energy Consumption Survey (CBECS) [23] is the most comprehensive publicly available dataset on commercial building energy use in the United States, originally developed to make statistical inferences about the national commercial building population [24]. The CBECS is also attractive to building energy modelers due to its ample amount of data, the representativeness of its sample, and its broad geographic coverage [22]. Deng et al. [22] predicted the total EUI, heating EUI, cooling EUI, and plug load EUI of office buildings using the CBECS dataset, and the results showed that the RF model, with an RMSE of 28.3, was superior to other machine learning models. Norouziasl et al. proposed a data-driven framework for predicting the energy consumption of lighting in office buildings; the results showed that the Support Vector Machine (SVM) algorithm provided the best prediction performance, with an R-squared value of 0.78. Robinson et al. [11] used the CBECS dataset to develop separate prediction models for 18 building types, including office buildings, hotels, and educational buildings, using machine learning models such as linear regression, random forest, and gradient boosting. The results showed that Extreme Gradient Boosting (XGBoost) achieved higher prediction performance (R2 of 0.82) than the other machine learning models. However, this approach is time-consuming and has low scalability, as each building type has to be modeled individually. Kumar et al. [25] developed a Random Forest prediction model for whole-building fuel consumption in commercial buildings, which offers better prediction and scalability, with training and validation delays of only 0.82 s and 1.14 s, respectively.
Despite the widespread use of various data-driven models in the field of building energy consumption, machine learning methods remain the dominant approach due to limitations in data volume and dimensionality. However, numerous studies have shown the great potential of deep learning methods in this field [26,27,28,29]. For instance, Goodfellow et al. [30] argued that around 5000 labeled examples are enough for deep learning algorithms to achieve acceptable performance. Nevertheless, deep learning methods have their own limitations and face significant challenges in interpreting predicted results. Furthermore, most prediction models are built for a single building type, lacking the ability to characterize other similar buildings and limiting their guidance for energy saving and emission reduction. Therefore, this study aims to develop a model based on Deep Forest (DF) and SHAP value theory for assessing energy consumption in multiple types of commercial buildings. In summary, the novelty of this study is as follows:
Current deep neural network algorithms in the field of building energy consumption prediction achieve good predictive performance but sacrifice interpretability. The Deep Forest algorithm adopted in this paper fills the gap of poor interpretability of deep learning in this field;
Unlike the predictive models developed in the literature based on a single or a few building types, the building energy assessment model in this paper was developed based on 20 commercial building types, such as office buildings, warehouses, and schools. It therefore has broader adaptability and can provide energy consumption predictions and energy-saving recommendations for all 20 commercial building types.
The rest of this study is organized as follows. Section 2 describes the proposed modeling framework, dataset, and related theories for the energy consumption assessment of commercial buildings. Section 3 presents the performance evaluation of the deep learning models used to predict energy consumption and the interpretation results of the DF model. Finally, Section 4 concludes this study.
2. Materials and Methods
The building energy consumption assessment model proposed in this paper is a deep learning-based prediction method built on DF. The framework of this study is shown in Figure 1. First, the CBECS dataset is pre-processed, including missing value, outlier, and constant value handling. Subsequently, after removing a large number of redundant features with Spearman correlation filtering, feature selection is performed using three different types of methods: mutual information (MI) feature selection, sequential forward selection (SFS), and the Least Absolute Shrinkage and Selection Operator (LASSO). The features extracted by the three selection methods are then used as inputs for energy consumption prediction in three deep learning and two machine learning models, and the prediction performance of each combination of feature selection algorithm and prediction model is compared to develop an effective energy consumption assessment model. Finally, SHAP values are used for an interpretability study of the established prediction model: the main influencing factors are analyzed, the causes of high energy consumption are identified, and the prediction results and analysis conclusions serve as feedback for formulating energy-saving strategies that, in turn, guide the energy-saving design of new buildings.
2.1. Data Description
CBECS 2012 is the largest commercial building energy survey conducted by the U.S. Energy Information Administration (EIA) to date, providing data on the annual energy consumption of over 6700 commercial buildings [22]. This dataset represents approximately 5.6 million commercial buildings across the United States [31]. Figure 2a displays the distribution of different building types, including office buildings, school buildings, and shopping malls, among others. Office buildings are of particular interest to most researchers due to their higher representation compared with other building types. The dataset comprises more than 1181 features related to four types of energy consumption, covering heating, cooling, ventilation, and others. Figure 2b illustrates the percentage of energy consumption in commercial buildings: fuel and electricity account for over 80% of total energy consumption and are the primary sources of energy use and emissions, while the remaining two energy sources contribute only 17.3% of the total. Because the remaining two sources have a significant number of missing values and a relatively low contribution to total energy consumption (less than 20%), this study combines fuel and electricity consumption to represent total energy consumption.
2.2. Data Pre-Processing
While the CBECS 2012 dataset provides valuable insights for energy efficiency analysis and prediction research, it is not exempt from certain limitations. These include a considerable number of missing values, the presence of outliers, and inconsistencies within the data. These factors can potentially compromise the accuracy and reliability of the analysis. Therefore, it is essential to preprocess the raw data before conducting any data analysis. The preprocessing phase involves various operations, such as data cleaning, missing value imputation, data standardization, and data filtering. These measures are implemented to enhance the quality and reliability of the data, enabling improved data analysis and inference. In this study, the following preprocessing steps were executed on the CBECS 2012 dataset:
Empirical removal of variables not relevant to predicting energy consumption (e.g., 'Imputed roof replacement', which describes whether the roof parameter is estimated or measured) and removal of constant and quasi-constant parameters (<5% variation), leaving 614 features;
To reduce the uncertainty of the data, 162 variables with more than 80% missing values were removed to ensure data quality, leaving 452 features;
Missing values were filled with 0, because a missing value marks a question that was not applicable to the building in the database and 0 values have no significant effect on regression results; in addition, samples containing outliers were removed directly to preserve the characteristics of the data;
To better satisfy the models' assumption that the input data approximate a normal distribution, Z-Score standardization was applied to the data.
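As a concrete illustration of these steps, the preprocessing could be sketched as follows (a minimal sketch in pandas; the file path, the 95% quasi-constant cutoff, and the |z| > 3 outlier rule are illustrative assumptions, not the authors' exact code):

```python
# Sketch of the preprocessing pipeline described above. File path, the
# quasi-constant cutoff, and the outlier rule are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.read_csv("cbecs2012.csv")  # hypothetical path to the raw microdata

# Step 1: drop constant and quasi-constant variables (<5% variation),
# operationalized here as "top value covers >95% of samples".
quasi_const = [c for c in df.columns
               if df[c].value_counts(normalize=True, dropna=False).iloc[0] > 0.95]
df = df.drop(columns=quasi_const)

# Step 2: drop variables with more than 80% missing values.
df = df.loc[:, df.isna().mean() <= 0.80]

# Step 3: fill remaining missing values with 0 ("not applicable" in CBECS)
# and drop samples with outliers (assumed rule: any |z-score| > 3).
df = df.fillna(0)
num = df.select_dtypes(include=np.number)
z = (num - num.mean()) / num.std(ddof=0)
df = df[(z.abs() <= 3).all(axis=1)]

# Step 4: Z-Score standardization of the remaining numeric features.
df[num.columns] = (df[num.columns] - df[num.columns].mean()) / df[num.columns].std(ddof=0)
```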
2.3. Feature Selection
During the development of an energy consumption assessment model, feature selection plays a crucial role as it directly impacts the model's performance [32]. Building energy consumption features often exhibit high dimensionality, and as the number of features increases, the density of training samples decreases dramatically, leading to an elevated risk of overfitting. Moreover, high-dimensional features require more computational resources and longer training times, which increases the cost of the machine learning problem. Feature selection methods can be classified into three types: filter, wrapper, and embedded methods [33]. In this study, we applied and compared these three typical feature selection methods to evaluate their performance.
MI (Mutual Information) is a commonly used filter feature selection method that measures the correlation between two variables. Specifically, the MI method calculates the mutual information I(X; Y) between two variables X and Y, which quantifies their degree of interdependence. In feature selection, the original dataset is usually represented as an n × p matrix, where n denotes the number of samples and p denotes the number of features. The MI method calculates the mutual information between each feature and the target variable and selects the features with the greatest correlation with the target. The formula of the MI method is shown in Equation (1):

$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)} \quad (1)$$

where p(x, y) denotes the joint probability of X = x and Y = y, p(x) denotes the marginal probability of X = x, and p(y) denotes the marginal probability of Y = y. The MI method can handle both discrete and continuous variables, does not require prior normalization or standardization of the data, and is widely used in filter feature selection.
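For reference, this kind of MI-based filtering can be sketched with scikit-learn's mutual information estimator (assuming a preprocessed feature matrix X and EUI target y; keeping 20 features is an illustrative choice):

```python
# MI-based filter selection sketch using scikit-learn's estimator of I(X; Y).
from sklearn.feature_selection import SelectKBest, mutual_info_regression

selector = SelectKBest(score_func=mutual_info_regression, k=20)  # top 20
X_selected = selector.fit_transform(X, y)   # X: features, y: EUI target
mi_scores = selector.scores_                # estimated I(feature; target)
```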
SFS begins with an empty set S. It iteratively selects features from the original feature set based on evaluation criteria and adds them to the current feature subset S. This process continues until the desired number of features in the subset is achieved or no further optimal feature can be selected. SFS based on Random Forest (RF) is a popular wrapper feature selection method. It evaluates feature importance by constructing a Random Forest and gradually selects features to build the best model possible. SFS allows for stepwise selection of features, preventing overfitting, and can find the optimal feature subset within a predefined range of feature numbers. SFS has demonstrated excellent performance in fields such as gene expression data analysis, image processing, and text classification.
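A minimal sketch of RF-based SFS with scikit-learn's SequentialFeatureSelector follows (the forest size and the number of selected features are illustrative assumptions):

```python
# Sequential forward selection sketch wrapped around a Random Forest.
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector

rf = RandomForestRegressor(n_estimators=200, random_state=0)
sfs = SequentialFeatureSelector(rf, n_features_to_select=20,
                                direction="forward", cv=5)
X_selected = sfs.fit_transform(X, y)        # greedy forward search
```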
LASSO is a widely employed feature selection method known for its interpretability and stability compared with traditional approaches [33,34,35]. The core concept behind LASSO is to select features based on linear regression while imposing a constraint that the sum of the absolute values of the regression coefficients is less than a threshold. By minimizing the sum of squared residuals under this constraint, the LASSO method forces the coefficients of independent variables weakly correlated with the target variable to become zero. Several techniques are available to solve the LASSO model; in this study, we employ the widely recognized Least Angle Regression (LAR) method [36,37], known for its computational efficiency. The LAR algorithm identifies the next most influential feature along the direction defined by the features already selected, requiring only a few iterations of least squares fitting to obtain the coefficient vector β, thereby achieving a rapid solution.
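A minimal sketch of LASSO selection solved via LARS, using scikit-learn (the cross-validated variant picks the constraint strength automatically; X and y are the preprocessed features and target):

```python
# LASSO feature selection sketch, solved with Least Angle Regression (LARS).
import numpy as np
from sklearn.linear_model import LassoLarsCV

lasso = LassoLarsCV(cv=5).fit(X, y)         # alpha chosen by cross-validation
selected = np.flatnonzero(lasso.coef_)      # indices of nonzero-beta features
```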
2.4. Predictive Modeling
2.4.1. Deep Forest Model
The Deep Forest (DF) model consists of a Multi-Grained Scanning (MGS) structure and a cascade forest structure. The MGS process is illustrated in Figure 3. Each sample in the input dataset is divided by MGS into multiple subsamples, each containing a continuous segment of the feature vector, which are fed into a Random Forest (R-Forest) and a completely random tree forest (E-Forest) for prediction. Each forest outputs a real value, so the model obtains multiple real numbers as prediction outputs and stitches these results into a vector that is input into the cascade forest structure for further prediction.
One source of the superior performance of deep neural networks is their multilayer connection structure [38]; similarly, the cascade forest structure in the Deep Forest model adopts a multilayer cascade structure. The cascade forest consists of multiple cascade layers, each comprising two R-Forests and two E-Forests. As shown in Figure 4, each forest is trained on the multi-grained scan features and uses the out-of-bag estimate as the prediction for the corresponding samples. The four prediction vectors are combined into a new vector and spliced with the features input from the previous layer to form new features. After the second cascade layer receives the features passed down from the first layer, in addition to outputting new features, it compares the prediction performance of this layer with that of the previous layer using an evaluation function. If the performance improvement exceeds a set threshold, the new cascade layer is retained and the output vector of the previous layer is updated and passed down. When the prediction performance of a new cascade layer does not meet expectations, the cascade forest stops training, and all previous cascade layers are combined to form the final cascade forest model. The cascade forest's prediction is obtained by averaging the output values of the last cascade layer.
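To make the layer-growing logic concrete, the following is a highly simplified cascade-forest sketch (no Multi-Grained Scanning; out-of-fold predictions stand in for the out-of-bag estimates, and all hyperparameters are illustrative):

```python
# Simplified cascade-forest sketch: each layer holds two random forests and
# two completely random (extra-trees) forests; a new layer is kept only
# while the validation RMSE improves by more than `tol`.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict

def fit_cascade(X_tr, y_tr, X_val, y_val, max_layers=5, tol=1e-4):
    layers, best_rmse = [], np.inf
    aug_tr, aug_val = X_tr, X_val
    for _ in range(max_layers):
        forests = [RandomForestRegressor(n_estimators=100, random_state=i)
                   for i in range(2)]
        forests += [ExtraTreesRegressor(n_estimators=100, random_state=i)
                    for i in range(2)]
        # Out-of-fold predictions stand in for the out-of-bag estimates
        # used by the original Deep Forest.
        oof = np.column_stack([cross_val_predict(f, aug_tr, y_tr, cv=3)
                               for f in forests])
        for f in forests:
            f.fit(aug_tr, y_tr)
        val_pred = np.column_stack([f.predict(aug_val) for f in forests])
        rmse = np.sqrt(mean_squared_error(y_val, val_pred.mean(axis=1)))
        if best_rmse - rmse < tol:     # improvement below threshold: stop
            break
        best_rmse = rmse
        layers.append(forests)
        # Splice the four prediction vectors onto the original features
        # to form the next layer's input.
        aug_tr = np.hstack([X_tr, oof])
        aug_val = np.hstack([X_val, val_pred])
    return layers
```

A maintained open-source implementation of the cascade structure is available in the deep-forest package (CascadeForestRegressor).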
2.4.2. Deep Multilayer Perceptron
Deep Multilayer Perceptron (Deep MLP) is a deep neural network used to efficiently predict nonlinearly varying data, typically with more than three hidden layers. Figure 5 illustrates a multilayer perceptron network structure with a 3-neuron input layer, a 1-neuron output layer, and 4 hidden layers of 4 neurons each. The activation function can be the Sigmoid, Tanh, or ReLU function. Its role is to introduce nonlinear operations into the network so that it can approximate any nonlinear function, substantially improving the model's generalization ability. The most widely used choice is the ReLU function, which, compared with Sigmoid and Tanh, avoids the "vanishing gradient" defect whereby the output y becomes insensitive to further increases in the input x once x is large.
The parameters to be solved in the Deep MLP model are the connection weights W and the bias constants b. The optimization objective of the Deep MLP model is established using the least squares prediction error criterion with a regularization constraint, as shown in Equation (2):

$$\min_{W,\,b} \; \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + \lambda \lVert W \rVert_2^2 \quad (2)$$

where y_i represents the true value and ŷ_i is the predicted value. The problem is solved using the error Back Propagation strategy. Owing to its multiple hidden layers, numerous neural nodes, and activation functions, Deep MLP significantly improves on the generalization ability and prediction accuracy of the classical neural network model.
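For reference, a Deep MLP of this kind can be sketched with scikit-learn, whose MLPRegressor minimizes exactly this kind of squared-error-plus-L2 objective (layer sizes and alpha are illustrative assumptions):

```python
# Deep MLP regression sketch: four hidden layers, ReLU activations, and an
# L2 penalty (alpha) matching the regularized least-squares objective above.
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(hidden_layer_sizes=(64, 64, 32, 16),
                   activation="relu", alpha=1e-3,   # illustrative values
                   max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
```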
2.4.3. Convolutional Neural Network
Convolutional neural networks (CNNs) are one of the deep learning methods; the network architecture is shown in Figure 6. A CNN is composed of three fundamental components: the convolutional layer, the pooling layer, and the activation layer [39]. The convolutional layer first convolves the input one-dimensional feature data with a one-dimensional convolutional kernel and then applies an activation function to non-linearize the result of the convolution operation [40].
The main function of the pooling layer is to remove redundant information and extract the important features while maintaining feature invariance, i.e., feature extraction and data dimensionality reduction [41]. Common pooling methods include mean pooling and maximum pooling; mean pooling is used less often because its performance is inferior to that of maximum pooling [42,43].
The fully connected layer encodes the local features into global features of the CNN's input data. The model parameters are then adjusted and updated by computing the error between the model prediction and the true label [39] and applying the back-propagation algorithm. After several iterations of training, the loss converges and the model parameters are no longer updated. At this point, the optimal parameter values are obtained and saved for subsequent testing and evaluation of the algorithm on the validation set [44].
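A minimal sketch of such a 1D CNN for a building feature vector, written in PyTorch, follows (layer sizes and the input dimensionality are illustrative assumptions):

```python
# 1D CNN regression sketch for a vector of building features (PyTorch).
import torch
import torch.nn as nn

class CNN1DRegressor(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # activation layer
            nn.MaxPool1d(2),                             # max pooling layer
            nn.Flatten(),
            nn.Linear(16 * (n_features // 2), 1),        # fully connected layer
        )

    def forward(self, x):                  # x: (batch, n_features)
        return self.net(x.unsqueeze(1)).squeeze(-1)

model = CNN1DRegressor(n_features=20)
y_hat = model(torch.randn(8, 20))          # -> predictions of shape (8,)
```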
2.5. Model Evaluation
Based on the prediction results on the test set, the goodness of fit (R2) and the root mean square error (RMSE) are used as the evaluation indexes of regression model performance in this paper. R2 and RMSE are defined as shown in Equations (3) and (4):

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \quad (3)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \quad (4)$$

where ŷ_i is the i-th sample predicted value; y_i is the i-th sample true value; ȳ is the sample mean; and N is the number of samples.
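In code, the two metrics can be computed directly (a sketch, assuming test-set arrays y_test and y_pred):

```python
# Computing Equations (3) and (4) with scikit-learn on test-set predictions.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

r2 = r2_score(y_test, y_pred)                       # Equation (3)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # Equation (4)
```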
R2 measures the goodness of fit of the regression model to the data; a higher R2 indicates a better fit and better interpretability [18]. RMSE is used to compare the predictive performance of models. It measures the average difference between the model's predicted values and the actual observed values, i.e., the standard deviation of the prediction error; the smaller the RMSE, the better the predictive performance of the model. It is therefore widely used to evaluate model performance in the field of building energy consumption prediction [45,46,47,48]. By using both R2 and RMSE, we aim to gain a comprehensive understanding of how well the model captures the underlying patterns in the data and how accurately it predicts unseen outcomes.
2.6. Model Interpretation
An effective energy consumption assessment model should not only be accurate but should also provide an appropriate interpretation of the model in terms of its features. Tree-based machine learning algorithms such as XGBoost and RF provide only global feature importance; they cannot show the effect of each feature on the predicted composite energy consumption of an individual sample, nor whether a feature is positively or negatively correlated with composite energy consumption.
This study uses the SHAP method to interpret the proposed energy assessment model. SHAP is an algorithm that uses a game-theoretic approach to interpret the output of any machine learning model. Traditionally, simple models such as decision trees and linear regression are easy to visualize, but visualizing other models, especially ensemble models, is not feasible. To address this problem, Lundberg et al. proposed local, game-theory-based interpretation for advanced decision tree-based models [49]. SHAP values were developed to analyze the results of ensemble decision-tree learning models, especially the importance of the output feature parameters. The specific process of interpreting the trained model using SHAP values is as follows: first, calculate the contribution of each feature vector to the integrated energy consumption; then, average the absolute SHAP values of all features over the interpreted samples to obtain the contribution of each feature to the integrated energy consumption. The SHAP value of the i-th feature is given in Equation (5):

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, \left( |F| - |S| - 1 \right)!}{|F|!} \left[ f_{S \cup \{i\}}\!\left( x_{S \cup \{i\}} \right) - f_S\!\left( x_S \right) \right] \quad (5)$$

where |F| is the number of feature parameters, F is the set of all feature parameters, f is the interpreted model, x is an instance of the interpreted feature vector, i indexes the i-th feature in the feature vector, S is a subset of F \ {i}, and φ_i is the SHAP value of the i-th feature.
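A minimal sketch of applying SHAP to a fitted model follows (assuming the shap package; the model-agnostic KernelExplainer also works for Deep Forest, at the cost of speed, and the sample sizes are illustrative):

```python
# SHAP interpretation sketch for a fitted regression model `model`.
import shap

background = shap.sample(X_train, 100)            # background distribution
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X_test[:50])  # per-sample contributions
shap.summary_plot(shap_values, X_test[:50])       # global importance view
```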
5. Conclusions
This study proposes a framework for a data-driven commercial building energy assessment model comprising five parts: data pre-processing, feature selection, predictive model development, model evaluation, and model interpretation. Taking the CBECS 2012 dataset as an example, three feature selection methods (the filter method MI, the wrapper method SFS, and the embedded method LASSO) were compared to select the features most favorable for energy consumption prediction; these features were then used as inputs to build three deep learning models and two machine learning models, whose performance was compared. Finally, SHAP theory was used to explain the best-performing model from multiple perspectives and to analyze the explanation results.
Among the results of the proposed framework, the feature selection algorithm SFS yields the most obvious improvement in prediction performance, especially when predicting Total EUI, improving all five prediction algorithms by 2-9.6%. In addition, the combination of SFS with the DF model exhibits the best performance, achieving an R2 value of 0.90. Although the combination of SFS with the Deep MLP model achieves a similar level of accuracy, the Deep MLP model has certain drawbacks compared with DF, such as the complexity of hyperparameter tuning and poorer model stability; the SFS + DF model is therefore the best choice for developing building energy assessment models. Model interpretability was analyzed at three levels: the impact of the 20 features on the output, the impact of individual features on the output, and the impact of the features in a single sample on the output. The global SHAP interpretation plots show that the features with the most significant impact on energy consumption are SQFT and NWKER, both positively correlated with it, followed by NELVTR, WKHRS, etc. In addition, certain features (e.g., NWKER) have stronger interactions, making it difficult to analyze the causes of energy consumption from their feature dependence plots, whereas single-sample SHAP force plots make it easier to see the role of each feature in a given sample and are not affected by feature interactions. The results show that although SQFT is the most influential factor in the global explanation, in the SHAP explanation of an individual sample the influence of certain secondary factors, such as NWKER, NELVTR, and WKHRS, may be greater than that of SQFT. Therefore, architects should take into account the extreme values of non-significant features in addition to the major factors when designing buildings for energy efficiency.
This paper demonstrates the effectiveness of deep learning algorithms in the field of building energy consumption prediction, and the proposed model can provide suggestions and references for building designers, building energy managers, and other related staff.
A limitation of this study is that tuning the hyperparameters of the DF model with a grid search algorithm is time-consuming and does not significantly improve model performance. In future work, the predictive performance of the model will be further improved by pruning and optimizing the structure of the DF model for higher-dimensional feature sets.