A Multi-View Ensemble Width-Depth Neural Network for Short-Term Wind Power Forecasting

Wan, Jing; Huang, Jiehui; Liao, Zhiyuan; Li, Chunquan; Liu, Peter X.

doi:10.3390/math10111824

¹ The School of Qianhu, Nanchang University, Nanchang 330031, China

² The School of Information Engineering, Nanchang University, Nanchang 330031, China

³ The Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B7, Canada

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2022, 10(11), 1824; https://doi.org/10.3390/math10111824

Submission received: 23 March 2022 / Revised: 13 May 2022 / Accepted: 17 May 2022 / Published: 25 May 2022

(This article belongs to the Special Issue Numerical Simulation and Computational Methods in Engineering and Sciences)

Abstract

Short-term wind power forecasting (SWPF) is essential for managing wind power systems. However, most existing forecasting methods do not fully consider how to rationally integrate multi-view learning with attention mechanisms. As a result, some latent features are not fully extracted, which degrades predictive accuracy and robustness in SWPF. To solve this problem, this paper proposes a multi-view ensemble width-depth neural network (MVEW-DNN) for SWPF. Specifically, MVEW-DNN consists of local and global view learning subnetworks, which extract more latent global and local view features from the original wind power data. In MVEW-DNN, the local view learning subnetwork is built on the deep belief network (DBN) model, which efficiently extracts the local view features. The global view learning subnetwork is a new deep encoder broad learning system (deBLS) that incorporates an attention mechanism and provides more comprehensive global information. By rationally learning effective local and global view features, MVEW-DNN achieves competitive predictive performance in SWPF. MVEW-DNN is compared with state-of-the-art models in SWPF, and the experimental results indicate that it provides competitive predictive accuracy and robustness.

Keywords: renewable energy; wind power forecasting; hybrid model; machine learning

MSC: 65-04

1. Introduction

Since wind power has clean and pollution-free features compared with traditional energy sources, it has become an important part of modern power systems [1,2,3]. In fact, accurate wind power forecasting (WPF) is becoming increasingly important because it can optimize the generation schedules and units, as well as improve the profitability and stability of the power system [4,5]. However, it is still a challenging task to obtain accurate and robust WPF due to the uncertainty, volatility, and intermittency of wind speed [6].

To improve the predictive accuracy and robustness in SWPF, various forecasting methods have been developed. These methods can be divided into physical methods, statistical methods, and machine learning methods [7]. Physical methods mainly rely on numerical weather prediction (NWP) information such as atmospheric pressure, temperature, and relative humidity [8]. For example, Zjavka et al. [9] designed a wind power forecasting system by polynomial decomposition of the general differential equation. Jacondino et al. [10] proposed a Weather Research and Forecasting (WRF) system for forecasting wind power from two different wind farms.

Statistical methods involve the application of autoregressive dynamic adaptive (ARDA) models [11], Bayesian models [12], autoregressive moving average (ARMA) models [13], Gaussian mixture models [14,15], and quantile regression neural network (QRNN) models [16]. In ultra-short-term wind power forecasting (UWPF), wind power data are almost linear. Since statistical and physical methods can be easily formulated into linear predictive models, they provide promising predictive results for UWPF [17]. However, unlike UWPF, short-term wind power forecasting (SWPF) involves more volatile and more uncertain power load data. Therefore, statistical and physical methods cannot handle such nonlinear characteristics in SWPF [18].

To obtain better predictive performance in SWPF, various machine learning models have been developed. Because machine learning constructs the nonlinear mapping between the input and output of wind power data, it can effectively learn and mine the nonlinear characteristics of wind power data [19]. The commonly used machine learning models are support vector regression (SVR) [20], deep belief networks (DBNs) [21], echo state networks (ESNs) [22], extreme learning machines (ELMs) [23], and broad learning systems (BLSs) [24].

For example, as a promising deep learning network, DBN is composed of multiple restricted Boltzmann machines, which provides powerful nonlinear data processing capability. However, DBN can produce high-dimensional features because of multiple BPNN layers. This may limit the prediction performance [25]. BLS is a new single-layer incremental neural network. Its advantages lie in fast computing speed, low computing resource consumption, easy online incremental learning, and easy expansion. However, BLS needs to perform the random nodes selection and pseudo-inverse calculation, so that its predictive accuracy is often inferior to deep-learning networks in the face of large-scale data. Furthermore, a single BLS model may have problems such as over-training, poor generalization ability, or limited prediction accuracy [26].

Wind power can be significantly influenced by many natural factors such as geographical location, weather conditions, and seasonal effects [27]. To overcome the instability and intermittent nature of the time series in SWPF, the combination of a decomposition-based method and a machine learning model has been proven to be an effective solution [28]. Chen et al. [29] used the discrete wavelet transform to decompose PV output power. The decomposed subsequences were then fed into an adaptive neuro-fuzzy inference system (ANFIS) to predict the short-term PV output power. Wang et al. [30] developed the VMD-CISSA-LSSVM model, consisting of the variational modal decomposition (VMD) data preprocessing method, the sparrow search algorithm (SSA), and the least squares support vector machine (LSSVM) model, which has high prediction accuracy and stable prediction results. Devi et al. [31] rationally combined ensemble empirical mode decomposition (EEMD), the cuckoo search optimization algorithm, and an improved LSTM to improve forecasting accuracy. Zhang et al. [15] integrated complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) with Gaussian process regression, which can also obtain promising prediction performance.

To further improve the prediction ability of a single machine learning model, hybrid models are also considered effective solutions. This is because hybrid models can compensate for the disadvantages of each individual method. For example, Zhao et al. [32] used CNN and attention mechanisms to provide more reliable multitask learning accuracy. In [33], the attention mechanism is combined with the gated recurrent unit (GRU) network, obtaining robust prediction performance. In [34], a novel genetic long short-term memory (GLSTM) framework was developed to provide accurate, reliable, and robust performance in SWPF. The genetic technology is used to automatically optimize LSTM parameters according to different wind power data. Khan et al. [35] combined autoencoder (AE) and bidirectional long short-term memory (BiLSTM) to form a novel hybrid model. Duan et al. [36] proposed a novel hybrid model consisting of VMD, LSTM, and PSO-DBN. Wu et al. [37] used the charged system search (CSS) algorithm to construct a hybrid model, which consists of least-squares support vector machines (LS-SVM), a modified artificial neural network (ANN), and an adaptive network-based fuzzy inference system (ANFIS) model. Ogliari et al. [38] built a hybrid model by combining the physical and ANN-developed models, which can give some knowledge about the wind turbine characteristics to the ANN model. Hong et al. [39] effectively combined CNN with a radial basis function neural network (RBFNN). Here, RBFNN is built for dealing with uncertain characteristics, and CNN is built for extracting wind power characteristics, so that it also has outstanding prediction performance. Ribeiro et al. [40] proposed an ensemble learning model by introducing bagging and stacking, which integrated the samples through the arithmetic and weighted average values.

Note that some multi-view hybrid models have recently also gained outstanding prediction performance. For example, Lai et al. [41] proposed a multi-view neural network by rationally combining LSTM model with RBFNN model for short- and mid-term load forecast, which has higher generalization ability. Nguyen et al. [42] decomposed time series data into a closely related time series group and a loosely related group, which are fed into a multitask learning model and a multi-view learning part, respectively. Zhong et al. [43] devised a multi-view deep forecast model for solar irradiance forecast, which uses multiple related data sets to feed into the RBFNN model and one hybrid model (MvDF).

Although the existing methods have achieved great success, there are still some problems that need to be further solved as follows:

For the SWPF task, the existing machine learning methods rarely consider the compromise between prediction accuracy and computation cost. Although BLS [24] has attracted considerable attention for its fast training speed and incremental learning algorithm, its prediction performance is limited, because BLS suffers from the randomness of its parameter settings [26], sensitivity to the number of enhancement nodes, vulnerability to noise, and a lack of uncertainty expression ability. Furthermore, although various deep learning models can achieve promising performance, their computation costs are high.
Most hybrid machine learning methods rarely consider how to establish a multi-view learning mechanism, which may degenerate the predictive accuracy. Although researchers [15,30,31] have introduced decomposition in single models, the models cannot comprehensively learn original data characteristics, so that the robustness of the models is limited.
The attention mechanism is integrated into CNN [32] and GRU [33] for obtaining stable performance and providing feature selection, respectively. However, this technology is rarely considered for adjusting network structures while applying to different regression tasks, which may limit the improvement of model performance on other data sets.
Most existing models seldom consider how to effectively and stably connect two distinctly different models. For instance, a fully connected neural network is used to connect two different models in [35], which may cause vanishing or exploding gradients when the two models differ greatly.

From the above analysis, our investigation mainly considers how to effectively integrate a decomposition mechanism and a multi-view learning mechanism into machine learning for efficient and stable SWPF. With this motivation, this paper proposes a multi-view ensemble width-depth neural network (MVEW-DNN), which involves a local view learning subnetwork (CEEMDAN-DBN), a global view learning subnetwork (deBLS), and a feature dynamic decisionmaker (FDDM). The local view learning subnetwork combines CEEMDAN and a deep belief network (DBN), which can effectively extract local features of wind power load data. The global view learning subnetwork is a deep encoder broad learning system (deBLS), which can provide more comprehensive features. In addition, the feature dynamic decisionmaker (FDDM) is developed to fuse the local view learning subnetwork and the global view learning subnetwork.

The main contributions are as follows:

A novel width-depth integration model with a global view and a local view is introduced for short-term wind power forecasting. Different from other multi-view models, our model focuses on improving model performance and reducing the computational cost of the global view learning subnetwork (deBLS) as much as possible.
The deBLS model is developed by rationally replacing the feature nodes with multiple encoder nodes, which can improve the learning ability of BLS. Furthermore, deBLS introduces an attention mechanism that adjusts the enhancement nodes to achieve higher prediction accuracy.
An effective feature fusion model, FDDM, is proposed for rationally combining deBLS and CEEMDAN-DBN, with the aim of obtaining the best predictive performance from the two subnetworks.

The rest of this paper is arranged as follows. The framework of the model and related theoretical knowledge are introduced in Section 2. Section 3 analyses the test data and details the concrete case analysis. Section 4 is the conclusion of this paper.

2. The Proposed MVEW-DNN

As shown in Figure 1, the proposed MVEW-DNN is divided into global view and local view learning subnetworks. In the local view learning subnetwork (CEEMDAN-DBN), CEEMDAN decomposes the original wind power data into multiple local view components. Then, the DBN network is used to extract the features of the local view components. In the global view learning subnetwork (deBLS), the original wind power data are regarded as the global view data. The deBLS model is developed to learn the features of the global view data. Finally, the FDDM method is developed to fuse CEEMDAN-DBN and deBLS by dynamically adjusting the fusion parameters. The FDDM method can monitor the performance of deBLS and CEEMDAN-DBN in the training phase, which can obtain the best model fusion performance.

2.1. Local View Subnetwork

Wind power data have high uncertainty and volatility, degenerating the predictive accuracy. To address these problems, CEEMDAN is applied to decompose the original wind power data into multiple smooth local view components called eigen-modes [44], and DBN is used to effectively capture more local view characteristics of wind power data.

2.1.1. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)

In contrast with EMD and EEMD, CEEMDAN overcomes the modal aliasing of EMD and the inefficiency of EEMD on some closely spaced spectral signals. Furthermore, CEEMDAN adds white noise at each decomposition stage so that the noise of the original data and the added white noise are superimposed and cancel each other. Therefore, CEEMDAN gradually eliminates the reconstruction error in the iterative process, which ensures the accuracy of the decomposition and greatly reduces the influence of modal aliasing. The details of CEEMDAN are given as follows:

Assumption 1. $y_{0}(t)$ is defined as the original signal. $n^{m}(t)$ is defined as a Gaussian white noise signal with a standard normal distribution, and $m \in [1, M]$. $β$ is defined as the noise coefficient of $n^{m}(t)$. $E[\cdot]$ is defined as the EMD decomposition. $\bar{IMF_{k}}(t)$ is defined as the $k$th intrinsic mode function obtained by CEEMDAN.

Step 1: Add Gaussian white noise to $y_{0}(t)$ to get a new signal $y_{0}(t) + (-1)^{q} β_{0} n^{m}(t)$, $q = 1, 2$. Perform EMD on the new signal to get $IMF_{1}^{m}(t)$:

$E(y_{0}(t) + (-1)^{q} β_{0} n^{m}(t)) = IMF_{1}^{m}(t) + r^{m}(t)$

(1)

Step 2: Average $IMF_{1}^{m}(t)$ to get $\bar{IMF_{1}}(t)$, as shown in Formula (2). Then, calculate the residual $r_{1}(t)$ after removing $\bar{IMF_{1}}(t)$:

$\bar{IMF_{1}}(t) = \frac{1}{M} \sum_{m=1}^{M} IMF_{1}^{m}(t)$

(2)

$r_{1}(t) = y_{0}(t) - \bar{IMF_{1}}(t)$

(3)

Step 3: Add Gaussian white noise to $r_{1}(t)$ to get a new signal and perform EMD on the new signal to get $IMF_{2}^{m}(t)$. Then, $\bar{IMF_{2}}(t)$ and the residual $r_{2}(t)$ can be obtained:

$\bar{IMF_{2}}(t) = \frac{1}{M} \sum_{m=1}^{M} IMF_{2}^{m}(t)$

(4)

$r_{2}(t) = r_{1}(t) - \bar{IMF_{2}}(t)$

(5)

Step 4: The above steps are repeated until the obtained residual signal is a monotonic function and the decomposition cannot be continued. At this time, the number of intrinsic mode functions is $K$. Finally, the original signal $y_{0}(t)$ is decomposed into

$y_{0}(t) = \sum_{k=1}^{K} \bar{IMF_{k}}(t) + r_{K}(t)$

(6)
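The bookkeeping in Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `first_mode` is a hypothetical stand-in for the EMD operator $E[\cdot]$ (it subtracts a moving average instead of performing real EMD sifting), the number of modes is fixed rather than determined by a monotonic residual, and the average is taken over both noise signs. The function names and default values are not from the paper.

```python
import math
import random

def moving_average(signal, width=5):
    """Centered moving average with clamped edges (the slow part of the signal)."""
    half = width // 2
    n = len(signal)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def first_mode(signal):
    """HYPOTHETICAL stand-in for E[.]: fast component = signal - moving average.
    A real implementation would run EMD sifting here."""
    trend = moving_average(signal)
    return [s - tr for s, tr in zip(signal, trend)]

def ceemdan_like(y0, n_modes=3, n_noise=6, beta=0.01, seed=0):
    """Noise-assisted extraction of averaged modes plus the residual recursion."""
    rng = random.Random(seed)
    residual = list(y0)
    modes = []
    for _ in range(n_modes):  # assumption: fixed count instead of a monotonicity test
        acc = [0.0] * len(y0)
        count = 0
        for _ in range(n_noise):
            noise = [rng.gauss(0.0, 1.0) for _ in y0]
            for sign in (-1.0, 1.0):  # the (-1)^q factor, q = 1, 2
                noisy = [r + sign * beta * n for r, n in zip(residual, noise)]
                mode = first_mode(noisy)
                acc = [a + m for a, m in zip(acc, mode)]
                count += 1
        mean_mode = [a / count for a in acc]                      # Formulas (2), (4)
        residual = [r - m for r, m in zip(residual, mean_mode)]   # Formulas (3), (5)
        modes.append(mean_mode)
    return modes, residual

# Synthetic two-tone signal; not wind power data.
y0 = [math.sin(0.3 * t) + 0.3 * math.sin(2.5 * t) for t in range(200)]
modes, residual = ceemdan_like(y0)
reconstructed = [sum(m[t] for m in modes) + residual[t] for t in range(len(y0))]
max_error = max(abs(a - b) for a, b in zip(y0, reconstructed))
```

Because each residual is obtained by subtracting the averaged mode from the previous residual, the reconstruction in Formula (6) holds up to floating-point rounding regardless of which mode extractor is plugged in.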

2.1.2. Deep Belief Network (DBN)

DBN is a probabilistic graphical model that includes multiple restricted Boltzmann machines (RBMs). Here, it is used to extract effective local view features from the smooth signals produced by CEEMDAN. The training process of DBN can be divided into two phases, namely pretraining and fine-tuning. In the pretraining phase, each RBM is trained in an unsupervised manner. In the fine-tuning phase, DBN is treated as a back-propagation neural network and trained in a supervised manner to fine-tune the model parameters.

2.2. Global View Subnetwork

Although decomposition methods such as CEEMDAN can decompose the original wind power load data into multiple smooth local view components, the decomposition process may lose some information contained in the original wind data. To compensate for this information loss, we develop a deep encoder broad learning system (deBLS) to extract global view information from the original wind power data.

In the traditional BLS, feature nodes are generated by feature mapping, and the number of enhancement nodes is preset. In contrast, deBLS changes how both feature nodes and enhancement nodes are generated. Specifically, deBLS generates feature nodes by feature mapping followed by a sparse encoder, and its number of enhancement nodes is adjusted automatically via an attention mechanism. The details of deBLS are given as follows:

Definition 1. $X$ and $\hat{Y}$ are defined as the input and output of deBLS, respectively. $J_{i}$ is defined as the feature nodes, $i = 1, 2, \dots, n$. $J^{n} = [J_{1}, J_{2}, \dots, J_{n}]$ is defined as the combination of all feature nodes. $W_{e}$ and $W_{h}$ are defined as weight matrices, and $δ_{e}$ and $δ_{h}$ are defined as bias matrices; the four matrices are fine-tuned by a sparse encoder.

Step 1: By feature mapping, the original data $X$ is mapped into $n$ nodes $K_{i}$, as shown in Formula (7). Here, $η(\cdot)$ is a linear transform. Then, by increasing the depth, a three-layer sparse encoder is used to perform feature extraction on nodes $K_{i}$ to obtain $J^{n}$.

$K_{i} = η(X W_{e_{i}} + δ_{e_{i}}), \; i = 1, 2, \dots, n$

(7)

Step 2: By enhancing and transforming $J^{n}$, the enhancement nodes $E_{k}$ can be obtained as

$E_{k} = \tanh(J^{n} W_{h_{k}} + δ_{h_{k}}), \; k = 1, 2, \dots, m$

(8)
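As a rough illustration of Formulas (7) and (8), the following sketch builds feature nodes and enhancement nodes with plain Python lists. Several simplifications are assumptions made here, not part of the paper: $η$ is taken as the identity, the three-layer sparse encoder is skipped (so $J_{i} = K_{i}$), all enhancement nodes are generated in a single group, and the weights are drawn uniformly from $[-1, 1]$. The helper names and sizes are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def build_nodes(x, n_groups=3, group_dim=2, n_enh=4, seed=0):
    rng = random.Random(seed)
    n_inputs = len(x[0])
    # Formula (7): K_i = eta(X W_ei + delta_ei), with eta assumed to be the identity.
    groups = []
    for _ in range(n_groups):
        w_e = random_matrix(n_inputs, group_dim, rng)
        d_e = [rng.uniform(-1.0, 1.0) for _ in range(group_dim)]
        groups.append(add_bias(matmul(x, w_e), d_e))
    # J^n = [J_1, ..., J_n]; the sparse-encoder refinement is omitted (J_i = K_i).
    j_n = [sum((g[r] for g in groups), []) for r in range(len(x))]
    # Formula (8): E_k = tanh(J^n W_hk + delta_hk), here for one group of n_enh nodes.
    w_h = random_matrix(len(j_n[0]), n_enh, rng)
    d_h = [rng.uniform(-1.0, 1.0) for _ in range(n_enh)]
    e_m = [[math.tanh(v) for v in row] for row in add_bias(matmul(j_n, w_h), d_h)]
    # B = [J^n | E^m]: feature and enhancement nodes side by side.
    b = [j + e for j, e in zip(j_n, e_m)]
    return j_n, e_m, b

# Toy input with 5 samples and 3 features.
x = [[0.1 * (r + c) for c in range(3)] for r in range(5)]
j_n, e_m, b = build_nodes(x)
```

With 3 groups of 2 mapped features and 4 enhancement nodes, each row of `b` has 10 columns; this combined matrix is what the output weights are later fitted on.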

Note that the number of enhancement nodes has an impact on the prediction performance. To obtain a suitable number of enhancement nodes, we introduce the attention mechanism to deBLS, which can automatically adjust the number of nodes of deBLS to the most suitable number in the training phase. Here, a detailed pseudo-code for the attention mechanism algorithm is given in Algorithm 1. When the number of enhancement nodes is determined, $E^{m}$ and $B$ can be obtained.

Algorithm 1: Attention Mechanism Algorithm.

// The attention mechanism is located on lines 11 to 13.
The original data $X$ are divided into a training set and a testing set. The training set consists of train-x and train-y; the testing set consists of test-x and test-y.
Input: train-x, train-y, test-x
Output: $W$
Process:
while 1
1: if the training error threshold is not satisfied do
2: $k = k + 1$, $m = m + 1$;
3: Randomly generate $W_{h_{k}}$, $δ_{h_{k}}$;
4: Calculate $E_{m} = [ξ(J^{n} W_{h_{k}} + δ_{h_{k}})]$, where $ξ$ is the activation function ($\tanh$ in Formula (8));
5: Set $E^{m} = [E_{1}, E_{2}, \dots, E_{m}]$, $B = [J^{n} | E^{m}]$;
6: $r$ is the number of rows of matrix $B'$;
7: $I$ is an $r \times r$ identity matrix;
8: Set parameter $C$ as $2^{-30}$;
9: Calculate $W^{m}$ by $(B' \times B + I \times C)^{-1} \times B' \times$ train-y;
10: Calculate $Y$ by $B \times W^{m}$;
11: Calculate the cosine similarity between train-y and $Y$ as $δ$;
12: $W^{m} = W^{m} + δ \times ω$, where $ω$ is a parameter $\in [0, 1]$;
13: Update $Y$ by $Y = B \times W^{m}$;
14: Calculate the training error between $Y$ and train-y;
15: else
16: break;
17: end
18: end
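Lines 10 to 13 of Algorithm 1 can be read literally as the small routine below. This is a sketch under assumptions rather than the authors' implementation: it handles a single output column, treats $δ \times ω$ as a scalar added to every weight, and uses hypothetical names (`attention_update`, `omega`) and toy values that do not come from the paper.

```python
import math

def predict(b, w):
    """Y = B W for a single output column."""
    return [sum(bv * wv for bv, wv in zip(row, w)) for row in b]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * c for a, c in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(c * c for c in v))
    return dot / norm if norm > 0 else 0.0

def attention_update(b, w, train_y, omega=0.1):
    y_hat = predict(b, w)                       # line 10
    delta = cosine_similarity(train_y, y_hat)   # line 11
    w_new = [wv + delta * omega for wv in w]    # line 12, literal scalar shift
    return w_new, predict(b, w_new), delta      # line 13

# Toy example: the initial prediction is parallel to the target, so delta is about 1.
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.5, 0.5]
train_y = [1.0, 1.0, 2.0]
w_new, y_new, delta = attention_update(b, w, train_y)
```

In this toy case the update moves both weights from 0.5 toward 0.6, which brings the prediction closer to the target; whether the same holds on real data depends on the choice of $ω$.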

Step 3: Calculate the weight matrix $W$ . The predictive result $\hat{Y}$ can be expressed as $\hat{Y} = B W$ . Furthermore, during the training phase, the actual value $Y$ is known, and therefore, the weight matrix $W$ can be calculated as shown in Formula (9), where $B^{+}$ is the pseudo-inverse of $B$ :

$W = B^{+} Y$

(9)

To obtain a suitable $W$, ridge regression is used to transform the above problem into $\arg\min_{W} (\| \hat{Y} - Y \|_{2}^{2} + λ \| W \|_{2}^{2})$. Here, $λ$ is the regularization parameter, and the solution is $W = (λ I + B^{T} B)^{-1} B^{T} Y$, where $I$ is the identity matrix. Thus, as $λ \to 0$, $B^{+}$ can be obtained as

$B^{+} = \lim_{λ \to 0} (λ I + B^{T} B)^{-1} B^{T}$

(10)
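The closed-form ridge solution can be checked by setting the gradient of the objective to zero. The short derivation below uses $\hat{Y} = BW$ and assumes that $B$ has one row per training sample, so that $B^{T} B$ is the square matrix that appears as $B' \times B$ in line 9 of Algorithm 1:

```latex
\begin{aligned}
L(W) &= \| B W - Y \|_{2}^{2} + \lambda \| W \|_{2}^{2}, \\
\nabla_{W} L &= 2 B^{T} (B W - Y) + 2 \lambda W = 0, \\
(\lambda I + B^{T} B)\, W &= B^{T} Y, \\
W &= (\lambda I + B^{T} B)^{-1} B^{T} Y .
\end{aligned}
```

For any $λ > 0$ the matrix $λ I + B^{T} B$ is positive definite and therefore invertible, which is why a small positive constant (the parameter $C = 2^{-30}$ in Algorithm 1) is enough to make the computation well posed.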

2.3. FDDM

FDDM was developed to reasonably allocate the fusion weights of deBLS and CEEMDAN-DBN in the training and test stages by dynamically adjusting the fusion parameters, with the aim of achieving the best prediction performance of the proposed hybrid MVEW-DNN model (Algorithm 2).

Algorithm 2: FDDM Algorithm.

$ℳ$: Cross-validation fold

$N$: The length of the test set (the number of output nodes)

$W_{ℬ}$: Fusion weight of deBLS

$V_{ℬ}$ ($\hat{V_{ℬ}}$): Predicted validation data (real validation data) of deBLS

$T_{ℬ}$: Predicted test data of deBLS

$W_{D}$: Fusion weight of DBN

$V_{D}$ ($\hat{V_{D}}$): Predicted validation data (real validation data) of DBN

$T_{D}$: Predicted test data of DBN

$ρ$: Correlation test threshold

Error of each node: $ε(x, y) = \frac{|x_{i} - y_{i}|}{\| x - y \|_{2}}, \; i \in [1, N]$

(11)

Correlation value: $λ(x, y) = \frac{|\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})|}{\sqrt{\sum_{i=1}^{n} (x_{i} - \bar{x})^{4} \sum_{i=1}^{n} (y_{i} - \bar{y})^{4}}}$

(12)
Process:

1: $W_{ℬ} = W_{D} = \mathrm{zero}[ℳ, N]$;
2: $ρ = 2$; $ℳ = 4$;
3: for $m = 1$; $m \leq ℳ$ do
4: Obtain $V_{ℬ}^{m}$ and $\hat{V_{ℬ}^{m}}$ from $V_{ℬ}$ and $\hat{V_{ℬ}}$, respectively.
5: Obtain $V_{D}^{m}$ and $\hat{V_{D}^{m}}$ from $V_{D}$ and $\hat{V_{D}}$, respectively.
6: for $n = 1$; $n \leq N$ do
7: Calculate error $ε_{ℬ}$ between $V_{ℬ}^{m}$ and $\hat{V_{ℬ}^{m}}$ by Formula (11).
8: Calculate error $ε_{D}$ between $V_{D}^{m}$ and $\hat{V_{D}^{m}}$ by Formula (11).
9: if $ε_{ℬ}^{n} < ε_{D}^{n}$ then
10: $W_{ℬ}(m, n) = 1$; // deBLS has the smaller error at node $n$, so give deBLS the higher fusion weight.
11: else $W_{D}(m, n) = 1$;
12: end
13: end
14: for $m = 1$; $m \leq ℳ$ do
15: Calculate $λ$ between $W_{ℬ}(m, :)$ and $W_{D}(m, :)$ by Formula (12).
16: if $λ < ρ$ then
17: $W_{ℬ}(m, :) = W_{D}(m, :) = [\,]$; // Clear $W_{ℬ}(m, :)$ and $W_{D}(m, :)$.
18: end

end

Here, the original data are divided into a training data set, a validation data set, and a test data set in the proportion 6:2:2. The cross-validation fold parameter $ℳ$ is set to 4, which keeps the weight length of the validation data set the same as that of the test data set. Therefore, the final prediction data can be defined as $P = W_{ℬ} * T_{ℬ} + W_{D} * T_{D}$. Additionally, the parameter $ρ$ is set to 2, which is determined by our experimental tests [45]. In general, MVEW-DNN can effectively provide the prediction of SWPF by rationally combining deBLS and CEEMDAN-DBN.
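A minimal sketch of the node-wise fusion may help make the rule concrete. It covers only one validation fold, assumes that the model with the smaller Formula (11) error at an output node receives weight 1 for that node, and omits the correlation screening step of Algorithm 2. The function names and the toy numbers are illustrative assumptions, not values from the paper.

```python
import math

def node_errors(pred, real):
    """Formula (11): |x_i - y_i| / ||x - y||_2 for every output node i."""
    norm = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)))
    if norm == 0.0:
        return [0.0] * len(pred)
    return [abs(p - r) / norm for p, r in zip(pred, real)]

def fusion_weights(val_b, val_d, real):
    """Binary per-node weights: 1 for the model with the smaller validation error."""
    err_b = node_errors(val_b, real)
    err_d = node_errors(val_d, real)
    w_b = [1.0 if eb < ed else 0.0 for eb, ed in zip(err_b, err_d)]
    w_d = [1.0 - w for w in w_b]
    return w_b, w_d

def fuse(w_b, t_b, w_d, t_d):
    """P = W_B * T_B + W_D * T_D, applied element-wise."""
    return [wb * tb + wd * td for wb, tb, wd, td in zip(w_b, t_b, w_d, t_d)]

# Toy validation data for three output nodes (made-up numbers).
real = [1.0, 2.0, 3.0]
val_deblst = [1.1, 2.5, 3.0]   # deBLS validation predictions
val_dbn = [1.4, 2.1, 3.3]      # CEEMDAN-DBN validation predictions
w_b, w_d = fusion_weights(val_deblst, val_dbn, real)

# Toy test predictions fused with the weights learned on validation data.
fused = fuse(w_b, [10.0, 20.0, 30.0], w_d, [11.0, 21.0, 31.0])
```

In this toy case deBLS is selected for the first and third nodes and CEEMDAN-DBN for the second, so the fused forecast takes each value from whichever model was more accurate at that node during validation.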

2.4. Evaluation Criteria

To test the performance of the proposed MVEW-DNN, we use three evaluation indicators: Normalized Root Mean Square Error (NRMSE), Normalized Mean Absolute Error (NMAE), and Theil’s inequality coefficient (TIC), as shown in Formulas (13)–(15). NRMSE and NMAE are utilized to measure the deviation between the actual value and the predictive value. TIC is applied to characterize the predictive performance of predictive models. The smaller the NRMSE and NMAE are, the stronger the non-linear approximation capability is. The closer to 0 the TIC is, the stronger the spatial learning ability is:

$NRMSE = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}}{\hat{y}_{max} - \hat{y}_{min}}$

(13)

$NMAE = \frac{\frac{1}{N} \sum_{i=1}^{N} |\hat{y}_{i} - y_{i}|}{\hat{y}_{max} - \hat{y}_{min}}$

(14)

$TIC = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} y_{i}^{2}} + \sqrt{\frac{1}{N} \sum_{i=1}^{N} \hat{y}_{i}^{2}}}$

(15)

where $\hat{y}_{i}$ and $y_{i}$ are the predictive and actual values at time $i$, and $N$ is the observation size.
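The three indicators translate directly into code. The sketch below follows Formulas (13) to (15) as printed, including the normalization by the range of the predicted values $\hat{y}_{max} - \hat{y}_{min}$; the function names and the sample numbers are illustrative only.

```python
import math

def nrmse(actual, predicted):
    """Formula (13): RMSE divided by the range of the predicted values."""
    n = len(actual)
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)
    return rmse / (max(predicted) - min(predicted))

def nmae(actual, predicted):
    """Formula (14): MAE divided by the range of the predicted values."""
    n = len(actual)
    mae = sum(abs(p - y) for y, p in zip(actual, predicted)) / n
    return mae / (max(predicted) - min(predicted))

def tic(actual, predicted):
    """Formula (15): Theil's inequality coefficient, between 0 and 1."""
    n = len(actual)
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)
    denom = (math.sqrt(sum(y * y for y in actual) / n)
             + math.sqrt(sum(p * p for p in predicted) / n))
    return rmse / denom

# Made-up normalized power values for illustration.
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.1, 0.5, 0.5, 0.9]
scores = (nrmse(actual, predicted), nmae(actual, predicted), tic(actual, predicted))
```

Every error in this example has magnitude 0.1 and the predicted range is 0.8, so NRMSE and NMAE both equal 0.125; a perfect forecast would give 0 for all three indicators.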

3. Results and Discussion

3.1. Data Description and Experiment Settings

3.1.1. Data Description

The real wind power data from the Wind Forecasting track of the Global Energy Forecasting Competition 2012 (GEFCom2012) are used to test the predictive performance of MVEW-DNN [46]. This data set contains the wind power measurement values and wind speed data from seven wind farms (WF1–WF7). The time resolution of the data is one hour. In [46], the data set was normalized. The data set contains data from 1 a.m. on 1 July 2009 to 12 a.m. on 26 June 2012. The entire data set contains 18,757 samples. In the experiments, we consider forecasts for the next 24 h. To effectively test the predictive models, the data from the last 24 h of the data set are first excluded. Then, the data set is divided into a training set (75%) and a testing set (25%).
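The split described above can be illustrated as follows. The sketch assumes a chronological split (earlier samples for training, later samples for testing), which the text does not state explicitly, and uses integer placeholders instead of real measurements; `split_series` is a hypothetical helper.

```python
def split_series(series, horizon=24, train_fraction=0.75):
    """Drop the last `horizon` samples, then split the rest chronologically."""
    usable = series[:-horizon]               # exclude the last 24 hourly samples
    cut = int(len(usable) * train_fraction)  # assumption: fractional part truncated
    return usable[:cut], usable[cut:]

# Placeholder series with the same length as the data set (18,757 hourly samples).
samples = list(range(18757))
train, test = split_series(samples)
```

Under these assumptions, 18,733 samples remain after the exclusion, of which 14,049 go to the training set and 4,684 to the testing set.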

Wind power data form a random, seasonal, nonlinear time series. Figure 2 displays the wind power data characteristics of the seven wind farms (wp1–wp7) in a box plot. By analyzing the data characteristics of the seven wind farms, we find that there are inevitably abnormal data points in the data set. The abnormal data points are mainly caused by uncontrollable factors such as breakdown and operation planning [47].

Figure 3 shows some data points that correspond to high wind speeds and zero power generation. These abnormal data points are mainly caused by the shutdown of the turbine [48]. Therefore, prediction systems are required to provide high prediction accuracy and robustness when performing SWPF tasks.

3.1.2. Experiment Settings

The parameter settings of the proposed MVEW-DNN model are given as follows. MVEW-DNN mainly consists of the local learner (DBN) and the global learner (deBLS). First, when the original data are decomposed by the CEEMDAN algorithm, the signal-to-noise ratio of CEEMDAN ($N_{std}$) is set to 0.01, the number of noise additions is set to 6, and the maximum number of iterations is 2000. Then, the decomposed sub-series (IMFs) are split into a training set (train-x, train-y) and a testing set (test-x, test-y) in a ratio of 8 to 2. The training data set is used as input data, and the input layer of the local learner (DBN) has 20 nodes. We adopt three hidden layers with 100 nodes in each layer. Each sigmoid activation function is optimized. For the global learner, we directly use the original normalized wind power measurement values for the training task. We set the number of mapped features to 73, and the dimension of the mapped features is 6. The mapped features are mapped to the enhancement nodes. Every enhancement node group has 4 enhancement nodes. All experiments are implemented using MATLAB on a laptop equipped with an Intel i7 1.8 GHz CPU.

3.2. Models

To verify the predictive performance of the MVEW-DNN model, we compare it with some state-of-the-art and conventional predictive models in the SWPF task. These models are described as follows:

Autoregressive Integrated Moving Average (ARIMA) [49] is a seasonal model expressed as $ARIMA(p, d, q)(P, D, Q)_{m}$. Here, $m$ refers to the number of periods in each season, and $P$, $D$, and $Q$ refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model, respectively.
Random Vector Functional Link (RVFL) Network [50] is a multilayer perceptron (MLP) whose input and output are directly linked. Only the output weights are adaptive parameters; the remaining parameters are set to random values that are independently preselected. RVFL can also obtain promising prediction performance on SWPF tasks.
MOGWO-ELM [51] can provide promising SWPF by integrating the variational mode decomposition (VMD), the extreme learning machine model, the error factor, and a nonlinear ensemble method.
IVMD-SE-MCC-LSTM [52] is composed of the improved variational mode decomposition (IVMD), sample entropy (SE), the maximum correntropy criterion (MCC), and long short-term memory (LSTM) neural network. Here, the parameter K of the IVMD is determined by the MCC; the decomposed subseries is reconstructed by SE to improve the prediction efficiency. Then, the MCC is also utilized to replace the mean square error in the classic LSTM network.
Multi-view Neural Network Ensemble [41] is an ensemble of Radial Basis Function Neural Networks (RBFNN). In this ensemble neural network, a long short-term memory network (LSTM) and a multi-resolution wavelet transform are first used to extract the features for training. Then, the extracted feature data is input into multiple RBFNN networks for prediction. The output layer of the Multi-view Neural Network Ensemble is a local generalization error model, which assigns the corresponding weights to the output of multiple RBFNN networks. Finally, these output results of RBFNN are weighted and summed to provide the final predictive results.

3.3. Results

The NRMSE, NMAE, and TIC metrics are used to evaluate the predictive performance of the above six models on wind farm validation data (WF1–WF7). Table 1 shows that the proposed MVEW-DNN model provides lower NRMSE, NMAE, and TIC values than ARIMA, RVFL, MOGWO-ELM, IVMD-SE-MCC-LSTM, and Multi-view Neural Network Ensemble on WF1–WF7. For instance, the predictive results given by IVMD-SE-MCC-LSTM for NRMSE, NMAE, and TIC on WF1 are 0.2547, 0.2058, and 0.4131, respectively, whereas the proposed MVEW-DNN gives 0.2103, 0.1603, and 0.3381 for NRMSE, NMAE, and TIC, respectively. Furthermore, Figure 4 provides a visual comparison of the predictions of the six models on WF3, in which the proposed MVEW-DNN model has the best predictive performance among all compared models. Table 1 and Figure 4 indicate that our model provides the best nonlinear approximation capability, robustness, and spatial learning ability among all six models.

Figure 5 shows the NRMSE, NMAE, and TIC results for the six predictive models on WF1-WF7. We can clearly see that the proposed MVEW-DNN model achieves the best predictive performance. This further indicates that our model has the best nonlinear approximation capability, robustness, and spatial learning ability among all six models.

3.4. Ablation Investigations of MVEW-DNN

Note that the proposed MVEW-DNN model covers the local view learning subnetwork (CEEMDAN-DBN), the global view learning subnetwork (deBLS), and FDDM. To evaluate the effectiveness of the local view learning subnetwork (CEEMDAN-DBN), the global view learning subnetwork (deBLS), and FDDM, ablation investigations are performed.

3.4.1. Effect of the Local View Learning Subnetwork

The local view learning subnetwork covers CEEMDAN and DBN. To evaluate the performance of the local view learning subnetwork (CEEMDAN-DBN), it is compared with DBN. The experimental results are listed in Table 2, where we can observe that CEEMDAN-DBN has better results for NRMSE, NMAE, and TIC on WF1–WF7 than DBN. For instance, CEEMDAN-DBN can provide NRMSE = 0.2555 on WF4, whereas DBN provides NRMSE = 0.3024 on WF4. CEEMDAN-DBN can provide NMAE = 0.2054 on WF4, but DBN provides NMAE = 0.2583 on WF4. These findings indicate that CEEMDAN-DBN has better nonlinear approximation capability than the single DBN. Furthermore, CEEMDAN-DBN can provide TIC = 0.3364 on WF4, whereas the single DBN achieves TIC = 0.4077. This implies that CEEMDAN-DBN has better spatial learning ability than the single DBN. From the above analyses, CEEMDAN-DBN can provide promising predictive performance on WF1–WF7.

3.4.2. Effect of the Global Network

The proposed global view learning subnetwork, deBLS, is composed of the attention mechanism, the additional enhancement nodes, and BLS. To assess the performance of deBLS, it is compared with BLS, BLS with the additional enhancement nodes (BLS-AEN), and BLS with the attention mechanism (BLS-A). The experimental results are presented in Table 3. It can be seen that deBLS achieves the lowest NRMSE and NMAE among the four models on all of WF1–WF7, and the lowest TIC on WF1, WF4, WF5, and WF6. For example, deBLS provides NRMSE = 0.2956 on WF4, whereas BLS, BLS-AEN, and BLS-A provide NRMSE = 0.3074, 0.3028, and 0.2999, respectively. The NMAE of deBLS on WF4 is 0.245, versus 0.2521, 0.2531, and 0.251 for BLS, BLS-AEN, and BLS-A, respectively. These results suggest that deBLS has better nonlinear approximation capability than BLS, BLS-AEN, and BLS-A. Furthermore, deBLS provides TIC = 0.4053 on WF4, versus TIC = 0.4174, 0.4215, and 0.4206 for BLS, BLS-AEN, and BLS-A, respectively. This implies that deBLS has better spatial learning ability. Based on the above analyses, deBLS has competitive predictive performance on WF1–WF7.

3.4.3. Effect of FDDM

Our MVEW-DNN consists of CEEMDAN-DBN, deBLS, and FDDM. To better assess the performance of MVEW-DNN, it is compared with CEEMDAN-DBN and deBLS. The experimental results are displayed in Table 4. The proposed MVEW-DNN model has better NRMSE, NMAE, and TIC results on WF1–WF7 than either CEEMDAN-DBN or deBLS. For example, on WF4 the proposed MVEW-DNN model provides NRMSE = 0.2264 and NMAE = 0.1784, versus NRMSE = 0.2555 and NMAE = 0.2054 for CEEMDAN-DBN and NRMSE = 0.2956 and NMAE = 0.245 for deBLS. This indicates that the proposed MVEW-DNN has better nonlinear approximation capability than CEEMDAN-DBN and deBLS. Furthermore, the proposed MVEW-DNN model provides TIC = 0.2954 on WF4, versus TIC = 0.3364 and TIC = 0.4053 for CEEMDAN-DBN and deBLS, respectively. This means that the proposed MVEW-DNN model has better spatial learning ability. Moreover, the above analyses also imply that FDDM can effectively integrate CEEMDAN-DBN and deBLS to improve the predictive performance of MVEW-DNN.
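
To give a feel for what combining two subnetwork outputs with fusion weights involves, the following sketch fits two scalar weights by ordinary least squares and forms a weighted sum of a global and a local prediction series. This is not the FDDM procedure itself, which is defined in Section 2.3; the least-squares fit, the scalar weights `w_b` and `w_d` (named after the fusion weights of deBLS and DBN in the nomenclature), and both function names are assumptions made purely for illustration.

```python
def fit_fusion_weights(global_pred, local_pred, target):
    """Least-squares weights (w_b, w_d) so that target ~ w_b*global + w_d*local.

    Solves the 2x2 normal equations directly (assumed fitting rule, not FDDM).
    """
    gg = sum(g * g for g in global_pred)
    ll = sum(l * l for l in local_pred)
    gl = sum(g * l for g, l in zip(global_pred, local_pred))
    gy = sum(g * y for g, y in zip(global_pred, target))
    ly = sum(l * y for l, y in zip(local_pred, target))
    det = gg * ll - gl * gl
    if abs(det) < 1e-12:
        raise ValueError("the two prediction series are collinear")
    w_b = (gy * ll - ly * gl) / det
    w_d = (ly * gg - gy * gl) / det
    return w_b, w_d

def fuse(global_pred, local_pred, w_b, w_d):
    """Weighted combination of the global and local predictions."""
    return [w_b * g + w_d * l for g, l in zip(global_pred, local_pred)]
```

In such a scheme the weights would be fitted on held-out data and then applied unchanged to new predictions from the two subnetworks.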

To highlight the differences in prediction performance, we visualize the data from Tables 2–4 as radar charts in Figure 6. CEEMDAN-DBN outperforms DBN in predicting wind power generation on WF1–WF7, which supports the effectiveness of CEEMDAN. Moreover, both BLS-AEN and BLS-A have better prediction performance than BLS, validating the effectiveness of the attention mechanism and the additional enhancement nodes. The proposed deBLS outperforms BLS-A and BLS-AEN, further suggesting that combining the attention mechanism with the additional enhancement nodes improves the prediction performance. MVEW-DNN outperforms CEEMDAN-DBN and deBLS, indicating that FDDM provides an effective combination of CEEMDAN-DBN and deBLS.

3.5. Parameter Selection Experiments

Our proposed model involves only one key parameter, $ω$, which is used for the attention mechanism. It serves to adjust the output layer weights $W^{m}$ (see Algorithm 1). Different values of $ω$ inevitably lead to discrepancies in the global learner predictions. We apply different values of $ω$ to optimize the global learner on different sub-data sets. Some representative results are listed in Table 5. It can be seen that the different values of $ω$ provide similar results for NRMSE, NMAE, and TIC on WF3. This indicates that MVEW-DNN is insensitive to $ω$, which further implies that MVEW-DNN has promising robustness in predictive performance.
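
To make "similar results" concrete, the spread of each metric across the tested values of ω can be read directly from Table 5. The sketch below reuses only the Table 5 numbers; the dictionary layout and the function name `metric_spread` are illustrative.

```python
# Table 5: results on WF3 for different values of the attention parameter omega.
table5 = {
    0.5: {"NRMSE": 0.3275, "NMAE": 0.2794, "TIC": 0.3881},
    0.6: {"NRMSE": 0.3268, "NMAE": 0.2692, "TIC": 0.3857},
    0.7: {"NRMSE": 0.3263, "NMAE": 0.2792, "TIC": 0.3835},
    0.8: {"NRMSE": 0.3257, "NMAE": 0.2891, "TIC": 0.3814},
}

def metric_spread(metric):
    """Difference between the largest and smallest value of a metric over all omega."""
    values = [row[metric] for row in table5.values()]
    return max(values) - min(values)

for metric in ("NRMSE", "NMAE", "TIC"):
    print(f"{metric}: spread = {metric_spread(metric):.4f}")
```

On these values the NRMSE varies by about 0.002 and the TIC by about 0.007 over the tested range, while the NMAE varies by about 0.02.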

3.6. Discussion

Based on the above experiments, our proposed MVEW-DNN has outstanding predictive performance in the SWPF field, mainly for the following reasons:

First, MVEW-DNN provides global and local view learning subnetworks, which can effectively learn more potential feature information to enhance the prediction accuracy.

Second, in MVEW-DNN, deBLS can provide higher predictive accuracy by rationally integrating the attention mechanism and the additional enhancement nodes.

Third, FDDM provides effective feature fusion between the global and local view learning subnetworks, so that the two subnetworks complement each other.

Fourth, the local view learning subnetwork combines CEEMDAN and DBN to extract more potential local view features in SWPF, effectively reducing the impact of data volatility and avoiding the adverse effect of the model confounding problem on the prediction results.

4. Conclusions

In this paper, the developed MVEW-DNN model is a new width-depth integrated predictor that consists of a global view learning subnetwork and a local view learning subnetwork. The global view learning subnetwork integrates the attention mechanism and the additional enhancement nodes, which gives it the advantages of low computational cost and high prediction accuracy. The local view learning subnetwork combines CEEMDAN and DBN to extract better potential local view features, enhancing the predictive accuracy and robustness. FDDM provides an effective feature fusion between the global and local view learning subnetworks, further enhancing the predictive accuracy and robustness. Therefore, the proposed MVEW-DNN provides better predictive performance, e.g., nonlinear approximation capability and spatial learning ability, than the state-of-the-art and conventional predictive models on the SWPF task. More accurate forecasts can improve wind power scheduling and production planning, which relieves the pressure on the power system for peak and frequency regulation and improves wind energy utilization. Table 6 shows that the high computation time of our model is mainly due to the local view learning subnetwork (CEEMDAN-DBN). Therefore, in the future, we will consider how to effectively reduce the computational cost of the proposed MVEW-DNN.
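
The statement about computation time can be checked against Table 6 by computing the share of the total time spent in CEEMDAN-DBN. The sketch below uses only the Table 6 values; the tuple layout and the function name `local_share` are illustrative.

```python
# Table 6, in seconds: (deBLS, CEEMDAN-DBN, FDDM, MVEW-DNN total)
table6 = {
    "WF1": (0.654, 142.582, 0.984, 144.22),
    "WF2": (0.833, 135.888, 0.994, 137.715),
    "WF3": (0.438, 149.442, 0.643, 150.523),
    "WF4": (0.394, 160.626, 0.717, 161.737),
    "WF5": (0.423, 188.86, 0.815, 190.098),
    "WF6": (0.417, 163.324, 0.893, 164.634),
    "WF7": (0.433, 172.055, 0.927, 173.415),
}

def local_share(farm):
    """Percentage of the total MVEW-DNN time spent in CEEMDAN-DBN."""
    _, local, _, total = table6[farm]
    return local / total * 100.0

for farm in table6:
    print(f"{farm}: CEEMDAN-DBN takes {local_share(farm):.1f}% of the total time")
```

For the values in Table 6, the three component times add up to the reported total, and CEEMDAN-DBN accounts for more than 98% of it on every wind farm.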

Author Contributions

Conceptualization, J.H., J.W. and Z.L.; methodology, J.H., J.W. and Z.L.; software, J.W., J.H. and Z.L.; validation, J.W.; formal analysis, J.H., J.W. and Z.L.; investigation, J.W.; resources, J.W.; data curation, J.W.; writing—original draft preparation, J.H., J.W. and Z.L.; writing—review and editing, C.L., J.H. and Z.L.; visualization, J.W.; supervision, C.L. and P.X.L.; project administration, C.L. and J.H.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grants 61863028, 62173176, 81660299, and 61503177, and in part by the Science and Technology Department of Jiangxi Province of China under grants 20204ABC03A39, 20161ACB21007, 20171BBE50071, and 20171BAB202033.

Institutional Review Board Statement

The study did not involve humans or animals.

Informed Consent Statement

The study did not involve humans.

Data Availability Statement

Publicly available data sets were analyzed in this study. These data can be found here: [https://www.kaggle.com/c/GEF2012-wind-forecasting/data] (accessed on 14 October 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

SWPF	Short-term wind power forecasting
CEEMDAN	Complete ensemble empirical mode decomposition with adaptive noise
BLS-AEN	BLS network with additional enhancement nodes
BLS-A	BLS network with attention mechanism
deBLS	Deep encoder broad learning system
MVEW-DNN	Multi-view ensemble-based width-depth neural network
Variables
$X (t)$	The original wind power data
$\overline{IMF_{k}}(t)$	The $k$th decomposition in CEEMDAN
$W_{e_{i}}$	The randomly generated weight matrix
$δ_{e_{i}}$	The randomly generated bias matrix
$J_{n}$	The feature nodes
$J^{n}$	The feature nodes group
$E_{m}$	The enhancement nodes
$E^{m}$	The enhancement nodes group
$B$	The combination matrix of $J^{n}$ and $E_{m}$
$P$	The output of our proposed model
$W_{ℬ}$	The fusion weight of deBLS
$W_{D}$	The fusion weight of DBN
Indices
k	The IMF index
n	The index of the feature nodes
m	The index of the enhancement nodes
$N_{std}$	The signal-to-noise ratio
$ω$	The adjusting weights in attention mechanism algorithm
$C$	The regularization parameter for sparse regularization
$S$	The shrinkage parameter in deBLS
$ρ$	The correlation test threshold

References

Hong, T.; Pinson, P.; Wang, Y.; Weron, R.; Yang, D.; Zareipour, H. Energy forecasting: A review and outlook. IEEE Open Access J. Power Energy 2020, 7, 376–388.
Singh, U.; Rizwan, M. A Systematic Review on Selected Applications and Approaches of Wind Energy Forecasting and Integration. J. Inst. Eng. Ser. B 2021, 102, 1061–1078.
Kerem, A.; Saygin, A.; Rahmani, R. A green energy research: Forecasting of wind power for a cleaner environment using robust hybrid metaheuristic model. Environ. Sci. Pollut. Res. 2021.
Liu, H.; Li, Y.; Duan, Z.; Chen, C. A review on multi-objective optimization framework in wind energy forecasting techniques and applications. Energy Convers. Manag. 2020, 224, 113324.
Maciejowska, K.; Nitka, W.; Weron, T. Enhancing load, wind and solar generation for day-ahead forecasting of electricity prices. Energy Econ. 2021, 99, 105273.
Rodriguez, H.; Flores, J.J.; Morales, L.A.; Lara, C.; Guerra, A.; Manjarrez, G. Forecasting from incomplete and chaotic wind speed data. Soft Comput. 2019, 23, 10119–10127.
Yang, B.; Zhong, L.; Wang, J.; Shu, H.; Zhang, X.; Yu, T.; Sun, L. State-of-the-art one-stop handbook on wind forecasting technologies: An overview of classifications, methodologies, and analysis. J. Clean. Prod. 2021, 283, 124628.
Foley, A.M.; Leahy, P.G.; Marvuglia, A.; McKeogh, E.J. Current methods and advances in forecasting of wind power generation. Renew. Energy 2012, 37, 1–8.
Zjavka, L.; Mišák, S. Direct wind power forecasting using a polynomial decomposition of the general differential equation. IEEE Trans. Sustain. Energy 2018, 9, 1529–1539.
Jacondino, W.D.; da Silva Nascimento, A.L.; Calvetti, L.; Fisch, G.; Beneti, C.A.; da Paz, S.R. Hourly day-ahead wind power forecasting at two wind farms in northeast brazil using WRF model. Energy 2021, 230, 120841.
Zhang, F.; Li, P.C.; Gao, L.; Liu, Y.Q.; Ren, X.Y. Application of autoregressive dynamic adaptive (ARDA) model in real-time wind power forecasting. Renew. Energy 2021, 169, 129–143.
Xie, W.; Zhang, P.; Chen, R.; Zhou, Z. A nonparametric Bayesian framework for short-term wind power probabilistic forecast. IEEE Trans. Power Syst. 2018, 34, 371–379.
Zhang, J.; Wang, C. Application of ARMA model in ultra-short term prediction of wind power. In Proceedings of the 2013 International Conference on Computer Sciences and Applications IEEE, Washington, DC, USA, 14–15 December 2013; pp. 361–364.
Jia, M.; Shen, C.; Wang, Z. A distributed incremental update scheme for probability distribution of wind power forecast error. Int. J. Electr. Power Energy Syst. 2020, 121, 106151.
Zhang, C.; Peng, T.; Nazir, M.S. A novel hybrid approach based on variational heteroscedastic Gaussian process regression for multi-step ahead wind speed forecasting. Int. J. Electr. Power Energy Syst. 2022, 136, 107717.
He, Y.; Zhang, W. Probability density forecasting of wind power based on multi-core parallel quantile regression neural network. Knowl.-Based Syst. 2020, 209, 106431.
Pearre, N.S.; Swan, L.G. Statistical approach for improved wind speed forecasting for wind power production. Sustain. Energy Technol. Assess. 2018, 27, 180–191.
Hu, T.; Wu, W.; Guo, Q.; Sun, H.; Shi, L.; Shen, X. Very short-term spatial and temporal wind power forecasting: A deep learning approach. CSEE J. Power Energy Syst. 2019, 6, 434–443.
Zendehboudi, A.; Baseer, M.A.; Saidur, R. Application of support vector machine models for forecasting solar and wind energy resources: A review. J. Clean. Prod. 2018, 199, 272–285.
Li, R.; Ke, Y.Q.; Zhang, X.Q. Wind power forecasting based on time series and SVM. Electr. Power 2012, 45, 64–68.
Sun, W.; Wang, Y. Short-term wind speed forecasting based on fast ensemble empirical mode decomposition, phase space reconstruction, sample entropy and improved back-propagation neural network. Energy Convers. Manag. 2018, 157, 1–12.
Hu, H.; Wang, L.; Lv, S.X. Forecasting energy consumption and wind power generation using deep echo state network. Renew. Energy 2020, 154, 598–613.
Shetty, R.P.; Sathyabhama, A.; Pai, P.S. An efficient online sequential extreme learning machine model based on feature selection and parameter optimization using cuckoo search algorithm for multi-step wind speed forecasting. Soft Comput. 2021, 25, 1277–1295.
Chen, C.P.; Liu, Z. Broad learning system: An effective and efficient incremental learning system without the need for deep architecture. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 10–24.
Hu, Q.; Zhang, R.; Zhou, Y. Transfer learning for short-term wind speed prediction with deep neural networks. Renew. Energy 2016, 85, 83–95.
Xu, L.; Chen, C.P.; Han, R. Sparse Bayesian broad learning system for probabilistic estimation of prediction. IEEE Access 2020, 8, 56267–56280.
Yan, J.; Liu, Y.; Han, S.; Wang, Y.; Feng, S. Reviews on uncertainty analysis of wind power forecasting. Renew. Sustain. Energy Rev. 2015, 52, 1322–1330.
Jiajun, H.; Chuanjin, Y.; Yongle, L.; Huoyue, X. Ultra-short term wind prediction with wavelet transform, deep belief network and ensemble learning. Energy Convers. Manag. 2020, 205, 112418.
Chen, C.R.; Ouedraogo, F.B.; Chang, Y.M.; Larasati, D.A.; Tan, S.W. Hour-Ahead Photovoltaic Output Forecasting Using Wavelet-ANFIS. Mathematics 2021, 9, 2438.
Wang, G.; Wang, X.; Wang, Z.; Ma, C.; Song, Z. A VMD–CISSA–LSSVM Based Electricity Load Forecasting Model. Mathematics 2022, 10, 28.
Devi, A.S.; Maragatham, G.; Boopathi, K.; Rangaraj, A.G. Hourly day-ahead wind power forecasting with the EEMD-CSO-LSTM-EFG deep learning technique. Soft Comput. 2020, 24, 12391–12411.
Zhao, X.; Bai, M.; Yang, X.; Liu, J.; Yu, D.; Chang, J. Short-term probabilistic predictions of wind multi-parameter based on one-dimensional convolutional neural network with attention mechanism and multivariate copula distribution estimation. Energy 2021, 234, 121306.
Niu, Z.; Yu, Z.; Tang, W.; Wu, Q.; Reformat, M. Wind power forecasting using attention-based gated recurrent unit network. Energy 2020, 196, 117081.
Shahid, F.; Zameer, A.; Muneeb, M. A novel genetic LSTM model for wind power forecast. Energy 2021, 223, 120069.
Khan, N.; Ullah, F.U.M.; Haq, I.U.; Khan, S.U.; Lee, M.Y.; Baik, S.W. AB-Net: A Novel Deep Learning Assisted Framework for Renewable Energy Generation Forecasting. Mathematics 2021, 9, 2456.
Duan, J.; Wang, P.; Ma, W.; Fang, S.; Hou, Z. A novel hybrid model based on nonlinear weighted combination for short-term wind power forecasting. Int. J. Electr. Power Energy Syst. 2022, 134, 107452.
Wu, Y.K.; Su, P.E.; Hong, J.S. Stratification-based wind power forecasting in a high-penetration wind power system using a hybrid model. IEEE Trans. Ind. Appl. 2016, 52, 2016–2030.
Ogliari, E.; Guilizzoni, M.; Giglio, A.; Pretto, S. Wind power 24-h ahead forecast by an artificial neural network and an hybrid model: Comparison of the predictive performance. Renew. Energy 2021, 178, 1466–1474.
Hong, Y.Y.; Rioflorido, C.L.P.P. A hybrid deep learning-based neural network for 24-h ahead wind power forecasting. Appl. Energy 2019, 250, 530–539.
Ribeiro, M.H.D.M.; da Silva, R.G.; Moreno, S.R.; Mariani, V.C.; dos Santos Coelho, L. Efficient bootstrap stacking ensemble learning model applied to wind power generation forecasting. Int. J. Electr. Power Energy Syst. 2022, 136, 107712.
Lai, C.S.; Yang, Y.; Pan, K.; Zhang, J.; Yuan, H.; Ng, W.W.; Gao, Y.; Zhao, Z.; Wang, T.; Shahidehpour, M.; et al. Multi-view neural network ensemble for short and mid-term load forecasting. IEEE Trans. Power Syst. 2020, 36, 2992–3003.
Nguyen, L.H.; Pan, Z.; Openiyi, O.; Abu-gellban, H.; Moghadasi, M.; Jin, F. Self-boosted time-series forecasting with multi-task and multi-view learning. arXiv 2019, arXiv:1909.08181.
Zhong, C.; Lai, C.S.; Ng, W.W.; Tao, Y.; Wang, T.; Lai, L.L. Multi-view deep forecasting for hourly solar irradiance with error correction. Sol. Energy 2021, 228, 308–316.
Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A Complete Ensemble Empirical Mode Decomposition with Adaptive Noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 4144–4147.
Flores, J.H.F.; Engel, P.M.; Pinto, R.C. Autocorrelation and Partial Autocorrelation Functions to Improve Neural Networks Models on Univariate Time Series Forecasting. In Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN) IEEE, San Diego, CA, USA, 10–15 June 2012; pp. 1–8.
Global Energy Forecasting Competition 2012—Wind Forecasting. Available online: https://www.kaggle.com/c/GEF2012-wind-forecasting/data (accessed on 14 October 2021).
Kisvari, A.; Lin, Z.; Liu, X. Wind power forecasting—A data-driven method along with gated recurrent neural network. Renew. Energy 2021, 163, 1895–1909.
Zhu, Y.; Zhu, C.; Song, C.; Li, Y.; Chen, X.; Yong, B. Improvement of reliability and wind power generation based on wind turbine real-time condition assessment. Int. J. Electr. Power Energy Syst. 2019, 113, 344–354.
Putz, D.; Gumhalter, M.; Auer, H. A novel approach to multi-horizon wind power forecasting based on deep neural architecture. Renew. Energy 2021, 178, 494–505.
Ren, Y.; Suganthan, P.N.; Srikanth, N.; Amaratunga, G. Random vector functional link network for short-term electricity load demand forecasting. Inf. Sci. 2016, 367, 1078–1093.
Hao, Y.; Tian, C. A novel two-stage forecasting model based on error factor and ensemble method for multi-step wind power forecasting. Appl. Energy 2019, 238, 368–383.
Duan, J.; Wang, P.; Ma, W. Short-term wind power forecasting using the hybrid model of improved variational mode decomposition and Correntropy Long Short-term memory neural network. Energy 2021, 214, 118980.

Figure 1. MVEW-DNN consists of the CEEMDAN-DBN, deBLS, and FDDM. deBLS is established for the global view. CEEMDAN-DBN is established for the local view.

Figure 2. The wind power data characteristics of the 7 wind farms. The green triangles and circles represent the outliers and the average values of the data set, respectively. The short purple lines on the upper side represent the maximum values, while those on the lower side represent the minimum values. The short green lines represent the median values.

Figure 3. Wind power curve of WF3 under 1-h sampling rate. The blue points refer to the sampling moments. The vertical coordinates of the blue points indicate the wind power measurement values at the sampling moments, and the horizontal coordinates of the points are the corresponding wind speed data.

Figure 4. Prediction results of the above models on the WF3 data set. (a) Prediction results from the 6 models for WF3 wind power data forecasted 24 h in advance. (b) Partial enlargement of the 6 model predictions.

Figure 5. The NRMSE, NMAE, and TIC assessment results: (a) comparison of prediction results from 6 models on NMAE; (b) comparison of prediction results from 6 models on TIC; (c) comparison of prediction results from 6 models on NRMSE.

Figure 6. Ablation experiments on NMAE, TIC, and NRMSE evaluation indicators for 7 wind power data sets: (a) results of ablation experiments on NMAE evaluation indicators for 7 wind power data sets; (b) results of ablation experiments on TIC evaluation indicator for 7 wind power data sets; (c) results of ablation experiments on NRMSE evaluation indicator for 7 wind power data sets.

Table 1. The NRMSE, NMAE, and TIC assessment results of the 6 models.

Data Set	Metrics	ARIMA	RVFL	MOGWO-ELM	IVMD-SE-MCC-LSTM	Multi-View Neural Network Ensemble	MVEW-DNN
WF1	NRMSE	0.4477	0.3417	0.2534	0.2547	0.2758	0.2103
	NMAE	0.3285	0.2505	0.2042	0.2058	0.1916	0.1603
	TIC	0.5878	0.651	0.4118	0.4131	0.6618	0.3381
WF2	NRMSE	0.4744	0.3494	0.2761	0.2766	0.2944	0.2236
	NMAE	0.3458	0.2575	0.2306	0.2275	0.241	0.173
	TIC	0.571	0.6543	0.4219	0.4253	0.4145	0.3281
WF3	NRMSE	0.5442	0.4566	0.3188	0.3119	0.2839	0.2378
	NMAE	0.4178	0.3373	0.2744	0.268	0.2386	0.1879
	TIC	0.5503	0.7596	0.398	0.3865	0.3714	0.2784
WF4	NRMSE	0.5008	0.4023	0.2992	0.2925	0.2664	0.2264
	NMAE	0.3686	0.2935	0.2505	0.2465	0.1859	0.1784
	TIC	0.5586	0.7166	0.4217	0.4035	0.6372	0.2954
WF5	NRMSE	0.4978	0.425	0.3208	0.3221	0.3863	0.2502
	NMAE	0.36	0.308	0.2591	0.2603	0.2738	0.1988
	TIC	0.5838	0.7119	0.453	0.4535	0.7101	0.3113
WF6	NRMSE	0.4804	0.4057	0.2915	0.2882	0.302	0.2328
	NMAE	0.356	0.2936	0.2414	0.2349	0.208	0.1821
	TIC	0.5504	0.7219	0.4078	0.4092	0.6483	0.2977
WF7	NRMSE	0.5041	0.4058	0.3025	0.2973	0.3872	0.2271
	NMAE	0.368	0.2953	0.2599	0.2541	0.2367	0.1782
	TIC	0.5492	0.7502	0.4218	0.4108	0.6439	0.2926

Table 2. The assessment results of DBN and CEEMDAN-DBN on the three metrics.

Data Set	Metrics	CEEMDAN-DBN	DBN
WF1	NRMSE	0.2519	0.2702
	NMAE	0.1925	0.2172
	TIC	0.3388	0.4146
WF2	NRMSE	0.2454	0.2778
	NMAE	0.1979	0.2338
	TIC	0.3595	0.4141
WF3	NRMSE	0.2845	0.3278
	NMAE	0.2346	0.2798
	TIC	0.3275	0.4021
WF4	NRMSE	0.2555	0.3024
	NMAE	0.2054	0.2583
	TIC	0.3364	0.4077
WF5	NRMSE	0.2736	0.3327
	NMAE	0.2204	0.2678
	TIC	0.3592	0.4561
WF6	NRMSE	0.2533	0.3002
	NMAE	0.2124	0.2506
	TIC	0.3155	0.4049
WF7	NRMSE	0.2512	0.3178
	NMAE	0.2102	0.2745
	TIC	0.3296	0.4195

Table 3. The assessment results of BLS, BLS-AEN, BLS-A, and deBLS.

Data Set	Metrics	BLS	BLS-AEN	BLS-A	deBLS
WF1	NRMSE	0.2776	0.2713	0.2628	0.2548
	NMAE	0.2223	0.2184	0.213	0.209
	TIC	0.4082	0.4128	0.3998	0.3783
WF2	NRMSE	0.2817	0.2783	0.2759	0.2757
	NMAE	0.2329	0.2313	0.2255	0.2241
	TIC	0.418	0.4219	0.4257	0.4362
WF3	NRMSE	0.3238	0.3199	0.3127	0.3126
	NMAE	0.2731	0.2742	0.2669	0.2628
	TIC	0.3863	0.3975	0.3791	0.3879
WF4	NRMSE	0.3074	0.3028	0.2999	0.2956
	NMAE	0.2521	0.2531	0.251	0.245
	TIC	0.4174	0.4215	0.4206	0.4053
WF5	NRMSE	0.339	0.3321	0.3205	0.3179
	NMAE	0.2727	0.2683	0.2597	0.2543
	TIC	0.4451	0.4526	0.4508	0.4443
WF6	NRMSE	0.3052	0.2944	0.2925	0.2892
	NMAE	0.2462	0.2428	0.2424	0.2376
	TIC	0.408	0.4117	0.4071	0.3917
WF7	NRMSE	0.3203	0.3054	0.3023	0.2997
	NMAE	0.2652	0.2608	0.2576	0.2521
	TIC	0.4181	0.4247	0.4016	0.4085

Table 4. The results of CEEMDAN-DBN, deBLS, and the proposed MVEW-DNN model.

Data Set	Metrics	CEEMDAN-DBN	deBLS	MVEW-DNN
WF1	NRMSE	0.2519	0.2548	0.2103
	NMAE	0.1925	0.209	0.1603
	TIC	0.3388	0.3783	0.3381
WF2	NRMSE	0.2454	0.2757	0.2236
	NMAE	0.1979	0.2241	0.173
	TIC	0.3595	0.4362	0.3281
WF3	NRMSE	0.2845	0.3126	0.2378
	NMAE	0.2346	0.2628	0.1879
	TIC	0.3275	0.3879	0.2784
WF4	NRMSE	0.2555	0.2956	0.2264
	NMAE	0.2054	0.245	0.1784
	TIC	0.3364	0.4053	0.2954
WF5	NRMSE	0.2736	0.3179	0.2502
	NMAE	0.2204	0.2543	0.1988
	TIC	0.3592	0.4443	0.3113
WF6	NRMSE	0.2533	0.2892	0.2328
	NMAE	0.2124	0.2376	0.1821
	TIC	0.3155	0.3917	0.2977
WF7	NRMSE	0.2512	0.2997	0.2271
	NMAE	0.2102	0.2521	0.1782
	TIC	0.3296	0.4085	0.2926

Table 5. Effect of adjusting parameter ω with the data from WF3.

ω	NRMSE	NMAE	TIC
0.5	0.3275	0.2794	0.3881
0.6	0.3268	0.2692	0.3857
0.7	0.3263	0.2792	0.3835
0.8	0.3257	0.2891	0.3814

Table 6. The time computation costs of deBLS, CEEMDAN-DBN, FDDM, and the proposed MVEW-DNN.

Wind Farm	deBLS (s)	CEEMDAN-DBN (s)	FDDM (s)	MVEW-DNN (s)
WF1	0.654	142.582	0.984	144.22
WF2	0.833	135.888	0.994	137.715
WF3	0.438	149.442	0.643	150.523
WF4	0.394	160.626	0.717	161.737
WF5	0.423	188.86	0.815	190.098
WF6	0.417	163.324	0.893	164.634
WF7	0.433	172.055	0.927	173.415

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
