1. Introduction
When the penetration of wind power into the network reaches a certain level, system operators have difficulties in balancing generation with demand. To help address this issue, it is necessary to apply forecasting methods to estimate the wind power generated in the next few hours and days.
Several methods have been used to forecast the wind speed: stochastic methods such as the AR [1] and ARIMA [2,3,4] processes, or heuristic methods such as the Kalman filter [5,6], neural networks [7,8,9], and neuro-fuzzy systems [10,11]. Among all these methods, neural networks are the most widely used by researchers.
It is difficult to compare different methods if they do not use the same dataset and the same performance indexes. A typical approach for comparison is to use the persistence model as the reference [12].
Table 1 illustrates the improvement achieved by different forecasting methods compared to the persistence method (although different forecast horizons are used, the list gives an idea of the range of improvement). These improvements are rarely over 25%.
Wind speed series have considerable uncertainty because of weather fluctuations and added instrument noise, which makes it difficult to improve the forecasts. There are several strategies to process the data [13,14,15]. Some authors have used wavelet transforms [16] to process the time series. Most authors [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31] have used wavelets to decompose the time series into sub-series, called the approximation and details; applied the forecasting method to each sub-series; and, finally, combined the forecasting results to obtain the final solution. The advantage of this approach is that the sub-series behave better in the forecasting process than the original series. A few authors [32,33,34,35,36,37] have used other wavelet filtering techniques to eliminate the high-frequency variations and smooth the time series. In all of these papers, the authors selected the wavelet family and decomposition level with little justification; for example, the cited authors only used one wavelet family. These works used the wavelet transform as an auxiliary technique and did not study it in sufficient depth.
In this paper, the wavelet transform was analyzed thoroughly. This work demonstrates that the selection of the wavelet family and decomposition level is far more important than it has been given credit for thus far: the improvement obtained was greater than that achieved with most new forecasting methods. The result was applied to the three main types of forecasting method currently used, namely statistical, neural network, and fuzzy methods, over several forecast horizons and sample times. In all cases, the results of each method improved when the optimal wavelet filter was applied. Finally, the main contribution of the paper is to highlight the importance of data processing and to propose it as an additional phase of the forecasting method, so that both steps are optimized together.
The rest of this paper is structured as follows. Section 2 explains the basic concepts of the wavelet transform. Section 3 presents the different forecasting approaches. Section 4 describes the proposed forecasting approach. In Section 5, the comparison criteria used to evaluate the improvement of each method are explained. Section 6 presents the results for the different methods considered. Finally, Section 7 draws the main conclusions of this research.
2. Basic Concepts of the Wavelet Transform
Fourier analysis is commonly used to help analyze different types of signals. With this method, a signal f(t) is expressed as a linear decomposition of real-valued functions of t, as shown in Equation (1):

f(t) = Σk ak ϕk(t)  (1)

where ak are the real-valued expansion coefficients and ϕk(t) are a set of real-valued functions of t called the expansion set. In Fourier series, these are sin(kω0t) and cos(kω0t), with frequencies of kω0.
2.1. Wavelet Transform
An introduction to the wavelet transform can be found in [16]. In Equation (2), the signal is decomposed into coefficients aj,k and functions Ψj,k(t), which depend on the parameters j and k:

f(t) = Σj Σk aj,k Ψj,k(t)  (2)

where Ψj,k are the wavelet expansion functions and aj,k is the discrete wavelet transform of f(t), that is, the set of expansion coefficients.
The wavelet expansion functions, or family of wavelets, are generated from a mother wavelet Ψ(t) by scaling and translation:

Ψj,k(t) = 2^(j/2) Ψ(2^j t − k)  (3)

where the parameter k translates the function and the parameter j scales it. Figure 1 shows the translating and scaling operations of the function Ψ(t).
2.2. Multi-Resolution Formulation of Wavelet Systems
In multi-resolution analysis, the resolution of the approximation of f(t) depends on the choice of j in Equation (2). For a value of j = j0, the equation is:

f(t) ≈ Σk cj0,k φj0,k(t)  (4)
For low values of j, the approximation of f(t) can represent only coarse information. In multi-resolution formulation, φj0,k(t) are called scaling functions. If we want to represent detailed information, then high values of j are required.
However, there is another way to describe a signal with better resolution without increasing j. This approach describes the difference between the approximation and the original signal with a combination of other functions, called wavelets Ψj,k(t), and coefficients dj,k, as shown in Equation (5), where the parameters k and j again indicate the translation and scaling of the function:

f(t) = Σk cj0,k φj0,k(t) + Σj≥j0 Σk dj,k Ψj,k(t)  (5)
There are several pairs of scaling functions φ(t) and wavelets Ψ(t), as shown in Equation (6) (see Figure 2), which are chosen depending on the signal that has to be approximated:

φ(t) = Σn h0(n) √2 φ(2t − n)
Ψ(t) = Σn h1(n) √2 φ(2t − n)  (6)
In Equation (6), the coefficients h0(n) and h1(n), with n ∈ Z, are sequences of real numbers called filter coefficients. The process is similar to digital filtering, where h0(m − 2k) acts as a low-pass filter and h1(m − 2k) acts as a high-pass filter.
Figure 3 shows the decomposition process of cj into cj+1 (low frequency) and dj+1 (high frequency). The j + 1 level scaling and wavelet coefficients are:

cj+1(k) = Σm h0(m − 2k) cj(m)
dj+1(k) = Σm h1(m − 2k) cj(m)  (7)

These expressions represent the approximation and details of the signal at the j + 1 scaling level, where the index change m = 2k + n relates them to the filter coefficients of Equation (6).
This process can be repeated iteratively to further reduce the high-frequency component, as shown in Figure 4.
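The analysis step of Equation (7) can be sketched numerically; the Haar filter coefficients and the short test sequence below are illustrative assumptions:

```python
import numpy as np

def analysis_step(c, h0, h1):
    """One level of the filter bank in Figure 3: correlate the scaling
    coefficients c_j with h0 (low-pass) and h1 (high-pass), then keep
    every second output, which is the shift by 2k in Equation (7).
    The [1::2] offset assumes length-2 filters."""
    low = np.convolve(c, h0[::-1])[1::2]   # approximation c_{j+1}
    high = np.convolve(c, h1[::-1])[1::2]  # detail d_{j+1}
    return low, high

# Haar filter coefficients (the simplest choice, used here for illustration)
s = 1.0 / np.sqrt(2.0)
h0 = np.array([s, s])    # low-pass: scaled pairwise averages
h1 = np.array([s, -s])   # high-pass: scaled pairwise differences

c0 = np.array([4.0, 4.0, 2.0, 0.0])
c1, d1 = analysis_step(c0, h0, h1)
# Energy is preserved: sum(c0**2) == sum(c1**2) + sum(d1**2)
```

Iterating `analysis_step` on the successive approximations yields the multi-level decomposition of Figure 4.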
Figure 5b shows the approximation c2(k) and the details d2(k), d1(k), and d0(k) of the original signal f(t), which is shown in Figure 5a.
Wind speed time series have a high-frequency component due to wind gusts, measurement errors, and random events as well as a low-frequency component with slower variation. The high-frequency component of the signal introduces a lot of noise into forecasting methods, causing them to perform poorly. If this component is eliminated and the forecasting methods are applied to an approximation with only the low frequency component, improved results can be obtained.
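This filtering strategy (decompose, discard the details, reconstruct from the approximation alone) can be sketched with the PyWavelets library; the library choice, wavelet, level, and synthetic series are assumptions for illustration, as the paper does not name an implementation:

```python
import numpy as np
import pywt  # PyWavelets, assumed here as the wavelet implementation

# Synthetic "wind speed": a slow oscillation plus high-frequency noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 8.0 * np.pi, 1024)
wind = 8.0 + 2.0 * np.sin(t) + rng.normal(0.0, 0.8, t.size)

# Decompose, zero out all detail coefficients, and reconstruct only the
# low-frequency approximation component.
level = 3
coeffs = pywt.wavedec(wind, 'db4', level=level)
coeffs[1:] = [np.zeros_like(d) for d in coeffs[1:]]
approx = pywt.waverec(coeffs, 'db4')[:wind.size]
```

The forecasting model would then be trained on `approx` rather than on `wind`.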
2.3. Wavelet Families
There is a large number of wavelets. The selection of the wavelet function depends on the problem and on the properties of the wavelet function [16]. The main properties are its region of support and its number of vanishing moments: the region of support affects the localization capabilities of the wavelet, whereas the vanishing moments limit its ability to represent the information of a signal. In this paper, the wavelet families used were Haar, Daubechies, Symlet, Coiflet, Biorspline, and Meyer.
There are some methods to select the optimal wavelet family, but they have been developed for specific applications and it is not certain that they can be applied to forecasting problems. In the cross-correlation method [38], the optimum wavelet maximizes the cross-correlation between the signal of interest and the wavelet; in the energy method [39], the aim is to maximize the energy of the signal of interest; and in the entropy method [40], the best wavelet minimizes the entropy of the signal of interest.
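As an illustration of the entropy method, one can compare the Shannon entropy of the normalized coefficient energies across candidate families; the family list, decomposition level, synthetic signal, and use of PyWavelets are all assumptions:

```python
import numpy as np
import pywt  # PyWavelets, assumed as the wavelet implementation

def shannon_entropy(x):
    """Entropy of the normalized energy distribution of the coefficients."""
    e = x ** 2
    p = e / e.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def decomposition_entropy(signal, family, level=3):
    """Entropy of the full set of wavelet coefficients of a signal."""
    coeffs = np.concatenate(pywt.wavedec(signal, family, level=level))
    return shannon_entropy(coeffs)

families = ['haar', 'db4', 'sym4', 'coif2', 'dmey']
rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0.0, 20.0, 512)) + 0.3 * rng.normal(size=512)

# The entropy method picks the family with the most compact decomposition.
best = min(families, key=lambda f: decomposition_entropy(signal, f))
```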
3. Forecasting Models
The wavelet filter was applied to several forecasting methods, namely regression, neural network and fuzzy models.
3.1. Persistence Model
In the persistence model, the variable value at t + Δt is equal to the variable value at t. Due to its simplicity, this model was used as the reference.
3.2. Regressive Model
This model [41] is based on multiple regression, which studies the relation between a dependent variable and a set of independent variables. The independent variables may be exogenous variables, such as temperature, or intrinsic variables, such as the historical values of the dependent variable. When the model only uses the historical values, it is called an auto-regressive time series model. In this work, the model used the historical values:

ŷt = Σi=1..p αi yt−i

where αi are the auto-regressive parameters and p is the number of past values.
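The auto-regressive parameters αi can be estimated by ordinary least squares over the lagged values; this is a minimal sketch, with a synthetic AR(1) series as an assumed input:

```python
import numpy as np

def fit_ar(y, p):
    """Estimate alpha_1..alpha_p by least squares: each row of X holds the
    p most recent past values for one target sample."""
    X = np.array([y[i:i + p][::-1] for i in range(y.size - p)])
    target = y[p:]
    alpha, *_ = np.linalg.lstsq(X, target, rcond=None)
    return alpha

# A synthetic AR(1) series with alpha_1 = 0.9 should be recovered closely.
rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(1, y.size):
    y[t] = 0.9 * y[t - 1] + rng.normal(0.0, 0.1)
alpha = fit_ar(y, p=1)
```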
3.3. Neural Network Model
Neural networks [42] are auto-adaptive dynamic systems that are able to find nonlinear relations between several variables. The model used was a multilayer perceptron, which gives good results in forecasting problems:

ŷt = g(Σj wkj g(Σi wji yt−i + θj) + θk)

where θj and θk are the layer thresholds; wji and wkj are the layer weights; i and j index the neurons in each layer; and g is the activation function.
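A forward pass of such a single-hidden-layer perceptron can be sketched as follows; the tanh activation, layer sizes, and random weights are illustrative assumptions:

```python
import numpy as np

def mlp_forecast(x, w_in, b_in, w_out, b_out, g=np.tanh):
    """y = g(sum_j w_kj * g(sum_i w_ji * x_i + theta_j) + theta_k):
    an inner sum over the inputs i and an outer sum over hidden neurons j."""
    hidden = g(w_in @ x + b_in)
    return float(g(w_out @ hidden + b_out))

rng = np.random.default_rng(0)
p, n_hidden = 8, 5                      # eight lagged inputs, five hidden neurons
w_in = rng.normal(size=(n_hidden, p))   # weights w_ji
b_in = rng.normal(size=n_hidden)        # thresholds theta_j
w_out = rng.normal(size=n_hidden)       # weights w_kj
b_out = rng.normal()                    # threshold theta_k

y_hat = mlp_forecast(rng.normal(size=p), w_in, b_in, w_out, b_out)
```

In practice, the weights and thresholds would be fitted on the training set rather than drawn at random.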
3.4. Fuzzy Model
The fuzzy model [43] is based on concepts from fuzzy set theory, fuzzy if–then rules, and approximate reasoning:

ŷt = Σi w̄i ui

where w̄i are the normalized firing strengths; ui are functions that depend on the inputs yt−i; Ai is the fuzzy set that represents the input variables; and μi is the membership grade of each input yt in Ai.
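A minimal Takagi-Sugeno style inference step, standing in for the paper's fuzzy model, could look like this; the Gaussian membership functions, the two rules, and all parameter values are hypothetical:

```python
import numpy as np

def gauss(x, centre, sigma):
    """Membership grade of x in a Gaussian fuzzy set."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

def fuzzy_forecast(x, centres, sigmas, consequents):
    """Weight linear consequent functions u_i(x) = a_i*x + b_i by the
    normalized firing strengths of the rules."""
    w = np.array([gauss(x, c, s) for c, s in zip(centres, sigmas)])
    w_bar = w / w.sum()                                # normalized strengths
    u = np.array([a * x + b for a, b in consequents])  # u_i(x)
    return float(np.dot(w_bar, u))

# Two hypothetical rules, "low wind" and "high wind"
y = fuzzy_forecast(x=6.0, centres=[3.0, 9.0], sigmas=[2.0, 2.0],
                   consequents=[(0.9, 0.1), (1.0, -0.5)])
```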
4. Forecasting Approach
Wind time series have high variability due to the intrinsic uncertainty of the wind, and this variability negatively influences the forecasting results. In this paper, the adopted approach, illustrated in Figure 6, consists of using an optimized wavelet-based filter to de-noise the data used to train the chosen forecasting method. The filter is optimized with a genetic algorithm that selects the best wavelet family and the optimal decomposition level.
The algorithm receives as inputs the wind speed data and the prediction method to be used.
In step 1, a random population of individuals is created. Each individual contains the information of the parameters of the prediction method, the wavelet family, and the level of decomposition.
In step 2, each individual in the population is evaluated. The evaluation has three phases.
The first phase consists of applying the wavelet filter to the input data with the wavelet family and the level indicated by each individual. The data are divided into training and test sets. The original time series is filtered with the wavelet transform and is decomposed into an approximation component and several details of the signal. The approximation component has improved behavior in comparison to the original series in the forecasting process. Therefore, only the approximation component is used in the next phase and the details are discarded. In this phase, there are two important decisions to analyze: the best wavelet family to use and the optimal filter level. These questions have not been answered in the technical literature.
The second phase consists of training the prediction method with the parameters indicated by each individual and the training dataset. A forecasting method is applied only to the approximation component. In this paper, three methods were used to forecast the time series: autoregressive, neural networks, and fuzzy models.
The third phase consists of evaluating the prediction method, already trained, with the test dataset.
In step 3, the best individuals are selected based on the error obtained in the evaluation with the test data.
In step 4, the crossover and mutation operators of the genetic algorithm are applied, giving rise to a new population.
Steps 2 through 4 are repeated until the termination criterion is reached, which is the number of generations or iterations of the genetic algorithm.
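The loop above can be sketched in simplified form, with an exhaustive grid search standing in for the genetic algorithm, an AR model as the forecasting method, and PyWavelets for the filter; the candidate families, levels, and synthetic series are assumptions:

```python
import numpy as np
import pywt  # PyWavelets, assumed as the wavelet implementation

def wavelet_filter(y, family, level):
    """Phase 1: keep only the approximation component of the decomposition."""
    coeffs = pywt.wavedec(y, family, level=level)
    coeffs[1:] = [np.zeros_like(d) for d in coeffs[1:]]
    return pywt.waverec(coeffs, family)[:y.size]

def ar_test_rmse(y, p=8):
    """Phases 2 and 3: fit AR(p) on the first half, report RMSE on the rest."""
    X = np.array([y[i:i + p][::-1] for i in range(y.size - p)])
    target = y[p:]
    half = target.size // 2
    alpha, *_ = np.linalg.lstsq(X[:half], target[:half], rcond=None)
    pred = X[half:] @ alpha
    return float(np.sqrt(np.mean((pred - target[half:]) ** 2)))

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0.0, 40.0, 1024)) + 0.3 * rng.normal(size=1024)

# Grid search over (family, level) stands in for GA steps 1 to 4.
candidates = [(f, l) for f in ('haar', 'db4', 'sym4') for l in (1, 2, 3)]
best = min(candidates, key=lambda c: ar_test_rmse(wavelet_filter(series, *c)))
```

A genetic algorithm replaces the exhaustive enumeration when the search space (families, levels, and model parameters together) becomes too large.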
5. Forecasting Errors
We compared these forecasting methods with the simpler persistence model, used as the reference. The error of each method was measured by the root mean square error (RMSE):

RMSE = √((1/n) Σt (Xpred,t − Xreal,t)²)

where Xpred,t is the predicted value at t, Xreal,t is the real value at t, and n is the number of samples.

The improvement of each method in comparison to the persistence model was calculated with the following equation:

Improvement (%) = 100 (RMSEpersistence − RMSEmethod) / RMSEpersistence
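Both measures are straightforward to compute; the numeric values below are hypothetical and chosen only to make the arithmetic easy to check:

```python
import numpy as np

def rmse(pred, real):
    """Root mean square error between predicted and real values."""
    pred, real = np.asarray(pred, float), np.asarray(real, float)
    return float(np.sqrt(np.mean((pred - real) ** 2)))

def improvement(rmse_method, rmse_persistence):
    """Percentage improvement of a method over the persistence reference."""
    return 100.0 * (rmse_persistence - rmse_method) / rmse_persistence

e_method = rmse([5.0, 6.0, 7.0], [5.5, 6.5, 7.5])   # every error is 0.5
e_persist = rmse([5.0, 6.0, 7.0], [6.0, 7.0, 8.0])  # every error is 1.0
gain = improvement(e_method, e_persist)             # 50% improvement
```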
6. Results of Validation Cases
6.1. Data Description
The wavelet filter was applied to several forecasting methods: regression, neural network, and fuzzy models. Five wind speed series were used in this work: two series with a high sampling frequency (1″ and 1′, respectively) and three with a sampling frequency of 10′.
Table 2 shows the main statistical characteristics of these time series, and Figure A1, Figure A2, Figure A3, Figure A4 and Figure A5 in Appendix A show their temporal variation.
Several cases were built from these data to observe the performance of the forecasting models when the sampling step (Δt) and the forecasting horizon (FH) changed. An identifier (ID) was assigned to each case in Table 3.
6.2. Results
Every time series was divided into two sets of the same length: a training set and a test set. The forecasting models were built with the training set and the results presented here were obtained by applying these models to the test set. Moreover, eight inputs were used in all methods.
First, a detailed analysis of wavelet filtering is presented, aiming to answer whether wavelet filters are helpful with different prediction methods and whether their benefit depends on factors such as the level of filtering, the prediction horizon, and the sampling frequency. Afterward, we analyze whether it is possible to select the wavelet family with any of the methods described in the literature, regardless of the prediction method used. Finally, the application of the approach to a specific case is presented.
6.2.1. Influence of Wavelet Filters in Several Forecasting Methods
The best results obtained by applying the wavelet filters to the time series are presented in Figure 7 and Figure 8. Regardless of the model, the forecasting horizon, or the time step, the performance was much better with the wavelet filter than without it when the optimal wavelet family and level were chosen.
Detailed results are provided in Table A1, Table A2, Table A3, Table A4, Table A5 and Table A6 in Appendix A. It is important to note that the optimal wavelet family and level were different in each case; that is, there was a lot of variability on this point. This fact runs contrary to the widespread practice among researchers of choosing these parameters based only on the application.
6.2.2. Influence of Decomposition Level
Nevertheless, there was great similarity in the performance of the different wavelet families in each case. Each family reached a different level of improvement, but all families achieved their maximum improvement percentage at a similar decomposition level.
Figure 9 shows the improvement/level rate for a particular case with different forecasting methods, and Figure 10 shows the improvement/level rate of the same case and method for different wavelet families (see Table A10, Table A11, Table A12 and Table A13 for details).
These results explain why researchers can obtain favorable solutions by applying wavelets even when they do not select the wavelet family and level carefully.
6.2.3. Influence of the Forecasting Horizon
The comparisons made up to now were expressed in percent because this adequately shows the difference between applying the wavelet filters and not applying them. However, it should be remembered that the error (RMSE) increases with the forecasting horizon, as can be seen in Figure 11, Figure 12 and Figure 13, although less so when the wavelet filters were applied.
6.2.4. Influence of Different Sampling Frequencies
Figure 14 shows that an important improvement was obtained with low filtering levels, but with high filtering levels, information was lost and the improvement decreased, or even worsened substantially at high sampling frequencies.
6.2.5. Selection of Optimal Wavelet Family
Table 4 shows the wavelet families found by the cross-correlation, energy, and entropy methods in the columns “cross-corr”, “energy”, and “entropy”, respectively. The column “optimum” shows the wavelet family that gave the best results in our tests. These methods were applied to the time series with poor results: the cross-correlation method obtained the correct result only in cases 7, 30, and 31; the energy method in cases 1, 5, 13, 15, 18, 25, 27, 32, 33, 34 and 3; and the entropy method in cases 5 and 33.
6.2.6. Applying the Forecasting Approach
The importance of using filtered data is illustrated in the following example. Figure 15 shows the first 300 data points (so that the detail can be appreciated) of the original data series of case 22, the series filtered with the wavelet family “dmey” at filter level 2, and the difference between the two series. Figure 16 shows the results of the forecast made with the neural network trained on the filtered training set, and Figure 17 shows the results of the forecast made with the neural network trained on the unfiltered training set. The results using correctly filtered data were considerably better than those obtained with the unfiltered data.
7. Conclusions
In this paper, the forecasting models were applied to the approximation component of the wavelet decomposition and the details components were discarded, as opposed to most authors who use both components in their forecasting models.
A deep analysis of the effect of the wavelet filter on the results was made, and its conclusions will enable improvements in all forecasting models.
The wavelet filter method was applied to different forecasting models: regression, neural network, and fuzzy models. In all models, this technique (wavelet filter + forecasting model) improved the obtained results compared to the case when only the forecasting model was used. The improvement of these methods versus the persistence method was between 2% and 30%, but with the wavelet filter method, it was between 20% and 50%.
The study was extended to several wavelet families. In all cases, there were improvements, but it was not easy to select the best family. The selection methods did not work for the proposed method.
The filtering level was more important than the wavelet family for obtaining good results in most cases. The optimum level was between 2 and 5 for all wavelet families.
As a final conclusion, it seems necessary to use an optimization algorithm to select the wavelet family and level.
It has become clear that it is not easy to determine the parameters of the data processing methods and that they significantly influence the results obtained. Hence, future research will address the joint optimization of the data processing and the forecasting method.