1. Introduction
The variability of photovoltaic (PV) power generation poses great challenges for the management of energy systems that include PV plants [1], e.g., a PV water pumping system. Information on PV generation is essential for the optimal scheduling of a PV water pumping system, in which the performance of PV pumps is affected by fluctuations of solar irradiance [2]. PV power generation depends strongly on incident solar irradiance [3]. Hence, timely and accurate solar irradiance forecasting is a promising technique for mitigating the uncertainty of PV generation. In recent years, data-driven solar forecasting methods have become mainstream due to the rapid development of computational techniques and access to comprehensive quality-controlled solar data [4]. Based on the forecasting horizon, solar irradiance forecasts can be classified into intra-hour (5 min–30 min), intra-day (30 min–180 min), and day-ahead (26–39 h) forecasts [4]. Historical records of solar irradiance are valuable inputs for data-driven solar irradiance forecasting methods [5,6].
For different solar irradiance forecasting horizons, exogenous inputs can improve forecasting accuracy and robustness. For example, satellite imagery is helpful for intra-day forecasting [7], and day-ahead forecasting is improved by incorporating information from numerical weather prediction (NWP) models [8]. Short-term solar irradiance is strongly affected by moving clouds [9,10]. Ground-based sky images provide high spatial and temporal resolution on clouds [11]; therefore, it is reasonable to consider ground-based sky images for short-term solar irradiance forecasting. Sky-image-based approaches to irradiance forecasting can be classified into a pixel-value-based group and a cloud-motion-detection-based group.
The pixel-value-based group extracts numerical features from red-green-blue (RGB) color information and gray values. Fu et al. [12] proposed extracting a proper feature subset from all-sky images, e.g., the mean and variance of intensity levels. In addition, the clear-sky index was predicted via regression instead of directly predicting solar irradiance, to remove deterministic daily and seasonal variations in the data. Statistical values, e.g., the entropy of every RGB channel and the red-to-blue channel, were extracted from sky images in [13], and the k-nearest-neighbor (KNN) algorithm was utilized to forecast irradiance. However, the numerical experiments illustrated that the inclusion of sky images yielded only slight improvements in forecasting accuracy compared with methods based on endogenous data. Rather than using all pixels in an image, Kamadinata et al. [14] proposed using fewer pixels (20 to 60 sampling points) to reduce computational complexity, and statistical values of the moving cloud field were explored in [15]. Pixel-value-based features have been combined with various algorithms for short-term irradiance forecasting, e.g., analog ensemble and quantile regression in [16], and KNN and gradient boosting (GB) in [17].
The motion-detection-based group explores cloud motion information and the future distribution of the cloud/irradiance field. Chu et al. [18] extracted numerical cloud indexes with a cloud-tracking technique and utilized an artificial neural network (ANN) to predict solar irradiance. In their work, sky images were classified into clear, overcast, and partly cloudy. Yang et al. [19] processed sky images to predict future cloud locations from cloud cover, optical depth, and mean cloud field velocity. Solar forecasting based on future cloud locations outperformed image persistence forecasts. Based on cloud pattern classification, Alonso-Montesinos et al. [20] converted digital image levels into irradiances and applied the maximum cross-correlation method to obtain future predictions. Three commonly used motion detection methods, the block-matching algorithm, the optical flow algorithm, and the feature-matching algorithm, were integrated for solar forecasting in [21], and particle swarm optimization was introduced to optimize the weights of the integrated methods.
The above motion-detection methods strive to capture cloud motion information between adjacent images. However, these methods lack robustness due to strong assumptions, e.g., optical flow assumes that the image grayscale does not change between adjacent images, and feature-matching algorithms such as the scale-invariant feature transform (SIFT) rely heavily on texture information. These strong assumptions are easily violated in a complex natural environment. As a result, hand-crafted features from the above methods are intractable on large-scale datasets. In addition, these methods explore adjacent images in pairs, which ignores long-range dependency information among images. Spatiotemporal 3D convolutional neural networks (3D CNNs) were proposed to extract motion features from raw images and videos and have been applied to human action recognition [22], medical image segmentation [23], video classification [24], temporal action localization [25], and other spatiotemporal vision tasks. The 3D CNNs utilize transfer learning to initialize model weights, similar to 2D CNNs initialized with weights pre-trained on ImageNet [26], and are fine-tuned on specific datasets. Compared with the feature-extraction methods in [13], the features extracted by a 3D CNN are more robust on large datasets.
In this work, a 3D CNN model was developed to extract numerical features for a data-driven global horizontal irradiance (GHI) forecasting model. The motivation for this effort stems from the fact that sky image features derived from pixel values are inefficient, while numerical features extracted by motion detection algorithms lack robustness. The 3D CNN was developed with effective strategies, i.e., weak supervision and transfer learning. Features derived from the 3D CNN were combined with endogenous features for short-term GHI forecasting. Popular machine learning algorithms were introduced as GHI forecasting models, and comprehensive experiments with different input features were conducted.
The main contributions of the paper are summarized as follows:
- A 3D CNN model was proposed to extract features from ground-based sky images for short-term GHI forecasting with machine learning algorithms.
- To illustrate the effectiveness of the proposed 3D CNN in feature extraction, a comprehensive comparison study was conducted against an existing feature extraction method.
- The proposed method for short-term GHI forecasting with ground-based sky images was verified on a large dataset.
The remainder of the paper is organized as follows. The methodology for short-term GHI forecasting is introduced in Section 2, including the framework of the forecasting method, the machine learning algorithms, and the 3D-CNN-based feature extraction model. The utilized dataset is presented in Section 3, which is followed by the presentation of experimental results in Section 4. The conclusion is given in Section 5.
2. Methodology
To improve GHI forecasting accuracy, sky images, which provide high temporal and spatial information on clouds, are introduced. Feature engineering is a key element of sky-image-based short-term solar irradiance forecasting. Different from previous studies, a 3D CNN model is developed as a universal feature engineering tool. Machine learning algorithms including the artificial neural network (ANN), support vector machine (SVM), and k-nearest neighbor (KNN) are used as forecasting models that map input features to future solar irradiance. To illustrate the effectiveness of the proposed 3D CNN model in extracting sky image features for solar irradiance forecasting, forecasting accuracy with different input features is fully explored: only endogenous features from historical irradiance records; endogenous features combined with RGB color information of sky images; and endogenous features together with sky image features derived by the proposed 3D CNN model. In this section, the machine-learning-based forecasting models are introduced first, followed by the development of the 3D CNN model.
2.1. GHI Forecasting Method
To remove deterministic daily and seasonal variations in irradiance data, the clear-sky index is introduced following [12]. The relationship between the clear-sky index and solar irradiance is defined as follows:

k_t = I / I_cs, (1)

where I denotes the actual GHI, I_cs is the clear-sky irradiance given by the clear-sky model in [27], and k_t is the clear-sky index at a specific time t. The GHI forecasting methods forecast the clear-sky index instead of solar irradiance. Once k_t is forecasted, the corresponding solar irradiance can be obtained according to Equation (1).
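The normalization in Equation (1) amounts to a one-line computation; a minimal sketch (the function names are illustrative, not from the paper):

```python
def clear_sky_index(ghi, ghi_clear):
    """k_t = I / I_cs (Equation (1)); assumes daytime conditions, i.e., ghi_clear > 0."""
    return ghi / ghi_clear

def ghi_from_index(k, ghi_clear):
    """Invert Equation (1) to recover GHI from a forecasted clear-sky index."""
    return k * ghi_clear
```

Forecasting the clear-sky index rather than GHI itself removes the deterministic diurnal shape, so the model only has to learn the cloud-driven residual.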
For our short-term GHI forecasting with ground-based sky images, the general framework is illustrated in Figure 1 and can be mathematically represented by:

Î_{t+δ} = I_{cs,t+δ} · P(g_exo(i_{t:t−p}), g_end(k_{t:t−q})), (2)

where Î_{t+δ} represents the forecasted solar irradiance, t indicates the forecast issuing time, δ denotes the forecasting horizon, and p and q denote the numbers of sky images and clear-sky indexes used for feature extraction at issuing time t. The functions g_exo and g_end represent the methods used to extract features from the lagged images i_{t:t−p} and the lagged clear-sky indexes k_{t:t−q}, respectively. The function P denotes the machine learning algorithm that maps the input features to the future clear-sky index.
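As a toy illustration of how the pieces of the framework compose, the sketch below uses hypothetical stand-ins for the feature extractors and the forecasting model (none of these implementations are from the paper):

```python
import numpy as np

def g_exo(images):
    """Exogenous features from lagged sky images (stand-in: per-image mean)."""
    return images.reshape(len(images), -1).mean(axis=1)

def g_end(k_lags):
    """Endogenous features from lagged clear-sky indexes (stand-in: identity)."""
    return np.asarray(k_lags, dtype=float)

def P(features):
    """Forecasting model mapping features to the future clear-sky index."""
    return float(np.clip(features.mean(), 0.0, 1.2))

images = np.random.default_rng(0).random((3, 8, 8))  # p = 3 lagged sky images
k_lags = [0.90, 0.85, 0.95]                          # q = 3 lagged indexes
I_cs_future = 800.0                                  # clear-sky GHI at t + delta

k_hat = P(np.concatenate([g_exo(images), g_end(k_lags)]))
I_hat = I_cs_future * k_hat  # forecasted GHI, following the framework equation
```

The point of the sketch is the data flow: two extractors feed one regressor, whose clear-sky-index output is rescaled by the clear-sky model.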
As illustrated in Figure 1, the inputs of the GHI forecasting models are composed of endogenous features and exogenous features. The endogenous features extracted from the clear-sky index are discussed in Section 3. The exogenous features extracted from sky images vary according to the applied feature engineering method g_exo, which is either the method based on pixel RGB color information in [4] or the proposed 3D CNN method; the resulting exogenous features are hereafter referred to as color features and CNN features, respectively.
Popular machine learning algorithms including the SVM, ANN, and KNN are introduced as the GHI forecasting model P. These algorithms are implemented with the scikit-learn package in Python. Brief introductions of these algorithms are presented as follows:
- 1 The SVM was originally proposed for classification tasks and has been successfully applied to regression analysis. The key idea of the SVM is to map the input data into a high-dimensional feature space in which they can be linearly separated [28]. In this work, epsilon-support vector regression is used as the SVM case study.
- 2 The ANN is a strong and robust nonlinear method that can model complex relationships between inputs and outputs. The architecture of an ANN consists of several layers, and the whole architecture is optimized by backpropagation. The combination of ANN and backpropagation is used to forecast day-ahead solar energy in [29].
- 3 The KNN forecasts the target value of the input data based on the similarity of predictors, where similarity is defined by the Euclidean distance between the training data and the input data. The performance of KNN is sensitive to hyperparameters, e.g., the number of nearest neighbors, which are fully explored by an optimization algorithm in [13].
In addition, the smart persistence forecasting model is considered as a baseline due to its excellent performance in short-term forecasting.
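As a minimal sketch of how the three forecasting models could be instantiated with scikit-learn (the hyperparameters and the toy data below are illustrative, not the paper's tuned settings):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import KNeighborsRegressor

# Toy stand-in for the real design matrix: rows are samples, columns are
# endogenous (lagged clear-sky index) features plus image-derived features.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = rng.random(200)  # target clear-sky index at t + delta

models = {
    "SVM": SVR(kernel="rbf", epsilon=0.1),  # epsilon-support vector regression
    "ANN": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),  # Euclidean distance by default
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predictions[name] = model.predict(X)
```

All three regressors share the same fit/predict interface, which makes the comparison across input feature sets straightforward.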
2.2. 3D CNN Model for Feature Extraction
In this work, a 3D CNN model is trained to extract numerical features from sky images within the proposed weak supervision model (WSM). The training process is facilitated by transfer learning and weak supervision strategies.
The specific 3D CNN adopted in this work is the best-performing 3D ResNet in [30]. The architecture of the introduced 3D ResNets is illustrated in Table 1, where F denotes the number of channels of the output features, N denotes the number of blocks in every layer, and FC indicates the last fully connected layer with C dimensions. Two structures with network depths of 34 and 50 are explored. Both models consist of a 7 × 7 × 7 convolution layer, four concatenated layers, and a fully connected layer. The main difference between the two models lies in their constituent blocks, i.e., the basic block and the bottleneck block in [30]. For more details on the ResNet architecture, refer to [31]. To ensure that the extracted features are comparable with the endogenous features in dimension, two intuitive strategies are compared in the experiments. The first strategy is to replace the last fully connected layer (indicated by FCR, fully connected layer replacement). The second strategy is to maintain the complete architecture and add an extra fully connected layer at the end of the ResNet. The extraction process can be defined mathematically as follows:

x_cnn = φ(g(V)), (3)

where g indicates the 3D ResNet, φ denotes the ReLU activation, and V denotes the 3D input consisting of stacked sky images.
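The two dimension-matching strategies can be sketched in PyTorch; the tiny backbone below is only a stand-in for the 3D ResNet-34/50 of [30], and all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class Tiny3DBackbone(nn.Module):
    """Minimal 3D CNN stand-in: one 3D conv, global pooling, and a final FC layer."""
    def __init__(self, out_dim=512):
        super().__init__()
        self.conv = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(8, out_dim)  # plays the role of the last FC layer

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)

feature_dim = 16  # chosen to be comparable with the endogenous feature dimension

# Strategy 1 (FCR): replace the last fully connected layer.
model_fcr = Tiny3DBackbone()
model_fcr.fc = nn.Linear(model_fcr.fc.in_features, feature_dim)

# Strategy 2: keep the complete architecture and append an extra FC layer.
model_extra = nn.Sequential(Tiny3DBackbone(), nn.ReLU(), nn.Linear(512, feature_dim))

clips = torch.randn(2, 3, 4, 32, 32)  # (batch, channels, frames, height, width)
feats_fcr = model_fcr(clips)
feats_extra = model_extra(clips)
```

Either way, the image branch ends in a feature vector whose dimension matches the endogenous branch, so the two can be concatenated downstream.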
Transfer learning helps achieve promising results in CNN-based research; e.g., ImageNet pre-training is a common strategy in CNN-based tasks such as image classification and object detection [32]. In this work, the 3D CNN is initialized with a model pre-trained on large-scale video datasets [33] to extract motion information from sky images. In [30], the authors compared video classification accuracy among models pre-trained on four video datasets; here, the most appropriate pre-trained weights are adopted according to the reported video classification accuracy.
For 3D ResNet training, it is time-consuming to annotate the large dataset necessary for deep model training. To address this problem, the 3D ResNet learns to extract irradiance-related features under weak supervision from the irradiance itself. The weak supervision strategy is intuitive: it guides the 3D ResNet, via backpropagation, to extract features that benefit irradiance forecasting. To make full use of the irradiance supervision, the 3D ResNet is integrated with a 1D CNN that fuses the endogenous features and predicts the target clear-sky index. The integrated architecture is named the weak supervision model (WSM). Figure 2 illustrates the WSM architecture, and the whole model is optimized in an end-to-end way.
The 1D CNN consists of a series of basic convolutional layers and a final convolutional operation, as illustrated in Figure 2. Each basic convolutional layer consists of a convolutional operation, batch normalization, and ReLU activation. The last convolutional operation outputs the forecasted value directly. In addition, to learn the mutual dependence between the two input feature sets, a self-attention mechanism is introduced. The mechanism can be defined as follows:

X_{l+1} = σ(f(X_l)) ⊙ X_l, (4)

where X_l is the concatenated feature vector, f is a fully connected layer, σ is the softmax activation function, and ⊙ denotes element-wise multiplication.
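One common realization of such a gating-style self-attention, sketched in NumPy (the layer sizes, and possibly the exact form, differ from the paper's implementation):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x_l = rng.random((4, 24))       # X_l: batch of concatenated features
W = rng.random((24, 24)) * 0.1  # weights of the fully connected layer f
b = np.zeros(24)                # bias of f

attn = softmax(x_l @ W + b)     # softmax(f(X_l)): attention weights summing to 1
x_next = attn * x_l             # element-wise reweighting of X_l
```

The softmax output acts as a learned weighting that emphasizes the more informative entries of the concatenated feature vector.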
The whole WSM is optimized by minimizing the mean square error loss function:

L = (1/n) Σ_{i=1}^{n} (k_i − k̂_i)², (5)

where k_i and k̂_i are the target clear-sky index and the forecasted value, respectively, and n indicates the number of data samples.
The WSM is introduced to train the 3D ResNet to extract irradiance-related features, although it is also capable of forecasting GHI itself. The forecasting accuracy of the WSM is reported in Section 4 and compared with the GHI forecasting models in Section 4.1. After WSM training, the 3D ResNet is retained as a feature engineering tool, and no further optimization is needed.
2.3. Forecasting Performance Metric
Commonly used metrics, including the mean absolute error (MAE), mean bias error (MBE), root mean square error (RMSE), mean absolute percentage error (MAPE), and an improvement metric with respect to the smart persistence method, the skill score, are considered in this work. These metrics are defined as follows:

MAE = (1/N) Σ_{i=1}^{N} |ŷ_i − y_i|, (6)
MBE = (1/N) Σ_{i=1}^{N} (ŷ_i − y_i), (7)
RMSE = √( (1/N) Σ_{i=1}^{N} (ŷ_i − y_i)² ), (8)
MAPE = (100%/N) Σ_{i=1}^{N} |ŷ_i − y_i| / y_i, (9)
Skill score = 1 − RMSE_m / RMSE_p, (10)

where y_i and ŷ_i are the target GHI value and the forecasted GHI value, respectively, N is the number of data samples, and RMSE_p and RMSE_m are the RMSEs of the smart persistence model and the specific forecasting model, respectively.
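These metrics translate directly into code; a compact NumPy sketch (the function name is illustrative):

```python
import numpy as np

def forecasting_metrics(y, y_hat, rmse_persistence):
    """MAE, MBE, RMSE, MAPE (%), and skill score vs. the smart persistence model."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y_hat - y
    mae = np.mean(np.abs(err))
    mbe = np.mean(err)
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err) / y)  # assumes daytime GHI, y > 0
    skill = 1.0 - rmse / rmse_persistence
    return {"MAE": mae, "MBE": mbe, "RMSE": rmse, "MAPE": mape, "Skill": skill}
```

For example, with targets [100, 200] W/m², forecasts [110, 190] W/m², and a persistence RMSE of 20 W/m², the skill score is 0.5 (a 50% RMSE reduction over smart persistence).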