3.1. Bibliometric Analysis Results
Queries 1 and 2 were run against each database at different points in time. The number of results returned by each query is presented in
Table 1. The observed changes indicate how dynamic and popular the topic of EPF is: within seven months, researchers addressed existing research gaps in multiple papers.
The WoS has fewer records representing indexed scientific publications than Scopus (
Table 1), and the syntaxes of Queries 1 and 2 differ between the two databases. The results differ quantitatively, as the collections of publications indexed by WoS and Scopus are not the same, but they do not differ qualitatively. However, the Scopus search engine is more user-friendly and allows for more refined queries. The results presented in the
Supplementary Materials File S1 show that some of the ten most cited papers indexed in WoS go beyond energy or EPF models: several are related to the fashion, food, and movie industries. For this reason, these papers were excluded from further analysis with the VOSviewer program. In the Scopus analysis, the most popular model (
Table S1 in
Supplementary Materials File S1) for EPF is “A moving-average filter-based hybrid ARIMA-ANN model for forecasting time series” [
63]. On the other hand, in the same Query 1 results, the highest growth in citations (111%) belongs to the paper titled “Energy markets volatility modeling using GARCH” [
64]. In the WoS, the most popular paper is “An empirical comparison of alternative schemes for combining electricity spot price forecasts” [
65], as shown by Query 2 (
Table S2 in
Supplementary Materials File S1).
The number of publications aggregated by year in the WoS and Scopus databases is displayed in
Figure 1. Queries 1 and 2 were run against the primary databases to generate tabular results, and the graph was created using a Python tool.
Figure 1 presents the results of Queries 1 and 2 for the period 2014–2021. The year 2022 is not presented in
Figure 1. The WoS contains fewer scientific publications than Scopus, but on this basis alone it is not clear whether it contains the same documents. Contrary to what other publications state, there is no certainty that the WoS is a sub-collection of Scopus (see the differences in results collected in
Supplementary Materials File S1).
Figure 1 shows numbers for scientific books, book chapters, journal articles, and conference papers.
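The yearly aggregation behind a chart of this kind can be sketched as follows. This is a minimal illustration rather than the tool used for Figure 1: the function name `publications_per_year` and the sample rows are hypothetical, and each exported record is assumed to reduce to a (database, publication year) pair.

```python
from collections import Counter


def publications_per_year(records, first_year=2014, last_year=2021):
    """Count records per (database, year) within an inclusive year range."""
    counts = Counter()
    for database, year in records:
        if first_year <= year <= last_year:
            counts[(database, year)] += 1
    return counts


# Hypothetical export rows: (database, publication year).
records = [
    ("Scopus", 2014), ("Scopus", 2021), ("Scopus", 2021),
    ("WoS", 2021), ("WoS", 2022),  # 2022 falls outside the plotted range
]
counts = publications_per_year(records)
for (database, year), n in sorted(counts.items()):
    print(database, year, n)
```

The resulting counts can then be passed to any plotting library to draw one series per database.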
Figure 2 and
Figure 3 show bibliometric maps of co-occurrences of keywords related to EPF models in the energy sector. A file prepared from the queries performed on the Scopus and WoS databases was loaded into the VOSviewer program. A minimum of 5 keyword co-occurrences was set for each bibliometric map (Figures 2–4 and 6), except for Figures 5 and 7. In the final dialogue box of the VOSviewer program, the keywords related to forecasting models were selected and duplicate keywords were removed.
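The idea of a thresholded keyword co-occurrence network can be illustrated with a short sketch. It is not VOSviewer's implementation: the function `cooccurrence_edges` and the sample keyword fields are hypothetical, and the threshold is assumed to apply to the number of publications that list a keyword. Keywords are lowercased and de-duplicated, mirroring the cleaning steps described above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_fields, min_occurrences=5):
    """Return (occurrences, edges) for semicolon-separated keyword fields.

    edges maps an alphabetically sorted keyword pair to the number of
    publications that list both keywords.
    """
    docs = []
    for field in keyword_fields:
        # Lowercase and de-duplicate the keywords of one publication.
        docs.append({k.strip().lower() for k in field.split(";") if k.strip()})

    occurrences = Counter(k for doc in docs for k in doc)
    # Assumption: the threshold filters keywords by occurrence count.
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    edges = Counter()
    for doc in docs:
        for a, b in combinations(sorted(doc & kept), 2):
            edges[(a, b)] += 1
    return occurrences, edges


# Hypothetical keyword fields, one per publication.
fields = [
    "ARIMA; LSTM; deep learning",
    "arima; LSTM",
    "LSTM; Deep Learning; GARCH",
]
occurrences, edges = cooccurrence_edges(fields, min_occurrences=2)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

In this toy input, "garch" occurs only once and is therefore dropped from the network, which is the same effect a higher threshold has on the real maps.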
Although the EPF subject was explored by Query 1 in Scopus and Query 2 in WoS, the keyword “electricity price forecasting” is not present in
Figure 2 and
Figure 3. The keywords selected and used by the VOSviewer program are gathered in
Table 2, separated by semicolons. In
Table 2, clusters are identified by colors, as in
Figure 2. Regardless of their original capitalization (for example, ARIMA or GARCH models), the keywords in
Table 2 are written in lowercase, as in the VOSviewer program. The order of clusters presented in
Table 2 is determined by the number of keywords identified by VOSviewer.
The identified clusters correspond to different areas of scientific interest of the authors of the analyzed publications. The first is the red cluster, related to the mathematical aspects of EPF models. The second cluster is marked in green and is related to deep learning, as is also shown in
Table 2. In
Figure 2, there is also a third cluster with seven keywords, related to classical statistical models. The fourth cluster (yellow) consists of keywords associated with the analysis of data used for EPF model formulation. The final cluster identified by VOSviewer revolves around machine learning and contains two similar keywords.
Figure 3 presents the graphical results of Query 2 used for the WoS exploration. In
Figure 3, keywords with a minimum of 5 co-occurrences are represented as nodes in the network. The edges of the network represent the co-occurrences between keywords in the data obtained from WoS [
66]. The keywords from the WoS Query 2 are collected in
Table 3 and separated by semicolons. Following the VOSviewer procedures, the keywords are written in lowercase in
Table 3, regardless of their original capitalization.
In
Table 3 there are 4 clusters automatically identified and ordered by the VOSviewer program. The first cluster is marked in red in
Figure 3 and consists of co-occurring keywords related to machine learning models of EPF. The keywords of the second cluster revolve around procedures and models related to EPF. Another group of scientific publications, gathered in the third (blue) cluster, is dedicated to artificial intelligence. The last cluster, colored in yellow in
Figure 3, consists of a mix of EPF procedures.
Figure 2 (results from Query 1) and
Figure 3 (results from Query 2) are bibliometric maps that reflect the proportions of the numbers presented in
Table 1 and
Figure 1. There is a significantly lower number of nodes, which represent keywords identified by the VOSviewer algorithms, in
Figure 3. There are, however, points common to both bibliometric maps, visible in the yellow keywords (
Figure 2 and
Figure 3) related to EPF models: LSTM, deep learning, and regression models.
After identifying the most popular EPF methods in scientific works and their areas in the two databases, the original queries were modified to find the most accurate EPF models. Although the EPF models and procedures were explored by Query 1 in Scopus and Query 2 in WoS, the keyword “electricity price forecasting” is not present in
Figure 4 and
Figure 5. Instead, the maps contain keywords that co-occur with the most accurate EPF models.
Figure 4 presents the graphical results of the data gathered with the modified Query 1 in the Scopus database. In
Figure 4, keywords with a minimum of 5 co-occurrences are represented as nodes in the network. The edges of the network represent the co-occurrences between keywords in the data obtained from Scopus. The connections reflect common areas, indicated by the repetition of some keywords in clusters distinguished automatically by the VOSviewer program. The keywords extracted from Scopus by the modified Query 1 are collected in
Table 4. The keywords are again written in lowercase, regardless of their original capitalization.
In
Table 4 there are 5 clusters identified and ordered automatically by the VOSviewer [
67]. The first cluster is marked in red in
Figure 4 and consists of co-occurring keywords related to neural networks and their modifications used in EPF models [
16,
68]. The keywords of the second cluster revolve around procedures and models related to EPF, such as clustering and decision tree models. Another group of scientific publications, gathered in the third (blue) cluster, is dedicated to deep learning. The fourth cluster, colored in yellow in
Figure 4, consists of hybrid EPF model procedures. The last cluster in
Table 4 consists of the keywords “machine learning” and “extreme learning machines”, which are used to build the most accurate EPF models.
Figure 5 presents the bibliometric map of keyword co-occurrences with the minimum threshold set to 1. The map was created with the same methodology as the figures described earlier. Its uniqueness comes from the vast number of clusters and their combinations. Three clusters (green, yellow, and red) were observed to be interconnected. The remaining clusters are connected by a minimum of 2 keywords between them or are so far apart that the keywords do not strictly form clusters, as there is only one keyword in the cluster defined by the VOSviewer software. Therefore, only the three interconnected clusters are further described in
Table 5.
Table 5 presents only the interconnected clusters from
Figure 5. The rest of the clusters are nodes with no connections. The minimum cluster size was set to 2 keywords. Only the red and green clusters aggregate more than 2 keywords (
Table 5). The most accurate EPF models identified in the WoS database are gathered in the red cluster and are connected with the following keywords: machine learning; xgboost; sarima; ga-bp algorithm. The second cluster, colored green, consists of hybrid EPF models. The third (yellow) cluster resembles cluster 2 in
Table 4 and is related to decision trees and classification procedures.
Keywords identified in bibliometric procedures with the use of original and modified Queries 1 and 2 are collected in
Table 6. When the results of the original queries are compared, common keywords appear in differently numbered and colored clusters. Hybrid models are a narrow group of methods and a subset of machine learning. These keywords are distributed without any pattern between
Table 2 and
Table 3. In the modified query results, whole colored and numbered clusters are exchanged between
Table 4 and
Table 5. For example, the yellow 4th cluster from
Table 4 corresponds to part of the 2nd green cluster in
Table 5.
There are more differences between
Table 4 and
Table 5. Consequently, only a small number of common keywords appear in
Table 6. Another reason for the observed differences between
Table 4 and
Table 5 is the different minimum thresholds for co-occurring keywords, 5 and 1, respectively. This indicates that the Scopus database contains more detail than WoS in areas related to EPF models [
68,
69]. This is supported by the disconnected points in the bibliometric map in
Figure 5, in contrast to the connected network in
Figure 4 (Scopus results).
The trend of EPF models is presented in
Figure 6 and
Figure 7. The overlay visualization map indicates which methods have been developed over time and which are still in use. The years are represented by color ranges: the topics studied over 2014–2021 are depicted in blue, green, and yellow. Darker colors mark older publications, while lighter colors represent the newest publications.
In
Figure 6, there is a clear shift from fundamental or classical statistical EPF methods (purple and dark blue) to decision trees and their derivatives (green and light green) to keywords associated with the most recent (yellow) EPF models: LSTM, deep learning [
70], and convolutional neural networks. Additionally, there are strong relations between neural networks and machine learning, which are presented as a wider edge between these two nodes in
Figure 6 and are also visible in the dynamic analysis in the VOSviewer software. Trend analysis does not provide information about the groups or categories of methods used to build EPF models that are shown in
Figure 2,
Figure 3,
Figure 4 and
Figure 5. However, it is possible to name 3 main groups in
Figure 6 and
Figure 7, based on the years and colors: blue (2016–2017), green (2018–2019), and yellow (2020–2021).
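The coloring logic of such an overlay map can be approximated with a small sketch. It assumes that each keyword is scored by the average publication year of the documents listing it, and that the three groups named above are separated at 2018 and 2020; the function names, the cut-off handling of years before 2016, and the sample data are hypothetical.

```python
from collections import defaultdict


def average_publication_year(publications):
    """Map each lowercase keyword to the mean year of the papers listing it."""
    years = defaultdict(list)
    for year, keywords in publications:
        for keyword in {k.strip().lower() for k in keywords}:
            years[keyword].append(year)
    return {keyword: sum(v) / len(v) for keyword, v in years.items()}


def period_color(avg_year):
    """Assign one of the three period groups (assumed cut-offs)."""
    if avg_year < 2018:
        return "blue"    # 2016-2017 group; earlier years are also binned here
    if avg_year < 2020:
        return "green"   # 2018-2019 group
    return "yellow"      # 2020-2021 group


# Hypothetical publications: (year, keywords).
publications = [
    (2016, ["ARIMA"]),
    (2017, ["ARIMA", "GARCH"]),
    (2020, ["LSTM", "ARIMA"]),
    (2021, ["LSTM"]),
]
averages = average_publication_year(publications)
for keyword in sorted(averages):
    print(keyword, round(averages[keyword], 1), period_color(averages[keyword]))
```

A keyword that keeps being used in recent papers drifts towards the lighter colors even if it first appeared early, which is why an overlay map shows trends rather than first appearances.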
In
Figure 7, the interest of authors moves from ARIMA to LSTM. This change is visible in
Figure 7 as the difference between the oldest (blue) and the newest (yellow) keywords in the presented period. A distinct group of keywords appears on the far-right side of
Figure 7: extreme machine learning, support vector regression, and genetic algorithms. These keywords represent the oldest directions of scientific publications dedicated to EPF models. Combined methods of EPF appear in the upper-left corner of
Figure 7. Classification and clustering methods are represented on the lower-left side of the same bibliometric map.
The trend of EPF models presented in
Figure 6 and
Figure 7 indicates that there are both classical and new methods of EPF model formulation. An interesting observation is that hybrid models for EPF occupy an intermediate position in both databases. The trend in publications moves towards machine learning in both explored databases. The two figures present different numbers of nodes (keywords) and edges (relations, co-occurrences) as results of the standard Queries 1 and 2 for Scopus and WoS, respectively. The elements common to both figures are the newest EPF models: ARIMA, LSTM, deep learning models (Scopus), and regression models (WoS). There are also visible differences between
Figure 6 and
Figure 7. The Scopus analysis in
Figure 6 shows that the dominant node of the bibliometric map is formed by EPF models based on neural networks. In the WoS analysis, machine learning as an EPF model appears as the network’s largest node (
Figure 7).
3.2. Overview of the Most Cited and Most Accurate Publications of EPF Models
The most cited article returned by Query 1 in the Scopus database describes a hybrid of linear and nonlinear time series forecasting models proposed by Babu and Reddy [
21]. This paper explores the linear Autoregressive Integrated Moving Average (ARIMA) [
71] and nonlinear Artificial Neural Network (ANN) models to develop a new hybrid ARIMA–ANN model for the prediction of time series data [
72]. ARIMA models assume that the current value is a linear function of previous data points and past errors, and fit a linear equation to the data on that basis. Moreover, ARIMA requires that the data be made stationary. In the hybrid methods of the related works [
72,
73], the input data were treated as the sum of linear and nonlinear components. The given data, however, is not divided into linear and nonlinear components; instead, a linear ARIMA model is fitted directly to the data, and the resulting error sequence is passed to the nonlinear component. Therefore, the linear aspect of the ARIMA model is used in both these hybrid models [
63].
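The residual-based hybrid scheme described above (fit a linear model, then model its error sequence with a nonlinear component) can be sketched in a few lines. This is an illustration under stated assumptions, not the method of the cited papers: a least-squares AR(1) model stands in for ARIMA, a k-nearest-neighbour average on lagged residuals stands in for the ANN, and all function names and the sample prices are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] (stand-in for ARIMA)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x
    return mean_y - phi * mean_x, phi


def knn_predict(pairs, query, k=3):
    """Average the targets of the k inputs closest to `query` (stand-in for ANN)."""
    nearest = sorted(pairs, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def hybrid_forecast(series, k=3):
    """One-step-ahead forecast: linear part plus modeled residual."""
    c, phi = fit_ar1(series)
    # Error sequence of the linear fit: e[t] = y[t] - (c + phi * y[t-1]).
    residuals = [y - (c + phi * x) for x, y in zip(series[:-1], series[1:])]
    # Pairs (e[t-1], e[t]) train the nonlinear component.
    pairs = list(zip(residuals[:-1], residuals[1:]))
    linear_part = c + phi * series[-1]
    nonlinear_part = knn_predict(pairs, residuals[-1], k)
    return linear_part, nonlinear_part, linear_part + nonlinear_part


# Hypothetical price series.
prices = [50.0, 52.0, 49.0, 55.0, 51.0, 58.0, 53.0, 60.0]
linear_part, nonlinear_part, forecast = hybrid_forecast(prices)
print(linear_part, nonlinear_part, forecast)
```

The final forecast is the sum of the two parts, so the nonlinear component only has to explain what the linear model leaves behind.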
The most cited paper from Query 2 in WoS is “An empirical comparison of alternative schemes for combining electricity spot price forecasts” [
65]. The main findings of this article indicate the added value of combining the projections of separate methods to derive more accurate forecasts. Nevertheless, the performance is not uniform across the markets and periods investigated. Specifically, equally weighted pooling of forecasts emerges as a simple yet effective strategy when compared to alternative systems that rely on estimated combination weights, but only when no predictor regularly outperforms its competition. Constrained least squares regression (CLS) [
12] provides a balance between robustness against such well-performing individual approaches and relatively accurate forecasts, which are, on average, more accurate than those derived from the individual predictors. Some well-known forecast averaging strategies, such as ordinary least squares regression (OLS) and Bayesian Model Averaging (BMA), are inappropriate for forecasting day-ahead electricity prices.
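The contrast between equally weighted pooling and constrained least squares can be made concrete for the simplest case of two predictors. The sketch below assumes that the CLS constraints are non-negative weights that sum to one with no intercept; for two forecasts this reduces to a single weight clipped to [0, 1]. The function names and the sample numbers are hypothetical.

```python
def equal_weight(forecasts):
    """Simple average of the individual forecasts for each period."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


def cls_weight_two(actual, f1, f2):
    """Weight w on f1 (and 1 - w on f2) minimizing squared error, w in [0, 1]."""
    num = sum((y - b) * (a - b) for y, a, b in zip(actual, f1, f2))
    den = sum((a - b) ** 2 for a, b in zip(f1, f2))
    if den == 0:  # identical forecasts: every weight gives the same result
        return 0.5
    # The objective is a parabola in w, so clipping yields the constrained optimum.
    return min(1.0, max(0.0, num / den))


def combine_two(f1, f2, w):
    """Weighted combination of two forecast series."""
    return [w * a + (1 - w) * b for a, b in zip(f1, f2)]


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


# Hypothetical day-ahead prices and two biased individual forecasts.
actual = [40.0, 42.0, 45.0, 43.0]
f1 = [41.0, 43.0, 46.0, 44.0]  # overestimates by 1
f2 = [38.0, 40.0, 43.0, 41.0]  # underestimates by 2
w = cls_weight_two(actual, f1, f2)
print("equal weights MAE:", mae(actual, equal_weight([f1, f2])))
print("CLS weight on f1:", w, "MAE:", mae(actual, combine_two(f1, f2, w)))
```

In practice the weights would be estimated on a calibration window and applied to later periods; estimating and evaluating on the same data, as here, only illustrates the mechanics.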
The most cited publication from modified Query 1 was “The price prediction for the energy market based on a new method” [
74]. This article also appears in the results of Query 1 (article no. 6 in
Supplementary Materials File S1, Table S1). The article implements feature selection based on mutual information for day-ahead prediction of energy prices, which is essential for establishing the redundancy and relevance of the selected features [
74]. A hybrid EPF method that combines wavelet transforms with a neural network is presented.
From the modified Query 2, the article titled “A novel ensemble deep learning model with dynamic error correction and multi-objective ensemble pruning for time series forecasting” stood out [
75], and its accurate method is described here. Because of their high variance and low bias, standalone deep learning methods are not sufficient, so the novel forecasting model uses ensemble methods. The main difference from standard learning models is the use of fewer methods for both statistical and deep learning models. The result of that combination is better robustness in dynamic environments and improved generalization. Together with the most cited article in WoS, this article contributed a new framework for time series forecasting [
75].
The proposed EPF models require more computation time, memory, and resources during the construction and combination stages of the basic predictors, according to the available literature. To further enhance the computational efficiency and scalability of the model, it is necessary to investigate more efficient construction and ensemble methods for basic predictors [
75]. Another study concludes that the proposed hybrid method is over a hundred times more efficient than the bootstrap neural network (BNN) method [
76].