
Fusing Nature with Computational Science for Optimal Signal Extraction

by 1,*,†, 2,† and 3,†
Research Institute of Energy Management and Planning, University of Tehran, Tehran 1417466191, Iran
Department of Accounting, Islamic Azad University, Central Tehran Branch, Tehran 1955847781, Iran
Leicester Castle Business School, De Montfort University, Leicester LE1 9BH, UK
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Received: 31 October 2020 / Revised: 30 December 2020 / Accepted: 12 January 2021 / Published: 19 January 2021


Fusing nature with computational science has proved to be of paramount importance, and researchers have shown growing enthusiasm for inventing and developing nature inspired algorithms to solve complex problems across subjects. Inevitably, these advancements have rapidly promoted the development of data science, where nature inspired algorithms are changing the traditional way of data processing. This paper proposes a hybrid approach, namely SSA-GA, which incorporates the optimization merits of the genetic algorithm (GA) to advance Singular Spectrum Analysis (SSA). This approach further boosts the performance of SSA forecasting via better and more efficient grouping. Based on the performance of SSA-GA on 100 real time series across various subjects, the newly proposed SSA-GA approach is shown to be computationally efficient and robust, with improved forecasting performance.
Keywords: forecasting; Singular Spectrum Analysis; genetic algorithm

1. Introduction

The vigorous advancement of data science and computational technologies in recent decades has significantly altered the way interdisciplinary research is conducted. Meanwhile, these interdisciplinary developments have also injected novel ways of thinking and problem solving capabilities back into the progression of computational algorithms. Scientists march on the path of seeking knowledge of everything we encounter in life, and nature, which itself acts as the most inclusive housing facility for all, always seems to have its wise answers. Just as in the phrase “let nature take its course”, researchers also seek means to better appreciate the solutions nature may have to offer. It is not new for researchers to invent and implement algorithms inspired by nature as intelligent solutions to complex problems, and these achievements continuously bring new breakthroughs on a wider scale of science and technology. A recent review focusing on nature inspired algorithms can be found in [1]. Among them, some well established models include: neural networks [2], which were inspired by the mechanism of biological neural networks and have been widely applied and developed to form a large branch containing various types of computational architectures; swarm intelligence (SI) [3,4], which has been contributing to intelligent advancements in both scientific and engineering domains, with a wide spectrum of SI inspired algorithms (e.g., bat algorithm, ant colony optimization and firefly algorithm) having emerged in recent decades [1]; and the genetic algorithm (GA) [5], which was inspired by the theory of natural evolution, has promoted the trend of evolutionary algorithms and has been widely applied for searching and optimization.
The list of nature inspired algorithms goes on, and new ones are developed regularly. We do not intend to review them all here, but the wide scale of developments and implementations certainly reflects the significance of seeking knowledge via the mysterious means offered by nature.
Among the various branches of nature inspired algorithms, this paper focuses on GA, which shows extraordinary performance in optimization [1]. In brief, GA simulates the optimization process for computational problems in line with the process of natural evolution [5]. The optimal solution can thus be considered the evolutionary outcome of mutation, crossover and selection by fitness evaluation. This algorithm is widely applicable given how common optimization problems are in computational science. A collection of review papers investigates implementations of GA in different subjects, such as chemometrics [6], electromagnetics [7], mechanical engineering [8], image reconstruction [9], production and operations management [10], supply chain management [11], and economics and finance [12]. Moreover, numerous researchers have applied GA alone or in combination with other algorithms to seek better solutions for specific problems. The applications of GA are so diverse that, to the best of our knowledge, no single study has reviewed them all.
In the domain of signal extraction and forecasting, GA has certainly played an active role in recent decades. Selected topics include: bankruptcy prediction [13,14,15], credit scoring [16,17], crude oil price [18,19,20], tourism demand [21,22,23], the beta systematic risk [24], financial data [25,26], gas demand [27], electric load [28,29], wind speed [30], and rainfall [31]. Through a comprehensive exploration of the existing literature, it came to our attention that GA has been applied jointly with many data analytics techniques in practice, for example, neural networks, principal component analysis, wavelet analysis, long short-term memory networks, and support vector machines. However, to the best of our knowledge, it has not been exploited jointly with Singular Spectrum Analysis (SSA) [32], which is a powerful technique for time series analysis and has been widely applied for denoising, signal extraction and forecasting [33,34,35,36].
Given the rapid development of SSA and its hybrid approaches [35,37,38,39,40], this is not the first attempt to combine SSA with nature inspired algorithms. There has been a successful journey full of advancements, and one of the most popular collaborators is the neural network. Different sub-branches of neural networks have been fused with SSA to achieve better forecasting; for instance, fuzzy/Elman/Laguerre neural networks and SSA are combined for wind speed forecasting in [41,42,43], for road traffic forecasting in [44], for energy demand/load forecasting in [45], and for water demand forecasting in [46]. The authors proposed the Colonial Theory (CT) inspired SSA-CT approach in [35], which incorporates CT for an improved “grouping” process of basic SSA. Moreover, the authors also explored the hybrid approach of SSA and neural networks for improving the prediction of tourism demand in [40]. This paper serves as a further development of [35] by implementing GA to efficiently optimise the “grouping” stage of basic SSA so as to achieve improved forecasting. It is also of note that, to the best of our knowledge, this paper is the first in the literature to combine SSA and GA for forecasting. Furthermore, in order to provide robust validation of the newly proposed approach and reveal its true forecasting performance, 100 real time series from various subjects are considered in this paper.
The remainder of the paper is organized as follows: Section 2 describes the basic SSA and CT inspired SSA-CT [35] processes. Section 3 introduces the newly proposed SSA-GA approach, which incorporates the advanced features of GA. Section 4 uses 100 real time series across various subjects of research to evaluate the forecasting performance of SSA-GA in comparison with the basic SSA. Finally, the paper concludes in Section 5.

2. Basic SSA and SSA-CT

According to [32], the basic SSA contains two stages, Decomposition and Reconstruction, and each stage includes two steps: Embedding and Singular Value Decomposition (SVD) for the former, and Grouping and Diagonal Averaging for the latter. To conduct this process, two settings need to be decided: the window length L and the number of eigenvalues r. Detailed instructions for SSA can be found in [32] and will not be reproduced here. Instead, a brief summary of the process is presented below, mainly following [35].
For the Decomposition stage, with a selected window length L, the one dimensional time series is embedded into a multi-dimensional variable, which forms a trajectory matrix. This is followed by SVD, which yields a small number of independent and interpretable components. The second stage, namely Reconstruction, starts with the important “grouping” step. Briefly, this step aims to gather the eigenvalues associated with different characteristics, i.e., trend, seasonality, etc., whilst leaving out those corresponding to noise. Lastly, the grouped eigenvalues are transformed back into a one dimensional time series, namely the signal, by diagonal averaging.
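As a minimal sketch of the two stages just described (not the authors' implementation), the Python function below embeds a series into its trajectory matrix, applies the SVD, keeps the components selected by a binary mask, and applies diagonal averaging. The function name `ssa_reconstruct` and the `mask` argument are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def ssa_reconstruct(y, L, mask):
    """Reconstruct a signal from the SVD components selected by a 0/1 mask.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    N = y.size
    K = N - L + 1
    # Embedding: L x K trajectory matrix, column j = (y_j, ..., y_{j+L-1}).
    X = np.column_stack([y[j:j + L] for j in range(K)])
    # SVD: X = U diag(s) Vt, singular values in decreasing order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Grouping: weight 1 keeps a component, weight 0 drops it.
    w = np.zeros(s.size)
    m = np.asarray(mask, dtype=float)[: s.size]
    w[: m.size] = m
    Xg = (U * (s * w)) @ Vt
    # Diagonal averaging: entry (i, j) of Xg contributes to series index i + j.
    total = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        total[i:i + K] += Xg[i, :]
        counts[i:i + K] += 1
    return total / counts
```

With `mask = [1] * r + [0] * (L - r)` the function performs the usual first-r grouping; any other 0/1 pattern selects a different grouping.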
A common technique in SSA’s grouping stage is to choose the first r components to reconstruct the signal. The number of components is selected to minimize the in-sample Root Mean Square Error (RMSE) or the out-of-sample forecasting RMSE. Selecting the first r components to reconstruct the signal comes from the common belief that later components are related to noise in the time series, since they have smaller variances and higher frequencies.
Hassani et al. (2016) [35] proposed an alternative approach, namely SSA-CT, which is inspired by CT. They showed that using the first r components to reconstruct the signal does not necessarily produce the minimum RMSE. SSA-CT considers all $2^L$ possible combinations of components, for a given window length L, to reconstruct the signal. It then uses the combination of components that produces the minimum RMSE. Although SSA-CT can improve on the basic SSA’s results, checking all $2^L$ possible combinations of components to find the minimum RMSE is computationally expensive and time consuming.
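To make the cost of an exhaustive search concrete, the sketch below enumerates every binary grouping of length L and counts the evaluations. It is only an illustration of the brute-force idea: `exhaustive_grouping` is a hypothetical name, and `cost` is a placeholder for whatever RMSE criterion is used.

```python
from itertools import product


def exhaustive_grouping(L, cost):
    """Return the 0/1 mask of length L with the smallest cost, by brute force."""
    best_mask, best_cost = None, float("inf")
    evaluated = 0
    for mask in product((0, 1), repeat=L):
        evaluated += 1
        c = cost(mask)  # placeholder for an RMSE evaluation
        if c < best_cost:
            best_mask, best_cost = mask, c
    return best_mask, best_cost, evaluated
```

The loop body runs $2^L$ times, so the number of evaluations doubles with every unit increase in L (for example, 1024 evaluations at L = 10 and 1,048,576 at L = 20).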


3. SSA-GA

Consider the non-zero real valued time series $\{y_t\}_{t=1}^{N}$. If the aim is to extract the signal from noise, all available data are used to calculate the RMSE. If the main aim is to forecast the time series, one may divide the series into two parts, use the first part (say 2/3 of the data) to find the minimum RMSE grouping (training data) and use the rest of the series to test the out-of-sample forecasting performance (as the RMSE of the second part). The SSA-GA follows these steps:
  1. Run a basic SSA on the training data and find the optimum r.
  2. Use the training data to build the trajectory matrix $X = (x_{ij})_{i,j=1}^{L,K} = [X_1, \ldots, X_K]$, where $X_j = (y_j, \ldots, y_{L+j-1})^T$.
  3. Apply the SVD to $X$ and calculate the eigenvalues $\lambda_1 \ge \cdots \ge \lambda_L$ (of $XX^T$) and the corresponding eigenvectors $U_1, \ldots, U_L$. Obtain $V_i = X^T U_i / \sqrt{\lambda_i}$ and $X_i = \sqrt{\lambda_i}\, U_i V_i^T$.
  4. Define a chromosome $C_i$ as a vector of length L with binary values:
     $$C_i = (c_{i1}, c_{i2}, \ldots, c_{iL}),$$
     where $c_{ij} = 1$ if the jth component is considered for signal reconstruction and $c_{ij} = 0$ otherwise.
  5. Build a population containing M chromosomes, i.e., chromosomes $C_1, \ldots, C_M$. Generate $K\%$ ($K > 70$) of the chromosomes in the population randomly (from a uniform distribution). This produces chromosomes $C_1$ to $C_k$. Add $C_{k+1} = (0, 0, \ldots, 0)$ and $C_{k+2} = (1, 1, \ldots, 1)$ to the population (as extreme solutions). The rest of the population consists of copies of the basic SSA solution:
     $$c_{ij} = \begin{cases} 1, & j \le r \\ 0, & j > r \end{cases} \qquad i = k+3, \ldots, M,$$
     where r is the grouping parameter from basic SSA (step 1).
  6. Use a binary crossover function to produce M offspring chromosomes. A simple crossover function produces offspring chromosomes as follows:
     (a) Pair the chromosomes in the population randomly.
     (b) For a given pair of chromosomes $C_i$ and $C_j$, generate a random number d from a uniform distribution ($1 \le d \le L$).
     (c) Produce offspring chromosomes for $C_i$ and $C_j$ by switching their first d genes:
     $$\text{First offspring} = (c_{i1}, \ldots, c_{id}, c_{j(d+1)}, \ldots, c_{jL}), \qquad \text{Second offspring} = (c_{j1}, \ldots, c_{jd}, c_{i(d+1)}, \ldots, c_{iL}).$$
  7. Produce the weight matrix $W_i$ for each of the M + M chromosomes:
     $$W_i = \mathrm{diag}(C_i), \qquad i = 1, \ldots, M + M.$$
  8. Reconstruct the signal for each weight matrix $W_i$:
     $$\hat{S}_i = U W_i \Sigma V^T, \qquad i = 1, \ldots, M + M,$$
     where $U = [U_1, \ldots, U_L]$, $\Sigma = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_L})$ and $V = [V_1, \ldots, V_L]$, so that $\hat{S}_i = \sum_{j=1}^{L} c_{ij} X_j$.
  9. For each chromosome, generate in-sample h-step-ahead forecasts and calculate the in-sample RMSE for all M + M chromosomes. Select the M chromosomes with the smallest RMSE as the new population.
  10. Repeat steps 6 to 9 until the minimum RMSE in the population does not improve for several iterations.
  11. Begin with L = 2 and repeat steps 1 to 10 for $2 \le L \le N/2$, to find the L and grouping that minimize the in-sample RMSE.
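The population, crossover and selection steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names, the `share_random` and `patience` parameters (standing in for the K% share and for "several iterations"), and the generic `cost` callable (standing in for the in-sample forecasting RMSE) are all hypothetical. The sketch follows the listed steps, which use crossover and selection only, and it assumes an even population size so that every chromosome can be paired.

```python
import random


def initial_population(L, r, M, share_random=0.8, rng=random):
    """Random masks, the two extreme masks, then copies of the basic SSA mask."""
    # Assumption: cap the random share so at least one basic SSA copy fits.
    k = min(int(share_random * M), M - 3)
    population = [tuple(rng.randint(0, 1) for _ in range(L)) for _ in range(k)]
    population.append((0,) * L)
    population.append((1,) * L)
    basic = tuple(1 if j < r else 0 for j in range(L))
    population.extend([basic] * (M - len(population)))
    return population


def crossover(a, b, rng=random):
    """Swap the first d genes of two chromosomes, with 1 <= d <= L."""
    d = rng.randint(1, len(a))
    return a[:d] + b[d:], b[:d] + a[d:]


def evolve(population, cost, patience=5, rng=random):
    """Repeat crossover and selection until the best cost stops improving."""
    M = len(population)
    best = min(cost(c) for c in population)
    stale = 0
    while stale < patience:
        shuffled = population[:]
        rng.shuffle(shuffled)  # random pairing
        offspring = []
        for a, b in zip(shuffled[0::2], shuffled[1::2]):
            offspring.extend(crossover(a, b, rng))
        # Keep the M best among parents and offspring.
        population = sorted(population + offspring, key=cost)[:M]
        new_best = cost(population[0])
        stale = stale + 1 if new_best >= best else 0
        best = min(best, new_best)
    return population[0], best
```

Because parents compete with their offspring in the selection, the best cost in the population can never get worse from one generation to the next.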
Adding the basic SSA solution to the initial population in step 5 boosts the search speed and guarantees that the final grouping solution is at least as accurate as basic SSA. The SSA-GA as described above thus expedites SSA-CT’s search for the minimum RMSE solution while guaranteeing that the final solution is at least as good as basic SSA. It should be mentioned, however, that the minimum in-sample RMSE does not necessarily guarantee the minimum out-of-sample RMSE.
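The guarantee can be written out in one line for a fixed window length L. The notation here is introduced only for this illustration and is not from the original: $P_g$ is the population at generation $g$, $O_g$ its offspring, $m_g$ the smallest in-sample RMSE in $P_g$, $G$ the final generation, and $C^{\mathrm{SSA}}$ the basic SSA chromosome.

```latex
\begin{align*}
m_{g+1} &= \min_{C \in P_g \cup O_g} \mathrm{RMSE}(C) \;\le\; \min_{C \in P_g} \mathrm{RMSE}(C) = m_g, \\
C^{\mathrm{SSA}} \in P_0 &\;\Longrightarrow\; m_G \le m_{G-1} \le \cdots \le m_0 \le \mathrm{RMSE}\bigl(C^{\mathrm{SSA}}\bigr).
\end{align*}
```

The first line holds because selection keeps the M best of the M + M candidates, so the best parent is never discarded; the second follows because the basic SSA solution is in the initial population.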

4. Empirical Results

We used a set of 100 real time series, with different sampling frequencies, normality, stationarity and skewness characteristics, to compare the accuracy of SSA-GA with basic SSA. The dataset was accessed through Data Market (accessed on 12 January 2021) and was previously employed by Ghodsi et al. [47] and Hassani et al. [36] to compare different SSA based forecasting methods. Table 1 gives a description of the time series in the dataset. The name and description of each time series, and the codes assigned to them to improve presentation, are presented in Table A1 in Appendix A. Table A2 presents descriptive statistics for all time series to give the reader a rich understanding of the nature of the real data. This also includes skewness statistics and results from the normality (Shapiro-Wilk) and stationarity (Augmented Dickey-Fuller) tests. As can be seen, the data come from different fields such as energy, finance, health, tourism, housing market, crime, agriculture, economics, chemistry, ecology, and production, to name a few. Figure 1 shows a selection of 9 of the 100 series used in this study.
For each time series, the out-of-sample forecasting RMSE is calculated using both basic SSA and SSA-GA, for very short, short, long and very long term forecasting horizons (i.e., $h = 1, 3, 6, 12$). To compare the RMSEs of the two methods, we used the RRMSE, defined as the ratio of SSA-GA’s RMSE to basic SSA’s RMSE (i.e., $\mathrm{RRMSE} = \mathrm{RMSE}_{\text{SSA-GA}} / \mathrm{RMSE}_{\text{basic SSA}}$), so that an RRMSE below 1 indicates that SSA-GA is the more accurate method. We also employed the Kolmogorov-Smirnov Predictive Accuracy (KSPA) test [48] to compare the accuracy of the two methods. Table A3 shows the RRMSEs and the KSPA test p-values for each time series. Descriptions of the RRMSEs are given in Table 2. As can be seen, SSA-GA’s results are not necessarily the same as basic SSA’s. As mentioned before, SSA-GA’s in-sample RMSE is always at least as good as basic SSA’s. This means that in all cases where SSA-GA’s result differs from basic SSA’s, it has better accuracy for in-sample forecasting. However, in-sample accuracy does not guarantee out-of-sample accuracy and, as is evident from Table 2, SSA-GA does not necessarily improve out-of-sample forecasting accuracy. Figure 2 shows that the mode of the RRMSEs in these 100 cases is less than 1 for all forecasting horizons. According to the results given in Table 2 and Figure 2, neither SSA-GA nor basic SSA dominates the other in out-of-sample forecasting accuracy. This could be the result of over-fitting in SSA-GA, since SSA-GA is always at least as accurate as basic SSA for in-sample forecasting.
In order to further investigate the accuracy of SSA-GA in forecasting time series with different characteristics, the Kruskal-Wallis test is employed to compare the RRMSEs of time series with different features. The Kruskal-Wallis test results are given in Table 2. As these results show, the sampling frequency, stationarity, normality and skewness of the time series do not affect the RRMSE significantly. In other words, the difference between the accuracy of SSA-GA and basic SSA is not affected by these factors. According to these results, although SSA-GA has better in-sample forecasting accuracy, it may suffer from over-fitting in out-of-sample forecasting. Nevertheless, using SSA-GA, as an advanced version of SSA-CT, can improve the basic SSA’s results and at the same time reduce SSA-CT’s computational expense.
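The RMSE ratio used for the comparison is simple to compute. The following sketch uses made-up numbers purely for illustration; the function names are hypothetical and the values are not taken from the study.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("sequences must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse(actual, forecast_ssa_ga, forecast_basic_ssa):
    """Ratio of SSA-GA's RMSE to basic SSA's RMSE; below 1 favours SSA-GA."""
    return rmse(actual, forecast_ssa_ga) / rmse(actual, forecast_basic_ssa)


# Toy numbers, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
ratio = rrmse(actual, [1.0, 2.0, 3.0, 5.0], [2.0, 3.0, 4.0, 5.0])
```

In this toy case the first forecast misses one point by 1 (RMSE 0.5) and the second misses every point by 1 (RMSE 1.0), so the ratio is 0.5.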

5. Conclusions

Nature inspired algorithms have shown remarkable performance in solving complex problems that traditional computational approaches fail or struggle to solve, as is evident from their various achievements across subjects in searching, forecasting, optimising, and signal extraction. Those who better appreciate the means of nature tend to better understand the natural mechanisms underlying the broad scale of science and technology. Given the emerging trend of fusing nature with computational science over the past decades, this paper joins the forces of SSA and GA so as to achieve more efficient and accurate forecasts.
To the best of our knowledge, this paper is the first to combine the powerful time series analysis technique SSA with the widely applied and established GA. This research also progresses in line with [35], in which the authors proposed the hybrid SSA-CT technique that employed CT to improve the grouping stage of basic SSA. As a developed version, SSA-GA is introduced so that the optimization merits of GA are adopted to further improve the efficiency of grouping and optimize the signal reconstruction. The performance of this newly proposed hybrid approach is verified on a collection of 100 time series covering a diverse range of subjects, and promising results are achieved, especially for the in-sample reconstruction. To clearly demonstrate the comparison and critically evaluate the performance, the authors employed the RMSE, the RRMSE, the KSPA test and the Kruskal-Wallis test, so as to give a comprehensive investigation of SSA-GA in comparison with basic SSA. In general, with SSA-CT’s computational efficiency much improved and a better grouping process, the signal reconstruction has been significantly improved, while the out-of-sample forecasting shows stable performance that is as robust as SSA-CT. Considering that basic SSA is already a powerful tool in reconstruction and forecasting with outstanding performance, even a small improvement and efficiency boost can indicate huge steps in terms of processing data at scale. The potential over-fitting issue in out-of-sample forecasting is recognised, and this will be one direction to address in our future research. Advanced versions of nature inspired algorithms could be explored alone or jointly to further improve one or more stages of SSA, as well as multivariate SSA.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. List of 100 real time series.
Code  Name of Time Series
A001  US Economic Statistics: Capacity Utilization.
A002  Births by months 1853–2012.
A003  Electricity: electricity net generation: total (all sectors).
A004  Energy prices: average retail prices of electricity.
A005  Coloured fox fur returns, Hopedale, Labrador, 1834–1925.
A006  Alcohol demand (log spirits consumption per head), UK, 1870–1938.
A007  Monthly Sutter county workforce, January 1946–December 1966 priesema (1979).
A008  Exchange rates - monthly data: Japanese yen.
A009  Exchange rates - monthly data: Pound sterling.
A010  Exchange rates - monthly data: Romanian leu.
A011  HICP (2005 = 100) - monthly data (annual rate of change): European Union (27 countries).
A012  HICP (2005 = 100) - monthly data (annual rate of change): UK.
A013  HICP (2005 = 100) - monthly data (annual rate of change): US.
A014  New Homes Sold in the United States.
A015  Goods, Value of Exports for United States.
A016  Goods, Value of Imports for United States.
A017  Market capitalisation - monthly data: UK.
A018  Market capitalisation - monthly data: US.
A019  Average monthly temperatures across the world (1701–2011): Bournemouth.
A020  Average monthly temperatures across the world (1701–2011): Eskdalemuir.
A021  Average monthly temperatures across the world (1701–2011): Lerwick.
A022  Average monthly temperatures across the world (1701–2011): Valley.
A023  Average monthly temperatures across the world (1701–2011): Death Valley.
A024  US Economic Statistics: Personal Savings Rate.
A025  Economic Policy Uncertainty Index for United States (Monthly Data).
A026  Coal Production, Total for Germany.
A027  Coke, Beehive Production (by Statistical Area).
A028  Monthly champagne sales (in 1000’s) (p. 273: Montgomery: Fore. and T.S.).
A029  Domestic Auto Production.
A030  Index of Cotton Textile Production for France.
A031  Index of Production of Chemical Products (by Statistical Area).
A032  Index of Production of Leather Products (by Statistical Area).
A033  Index of Production of Metal Products (by Statistical Area).
A034  Index of Production of Mineral Fuels (by Statistical Area).
A035  Industrial Production Index.
A036  Knit Underwear Production (by Statistical Area).
A037  Lubricants Production for United States.
A038  Silver Production for United States.
A039  Slab Zinc Production (by Statistical Area).
A040  Annual domestic sales and advertising of Lydia E, Pinkham Medicine, 1907 to 1960.
A041  Chemical concentration readings.
A042  Monthly Boston armed robberies January 1966–October 1975 Deutsch and Alt (1977).
A043  Monthly Minneapolis public drunkenness intakes January’66–July’78.
A044  Motor vehicles engines and parts/CPI, Canada, 1976–1991.
A045  Methane input into gas furnace: cu. ft/min. Sampling interval 9 s.
A046  Monthly civilian population of Australia: thousand persons. February 1978–April 1991.
A047  Daily total female births in California, 1959.
A048  Annual immigration into the United States: thousands. 1820–1962.
A049  Monthly New York City births: unknown scale. January 1946–December 1959.
A050  Estimated quarterly resident population of Australia: thousand persons.
A051  Annual Swedish population rates (1000’s) 1750–1849 Thomas (1940).
A052  Industry sales for printing and writing paper (in Thousands of French francs).
A053  Coloured fox fur production, Hebron, Labrador, 1834–1925.
A054  Coloured fox fur production, Nain, Labrador, 1834–1925.
A055  Coloured fox fur production, oak, Labrador, 1834–1925.
A056  Monthly average daily calls to directory assistance January’62–December’76.
A057  Monthly Av. residential electricity usage Iowa city 1971–1979.
A058  Montly av. residential gas usage Iowa (cubic feet)*100 ’71–’79.
A059  Monthly precipitation (in mm), January 1983–April 1994. London, United Kingdom.
A060  Monthly water usage (mL/day), London Ontario, 1966–1988.
A061  Quarterly production of Gas in Australia: million megajoules. Includes natural gas from July 1989. March 1956–September 1994.
A062  Residential water consumption, January 1983–April 1994. London, United Kingdom.
A063  The total generation of electricity by the U.S. electric industry (monthly data for the period January 1985–October 1996).
A064  Total number of water consumers, January 1983–April 1994. London, United Kingdom.
A065  Monthly milk production: pounds per cow. January 62–December 75.
A066  Monthly milk production: pounds per cow. January 62–December 75, adjusted for month length.
A067  Monthly total number of pigs slaughtered in Victoria. January 1980–August 1995.
A068  Monthly demand repair parts large/heavy equip. Iowa 1972–1979.
A069  Number of deaths and serious injuries in UK road accidents each month. January 1969–December 1984.
A070  Passenger miles (Mil) flown domestic U.K. July’62–May’72.
A071  Monthly hotel occupied room av. ’63–’76 B.L.Bowerman et al.
A072  Weekday bus ridership, Iowa city, Iowa (monthly averages).
A073  Portland Oregon average monthly bus ridership (/100).
A074  U.S. airlines: monthly aircraft miles flown (Millions) 1963–1970.
A075  International airline passengers: monthly totals in thousands. January 49–December 60.
A076  Sales: souvenir shop at a beach resort town in Queensland, Australia. January 1987–December 1993.
A077  Der Stern: Weekly sales of wholesalers A, ’71–’72.
A078  Der Stern: Weekly sales of wholesalers B, ’71–’72’
A079  Der Stern: Weekly sales of wholesalers ’71–’72.
A080  Monthly sales of U.S. houses (thousands) 1965–1975.
A081  CFE specialty writing papers monthly sales.
A082  Monthly sales of new one-family houses sold in USA since 1973.
A083  Wisconsin employment time series, food and kindred products, January 1961–October 1975.
A084  Monthly gasoline demand Ontario gallon millions 1960–1975.
A085  Wisconsin employment time series, fabricated metals, January 1961–October 1975.
A086  Monthly empolyees wholes./retail Wisconsin ’61–’75 R.B.Miller.
A087  US monthly sales of chemical related products. January 1971–December 1991.
A088  US monthly sales of coal related products. January 1971–December 1991.
A089  US monthly sales of petrol related products. January 1971–December 1991.
A090  US monthly sales of vehicle related products. January 1971–December 1991.
A091  Civilian labour force in Australia each month: thousands of persons. February 1978–August 1995.
A092  Numbers on Unemployment Benefits in Australia: monthly January 1956–July 1992.
A093  Monthly Canadian total unemployment figures (thousands) 1956–1975.
A094  Monthly number of unemployed persons in Australia: thousands. February 1978–April 1991.
A095  Monthly U.S. female (20 years and over) unemployment figures 1948–1981.
A096  Monthly U.S. female (16–19 years) unemployment figures (thousands) 1948–1981.
A097  Monthly unemployment figures in West Germany 1948–1980.
A098  Monthly U.S. male (20 years and over) unemployment figures 1948–1981.
A099  Wisconsin employment time series, transportation equipment, January 1961–October 1975.
A100  Monthly U.S. male (16–19 years) unemployment figures (thousands) 1948–1981.
Table A2. Descriptives for the 100 time series.
A001M539808056−0.55<0.01−0.60 A002M192027124988330.16<0.01−1.82
A003M4842.59 × 10 5 2.61 × 10 5 6.88 × 10 5 270.15<0.01−0.90 A004M31077228−0.24<0.010.56
A007M25229782741111137.320.79<0.01−0.80 A008M16012812819150.34<0.01−0.59
A009M1600.720.690.10130.66<0.010.53 A010M1603.413.610.8324−0.92<0.011.58
A013M1762.52.41.666−0.52<0.01−2.27 A014M606555320350.79<0.01−1.41
A015M6723.391.893.481031.09<0.012.46 A016M6725.182.895.781111.13<0.011.91
A017M24913013024190.35<0.010.24 A018M2491121142522−0.010.01 *0.06
A025M34310810033300.99<0.01−1.23 A026M27711.711.92.320−0.160.06 *−0.40
A027M1710.210.130.19881.26<0.01−1.81 A028M9648014084264054.991.55<0.01−1.66
A029M24839138511630−0.030.08 *−1.22 A030M13989921213−0.82<0.01−0.28
A031M12113413827200.05<0.011.51 A032M153113114109−0.290.45 *−0.52
A033M1151171181715−0.290.03 *−0.46 A034M1151101111110−0.530.02 *0.30
A035M1137403431780.56<0.015.14 A036M1651.081.100.2018.37−1.15<0.01−0.59
A037M4793.042.831.0233.600.46<0.010.61 A038M2839.3910.022.2724.15−0.80<0.01−1.01
A039M45254521936−0.15<0.010.08 A040Q1081382120668449.550.83<0.01−0.80
A041H19717.0617.000.392.340.150.21 *0.09 A042M118196.3166.0128.065.20.45<0.010.41
A043M151391.1267.0237.4960.720.43<0.01−1.17 A044M18813441425479.135.6−0.41<0.01−1.28
A045H296−−1887−0.050.55 *−7.66A046M15911,89011,830882.937.420.12<0.015.71
A047D36541.9842.007.3417.500.44<0.01−1.07 A048A1432.5 × 10 5 2.2 × 10 5 2.1 × 10 5 83.191.06<0.01−2.63
A049M16825.0524.952.319.25−0.020.02 *0.07 A050Q8915,27415,18413588.890.19<0.019.72
A057M106489.73465.0093.3419.060.92<0.01−1.21 A058M106124.7194.5084.1567.480.52<0.01−3.88
A059M13685.6680.2537.5443.830.91<0.01−1.88 A060M276118.61115.6326.3922.240.86<0.01−0.47
A061Q15561,72847,97653,90787.330.44<0.010.06 A062M1365.72 × 10 7 5.53 × 10 7 1.2 × 10 7 21.511.13<0.01−0.84
A063M142231.09226.7324.3710.550.520.01−0.39 A064M13631,38831,251323210.300.250.22 *−0.16
A065M156754.71761.00102.2013.540.010.04 *0.04 A066M156746.49749.1598.5913.210.080.04 *−0.38
A067M18890,64091,66113,92615.36−0.380.01 *−0.38 A068M9415401532474.3530.790.380.05 *0.54
A069M19216701631289.6117.340.53<0.01−0.74 A070M11991.0986.2032.8036.010.34<0.01−1.93
A071M168722.30709.50142.6619.750.72<0.01−0.52 A072W13659135500178430.170.67<0.01−0.68
A073M11411201158270.8924.17−0.37<0.010.76 A074M9610,38510,401220221.210.330.18 *−0.13
A075M144280.30265.50119.9742.800.57<0.01−0.35 A076M8414,315877115,7481103.37<0.01−0.29
A077W10411,90911,640123110.340.60<0.01−0.16 A078W10474,63673,60047376.350.64<0.01−0.59
A079W1041020101271.787.030.600.01 *−0.41 A080M13245.3644.0010.3822.880.170.15 *−0.81
A081M14717451730479.5227.47−0.39<0.01−1.15 A082M27552.2953.0011.9422.830.180.13 *−1.30
A083M17858.7955.806.6811.360.93<0.01−0.92 A084M1921.62 × 10 5 1.57 × 10 5 41,66125.710.32<0.010.25
A085M17840.9741.505.1112.47−0.07<0.011.45 A086M178307.56308.3546.7615.200.17<0.011.51
A087M25213.7014.086.1344.730.16<0.011.13 A088M25265.6768.2014.2521.70−0.53<0.01−0.53
A089M25210.7610.925.1147.50−0.19<0.01−0.05 A090M25211.7411.055.1143.540.38<0.01−0.88
A091M2117661762181910.700.03<0.013.27 A092M4392.21 × 10 5 5.67 × 10 4 2.35 × 10 5 106.320.77<0.011.61
A093M240413.28396.50152.8436.980.36<0.01−1.60 A094M21167876528604.628.910.56<0.012.69
A095M40813731132686.0549.960.91<0.010.60 A096M408422.38342.00252.8659.870.65<0.01−1.95
A097M3967.14 × 10 5 5.57 × 10 5 5.64 × 10 5 78.970.79<0.01−2.51 A098M4081937182579441.040.64<0.01−1.15
A099M17840.6040.504.9512.19−0.65<0.01−0.10 A100M408520.28425.50261.2250.210.64<0.01−1.65
Note: * indicates data is normally distributed based on a Shapiro-Wilk test at p = 0.01. indicates a nonstationary time series based on the Augmented Dickey-Fuller test at p = 0.01. A indicates annual, M indicates monthly, Q indicates quarterly, W indicates weekly, D indicates daily and H indicates hourly. N indicates series length.
Table A3. RRMSEs and KSPA p-values for the 100 time series.
Forecasting Horizon
Series  h = 1  h = 3  h = 6  h = 12


  1. Yang, X.S. Nature-inspired optimization algorithms: Challenges and open problems. J. Comput. Sci. 2020, 46, 101104.
  2. Markou, M.; Singh, S. Novelty detection: A review—Part 2: Neural network based approaches. Signal Process. 2003, 83, 2499–2521.
  3. Shen, W.; Guo, X.; Wu, C.; Wu, D. Forecasting stock indices using radial basis function neural networks optimized by artificial fish swarm algorithm. Knowl. Based Syst. 2011, 24, 378–385.
  4. Ab Wahab, M.N.; Nefti-Meziani, S.; Atyabi, A. A comprehensive review of swarm optimization algorithms. PLoS ONE 2015, 10, e0122827.
  5. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992.
  6. Leardi, R. Genetic algorithms in chemometrics and chemistry: A review. J. Chemom. 2001, 15, 559–569.
  7. Weile, D.S.; Michielssen, E. Genetic algorithm optimization applied to electromagnetics: A review. IEEE Trans. Antennas Propag. 1997, 45, 343–353.
  8. Bhoskar, M.T.; Kulkarni, M.O.K.; Kulkarni, M.N.K.; Patekar, M.S.L.; Kakandikar, G.M.; Nandedkar, V.M. Genetic algorithm and its applications to mechanical engineering: A review. Mater. Today Proc. 2015, 2, 2624–2630.
  9. Mirjalili, S.; Dong, J.S.; Sadiq, A.S.; Faris, H. Genetic algorithm: Theory, literature review, and application in image reconstruction. In Nature-Inspired Optimizers; Springer: Cham, Switzerland, 2020; pp. 69–85.
  10. Chaudhry, S.S.; Luo, W. Application of genetic algorithms in production and operations management: A review. Int. J. Prod. Res. 2005, 43, 4083–4101.
  11. Jauhar, S.K.; Pant, M. Genetic algorithms, a nature-inspired tool: Review of applications in supply chain management. In Proceedings of the Fourth International Conference on Soft Computing for Problem Solving; Springer: New Delhi, India, 2015; pp. 71–86.
  12. Drake, A.E.; Marks, R.E. Genetic algorithms in economics and finance: Forecasting stock market prices and foreign exchange—A review. In Genetic Algorithms and Genetic Programming in Computational Finance; Springer: Boston, MA, USA, 2002; pp. 29–54.
  13. Shin, K.S.; Lee, Y.J. A genetic algorithm application in bankruptcy prediction modeling. Expert Syst. Appl. 2002, 23, 321–328.
  14. Chou, C.H.; Hsieh, S.C.; Qiu, C.J. Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction. Appl. Soft Comput. 2017, 56, 298–316.
  15. Zelenkov, Y.; Fedorova, E.; Chekrizov, D. Two-step classification method based on genetic algorithm for bankruptcy forecasting. Expert Syst. Appl. 2017, 88, 393–401.
  16. Oreski, S.; Oreski, D.; Oreski, G. Hybrid system with genetic algorithm and artificial neural networks and its application to retail credit risk assessment. Expert Syst. Appl. 2012, 39, 12605–12617.
  17. Zhang, W.; He, H.; Zhang, S. A novel multi-stage hybrid model with enhanced multi-population niche genetic algorithm: An application in credit scoring. Expert Syst. Appl. 2019, 121, 221–232.
  18. Mirmirani, S.; Li, H.C. A comparison of VAR and neural networks with genetic algorithm in forecasting price of oil. Adv. Econom. 2004, 19, 203–223.
  19. Chiroma, H.; Abdulkareem, S.; Herawan, T. Evolutionary Neural Network model for West Texas Intermediate crude oil price prediction. Appl. Energy 2015, 142, 266–273.
  20. Deng, S.; Xiang, Y.; Fu, Z.; Wang, M.; Wang, Y. A hybrid method for crude oil price direction forecasting using multiple timeframes dynamic time wrapping and genetic algorithm. Appl. Soft Comput. 2019, 82, 105566.
  21. Chen, K.Y.; Wang, C.H. Support vector regression with genetic algorithms in forecasting tourism demand. Tour. Manag. 2007, 28, 215–226.
  22. Hong, W.C.; Dong, Y.; Chen, L.Y.; Wei, S.Y. SVR with hybrid chaotic genetic algorithms for tourism demand forecasting. Appl. Soft Comput. 2011, 11, 1881–1890.
  23. Chen, R.; Liang, C.Y.; Hong, W.C.; Gu, D.X. Forecasting holiday daily tourist flow based on seasonal support vector regression with adaptive genetic algorithm. Appl. Soft Comput. 2015, 26, 435–443.
  24. Yuan, F.C.; Lee, C.H. Using least square support vector regression with genetic algorithm to forecast beta systematic risk. J. Comput. Sci. 2015, 11, 26–33.
  25. Cai, Q.; Zhang, D.; Wu, B.; Leung, S.C. A novel stock forecasting model based on fuzzy time series and genetic algorithm. Procedia Comput. Sci. 2013, 18, 1155–1162.
  26. Huang, Y.; Gao, Y.; Gan, Y.; Ye, M. A new financial data forecasting model using genetic algorithm and long short-term memory network. Neurocomputing 2020, in press.
  27. Panapakidis, I.P.; Dagoumas, A.S. Day-ahead natural gas demand forecasting based on the combination of wavelet transform and ANFIS/genetic algorithm/neural network model. Energy 2017, 118, 231–245.
  28. Ozturk, H.K.; Ceylan, H. Forecasting total and industrial sector electricity demand based on genetic algorithm approach: Turkey case study. Int. J. Energy Res. 2005, 29, 829–840.
  29. Bouktif, S.; Fiaz, A.; Ouni, A.; Serhani, M.A. Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies 2018, 11, 1636.
  30. Liu, D.; Niu, D.; Wang, H.; Fan, L. Short-term wind speed forecasting using wavelet transform and support vector machines optimized by genetic algorithm. Renew. Energy 2014, 62, 592–597.
  31. Nasseri, M.; Asghari, K.; Abedini, M.J. Optimized scenario for rainfall forecasting using genetic algorithm coupled with artificial neural network. Expert Syst. Appl. 2008, 35, 1415–1421.
  32. Hassani, H. Singular Spectrum Analysis: Methodology and Comparison. J. Data Sci. 2007, 5, 239–257.
  33. Hassani, H.; Heravi, S.; Zhigljavsky, A. Forecasting European industrial production with singular spectrum analysis. Int. J. Forecast. 2009, 25, 103–118.
  34. Hassani, H.; Rua, A.; Silva, E.S.; Thomakos, D. Monthly forecasting of GDP with mixed-frequency multivariate singular spectrum analysis. Int. J. Forecast. 2019, 35, 1263–1272.
  35. Hassani, H.; Ghodsi, Z.; Silva, E.S.; Heravi, S. From nature to maths: Improving forecasting performance in subspace-based methods using genetics Colonial Theory. Digit. Signal Process. 2016, 21, 101–109.
  36. Hassani, H.; Yeganegi, M.R.; Khan, A.; Silva, E.S. The effect of data transformation on Singular Spectrum Analysis for forecasting. Signals 2020, 1, 4–25.
  37. Kalantari, M.; Yarmohammadi, M.; Hassani, H. Singular spectrum analysis based on L1-norm. Fluct. Noise Lett. 2016, 15, 1650009.
  38. Silva, E.S.; Hassani, H.; Ghodsi, M.; Ghodsi, Z. Forecasting with auxiliary information in forecasts using multivariate singular spectrum analysis. Inf. Sci. 2019, 479, 214–230.
  39. Kalantari, M.; Hassani, H.; Silva, E.S. Weighted Linear Recurrent Forecasting in Singular Spectrum Analysis. Fluct. Noise Lett. 2020, 19, 2050010.
  40. Silva, E.S.; Hassani, H.; Heravi, S.; Huang, X. Forecasting tourism demand with denoised neural networks. Ann. Tour. Res. 2019, 74, 134–154.
  41. Ma, X.; Jin, Y.; Dong, Q. A generalized dynamic fuzzy neural network based on singular spectrum analysis optimized by brain storm optimization for short-term wind speed forecasting. Appl. Soft Comput. 2017, 54, 296–312.
  42. Yu, C.; Li, Y.; Zhang, M. Comparative study on three new hybrid models using Elman Neural Network and Empirical Mode Decomposition based technologies improved by Singular Spectrum Analysis for hour-ahead wind speed forecasting. Energy Convers. Manag. 2017, 147, 75–85.
  43. Wang, C.; Zhang, H.; Ma, P. Wind power forecasting based on singular spectrum analysis and a new hybrid Laguerre neural network. Appl. Energy 2020, 259, 114139.
  44. Kolidakis, S.; Botzoris, G.; Profillidis, V.; Lemonakis, P. Road traffic forecasting—A hybrid approach combining Artificial Neural Network with Singular Spectrum Analysis. Econ. Anal. Policy 2019, 64, 159–171.
  45. Sulandari, W.; Lee, M.H.; Rodrigues, P.C. Indonesian electricity load forecasting using singular spectrum analysis, fuzzy systems and neural networks. Energy 2020, 190, 116408.
  46. Zubaidi, S.L.; Dooley, J.; Alkhaddar, R.M.; Abdellatif, M.; Al-Bugharbee, H.; Ortega-Martorell, S. A novel approach for predicting monthly water demand by combining singular spectrum analysis with neural networks. J. Hydrol. 2018, 561, 136–145.
  47. Ghodsi, M.; Hassani, H.; Rahmani, D.; Silva, E.S. Vector and recurrent singular spectrum analysis: Which is better at forecasting? J. Appl. Stat. 2018, 45, 1872–1899.
  48. Hassani, H.; Silva, E.S. A Kolmogorov-Smirnov based test for comparing the predictive accuracy of two sets of forecasts. Econometrics 2015, 3, 590–609.
Figure 1. A selection of nine real time series.
Figure 2. Histogram of RRMSEs for different forecasting horizons (To better illustrate the data, one extreme value is removed for h = 6 and two extreme values are removed for h = 12).
Table 1. Number of time series with each feature.
Sampling Frequency | 5834422
Positive Skew | Negative Skew | Symmetric
Table 2. Descriptive statistics of the RRMSEs and Kruskal-Wallis test results.
Forecasting Horizon
Statistic | h = 1 | h = 3 | h = 6 | h = 12
RRMSE median | 1.0618 | 1.0362 | 1.0319 | 1.0302
N. RRMSE < 1 ¹ | 21 | 24 | 21 | 24
N. RRMSE > 1 ² | 57 | 54 | 57 | 54
N. RRMSE < 1 (significantly) ³ | 3 | 5 | 4 | 6
N. RRMSE > 1 (significantly) ³ | 17 | 13 | 7 | 14
Kruskal-Wallis p-value, RRMSE ∼ Frequency ⁴ | 0.1975 | 0.1975 | 0.1975 | 0.1975
Kruskal-Wallis p-value, RRMSE ∼ Normality ⁴ | 0.9047 | 0.9047 | 0.9047 | 0.9047
Kruskal-Wallis p-value, RRMSE ∼ Stationarity ⁴ | 0.1625 | 0.1625 | 0.1625 | 0.1625
Kruskal-Wallis p-value, RRMSE ∼ Skewness ⁴ | 0.9618 | 0.9618 | 0.9618 | 0.9618
¹ Number of RRMSEs less than 1; ² Number of RRMSEs larger than 1; ³ Cases with a KSPA p-value less than 0.05; ⁴ Kruskal-Wallis p-value for testing the effect of the given factor on RRMSE.
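As a rough illustration of how the first rows of Table 2 could be tabulated, the following minimal Python sketch computes a ratio of root mean squared errors for one series and then summarizes a collection of such ratios by their median and by the counts below and above 1. The function names (`rmse`, `rrmse`, `summarize`), the toy numbers, and the direction of the ratio (first forecast over second forecast) are assumptions made for illustration only; they are not taken from the paper.

```python
import math
from statistics import median


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse(actual, forecast_a, forecast_b):
    """Ratio RMSE(A) / RMSE(B). The direction of the ratio is an assumption."""
    return rmse(actual, forecast_a) / rmse(actual, forecast_b)


def summarize(rrmses):
    """Median and counts of ratios strictly below and strictly above 1."""
    return {
        "median": median(rrmses),
        "n_below_1": sum(r < 1 for r in rrmses),
        "n_above_1": sum(r > 1 for r in rrmses),
    }


# Toy data (hypothetical, not from the paper).
actual = [10.0, 12.0, 11.0, 13.0]
forecast_a = [11.0, 13.0, 12.0, 14.0]  # every error is 1, so RMSE = 1
forecast_b = [12.0, 14.0, 13.0, 15.0]  # every error is 2, so RMSE = 2
ratio = rrmse(actual, forecast_a, forecast_b)

# Summary over a hypothetical set of per-series ratios.
summary = summarize([0.5, 0.9, 1.0, 1.2, 1.5])
```

A ratio of exactly 1 is counted in neither group, which is one way the two counts for a horizon can sum to less than the number of series.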
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.