## 2. Materials and Methods

As discussed earlier, seismic data for the entire Greek vicinity have been readily available since 1964. These data comprise the time of occurrence, latitude, longitude, hypocenter depth and magnitude of every recorded earthquake. These characteristics are sufficient to devise an expert heterogeneous parallel processing agglomerative spatio-temporal clustering algorithm based upon the aforementioned domain expert equations ρ = 10^(0.414M − 1.696) km [10], t_{before} = 10^(0.5M − 2.1) days [11] and t_{after} = 10^(0.51M − 1.15) days [12], where M is the earthquake's magnitude, ρ is the radius of the sphere of influence, t_{before} is the temporal extent of the sphere before the occurrence of the earthquake and t_{after} is the temporal extent of the sphere after the occurrence of the earthquake. The parallel algorithm, as outlined by the following pseudo-code, starts with the first catalogued main seismic event but eventually selects the largest earthquake in the data set to build the first cluster. It then uses the magnitude of that earthquake to compute its spatio-temporal field of influence and clusters together all earthquakes in the seismic data set falling within those boundaries.
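The three domain-expert equations can be sketched as a single helper function; the function and variable names below are illustrative and not taken from the original implementation:

```python
import math

def sphere_of_influence(magnitude: float):
    """Spatio-temporal sphere of influence of an earthquake, following the
    domain-expert equations cited in the text [10-12]."""
    radius_km = 10 ** (0.414 * magnitude - 1.696)    # spatial radius rho (km)
    t_before_days = 10 ** (0.5 * magnitude - 2.1)    # temporal extent before the event (days)
    t_after_days = 10 ** (0.51 * magnitude - 1.15)   # temporal extent after the event (days)
    return radius_km, t_before_days, t_after_days
```

For a magnitude M = 6.0 event, for example, this yields a radius of roughly 6.1 km, a backward temporal extent of roughly 8 days and a forward temporal extent of roughly 81 days.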

This is enabled by heterogeneous parallel programming: earthquakes are sorted in ascending chronological order and the first un-clustered event is selected as the current event of the spatio-temporal clustering algorithm shown in Figure 1. The sphere of influence of the current event, i.e., the time window and the strain radius, is then computed, thereby creating a new cluster. All earthquakes that coincide both spatially and temporally with the aforementioned sphere of influence are assigned to the new cluster. The latter process is conducted using multiple GPU parallel threads, with one thread assigned to processing a single seismic event. The earthquake with the largest magnitude in the newly created cluster is then identified and becomes the maximum magnitude event of the current cluster. If the maximum magnitude event is not the current event, a sphere of influence shifted both spatially and temporally is computed, adding further earthquakes to the current cluster, and the maximum magnitude event becomes the current event. Another scan for the largest earthquake in the cluster then occurs, and this loop repeats until the current event coincides with the maximum magnitude event. When this is the case, all clustered data are removed from the overall seismic data set and the process carries on chronologically with the formation of a new cluster, starting with the first of the remaining un-clustered events, which then becomes the current event of the algorithm in Figure 1. As a result, the created clusters comprise the net product of multiple spheres of influence, resulting in an irregular shape and variability in the temporal influence at various parts of the cluster. The parallel algorithm ends when the remaining earthquakes end up forming single-point clusters or when there are no earthquakes remaining in the reduced seismic data set.
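A sequential sketch of the clustering loop above is given below. The GPU implementation assigns one thread per seismic event for the membership scan; here that scan is an ordinary loop. The flattened x/y coordinates, class names and thresholds are simplifying assumptions for illustration, not the paper's data structures:

```python
import math
from dataclasses import dataclass

@dataclass
class Quake:
    t_days: float      # occurrence time, days since catalogue start
    x_km: float        # flattened easting (km); a simplification of lat/lon
    y_km: float        # flattened northing (km)
    magnitude: float
    cluster: int = -1  # -1 means un-clustered

def influence(m):
    """Domain-expert sphere of influence [10-12]: radius (km), days before, days after."""
    return (10 ** (0.414 * m - 1.696),
            10 ** (0.5 * m - 2.1),
            10 ** (0.51 * m - 1.15))

def cluster_catalogue(quakes):
    """Agglomerative spatio-temporal clustering as described in the text;
    returns the number of clusters formed and labels each event in place."""
    quakes.sort(key=lambda q: q.t_days)           # ascending chronological order
    cluster_id = 0
    for seed in quakes:
        if seed.cluster != -1:
            continue                              # first un-clustered event becomes the current event
        current = seed
        current.cluster = cluster_id
        while True:
            r, tb, ta = influence(current.magnitude)
            for q in quakes:                      # one GPU thread per event in the paper
                if (q.cluster == -1
                        and current.t_days - tb <= q.t_days <= current.t_days + ta
                        and math.hypot(q.x_km - current.x_km,
                                       q.y_km - current.y_km) <= r):
                    q.cluster = cluster_id
            members = [q for q in quakes if q.cluster == cluster_id]
            biggest = max(members, key=lambda q: q.magnitude)
            if biggest is current:                # stop when current == maximum magnitude event
                break
            current = biggest                     # shift the sphere of influence
        cluster_id += 1
    return cluster_id
```

For instance, an M 6.0 mainshock with an M 4.0 event 2 km away ten days later falls in one cluster, while a distant later event seeds a new one.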

Once the data attributed to the various potentially distinct seismic zones are made readily available via the aforementioned process, it is then possible to investigate possible correlations between the released amounts of seismic energy and the time intervals between consecutive large earthquakes of a particular potentially distinct seismic zone alone. Deep learning neural networks are deployed for that purpose, as they are capable both of handling large amounts of data and of performing feature extraction and classification on the processed data. With classic neural networks, feature extraction requires the incorporation of domain experts' knowledge. In the case of deep learning neural networks this requirement is alleviated, provided there is a sufficient number of hidden layers of neurons in the neural network and the data provided are characteristic of the described system [17,18,19]. To ensure this for the particular case of investigating the seismic behavior of the Ionian potentially distinct seismic region, all seismic data recordings from 1964 to 2019 are shown to the deep learning neural network, whose architecture was extended to the maximum possible dimensions supported by the available hardware resources in terms of GPU compute unified device architecture (CUDA) cores and parallel processing threads.

The architecture of the deep learning neural network, shown in Figure 2, comprises one input layer, six hidden layers and one output layer. The input layer imports to the deep learning neural network the current and previous mean seismicity rates computed for main earthquakes (main EQs) only, as well as the current and previous mean seismicity rates computed for all earthquakes (all EQs), including foreshocks and aftershocks, thereby feeding recursive information to the deep learning neural network. This information is propagated through six hidden layers of one hundred neurons each before reaching the single-neuron output layer, which produces a crisp output.
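The architecture described above can be sketched as a minimal forward pass. The tanh activation and the random initialisation are assumptions for illustration (the paper initialises the synaptic weights via subtractive clustering and does not state an activation function):

```python
import numpy as np

def build_network(n_inputs=4, n_hidden_layers=6, n_per_layer=100, seed=0):
    """Weight matrices and biases for the architecture in the text:
    4 recursive inputs, six hidden layers of 100 neurons, one output neuron.
    Random initialisation here is a placeholder for subtractive clustering."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [n_per_layer] * n_hidden_layers + [1]
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    """Forward pass producing the crisp output (a time interval)."""
    a = np.asarray(x, dtype=float)
    for W, b in weights[:-1]:
        a = np.tanh(a @ W + b)        # hidden-layer activations (assumed tanh)
    W, b = weights[-1]
    return (a @ W + b).item()         # linear output neuron -> crisp value
```

The four inputs stand for the current and previous mean seismicity rates of main EQs and of all EQs, as described in the text.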

The crisp output corresponded to the time interval which, when added to the date of occurrence of the latest large earthquake, indicated the expected time of occurrence of the next forthcoming large seismic event. Subtractive clustering was applied to the training data set in order to obtain initial values for the synaptic weights of the neural network before training commenced. Supervised learning using CUDA C (NVIDIA, Santa Clara, CA, USA) parallel processing error back-propagation was used to train the deep learning neural network for as many epochs as necessary until the mean square error function became nearly flat or, ideally, zero. Additional measures were taken to prevent overfitting the training data by running in parallel a testing data set comprising approximately thirty percent of the overall data set, selected randomly with an even yearly distribution, which was kept unseen by the deep learning neural network during the training process. When the testing data error function began to stray from the equivalent training data error function, the synaptic weights at that stage were the ones maintained and used as the final synaptic weights of the deep learning neural network upon completion of the training process. CUDA C helped to speed up the operation of the error back-propagation algorithm significantly, as the operation of each neuron per hidden layer was allocated to a different processing thread. All threads attributed to a single hidden layer worked in parallel, bringing down processing times by a factor analogous to the number of neurons per hidden layer. Thread synchronization was applied before the parallel error back-propagation algorithm moved from one hidden layer to another. The threads of each hidden layer were organized in single blocks of linear dimensions, which formed the parallel processing grid operated by the GPU, and were synchronized following the completion of each training epoch of the deep learning neural network.
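The overfitting safeguard above amounts to an early-stopping rule. A minimal sketch follows, using the common proxy of keeping the weights from the epoch at which the held-out testing error stops improving; the callables are placeholders for the paper's CUDA C training and error routines:

```python
def train_with_early_stopping(train_step, test_err, max_epochs=10000, patience=50):
    """Keep the synaptic weights from the epoch at which the (unseen) testing
    error begins to stray from the training error. `train_step` performs one
    epoch of back-propagation and returns the current weights; `test_err`
    returns the mean square error of given weights on the testing data set.
    (A real implementation would copy the weights before storing them.)"""
    best_weights, best_epoch, best_test = None, 0, float("inf")
    stale = 0
    for epoch in range(max_epochs):
        weights = train_step()
        err = test_err(weights)
        if err < best_test:
            best_test, best_weights, best_epoch = err, weights, epoch
            stale = 0
        else:
            stale += 1                 # testing error straying upward
            if stale >= patience:
                break                  # restore weights from the divergence point
    return best_weights, best_epoch
```

With `patience` epochs of grace, a brief plateau in the testing error does not stop training prematurely.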

## 4. Discussion

Various time periods ending with the occurrence of large seismic events from February 1966 to November 1992, November 1997, August 2003, June 2008, November 2015 and October 2018, respectively, formed progressive training data sets in the potentially distinct Ionian seismic region. Within the time period from 1992 to 2020, nine earthquakes of magnitude M_{S} ≥ 5.8 occurred, with a mean recurrence time of approximately five years, with the exception of the occurrence of a batch of four earthquakes in the first half of 2008. To assist the reader, Table 1 lists all main earthquakes of M_{S} ≥ 5.5 that were kept unseen by the deep learning neural network in the potentially distinct Ionian seismic zone from 1997 onwards, displaying large earthquakes of M_{S} ≥ 5.8 in bold.

The initial training data set (1966–1992) imported monthly mean seismicity rates to the deep learning neural network, along with time intervals between consecutive pairs of large earthquakes, until the occurrence of the M_{S} 5.8 large earthquake on 21 November 1992. Once training was complete, the deep learning neural network received at its inputs the recursive information of the monthly mean seismicity rates of the main earthquakes as well as of all seismic events that corresponded to the time intervals between previous large main earthquakes with magnitudes of M_{S} ≥ 5.8. At its output, the deep learning neural network then generated a crisp number corresponding to the estimated time interval between the latest and the immediately forthcoming large seismic event. The deep learning neural network then produced an output of 151,772,272. This value, when added to the occurrence Unix timestamp date 722,322,439 of the aforementioned large earthquake, pointed to the timestamp 874,094,711, which corresponded to the calendar date of 12 September 1997. This date only preceded the actual date of occurrence, 18 November 1997, by approximately two months, with the observed difference possibly being attributed to the larger earthquake magnitude (M_{S} 6.1) in comparison with the M_{S} 5.8 large earthquake threshold used herein.
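The timestamp arithmetic used here, and in the subsequent retraining rounds, can be reproduced directly; the helper name is illustrative:

```python
from datetime import datetime, timezone

def predicted_date(last_event_ts: int, predicted_interval_s: int) -> str:
    """Add the network's crisp output (seconds) to the Unix timestamp of the
    latest large earthquake and return the predicted calendar date (UTC)."""
    ts = last_event_ts + predicted_interval_s
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%d %B %Y")
```

For the first prediction, `predicted_date(722322439, 151772272)` reproduces the timestamp 874,094,711 and the calendar date 12 September 1997 quoted above.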

The deep learning neural network was then retrained with the training data set including the mean seismicity rates and the time intervals amongst consecutive large earthquakes until 18 November 1997. After training, when presented with the new unseen input data set of recursive information of the monthly mean seismicity rates of the main earthquakes as well as of all seismic events that corresponded to the time intervals between previous large main earthquakes with magnitudes of M_{S} ≥ 5.8, the deep learning neural network produced an output of 174,647,370. This value, when added to the occurrence Unix timestamp date 879,858,457 of the aforementioned large earthquake, pointed to the timestamp 1,054,505,827, which corresponded to the calendar date of 1 June 2003. This date only preceded the actual date of occurrence, 14 August 2003, by approximately two and a half months, with the observed difference possibly being attributed to the two interim medium sized earthquakes of magnitudes M_{S} 5.6 and M_{S} 5.5 that might have delayed briefly the seismic energy build up process in the Ionian potentially distinct seismic zone.

Having retrained the deep learning neural network with the next training data set, including the mean seismicity rates and the time intervals amongst consecutive large earthquakes until 14 August 2003, when presented with the new unseen input data with recursive information of the monthly mean seismicity rates both of the main earthquakes and of all seismic events that corresponded to the time intervals between previous large main earthquakes with magnitudes of M_{S} ≥ 5.8, the deep learning neural network produced an output of 146,034,832. This value, when added to the occurrence Unix timestamp date 1,060,838,094 of the aforementioned large earthquake, pointed to the timestamp 1,206,872,926, which corresponded to the calendar date of 30 March 2008. This date fell amidst the potential occurrence of a seismic clustering phenomenon, where an initial large earthquake of M_{S} 6.1 on 6 January 2008 with a deep hypocenter at 86 km depth appeared to have triggered a series of seismic reactions at shallower depths, with hypocenters at 41, 38, 25 and 25 km and magnitudes of M_{S} 6.2, 6.1, 6.0 and 6.5 from February until June 2008, respectively, as shown in Table 1.

Since the M_{S} 6.5 that occurred on 8 June 2008 was the last and also happened to be the largest of the potential seismic clustering sequence, the deep learning neural network was then retrained up to that date. After training, when presented with the new unseen input data of recursive information of the monthly mean seismicity rates of the main earthquakes as well as of all seismic events that corresponded to the time intervals between previous large main earthquakes with magnitudes of M_{S} ≥ 5.8, the deep learning neural network produced an output of 219,776,812. This value, when added to the occurrence Unix timestamp date 1,212,927,928 of the aforementioned large earthquake, pointed to the timestamp 1,432,704,740, which corresponded to the calendar date of 27 May 2015. This date preceded the actual date of occurrence, 17 November 2015, by nearly six months. This is most likely because the deep learning neural network was affected by the multiple large earthquakes, closely spaced in time, of the aforementioned possible seismic clustering phenomenon, of which no other occurrence was reported in the training data set. Nevertheless, the deep learning neural network learned from the data and compensated for the increased seismic activity by extending its output from an average of nearly five years to roughly seven years since the occurrence of the latest large earthquake presented to it.

Following on, the deep learning neural network was retrained with the training data set now including the mean seismicity rates and the time intervals amongst consecutive large earthquakes until 17 November 2015. After training, when presented with the new unseen input data set of recursive information of the monthly mean seismicity rates of the main earthquakes as well as of all seismic events that corresponded to the time intervals between previous large main earthquakes with magnitudes of M_{S} ≥ 5.8, the deep learning neural network produced an output of 107,495,042. This value, when added to the occurrence Unix timestamp date 1,447,744,207 of the aforementioned large earthquake, pointed to the timestamp 1,555,239,249, which corresponded to the calendar date of 14 April 2019. This date followed the actual date of occurrence, 25 October 2018, by just over six months. This difference is possibly due to the fact that there were no occurrences of interim medium-large seismic events, as indicated by Table 1, as had been the case prior to all previous large earthquakes examined. The appearance of an unprecedented situation relative to the previously recorded data used during training was likely the main reason for the aforementioned deviation.