Article

Early Earthquake Detection Using Batch Normalization Graph Convolutional Neural Network (BNGCNN)

1 College of Instrumentation & Electrical Engineering, Jilin University, Changchun 130061, China
2 College of Geoexploration Science & Technology, Jilin University, Changchun 130061, China
3 Institute of Integrated Information for Mineral Resources Prediction, Jilin University, Changchun 130026, China
4 Riphah College of Computing, Riphah International University (Faisalabad Campus), Faisalabad 38000, Pakistan
* Authors to whom correspondence should be addressed.
Submission received: 23 June 2022 / Revised: 24 July 2022 / Accepted: 25 July 2022 / Published: 27 July 2022
(This article belongs to the Special Issue Machine Learning Applications in Seismology)

Featured Application

Earthquake detection, earthquake early warning systems (EEWS), and seismic data processing.

Abstract

Earthquakes are a major hazard to humans, buildings, and infrastructure. Early warning systems should detect an earthquake and issue a warning with earthquake information such as location, magnitude, and depth. Earthquake detection from raw waveform data using deep learning models such as graph neural networks (GNNs) is becoming an important research area. The multilayered structure of a GNN, trained over many epochs, takes considerable training time, and it is also hard to train a model with saturating nonlinearities. Batch normalization is applied to each mini-batch to reduce the number of training epochs and obtain a steady distribution of activation values; it improves model training and prediction accuracy. This study proposes a deep learning model, the batch normalization graph convolutional neural network (BNGCNN), for early earthquake detection. It consists of two main components: a CNN and a GNN. The input to the CNN is multi-station, three-component waveform data for events of magnitude 3.0 and above, collected for Southern California from January 2000 to December 2015. The features extracted by the CNN are appended with station location information and input to the GNN for earthquake detection. After hyperparameter tuning of the BNGCNN, testing and evaluation on the Southern California dataset show that our method improves on the baseline GNN model, obtaining lower errors in predicting the magnitude, depth, and location of an earthquake.

1. Introduction

Earthquakes are a major hazard to humans, buildings, and infrastructure. In recent years, early automatic detection of an earthquake from the raw waveform data generated by the sensors of seismic stations has become an important research area for emergency response [1]. For this purpose, an earthquake early warning (EEW) system issues a warning for the targeted area a few seconds after the detection of earthquake waves, without the intervention of an analyst [2]. Machine learning-based computational methods are strong candidates for automatic earthquake detection.
Traditional machine learning and deep learning techniques have shown superior performance in many automated tasks such as text processing [3,4], image processing [5,6], and speech recognition [7,8]. Ref. [9] used an ANN-based MLP model to assess the safety of existing buildings; the results show that the model outperforms others in classifying concrete structure damage. Ref. [10] designed an earthquake early warning system using SVM to predict magnitude and peak ground velocity; the proposed system can effectively generate alerts at levels from 0 to 3. The major drawback of traditional machine learning techniques is their dependency on feature selection. Many feature selection methods have been proposed, but comparative studies show that no universal feature selection method works well with all types of data. In contrast, the convolutional filters in a convolutional layer extract features automatically, and therefore deep learning models outperform traditional machine learning models on various tasks [11]. An important factor that can increase the performance of deep learning models is hyperparameter tuning [12,13]. A model has several parameters, such as batch size, dropout, learning rate, activation function, number of epochs, and number of convolutional filters; finding the best values of these parameters (also called parameter tuning) is a time-consuming and resource-exhausting process. Batch normalization is a technique used for more stable and faster training of deep learning models [13,14].
Recently, deep learning techniques such as convolutional neural networks (CNNs), graph neural networks (GNNs), and their ensemble, graph convolutional neural networks (GCNNs), have shown good performance for earthquake detection [15,16]. CNNs are powerful at extracting useful information from raw seismic waveform data: the convolutional layers extract contextual local features from the input waveforms, and pooling operations allow the network to learn global features. Combining these features with the spatial information of the stations helps in the accurate prediction of an earthquake, and adding batch normalization to a CNN results in faster and more stable training. GNNs have been designed specifically to process data arising from networks [17]. Combining a CNN with a GNN has shown very good performance on seismic data processing in several studies [18].
Several past attempts at earthquake detection using deep learning have shortcomings. Ref. [18] used a deep learning-based GNN for seismic source characterization and appended latitude and longitude information to the features extracted by a CNN. A graph-partitioning method combined with a CNN was proposed for earthquake detection [15]; however, this method did not use an actual GNN, only classic graph theory. Recently, a study [1] used a GNN with a CNN for seismic event classification from multiple stations but did not use any spatial information or meta-information about the stations.
Therefore, in this study, we propose a large-scale deep learning model, the batch normalization graph convolutional neural network (BNGCNN). The CNN part of the model extracts useful and relevant features from raw seismic waveform data collected from multiple stations, and the GNN part processes spatial information and meta-information about the stations. Batch normalization in the CNN improves learning and reduces training time [13,14]. In this way, the proposed model effectively processes seismic data obtained from multiple stations and predicts earthquakes efficiently and accurately.
The contributions of this study are summarized as follows:
  • We propose a deep learning-based model, BNGCNN, for the early prediction of earthquakes.
  • For experiments, we use a seismological dataset of 1427 events collected from 187 stations. Event waveform data with location information have been collected from multiple seismic stations instead of a single station.
  • The performance of our model has been systematically analyzed by fine-tuning several of its hyperparameters.
  • We chose the model proposed in [18] as a baseline model to compare the results obtained from the proposed model. Results show the superiority of our model.
The rest of the paper is organized as follows: In Section 2, we discuss related work on seismic data, convolutional networks, and graph networks. Section 3 discusses the architecture of the proposed model and its parameters. A brief introduction to the seismic dataset, its challenges, and its statistics is given in Section 4, together with the experimental setup. Section 5 presents the experimental results of the proposed and baseline models. Conclusions and future work are given in Section 6.

2. Related Work

Traditional machine learning requires user knowledge to extract meaningful features from the data. These feature selection or extraction methods heavily affect the performance of models such as decision trees, support vector machines, and naïve Bayes [19]. Ref. [10] designed an earthquake early warning system using SVM to predict magnitude and peak ground velocity; the proposed system can effectively generate alerts at levels from 0 to 3. Further, applying feature selection methods to raw data is time-consuming and prone to errors. Deep learning models perform feature selection automatically using multiple hidden layers. These layers are also used for dimensionality reduction, which makes deep learning models more powerful for processing nonlinear and complex data [20]. Architectures such as graph neural networks and multilayer perceptrons effectively represent spatial information, such as stations and their relationships [18].
Deep learning models have shown superior performance over traditional machine learning models on a variety of tasks such as image processing [21], text classification [22], and blockchain. Deep learning has three mainstream architectures, CNN [23], GNN [5], and RNN, which are inherited from the machine learning model known as the multi-layer perceptron (MLP) [18]. The MLP has a fully connected layered architecture that is powerful, but its high computational cost limits the depth of the model. The more advanced architectures stack multiple layers of different types (convolutional, pooling, dropout, softmax) to learn complex data and improve performance. Refs. [9,24] used an ANN-based MLP model to assess the safety of existing buildings; results show that the model outperforms others in classifying concrete structure damage. Ref. [25] proposed a neural network-based forecasting model to predict earthquake intensity; experiments show that DeepShake can effectively predict an earthquake five seconds before the event.
In some studies, CNN-based models have been used for seismic data processing, feature extraction, and classification. Ref. [16] investigated a CNN for rapid earthquake detection and epicenter classification from single-station waveform data; experiments on three-component waveform data obtained from IRIS show that the proposed CNN-based model achieves 87% accuracy in predicting earthquake sources over a broad range of distances and magnitudes. Ref. [16] also used a CNN to predict ground shaking intensity 15–20 s after the earthquake origin time. Ref. [23] proposed two CNN-based models to estimate the seismic response of the surface. Ref. [20] used a deep CNN model for earthquake detection and source region estimation. The proposed models predict the amplitude and the natural periods fairly well. All these studies used single-station waveform data and did not consider location information in prediction.
Recently, the hybrid deep learning model GCNN has become popular for earthquake detection using seismic data. Ref. [17] used a GNN to show that, along with time series data, sensor location information can be exploited using graph-based networks; experimental results on two seismic datasets containing earthquake waveforms are promising. Refs. [1,18] proposed combined deep CNN and GNN models for classifying earthquake events from multiple stations: CNN layers aggregate features from the waveform data, and combining these features with the spatial information of the stations helps the GNN predict an earthquake accurately. Ref. [15] proposed a graph-partitioning-based model that uses both a CNN and a GNN for earthquake detection from seismic array data; they used data from multiple seismic stations, but the spatial information of these stations was ignored completely.
Different events can be detected by multiple stations at different locations. Multiple stations generate heterogeneous data, yet there are relationships between the observations collected from multiple stations [1], whereas a single station generates a large amount of homogeneous data. Therefore, earthquake detection from multiple-station data is a more challenging task for deep learning models than detection from single-station data [17]. CNN-based models are often used for single-station datasets [26,27], while GCNN models are well suited to multiple-station datasets [5,18].
A problem with deep learning models is the fine-tuning of their parameters. Batch normalization is a technique used for faster and more stable training of a deep learning model [13,14]. In training, the objective of batch normalization is to normalize the layer outputs using the statistics of each mini-batch. Recently, many studies have used batch normalization in their proposed deep learning models to improve performance, for example in brain tumor detection [6], gas–liquid interface reconstruction [28], and machine fault diagnosis [29]. Automatic earthquake detection from seismic data generated by multiple stations, however, has not yet benefited from batch normalization.
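To make the transform concrete, a minimal NumPy sketch of batch normalization follows. It shows training-mode statistics only; the running averages used at inference time are omitted, and all names are illustrative:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch over axis 0, then scale and shift.

    x: (batch, features) activations; gamma, beta: learnable (features,) vectors.
    """
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learned scale (gamma) and shift (beta)

# toy usage: a mini-batch of 4 samples with 3 features
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0), y.var(axis=0))        # approximately 0 and 1 per feature
```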
A set of studies is summarized in Table 1. CNN-based studies use only single-station seismic data for earthquake detection and do not use the geographical information of the station for prediction. The studies that use graph-based neural networks use datasets in which events have been collected from multiple seismic stations; moreover, these studies include station geographical information, which significantly improves model predictions. This is because GNNs are well suited to handling spatial information [30]. A few studies are based on other models, such as transformers or RNN variants such as LSTM and BiLSTM.

3. Methods

There are several advantages to using a CNN on seismic waveform data: (1) it operates directly on waveform data with little preprocessing and without feature extraction; (2) it is shift-invariant and not sensitive to the time position of a feature (P- or S-waves); and (3) it does not make explicit use of existing physics-based knowledge such as the S–P difference or seismic travel times. The proposed methodology used in this study is shown in Figure 1. First, three-component waveform data are downloaded from IRIS (Incorporated Research Institutions for Seismology). After preprocessing, in the second step, a CNN examines the waveforms of a specific station: the three-component waveform is processed by the CNN, and a set of features is extracted from it. The geographic location (latitude and longitude) of the seismic station is then appended to the extracted features to create the feature vector. In the third step, this feature vector is used as input by the second component, a GNN, which recombines the time series features with the station location to create a final station-specific feature vector. This procedure is carried out once for each station in the network, utilizing the CNN and GNN components in the same way (i.e., the same operations are applied to each station individually). The convolution operations are carried out only along the time axis. Finally, the output is flattened and the prediction of magnitude, location, and depth is made. The architecture of the proposed model is given in detail in Figure 2.
BNGCNN is derived from the model of [18] to detect an earthquake event and predict its magnitude and location. We modified the architecture and added a CNN with a batch normalization layer to optimize training and improve prediction accuracy. The input to the BNGCNN is an array of size (50, N, 3) covering all stations, where 50 is the number of stations, N is the number of time samples per waveform (512), and 3 is the number of components. Because seismic waveforms recorded at various stations have distinct biases, we preprocess the data according to:
$$ y_m = x_m - \frac{1}{M}\sum_{i=1}^{M} x_i, \qquad m = 1, \ldots, M $$
where $x_m$ is a raw seismic waveform sample, $y_m$ is the converted sample, and $M$ is the total number of samples in the event. This preprocessing has the effect of geometrically pushing the data center to the origin. The ordering of the stations is preserved, and each waveform starts at its origin time. We also normalized each waveform by its maximum value, as this is recommended to improve CNN performance [16]. The CNN is a stack of five feed-forward convolutional blocks, where the $i$-th block consists of 2-D convolution filters of size $1 \times 5 \times f_i$, with $f = \{4, 8, 16, 32, 64\}$ the number of filters per block. Convolution is performed along the time axis and recombined into the filters of the next layer. The convolutional operation is defined as:
$$ y_i^{l+1}(j) = k_i^{l} * M^{l}(j) + b_i^{l} $$
where $y_i^{l+1}(j)$ denotes the input of the $j$-th neuron in feature map $i$ of layer $l+1$, $k_i^{l}$ the weights of the $i$-th filter kernel in layer $l$, $M^{l}(j)$ the $j$-th local region in layer $l$, and $b_i^{l}$ the bias. After each convolutional layer, an activation function is employed to extract nonlinear features. The rectified linear unit (ReLU) is a typical activation function used for this purpose and is defined as:
$$ \mathrm{ReLU}(x) = \max(0, x) $$
where $x$ is the output of the convolutional layer. It is a piecewise linear function that returns zero if the input is negative and returns the input directly if it is positive. The final block instead uses the hyperbolic tangent activation, defined as:
$$ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{2}{1 + e^{-2x}} - 1, \qquad \tanh(x) \in (-1, 1) $$
where $e$ is Euler's number. The advantage of this activation function is that it can return negative values, which is useful if the desired output distribution contains negative values. The more positive the input, the closer the output is to 1.0; the more negative the input, the closer the output is to −1.0.
Batch normalization is added before the activation function. ReLU introduces non-linearity and is followed by a spatial dropout of 15%. After three convolutional layers, the data are reduced by 1 × 4 max-pooling with stride 1. The last convolutional layer does not use the ReLU activation; to preserve the extracted features, the last layer of the final block uses a tanh activation not followed by dropout. The final max-pooling layer reduces the data from Ns × 32 × 64 to Ns × 1 × 64, where Ns denotes the total number of stations. For each of the Ns stations, this produces a feature vector of size 64, which is then appended with spatial information (longitude, latitude), yielding a feature vector of size Ns × 1 × 66.
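One such block can be sketched in TensorFlow/Keras, the framework used in Section 4.2. This is a hedged illustration rather than the exact implementation: the "same" padding, the pooling placement, and the input of 50 stations × 512 samples × 3 components are assumptions following the description above loosely.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, last=False):
    """1 x 5 convolution along the time axis, batch normalization before the
    activation, then ReLU + 15% spatial dropout and 1 x 4 max-pooling; the
    final block uses tanh with no dropout to preserve the extracted features."""
    x = layers.Conv2D(filters, kernel_size=(1, 5), padding="same")(x)
    x = layers.BatchNormalization()(x)            # BN before the activation
    if last:
        return layers.Activation("tanh")(x)
    x = layers.Activation("relu")(x)
    x = layers.SpatialDropout2D(0.15)(x)
    return layers.MaxPooling2D(pool_size=(1, 4))(x)  # reduce along time axis

waveforms = tf.keras.Input(shape=(50, 512, 3))    # stations x time x components
x = waveforms
for i, f in enumerate([4, 8, 16, 32, 64]):        # f_i filters in the i-th block
    x = conv_block(x, f, last=(i == 4))
```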
The second component of the BNGCNN is a multi-layer perceptron that recombines the spatial information with the time-series features. It has two hidden layers of 128 neurons with ReLU activation, followed by spatial dropout. It produces data of size Ns × 1 × 128, representing the collection of features of each node in the graph. These node feature vectors are aggregated into a graph feature vector by performing a max-reduce operation along the station dimension.
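A hedged sketch of this stage follows: the 64-dimensional station features are concatenated with the station coordinates, passed through a shared two-layer MLP (Keras Dense layers act on the last axis, so the same weights apply to every station), and max-reduced over the station dimension. Shapes are assumptions based on the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

station_feats = tf.keras.Input(shape=(50, 64))  # CNN output, one vector per station
station_xy = tf.keras.Input(shape=(50, 2))      # latitude/longitude per station

x = layers.Concatenate(axis=-1)([station_feats, station_xy])  # (batch, 50, 66)
x = layers.Dense(128, activation="relu")(x)     # shared node MLP, hidden layer 1
x = layers.Dropout(0.15)(x)
x = layers.Dense(128, activation="relu")(x)     # shared node MLP, hidden layer 2
graph_vector = layers.GlobalMaxPooling1D()(x)   # max-reduce over the 50 stations
```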
The final component is an MLP with two hidden layers of 128 neurons. After tanh activation, this component maps the graph feature vector into the model output of size 4 (depth, magnitude, latitude, longitude). The predicted output is compared with the labels, scaled between −1 and 1, through a mean absolute error (MAE) loss function. The loss is the mean of the absolute differences between the actual and predicted values:
$$ L(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert $$
where $\hat{y}_i$ is the value predicted by the model and $y_i$ is the actual value in the data. We used an Adam optimizer with an initial learning rate of 0.0001. All the components are directly connected and trained as a single model.
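Putting the pieces together, a minimal end-to-end sketch follows; the CNN stage is abbreviated into a per-station Dense stand-in so the snippet stays short, and all shapes and layer sizes are assumptions drawn from the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

waveforms = tf.keras.Input(shape=(50, 512, 3))   # multi-station 3-component input
station_xy = tf.keras.Input(shape=(50, 2))       # station latitude/longitude

# stand-in for the CNN blocks: one feature vector of size 64 per station
feats = layers.Dense(64, activation="relu")(layers.Reshape((50, 512 * 3))(waveforms))
x = layers.Concatenate(axis=-1)([feats, station_xy])   # append station locations
x = layers.Dense(128, activation="relu")(x)            # shared node MLP
graph = layers.GlobalMaxPooling1D()(x)                 # graph feature vector

x = layers.Dense(128, activation="tanh")(graph)        # output MLP, hidden layer 1
x = layers.Dense(128, activation="tanh")(x)            # output MLP, hidden layer 2
outputs = layers.Dense(4, activation="tanh")(x)  # depth, magnitude, latitude, longitude

model = tf.keras.Model([waveforms, station_xy], outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mean_absolute_error")        # MAE against labels scaled to [-1, 1]
```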

4. Datasets and Experimentation

4.1. Datasets

This study uses a dataset from Southern California. The study area is important because it is an economic and social hub of the United States. The dataset consists of three-component raw waveforms (BHN, BHE, BHZ) recorded directly at seismic stations, with N oriented north–south, E oriented east–west, and Z oriented vertically. We extracted 120 s long time windows that contain the onsets of both P- and S-wave arrivals for all available events. Earthquakes with a magnitude of less than 3.0 were not considered; no depth cut-off was applied. Each station provides simultaneous sampling of three channels with 24-bit resolution. Each waveform file is converted to a SAC file using Python libraries. The waveforms are band-pass filtered between 0.1 and 8.0 Hz and interpolated onto a time base 1 < t < 101 s after the event origin time, over 512 evenly spaced time samples (about 5 Hz sampling frequency). We considered the waveform length starting from the event origin time to 120 s after it.
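A hedged per-record sketch of this preprocessing with ObsPy follows; the file name is hypothetical, and the demeaning step corresponds to the preprocessing equation in Section 3.

```python
from obspy import read

st = read("event_waveform.sac")                    # one 3-component record (hypothetical file)
st.detrend("demean")                               # remove the mean from each trace
st.filter("bandpass", freqmin=0.1, freqmax=8.0)    # 0.1-8.0 Hz band
for tr in st:
    tr.interpolate(sampling_rate=5.0)              # ~512 evenly spaced samples per window
    tr.data = tr.data / max(abs(tr.data).max(), 1e-12)  # normalize by maximum amplitude
```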
We use the Python library ObsPy to collect the data [32,33]. ObsPy downloads the broadband inventory and earthquake catalog of the Southern California Seismic Network (SCSN, doi:10.7914/SN/CI). The dataset contains a total of 1427 events collected from 187 stations over the period 31 January 2000 to 31 December 2015. For both seismic stations and event locations, we set the limits for latitude to [32° to 36°] and for longitude to [−120° to −116°]. Maps of the stations and events considered in this study are given in Figure 3: the station locations are shown as triangles in Figure 3a, and the event locations as dots in Figure 3b. Statistics of the dataset are given in Table 2.
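A hedged sketch of this download step is shown below, using ObsPy's FDSN client. The "SCEDC" data center and the CI network code are assumptions consistent with the SCSN catalog named above; the query parameters mirror the stated limits.

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("SCEDC")                          # Southern California data center
t0, t1 = UTCDateTime("2000-01-31"), UTCDateTime("2015-12-31")

catalog = client.get_events(starttime=t0, endtime=t1, minmagnitude=3.0,
                            minlatitude=32, maxlatitude=36,
                            minlongitude=-120, maxlongitude=-116)
inventory = client.get_stations(network="CI", channel="BH?",   # broadband channels
                                starttime=t0, endtime=t1,
                                minlatitude=32, maxlatitude=36,
                                minlongitude=-120, maxlongitude=-116)
print(len(catalog), "events in the catalog")
```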
The magnitude and depth distributions of the events are shown as histograms in Figure 4a,b. The magnitude distribution shows that events are not evenly distributed with respect to magnitude: half of the events in the dataset have magnitudes between 3.0 and 3.13, only a few events have a magnitude greater than 3.9, and the average magnitude over all events is 3.3. In the depth distribution, more than 60% of events have depths between 1.7 km and 10.2 km, only a few events are deeper than 20.0 km, and the average depth in the dataset is 9.0 km.

4.2. Experimentation

All experiments are carried out on an Intel Core i7-7700 processor running at 3.60 GHz with 16 GB of RAM and an NVIDIA GeForce GTX 1080 graphics card, under Windows 10 and TensorFlow 2.3 with the CUDA toolkit.
To optimize the network parameters, we initialized our model parameters to those of the model in [18]. The experiments are designed to compare two different types of neural network models, GNN and BNGCNN, with the ultimate goal of developing a generalized detection model from the findings. A random 80/20 split of the entire dataset creates a training set and a validation set: the training set constitutes 80 percent of the data (1140 of 1427 events) and is used to train the model, while the remaining 20 percent (287 of 1427 events) is used to validate model performance. We evaluate the performance of the trained model on the training and validation sets separately, as in [18]. Several models can be compared with one another using an independent test set to determine which is the most efficient. It should also be noted that k-fold cross-validation would most probably yield somewhat better-performing models at smaller sample sizes, but this has not been confirmed in this study.
Finding the optimal parameters for a model requires extensive training across a large number of epochs. One epoch refers to the model passing through all examples of a dataset (or batch of data) once in a forward pass and once in a backward pass. A large, complicated, and noisy dataset requires more epochs. To handle this, early stopping based on validation performance is employed: a model may be trained for, say, 30 epochs, and the parameters from the epoch with the highest validation performance are then selected. Positive class labels are less likely to be mislabeled in training data than negative class labels; hence, models with higher accuracy on the positive class are preferred over models with higher accuracy on the negative class. Figure 5a shows the training and validation losses for different numbers of epochs (50 to 500). A model's training loss reveals how well it fits the training data, whereas its validation loss indicates how well it fits new data. Our model shows the best performance at 400 epochs, where the validation loss is minimal and the model best fits new data.
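A hedged sketch of this training loop in Keras follows, continuing the model sketch from Section 3; the arrays are random placeholders shaped like the real inputs, for illustration only.

```python
import numpy as np
import tensorflow as tf

# random placeholder arrays shaped like the preprocessed inputs
wf = np.random.randn(128, 50, 512, 3).astype("float32")       # waveforms
xy = np.random.randn(128, 50, 2).astype("float32")            # station coordinates
labels = np.random.uniform(-1, 1, (128, 4)).astype("float32") # scaled targets

# early stopping keeps the best weights seen on the validation split, so
# training for "too many" epochs cannot degrade the selected model
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=30, restore_best_weights=True)

history = model.fit([wf, xy], labels,        # `model` from the Section 3 sketch
                    validation_split=0.2,
                    epochs=400,
                    callbacks=[early_stop])
```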
Batch size refers to the number of instances sent to the model for processing in a single iteration. A large batch takes more memory on the GPU, resulting in slower training operations. Figure 5b shows the performance of our model over different batch sizes, ranging from 4 to 64. The results demonstrate that a batch size of 32 produces superior results, obtaining the lowest validation loss value of 0.107, which is consistent with the findings of [6].
Figure 5c depicts the performance of our model for different dropout values. Dropout is used to protect the network against noise and overfitting: if the model is trained on an insufficient dataset, it may overfit, and the alternative remedies would be to increase the dataset size or decrease the number of hidden units used for feature computation. Dropout randomly deactivates units in the model's hidden layers; these units are excluded from the calculations of the following rounds of the algorithm. Figure 5c illustrates that the model achieves the maximum accuracy at a dropout value of 0.8.
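The batch-size and dropout searches described above amount to a small grid search. A hedged sketch is given below; `build_bngcnn` is a hypothetical builder that assembles the Section 3 model with a given dropout rate, and `wf`, `xy`, and `labels` are the placeholder arrays from the previous sketch.

```python
# grid search over batch size and dropout, keeping the lowest validation loss
best_cfg, best_loss = None, float("inf")
for batch_size in (4, 8, 16, 32, 64):
    for dropout in (0.2, 0.5, 0.8):
        model = build_bngcnn(dropout=dropout)    # hypothetical model builder
        hist = model.fit([wf, xy], labels, validation_split=0.2,
                         epochs=50, batch_size=batch_size, verbose=0)
        val_loss = min(hist.history["val_loss"])
        if val_loss < best_loss:
            best_cfg, best_loss = (batch_size, dropout), val_loss
print("best (batch size, dropout):", best_cfg, "with val loss", best_loss)
```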
Tests are carried out with varying batch sizes, numbers of epochs, dropout values, and other hyperparameters. Ultimately, the goal is to develop a generalized detection model with the best prediction accuracy. A maximum number of training epochs is allowed for each sample size; these values are determined by reviewing a large number of potential training scenarios with a small number of epochs and selecting the values manually from empirical evidence. It is acceptable to train for an excessive amount of time, since early stopping selects the highest-performing model on the validation set and is therefore resistant to overfitting.
After several further experiments to analyze the hyperparameters, the best choices are used to execute the experiments reported below. Figure 5d shows the performance of the proposed model with the best-performing network parameters. Note how the gap between validation and training loss narrows with each passing epoch: as the network learns the data, it also reduces the regularization loss (model weights), resulting in only a slight discrepancy between validation and training losses. The model is, however, still more accurate on the training set.

5. Results and Discussion

In this section, we compare the performance of our proposed model with the GNN model [18] through experiments on the California dataset. Performance is evaluated separately on the training and validation datasets. The charts in Figure 6 show the mean training error, mean validation error, and mean squared error obtained from the GNN model for longitude, latitude, depth, and magnitude predictions. The mean absolute difference between the predicted values and the actual catalog values is less than 13 km for longitude and latitude, 3.3 km for depth, and 0.13 for magnitude. The experimental results for the proposed BNGCNN model are shown in Figure 7: here the mean absolute difference is less than 10 km for longitude and latitude, 2.6 km for depth, and 0.09 for magnitude. The comparison of Figure 6 and Figure 7 makes it clear that the proposed model with batch normalization significantly reduces the error and improves accuracy.
On the training and validation datasets, we examine the model's performance with and without station position information. The results are shown graphically in Figure 8a–d for the baseline model and Figure 8e–h for the BNGCNN model. The black line shows the training error, the yellow line the validation error, and the blue line the error when no station location information is used. The performance of the model including the geographic locations of the stations is examined separately on the training and validation sets, and the results demonstrate minimal overfitting; when the station locations are not included, the performance is evaluated on the combined dataset. The model posterior is calculated by running inference 100 times on each event in the training and validation catalogs and determining the corresponding mean and standard deviation while retaining dropout regularization. For the GNN, both datasets produce similar results, indicating that overfitting on the training set is minimal; the mean absolute difference between catalog data and model predictions is less than 0.13 for magnitude, 3.3 km for depth, and less than 0.11° (13 km) for latitude and longitude (which translates to a mean epicentral location error of 18 km). For the proposed BNGCNN model, the mean absolute difference is less than 10 km for latitude and longitude, 2.6 km for depth, and 0.09 for magnitude. The approach produces a respectable first-order estimate of location and magnitude that can be used as a starting point for further refinement using typical seismological instruments. The magnitude curve in Figure 8h shows more prominent peaks than Figure 8d. This is because of the scale and shift operations that a batch normalization layer uses [13]: unlike the input layer, which requires all normalized values to have zero mean and unit variance, batch normalization allows its values to be shifted (to a different mean) and scaled (to a different variance) by multiplying the normalized values by a factor gamma and adding a factor beta. This is why the results in Figure 8d differ from those in Figure 8h.
Figure 9a,b show the difference between each event's actual epicenter and the epicenter predicted by the GNN and the BNGCNN, respectively. Each arrow represents a single cataloged event, beginning at the predicted epicenter and pointing to the catalog epicenter. The colors represent the misfit ratio over the model posterior's 95 percent confidence interval: blue denotes that the catalog epicenter is within the 95 percent confidence interval, while red denotes that it is not. Because we can compute the posterior distribution for each event, we can compare the posterior's confidence intervals with the actual epicenter location error. A model error ratio metric, which measures the distance between predicted and observed epicenters normalized by the model posterior's 95% confidence interval, is used to illustrate the model uncertainty. A value below 1 indicates that the true epicenter location falls within the 95 percent confidence interval, while a value above 1 indicates that it does not. In most cases, the error ratio is less than one, which means that the actual epistemic errors are substantially smaller than the aleatoric uncertainties expected from the model posterior.
Because the regions with the highest density of seismic stations also have the lowest prediction error (see Figure 9), the spatially interpolated prediction error appears to be related to the local density of seismic stations. The highest systematic errors are seen in the northwest and southeast corners of the selected region, where the station density is low and the model appears unable to reach the boundary values of latitude and longitude. This behavior can be explained by the tanh activation function, which asymptotically approaches ±1, the values corresponding to the limits of the latitude and longitude range represented by the training samples. Ever greater activations are therefore necessary to push the final location predictions toward the boundaries of the domain, yielding predictions biased toward the interior of the domain. This reveals a basic trade-off between resolution (prediction accuracy) in the interior of the data domain and the greatest amplitude of the predictions (a trade-off that also applies to linear activation functions).

6. Conclusions

In this study, we propose the BNGCNN model to predict earthquakes with satisfying accuracy, adopting multi-station, three-channel waveform data consisting of 1427 events with magnitude 3.0 or greater from 187 stations. After preprocessing the data, the CNN of the proposed model, with batch normalization before activation, is applied to the input data. We found that this not only helps to extract valuable features but also improves training by reducing the number of epochs and mitigating the effect of saturating nonlinearities, which speeds up training and improves prediction accuracy. We also found that incorporating station location information into the feature vectors and applying the GNN to them further improves the prediction accuracy of our model. We analyzed the performance of our proposed model across its different hyperparameters, which helped us find the best parameter values to reduce the training error. Comparison with the baseline model on the same dataset shows the superiority of our model, which reduces the error by 3 km for longitude and latitude, 0.7 km for depth, and 0.04 for magnitude.
For future work, we aim to explore other variants of deep learning models, such as GNNs with attention mechanisms [34] and transformers [2,31]. We also aim to further test deep learning models on multiple datasets with sparse and dense stations, with and without station location information. The proposed model could also be applied to assess the damage done by an earthquake [9].

Author Contributions

Conceptualization, Y.J. and Y.W.; Formal analysis, M.A.B., M.P.A. and M.Y.; Funding acquisition, Y.J. and Y.W.; Investigation, M.Y.; Methodology, M.P.A.; Software, M.A.B.; Supervision, Y.W.; Visualization, M.A.B.; Writing—original draft, M.A.B.; Writing—review & editing, M.P.A. All authors have read and agreed to the published version of the manuscript.

Funding

National Key R & D Plan: 2021YFC2901801.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kim, G.; Ku, B.; Ahn, J.K.; Ko, H. Graph Convolution Networks for Seismic Events Classification Using Raw Waveform Data from Multiple Stations. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  2. Münchmeyer, J.; Bindi, D.; Leser, U.; Tilmann, F. The Transformer Earthquake Alerting Model: A New Versatile Approach to Earthquake Early Warning. Geophys. J. Int. 2021, 225, 646–656.
  3. Elnagar, A.; Al-Debsi, R.; Einea, O. Arabic Text Classification Using Deep Learning Models. Inf. Process. Manag. 2020, 57, 102121.
  4. Akhter, M.P.; Zheng, J.; Afzal, F.; Lin, H.; Riaz, S.; Mehmood, A. Supervised Ensemble Learning Methods towards Automatically Filtering Urdu Fake News within Social Media. PeerJ Comput. Sci. 2021, 7, e425.
  5. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. AI Open 2020, 1, 57–81.
  6. Amin, J.; Sharif, M.; Anjum, M.A.; Raza, M.; Bukhari, S.A.C. Convolutional Neural Network with Batch Normalization for Glioma and Stroke Lesion Detection Using MRI. Cogn. Syst. Res. 2020, 59, 304–311.
  7. Park, T.J.; Kanda, N.; Dimitriadis, D.; Han, K.J.; Watanabe, S.; Narayanan, S. A Review of Speaker Diarization: Recent Advances with Deep Learning. Comput. Speech Lang. 2022, 72, 101317.
  8. Zia, T.; Zahid, U. Long Short-Term Memory Recurrent Neural Network Architectures for Urdu Acoustic Modeling. Int. J. Speech Technol. 2019, 22, 21–30.
  9. Harirchian, E.; Lahmer, T. Improved Rapid Assessment of Earthquake Hazard Safety of Structures via Artificial Neural Networks. IOP Conf. Ser. Mater. Sci. Eng. 2020, 897, 012014.
  10. Song, J.; Zhu, J.; Wang, Y.; Li, S. On-Site Alert-Level Earthquake Early Warning Using Machine-Learning-Based Prediction Equations. Geophys. J. Int. 2022, 231, 786–800.
  11. Audretsch, J. Earthquake Detection Using Deep Learning Based Approaches. Available online: https://repository.kaust.edu.sa/handle/10754/662251 (accessed on 17 March 2020).
  12. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  13. Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How Does Batch Normalization Help Optimization? In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018.
  14. Kalayeh, M.M.; Shah, M. Training Faster by Separating Modes of Variation in Batch-Normalized Models. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 1483–1500.
  15. Yano, K.; Shiina, T.; Kurata, S.; Kato, A.; Komaki, F.; Sakai, S.; Hirata, N. Graph-Partitioning Based Convolutional Neural Network for Earthquake Detection Using a Seismic Array. J. Geophys. Res. Solid Earth 2021, 126, 1–17.
  16. Jozinović, D.; Lomax, A.; Štajduhar, I.; Michelini, A. Rapid Prediction of Earthquake Ground Shaking Intensity Using Raw Waveform Data and a Convolutional Neural Network. Geophys. J. Int. 2021, 222, 1379–1389.
  17. Bloemheuvel, S.; van den Hoogen, J.; Jozinović, D.; Michelini, A.; Atzmueller, M. Multivariate Time Series Regression with Graph Neural Networks. arXiv 2022, arXiv:2201.00818.
  18. van den Ende, M.P.A.; Ampuero, J.P. Automated Seismic Source Characterization Using Deep Graph Neural Networks. Geophys. Res. Lett. 2020, 47, 1–11.
  19. Mousavi, S.M.; Beroza, G.C. A Machine-Learning Approach for Earthquake Magnitude Estimation. Geophys. Res. Lett. 2020, 47, 1–7.
  20. Tous, R.; Alvarado, L.; Otero, B.; Cruz, L.; Rojas, O. Deep Neural Networks for Earthquake Detection and Source Region Estimation in North-Central Venezuela. Bull. Seismol. Soc. Am. 2020, 110, 2519–2529.
  21. Ahmad, J.; Farman, H.; Jan, Z. Deep Learning Methods and Applications. In Deep Learning: Convergence to Big Data Analytics; Khan, M., Jan, B., Farman, H., Eds.; Springer: Singapore, 2019; pp. 31–42. ISBN 978-981-13-3459-7.
  22. Akhter, M.P.; Jiangbin, Z.; Naqvi, I.R.; Abdelmajeed, M.; Fayyaz, M. Exploring Deep Learning Approaches for Urdu Text Classification in Product Manufacturing. Enterp. Inf. Syst. 2022, 16, 223–248.
  23. Hong, S.; Nguyen, H.T.; Jung, J.; Ahn, J. Seismic Ground Response Estimation Based on Convolutional Neural Networks (CNN). Appl. Sci. 2021, 11, 10760.
  24. Harirchian, E.; Lahmer, T.; Rasulzade, S. Earthquake Hazard Safety Assessment of Existing Buildings Using Optimized Multi-Layer Perceptron Neural Network. Energies 2020, 13, 2060.
  25. Datta, A.; Wu, D.J.; Zhu, W.; Cai, M.; Ellsworth, W.L. DeepShake: Shaking Intensity Prediction Using Deep Spatiotemporal RNNs for Earthquake Early Warning. Seismol. Res. Lett. 2022, 93, 1636–1649.
  26. Ochoa, L.H.; Niño, L.F.; Vargas, C.A. Fast Magnitude Determination Using a Single Seismological Station Record Implementing Machine Learning Techniques. Geod. Geodyn. 2018, 9, 34–41.
  27. Lomax, A.; Michelini, A.; Jozinović, D. An Investigation of Rapid Earthquake Characterization Using Single-Station Waveforms and a Convolutional Neural Network. Seismol. Res. Lett. 2019, 90, 517–529.
  28. Tan, C.; Li, F.; Lv, S.; Yang, Y.; Dong, F. Gas–Liquid Two-Phase Stratified Flow Interface Reconstruction with Sparse Batch Normalization Convolutional Neural Network. IEEE Sens. J. 2021, 21, 17076–17084.
  29. Wang, J.; Li, S.; An, Z.; Jiang, X.; Qian, W.; Ji, S. Batch-Normalized Deep Neural Networks for Achieving Fast Intelligent Fault Diagnosis of Machines. Neurocomputing 2019, 329, 53–65.
  30. Bacciu, D.; Errica, F.; Micheli, A.; Podda, M. A Gentle Introduction to Deep Learning for Graphs. Neural Netw. 2020, 129, 203–221.
  31. Mousavi, S.M.; Ellsworth, W.L.; Zhu, W.; Chuang, L.Y.; Beroza, G.C. Earthquake Transformer—An Attentive Deep-Learning Model for Simultaneous Earthquake Detection and Phase Picking. Nat. Commun. 2020, 11, 1–12.
  32. Krischer, L.; Megies, T.; Barsch, R.; Beyreuther, M.; Lecocq, T.; Caudron, C.; Wassermann, J. ObsPy: A Bridge for Seismology into the Scientific Python Ecosystem. Comput. Sci. Discov. 2015, 8, 014003.
  33. Megies, T.; Beyreuther, M.; Barsch, R.; Krischer, L.; Wassermann, J. ObsPy—What Can It Do for Data Centers and Observatories? Ann. Geophys. 2011, 54, 47–58.
  34. Ku, B.; Kim, G.; Ahn, J.K.; Lee, J.; Ko, H. Attention-Based Convolutional Neural Network for Earthquake Event Classification. IEEE Geosci. Remote Sens. Lett. 2021, 18, 2057–2061.
Figure 1. The proposed methodology consists of three components. First, data downloading and preprocessing. Second, a CNN that examines the data to extract the feature map. Third, the extracted feature map is combined with station location information to make the final prediction.
Figure 2. The proposed architecture of the batch normalization graph convolutional neural network. 3-channel data are fed into the model. Features are extracted by CNN and batch normalization is applied before pooling operation. Location information is added to the feature vector and the GNN part makes the final prediction.
Figure 3. (a) Shows the number of stations and their location for the Southern California region. (b) Event distribution for California. The depth and magnitude of an event are encoded by its color and size.
Figure 4. The histogram in (a) shows the magnitude distribution and the histogram in (b) shows the depth distribution of the events from the Southern California dataset used in experiments by the proposed model.
Figure 5. Model performance using different values of epochs in (a), batch size in (b), dropout in (c), and iterations in (d).
Figure 6. Mean absolute error for training and validation sets obtained from the California dataset. Predicted values by GNN model and the actual values for (a) latitude, (b) longitude, (c) depth, and (d) magnitude after inserting geographical information.
Figure 7. Mean absolute error for training and validation sets obtained from the California dataset. Predicted values by the BNGCNN model and the actual values for (a) latitude, (b) longitude, (c) depth, and (d) magnitude by inserting geographical information.
Figure 8. Prediction errors of the GNN from (a–d) and BNGCNN from (e–h) models to predict latitude, longitude, depth, and magnitude of a seismic event without and with using station location information. With station spatial information, the black line shows the training error on the training subset while the yellow line shows the validation error on the validation set. The blue line shows the prediction error on combined training and validation subsets without location information.
Figure 9. Residuals of the epicentral locations (first column) and overlay of the locations of seismic stations (second column) on the interpolated prediction error (in km) on the California dataset. (a) Residuals of the epicentral locations and overlay of the locations of seismic stations by GNN; (b) epicentral residuals and an overlay of seismic station locations by BNGCNN.
Table 1. A summary of the literature discussed in this study. Most of the studies do not use spatial information about events.

Study | Model | Spatial Info | Year | Data | Station
[17] | GCNN | Yes | 2022 | Italy and California | Multiple
[18] | GCNN | Yes | 2020 | California | Multiple
[27] | CNN | No | 2019 | IRIS | Single
[16] | CNN | No | 2021 | Central Italy | Multiple
[1] | GCNN | -- | 2022 | -- | Multiple
[26] | SVMR | No | 2018 | Bogota, Colombia | Single
[31] | CNN + LSTM + BiLSTM + Transformer | No | 2020 | STEAD | Single
[15] | CNN and graph | No | 2021 | MeSO-net Japan | Multiple, Single
[23] | CNN | -- | 2021 | NIED Japan | Single
[20] | Deep CNN | -- | 2020 | CARABOBO | Single
[2] | CNN and Transformer | Yes | 2021 | Japan, Italy | Multiple
Table 2. A summary of the Southern California dataset used in this study.

Property | Value | Property | Value
Period | 2000–2015 | Min. and max. latitude | [32° to 36°]
No. of events | 1427 | Min. and max. longitude | [−120° to −116°]
No. of stations | 187 | Minimum magnitude | 3.0
Waveform filter | 0.1–8 Hz | Evenly spaced time samples | 2048 Hz
Scaled magnitude | 3–6 | Scaled min./max. source depth | 0 to 30 km
Time base | 1 < t < 101 s | |

