1. Introduction
The long-term accumulated Global Navigation Satellite System (GNSS) coordinate time series provides valuable data for geodesy and geodynamic research [1,2,3]. These data not only reflect the long-term trend of change, but also represent nonlinear variations caused by geophysical effects. GNSS coordinate time series play an important role in the monitoring of crustal plate movements [4,5], dam or bridge deformation monitoring [6,7,8,9,10], and the maintenance of global or regional coordinate frames [11,12]. The coordinates of successive time points can be predicted by analyzing the GNSS coordinate time series, thus providing an important basis for judging the motion trend. Therefore, the prediction of GNSS coordinate time series is highly valuable work.
It is well known that GNSS coordinate time series reflect both deterministic laws of motion and uncertain information, which may be caused by imperfect processing models, geophysical effects, and other factors that are difficult to model [13]. Two kinds of time-series analysis methods exist: physical modeling and numerical modeling. In traditional physical and numerical modeling methods, models of coordinate time series are constructed according to geophysical theory, the linear term, the periodic term, and gap information [14,15]. Usually, in these traditional modeling methods, the feature information and modeling parameters must be established manually, and the exclusion of relevant elements leads to systematic deviations and limitations in the results.
Deep learning is an emerging technology that forms a deep architecture by stacking learning modules in a hierarchical structure and trains the whole network end-to-end with gradient-based methods. A deep learning algorithm does not need manually selected feature information; it automatically extracts the information suited to the data characteristics by constructing a complex and precise network [16]. Due to the development of artificial intelligence (AI), an increasing number of powerful algorithms have been applied in different fields and have achieved excellent results. Among these, the recurrent neural network (RNN) is one of the most popular AI methods for time-series prediction; it processes sequence information and regards the output of the current epoch as part of the input for the subsequent epoch [17,18]. Its data-driven characteristic can effectively memorize the information in the data. However, because the RNN is subject to the vanishing gradient problem, it cannot easily handle long sequences [19]. Thus, Hochreiter and Schmidhuber proposed long short-term memory (LSTM), which avoids the vanishing gradient problem by optimizing memory cells via the use of gates [20]. LSTM has been widely used for sequence learning problems such as natural language processing (NLP), and has shown significant potential for time-series prediction, such as air quality forecasting, weather forecasting, and traffic flow prediction [21,22,23].
Recently, LSTM has also been applied in the GNSS field and has achieved remarkable results. In the monitoring of landslide deformation, Xing et al. proposed a model based on variational mode decomposition (VMD) and a stacked LSTM, which had a higher forecast accuracy than the LSTM and EMD-LSTM networks in experiments conducted in Dashuitian [24]. Subsequently, Xing et al. combined the double moving average (DMA) method and LSTM to predict landslide displacement and obtained high-quality confidence intervals [25]. Xie et al. used the LSTM algorithm to predict the periodic component of landslide displacement and showed that LSTM captures its dynamic features well [26]. Wang et al. developed an attention-mechanism LSTM model based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN-AMLSTM) and confirmed its validity for landslide displacement prediction [27]. Yang et al. used an LSTM model to predict the periodic displacement of landslides in the Three Gorges Reservoir Area and found that the LSTM method can simulate the dynamic characteristics of landslides better than a static model, owing to its full use of historical information [28].
In navigation and positioning, Tan et al. used LSTM as a de-noising filter and proposed the rEKF-LSTM method to significantly improve single-point positioning accuracy [29]. Jiang et al. proposed an LSTM-RNN algorithm to filter MEMS gyroscope outputs, and the results indicated that the method was effective for improving MEMS INS precision [30]. Kim et al. improved the accuracy and stability of GNSS absolute solutions for autonomous vehicle navigation using LSTM [31]. Tao et al. developed a CNN-LSTM method to mine the deep multipath features in GNSS coordinate series, and showed that CNN-LSTM can effectively mitigate multi-GNSS multipath and reduce the average RMS of positioning errors [32]. In addition, Hoang et al. proposed an LSTM structure for WiFi fingerprinting in indoor localization and achieved a smaller average localization error than other algorithms [33]. Fang et al. used LSTM to support an inertial navigation system (INS) and confirmed that the algorithm can enhance navigation accuracy compared with pure INS [34]. The above research shows that LSTM has produced good results in both deformation monitoring and positioning in the GNSS field, and the use of deep learning has gradually become more common, providing new ideas for research.
Prior to the use of LSTM, the data must be preprocessed. The traditional approach of a single sliding window is widely used for data preprocessing in the existing research. A review of image-processing studies shows that the multiscale sliding window is widely used in that area and has achieved good results, because it can take information at different scales into account. The multiscale sliding window is a feature extraction method for image processing in the field of computer vision [35,36] that is able to consider feature information at different scales. In this study, we applied the idea of the multiscale sliding window to one-dimensional time-series data; that is, we transferred an algorithm originally conceived for two-dimensional data to one-dimensional data, thus providing a new idea for the use of LSTM.
In this study, we propose a multiscale sliding window LSTM (MSSW-LSTM) approach for GNSS time-series prediction. The new method uses several different sliding windows for data preprocessing, which can capture data information at different scales. The preprocessed outputs are then used as inputs to the corresponding LSTMs, and each LSTM can be adjusted according to its data. The structure of this article is as follows: Section 2 details the methodology of MSSW-LSTM. The data and processing strategy are introduced in Section 3. Section 4 analyses the experimental results, and a discussion and conclusions are given in Section 5.
2. Methodology
2.1. LSTM
The traditional neural network model does not carry processing information across time spans, but only concerns information of the current time. In contrast, the RNN has a memory function, which passes information from the current moment to the subsequent moment. However, when learning long-term dependencies, the RNN suffers from exploding or vanishing gradients. By comparison, LSTM avoids the vanishing gradient problem by optimizing memory cells via the introduction of gates.
As shown in Figure 1a, a typical LSTM cell has three gates, i.e., the input gate, forget gate, and output gate. The cell state and the output hidden state are also core components of the LSTM cell. The single-layer and multi-layer LSTM models are shown in Figure 1b and Figure 2, respectively.
The definition of the forget gate can be written as:

$${f}_{t}=\sigma \left({W}_{fh}{h}_{t-1}+{W}_{fx}{x}_{t}+{b}_{f}\right)$$

where $\sigma$ is the logistic sigmoid function; ${W}_{fh}$ and ${W}_{fx}$ are the weight matrices that transform information from cell to gate vectors; ${h}_{t-1}$ is the output of the previous time step; ${x}_{t}$ is the input of the current time step; ${b}_{f}$ is the offset value of the forget gate; and ${f}_{t}$ is the forget gate at moment $t$. The forget gate combines the previous output ${h}_{t-1}$ with the current input ${x}_{t}$ to selectively forget content.
The input gate can be expressed as:

$${i}_{t}=\sigma \left({W}_{ih}{h}_{t-1}+{W}_{ix}{x}_{t}+{b}_{i}\right)$$

$${\tilde{C}}_{t}=\mathrm{tanh}\left({W}_{ch}{h}_{t-1}+{W}_{cx}{x}_{t}+{b}_{c}\right)$$

where $\sigma$ and $\mathrm{tanh}$ are activation functions; ${W}_{ih}$, ${W}_{ix}$, ${W}_{ch}$, and ${W}_{cx}$ are weight matrices; ${h}_{t-1}$ is the output of the previous time step; ${x}_{t}$ is the input of the current time step; ${b}_{i}$ and ${b}_{c}$ are offset values of the input gate; and ${i}_{t}$ and ${\tilde{C}}_{t}$ are the input gate and the candidate cell state at moment $t$. The input gate combines the previous output ${h}_{t-1}$ with the current input ${x}_{t}$ to selectively remember content.
The definition of the cell state update can be written as:

$${C}_{t}={f}_{t}\odot {C}_{t-1}+{i}_{t}\odot {\tilde{C}}_{t}$$

where ${f}_{t}$ is the forget gate, ${C}_{t-1}$ represents the information of the previous moment on the main line, ${i}_{t}$ is the input gate, ${\tilde{C}}_{t}$ denotes the information that should be memorized at time $t$, ${C}_{t}$ indicates the cell state of the main line, and $\odot$ denotes element-wise multiplication. The main-line cell selectively remembers and forgets the current input information. Finally, the output gate can be obtained by:
$${O}_{t}=\sigma \left({W}_{oh}{h}_{t-1}+{W}_{ox}{x}_{t}+{b}_{o}\right)$$

$${h}_{t}={O}_{t}\odot \mathrm{tanh}\left({C}_{t}\right)$$

where $\sigma$ and $\mathrm{tanh}$ are activation functions, ${W}_{oh}$ and ${W}_{ox}$ are weight matrices, ${h}_{t-1}$ indicates the output of the previous time step, ${x}_{t}$ is the input of the current time step, ${b}_{o}$ denotes the offset value of the output gate, ${O}_{t}$ represents the output gate, ${C}_{t}$ is the cell state of the main line, and ${h}_{t}$ denotes the output at moment $t$.
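To make the gate computations concrete, the following Python sketch implements one forward step of a scalar (one-dimensional) LSTM cell following the equations in this subsection. The function and parameter-dictionary names are illustrative choices of ours, not those of any particular library:

```python
import math

def sigmoid(z):
    # logistic sigmoid activation
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM forward step with scalar states (illustrative sketch).

    p maps names such as "W_fh" and "b_f" to scalar weights/offsets,
    mirroring the symbols W_fh, W_fx, b_f, ... used in the text.
    """
    f_t = sigmoid(p["W_fh"] * h_prev + p["W_fx"] * x_t + p["b_f"])        # forget gate
    i_t = sigmoid(p["W_ih"] * h_prev + p["W_ix"] * x_t + p["b_i"])        # input gate
    c_tilde = math.tanh(p["W_ch"] * h_prev + p["W_cx"] * x_t + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                                    # cell state update
    o_t = sigmoid(p["W_oh"] * h_prev + p["W_ox"] * x_t + p["b_o"])        # output gate
    h_t = o_t * math.tanh(c_t)                                            # hidden output
    return h_t, c_t
```

In a real network these quantities are vectors and matrices and the products on the cell-state line become element-wise (Hadamard) products; deep learning frameworks provide optimized implementations of exactly this recurrence.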
2.2. Multi-Scale Sliding Window LSTM
The sliding window, usually applied to two-dimensional images, is widely used in computer vision processing, such as in the fields of object detection and semantic segmentation. In this study, the concept of the sliding window was applied to data preprocessing. Because GNSS coordinate time series are one-dimensional, the sliding window was reduced to one dimension to construct the data sets. Traditional data preprocessing uses a single-scale sliding window to establish the initial data, as shown in Figure 3, in which $length\_x$ and $length\_y$ are unique. Current LSTM research on time series uses a single-scale sliding window or other transformations of the data. However, the information captured at a single scale has a fixed extent, so this method of constructing a dataset is imperfect, and the construction of the dataset may determine the accuracy of the model training. In this study, we propose a multiscale sliding window method that inputs information at different scales into corresponding networks, forms a unified dimension, and integrates the existing research into a unified processing framework.
The GNSS coordinate time series is obtained and arranged in a unified dimension according to the time sequence:

$$X=\left\{{x}_{1},{x}_{2},\dots ,{x}_{m}\right\}$$

where $m$ is the length of $X$. The interval of the GNSS time series should adopt a uniform dimension, such as seconds, minutes, hours, days, weeks, months, or years. The construction of the multiscale sliding window is undertaken as follows. Assume that the length of the front portion of the $i$th sliding window is $length\_xi$, and the length of the back portion is $length\_yi$. At each step, the window moves one unit to sequentially construct the data, and the window position $v$ must satisfy $v\le m-(length\_xi+length\_yi)+1$. The data format is as follows:

$$\left[{x}_{v},\dots ,{x}_{v+length\_xi-1}\right]\to \left[{x}_{v+length\_xi},\dots ,{x}_{v+length\_xi+length\_yi-1}\right]$$
In the multiscale mode, $k\ge 2$, where $k$ represents the total number of scales and $i=1,\dots ,k$. The values $length\_x1$, $length\_x2$, …, $length\_xk$ are pairwise distinct, because it would be meaningless to construct duplicate data sets. However, $length\_y1$, $length\_y2$, …, $length\_yk$ are all equal, which is convenient for the final weighting calculation.
The constructed data set is shown in Equation (9) and Figure 4. Figure 4 is a schematic diagram of $k$ sliding windows of different scales. It can be seen that the sizes of the red sliding windows differ across scales, whereas the sizes of the blue sliding windows are the same.
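As a concrete illustration of this construction, the following Python sketch builds the input/target window pairs at one scale and then stacks several scales with a shared target length. The function names and return layout are our own illustrative choices, not the preprocessing code used in the experiments:

```python
def sliding_windows(series, length_x, length_y):
    """Slide a window of size length_x + length_y over a 1-D series,
    one step at a time, yielding (input, target) pairs."""
    pairs = []
    for v in range(len(series) - (length_x + length_y) + 1):
        pairs.append((series[v:v + length_x],
                      series[v + length_x:v + length_x + length_y]))
    return pairs

def multiscale_windows(series, lengths_x, length_y):
    """Build one dataset per scale; length_y is shared across scales
    so the subnetwork outputs can later be weighted together."""
    return {lx: sliding_windows(series, lx, length_y) for lx in lengths_x}
```

Each scale produces $m-(length\_xi+length\_yi)+1$ samples, matching the condition on $v$ above, and each per-scale dataset feeds its own LSTM subnetwork.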
Thus, the MSSW-LSTM algorithm for GNSS time-series prediction is proposed. The overall processing flow of MSSW-LSTM is shown in Figure 5. First, the GNSS station coordinate time series is obtained, and different datasets are constructed using the multiscale sliding window. Following the construction of the datasets, a corresponding LSTM subnetwork is established for each dataset according to its actual characteristics. Each LSTM subnetwork has its own weight matrices after training, adjustment, and optimization. The trained parameters are saved, and the model of each subnetwork is used for prediction. The prediction results of subnetwork (1) ${r}_{1}$, subnetwork (2) ${r}_{2}$, …, subnetwork ($k$) ${r}_{k}$ are then obtained.
The final prediction value $R$ is the weighted combination of the subnetwork prediction results, and the calculation formula is shown in Equation (10):

$$R={w}_{1}{r}_{1}+{w}_{2}{r}_{2}+\dots +{w}_{k}{r}_{k}$$

where ${w}_{1}$, ${w}_{2}$, …, ${w}_{k}$ are the weights of the prediction results from each subnetwork. The sum of all weight values should be 1, as shown in Equation (11):

$$\sum _{i=1}^{k}{w}_{i}=1$$

In general, if there is little difference between the subnetworks, the weight of each subnetwork should be the same, as shown in Equation (12):

$${w}_{1}={w}_{2}=\dots ={w}_{k}=\frac{1}{k}$$
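A minimal sketch of this weighting step in Python, assuming every subnetwork returns a prediction sequence of the same length and defaulting to equal weights when none are given (the function name and argument layout are illustrative):

```python
def weighted_prediction(results, weights=None):
    """Combine subnetwork predictions r_1, ..., r_k into R.

    results: list of k equal-length prediction sequences.
    weights: list of k weights summing to 1; defaults to equal weights 1/k.
    """
    k = len(results)
    if weights is None:
        weights = [1.0 / k] * k          # equal weighting
    if abs(sum(weights) - 1.0) > 1e-9:   # weights must sum to 1
        raise ValueError("weights must sum to 1")
    # weighted sum across subnetworks, element by element
    return [sum(w * r[j] for w, r in zip(weights, results))
            for j in range(len(results[0]))]
```

Because all scales share the same $length\_y$, the subnetwork outputs align element by element, which is what makes this direct weighting possible.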
It should be noted that the MSSW-LSTM method has significant flexibility. For example, the LSTM networks may be the same or different, and may consist of a single layer or multiple layers. This flexibility is beneficial for researchers, who can select the most appropriate network model according to their own dataset characteristics and exploit its advantages.
2.3. Evaluation Criteria
To quantitatively evaluate the prediction accuracy of the proposed model, indexes that measure the difference between the real and predicted values are used. Here, the root mean square error (RMSE) and the mean absolute error (MAE) are used to evaluate the prediction accuracy [37], with the corresponding formulas shown below:

$$RMSE=\sqrt{\frac{1}{N}\sum _{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}$$

$$MAE=\frac{1}{N}\sum _{i=1}^{N}\left|{y}_{i}-{\widehat{y}}_{i}\right|$$

where $N$ is the number of samples, ${y}_{i}$ are the true values, and ${\widehat{y}}_{i}$ are the predicted values.
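The two metrics can be sketched in Python as follows; this is an illustrative implementation of the standard RMSE/MAE formulas, not the evaluation code used in the study:

```python
import math

def rmse(y_true, y_pred):
    # root mean square error over N paired samples
    n = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    # mean absolute error over N paired samples
    n = len(y_true)
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / n
```

RMSE penalizes large deviations more heavily than MAE, so reporting both gives a view of typical error magnitude as well as sensitivity to outliers.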
5. Discussion and Conclusions
In this study, a new forecasting framework, named MSSW-LSTM, comprising a multiscale sliding window (MSSW) and LSTM, was proposed for predicting GNSS time series. In the data preprocessing stage, the multiscale sliding window is used to form different training subsets, which can effectively extract the feature relationship under different scales, and facilitates mining the deep features of the data. The LSTM network can then effectively avoid the problem of gradient disappearance in the process of parameter solving. The MSSW-LSTM can use multiple LSTM networks to make simultaneous predictions, and obtains final results by weighting.
To verify the effectiveness of the MSSW-LSTM algorithm, 1000 daily solutions of the XJSS station in the Up component were selected for prediction experiments. The results of three groups of controlled experiments showed that the RMSE was reduced by 2.1%, 23.7%, and 20.1%, and MAE was decreased by 1.6%, 21.1%, and 22.2%, respectively. The experimental results showed that the proposed framework has a higher prediction accuracy and a smaller error.
It should be noted that the MSSW-LSTM method has significant flexibility. Researchers can easily construct appropriate subspace subsets formed by multiscale windows according to different data characteristics. In addition, the LSTM networks may be the same or different, and may comprise a single layer or multiple layers. This feature helps researchers select the most appropriate network model according to their own dataset characteristics and exploit its advantages. MSSW-LSTM is a general prediction framework that can be extended to other fields, such as traffic flow prediction, weather forecasting, and air quality forecasting.