1. Introduction
Time series data are ubiquitous and important in many subject domains, and are actively utilized in a large number of analytical applications related to decision-making, modeling, planning, and so on. Summarization of a long time series is one of the tasks that most often occurs in such applications. Summarization can informally be defined as discovering a small-sized set of typical patterns (subsequences) that provide a concise representation of the long time series.
At the moment, the time series research community has proposed various approaches to formalize the concept of typical time series subsequences, namely motifs [1,2], representative trends [3], shapelets [4], etc. However, these approaches have two unavoidable limitations: they either (a) require pre-labeled training data from the respective subject area, or (b) do not provide an analyst with information regarding the fraction of the time series that a discovered typical subsequence represents. In a recent work [5], the authors propose the time series snippet concept that overcomes the above-mentioned limitations. For a given length, a time series snippet is a subsequence that is similar to many other subsequences of the time series with respect to MPdist, a specially defined similarity measure [6] based on the Euclidean distance. At the same time, all the subsequences similar to the snippet can be exactly specified and counted. In the experimental evaluation, snippet-based time series summarization shows adequate results for a wide range of subject domains [5,7]. However, the original snippet discovery algorithm has a high time complexity, namely cubic with respect to the lengths of the time series and snippet [7].
In this study, we address the problem of parallelizing snippet discovery, continuing our research on improving time series management and accelerating various time series mining tasks with parallel architectures [8,9,10,11,12,13]. The article makes the following basic contributions: we present and formalize a novel parallelization scheme for snippet discovery on a graphics processor; in an extensive experimental evaluation over real-world time series, we show that our algorithm outruns both the original serial one and a straightforward parallelization; and we present experimental results on the scalability of our algorithm with respect to its input parameters (the snippet length and subsequence length) and the similarity measure employed (based on the ordinary or squared Euclidean distance).
The remainder of the article is organized as follows. In Section 2, we briefly discuss related work. Section 3 contains notation and formal definitions, along with a short description of the original serial algorithm. Section 4 presents the proposed parallel algorithm for discovering time series snippets. In Section 5, we give the results and discussion of the experimental evaluation of our algorithm. Finally, Section 6 summarizes the results obtained and suggests directions for further research.
2. Related Work
In [5,7], Keogh et al. describe the following properties that the typical subsequences found in a time series should have: scalable computability, quantifiability, diversity, diminishing returns, and domain agnosticism. Scalable computability means that a discovery algorithm should not require a large time or space overhead. Quantifiability supposes the association of each pattern found with the fraction of the time series represented by the pattern. The diversity and diminishing returns properties concern the uniqueness of each pattern found and the sorting of all the patterns in descending order of their fractions, respectively. Finally, the domain agnosticism property assumes that the algorithm can be applied to a time series from any subject area without leveraging knowledge and/or training data on the area to achieve all the properties above. Additionally, the authors discuss the following apparent approaches to discovering typical time series subsequences that, partially or fully, do not meet the requirements above: motifs [1,2], representative trends [3], centroids from clustering of time series subsequences [14], shapelets [4], and random samples.
A motif [1,2] is a pair of time series subsequences that are very similar to each other with respect to the chosen similarity measure. In [2], Mueen et al. propose an efficient Euclidean distance based motif discovery algorithm that employs the triangle inequality to prune unpromising motif candidates. Additionally, there is a significant number of developments that accelerate motif discovery on various parallel architectures, namely multi-core CPU [15], Intel MIC (Many Integrated Core) accelerators [9,16], GPU [12,17], etc. However, the motif concept does not meet the quantifiability property, i.e., we are able to point out the shapes and locations of the typical patterns found but cannot indicate the coverage of each pattern.
In subject domains that are not related to time series, summarizing a set of objects of the same structure is performed through clustering methods. Clustering assumes splitting a given set of objects into subsets (clusters) in such a way that objects from one cluster are substantially similar to each other, while objects from different clusters are substantially dissimilar, with respect to the chosen similarity measure [18]. However, clustering of all the time series subsequences of the same length is meaningless, with any distance measure and any algorithm, as shown by Keogh et al. in [14].
In [4], Ye et al. presented the shapelet concept, which assumes that the time series subsequences are previously classified. A shapelet is a subsequence that is both the most similar one to most of the subsequences of a given class and the most dissimilar one from the subsequences belonging to other classes, with respect to the chosen similarity measure. Obviously, the shapelet concept is not domain agnostic.
In [3], Indyk et al. proposed the relaxed period and average trend concepts, defined as follows. Let us fix the subsequence length and the similarity measure for a given time series. Then a subsequence of the time series is called a relaxed period if the synthetic time series of the same length as the original one, composed by repeated concatenation of the subsequence above, has maximal similarity with the original time series. Next, a subsequence is called an average trend of the time series if, for such a subsequence, the maximal sum of the squares of its similarities with all other subsequences is achieved. The described concepts assume that the time series is preliminarily split into periods with a well-defined duration and starting index, and therefore cannot be considered agnostic.
Simple random sampling (SRS) assumes the random selection of time series subsequences without dividing them into groups. SRS is applicable to the problem of discovering typical time series subsequences in a significantly narrow range of subject domains, and is used as a baseline for comparison with other approaches [7].
In [5], Keogh et al. proposed the time series snippet concept that meets all the above-mentioned properties. Informally speaking, a time series snippet is a subsequence of a given length, which is similar to many other subsequences of the time series with respect to a specially defined similarity measure, MPdist [6]. At the same time, all the subsequences similar to the snippet can be exactly specified and counted. A set of snippets has significantly smaller cardinality than the set of all the time series subsequences of the given length and therefore can be employed to summarize the original time series. In the experimental evaluation, the snippet-based discovery of typical subsequences shows adequate results for time series from a wide range of subject domains [5,7]. In the original paper [5], the authors introduce the Snippet-Finder algorithm, while in the expanded version thereof [7], in addition, they present a generalization of the algorithm to streaming settings. However, the time complexity of Snippet-Finder is cubic (more precisely, $O(n^2 \cdot m)$, where $n$ is the time series length and $m$ is the subsequence length) [7]. In their research, the authors did not address the parallelization of the Snippet-Finder algorithm, although the most time-consuming step of Snippet-Finder, the computation of the matrix profile [4], can be straightforwardly parallelized through the SCAMP algorithm [19].
There are also works devoted to the discovery of typical patterns in particular types of time series, for instance, representative ECG heartbeat morphologies [20], music thumbnails [21], and patterns of human activity in time series from wearable devices [22].
We also mention research [23,24] aimed at discovering typical time series patterns through Convolutional Autoencoders (CAE). A CAE is employed to reconstruct the input time series with convolutional encoding and decoding filters, where the filters contain interpretable features (patterns) of the input time series. Such an approach is not domain agnostic, since more than ten neural network parameters need to be set carefully in order to obtain good results [24].
Summing up our overview of related work, we conclude that the snippet concept [5,7] is the only one that is closely related to domain agnostic discovery of typical patterns in time series. However, due to its high time complexity, the Snippet-Finder algorithm requires parallelization to ensure acceptable performance over very long time series. Such parallelization is a topical issue since, to the best of our knowledge, no research has addressed the acceleration of time series snippet discovery with GPU or any other parallel hardware architecture.
3. Preliminaries
Prior to detailing the proposed parallel algorithm for snippet discovery, we introduce basic notation and formal definitions according to [7] (see Section 3.1), briefly describe the MPdist similarity measure [6] that the snippet concept is based on (see Section 3.2), and give a short description of the original serial algorithm [7] of snippet discovery (see Section 3.3).
3.1. Notation and Definitions
A time series is a chronologically ordered sequence of real-valued numbers:

$T = (t_1, t_2, \ldots, t_n), \quad t_i \in \mathbb{R}.$ (1)

The length of a time series, $n$, is denoted by $|T|$. Hereinafter, we assume that the time series $T$ fits into the main memory.
A subsequence $T_{i,m}$ of a time series $T$ is a contiguous run of $m$ successive elements that starts at the $i$-th position:

$T_{i,m} = (t_i, t_{i+1}, \ldots, t_{i+m-1}), \quad 1 \le i \le n-m+1.$ (2)

In what follows, we assume that $n$ is a multiple of $m$. This does not lead to a loss of generality, since when $n/m$ is not an integer number, we pad the time series at the end by zeros until the result of the division above becomes an integer number.
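As an aside, the zero-padding described above is straightforward to implement. The following host-side sketch illustrates it (the helper name padToMultiple is ours, not part of Snippet-Finder):

```cpp
#include <vector>

// Pad a time series with zeros at the end so that its length
// becomes a multiple of the segment length m (illustrative helper).
std::vector<double> padToMultiple(std::vector<double> t, size_t m) {
    size_t rem = t.size() % m;
    if (rem != 0)
        t.insert(t.end(), m - rem, 0.0);  // append m - rem zeros
    return t;
}
```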
We represent a time series $T$ as a set of segments, i.e., as a set of non-overlapping $m$-length subsequences, and denote such a set as $S_m^T$:

$S_m^T = \{ S_j \mid S_j = T_{m \cdot (j-1)+1,\, m}, \ 1 \le j \le n/m \}.$ (3)

A time series snippet is an actual segment of $T$.
Nearest neighbors are the time series subsequences of the same length that are the most similar to the snippet, where the similarity of subsequences is determined through a special measure based on the Euclidean distance. The task at hand is somewhat like the clustering problem, where at the end, for each cluster, we provide an end user with a typical representative that is optimal in the sense that it minimizes the objective function. Snippets correspond to the situation when K-medoids [18] clustering is employed, where a medoid is an object of the set to be clustered, in contrast to the K-means [25] clustering algorithm, where, commonly, a cluster center is not an object of the set. Snippets are arranged in a list in descending order of the number of their nearest neighbors. Formally, snippets are defined as follows.
Let us denote a set of $m$-length snippets of a time series $T$ as $C_m^T$:

$C_m^T = \{ C_i \mid C_i \in S_m^T, \ 1 \le i \le K \},$ (4)

where a time series snippet $C_i$ is associated with the following attributes: an index, nearest neighbors, and a fraction. We denote these attributes as $C_i.\mathit{index}$, $C_i.\mathit{neighbors}$, and $C_i.\mathit{frac}$, respectively.
An index of a snippet $C_i$ is the number $j$ of the segment that corresponds to the snippet, i.e., $C_i = S_j$ and $C_i.\mathit{index} = j$.
Nearest neighbors of a snippet $C_i$ is a set of subsequences that are the most similar to the corresponding segment with respect to the MPdist [6] measure (which is formally defined below in Section 3.2):

$C_i.\mathit{neighbors} = \{ T_{j,m} \mid \mathrm{MPdist}(T_{j,m}, C_i) \le \mathrm{MPdist}(T_{j,m}, S_a) \ \ \forall S_a \in S_m^T \}.$ (5)
A fraction of a snippet $C_i$ is the ratio of the number of the snippet's nearest neighbors to the total number of $m$-length subsequences in the time series:

$C_i.\mathit{frac} = \dfrac{|C_i.\mathit{neighbors}|}{n-m+1}.$ (6)
Snippets are ordered in descending order of their fractions:

$C_1.\mathit{frac} \ge C_2.\mathit{frac} \ge \ldots \ge C_K.\mathit{frac}.$ (7)
Finally, in the task at hand, we are given a time series $T$, a segment length $m$, and an integer parameter $K$, and should find the top-$K$ snippets $C_m^T$, including their indexes, nearest neighbors, and fractions.
3.2. The MPdist Measure
The MPdist similarity measure [6] considers two equal-length time series to be similar if they share many similar equal-length subsequences with respect to the Euclidean distance, regardless of the order of the matching subsequences. Although MPdist is a measure, not a metric (it does not obey the triangle inequality), it has a lot of merits, namely robustness to spikes, warping, linear trends, etc. [6]. MPdist is formally defined as follows.
Let us have two equal-length time series, $A$ and $B$ ($|A| = |B| = m$), and let $\ell$ be a user-defined subsequence length with a value in the range $1 \le \ell \le m$. Typically, $\ell$ is taken as half of the time series length, i.e., $\ell = m/2$ [6]. In what follows, the MPdist definition employs the matrix profile concept [4]. A matrix profile of two time series $A$ and $B$, with respect to the subsequence length $\ell$, is a time series denoted as $P_{AB}$, where an element of $P_{AB}$ is the z-normalized Euclidean distance between an $\ell$-length subsequence in $A$ and its corresponding nearest neighbor in $B$:

$P_{AB}(i) = \min\limits_{1 \le j \le m-\ell+1} \mathrm{ED_{norm}}(A_{i,\ell}, B_{j,\ell}), \quad 1 \le i \le m-\ell+1.$ (8)
The z-normalization of the Euclidean distance between two equal-length subsequences $X$ and $Y$ ($|X| = |Y| = \ell$) is defined as follows:

$\mathrm{ED_{norm}}(X, Y) = \sqrt{\sum\limits_{i=1}^{\ell} \left( \dfrac{x_i - \mu_X}{\sigma_X} - \dfrac{y_i - \mu_Y}{\sigma_Y} \right)^2},$ (9)

where $\mu_X$ and $\sigma_X$ ($\mu_Y$ and $\sigma_Y$) denote the mean value and the standard deviation of $X$ (of $Y$, respectively).
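For illustration, a direct (non-incremental) computation of $\mathrm{ED_{norm}}$ following Equation (9) might look as follows; this is a sketch with our own naming, and it assumes non-constant subsequences ($\sigma > 0$):

```cpp
#include <cmath>

// Illustrative z-normalized Euclidean distance between two equal-length
// subsequences x and y of length len, following Equation (9).
// (Sketch only; assumes sigma > 0 for both inputs.)
__host__ __device__ double zNormED(const double* x, const double* y, int len) {
    double muX = 0.0, muY = 0.0;
    for (int i = 0; i < len; ++i) { muX += x[i]; muY += y[i]; }
    muX /= len; muY /= len;
    double sX = 0.0, sY = 0.0;
    for (int i = 0; i < len; ++i) {
        sX += (x[i] - muX) * (x[i] - muX);
        sY += (y[i] - muY) * (y[i] - muY);
    }
    double sigmaX = sqrt(sX / len), sigmaY = sqrt(sY / len);
    double dist = 0.0;
    for (int i = 0; i < len; ++i) {
        double d = (x[i] - muX) / sigmaX - (y[i] - muY) / sigmaY;
        dist += d * d;  // accumulate squared differences of z-scores
    }
    return sqrt(dist);
}
```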
Similarly, the matrix profile $P_{BA}$ is defined as follows:

$P_{BA}(i) = \min\limits_{1 \le j \le m-\ell+1} \mathrm{ED_{norm}}(B_{i,\ell}, A_{j,\ell}), \quad 1 \le i \le m-\ell+1.$ (10)

Let us concatenate $P_{AB}$ with $P_{BA}$ and denote the resulting time series as $P_{ABBA}$ (in what follows, we use the symbol ⊙ to denote a concatenation of two operands):

$P_{ABBA} = P_{AB} \odot P_{BA}.$ (11)
Next, let us denote $P_{ABBA}$ in which the elements are ordered in ascending order as $\mathit{sorted}P_{ABBA}$. To compute MPdist between $A$ and $B$ with respect to the subsequence length $\ell$, the $k$-th element of $\mathit{sorted}P_{ABBA}$ is used, where $k$ is a user-defined parameter. Typically, $k$ is taken as 5 percent of $2m$, the double length of the concatenated time series $P_{ABBA}$. However, if the subsequence length $\ell$ is close to the time series length $m$, then the length of $P_{ABBA}$ is less than 5 percent of $2m$, and the maximal element of $P_{ABBA}$ is taken as the value of MPdist. Formally speaking,

$\mathrm{MPdist}(A, B) = \begin{cases} \mathit{sorted}P_{ABBA}(k), & \text{if } |P_{ABBA}| > k \\ \max(P_{ABBA}), & \text{otherwise}, \end{cases}$ (12)

where $k = \lceil 0.05 \cdot 2m \rceil$.
3.3. The Serial Algorithm
Below, we give a brief overview of the serial Snippet-Finder algorithm according to the original article [7].
An MPdist-profile of a time series $T$ and a given query subsequence $Q$ ($|Q| = m$) is a vector of the MPdist distances between $Q$ and each subsequence in $T$, which is denoted as $d_{Q,T}$:

$d_{Q,T} = \big( \mathrm{MPdist}(Q, T_{1,m}), \mathrm{MPdist}(Q, T_{2,m}), \ldots, \mathrm{MPdist}(Q, T_{n-m+1,m}) \big).$ (13)
Let us denote an MPdist-profile of a time series $T$ and its segment $S_i$ as $d_{S_i,T}$. Next, let us consider the set of MPdist-profiles of $T$ and all its segments, and denote it as $D$:

$D = \{ d_{S_i,T} \mid S_i \in S_m^T, \ 1 \le i \le n/m \}.$ (14)
Discovering time series snippets is performed through the construction of a curve $M$ that underlies an objective function. $M$ consists of $n-m+1$ points and is constructed on $D' \subseteq D$, a given non-empty subset of the set of MPdist-profiles $D$. In $M$, the $i$-th point represents the MPdist distance between the $i$-th subsequence of $T$ and its nearest segment from the given subset:

$M(i) = \min\limits_{d \in D'} d(i), \quad 1 \le i \le n-m+1.$ (15)
The area under the curve $M$ is denoted as $\mathit{Area}(D')$ and considered as an objective function:

$\mathit{Area}(D') = \sum\limits_{i=1}^{n-m+1} M(i).$ (16)
$\mathit{Area}$ has the intuitive property wherein if every non-overlapping subsequence were used as a snippet, its value would be exactly zero. In discovering snippets, we try to select a reduced set of segments so that the value of $\mathit{Area}$ will be close to zero. The algorithm performs iteratively as follows. At the first step, it selects the segment for which the $\mathit{Area}$ value is minimal as a snippet:

$C_1 = S_j, \quad j = \arg\min\limits_{1 \le i \le n/m} \mathit{Area}(\{ d_{S_i,T} \}).$ (17)
At each further step, we employ the MPdist-profiles of the segments that have been chosen as snippets at the previous steps. At the second step,

$C_2 = S_j, \quad j = \arg\min\limits_{S_i \in S_m^T \setminus C_m^T} \mathit{Area}(\{ d_{C_1,T}, d_{S_i,T} \}).$ (18)
All subsequent steps of the algorithm are performed similarly to the second step until $K$ snippets are found:

$C_i = S_j, \quad j = \arg\min\limits_{S_a \in S_m^T \setminus C_m^T} \mathit{Area}(\{ d_{C_1,T}, \ldots, d_{C_{i-1},T}, d_{S_a,T} \}), \quad 2 \le i \le K.$ (19)
After the snippets are found, their fractions are calculated as follows:

$C_i.\mathit{frac} = \dfrac{\left|\{\, j \mid d_{C_i,T}(j) = M(j) \,\}\right|}{n-m+1}, \quad 1 \le i \le K,$ (20)

where $M$ is the curve built on the MPdist-profiles of all the $K$ snippets found.
Algorithm 1 depicts the pseudo-code of Snippet-Finder. The algorithm starts by computing the set of MPdist-profiles of all segments (see line 2). It employs GetAllProfiles and MPdistProfile, the auxiliary algorithms to compute the set of MPdist-profiles and the MPdist-profile of one segment, which are shown in Algorithms 2 and 3, respectively. Next, snippets are discovered through Equations (17)–(19) (see lines 3–11). The algorithm finishes by calculating the fractions of the snippets found through Equation (20) (see lines 12–14).
Algorithm 1 SnippetFinder (in $T$, $m$, $\ell$, $K$; out $C_m^T$) |
- 1: $C_m^T \leftarrow \emptyset$; $Q \leftarrow (+\infty, \ldots, +\infty)$
- 2: $D \leftarrow$ GetAllProfiles($T$, $m$, $\ell$)
- 3: while $|C_m^T| < K$ do
- 4:  $\mathit{minArea} \leftarrow +\infty$
- 5:  for $i \leftarrow 1$ to $n/m$ do
- 6:   $\mathit{area} \leftarrow \sum_{j=1}^{n-m+1} \min\big(Q(j), d_{S_i,T}(j)\big)$
- 7:   if $\mathit{area} < \mathit{minArea}$ then
- 8:    $\mathit{minArea} \leftarrow \mathit{area}$; $\mathit{idx} \leftarrow i$
- 9:  $C_m^T \leftarrow C_m^T \cup \{S_{\mathit{idx}}\}$
- 10:  $Q \leftarrow \min(Q, d_{S_{\mathit{idx}},T})$; $D \leftarrow D \setminus \{d_{S_{\mathit{idx}},T}\}$
- 11:  $M \leftarrow Q$
- 12: for $i \leftarrow 1$ to $K$ do
- 13:  $C_i.\mathit{neighbors} \leftarrow \{ T_{j,m} \mid d_{C_i,T}(j) = M(j) \}$
- 14:  $C_i.\mathit{frac} \leftarrow |C_i.\mathit{neighbors}| \,/\, (n-m+1)$
- 15: return $C_m^T$
|
Algorithm 2 GetAllProfiles (in $T$, $m$, $\ell$; out $D$) |
- 1: $D \leftarrow \emptyset$
- 2: for $i \leftarrow 1$ to $n/m$ do
- 3:  $d_{S_i,T} \leftarrow$ MPdistProfile($T$, $S_i$, $\ell$)
- 4:  $D \leftarrow D \cup \{d_{S_i,T}\}$
- 5: return $D$
|
Algorithm 3 MPdistProfile (in $T$, $Q$, $\ell$; out $d_{Q,T}$) |
- 1: $d_{Q,T} \leftarrow \emptyset$
- 2: for $i \leftarrow 1$ to $n-m+1$ do
- 3:  $\mathit{dist} \leftarrow$ MPdist($Q$, $T_{i,m}$, $\ell$)
- 4:  $d_{Q,T} \leftarrow d_{Q,T} \odot (\mathit{dist})$
- 5: return $d_{Q,T}$
|
4. Accelerating Snippet Discovery with GPU
Currently, NVIDIA graphics processors (GPU, Graphics Processing Unit) are among the most popular many-core accelerators that address data-parallel computational problems [
26]. GPU has a hierarchical architecture composed of symmetric streaming multiprocessors (SM). Each SM, in turn, consists of symmetric CUDA (Compute Unified Device Architecture) cores. CUDA application programming interface supports SIMD (Single Instruction Multiple Data) paradigm, making it possible to assign multiple threads to execute the same set of instructions over multiple data. In CUDA, all threads form a
grid that is managed as
blocks of threads. In a block, threads perform concurrently and communicate with each other through shared local resources. A CUDA function is called a
kernel. When running a kernel on GPU, an application programmer specifies the number of blocks and the number of threads in each block. Further, we show matrix data structures and kernels developed to process data in SIMD manner to discover snippets.
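As a minimal illustration of this execution model (not a part of PSF itself), consider the following kernel and its launch configuration, where each thread processes one element of an array:

```cpp
#include <cuda_runtime.h>

// Minimal CUDA example of the grid/block model: scale an array in parallel.
__global__ void scaleKernel(double* data, double factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] *= factor;                   // one element per thread
}

void scaleOnGPU(double* devData, double factor, int n) {
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // cover all n elements
    scaleKernel<<<blocks, threadsPerBlock>>>(devData, factor, n);
    cudaDeviceSynchronize();
}
```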
PSF (Parallel Snippet-Finder), the proposed algorithm, employs the data structures summarized in Figure 1. The computational scheme of PSF differs from the original one as follows. The calculation of the set of MPdist-profiles (see Algorithm 1, line 2) is performed more efficiently than in the original GetAllProfiles algorithm (see Algorithm 2). Instead of one serial step at which the MPdist-profile between a segment and each subsequence of the time series is calculated, we perform a sequence of four steps, each of which is parallelized. Let us describe these steps for a fixed segment $S_i$.
At the first step, we calculate a matrix of the $\mathrm{ED_{norm}}$ distances between the segment and each subsequence of the time series. Let us denote such a matrix as $ED^i$:

$ED^i \in \mathbb{R}^{(m-\ell+1) \times (n-\ell+1)}, \quad ED^i(a, b) = \mathrm{ED_{norm}}\big((S_i)_{a,\ell},\, T_{b,\ell}\big).$ (21)
At the second step, for the matrix obtained at the first step, we calculate column-wise minimum values. Let us denote the resulting vector of such minima as $\mathit{mincol}^i$:

$\mathit{mincol}^i(b) = \min\limits_{1 \le a \le m-\ell+1} ED^i(a, b), \quad 1 \le b \le n-\ell+1.$ (22)
At the third step, in the $ED^i$ matrix, we calculate row-wise minimum values in an $\ell$-length sliding window. Let us denote the resulting matrix as $\mathit{minrow}^i$:

$\mathit{minrow}^i(a, j) = \min\limits_{j \le b \le j+\ell-1} ED^i(a, b), \quad 1 \le j \le n-m+1.$ (23)
Finally, at the fourth step, for each subsequence position, we concatenate the respective column of the $\mathit{minrow}^i$ matrix and the respective $(m-\ell+1)$-length subsequence of the $\mathit{mincol}^i$ vector, and denote the resulting structure as $P^i_{ABBA}$:

$P^i_{ABBA}(j) = \mathit{minrow}^i(\cdot, j) \odot \big(\mathit{mincol}^i(j), \ldots, \mathit{mincol}^i(j+m-\ell)\big), \quad 1 \le j \le n-m+1.$ (24)
In fact, at the end of this step, for the specified segment, we have computed a matrix profile of the segment and each subsequence of the time series. For the final calculation of the MPdist similarity measure between the segment and a subsequence, it is necessary to sort $P^i_{ABBA}$ and take the $k$-th value of the ordered array, as defined in Equation (12).
Algorithm 4 depicts the pseudo-code of the above-described steps, performed in parallel. Here, the calls of the EDmatrSCAMP algorithm (line 3) provide a parallel calculation of the matrix of $\mathrm{ED_{norm}}$ distances between a specified segment and each subsequence of the time series according to Equations (8)–(10). In EDmatrSCAMP, parallel computations are based on the technique employed in the SCAMP algorithm proposed in [19]. This technique employs the following equations:

$QT_{a,b} = QT_{a-1,b-1} + df_a \cdot dg_b + df_b \cdot dg_a,$ (25)

$df_a = \dfrac{t_{a+\ell-1} - t_{a-1}}{2}, \quad dg_a = (t_{a+\ell-1} - \mu_a) + (t_{a-1} - \mu_{a-1}),$ (26)

$P_{a,b} = QT_{a,b} \cdot \dfrac{1}{\|T_{a,\ell} - \mu_a\|} \cdot \dfrac{1}{\|T_{b,\ell} - \mu_b\|},$ (27)

$\mathrm{ED_{norm}}(T_{a,\ell}, T_{b,\ell}) = \sqrt{2\ell\,(1 - P_{a,b})},$ (28)

where $\mu_a$ denotes the mean value of the subsequence $T_{a,\ell}$, $QT_{a,b}$ denotes the centered sum of products of the subsequences $T_{a,\ell}$ and $T_{b,\ell}$, and $\|\cdot\|$ denotes the Euclidean norm.
Algorithm 4 ParallelGetAllProfiles (in $T$, $m$, $\ell$; out $D$) |
- 1: $D \leftarrow \emptyset$
- 2: for $i \leftarrow 1$ to $n/m$ do
- 3:  $ED^i \leftarrow$ EDmatrSCAMP($T$, $S_i$, $\ell$)
- 4:  for $b \leftarrow 1$ to $n-\ell+1$ do ▹ PARALLEL
- 5:   $\mathit{mincol}^i(b) \leftarrow \min_{1 \le a \le m-\ell+1} ED^i(a,b)$
- 6:  for $a \leftarrow 1$ to $m-\ell+1$ do ▹ PARALLEL
- 7:   for $j \leftarrow 1$ to $n-m+1$ do
- 8:    $\mathit{minrow}^i(a,j) \leftarrow \min_{j \le b \le j+\ell-1} ED^i(a,b)$
- 9:  $d_{S_i,T} \leftarrow$ ParallelProfile($\mathit{minrow}^i$, $\mathit{mincol}^i$, $k$)
- 10:  $D \leftarrow D \cup \{d_{S_i,T}\}$
- 11: return $D$
|
As opposed to its predecessor, GPU-STOMP$^{OPT}$ [27], while computing the Euclidean distances, SCAMP reorders the floating-point computations and replaces the sliding dot product update with a centered sum-of-products formula (Equations (25)–(28)). Equation (26) precomputes the terms used in the sum-of-products update formula of Equation (25) and incorporates incremental mean centering into the update. Equation (27) replaces the Euclidean distance with the Pearson correlation, which can be computed incrementally using fewer computations than the Euclidean distance, and can be converted to the z-normalized Euclidean distance in $O(1)$ time by Equation (28).
In ParallelGetAllProfiles (see Algorithm 4), lines 4–5 implement the parallel calculation of the $\mathit{mincol}^i$ vector containing the column-wise minima of the $ED^i$ matrix according to Equation (22). The corresponding CUDA kernel is organized as follows (see Figure 2). We form a grid consisting of $n-\ell+1$ blocks (one block per matrix column) with $m-\ell+1$ threads in each block. An $ED^i$ column is copied from the global memory to the shared memory of its block. Finally, each block finds the minimum through the reduction operation.
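A possible implementation sketch of this column-minimum kernel is given below; the strided scan and the power-of-two block size are our simplifications rather than the exact PSF code:

```cpp
#include <math.h>

// One block per column of the distance matrix ed (rows x cols, row-major);
// shared-memory tree reduction finds the column minimum.
// Assumes blockDim.x is a power of two.
__global__ void columnMinKernel(const double* ed, double* minCol,
                                int rows, int cols) {
    extern __shared__ double sdata[];
    int col = blockIdx.x;                       // this block's matrix column
    int tid = threadIdx.x;
    double best = INFINITY;
    for (int r = tid; r < rows; r += blockDim.x)
        best = fmin(best, ed[r * cols + col]);  // strided scan of the column
    sdata[tid] = best;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction
        if (tid < s) sdata[tid] = fmin(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }
    if (tid == 0) minCol[col] = sdata[0];       // write the column minimum
}
// Possible launch: columnMinKernel<<<cols, 128, 128 * sizeof(double)>>>(...);
```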
Next, lines 6–8 of the algorithm implement the parallel calculation of the $\mathit{minrow}^i$ matrix of row-wise minima in an $\ell$-length sliding window of $ED^i$ according to Equation (23). The corresponding CUDA kernel is depicted in Figure 3. We create a single-block grid consisting of $m-\ell+1$ threads. Each thread calculates the minima in an $\ell$-length sliding window for one row of the matrix. The resulting matrix is stored in global memory.
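This kernel could be sketched as follows; again, a simplified illustration with our own naming and a straightforward per-window scan rather than the exact PSF kernel:

```cpp
// Single-block grid: each thread processes one row of ed (rows x cols,
// row-major) and writes the minimum of each l-length window to minRow.
__global__ void rowWindowMinKernel(const double* ed, double* minRow,
                                   int rows, int cols, int l) {
    int row = threadIdx.x;                      // one thread per matrix row
    if (row >= rows) return;
    int outCols = cols - l + 1;                 // number of window positions
    for (int j = 0; j < outCols; ++j) {
        double best = ed[row * cols + j];
        for (int s = 1; s < l; ++s)             // minimum over window [j, j+l-1]
            best = fmin(best, ed[row * cols + j + s]);
        minRow[row * outCols + j] = best;
    }
}
```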
After that, the calculations are continued by the ParallelProfile algorithm (see Algorithm 5). The algorithm performs according to Equation (24) through the parallel concatenation of each column of the $\mathit{minrow}^i$ matrix and the respective $(m-\ell+1)$-length subsequences of the $\mathit{mincol}^i$ vector.
Algorithm 5 ParallelProfile (in $\mathit{minrow}^i$, $\mathit{mincol}^i$, $k$; out $P$) |
- 1: $P \leftarrow (0, \ldots, 0)$; $w \leftarrow m-\ell+1$
- 2: for $j \leftarrow 1$ to $n-m+1$ do ▹ PARALLEL
- 3:  $P_{ABBA} \leftarrow \mathit{minrow}^i(\cdot, j) \odot \big(\mathit{mincol}^i(j), \ldots, \mathit{mincol}^i(j+w-1)\big)$
- 4:  Sort($P_{ABBA}$)
- 5:  if $|P_{ABBA}| > k$ then $P(j) \leftarrow P_{ABBA}(k)$
- 6:  else $P(j) \leftarrow \max(P_{ABBA})$
- 7: return $P$
|
To compute this, we employ the following CUDA kernel (see Figure 4). We create a grid consisting of $n-m+1$ blocks (one block per subsequence of the time series) with $2(m-\ell+1)$ threads in each block. Each block forms the $P^i_{ABBA}$ array for one subsequence. Half of the block's threads copy the corresponding $\mathit{minrow}^i$ column from the global memory to the shared memory of the block, and the other half copy the corresponding window of the $\mathit{mincol}^i$ vector. Next, each block sorts $P_{ABBA}$ and writes its $k$-th element (see Equation (12)) to global memory, thus forming one element of the MPdist-profile of the segment.
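The following sketch conveys the idea of this kernel. For brevity, the in-block sort is shown as a serial insertion sort executed by one thread, whereas a real implementation would use a parallel in-block sorting scheme (all names are ours):

```cpp
// One block per subsequence position j. Threads cooperatively assemble
// P_ABBA in shared memory from a minRow column (rows values) and a window
// of minCol (winLen values), then the k-th smallest element is selected.
// Shared memory size at launch: (rows + winLen) * sizeof(double).
__global__ void mpdistProfileKernel(const double* minRow, const double* minCol,
                                    double* profile, int rows, int profLen,
                                    int winLen, int k) {
    extern __shared__ double pabba[];
    int j = blockIdx.x;                         // subsequence position
    int tid = threadIdx.x;
    int total = rows + winLen;
    // Cooperative copy: first the minRow column, then the minCol window.
    for (int i = tid; i < total; i += blockDim.x)
        pabba[i] = (i < rows) ? minRow[i * profLen + j]
                              : minCol[j + (i - rows)];
    __syncthreads();
    if (tid == 0) {                             // serial insertion sort (sketch only)
        for (int a = 1; a < total; ++a) {
            double v = pabba[a]; int b = a - 1;
            while (b >= 0 && pabba[b] > v) { pabba[b + 1] = pabba[b]; --b; }
            pabba[b + 1] = v;
        }
        // k-th smallest element, or the maximum if the array is short.
        profile[j] = (total > k) ? pabba[k - 1] : pabba[total - 1];
    }
}
```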
We also parallelize the calculations of the area under the curve and of the fractions of the snippets found in the original serial algorithm (see lines 5–11 and lines 12–14 in Algorithm 1, respectively). Figure 5 depicts the respective CUDA kernels. The first kernel is a grid of blocks where each block calculates the minima of the curve; next, the elements of the curve are summed through the reduction operation. After the $K$ snippets are found, the second CUDA kernel performs as a grid consisting of $K$ blocks; this grid calculates the fraction of each snippet by comparing the values of the MPdist-profiles of the snippets and the curve.
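These two kernels could be sketched as follows; the atomic-based final accumulation is our simplification (double-precision atomicAdd requires compute capability 6.0 or higher), and the output accumulators are assumed to be zeroed before the launches:

```cpp
// Area under the curve min(q, d) for a candidate profile d:
// per-block partial sums via tree reduction, then atomic accumulation.
// Assumes blockDim.x is a power of two; *area must be zeroed beforehand.
__global__ void areaKernel(const double* q, const double* d,
                           double* area, int profLen) {
    extern __shared__ double s[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < profLen) ? fmin(q[i], d[i]) : 0.0;  // one point of the curve M
    __syncthreads();
    for (int step = blockDim.x / 2; step > 0; step >>= 1) {
        if (tid < step) s[tid] += s[tid + step];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(area, s[0]);        // accumulate the partial sum
}

// One block per snippet: count the positions where the snippet's
// MPdist-profile attains the curve, i.e., the subsequences it covers.
__global__ void fractionKernel(const double* profiles, const double* curve,
                               double* frac, int profLen) {
    int snippet = blockIdx.x;
    int count = 0;
    for (int j = threadIdx.x; j < profLen; j += blockDim.x)
        if (profiles[snippet * profLen + j] <= curve[j]) ++count;
    atomicAdd(&frac[snippet], (double)count / profLen);
}
```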
6. Conclusions
In this article, we addressed the task of accelerating the summarization of a long time series with a graphics processor. Informally, summarization can be described as discovering a small-sized set of typical patterns (subsequences) that briefly represent the long time series. The task of time series summarization arises in a wide spectrum of subject domains within analytical applications related to decision-making, modeling, planning, and so on.
A number of apparent approaches to time series summarization, like motifs, shapelets, cluster centroids, and so on, have two unavoidable limitations: they either require pre-labeled training data or cannot determine the fraction of the time series represented by the pattern found. The snippet concept recently proposed by Keogh et al. [5] overcomes the above-mentioned limitations. For a given length, a time series snippet is a subsequence that is similar to many other subsequences of the time series with respect to the MPdist similarity measure [6] based on the Euclidean distance. In addition, all the subsequences similar to the snippet can be exactly specified and counted. However, the original Snippet-Finder algorithm has a cubic time complexity with respect to the lengths of the time series and snippet [7]. Thus, Snippet-Finder requires parallelization to ensure acceptable performance over very long time series. Our extensive search of recent scientific publications showed that, to the best of our knowledge, no research addresses the acceleration of time series snippet discovery with GPU or any other parallel hardware architecture.
In this article, we employed Keogh et al.'s works [5,7] as a basis and proposed a novel parallelization scheme for snippet discovery on a graphics processor. Our algorithm is called PSF (Parallel Snippet-Finder) and employs advanced data structures and a computational scheme different from the original algorithm. In PSF, we calculate MPdist-profiles more efficiently than in the original algorithm: instead of one serial step at which the MPdist distance between a snippet and each subsequence of the time series is calculated, we perform a sequence of steps, each of which is parallelized on GPU.
We carried out an extensive experimental evaluation of PSF, employing a diverse collection of time series compiled by the authors of the original serial algorithm. For evaluation purposes, we also developed a straightforward version of PSF where only the most time-consuming part of snippet discovery (the computation of distances between snippets and all subsequences) is parallelized on GPU through separate calls of the SCAMP framework [19]. The experimental results showed that PSF outruns both the original serial algorithm and the straightforward parallel version (by at least an order of magnitude and by up to four times, respectively). In addition, the experiments showed that PSF scales well with respect to its input parameters (the snippet length and subsequence length) and the similarity measure employed (based on the ordinary or squared Euclidean distance).
In further studies, we plan to extend our approach in two directions: (a) the case of a many-core CPU as the underlying hardware platform, where the parallelization of snippet discovery is performed through the OpenMP technology [32] instead of CUDA, and (b) the case of a large time series that cannot be entirely placed in RAM, where snippets should be discovered on a high-performance cluster with GPU or many-core CPU nodes.