Article

Non-Pattern-Based Anomaly Detection in Time-Series

1 Department of Computer Science (DIDA), Blekinge Institute of Technology, 371 41 Karlskrona, Sweden
2 Department of Information Security, Igor Sikorsky Kyiv Polytechnic Institute, 03057 Kyiv, Ukraine
* Author to whom correspondence should be addressed.
Submission received: 14 December 2022 / Revised: 27 January 2023 / Accepted: 28 January 2023 / Published: 1 February 2023
(This article belongs to the Special Issue Futuristic Security and Privacy in 6G-Enabled IoT)

Abstract

Anomaly detection across critical infrastructures is not only a key step towards detecting threats but also provides early warning of the likelihood of potential cyber-attacks, faults, or infrastructure failures. Owing to the heterogeneity and complexity of the cybersecurity field, several anomaly detection algorithms have been suggested in the recent literature; however, at the time of writing this paper, little or no research focuses on Non-Pattern Anomaly Detection (NP-AD) in Time-Series. Most existing anomaly detection approaches rely on initial profiling, i.e., defining which behavior represented by a time series is “normal”, whereas everything that does not meet the criteria of “normality” is labeled “abnormal” or anomalous. Such a definition does not reflect the complexity and sophistication of anomalies: under different conditions, the same behavior may or may not be anomalous. Therefore, the authors of this paper posit the need for NP-AD in Time-Series as a step toward showing the relevance of deviating from, or not conforming to, expected behaviors. Non-Pattern (NP), in the context of this paper, denotes non-conforming patterns, i.e., a technique for detecting deviations with respect to some characteristics while dynamically adapting to changes. Based on the experiments conducted in this paper, NP-AD in Time-Series is shown to be a viable approach on the data streams used, namely non-seasonal time series with outliers, the Numenta Anomaly Benchmark (NAB) dataset, and the SIEM SPLUNK machine learning toolkit. It is the authors’ opinion that this approach provides a significant step toward predicting future anomalies across diverse cyber, critical infrastructure, and other complex settings.

1. Introduction

As the complexity of critical infrastructures increases, so does the impact of cyber threats, driven by continuous information flows and growing volumes of streaming data. These data evolve over time, and in the long run, detecting patterns and maintaining a high detection accuracy in a complex setting becomes a priority [1,2]. Research by Waite [3] shows that higher functionality leads to higher complexity, which in turn reduces the security level of any system. Considering the consistently increasing number of cyber-attacks, as highlighted in [4,5], the most important task of a state’s cyber defense is to protect its institutions and the whole critical infrastructure. One of the biggest problems in the field of cyber defense is the lack of effective mechanisms for preventing attacks unless specific signatures of such attacks are identified. Therefore, it becomes necessary to develop mechanisms for the preventive detection and prevention of cyber threats before they are fully carried out, provided that there is no clearly defined information about the potential threats and/or their signatures. Such a mechanism should be based on identifying anomalies in an agent’s behavior (this study considers an agent to be a user or any type of device interacting with an observable object). The authors consider an anomaly to be unusual agent behavior that goes beyond the expected rather than the “normal”, since the normality of behavior can be defined subjectively for different systems, where deviations from expected behavior will be based primarily on forecasting [5,6,7].
In order to develop a non-pattern anomaly detection approach, it is necessary to ensure a continuous, real-time input of data reflecting the agent behavior, which is the basis for determining anomalies. One of the most popular tools for continuous and instant data gathering and analysis across the entire infrastructure, both inside and outside the security perimeter, is a Security Information and Event Management (SIEM) system [8]. A SIEM is able to gather and analyze logs generated by diverse network devices, IDSs, and applications, with the ultimate goal of achieving secure analysis, log correlation, and security event detection [9].
Consequently, the continuous rise of zero-day vulnerabilities necessitates the detection of patternless anomalies. That notwithstanding, pattern-based detection in time series across critical infrastructures intrinsically varies because of characteristics such as a lack of labels, generalization, efficiency, etc. [10]. Furthermore, existing approaches appear to be mostly inclined towards pattern-based detection, i.e., conforming patterns. It is on this premise that the authors of this paper are compelled to explore Non-Pattern based Anomaly Detection (NP-AD) in Time-Series. It is worth noting that in the context of this paper, the term ‘detection’ is used to denote recognizing or distinguishing an unusual or usual action or activity relative to data [11].
To highlight the contribution of this paper, the authors have explored the state-of-the-art studies depicted in [6,12,13,14,15,16,17,18,19,20,21,22], identifying relevant similarities and differences where necessary. In the context of this study, this has provided concrete insights that have enabled the authors to concentrate on the main problems and contributions of NP-AD in Time-Series.
Therefore, the specific contributions of this paper can be summarised as follows:
  • We provide the Non-Pattern based Anomaly Detection (NP-AD) formalisms from the perspective of a Finite State Machine (FSM), where we map each FSM state with the generic security data collection system, i.e., SIEM.
  • We propose an NP-AD approach in Time-Series. Experiments are conducted in varying environments based on non-seasonal time series with/without anomalies in the Numenta Anomaly Benchmark (NAB) dataset, and from SIEM SPLUNK machine learning toolkit datasets. The outcome of the conducted experiments provides proof of the possibility of achieving NP-AD in a complex setting, given a variety of environments.
  • We conduct a comparative analysis and map the outcome to the proposed study, where the limitations of each study are identified. In addition, a contextual critical evaluation of NP-AD in Time-Series is given. The outcome shows that our approach can be easily integrated and generalized in a complex environment, even in the absence of statistical methods.
The remainder of this paper is organized as follows: Section 2 describes the methodology used in this study, which is followed by background and related literature in Section 3. Non-Pattern Anomaly Detection (NP-AD) in time series is discussed in Section 4, followed by experiments and results in Section 5. After this, a comparative study is given in Section 6, followed by a critical evaluation of NP-AD in time series (Section 7). The paper ends with a conclusion and a mention of future work in Section 8.

2. Methodology

A mixed approach encompassing qualitative and quantitative methods has been used to design this research, as highlighted in the studies by Williams and Patten [23,24] and shown in the process in Figure 1. Mathematical formulations have been leveraged to formulate the problem, and descriptive observations of data patterns have been used as a means of collecting data. In addition, correlations have been employed to examine the relationship between pattern-based and Non-Pattern Anomaly Detection (NP-AD) [25]. Thereafter, experimental approaches are utilized, which at their core allow observation of the conditions of NP-AD in Time-Series. Finally, a comparative approach has been employed that has helped the authors to consolidate the problem by comparing past and present studies [25].
From Figure 1, this study’s methodology is qualitatively conceptualized based on a time-series measurement where observations are handled at a given time in order to realize the NP-AD propositions. To formulate a suitable approach, this study leverages a Finite State Machine (FSM), whose transitions are also used as representations of the Security Information and Event Management (SIEM) system. Experimental results are generated based on general anomaly detection and NP-AD, where the likelihood of NP-AD is assessed from the observations. Ultimately, comparative studies mapped against the state-of-the-art lead to a critical evaluation of NP-AD in Time Series, which, in the long run, examines the possible applicability of this study and demonstrates its relevance and contributions.

3. Background and Related Literature

This section discusses works that are relevant or closely related to the study presented in this paper and that have been used as background and related literature. The authors explore anomalies, secure data gathering, and time series.

3.1. Anomalies and Secure Data Gathering

Anomalous data, usually viewed as an outlier, is interpreted from different perspectives. Hawkins [26] defines it as an observation that deviates from other observations so significantly as to arouse suspicion. Research by Barnett and Lewis [27], in turn, presents it as an observation inconsistent with the remainder of a set of data. Data are simply a collection of bytes, giving no valuable knowledge to a security specialist. Instead, information is data put into context that is able to generate new knowledge to detect, prevent, and defend against attackers. Based on the definitions by Hawkins and by Barnett and Lewis [26,27], we explore dimensions of gathering data using a security system and how data can be ordered in a sequence of values based on some time intervals. Furthermore, research in [28] has shown that detecting anomalies involves identifying anomalous or abnormal data within a dataset. In addition, rare and significant events that can warrant critical actions can be unraveled in the process as anomalies through unusual behaviors [29]. While it is evident that data and anomalies have been explored from a wider perspective [5,30,31], other relevant studies mainly focus on classic anomaly detection, leveraging nearest neighbor distances and clusters. Another relevant approach used to detect anomalous data is the use of deep neural networks, as highlighted by Markou [32].
Security Information and Event Management (SIEM) systems offer suitable platforms for the real-time analysis of security events gathered from the sensors of information and communication systems [33]. A SIEM system is normally represented by applications, devices, or services and is used to accomplish a number of tasks, which are shown in Table 1.
SPLUNK is a common type of SIEM that is able to gather data in an intelligent way [34,35,36]. The relevant data gathered by SPLUNK range from log files, configuration files, system notifications and applications, alerts, metrics, and scripts to network data. The basis of the information collected in the SPLUNK system is an index, i.e., a data repository, which is inherently a file or set of files that store data. There is no specific type of data that must be stored in SPLUNK because it can process any data (the vast majority of unstructured and poorly structured data is automatically identified and processed by SPLUNK) [34,35,36].

3.2. Time-Series

A time series is represented as a set of observations that follow a chronological order and are drawn from diverse phenomena [37]. Furthermore, it can be seen as an ordered sequence of values of a variable at equally spaced time intervals. We define a time series as follows:
Definition 1.
A time series is a set of points at which measurements T are made, and the observations are handled at a given time t. The set of these observations is represented as $X(t), t \in T$ [37].
where T is assumed to be a finite set of points, $T = \{1, 2, \ldots, N\}$. Furthermore, T can be represented as a continuous parameter, where T is positioned as a finite interval $T = \{t : 0 \le t \le L\}$. Consequently, a stochastic process with random variables is given by Equation (1):
$\{ Y_t : t = 0, \pm 1, \pm 2, \pm 3, \ldots \}$  (1)
Equation (1) can be observed as a time series. Such a process is determined by a set of distributions $s = \{0, 1, 2, \ldots, n\}$ of the finite collections of Y [38]. In order to detect anomalies in time series, the input data can be represented as univariate or multivariate. A univariate time series, according to [12], is an ordered set of real-valued observations in which each observation is recorded at a specified time/period, while a multivariate time series is an ordered set of dimensional vectors recorded over a period of time [12]. The challenge of detecting anomalies in time series usually arises when assessing whether it is possible to predict the novelty or normalcy of an observed time series given a set of training time series [19].
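As a brief illustration of these two representations (a minimal sketch; the arrays, values, and shapes below are our own assumptions and not taken from the paper’s datasets), a univariate series can be stored as a one-dimensional array indexed by time, while a multivariate series becomes a two-dimensional array whose rows are the dimensional vectors recorded at each time step:

```python
import numpy as np

# Univariate time series: one real-valued observation per time step.
univariate = np.array([0.80, 0.81, 0.79, 0.95, 0.80])   # shape (N,)

# Multivariate time series: a k-dimensional vector per time step,
# e.g., (voltage, temperature) recorded at the same instants.
multivariate = np.array([
    [0.80, 41.0],
    [0.81, 41.2],
    [0.79, 40.9],
    [0.95, 47.5],
    [0.80, 41.1],
])                                                       # shape (N, k)

assert univariate.ndim == 1 and multivariate.ndim == 2
```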

4. Non-Pattern Anomaly Detection (NP-AD) in Time Series

In this section, a discussion of the proposed Non-Pattern Anomaly Detection (NP-AD) approach is presented. This discussion is presented in two-fold: first, a high-level description of the NP-AD in the time series approach is discussed in Section 4.1, which is then followed by NP-AD in the time series problem formulation in Section 4.2. In the later sections, experiments that provide proof of the proposed concept are discussed.

4.1. High-Level Description of NP-AD in Time Series

The proposed Non-Pattern based Anomaly Detection (NP-AD) in Time-Series is presented as a five-step approach, as illustrated in Figure 2. The first step, labeled 1, accepts anomaly samples as input data. These input data are classified as a univariate or multivariate time series, as shown in Step 2. The step labeled 3 comprises algorithmic training/processing that allows the acquisition of the user’s time series from log files, for example, from a SIEM in a pattern-based approach. The NP-AD in the steps labeled 3 and 4 is a process that allows the detection of pattern-based and patternless observations, respectively. Pattern-based anomalies tend to conform, while patternless (non-pattern) anomalies are those that do not conform, i.e., they deviate from other observations.
The ultimate objective of this step is to point out arising suspicions. It is a step that captures outlier detection approaches, which flag observations with unexpected behavior. It is worth noting that this study concentrates on patternless anomalies, shown using the dotted rectangle in the step labeled 4 in Figure 2. The last step, labeled 5, shows the outcome where outlier behavior is observed. In the context of this paper, patternless and non-pattern are used interchangeably, as they carry the same semantics.

4.2. NP-AD Problem Formulation

The main goal of NP-AD is to provide a comprehensible approach that characterizes patternless anomalous behavior over a certain time. In order to formulate the problem, we first map the stochastic processes to a Finite State Machine (FSM) in Section 4.2.1. This is then followed by the general NP-AD formalisms in Section 4.3.

4.2.1. Finite State Machine

A Finite State Machine (FSM) is depicted as a graph with nodes [39] in which it is possible to compute a number of states and transitions for given inputs and output conditions, respectively [40], as shown in Figure 3.
Definition 2.
A Finite State Machine (FSM) is depicted as a sequential system, a graph with nodes, represented using a five-tuple, as shown in Equation (2) below:
$FSM = \{ Q, \Sigma, \delta, q_0, F \}$  (2)
where $Q = \{q_1, q_2, \ldots, q_n\}$ is a finite nonempty set of states, of which the machine can be in only one definite state $q_i$, $i = 1, 2, \ldots, n$; $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_m\}$ is a set of input symbols that allows the FSM to receive certain inputs $\sigma_j$, $j = 1, 2, \ldots, m$; $\delta : Q \times \Sigma \to Q$ is a state transition function, such that when the FSM receives an input, it changes from its definite state; $q_0 \in Q$ is the initial state, from which the FSM starts receiving inputs; and $F \subseteq Q$ represents the set of end states, in which the FSM no longer receives any inputs.
A summary of the FSM tuples is also shown in Table 2.
This study leverages the FSM transitions and states, with which it is possible to simulate stochastic processes and/or determine system states and transitions. Based on the representations of the FSM that have been shown in Table 1, we map the states and transitions to this study as a step towards patternless anomaly detection, as shown in Table 2. For example, Q translates to the number of events in a SIEM system, and $\Sigma$ translates to the number of features of the logs collected from the SIEM. In addition, $\delta$ translates to the number of functions in the SIEM system, while $q_0 \in Q$ and $F \subseteq Q$, shown as a source alphabet, represent the set of all values that characterize the output streams of information or actions (system responses to an input signal, system status change messages, etc.). This is shown in Table 3.
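To make this mapping concrete, the following is a minimal Python sketch of the five-tuple in Equation (2); the state and symbol names, and the SIEM-flavored comments, are purely illustrative assumptions and are not taken from the authors’ implementation:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

@dataclass
class FiniteStateMachine:
    states: FrozenSet[str]                  # Q: e.g., event states tracked by the SIEM
    alphabet: FrozenSet[str]                # Sigma: input symbols, e.g., log features
    transition: Dict[Tuple[str, str], str]  # delta: Q x Sigma -> Q
    initial_state: str                      # q0
    final_states: FrozenSet[str]            # F: states in which no input is accepted

    def run(self, inputs):
        """Feed a sequence of input symbols and return the visited states."""
        state, visited = self.initial_state, [self.initial_state]
        for symbol in inputs:
            if state in self.final_states:
                break                       # end state reached: stop accepting inputs
            state = self.transition.get((state, symbol), state)
            visited.append(state)
        return visited

# Illustrative instance: "normal"/"warning"/"anomaly" states driven by log features.
fsm = FiniteStateMachine(
    states=frozenset({"normal", "warning", "anomaly"}),
    alphabet=frozenset({"low_deviation", "high_deviation"}),
    transition={
        ("normal", "high_deviation"): "warning",
        ("warning", "high_deviation"): "anomaly",
        ("warning", "low_deviation"): "normal",
    },
    initial_state="normal",
    final_states=frozenset(),
)
print(fsm.run(["high_deviation", "high_deviation"]))  # ['normal', 'warning', 'anomaly']
```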
We consider a set of states as a subset of the flow of data from SIEM systems. The input information can be represented as a vector of values, as shown in Equation (3)
$x = (a_1, a_2, a_3, \ldots, a_k)$  (3)
where the coordinates of the vector, $a_j$, $j = 1, \ldots, k$, are the values of log entry parameters (categorical or numeric). Without loss of generality, configuration files, system messages, and applications can be considered based on this approach. This allows the establishment of a connection between the input data and the system states. The following section concentrates on giving general descriptions of the NP-AD formalisms.

4.3. General NP-AD Formalisms

To further use the numerical values of the input vectors to obtain a finite number of states, it is desirable to reduce the input vectors to categorical ones by dividing them into intervals and assigning them to certain categories. For example, if one of the attributes is defined on the set of rational numbers, then its minimum and maximum values (thresholds) and the intervals between them can correspond to the categorical values “low”, “medium”, and “high”. Values that fall outside the maximum and minimum thresholds correspond to the categorical values “high” or “low”, respectively. Thus, the input alphabet, as translated from the FSM, can be defined as a finite set of all possible states of the vector x in Equation (3), where the total number of these states, n, can be determined by Equations (4) and (5), respectively.
$X = \{x_1, x_2, x_3, \ldots, x_n\}$  (4)
$n = \prod_{i=1}^{k} |a_i|$  (5)
where $|\cdot|$ is the operator that determines the cardinality of the set of values taken by a given coordinate of the input vector. The output alphabet can be defined as the signals coming from the system, as shown in Equation (6):
$Y = \{y_1, y_2, y_3, \ldots, y_m\}$  (6)
These output signals allow deviations from normal conditions to be identified based on the transition matrix. Specific definitions for multiple outputs can be represented based on the following conditions: do nothing; increase the anomaly likelihood; reduce the anomaly likelihood; and emit signals. The set of states of the system is defined as shown in Equation (7):
$S = \{s_1, s_2, s_3, \ldots, s_d\}$  (7)
where d is the number of all possible states.
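The discretization and state-space counting described above can be sketched as follows; the attribute names, thresholds, and category sets are invented for illustration and are not the authors’ configuration:

```python
import math

def to_category(value, low_threshold, high_threshold):
    """Map a numeric attribute to the categorical values "low"/"medium"/"high"."""
    if value < low_threshold:
        return "low"
    if value > high_threshold:
        return "high"
    return "medium"

# Illustrative attribute domains after discretization.
attribute_domains = {
    "cpu_voltage": {"low", "medium", "high"},
    "event_type":  {"login", "logout", "error"},   # already categorical
}

# Equation (5): the number of possible input states n is the product
# of the cardinalities |a_i| of each attribute's value set.
n = math.prod(len(values) for values in attribute_domains.values())
print(to_category(0.95, low_threshold=0.75, high_threshold=0.85), n)  # high 9
```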
Definition 3
(Probability/Likelihood). The likelihood estimator for n independent activities is denoted by $Y_i$, where $Y_i$ is the i-th activity, $i = 1, 2, \ldots, n$, with probability p of observing the positive outcome [41].
This is shown in Equation (8)
$\Pr[Y_i = y] = p^{y} (1 - p)^{1 - y} \quad \text{for } y \in \{0, 1\}$  (8)
where the likelihood of obtaining a sample from the activities $\{A_1, A_2, \ldots, A_n\}$ is given in Equation (9):
$\Pr[A_1, A_2, \ldots, A_n \mid p] = \prod_{i=1}^{n} p^{A_i} (1 - p)^{1 - A_i}$  (9)
Without reducing the generality of this approach, the terms “normal” and “anomalous” are used to represent “normal”, “warning”, or “anomaly”. It should be noted that in this case, the type of anomaly will not be determined; rather, we determine the probability of its occurrence. However, in this paper, we use the term “likelihood” because it refers to the mathematical meaning of probability and better reflects the expectation of a potential anomaly. It is worth noting that we assume that each value of the input vector is determined by the likelihood function, i.e., the values of each of the components of the vector are random within the space of values, forming a random vector. Consequently, it is essential to evaluate the anomalies probabilistically; we therefore introduce a parameter that determines the value of the likelihood of an anomalous state at a certain time, based on state-to-state transition functions and output detection, as shown in Equation (10).
$s_{t+1} = f(s_t, p_{t+1})$  (10)
where $s_{t+1}$ and $s_t$ are the state values at the corresponding times, and $p_{t+1}$ is the current value of the anomaly likelihood. Because the FSM is a discrete-time model, it is necessary to define the transition function as a threshold function, which may be, for example, a transition to the opposite state in the presence of a likelihood value above/below a certain threshold. Depending on the current state and the likelihood value, the system may change to the opposite state or remain in the current state based on the behavior. In addition, based on a certain sequential ordering of the input vectors, the predictive value of the next input vector can be constructed. Note that in the context of this paper, it does not matter how the predictive value is determined. We assume this value has already been obtained using one of the multiple prediction functions. Hereafter, we assume that the predicted value is obtained as a function $\mu(\cdot)$ of the input dataset (for the time interval $x_{t-m}$ to $x_t$), as shown in Equation (11).
$\tilde{x}_{t+1} = \mu(x_{t-m}, \ldots, x_t)$  (11)
Further, to compare the predicted value with the actual value of the input vector, it is necessary to determine a certain measure of the distance of the vectors in the feature space, which is represented as the prediction error, as shown in Equation (12).
$\delta_x = \Delta(\tilde{x}_t, x_t)$  (12)
where $\tilde{x}_t$ is the predicted value of the vector at time t, and $x_t$ is its actual value. Having obtained a measure of distance, the dependence of the state’s anomaly likelihood on that distance can now be determined. The likelihood determination function $\rho$ can be defined as a function of the current likelihood and the distance measure from Equation (12), as shown in Equation (13).
$p_{t+1} = \rho(p_t, \delta_x)$  (13)
The final step determines the thresholds that were highlighted in Equation (10).
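A minimal sketch of the threshold transition function in Equation (10), assuming just two states and an illustrative threshold of 0.8, could look as follows:

```python
def next_state(current_state, anomaly_likelihood, threshold=0.8):
    """Equation (10): switch state when the anomaly likelihood crosses a threshold."""
    if current_state == "normal" and anomaly_likelihood >= threshold:
        return "anomalous"
    if current_state == "anomalous" and anomaly_likelihood < threshold:
        return "normal"
    return current_state          # otherwise remain in the current state

print(next_state("normal", 0.92))    # 'anomalous'
print(next_state("anomalous", 0.30)) # 'normal'
```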

4.4. NP-AD Requirements and Functions

This section discusses the NP-AD requirements and functions based on the NP-AD formalisms presented in Section 4.3 of this paper. The discussion covers two aspects: the general prediction function, discussed in Section 4.4.1, and the logistic function, discussed in Section 4.4.2. These functions are discussed mainly because they give an understanding of the likelihood of identifying and profiling anomalies based on how states and transitions are depicted in an FSM.

4.4.1. General Prediction Function

The prediction value weights the most recent values more heavily than older values. However, it is pertinent that older values are also incorporated, owing to their likelihood of occurrence. A number of prediction functions can meet this criterion; an example is Single Exponential Smoothing (SES), as highlighted by [42]. This paper has taken careful consideration of this aspect, given that it is a key component in a wide range of time series. It is on this premise that SES has been adopted as the prediction function in the NP-AD in time series approach. Based on the FSM dimensions and transitions, this study presents the prediction function as shown in Equation (14).
$\tilde{x}_t = \alpha x_t + (1 - \alpha)\tilde{x}_{t-1}$  (14)
where $\alpha$ is a parameter reflecting the “depth” of the prediction function memory: the higher it is, the smaller the memory depth, and the more the prediction relies on the most recent previous values. A minimal sketch of this predictor is given below, after which we introduce the logistic function, which also forms part of the scope of this paper.
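The sketch below follows Equation (14); the value α = 0.3 is only an illustrative choice, not a parameter reported by the authors:

```python
def ses_predict(series, alpha=0.3):
    """Single Exponential Smoothing, Equation (14): x~_t = a*x_t + (1 - a)*x~_{t-1}."""
    smoothed = [series[0]]                      # initialize with the first observation
    for x_t in series[1:]:
        smoothed.append(alpha * x_t + (1 - alpha) * smoothed[-1])
    return smoothed                             # the last element is the prediction basis

print(ses_predict([10.0, 10.2, 9.9, 14.0, 10.1]))
```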

4.4.2. Logistic Function

The logistic function is used to calculate the likelihood of detecting a non-pattern anomaly, and it is represented as shown in Equation (15):
$\mathrm{logistic}(x) = \dfrac{1}{1 + e^{-b(x - x_0)}}$  (15)
The logic behind the logistic function is based on the fact that it is positioned as a function that normalizes the deviation and scales it to measurable and predictive values in a range of (0–1). Based on this normalization, the likelihood function is denoted as shown in Equation (16).
$p_t = \mathrm{logistic}(p_{t-1} + \mathrm{logistic}(\delta_x))$  (16)
where $p_t$ is the current value of the likelihood, and $\delta_x$ is the normalized deviation of the time series actual value from the predicted (expected) value.
It is observed that using this function enables two distinct criteria to be met: firstly, it keeps the likelihood value in the range (0, 1); secondly, the deviation is normalized so that constant logistic function parameters can be used (keeping the outer logistic function input in the range (0, 2)). From Equation (16), it is necessary to mention that the logistic functions may have different parameters $\{b, x_0\}$ in Equation (15). For example, we set the parameters $b = 5$, $x_0 = 0.9$ to create the logistic curve depicted in Figure 4. Here we may see an anomaly likelihood value of 0.5 (i.e., the previous step could have involved some occurrence of potentially unexpected behavior). In the next step, unexpected behavior of the time series may lead to some deviation $\delta_x$, which in turn makes $\mathrm{logistic}(\delta_x) = 0.7$. The outer logistic function will then have an input value of 1.2, which gives a new value of $p_t \approx 0.77$, increasing the likelihood.
Consequently, having a new value of $\mathrm{logistic}(\delta_x) \approx 0.4$ would give a new value of $p_t \approx 0.43$, which is less than the previous value of the anomaly likelihood. This corresponds to the common-sense understanding of deviation from expected behavior.
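Equations (15) and (16) can be sketched in Python as follows; the parameter values below are illustrative, and the printed numbers are not intended to reproduce the exact figures quoted above:

```python
import math

def logistic(x, b=5.0, x0=0.9):
    """Equation (15): logistic(x) = 1 / (1 + exp(-b * (x - x0)))."""
    return 1.0 / (1.0 + math.exp(-b * (x - x0)))

def update_likelihood(previous_likelihood, normalized_deviation,
                      inner_params=(5.0, 0.9), outer_params=(5.0, 0.9)):
    """Equation (16): p_t = logistic(p_{t-1} + logistic(delta_x))."""
    inner = logistic(normalized_deviation, *inner_params)
    return logistic(previous_likelihood + inner, *outer_params)

p_prev = 0.5
print(round(update_likelihood(p_prev, normalized_deviation=1.1), 2))  # larger deviation: likelihood grows
print(round(update_likelihood(p_prev, normalized_deviation=0.5), 2))  # smaller deviation: likelihood shrinks
```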
In order to define the criteria that are applied to the inner logistic function and its parameters, an assumption is made that those parameters { b , x 0 } may be completely different for different values of the time series and different deviations. Indeed, the deviation of 20 may not be counted as anomalous for the average value of 2000 in some time range. On the other hand, the deviation of 0.2 may be significant for the average value of another time series. In this paper, we apply the sliding window approach (considering its width as N) to dynamically measure changes in time series characteristics in order to adapt our model to these changes.
Thus, we cannot apply a function, as depicted in Figure 4, for different time series with no adjustments. Therefore, we must adapt this logistic function to the time series prediction error averaged values. To do this, instead of using static values in Equation (15), we consider a logistic curves family, which requires meeting the following set of criteria:
  • Its middle point (the point where $\mathrm{logistic}(\delta_x) = 0.5$) depends on the average of the last N values of $\delta_x$;
  • Its parameter b is reciprocally proportional to the average of the last N values of $\delta_x$.
The above-mentioned average of the last values is calculated as in Equation (17):
$\sum_{j=i-N}^{i} \left( |\delta_{x_j}| / N \right), \quad \text{where } i \in [N, \mathrm{len}(\text{time series})]$  (17)
This family of logistic curves is shown in Figure 5, for the average of the last N values of $\delta_x$ equal to {0.25, 0.5, 1, 1.5, 2}, from left to right, respectively.
Having looked at the essential aspects that serve as the basic building blocks for NP-AD, it has been observed that the anomaly deviation depends on the average prediction errors, as was also shown in the preceding equations. Furthermore, every deviation is normalized by the average value. Experimental approaches based on Algorithm 1, which provide proof of the concept proposed in Section 4, are given in the next section.
Algorithm 1: NP-AD in Time-Series
  Input: $x_t$
  Output: $A(x_t)$
  [The algorithm listing is provided as an image in the original article.]
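Since the algorithm listing is only available as a figure, the following is a hedged reconstruction of one plausible NP-AD loop assembled from Equations (12)–(17); the window width, logistic parameters, and alert threshold are our illustrative choices, not the authors’ published values:

```python
import math

def logistic(x, b, x0):
    return 1.0 / (1.0 + math.exp(-b * (x - x0)))

def np_ad(series, alpha=0.3, window=20, threshold=0.9):
    """Return (likelihoods, flags): an anomaly likelihood and a boolean flag per point."""
    prediction = series[0]                  # SES state, Equation (14)
    deviations = []                         # recent prediction errors for the sliding window
    likelihood, likelihoods, flags = 0.0, [], []
    for x_t in series:
        deviation = abs(x_t - prediction)   # prediction error, Equation (12)
        deviations.append(deviation)
        recent = deviations[-window:]
        avg = sum(recent) / len(recent) or 1e-9          # window average, Equation (17)
        # Inner logistic adapted to the window average: midpoint and slope follow avg.
        inner = logistic(deviation, b=1.0 / avg, x0=avg)
        # Outer logistic accumulates deviations into a likelihood, Equation (16).
        likelihood = logistic(likelihood + inner, b=5.0, x0=0.9)
        likelihoods.append(likelihood)
        flags.append(likelihood >= threshold)            # threshold transition, Equation (10)
        prediction = alpha * x_t + (1 - alpha) * prediction  # SES update, Equation (14)
    return likelihoods, flags

# Usage: a flat series with one spike; the points around the spike raise the likelihood.
series = [1.0] * 30 + [8.0] + [1.0] * 5
likelihoods, flags = np_ad(series)
print([i for i, f in enumerate(flags) if f])  # indices flagged as anomalous
```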

5. Experiment and Results

In order to validate the feasibility of the proposed approach for Non-Pattern Anomaly Detection (NP-AD) in time series and provide proof of concept, an implementation has been conducted in which the experiments focus on the following:
  • Approach 1: General Anomaly Detection. Conduct general anomaly detection in time series against historical data as a step towards NP-AD.
  • Approach 2: Non-Pattern Anomaly Detection. Conduct an experiment on more complex, noisy data focused on voltage measurement time series from a CPU. We utilize non-seasonal time series with outliers, the Numenta Anomaly Benchmark (NAB) dataset, and the SIEM SPLUNK machine learning toolkit to show anomalies in time series, and we apply the NAB to evaluate the detected anomalies.
The above-mentioned approaches are discussed in Section 5.1 and Section 5.2, respectively.

5.1. Approach 1: General Anomaly Detection

The role of the experiment conducted in the first approach was to determine whether there were any anomalies in a time series relative to historical data. Non-seasonal time series with outliers, the Numenta Anomaly Benchmark (NAB) dataset, and the SPLUNK Machine Learning Toolkit were used to gather data, and based on the approach shown in Algorithm 1, three main anomalies were detected. As shown in Figure 6, the first point can hardly be detected visually as an anomaly because of the scale, but its value is the first non-zero value in the time series. The second anomaly is an obvious deviation compared with the historical values, and the third one is anomalous as it is an outlier whose value is many times greater than the average of the historical data.

5.2. Approach 2: Non-Pattern Anomaly Detection

This approach mainly shows the steps taken in Non-Pattern Anomaly Detection (NP-AD). By considering more complex and noisier data, with obvious anomaly points, than those utilized in Section 5.1, the outcome shown in Figure 7 is obtained. The data used in this approach are mainly inclined towards the voltage measurement time series obtained from the 0.8 V CPU battery. From Figure 7, there are only two behavioral anomalies in this time series, which are basically outliers. There is also another one that is not marked, due to two factors: (1) the threshold level for anomalies is too high; and (2) the amplitude of fluctuations of the time series values in the interval preceding this point gradually increases, and the model “gets used” (adjusts) to such values. In Figure 8 and Figure 9, anomalies are detected on a more complex time series of a hard-disk drive temperature collected from the laptop, where the x-axis is time measured in seconds.
Basically, Figure 8 and Figure 9 represent a non-seasonal time series with outliers (CPU battery voltage collected from the laptop, with time measured in seconds), and Figure 10 and Figure 11 depict a repeated-values time series with a spike density anomaly, with different anomaly likelihood thresholds, taken from the Numenta Anomaly Benchmark (hereinafter, NAB) dataset (art_increase_spike_density.csv). Despite only two obvious outliers (near points 160 and 600), there are also many other points that may be counted as anomalies (or at least as points of attention), precisely because of the suggested approach of deviation accumulation. A few deviations, one after another, led to a deviation accumulation that caused an increase in the anomaly likelihood. In the range (0, 50), we can see a point-to-point increase, which is indicated as anomalous behavior, as well as near point 700.
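For reference, the NAB series mentioned above can be loaded with pandas; the path below follows the public layout of the NAB repository and is our assumption rather than a location given by the authors:

```python
import pandas as pd

# Assumed location inside a local clone of https://github.com/numenta/NAB
path = "NAB/data/artificialWithAnomaly/art_increase_spike_density.csv"

df = pd.read_csv(path, parse_dates=["timestamp"])   # NAB files expose 'timestamp' and 'value' columns
values = df["value"].tolist()

# The detector sketched after Algorithm 1 could then be applied directly, e.g.:
# likelihoods, flags = np_ad(values, threshold=0.9)
```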
In order to choose a proper anomaly likelihood threshold, adjustments are made to the number of alerts for different types of raw data, or it is required to find some new anomalies in a repeated time series. For example, Figure 12 and Figure 13 show examples of a time series with high variance, with and without a seasonal component. Fundamentally, in Figure 12, our approach detected three anomaly points, which can be explained as a change in behavior (momentarily increased variance). Consequently, the anomaly in the time series shown in Figure 13 exposes one of the flaws of the proposed model with SES as the prediction function. To detect this type of anomaly, a SARMA (SARIMA, SARIMAX) function would be used instead [43,44,45,46].
However, the observed time series is a reflection of certain system behavior, and these anomalies are explainable. At the beginning of every peak set, our approach “recognizes” a change in behavior, then “gets used” to the new behavior, showing no likelihood of anomaly. In addition, a context anomaly is detected near point 500, as is shown in Figure 14, which could not have been detected if statistical methods were applied. Furthermore, Figure 14 shows an anomaly of a time series with a zero-order trend and a normal distribution with several obvious outliers, and it is worth mentioning that based on the dataset that has been utilized, the observed spikes are translated as major anomalies. Considering behavior analysis approaches, it is pertinent to highlight that our approach detects a slightly different set of points as anomalous. Points that lie near 150 are detected as anomalies in the range (120, 150), where observations show that the system state described by the given time series has become “stable” with low variance. Further, as the model adjusts to the increased variance, spikes in the range (160, 300) are not recognized as anomalies.
Consequently, there are several obvious outliers detected as anomalies, as shown in Figure 14; apart from that, there is also an increased anomaly likelihood in the range (160, 180). This is due to a change in the system behavior reflected in the time series. From this, it is possible to see how the model adjusts to new behavior. Specifically, in Figure 15 we may see how, after a long time of no activity (values equal to 0), everything is at first recognized as an anomaly; however, later this new type of behavior becomes normal. Notably, Figure 16 and Figure 17 show Numenta Anomaly Benchmark (NAB) time series with no anomalies to be detected. With a threshold value of 0.9, it is possible to detect some anomalous behavior points. Specifically, in Figure 17, the detected anomalies are indeed transitional between two more or less stable states (values close to 20 and values close to 80). Increasing the anomaly likelihood threshold to 0.95 reduces these anomalies to zero.
Based on the suggested approach, it is vital to split the anomaly likelihood awareness into several alert levels (severity) depending on the anomaly likelihood thresholds: for instance, Level 1 for an anomaly likelihood above 0.95, Level 2 for the range (0.9, 0.95), Level 3 for the range (0.8, 0.9), etc. This allows the rules for the further processing of detected points to be split. An example of this is seen in Figure 18 and Figure 19, where there are several more time series examples with different types of anomalies detected. In addition, in Figure 18, behavioral anomalies are detected (e.g., unexpectedly rapid value growth) as well as some context anomalies (some contextually meaningful outliers).
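A small helper reflecting this severity split might look as follows; the level boundaries are those quoted above, while the function itself is only an illustrative sketch:

```python
def severity_level(anomaly_likelihood):
    """Map an anomaly likelihood to the alert (severity) levels described above."""
    if anomaly_likelihood > 0.95:
        return "Level 1"
    if anomaly_likelihood > 0.9:
        return "Level 2"
    if anomaly_likelihood > 0.8:
        return "Level 3"
    return None                    # below all alert thresholds: no alert raised

print(severity_level(0.97), severity_level(0.92), severity_level(0.85))  # Level 1 Level 2 Level 3
```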
Contextual anomalies are also portrayed in Figure 20 and Figure 21, which show that only contextual anomalies are detected, rather than the collective ones in the ranges (3000, 3100) and (2900, 3100), respectively.

6. Comparative Analysis

This section gives a comparative analysis with other existing relevant studies that have focused on, or are inclined towards, pattern/patternless anomaly detection in time series (Table 4). The essence of exploring these studies is to show not only how the proposed study translates and fits within the current discipline but also its actual contribution to the body of knowledge, based on the contributions mentioned in Section 1 of this paper.
Research by [12] has suggested an approach for outlier anomaly detection in time-series data. The study explored the state-of-the-art in unsupervised outlier detection by proposing a taxonomy. While this was a relevant study in an unsupervised environment, its approach was inclined toward pattern-based detection. Next, a study by Nurjahan [13] describes time series based on rare patterns with a never-ending stream of data in higher-level data mining algorithms; however, the study did not show utility for a wide range of real-life datasets. Furthermore, a novel non-signature/patternless intrusion detection system has been proposed by [14,15], where a prototype was implemented at the US Pacific Command and perturbations were identified in real time using a thermodynamics-based model. While this approach was patternless, it was hardly deployed in a time-series setting. Another study that focused on learning pattern classification over time series processes shows that each identified learning pattern is basically correlated with the appearance of an outlier, and the time series learning analytics show that abnormal values tend to fall under a learning pattern; while this appears to be a pertinent study, its focus is generic to normal patterns [16]. Next, research focused on resilient real-time anomaly detection based on non-parametric statistical tests has shown the detection of anomalies in computer network traffic applied to volumetric anomalies from DDoS attacks, where timing analysis was conducted; this approach only worked on a single feature, and the impact was unknown [17]. In addition, a method for the implementation of a patternless IDS, herein referred to as Zippo, which is based on decision trees and organization policies, has demonstrated a significant patternless approach, although not over time series [18].
In addition to the above-mentioned studies, the authors of [19] have studied anomaly detection in time series with reduced distance computation in order to detect abnormal time series. This approach applied the nearest neighbor approach, and a DBAD algorithm was suggested; exploring the efficiency of the suggested algorithm could be an important step toward increasing its accuracy. Other research has focused on Long Short-Term Memory (LSTM) network anomaly detection in time series, involving learning and fault detection [20]. That study allowed a network to be trained on non-anomalous data, where the resulting prediction errors were modeled as a Gaussian distribution in order to assess the likelihood of anomalous behavior; however, it was not known whether the normal behavior involved long-term dependencies.
Nevertheless, research by [6] suggests a deep learning approach, DeepAnT, aimed at detecting anomalies in unsupervised time series, where the time series predictor employs a Convolutional Neural Network (CNN). Since the approach is unsupervised, it hardly relies on labels to generate the model, and DeepAnT outclassed the state-of-the-art when ten datasets with 433 time series were evaluated. From this study, it was not certain how adversarial approaches could be handled or whether the model could be transferred across time series; furthermore, it was important to evaluate the impact of time series forecasting. In addition, studies aimed at the automatic detection of outliers for time series applied to sensor-based data have performed computations to understand the effects of the threshold and window width parameter values, where a small window width was recommended that allowed unusual values to be identified; however, knowledge of the signal used was important in this study [21]. Next, research by [22] suggests an anomaly detection scheme based on time-series analysis that processes streams of real-time sensor data on a computer in order to determine whether an ECG heartbeat is abnormal or not. That study still needs to analyze sensor data from oximeters using window-based discord discovery.
The authors of this paper acknowledge the significant contributions of the above-mentioned researchers. Having looked at the relevant studies that align with or are closely related to the study proposed in this paper, it is the authors’ opinion that these studies have played a significant role by providing key insights that have helped to map the proposed study onto the current research domain and to consolidate the approach and the problem highlighted in Section 1 of this paper. Based on the findings that have been explored, the following section provides a critical evaluation of the proposed concept.

7. Critical Evaluation of NP-AD in Time Series

More often than not, time series in a complex environment represent situations as an ordered sequence observed at diverse intervals. Anomaly detection observations are a projection that allows the detection of unseen occurrences in these complex environments based on some historical instances. Studies have shown a reliance on pattern-based approaches to realize anomaly detection objectives.
This paper has taken a step towards proposing an anomaly detection approach for patternless observations, herein referred to as NP-AD, which has been discussed in the prior sections of this paper. It is worth noting that the suggested NP-AD relies on deviation-based techniques to detect anomalies, which, in the long run, is a suitable approach for detecting novel behaviors in a complex setting. Consequently, while prior studies have shown that a typical detection algorithm learns normality from training instances deemed normal and applies distance measures and thresholds over previous instances to detect anomalies [19], our approach deduces anomalies from patternless observations, with more precise results, as illustrated in the figures shown.
From the experiments conducted in this paper, our proposition is specifically inclined towards the Numenta Anomaly Benchmark (NAB) dataset coupled with the SIEM SPLUNK machine learning toolkit, which are suitable sources for patternless/non-pattern anomaly detection at the time of writing this paper. The choice of this dataset has been motivated by the complexity of the cybersecurity field, where the threat landscape changes dynamically and constantly. This means that a much older dataset would be relevant in some scenarios based on different metrics; however, this proposition’s ultimate goal, realized in Section 5, was to provide proof of concept. The authors also acknowledge that there exist other approaches in the state-of-the-art, such as SOTA [47,48], which addressed problems on supervised and unsupervised datasets for general anomaly detection. While the authors of this paper concur that SOTA represents state-of-the-art anomaly detection methods used to identify outliers or deviations from normal behavior in data, we also justify the choice and effectiveness of our approach against the state-of-the-art because of how our approach adapts to the changing, complex cybersecurity environment, as seen from data gathered by SIEM SPLUNK in a patternless approach. The contrast in this context is based on the fact that SOTA basically employs machine learning and statistical techniques to detect outliers in large datasets.
The NP-AD approach that has been proposed in this paper significantly portrays the concept of outlier or anomalous data, where the representation shows that the outlier, in this case, is unlabeled time series data. While we look at these from the perspective of detecting unwanted data [12], our study considers the generic representation of an anomaly and focuses on the likelihood of its occurrence rather than on detecting specific events.
Notwithstanding the above-mentioned representations, the techniques used to arrive at the conclusions in this paper show a more systematic approach, given the varying results that depend on the changes of instances, as shown in the figures. The NP-AD is basically mapped to the FSM. Based on the FSM computations, their traversals explicitly translate to and fit the actions witnessed in a complex setting, from which useful data to be observed are extracted during a normal detection approach. This is shown in Table 2, whereby establishing a connection between the input data and the system states allows effective interconnection of states and transitions.
The generic NP-AD formalisms have shown the relevance of using numerical values and finite representations of states in this paper, where, based on the achieved descriptions, the actual functions (generic and logistic, see Section 4.4) utilized are able to show the dimensions of the transitions. Consequently, the relevance of the experiments that have been conducted in this paper has shown the essence of identifying deviating and normal results based on our representations for non-seasonal time series outliers, the Numenta Anomaly Benchmark (NAB), and SIEM SPLUNK machine learning toolkit representations.
In view of the foregoing, it is also worth noting that some of the examples considered in this paper may trigger more alerts that can be recognized by some statistical approaches or machine learning-based approaches as anomalies. This difference from the proposed study is based on the fact that we consider the expectation of values from history based on the moving time window (or several time windows) instead of searching for some predefined patterns of known anomalies.
Nevertheless, it is pertinent to mention that during anomaly detection, machine learning models would be analyzed based on metrics such as performance and accuracy [11,49,50,51,52,53]. For example, ML classifiers could be subjected to anomaly detection problems, which would allow the application of an F1 score [54]. The F1 score is a metric used to measure the performance of a model in a classification task. It is the harmonic mean of precision and recall and takes into account both false positives and false negatives. The F1 score is most commonly used when there is an uneven class distribution in the data and there is a need to focus on both the precision and recall of the model; it ranges from 0 to 1, with 1 being the best possible score. Furthermore, the ROC-AUC [55] metric can be used for evaluation purposes. The ROC-AUC is a metric used to evaluate the performance of a binary classification model; it stands for Receiver Operating Characteristic Area Under Curve and is calculated by plotting the true positive rate against the false positive rate for every possible classifier threshold. The area under the curve represents the model’s accuracy and how well it can distinguish between positive and negative classes: the higher the area under the curve, the better the model is at making this distinction. While the authors of this paper find these metrics to be of great importance, especially for the evaluation of performance and accuracy, the scope of the study presented in this paper was limited to patternless detection. As a result, these metrics are positioned as avenues of future work that will serve as a continuation of this present study.
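As these metrics are positioned as future work, the following is only a generic illustration of how they would be computed with scikit-learn once labeled anomalies are available; the label and score arrays are invented for the example:

```python
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical ground-truth labels (1 = anomaly) and detector outputs.
y_true   = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred   = [0, 0, 1, 0, 0, 0, 1, 1]                      # thresholded decisions
y_scores = [0.1, 0.2, 0.97, 0.3, 0.6, 0.2, 0.92, 0.99]   # anomaly likelihoods

print("F1:", f1_score(y_true, y_pred))
print("ROC-AUC:", roc_auc_score(y_true, y_scores))
```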
Ultimately, the comparative study conducted in this paper has outlined the relevance of the NP-AD approach, where the key limitations of each identified work have unearthed the existing research gaps and have shown the essence of exploring or proposing significant approaches.

8. Conclusions and Future Work

The objective of this paper was to investigate and propose non-pattern anomaly detection in time series, which is important for giving early warnings of the likelihood of potential cybersecurity attacks. As a result, a Non-Pattern Anomaly Detection (NP-AD) approach has been proposed as a step toward detecting behavioral anomalies. An assumption is made in this paper that the behavior is represented by the time series of a single parameter, with the understanding that the actual behavior of any information system or any user in it can be significantly more complex. Furthermore, a comparative analysis has been conducted in which the limitations of each study have been identified.
Consequently, the results from the initial experiments have shown the possibility of detecting anomalies even in the absence of statistical methods, which is where the significance of this work lies. As a result, this work could be extended in the following directions: employing statistical approaches to NP-AD in time series in ambient and complex settings, and studying different types of prediction functions and approaches for self-adjusting the model parameters. Future work will concentrate on comparing machine learning algorithms for NP-AD approaches and comparing different datasets in diversified scenarios, where the focus will be on more traditional anomaly detection methods versus modern techniques in order to show the suitability and relevance of newer propositions. Furthermore, future research will aim to propose high-level solutions for the identified research limitations and map these solutions to the NP-AD approach.

9. Raw Data and Sources

The raw data used in this paper, along with the source code of the proposed program model (utilizing the proposed approach) in Python, are permanently available at the link: https://github.com/vntkach/anomalydetection (accessed on 13 December 2022).

Author Contributions

Conceptualization, V.T.; methodology, V.T.; software, V.T.; validation, V.T.; formal analysis, V.T. and V.R.K.; investigation, V.T.; resources, V.T. and V.R.K.; data curation, V.T. and O.B.; writing—original draft preparation, V.T.; writing—review and editing, V.R.K., V.T.; visualization, V.T. and I.K.; supervision, A.K.; project administration, A.K.; funding acquisition, A.K., O.B. and V.T. All authors have read and agreed to the published version of the manuscript.

Funding

APC was funded by Blekinge Institute of Technology, BTH, Sweden.

Acknowledgments

This paper was published under the CRDF Global Ukraine-USA Grant Agreement with the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute” (Kyiv, Ukraine), and Florida International University (Miami, FL, USA).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 2017, 262, 134–147. [Google Scholar] [CrossRef]
  2. Tan, S.C.; Ting, K.M.; Liu, T.F. Fast anomaly detection for streaming data. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Catalonia, Spain, 16–22 July 2011. [Google Scholar]
  3. Waite, A. InfoSec Triads: Security/Functionality/Ease-of-Use. Available online: https://blog.infosanity.co.uk/?p=676 (accessed on 13 December 2022).
  4. Rainie, L.; Anderson, J.; Connolly, J. Cyber Attacks Likely to Increase; Pew Research Center: Washington, DC, USA, 2014. [Google Scholar]
  5. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 1–58. [Google Scholar] [CrossRef]
  6. Munir, M.; Siddiqui, S.A.; Dengel, A.; Ahmed, S. DeepAnT: A deep learning approach for unsupervised anomaly detection in time series. IEEE Access 2018, 7, 1991–2005. [Google Scholar] [CrossRef]
  7. Wei, L.; Kumar, N.; Lolla, V.N.; Keogh, E.J.; Lonardi, S.; Ratanamahatana, C.A. Assumption-Free Anomaly Detection in Time Series. In Proceedings of the SSDBM, Santa Barbara, CA, USA, 27–29 June 2005; Volume 5, pp. 237–242. [Google Scholar]
  8. Hindy, H.; Brosset, D.; Bayne, E.; Seeam, A.; Bellekens, X. Improving SIEM for critical SCADA water infrastructures using machine learning. In Computer Security; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19. [Google Scholar]
  9. Di Mauro, M.; Di Sarno, C. Improving SIEM capabilities through an enhanced probe for encrypted Skype traffic detection. J. Inf. Secur. Appl. 2018, 38, 85–95. [Google Scholar] [CrossRef]
  10. Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 3009–3017. [Google Scholar]
  11. Alkharabsheh, K.; Alawadi, S.; Kebande, V.R.; Crespo, Y.; Fernández-Delgado, M.; Taboada, J.A. A comparison of machine learning algorithms on design smell detection using balanced and imbalanced dataset: A study of God class. Inf. Softw. Technol. 2022, 143, 106736. [Google Scholar] [CrossRef]
  12. Blázquez-García, A.; Conde, A.; Mori, U.; Lozano, J.A. A review on outlier/anomaly detection in time series data. ACM Comput. Surv. 2021, 54, 1–33. [Google Scholar] [CrossRef]
  13. Begum, N.; Keogh, E. Rare Pattern Discovery from Time Series. In Proceedings of the Int’l Conference on Very Large Databases (VLDB), Kohala Coast, HI, USA, 31 August–4 September 2015. [Google Scholar]
  14. Donald, S.D.; McMillen, R.V.; Ford, D.K.; McEachen, J.C. Therminator 2: A thermodynamics-based method for real-time patternless intrusion detection. In Proceedings of the MILCOM 2002, Anaheim, CA, USA, 7–10 October 2002; IEEE: Piscataway, NJ, USA, 2002; Volume 2, pp. 1498–1502. [Google Scholar]
  15. Donald, S.D.; McMillen, R.V.; Ford, D.K.; McEachen, J.C. Modeling Network Conversation Flux for Patternless Intrusion Detection. Available online: https://scholar.google.com.hk/scholar?hl=zh-CN&as_sdt=0%2C5&q=Modeling+network+conversation+flux+for+patternless+intrusion++detection&btnG= (accessed on 13 December 2022).
  16. Dobashi, K.; Ho, C.P.; Fulford, C.P.; Lin, M.F.G.; Higa, C. Learning pattern classification using moodle logs and the visualization of browsing processes by time-series cross-section. Comput. Educ. Artif. Intell. 2022, 3, 100105. [Google Scholar] [CrossRef]
  17. Bollmann, C.A.; Tummala, M.; McEachen, J.C. Resilient real-time network anomaly detection using novel non-parametric statistical tests. Comput. Secur. 2021, 102, 102146. [Google Scholar] [CrossRef]
  18. Olsavsky, V.L. Implementing a Patternless Intrusion Detection System; A Methodology for Zippo; Technical Report; Naval Postgraduate School: Monterey, CA, USA, 2005. [Google Scholar]
  19. Teng, M. Anomaly detection on time series. In Proceedings of the 2010 IEEE International Conference on Progress in Informatics and Computing, Shanghai, China, 10–12 December 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 1, pp. 603–608. [Google Scholar]
  20. Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long short term memory networks for anomaly detection in time series. In Proceedings of the ESANN, Bruges, Belgium, 22–23 April 2015; Volume 89, pp. 89–94. [Google Scholar]
  21. Basu, S.; Meckesheimer, M. Automatic outlier detection for time series: An application to sensor data. Knowl. Inf. Syst. 2007, 11, 137–154. [Google Scholar] [CrossRef]
  22. Chuah, M.C.; Fu, F. ECG anomaly detection via time series analysis. In Proceedings of the International Symposium on Parallel and Distributed Processing and Applications, Niagara Falls, Canada, 29–31 August 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 123–135. [Google Scholar]
  23. Williams, C. Research methods. J. Bus. Econ. Res. 2007, 5, 65–72. [Google Scholar] [CrossRef]
  24. Patten, M.L. Understanding Research Methods: An Overview of the Essentials; Routledge: Abingdon, UK, 2017. [Google Scholar]
  25. McNeill, P. Research Methods; Routledge: Abingdon, UK, 2006. [Google Scholar]
  26. Hawkins, D.M. Identification of Outliers; Springer: Berlin/Heidelberg, Germany, 1980; Volume 11. [Google Scholar]
27. Barnett, V.; Lewis, T. Outliers in Statistical Data; Wiley Series in Probability and Mathematical Statistics; Wiley: New York, NY, USA, 1984. [Google Scholar]
  28. Ahmed, M.; Mahmood, A.N.; Hu, J. A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 2016, 60, 19–31. [Google Scholar] [CrossRef]
  29. Ahmed, M.; Mahmood, A.N. Novel approach for network traffic pattern analysis using clustering-based collective anomaly detection. Ann. Data Sci. 2015, 2, 111–130. [Google Scholar] [CrossRef]
  30. Zimek, A.; Schubert, E.; Kriegel, H.P. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Mining ASA Data Sci. J. 2012, 5, 363–387. [Google Scholar] [CrossRef]
  31. Pimentel, M.A.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A review of novelty detection. Signal Process. 2014, 99, 215–249. [Google Scholar] [CrossRef]
  32. Markou, M.; Singh, S. Novelty detection: A review—Part 2: Neural network based approaches. Signal Process. 2003, 83, 2499–2521. [Google Scholar] [CrossRef]
  33. González-Granadillo, G.; González-Zarzosa, S.; Diaz, R. Security information and event management (SIEM): Analysis, trends, and usage in critical infrastructures. Sensors 2021, 21, 4759. [Google Scholar] [CrossRef]
  34. Carasso, D. Exploring Splunk; CITO Research: New York, NY, USA, 2012. [Google Scholar]
  35. Fedorov, M.; Adams, P.; Brunton, G.; Fishler, B.; Flegel, M.; Wilhelmsen, K.; Wilson, R. Leveraging Splunk for Control System Monitoring and Management; Technical Report; Lawrence Livermore National Lab. (LLNL): Livermore, CA, USA, 2017. [Google Scholar]
  36. Sigman, B.P.; Delgado, E. Splunk Essentials; Packt Publishing Ltd.: Birmingham, UK, 2016. [Google Scholar]
  37. Parzen, E. An approach to time series analysis. Ann. Math. Stat. 1961, 32, 951–989. [Google Scholar] [CrossRef]
  38. Cryer, J.D. Time Series Analysis; Springer: Berlin/Heidelberg, Germany, 1986; Volume 286. [Google Scholar]
  39. Gladyshev, P.; Patel, A. Finite state machine approach to digital event reconstruction. Digit. Investig. 2004, 1, 130–149. [Google Scholar] [CrossRef]
  40. Kebande, V.R.; Choo, K.K.R. Finite state machine for cloud forensic readiness as a service (CFRaaS) events. Secur. Priv. 2022, 5, e182. [Google Scholar] [CrossRef]
  41. Pan, J.X.; Fang, K.T. Maximum likelihood estimation. In Growth Curve Models and Statistical Diagnostics; Springer: Berlin/Heidelberg, Germany, 2002; pp. 77–158. [Google Scholar]
  42. Aue, A.; Norinho, D.D.; Hörmann, S. On the prediction of functional time series. arXiv 2012, arXiv:1208.2892. [Google Scholar]
43. Bercu, S.; Proïa, F. A SARIMAX coupled modelling applied to individual load curves intraday forecasting. J. Appl. Stat. 2013, 40, 1333–1348. [Google Scholar] [CrossRef]
  44. Vagropoulos, S.I.; Chouliaras, G.; Kardakos, E.G.; Simoglou, C.K.; Bakirtzis, A.G. Comparison of SARIMAX, SARIMA, modified SARIMA and ANN-based models for short-term PV generation forecasting. In Proceedings of the 2016 IEEE International Energy Conference (ENERGYCON), Leuven, Belgium, 4–8 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar]
  45. Tarsitano, A.; Amerise, I.L. Short-term load forecasting using a two-stage sarimax model. Energy 2017, 133, 108–114. [Google Scholar] [CrossRef]
  46. Choi, T.M.; Yu, Y.; Au, K.F. A hybrid SARIMA wavelet transform method for sales forecasting. Decis. Support Syst. 2011, 51, 130–140. [Google Scholar] [CrossRef]
  47. Molan, M.; Borghesi, A.; Cesarini, D.; Benini, L.; Bartolini, A. RUAD: Unsupervised anomaly detection in HPC systems. Future Gener. Comput. Syst. 2023, 141, 542–554. [Google Scholar] [CrossRef]
  48. Venkataramanan, S.; Peng, K.C.; Singh, R.V.; Mahalanobis, A. Attention guided anomaly localization in images. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 485–503. [Google Scholar]
  49. Kebande, V.R.; Alawadi, S.; Awaysheh, F.M.; Persson, J.A. Active machine learning adversarial attack detection in the user feedback process. IEEE Access 2021, 9, 36908–36923. [Google Scholar] [CrossRef]
  50. Shin, Y.; Kim, K. Comparison of anomaly detection accuracy of host-based intrusion detection systems based on different machine learning algorithms. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 252–259. [Google Scholar] [CrossRef]
  51. Park, S.; Choi, J.Y. Hierarchical anomaly detection model for in-vehicle networks using machine learning algorithms. Sensors 2020, 20, 3934. [Google Scholar] [CrossRef] [PubMed]
  52. Escalante, H.J. A comparison of outlier detection algorithms for machine learning. In Proceedings of the International Conference on Communications in Computing, Las Vegas, NV, USA, 27–30 June 2005; pp. 228–237. [Google Scholar]
  53. Nawir, M.; Amir, A.; Lynn, O.B.; Yaakob, N.; Ahmad, R.B. Performances of machine learning algorithms for binary classification of network anomaly detection system. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2018; Volume 1018, p. 012015. [Google Scholar]
  54. Lipton, Z.C.; Elkan, C.; Narayanaswamy, B. Thresholding classifiers to maximize F1 score. arXiv 2014, arXiv:1402.1892. [Google Scholar]
55. Narkhede, S. Understanding AUC-ROC curve. Towards Data Sci. 2018, 26, 220–227. [Google Scholar]
Figure 1. Research methodology approaches.
Figure 2. A high-level view of the proposed NP-AD.
Figure 3. An FSM with states and transitions.
Figure 4. Logistic curve representing the function p_t = logistic(·).
Figure 5. Logistic function normalized for δx by its average value over the last N points in a sliding window; left to right: {0.25, 0.5, 1, 1.5, 2}.
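The anomaly-likelihood score illustrated in Figures 4 and 5 can be written down in a few lines of Python. The snippet below is a minimal sketch rather than the authors' implementation: it assumes each forecast error δx is normalized by the average absolute error over the last N points of a sliding window before being passed through a logistic curve, and the function names, window size, and steepness parameter are placeholders.

```python
import math
from collections import deque


def logistic(x: float) -> float:
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))


def anomaly_likelihood(errors, window: int = 50, steepness: float = 1.0):
    """Yield a likelihood score in (0, 1) for each forecast error.

    Each error is scaled by the mean absolute error of the preceding
    `window` points before the logistic curve is applied, so the same
    absolute deviation scores higher in a calm series than in a noisy one.
    """
    recent = deque(maxlen=window)
    for delta_x in errors:
        baseline = sum(abs(e) for e in recent) / len(recent) if recent else 1.0
        yield logistic(steepness * abs(delta_x) / max(baseline, 1e-9))
        recent.append(delta_x)
```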
Figure 6. Artificially generated time series with outliers to demonstrate the approach: a simple series whose first outlier is the first non-zero value. Hereinafter, anomalies are marked with red dots.
Figure 7. Artificially generated non-seasonal time series with outliers to demonstrate the approach.
Figure 8. Non-seasonal time-series with outliers. Anomalies were found with an anomaly likelihood threshold of 0.9.
Figure 9. Non-seasonal time-series with outliers. Anomalies were found with an anomaly likelihood threshold of 0.95.
Figure 10. Numenta Anomaly Benchmark dataset: spike-density anomaly found with an anomaly likelihood threshold of 0.9.
Figure 11. Numenta Anomaly Benchmark: spike-density anomaly found with an anomaly likelihood threshold of 0.98.
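Figures 8–11 mark a point as anomalous once its likelihood exceeds a fixed threshold (0.9, 0.95, or 0.98). The sketch below shows this thresholding step under explicit assumptions: it expects a NAB-style CSV with timestamp and value columns, uses one-step differences as a crude stand-in for forecast errors, and reuses the anomaly_likelihood helper from the previous sketch; it is not the authors' detection pipeline.

```python
import pandas as pd


def flag_anomalies(csv_path: str, threshold: float = 0.9) -> pd.DataFrame:
    """Label points whose anomaly likelihood exceeds the given threshold."""
    df = pd.read_csv(csv_path, parse_dates=["timestamp"])
    # One-step differences stand in for forecast errors in this sketch.
    errors = df["value"].diff().fillna(0.0).tolist()
    df["likelihood"] = list(anomaly_likelihood(errors))
    df["anomaly"] = df["likelihood"] > threshold
    return df


# Example usage with one of the NAB artificial files referenced below:
# flagged = flag_anomalies("art_noisy.csv", threshold=0.95)
# print(flagged[flagged["anomaly"]])
```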
Figure 12. SIEM SPLUNK® Machine Learning Toolkit example time series without a seasonal component.
Figure 13. Numenta Anomaly Benchmark time series with a seasonal component.
Figure 14. Numenta Anomaly Benchmark time series with zero-order trend and several spikes.
Figure 15. SIEM SPLUNK Machine Learning Toolkit, another example time series without a seasonal component.
Figure 16. Numenta Anomaly Benchmark time series with no anomalies: highly noisy data (art_noisy.csv).
Figure 17. Numenta Anomaly Benchmark time series with no anomalies: artificial repeating signal (art_daily_perfect_square_wave.csv).
Figure 18. Numenta Anomaly Benchmark time series with both point and context anomalies. Anomalies were found with an anomaly likelihood threshold of 0.9.
Figure 19. Numenta Anomaly Benchmark time series with hidden context anomalies within a set of repeating point anomalies. Anomalies were found with an anomaly likelihood threshold of 0.9.
Figure 20. Numenta Anomaly Benchmark time series with anomalies in the seasonal component.
Figure 21. Numenta Anomaly Benchmark time series with anomalies in a seasonal component.
Table 1. Tasks mainly accomplished by a SIEM system.
No | Task | Description
1 | Gathering | Collecting, processing, and analyzing security events that come into the system from diverse sources
2 | Detection | Real-time detection of attacks and violations of security criteria and policies
3 | Assessment | Prompt assessment of the security of information, telecommunications, and other critical resources
4 | Risk | Security risk analysis and management
5 | Investigation | Conducting investigations into incidents
6 | Security | Making effective decisions to protect information
7 | Reporting | Producing reporting documents
Table 2. General FSM tuple representation.
No | Tuple | Representation
1 | Q | Finite nonempty set of states
2 | Σ | Finite set of input symbols
3 | δ | State transition function
4 | q0 ∈ Q | The initial state
5 | F ⊆ Q | Set of end states
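The five-tuple of Table 2 maps directly onto code. The following is a minimal, self-contained sketch with placeholder states and input symbols (the event labels are hypothetical), not the FSM used for the SIEM mapping in Table 3.

```python
from dataclasses import dataclass


@dataclass
class FSM:
    states: set      # Q: finite nonempty set of states
    alphabet: set    # Σ: finite set of input symbols
    delta: dict      # δ: transition function, (state, symbol) -> state
    start: str       # q0 ∈ Q: the initial state
    finals: set      # F ⊆ Q: set of end (accepting) states

    def run(self, symbols) -> bool:
        """Return True if the input sequence ends in an accepting state."""
        state = self.start
        for symbol in symbols:
            state = self.delta.get((state, symbol))
            if state is None:  # undefined transition: reject
                return False
        return state in self.finals


# Toy example: accept sequences that end right after an "alert" event.
fsm = FSM(
    states={"normal", "suspicious"},
    alphabet={"event", "alert"},
    delta={
        ("normal", "event"): "normal",
        ("normal", "alert"): "suspicious",
        ("suspicious", "event"): "normal",
        ("suspicious", "alert"): "suspicious",
    },
    start="normal",
    finals={"suspicious"},
)
print(fsm.run(["event", "alert"]))  # True
```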
Table 3. Mapping FSM with anomaly detection approaches.
Tuple | SIEM Representation
Q | Number of events in the SIEM system
Σ | Number of features in SIEM logs
δ | Number of functions in the SIEM system
q0 ∈ Q | Subset of the flow of data from SIEM systems
F ⊆ Q | Subset of the flow of data from SIEM systems
Table 4. Selected relevant studies 1 comparatively mapped to the NP-AD in time-series study.
Ref. | Focus | Approach | Limitations
[12] | Anomaly detection in time-series data | Focuses on the unsupervised setting and a taxonomy for outlier detection | Not inclined toward patternless detection
[13] | Rare pattern discovery in time-series | Uses repeated sub-sequences in time-series | Does not show utility for wide-ranging real-life datasets
[14,15] | Real-time patternless IDS | Thermodynamics-based, non-signature/patternless IDS | Hardly focused on a time-series approach
[16] | Learning patterns over time-series | Pattern classification using logs and processes over time-series | Largely a generic approach to learning normal patterns
[17] | Resilient real-time anomaly detection | Uses non-parametric statistical tests | Works only on single features and is not inclined toward patternless detection over time series
[18] | Patternless IDS | Leverages Zippo | A time-series approach is hardly observed
[19] | Anomaly detection in time series | Detects abnormalities in time series based on distance computation | The accuracy and efficiency of the proposed algorithm still need to be verified
[20] | Anomaly detection with LSTM networks | Trains an LSTM network on non-anomalous data modeled as a Gaussian distribution | It is not known whether normal behaviors include long-term dependencies
[6] | DeepAnT unsupervised anomaly detection | Leverages deep learning for unsupervised anomaly detection in time series | Robustness to adversarial approaches is uncertain, and the impact on time-series forecasting needs to be evaluated
[21] | Automatic outlier detection in time-series | Detects unusual values in time-series where the data are hard to model | Knowledge about the signal is required
[22] | ECG anomaly detection in time series | Leverages Zippo | A time-series approach is hardly observed
1 Only studies aligned with NP-AD in time-series are included.