An Effective Method for Detecting Unknown Types of Attacks Based on Log-Cosh Variational Autoencoder

Yu, Li; Xu, Liuquan; Jiang, Xuefeng

doi:10.3390/app132212492

by Li Yu *, Liuquan Xu and Xuefeng Jiang

School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(22), 12492; https://0-doi-org.brum.beds.ac.uk/10.3390/app132212492

Submission received: 20 October 2023 / Revised: 13 November 2023 / Accepted: 14 November 2023 / Published: 19 November 2023

(This article belongs to the Special Issue Network Intrusion Detection and Attack Identification)

Abstract:

The increasing prevalence of unknown-type attacks on the Internet highlights the importance of developing efficient intrusion detection systems. While machine learning-based techniques can detect unknown types of attacks, the need for innovative approaches becomes evident, as traditional methods may not be sufficient. In this research, we propose a deep learning-based solution called the log-cosh variational autoencoder (LVAE) to address this challenge. The LVAE inherits the strong modeling abilities of the variational autoencoder (VAE), enabling it to understand complex data distributions and generate reconstructed data. To better simulate discrete features of real attacks and generate unknown types of attacks, we introduce an effective reconstruction loss term utilizing the logarithmic hyperbolic cosine (log-cosh) function in the LVAE. Compared to conventional VAEs, the LVAE shows promising potential in generating data that closely resemble unknown attacks, which is a critical capability for improving the detection rate of unknown attacks. In order to classify the generated unknown data, we employed eight feature extraction and classification techniques. Numerous experiments were conducted using the latest CICIDS2017 dataset, training with varying amounts of real and unknown-type attacks. Our optimal experimental results surpassed several state-of-the-art techniques, achieving accuracy and average F1 scores of 99.89% and 99.83%, respectively. The suggested LVAE strategy also demonstrated outstanding performance in generating unknown attack data. Overall, our work establishes a solid foundation for accurately and efficiently identifying unknown types of attacks, contributing to the advancement of intrusion detection techniques.

Keywords:

intrusion detection; variational autoencoder; deep learning; attack of unknown type

1. Introduction

1.1. Research Background

The rapid expansion of the Internet has permeated every aspect of contemporary life, resulting in the generation of enormous volumes of sensitive data. Unfortunately, this abundance of data has also provided hackers with more opportunities to exploit vulnerabilities. As the Internet continues to evolve, the types of attacks associated with it also evolve, leading to the emergence of increasingly sophisticated and unknown attack methods. These unknown attacks on network traffic not only consume valuable network resources but also have detrimental impacts on the functionality of hosts and devices. Additionally, they pose a serious concern by breaching the security and confidentiality of network users’ private information, potentially endangering national and social security [1]. Intrusion detection systems (IDSs) play vital roles in network security as they monitor network traffic and identify unusual user behavior. When an IDS detects deviations such as unusual conduct or non-traditional data transmission techniques, it immediately generates alerts and notifies the appropriate staff to take necessary action [2]. Generally speaking, there are two types of network-based intrusion detection systems: anomaly-based and signature-based intrusion detection systems. Network traffic patterns are compared to known attack signatures or features using signature-based intrusion detection systems to identify attacks. These methods, however, are unable to identify novel attack variations, unknown attacks, or attacks from related families. By contrast, anomaly-based intrusion detection systems (IDSs) identify and label aberrant communications when they detect deviations from a model of typical user behavior [3,4]. While anomaly-based IDSs can identify unknown or zero-day attacks, accurately capturing the ever-changing behaviors of network users can be challenging [5]. 
Efforts to develop intrusion detection systems that are capable of accurately identifying previously unidentified attack types have gained traction and are considered a hot topic in the research community.

Many successful approaches have been developed in the past decade, most of which utilize predetermined rules for classification. These techniques rely on complex machine learning algorithms, such as Support Vector Machines (SVMs) [6] and Random Forests (RFs) [7], to distinguish different attack classes. However, traditional machine learning techniques often struggle to effectively learn high-dimensional data characteristics and are more inclined to focus on low-dimensional data features. Conventional machine learning techniques have several shortcomings: (a) They rely heavily on predetermined traffic features or properties, so they are best at detecting known attacks and poor at identifying unknown types of attacks [8]. (b) Traditional intrusion detection algorithms lack the scalability and adaptability needed to identify unknown types of attacks in dynamic network architectures [9]. (c) These methods rely on labeled data for training, which can be computationally expensive and susceptible to manipulation through artificial data, resulting in a decline in overall performance [9]. Deep learning techniques show great promise in effectively extracting significant features from large amounts of high-dimensional data [10,11,12,13], addressing the challenge of constantly evolving network patterns. For example, refs. [14,15] each propose lightweight approaches that can still defend against multiple security attacks. However, the accuracy of identifying unknown types of attacks is often compromised due to variations between pre-training traffic samples and real network traffic data, such as differences in the network packet sizes and communication protocols [16].

1.2. Related Work

In recent years, numerous innovative approaches have emerged to address the challenge of detecting unknown types of attacks [17]. Singh, A. presented an edge-based hybrid intrusion detection system that integrated three different categorization techniques to efficiently identify novel and unknown attack types. Their experimental results demonstrated a remarkable 93% decrease in false alarms, significantly improving the overall detection rate for unknown attacks [18]. Similarly, Zoppi, T. conducted a thorough analysis using 47 distinct algorithms to identify unidentified attack types across 11 datasets. Notably, their experimental results showcased the superiority of the meta-learning strategy in detecting unknown attacks compared to existing methods [19].

In [20], a comprehensive analysis of active learning techniques is provided, focusing on the use of k-nearest neighbor techniques in conjunction with deep neural networks to facilitate the adaptive incremental detection of unknown attacks. Additionally, Soltani, M. proposed a unique approach that combines deep models and clustering techniques to effectively identify zero-day threats [21]. Mahdavi, E. presented a method that combines incremental learning and transfer learning to detect unknown attacks, with encouraging results on the KDD99 and CICIDS2017 datasets [22]. Mananayaka, A.K. employed a combination of four machine learning techniques for automatic feature selection, achieving excellent F1 scores and accuracy in both datasets [23]. Zhou, X. proposed a hierarchical adversarial attack generation technique along with a hierarchical node selection algorithm to efficiently identify previously unidentified attack types, enhancing the ability to detect unknown threats [24].

In a similar vein, Kumar, V. developed a two-phase intelligent network technique specifically for identifying zero-day threats, achieving impressive accuracy rates of over 90% on CICIDS 2018 and real-time datasets using created signatures [25]. Sarhan, M. suggested a zero-sample learning technique to evaluate the performance of machine learning-based detection systems against unknown threats, providing valuable insights into their abilities to identify and mitigate such threats [26]. Sheng, C. devised a self-growing attack traffic classification system based on density-based heuristic clustering to improve the detection of unknown forms of attacks, enabling real-time automated detection [27]. Hairab, B.I. utilized a convolutional neural network with L1 and L2 regularization algorithms to prevent overfitting and identify unknown attacks [28].

In contrast, Araujo-Filho, P.F.d. successfully detected zero-day attacks without the need for labeled data. They achieved this by combining temporal convolutional networks, self-attention, and generative adversarial networks [29]. Verkerken, M., on the other hand, explored a multi-level hierarchical approach that integrated neural network techniques, autoencoders, random forests, and one-class Support Vector Machines to detect zero-day attacks. This method exhibited an impressive accuracy rate of 96%, surpassing earlier techniques [30]. Sohi, S.M. made a groundbreaking discovery by demonstrating that the utilization of recurrent neural networks aided in generating unknown types of attacks from malware. This innovation led to a remarkable 16.67% improvement in the detection rate [31]. Additionally, a distributed anomaly detection technique was developed by Debicha, I., which utilized a mixed Gaussian distribution based on correntropy to instantly detect zero-day attacks. Positive results were obtained from the experiments conducted on the NSL-KDD and UNSW-NB15 datasets [32]. Debicha, I. further combined several adversarial classifiers using migration learning and leveraged their individual judgments to identify attacks [33].

A comprehensive overview of machine learning-based techniques that have been extensively researched over the past decade and have demonstrated remarkable results is provided in [34]. The authors of [35] proposed a three-layer design using machine learning approaches for tasks related to preprocessing, binary classification, and multi-class classification. Sabeel, U. summarized the current developments in deep learning techniques for identifying unknown attacks, highlighting several strategies that have exhibited exceptional performance [36]. Rani, S.V.J. presented a revolutionary approach that combines deep hierarchical neural networks with machine learning, achieving an astounding accuracy of 99.07% [37]. Furthermore, the authors of [38] reported a detection technique based on a convolutional neural network and meta-learner, utilizing a comprehensive dataset created by merging five distinct datasets. The experimental findings demonstrate the method’s adaptability. Shin, G.Y. suggested a novel method that enhances the accuracy for all types of attacks on the NSL-KDD dataset by training a fuzzy c-mean eigen analysis model at decision boundary points [39]. Moreover, Lan, J. introduced an unsupervised domain adaptation technique and a hierarchical attention triple network, both of which accurately identify previously unknown attacks, as confirmed by experimental findings [40]. Zavrak, S. proposed a novel method that combines an autoencoder and a variational autoencoder to efficiently detect unknown threats based on stream characteristics. This method outperforms one-class Support Vector Machines and standard autoencoders, making it valuable in identifying unknown attacks [41].

The authors of [42] examined two deep generative techniques in their investigation: an adversarial autoencoder with conditional denoising and an autoencoder combined with the k-nearest neighbor algorithm. Their aim was to develop an intrusion detection system with robust detection capabilities for unknown threats. To assess the performance of these approaches, the authors conducted experiments on four datasets. The results clearly demonstrated the potential of the proposed approach in enhancing the resilience of intrusion detection systems. Furthermore, Long, C. presented an approach that combines autoencoders by first selecting the best subset of features using feature selection. The subsequent step involves integrating multiple autoencoders to identify unknown threats. The experimental findings showcase the method’s robustness and effectiveness in detecting unknown attacks [43].

Some effective solutions have appeared for the better detection of unknown attacks [44,45,46,47,48,49,50,51,52]. These methods use correlated autoencoders, variational autoencoders, and conditional variational autoencoders and combine them with different methods. The experiments have yielded good results.

To address the challenges, we developed a unique intrusion detection system (IDS) called the LVAE. The LVAE effectively identifies unknown types of attacks by integrating a logarithmic hyperbolic cosine (log-cosh) reconstruction loss function. This function optimizes the potential space between the input data and reconstructed data, surpassing conventional variational autoencoders (VAEs). Consequently, the LVAE significantly enhances the quality of generated unknown attacks. We employed eight methods to extract the features and employed multiple techniques to identify unknown attacks before selecting the most accurate approach. Through extensive testing and comparison, we demonstrate that the LVAE reliably detects unknown types of attacks, ensuring robust and efficient network security.

1.3. The Contribution of the Work in This Paper

The contribution of the work in this paper can be summarized as follows:

We introduce a novel approach called the LVAE to identify unknown attacks. We developed this approach by incorporating an effective reconstruction loss term that utilizes the logarithmic hyperbolic cosine (log-cosh) function. This function accurately captures the intricate distribution of actual attack data, enabling the simulation of discrete features to model and enhance the detection of novel, unknown attacks.
We employ eight different techniques for feature extraction and data classification. Throughout the experimentation process, we select the most accurate approach to ensure exceptional performance in identifying unknown attacks.
Our model is trained using the latest CICIDS 2017 dataset, which includes a variety of real and unknown-type attacks. In comparison to several state-of-the-art methods, our LVAE approach demonstrates superior performance, surpassing recent advancements and significantly improving the detection rate of unknown attacks.
Our method exhibits fast convergence and effectively minimizes the loss function within a short learning epoch.

2. Materials and Methods

Figure 1 illustrates the proposed framework of the study. The LVAE approach is divided into two main components: a generator and a classifier. The generator is responsible for creating novel or unknown attacks, while the classifier combines eight distinct techniques for feature extraction and data categorization.

The former approach involves reconstructing the input dataset, thereby generating data for novel or unknown types of attacks. On the other hand, the latter approach focuses on data classification and feature extraction. The generator trains to efficiently generate novel attacks or unknown types of attacks. The original samples are then combined with the generated samples of unknown attacks and fed into the integrated classifier for training. These two parts are trained independently of each other. After evaluating the effectiveness of several approaches, the classifier selects the top-performing classifier to output the outcomes.

Figure 1. The overall structure of the method proposed in this paper.

2.1. Log-Cosh Variational Auto Encoder

This section explains the variational autoencoder (VAE) and its role in fitting the input data using a multidimensional Gaussian distribution. The encoder network (q_{ϕ} (z | x)) is the first part of the VAE, with model parameters (ϕ). It maps the input, X, to the feature variable, Z, effectively compressing the input data into a low-dimensional latent space. The decoder network (p_{θ} (x | z)) is the second part, with model parameters (θ). Its task is to map the feature variable, Z, back to the data, \bar{X}, by reconstructing from Z. By using the VAE approach, the decoder can generate new data by reconstructing from the latent feature variable, Z. Equation (1) summarizes the formulation of the conventional VAE loss function.

L_{VAE} = E [\log p (X | Z)] - D_{KL} [q (Z | X) \| p (Z)]

(1)

There are two fundamental parts of the VAE loss function in Equation (1). The first is the logarithmic reconstruction term, E [\log p (X | Z)], the expected log-likelihood of the data given the latent variable. Its goal is to minimize the squared L_{2} loss between the reconstructed data \bar{X} and the input data X. The second part is the Kullback–Leibler (KL) divergence, D_{KL} [q (Z | X) \| p (Z)], which minimizes the difference between the learned distribution q (Z | X) and the predefined distribution p (Z). This facilitates efficient learning of the input data’s latent space representation using the VAE model. By integrating these two elements, the VAE loss function encourages the learned distribution to match the predefined distribution while optimizing the reconstruction accuracy. This dual goal allows the VAE model to provide excellent reconstructions and capture the underlying structure of the input data.

The squared L_{2} loss function, represented by Equation (2), plays a crucial role in the VAE framework. It seeks to minimize the difference between the input vector, X, and its corresponding reconstructed vector, \bar{X}, ensuring the accurate reconstruction of the original data. The ith sample in X is denoted by x_{i}, while the ith sample in \bar{X} is denoted by {\bar{x}}_{i}.

L_{reconstruction} = {\sqrt{\sum_{i = 1}^{n} {(x_{i} - {\bar{x}}_{i})}^{2}}}^{2} = \sum_{i = 1}^{n} | x_{i} - {\bar{x}}_{i} |^{2} = L_{2}^{2}

(2)

The primary purpose of the conventional VAE is data generation; nonetheless, it has significant limitations. Reconstructing data using a low-dimensional latent spatial feature variable, Z, is one such difficulty. Dealing with intrusion detection datasets, like the CICIDS2017 dataset, which frequently contains discrete and high-dimensional data, makes this extremely difficult. As a result, the efficiency of the created data is decreased since there is typically a large reconstruction loss between the generated and input data.

To address this problem, our goal is to balance the weights given to the reconstructed data and the latent space in Equation (1). However, the estimated reconstruction loss is usually very small, resulting in a limited error margin and a negligible penalty for the reconstruction loss component. Consequently, the KL divergence, represented by the second part of Equation (1), becomes the dominant part of the loss function.

An efficient way to address this issue and take different loss regions into consideration is to weight the reconstruction loss more heavily when the error is small and to keep the penalty from growing too quickly when the error is large. Therefore, we use the log-cosh function in place of the logarithmic reconstruction loss term in Equation (1) (i.e., the first portion). The log-cosh function is approximately quadratic near the origin and approximately linear for large arguments, so the reconstruction loss is concentrated closer to the origin and excessive penalization is avoided when the reconstruction error is high. Equation (3) gives the formula for the logarithmic hyperbolic cosine function.

f (x) = \log (\cosh (x)) = \log \frac{e^{x} + e^{- x}}{2}

(3)

Applying this function to the element-wise reconstruction error gives the log-cosh loss function shown in Equation (4), where X and \bar{X} are the input vector and reconstruction vector, respectively.

L_{\log - \cosh} = \sum_{i = 1}^{n} \log (\cosh (X_{i} - {\bar{X}}_{i})) = \sum_{i = 1}^{n} \log \frac{e^{X_{i} - {\bar{X}}_{i}} + e^{- (X_{i} - {\bar{X}}_{i})}}{2}

(4)
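As an illustration of Equation (4), the following is a minimal sketch in plain Python. The function names `log_cosh` and `log_cosh_loss` are hypothetical and not taken from the paper. The sketch assumes the identity log(cosh(x)) = |x| + log(1 + e^{-2|x|}) - log 2, which gives the same value as Equation (3) but avoids overflow for large errors.

```python
import math


def log_cosh(x: float) -> float:
    """Numerically stable log(cosh(x)).

    Uses log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2),
    so exp() is never called with a large positive argument.
    """
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_cosh_loss(x, x_rec):
    """Equation (4): sum of log(cosh(x_i - x_rec_i)) over all features."""
    return sum(log_cosh(xi - ri) for xi, ri in zip(x, x_rec))
```

For small errors the value is close to half the squared error, and for large errors it approaches the absolute error minus log 2, which matches the two loss regions discussed in the text.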

In our LVAE technique, the log-cosh loss replaces the logarithmic reconstruction loss term (the first part) of Equation (1). Equation (5) gives the resulting loss function.

Loss = \sum_{i = 1}^{n} \log (\cosh (X_{i} - {\bar{X}}_{i})) - D_{KL} [q (Z | X) \| p (Z)]

(5)
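The two terms of Equation (5) can be sketched for a single sample as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a diagonal Gaussian encoder with outputs `mu` and `log_var`, a standard normal prior p(Z), the usual closed-form KL divergence for that case, and that the quantity minimized during training is the reconstruction term plus the KL term. Equation (5) is written with the sign convention inherited from Equation (1). The function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL[N(mu, diag(exp(log_var))) || N(0, I)]."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def lvae_loss(x, x_rec, mu, log_var):
    """Log-cosh reconstruction term plus KL regularizer for one sample.

    Assumption: both terms are added and minimized together.
    Inputs are expected to be normalized, so cosh() stays in a safe range.
    """
    reconstruction = sum(math.log(math.cosh(xi - ri)) for xi, ri in zip(x, x_rec))
    return reconstruction + kl_to_standard_normal(mu, log_var)
```

With a perfect reconstruction and an encoder output equal to the prior (zero mean, zero log-variance), both terms vanish and the loss is zero.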

In order to generate novel and unknown types of attack data, we introduce an enhanced VAE model that uses log-cosh as the reconstruction loss function. The generated data are then used to evaluate the model’s ability to detect unknown attacks. Unlike conventional VAEs, we use Equation (5) as the loss function for the data generation phase. The input data are first mapped to the latent space feature variable, Z, using the encoder, which is then fed into the trained decoder to reconstruct the data. Finally, the classification stage is employed to classify the newly generated unknown types of attack data. Algorithm 1 outlines the training process for the LVAE method to generate new or unknown types of attack data, where x_{i} represents the ith sample of the input dataset, and {\bar{x}}_{i} represents the ith sample of the output dataset.

Algorithm 1 Train LVAE to generate new or unknown types of attack data

Input: dataset X.

Output: new or unknown type of attack dataset \bar{X}.

1: Data preprocessing: removing redundant information, filling in missing values and normalization.

2: Iteration

3: for number of epochs learned do

4: for mini-batch quantities do

5: The data x_{i} is input to the encoder to obtain the feature variable Z.

6: The feature variable Z is input to the decoder to obtain the reconstructed data {\bar{x}}_{i}.

7: Backpropagation calculates the loss values and gradients for Equation (5).

8: Gradient descent.

9: end for

10: end for

11: Until Equation (5) converges.

12: Output new or unknown type of attack dataset \bar{X}.

2.2. Classification Stage

To guarantee optimal performance, we preprocess the data before training the integrated classifier. During this preprocessing phase, the data are normalized, the superfluous information is eliminated, and the missing values are filled in. We start by deleting any duplicate data from the dataset. We then ensure that the data are complete for further analysis by filling any missing values with 0.

We employ Min–Max normalization to address the issue of features with a large range of values dominating the effect. This normalization method scales each feature to a value between 0 and 1, ensuring that every feature contributes proportionately to the overall classification process. The ith characteristic is denoted by x_{i}, as indicated in Equation (6).

x_{i} = \frac{x_{i} - {(x_{i})}_{\min}}{{(x_{i})}_{\max} - {(x_{i})}_{\min}}

(6)
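Equation (6) can be sketched for a single feature column as follows. The function name `min_max_scale` is hypothetical, and the handling of a constant column (where the denominator would be zero) is an assumption, since the text does not say how that case is treated.

```python
def min_max_scale(column):
    """Scale one feature column to [0, 1] following Equation (6)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

For example, the column `[2.0, 4.0, 6.0]` is mapped to `[0.0, 0.5, 1.0]`.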

The preprocessed data are used to train the integrated classifier after the data preprocessing step. Equation (7) defines the loss function for the classification stage. In this equation, the true value of the ith sample is denoted by y_{i}, the predicted value of the ith sample is denoted by h_{θ} {(x)}_{i}, and the number of samples is represented by n.

L_{Classification} = - \frac{1}{n} \sum_{i = 1}^{n} [y_{i} \log h_{θ} {(x)}_{i} + (1 - y_{i}) \log (1 - h_{θ} {(x)}_{i})]

(7)
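A minimal sketch of the classification loss in Equation (7) is given below. The function name and the `eps` clipping parameter are assumptions added here so that log(0) is never evaluated. They are not part of the paper's description.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over n samples, as in Equation (7).

    y_true holds labels in {0, 1}; y_pred holds predicted probabilities.
    Predictions are clipped to [eps, 1 - eps] as a numerical safeguard.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

A classifier that outputs 0.5 for every sample has a loss of log 2, which is a convenient baseline when reading training curves.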

Algorithm 2 describes the training steps for the integrated classifier. The number of classifiers in the integrated classifier is denoted by C. In the reconstructed dataset, {\bar{x}}_{i} represents the ith sample, and h_{θ} {(\bar{x})}_{i} represents the corresponding predicted value. R_{i} (i = 1, 2, 3, 4, 5, 6, 7, 8) represents the classification result for each classifier, while R_{i} represents the best classification result. The training proceeds as follows. The reconstructed dataset is first fed into the integrated classifier, which undergoes data preprocessing operations such as removing the redundant information, filling in the missing values, and normalization. Next, the dataset is fed into each classifier of the integrated classifier, which undergoes forward and backpropagation to minimize the cost function and converge. The prediction results of each classifier are then output, and the integrated classifier selects the best classification result based on these outputs.

Algorithm 2 Training Integrated Classifiers

Input: dataset \bar{X}.

Output: Classification results R_{i}.

1: Data preprocessing: removing redundant information, filling in missing values and normalization.

2: Iteration

3: for c in range(1, 9) do

4: Input data {\bar{x}}_{i} into the classifier to get the predicted value h_{θ} {(\bar{x})}_{i}.

5: Backpropagation calculates Equation (7) losses and gradients.

6: Gradient descent.

7: end for

8: Until Equation (7) converges.

9: Output the classification result for each classifier R_{i} (i = 1, 2, 3, 4, 5, 6, 7, 8).

10: for c in range(1, 9) do

11: if (R_{i} > R_{i} (i = 1, 2, 3, 4, 5, 6, 7, 8)) then

12: Output classification result R_{i}.

13: end if

14: end for

15: Output the classification results R_{i}.
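The selection step at the end of Algorithm 2 (steps 10 to 15) amounts to keeping the classifier with the best result. A minimal sketch is shown below. It assumes each classifier's result has already been reduced to a single score where higher is better. The function name and the example scores are hypothetical and are not results from the paper.

```python
def select_best(scores):
    """Return (name, score) of the classifier with the highest score."""
    if not scores:
        raise ValueError("scores must not be empty")
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical scores, for illustration only (not results from the paper).
example_scores = {"MLP": 0.91, "RF": 0.97, "SVM": 0.88}
best = select_best(example_scores)
```

Which score to compare (accuracy, F1, or another metric) is a design choice that the algorithm leaves open. The sketch works with any scalar metric.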

2.3. Model Structural Details and Detection Process

Our LVAE method consists of two parts: a variational autoencoder part and an integrated classifier part. Figure 2 illustrates the detection process. The variational autoencoder part has an input vector size of 79 and a latent space variable size of 2. Unlike other research works, our input is in the form of data. The encoder and decoder parts utilize fully connected neural networks.

The encoder has an input dimension of 79. Its first hidden layer has 79 nodes and a ReLU activation function, and its second hidden layer also has 79 nodes and uses a sigmoid activation function. The encoder outputs a mean vector and a variance vector, each of dimension 2. A vector drawn from a Gaussian distribution is then combined with these outputs, resulting in the latent space vector, Z.
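Combining a Gaussian vector with the encoder outputs is commonly done with the reparameterization trick, z = μ + σ · ε with ε drawn from a standard normal distribution. The sketch below assumes that this is what is meant here and that the encoder emits the log-variance rather than the variance itself. Both are assumptions, and the function name `sample_latent` is hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng=None):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1).

    Assumption: log_var is the log of the variance, so sigma = exp(0.5 * log_var).
    """
    if rng is None:
        rng = random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


# Example with a 2-dimensional latent space, matching the latent size in the text.
z = sample_latent([0.3, -0.1], [0.0, 0.0], rng=random.Random(0))
```

Writing the sample this way keeps the randomness in ε, so gradients can flow through μ and σ during training.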

The decoder takes the input latent space variable, Z. The first hidden layer is a neural network with 79 nodes and a ReLU activation function. The second hidden layer is also a neural network with 79 nodes and a sigmoid activation function. The output layer is a fully connected neural network with a tanh activation function.

After gradient descent using our designed log-cosh loss function, the reconstructed data are obtained. The reconstructed data and the original data are then preprocessed together, and the preprocessed original data are input into the integrated classifier for training. The reconstructed data are input into the trained model for classification. Now, let us discuss the structure of the integrated classifier.

Multi-Layer Perceptron (MLP): The input dimension is 78, and the hidden layer is a neural network with 80 nodes and a ReLU activation function. The output layer has a dimension of 1 and uses a softmax activation function.

Naive Bayes (NB): A model defined by the scikit-learn library.

Decision Tree (DT): A model defined by the scikit-learn library.

Random Forest (RF): A model defined by the scikit-learn library.

Support Vector Machine (SVM): A model defined by the scikit-learn library.

Logistic Regression (LR): A model defined by the scikit-learn library.

Gradient Boosting (GB): A model defined by the scikit-learn library.

Gated Recurrent Unit (GRU): A total of 80 nodes are used for the hidden layer and a sigmoid activation function is used; the dropout is set to 0.2. The output layer is set to 1, and softmax is used for the activation function.

Figure 2. Model-specific detection process.

3. Experiments

For our experimental setup, we used a personal laptop with an AMD Ryzen 7 6800H CPU running at 3.20 GHz and Windows 10 installed. The laptop was equipped with 16 GB of RAM to meet the computing demands of the study. Additionally, we used the laptop's GeForce RTX 3060 GPU to accelerate the calculations.

The tests were implemented in Python 3.7, with TensorFlow 2.1 as the deep learning framework. These resources provided a reliable and effective environment for our research and evaluation.

3.1. Description of the Dataset

The CICIDS2017 dataset [53], which contains both traditional and cutting-edge attacks, is utilized in this section. Introduced in 2018, the purpose of this dataset is to simulate real network attacks by imitating abstract network activities and injecting them into attack scenarios. The dataset consists of expert-created attack profiles and weekly network activity.

The CICIDS2017 dataset consists of data samples with 80 characteristics, including attributes such as the number of forward packets, minimum flow length, and the maximum time between two flows. It is important to note that the dataset is highly unbalanced in terms of distribution. For example, there are over two million data samples from normal traffic, while there are only 11 instances of Heartbleed attacks. This characteristic makes the CICIDS2017 dataset more representative of real-world cyberattacks. Table 1 provides a detailed distribution of the dataset.

The dataset is usually divided according to the standards as 20% and 80% or as 30% and 70% [2]. However, our LVAE model aims to efficiently identify unknown types of attacks, unlike traditional methods. To achieve this, we create the training set by randomly selecting an equal number of attack samples and normal samples, which, together, account for 50% of the dataset. Similarly, we select the same number of samples for the testing set as for the training set. This ensures that we can accurately evaluate the detection ability of our method for unknown threats. Table 2 provides a detailed description of the dataset used in our research. The generated unknown attacks are only used for the experiments in this paper and are not used for any other purpose.
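The balanced selection described above can be sketched as follows. This is a minimal illustration under assumptions: the function name, the 0/1 label encoding, and the fixed seed are hypothetical, and sampling is done without replacement, which the text does not specify.

```python
import random


def balanced_sample(benign, attack, n_total, seed=0):
    """Draw n_total // 2 samples from each class, then shuffle.

    Returns a list of (sample, label) pairs with label 0 for benign
    and 1 for attack (an assumed encoding).
    """
    rng = random.Random(seed)
    half = n_total // 2
    picked = [(x, 0) for x in rng.sample(benign, half)]
    picked += [(x, 1) for x in rng.sample(attack, half)]
    rng.shuffle(picked)
    return picked
```

For instance, `balanced_sample(benign_rows, attack_rows, 10000)` would return 5000 benign and 5000 attack samples, where `benign_rows` and `attack_rows` are hypothetical lists of preprocessed records.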

Table 1. Detailed distribution of CICIDS2017 dataset.

Traffic Class	Label	Numbers	Ratio
Benign	Benign	2,273,097	80.30%
DDoS	DDoS	128,027	4.52%
DoS	DoS Hulk	231,073	8.16%
	DoS GoldenEye	10,293	0.36%
	DoS Slowloris	5796	0.20%
	DoS Slowhttptest	5499	0.19%
Port Scan	Port Scan	158,930	5.61%
Botnet	Bot	1966	0.07%
Brute Force	FTP-Patator	7938	0.28%
Brute Force	SSH-Patator	5897	0.20%
Web Attack	Web Attack—Brute Force	1507	0.05%
	Web Attack—Sql Injection	21	0.001%
	Web Attack—XSS	652	0.02%
Infiltration	Infiltration	36	0.002%
Heartbleed	Heartbleed	11	0.001%
Total	N	2,830,743	100%

Table 2. A detailed distribution of the dataset.

Training Set Distribution	Training Dataset	Testing Set Distribution	Testing Dataset
Benign (50%)	10,000	Benign 100	2000 + 8000 (generated)
	20,000		2000 + 18,000 (generated)
	30,000		2000 + 28,000 (generated)
	40,000		2000 + 38,000 (generated)
	50,000		2000 + 48,000 (generated)
Attack (50%)	60,000	Attack 1900	2000 + 58,000 (generated)
	70,000		2000 + 68,000 (generated)
	80,000		2000 + 78,000 (generated)
	90,000		2000 + 88,000 (generated)

3.2. Assessment of Indicators

In IDSs, the following metrics are commonly used to assess the performance of a method: true positives (TP) represent correctly predicted attack samples, true negatives (TN) represent correctly predicted normal samples, false positives (FP) indicate samples where the normal class was incorrectly predicted as an attack, and false negatives (FN) represent samples where the attack class was incorrectly predicted as normal.

Accuracy is the ratio of correctly classified samples to all samples, as shown in Equation (8).

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(8)

Precision is the ratio of samples correctly predicted as attacks to the total number of positively predicted samples, as shown in Equation (9).

Precision = \frac{TP}{TP + FP}

(9)

Recall is the ratio of samples correctly predicted as attacks to the total number of actual attack samples, as shown in Equation (10).

Recall = \frac{TP}{TP + FN}

(10)

The F1 score is the harmonic mean of precision (P) and recall (R). It balances the two and is the most important metric for evaluating a method, as shown in Equation (11).

F1 = 2 \times \frac{P \times R}{P + R}

(11)
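Equations (8) to (11) translate directly into a few lines of code. The sketch below is illustrative only: the function name `classification_metrics` and the confusion counts in the usage example are hypothetical and do not come from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall (Equation (11)).
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, chosen only to illustrate the formulas.
m = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The guards against zero denominators return 0.0 when a classifier predicts no attacks at all, which is one common convention; other conventions leave the metric undefined in that case.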

3.3. Setting of Model Hyperparameters

The creation of a novel or unknown type of attack in our proposed LVAE method is divided into two stages: the encoder and the decoder. The ReLU activation function is used in the hidden layers of both the encoder and decoder, while the sigmoid activation function is used in the final layer. The model is trained for 100 epochs with a batch size of 10, and the loss function is optimized using the Adam optimizer.
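To make the reconstruction term concrete, the following sketch evaluates a log-cosh loss between an input vector and its reconstruction. It is a minimal illustration under stated assumptions: the function names, the mean reduction over features, and the sample vectors are hypothetical, and the KL divergence term of the VAE objective is omitted.

```python
import math

def log_cosh(d):
    """Numerically stable log(cosh(d)) for a scalar residual d."""
    a = abs(d)
    # log(cosh(a)) = a + log(1 + exp(-2a)) - log(2); this form avoids the
    # overflow that a direct call to cosh would hit for large residuals.
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def log_cosh_reconstruction_loss(x, x_hat):
    """Mean log-cosh loss between an input and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum(log_cosh(xi - ri) for xi, ri in zip(x, x_hat)) / len(x)

# Hypothetical feature vector and reconstruction with values in [0, 1],
# matching the range of a sigmoid output layer.
x = [0.0, 1.0, 0.5, 1.0]
x_hat = [0.1, 0.8, 0.5, 0.0]
loss = log_cosh_reconstruction_loss(x, x_hat)
print(f"log-cosh reconstruction loss: {loss:.4f}")
```

For small residuals the log-cosh function behaves like half the squared error, and for large residuals it grows like the absolute error, so a single badly reconstructed feature contributes less than it would under a squared-error loss.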

In the integrated classifier section, our method combines eight distinct approaches to detect new or unknown types of attacks. The key parameters for each method are as follows:

Multilayer Perceptron (MLP): The hidden layer consists of 80 nodes with ReLU activation. The model is trained for five epochs with a batch size of five, and the Adam optimizer is used for loss optimization.
Gaussian-Based Naive Bayes (Gaussian NB): No specific priors are set.
Decision Tree (DT): The criterion is set to entropy, and the maximum number of tree layers is set to 4. The remaining parameters use default values.
Random Forest (RF): The number of estimators is set to 100, while the other parameters use default values.
Support Vector Machine (SVM): The gamma parameter is set to scale, and C is set to 1.
Logistic Regression (LR): The penalty is set to L2, C is set to 1, and the maximum number of iterations is set to 1,200,000.
Gradient Boost (GB): The random state is set to 0.
Gated Recurrent Unit (GRU): The hidden layer consists of 80 nodes with a sigmoid activation function. The dropout is set to 0.2. The output layer utilizes the softmax activation function. The model is trained for five epochs with a batch size of 10, and the Adam optimizer is used for loss optimization.

The model parameters listed above were carefully selected and fine-tuned so that our LVAE approach detects unknown types of attacks accurately. The choice of hyperparameters is crucial for the overall performance of the model, and we conducted numerous tests to identify these values. This helps our approach remain effective in identifying unknown attacks in the face of new threats.

4. Results and Discussion

In this segment, we present the outcomes of our experiments and provide a detailed analysis and discussion of the obtained results. We also delve into the computational cost associated with each component of the LVAE approach.

Experimental Outcomes and Analysis: To evaluate the effectiveness of our LVAE technique, we conducted several experiments. The results show that our approach accurately detects unknown types of attacks. The hyperparameters were selected and fine-tuned experimentally, and their configuration is detailed in Section 3.3.
Computational Cost Analysis: To identify novel or unknown attacks, the integrated classifier combines eight different techniques, each configured with the parameters given in Section 3.3. While each technique incurs a different computational cost, we took measures to ensure efficiency without compromising accuracy.

Overall, our LVAE approach exhibits promise in detecting unknown attacks while considering the computational cost of each component. This makes it a viable option for practical applications where effectiveness and accuracy are crucial considerations.

4.1. Analysis of Experimental Results

Unlike conventional techniques, our method generates a new dataset by combining the created unknown attacks dataset with the original dataset. This new dataset is solely used as a testing set to evaluate the LVAE’s ability to detect unknown threats. This novel testing method provides valuable insights into the performance of our approach.

During the training phase, samples are randomly selected from the original dataset to train the model. Table 3, Table 4 and Table 5 display the individual classification results obtained using each method for each sample size. The best classification outcome for a given sample size is identified by comparing these individual results.

Table 3 illustrates the individual results obtained from various methods for detecting unknown attacks, with sample sizes ranging from 10,000 to 30,000. When the sample size is 10,000, a comparison of the individual results reveals that Random Forest achieves the highest training accuracy of 99.95%, while Gated Recurrent Unit achieves the highest testing accuracy of 99.01%. Multilayer Perceptron and Decision Tree achieve the best precision and recall, respectively. However, the Gated Recurrent Unit demonstrates the best overall model performance, as it achieves the highest F1 score.
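The per-metric comparison for the 10,000-sample case can be reproduced from the values in Table 3. The sketch below is illustrative: the dictionary layout and the helper `best_by` are assumptions made for the example, while the numbers are copied from the table.

```python
# Table 3 values for the 10,000-sample run, in percent.
results = {
    "MLP": {"train_acc": 95.44, "test_acc": 89.65, "precision": 98.63, "recall": 86.68, "f1": 92.27},
    "NB":  {"train_acc": 80.84, "test_acc": 97.82, "precision": 98.15, "recall": 97.82, "f1": 97.98},
    "DT":  {"train_acc": 88.29, "test_acc": 98.95, "precision": 98.03, "recall": 98.95, "f1": 98.49},
    "RF":  {"train_acc": 99.95, "test_acc": 98.87, "precision": 98.03, "recall": 98.90, "f1": 98.45},
    "SVM": {"train_acc": 93.36, "test_acc": 94.07, "precision": 98.06, "recall": 94.08, "f1": 96.03},
    "LR":  {"train_acc": 85.09, "test_acc": 96.40, "precision": 98.02, "recall": 96.41, "f1": 97.21},
    "GB":  {"train_acc": 99.65, "test_acc": 98.88, "precision": 98.04, "recall": 98.88, "f1": 98.46},
    "GRU": {"train_acc": 80.97, "test_acc": 99.01, "precision": 98.12, "recall": 98.90, "f1": 98.52},
}

def best_by(results, metric):
    """Return the (method, value) pair with the highest value for a metric."""
    method = max(results, key=lambda name: results[name][metric])
    return method, results[method][metric]

winners = {metric: best_by(results, metric)
           for metric in ("train_acc", "test_acc", "precision", "recall", "f1")}
for metric, (method, value) in winners.items():
    print(f"{metric}: {method} ({value:.2f}%)")
# train_acc: RF (99.95%)
# test_acc: GRU (99.01%)
# precision: MLP (98.63%)
# recall: DT (98.95%)
# f1: GRU (98.52%)
```

When two methods tie on a metric, as happens at larger sample sizes, `max` returns the method listed first, so a real selection rule would need an explicit tie-breaker.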

As the sample size increases, Random Forest maintains the highest training accuracy, while the Gated Recurrent Unit continues to achieve the best testing accuracy. Multilayer Perceptron retains the highest precision, and Random Forest and Gated Recurrent Unit exhibit the best recall and F1 score, respectively.

These results showcase the specific performance of each method for detecting unknown attacks with varying sample sizes. With the exception of Gaussian NB at a sample size of 30,000, all methods perform well, further highlighting the suitability of the LVAE method for detecting unknown types of attacks.

Table 4 presents a detailed analysis of the individual results obtained using various methods, with a sample size ranging from 40,000 to 60,000. At a sample size of 40,000, we observe that the Naive Bayes method achieves a lower training accuracy, while the other methods perform well. Random Forest performs the best, achieving the highest training accuracy, recall, and F1. Multilayer Perceptron, Gradient Boosting, and Gated Recurrent Unit achieve the highest precision, recall, and test accuracy, respectively. As the sample size increases, Random Forest, Gated Recurrent Unit, and Gradient Boosting exhibit the highest performance.

As the number of samples increases, we see steady progress in all metrics, indicating that our model successfully learns the underlying data properties. This demonstrates the effectiveness of our suggested LVAE approach, particularly in creating unknown attacks.

Table 5 displays the results obtained from various methods, with a sample size ranging from 70,000 to 90,000. At a sample size of 70,000, we find that Random Forest performs the best, achieving the best performance in almost every metric. Gradient Boosting and Naive Bayes also achieve the highest F1 scores, while Support Vector Machines and Gated Recurrent Units achieve the highest precision and test accuracy. As the number of samples increases, Random Forest, Gradient Boosting, and Gated Recurrent Unit achieve the best F1 scores, and the experimental results for all methods improve.

In summary, our suggested LVAE approach improves the reconstruction loss component by utilizing the log-cosh loss function, leading to better performance in identifying unknown attacks. Through extensive experimentation, we demonstrated that the LVAE consistently and reliably detects unknown attacks.

Despite the high performance achieved using our method, there are still some limitations. For example, the Gaussian-based Naive Bayesian approach performs poorly in detecting unknown attacks with a sample size of 30,000. This is partially attributed to the large error in the testing data, which negatively affects the predictive power of the model. The integrated classifier consists of eight methods. Consequently, individual classification results for each method are required, followed by a comparison of the results. The best classification result is then output. This process necessitates training for each method, leading to high computational complexity. The design of lightweight classification methods can be considered in later studies.
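The low F1 score of Gaussian NB at 30,000 samples follows directly from Equation (11). Using the precision of 93.21% and the recall of 3.88% reported in Table 3:

F1 = 2 \times \frac{0.9321 \times 0.0388}{0.9321 + 0.0388} \approx 0.0745

This agrees with the tabulated value of 7.46% up to rounding of the inputs. Because F1 is a harmonic mean, the very low recall dominates the score even though the precision remains high.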

To verify the efficacy of our methodology, we carried out a thorough comparative study, considering various techniques. Notably, every study in our review used a variety of techniques to detect unknown threats, with encouraging results. Table 6 illustrates the effectiveness of our LVAE approach in detecting unknown attacks, with substantial improvements in various metrics.

Table 3. Performance of various methods of detection between 10,000 and 30,000 samples.

Number of Samples	Methods	Training (Accuracy)	Testing (Accuracy)	Precision	Recall	F1
Training (10,000) Testing (10,000)	MLP	95.44%	89.65%	98.63%	86.68%	92.27%
	NB	80.84%	97.82%	98.15%	97.82%	97.98%
	DT	88.29%	98.95%	98.03%	98.95%	98.49%
	RF	99.95%	98.87%	98.03%	98.90%	98.45%
	SVM	93.36%	94.07%	98.06%	94.08%	96.03%
	LR	85.09%	96.40%	98.02%	96.41%	97.21%
	GB	99.65%	98.88%	98.04%	98.88%	98.46%
	GRU	80.97%	99.01%	98.12%	98.90%	98.52%
Training (20,000) Testing (20,000)	MLP	95.19%	98.63%	99.27%	92.49%	95.76%
	NB	63.60%	94.37%	99.11%	94.37%	96.68%
	DT	91.57%	97.56%	99.06%	97.56%	98.31%
	RF	99.98%	99.47%	99.01%	99.50%	99.25%
	SVM	92.95%	97.12%	99.03%	97.13%	98.07%
	LR	83.35%	97.23%	99.02%	97.23%	98.12%
	GB	99.24%	97.55%	99.01%	97.55%	98.27%
	GRU	87.30%	99.50%	99.01%	99.50%	99.25%
Training (30,000) Testing (30,000)	MLP	96.13%	99.02%	99.42%	99.30%	99.36%
	NB	79.27%	3.88%	93.21%	3.88%	7.46%
	DT	91.13%	98.37%	99.37%	98.37%	98.87%
	RF	99.93%	99.66%	99.34%	99.66%	99.50%
	SVM	93.62%	98.08%	99.35%	98.08%	98.71%
	LR	84.39%	98.17%	99.34%	98.18%	98.76%
	GB	99.09%	98.91%	99.35%	98.92%	99.13%
	GRU	88.04%	99.67%	99.34%	99.66%	99.50%

Table 4. Performance of various methods of detection between 40,000 and 60,000 samples.

Number of Samples	Methods	Training (Accuracy)	Testing (Accuracy)	Precision	Recall	F1
Training (40,000) Testing (40,000)	MLP	96.90%	99.64%	99.65%	97.54%	98.58%
	NB	60.49%	98.65%	99.55%	98.65%	99.10%
	DT	91.73%	98.77%	99.53%	98.78%	99.15%
	RF	99.97%	99.73%	99.50%	99.75%	99.63%
	SVM	94.44%	98.98%	99.51%	98.99%	99.25%
	LR	86.85%	98.63%	99.52%	98.63%	99.07%
	GB	99.21%	99.74%	99.50%	99.75%	99.62%
	GRU	91.13%	99.75%	99.52%	99.72%	99.62%
Training (50,000) Testing (50,000)	MLP	97.33%	98.62%	99.67%	99.31%	99.49%
	NB	77.73%	98.94%	99.64%	98.94%	99.29%
	DT	92.07%	99.03%	99.62%	99.03%	99.33%
	RF	99.94%	99.80%	99.60%	99.80%	99.70%
	SVM	94.94%	98.19%	99.61%	98.20%	98.90%
	LR	88.29%	98.90%	99.61%	98.90%	99.25%
	GB	99.17%	99.78%	99.60%	99.79%	99.69%
	GRU	93.26%	99.80%	99.60%	99.80%	99.70%
Training (60,000) Testing (60,000)	MLP	97.41%	99.06%	99.69%	99.68%	99.69%
	NB	77.71%	99.13%	99.70%	99.14%	99.42%
	DT	92.56%	99.18%	99.67%	99.19%	99.43%
	RF	99.99%	99.83%	99.67%	99.83%	99.75%
	SVM	94.28%	98.97%	99.67%	98.79%	99.32%
	LR	88.52%	99.09%	99.67%	99.09%	99.38%
	GB	99.21%	99.83%	99.67%	99.83%	99.75%
	GRU	93.54%	99.82%	99.67%	99.83%	99.75%

Table 5. Performance of various methods of detection between 70,000 and 90,000 samples.

Number of Samples	Methods	Training (Accuracy)	Testing (Accuracy)	Precision	Recall	F1
Training (70,000) Testing (70,000)	MLP	97.28%	99.85%	99.75%	99.10%	99.42%
	NB	59.99%	99.85%	99.71%	99.86%	99.79%
	DT	91.84%	99.29%	99.73%	99.30%	99.52%
	RF	99.98%	99.86%	99.72%	99.86%	99.79%
	SVM	92.07%	97.71%	99.85%	97.71%	98.77%
	LR	87.63%	99.21%	99.74%	99.21%	99.47%
	GB	99.16%	99.84%	99.73%	99.85%	99.79%
	GRU	91.12%	99.86%	99.73%	99.84%	99.78%
Training (80,000) Testing (80,000)	MLP	97.68%	99.88%	99.78%	99.29%	99.54%
	NB	63.57%	99.38%	99.77%	99.39%	99.58%
	DT	92.67%	99.37%	99.77%	99.37%	99.57%
	RF	99.99%	99.87%	99.75%	99.88%	99.81%
	SVM	95.52%	98.85%	99.76%	98.86%	99.31%
	LR	89.10%	99.31%	99.77%	99.31%	99.54%
	GB	99.09%	99.86%	99.76%	99.87%	99.81%
	GRU	92.97%	99.87%	99.75%	99.88%	99.81%
Training (90,000) Testing (90,000)	MLP	97.71%	99.32%	99.79%	99.85%	99.82%
	NB	65.17%	99.45%	99.80%	99.46%	99.63%
	DT	92.85%	99.44%	99.80%	99.44%	99.62%
	RF	99.99%	99.87%	99.78%	99.87%	99.83%
	SVM	95.80%	99.31%	99.78%	99.31%	99.55%
	LR	89.75%	99.38%	99.80%	99.39%	99.59%
	GB	99.07%	99.88%	99.78%	99.89%	99.84%
	GRU	95.32%	99.89%	99.78%	99.87%	99.83%

Table 6. Comparative study.

Reference	Accuracy	Precision	Recall	F1
[16]	97.28%	-----	97%	72.81%
[44]	92.10%	59.13%	49.12%	97.15%
[49]	96.90%	97.70%	97.60%	97.60%
[39]	89.78%	91.16%	95.34%	93.13%
[37]	99.07%	98.91%	98.95%	98.93%
[29]	97.07%	97.05%	97.10%	97.07%
[25]	91.33%	99.77%	89.00%	94.10%
[17]	97.20%	81.40%	83.60%	82.50%
[1]	98.99%	96.99%	89.78%	93.25%
Proposed	99.88%	99.78%	99.89%	99.84%

A represents the integration of classifiers, i.e., the eight methods of MLP, GaussianNB, DT, RF, SVM, LR, GB, and GRU.

4.2. Computational Cost Analysis of the Various Components of the LVAE Methodology

We used a personal laptop with a GeForce RTX 3060 laptop GPU to evaluate the LVAE approach. The time requirements for each step in the LVAE approach for different training data quantities are shown in Figure 3, Figure 4, Figure 5 and Figure 6. Notably, the VAE phase consistently required the least amount of time to generate unknown attacks, demonstrating the efficiency of our method. The combination of the VAE with the MLP consumed the most time, followed by the combination with the GRU and then the combination with the SVM. The other components required comparable amounts of time.
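The text does not state how the per-stage times were measured. One common way to collect such measurements is a small timing context manager, sketched below. The helper `timed`, the stage names, and the placeholder workloads are all hypothetical; in practice the timed blocks would wrap the generation step and each classifier's training and prediction.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Accumulate wall-clock seconds spent inside the block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

# Placeholder workloads standing in for the real pipeline stages.
with timed("VAE generation"):
    sum(i * i for i in range(10_000))
with timed("MLP"):
    sum(i * i for i in range(50_000))

# Report the stages from slowest to fastest.
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage}: {seconds:.4f} s")
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it better suited to measuring short intervals than wall-clock time from `time.time`.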

When the number of training samples increases, the total amount of time consumed remains relatively low, as observed by examining the time consumption for each LVAE component. This discovery demonstrates the effectiveness of the LVAE in identifying unknown attacks at little computing cost.

Figure 3 and Figure 4 show the time consumption at different stages in the LVAE process for sample sizes between 10,000 and 40,000. The VAE component requires the least amount of time to generate unknown attacks, indicating its effectiveness in producing them. While the other components require comparable amounts of time, MLP and GRU demand a substantial amount of time during the classification step.

Figure 5 and Figure 6 display the time spent at each stage of the LVAE approach for sample sizes ranging from 50,000 to 90,000, which further supports the earlier findings. The enhanced VAE efficiently produces unknown attacks, which are categorized by a classifier that incorporates eight techniques. The time analysis shown in Figure 3, Figure 4, Figure 5 and Figure 6 confirms that our approach effectively generates unknown attack data and learns data features quickly.

Figure 3. Time consumed in sections between 10,000 and 20,000.

Figure 4. Time consumed in sections between 30,000 and 40,000.

Table 3, Table 4 and Table 5 demonstrate the accuracy of the LVAE in detecting unknown threats. Our approach consistently outperforms alternative methods and achieves outstanding accuracy rates when combined with different categorization algorithms, demonstrating its efficacy in identifying unknown attacks.

To sum up, the performance of identifying unknown attacks is greatly enhanced by our LVAE approach. The effectiveness of our technique is enhanced by the correct classification of these attacks and the efficient production of unknown attacks. Our LVAE method’s efficiency and dependability are confirmed by the testing results and the time consumption analysis.

Figure 5. Time consumed in sections between 50,000 and 60,000.

Figure 6. Time consumed in sections between 70,000 and 90,000.

Additionally, Figure 7 illustrates the evaluation of the loss in our LVAE approach during the generation of unknown attacks. It is evident that the loss rapidly decreases and converges to its minimum value before the 20th epoch. Subsequently, it maintains a stable state of convergence. These findings, coupled with the results presented in Table 3, Table 4 and Table 5, provide further evidence of the effectiveness of our LVAE approach. Within just a few training epochs, our approach achieves a high detection rate and quickly converges to optimal performance. These results underscore the reliability and efficacy of our LVAE method in identifying unknown attacks.

Due to the discrete and high-dimensional nature of attack data, traditional variational autoencoders struggle to fully capture the features of such data, resulting in high loss.

In contrast, our approach utilizes the log-cosh function as the reconstruction loss, which leads to a significant drop in the loss and eventual convergence during gradient descent. This indicates that our model can effectively learn and model the features of attacks, thereby improving the generation of unknown attacks.

Consequently, the attacks generated by our variational autoencoder using the log-cosh function exhibit similarities to the original attacks. In simpler terms, the generated attack can be seen as belonging to the same attack family as the original attack, with matching features.

5. Conclusions

In this research, we introduce a novel LVAE technique that harnesses the power of VAE to effectively learn complex data distributions and generate accurate reconstruction data. By incorporating the log-cosh function as a reconstruction loss term in our method, we can generate unknown-type attacks with great success. Additionally, our approach utilizes eight classifiers to combine and efficiently comprehend the unique characteristics of these unknown threats.

To evaluate the performance of our LVAE method, we conducted experiments on the CICIDS2017 dataset, which encompasses a wide range of modern attacks in varying quantities. The results unequivocally demonstrate that our LVAE method outperforms several state-of-the-art methods in terms of the detection rates. This underscores the effectiveness of our meticulously designed LVAE method in accurately identifying unknown attacks and enhancing the overall detection rates.

In future research, our objective is to explore even more effective methods for detecting unknown attacks and deploy them in real-world scenarios for real-time detection. It is worth noting that our current LVAE method can generate attack samples, but it is unable to generate specific attacks due to the random selection of attack samples for reconstruction. This is an aspect that we intend to improve upon in the future. Through the continuous refinement and enhancement of our approach, we are committed to making significant contributions to the field of intrusion detection and bolstering the security of systems and networks.

Author Contributions

Conceptualization, methodology, validation, and writing, L.Y.; conceptualization, data curation, and formal analysis, L.X. and X.J.; supervision, funding acquisition, and review, L.Y., L.X. and X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Project of Key Research and Development Program of Anhui Province, grant number 202104d07020010, and the China National Natural Science Foundation, grant number 61572034.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets are available in the mentioned references.

Acknowledgments

We thank the above projects for their support.

Conflicts of Interest

The authors declare no conflict of interest.

References

Dong, S.; Xia, Y.; Peng, T. Network Abnormal Traffic Detection Model Based on Semi-Supervised Deep Reinforcement Learning. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4197–4212.
Alahmed, S.; Alasad, Q.; Hammood, M.M.; Yuan, J.-S.; Alawad, M. Mitigation of Black-Box Attacks on Intrusion Detection Systems-Based ML. Computers 2022, 11, 115.
Ahmad, S.; Arif, F.; Zabeehullah, Z.; Iltaf, N. Novel Approach Using Deep Learning for Intrusion Detection and Classification of the Network Traffic. In Proceedings of the 2020 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), Tunis, Tunisia, 22–24 June 2020; pp. 1–6.
Rigaki, M. Adversarial Deep Learning against Intrusion Detection Classifiers. In Proceedings of the IST-152 Workshop on Intelligent Autonomous Agents for Cyber Defence and Resilience, Prague, Czech Republic, 18–20 October 2017.
Alasad, Q.; Hammood, M.M.; Alahmed, S. Performance and Complexity Tradeoffs of Feature Selection on Intrusion Detection System-Based Neural Network Classification with High-Dimensional Dataset. In Proceedings of the 2nd International Conference on Emerging Technologies and Intelligent Systems, Online, 2–3 September 2022; pp. 533–542.
Tian, Y.; Mirzabagheri, M.; Bamakan, S.M.H.; Wang, H.; Qu, Q. Ramp loss one-class support vector machine; A robust and effective approach to anomaly detection problems. Neurocomputing 2018, 310, 223–235.
Kamarudin, M.H.; Maple, C.; Watson, T.; Safa, N.S. A LogitBoost-Based Algorithm for Detecting Known and Unknown Web Attacks. IEEE Access 2017, 5, 26190–26200.
Ahmad, R.; Alsmadi, I.; Alhamdani, W.; Tawalbeh, L. A Deep Learning Ensemble Approach to Detecting Unknown Network Attacks. J. Inf. Secur. Appl. 2022, 67, 103196.
Liu, Y.; Chen, K.; Liao, X.; Zhang, W. A genetic clustering method for intrusion detection. Pattern Recognit. 2004, 37, 927–942.
Xu, X.; Shen, F.; Yang, Y.; Shen, H.T.; Li, X. Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval. IEEE Trans. Image Process. 2017, 26, 2494–2507.
Luo, Y.; Yang, Y.; Shen, F.; Huang, Z.; Zhou, P.; Shen, H.T. Robust discrete code modeling for supervised hashing. Pattern Recognit. 2018, 75, 128–135.
Hu, M.; Yang, Y.; Shen, F.; Xie, N.; Shen, H.T. Hashing with Angular Reconstructive Embeddings. IEEE Trans. Image Process. 2018, 27, 545–555.
Xu, X.; Lu, H.; Song, J.; Yang, Y.; Shen, H.T.; Li, X. Ternary Adversarial Networks with Self-Supervision for Zero-Shot Cross-Modal Retrieval. IEEE Trans. Cybern. 2020, 50, 2400–2413.
Aziz, M.F.; Khan, A.N.; Shuja, J.; Khan, I.A.; Khan, F.G.; Khan, A.U.R. A lightweight and compromise-resilient authentication scheme for IoTs. Trans. Emerg. Telecommun. Technol. 2022, 33, e3813.
Jan, S.A.; Amin, N.U.; Shuja, J.; Abbas, A.; Maray, M.; Ali, M. SELWAK: A Secure and Efficient Lightweight and Anonymous Authentication and Key Establishment Scheme for IoT Based Vehicular Ad hoc Networks. Sensors 2022, 22, 4019.
Lee, J.-S.; Chen, Y.-C.; Chew, C.-J.; Chen, C.-L.; Huynh, T.-N.; Kuo, C.-W. CoNN-IDS: Intrusion detection system based on collaborative neural networks and agile training. Comput. Secur. 2022, 122, 102908.
Lopez-Martin, M.; Sanchez-Esguevillas, A.; Arribas, J.I.; Carro, B. Contrastive Learning over Random Fourier Features for IoT Network Intrusion Detection. IEEE Internet Things J. 2023, 10, 8505–8513.
Singh, A.; Chatterjee, K.; Satapathy, S.C. An edge based hybrid intrusion detection framework for mobile edge computing. Complex Intell. Syst. 2022, 8, 3719–3746.
Zoppi, T.; Ceccarelli, A.; Puccetti, T.; Bondavalli, A. Which algorithm can detect unknown attacks? Comparison of supervised, unsupervised and meta-learning algorithms for intrusion detection. Comput. Secur. 2023, 127, 103107.
Boukela, L.; Zhang, G.; Yacoub, M.; Bouzefrane, S. A near-autonomous and incremental intrusion detection system through active learning of known and unknown attacks. In Proceedings of the 2021 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Chengdu, China, 18–20 June 2021; pp. 374–379.
Soltani, M.; Ousat, B.; Jafari Siavoshani, M.; Jahangir, A.H. An adaptable deep learning-based intrusion detection system to zero-day attacks. J. Inf. Secur. Appl. 2023, 76, 103516.
Mahdavi, E.; Fanian, A.; Mirzaei, A.; Taghiyarrenani, Z. ITL-IDS: Incremental Transfer Learning for Intrusion Detection Systems. Knowl.-Based Syst. 2022, 253, 109542.
Mananayaka, A.K.; Chung, S.S. Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection. IEEE Access 2023, 11, 45154–45167.
Zhou, X.; Liang, W.; Li, W.; Yan, K.; Shimizu, S.; Wang, K.I.K. Hierarchical Adversarial Attacks Against Graph-Neural-Network-Based IoT Network Intrusion Detection System. IEEE Internet Things J. 2022, 9, 9310–9319.
Kumar, V.; Sinha, D. A robust intelligent zero-day cyber-attack detection technique. Complex Intell. Syst. 2021, 7, 2211–2234.
Sarhan, M.; Layeghy, S.; Gallagher, M.; Portmann, M. From zero-shot machine learning to zero-day attack detection. Int. J. Inf. Secur. 2023, 22, 947–959.
Sheng, C.; Yao, Y.; Li, W.; Yang, W.; Liu, Y. Unknown Attack Traffic Classification in SCADA Network Using Heuristic Clustering Technique. IEEE Trans. Netw. Serv. Manag. 2023, 20, 2625–2638.
Hairab, B.I.; Elsayed, M.S.; Jurcut, A.D.; Azer, M.A. Anomaly Detection Based on CNN and Regularization Techniques against Zero-Day Attacks in IoT Networks. IEEE Access 2022, 10, 98427–98440.
de Araujo-Filho, P.F.; Naili, M.; Kaddoum, G.; Fapi, E.T.; Zhu, Z. Unsupervised GAN-Based Intrusion Detection System Using Temporal Convolutional Networks and Self-Attention. IEEE Trans. Netw. Serv. Manag. 2023.
Verkerken, M.; D’hooge, L.; Sudyana, D.; Lin, Y.D.; Wauters, T.; Volckaert, B.; Turck, F.D. A Novel Multi-Stage Approach for Hierarchical Intrusion Detection. IEEE Trans. Netw. Serv. Manag. 2023, 20, 3915–3929.
Sohi, S.M.; Seifert, J.-P.; Ganji, F. RNNIDS: Enhancing network intrusion detection systems through deep learning. Comput. Secur. 2021, 102, 102151.
Moustafa, N.; Keshk, M.; Choo, K.-K.R.; Lynar, T.; Camtepe, S.; Whitty, M. DAD: A Distributed Anomaly Detection system using ensemble one-class statistical learning in edge networks. Future Gener. Comput. Syst. 2021, 118, 240–251.
Debicha, I.; Bauwens, R.; Debatty, T.; Dricot, J.-M.; Kenaza, T.; Mees, W. TAD: Transfer learning-based multi-adversarial detection of evasion attacks against network intrusion detection systems. Future Gener. Comput. Syst. 2023, 138, 185–197.
Dina, A.S.; Manivannan, D. Intrusion detection based on Machine Learning techniques in computer networks. Internet Things 2021, 16, 100462.
Lai, Y.C.; Sudyana, D.; Lin, Y.D.; Verkerken, M.; D’hooge, L.; Wauters, T.; Volckaert, B.; Turck, F.D. Task Assignment and Capacity Allocation for ML-Based Intrusion Detection as a Service in a Multi-Tier Architecture. IEEE Trans. Netw. Serv. Manag. 2023, 20, 672–683.
Sabeel, U.; Heydari, S.S.; El-Khatib, K.; Elgazzar, K. Unknown, Atypical and Polymorphic Network Intrusion Detection: A Systematic Survey. IEEE Trans. Netw. Serv. Manag. 2023.
Rani, S.V.J.; Ioannou, I.; Nagaradjane, P.; Christophorou, C.; Vassiliou, V.; Yarramsetti, H.; Shridhar, S.; Balaji, L.M.; Pitsillides, A. A Novel Deep Hierarchical Machine Learning Approach for Identification of Known and Unknown Multiple Security Attacks in a D2D Communications Network. IEEE Access 2023.
Lu, C.; Wang, X.; Yang, A.; Liu, Y.; Dong, Z. A Few-Shot Based Model-Agnostic Meta-Learning for Intrusion Detection in Security of Internet of Things. IEEE Internet Things J. 2023.
Shin, G.Y.; Kim, D.W.; Han, M.M. Data Discretization and Decision Boundary Data Point Analysis for Unknown Attack Detection. IEEE Access 2022, 10, 114008–114015.
Lan, J.; Liu, X.; Li, B.; Zhao, J. A novel hierarchical attention-based triplet network with unsupervised domain adaptation for network intrusion detection. Appl. Intell. 2023, 53, 11705–11726.
Zavrak, S.; İskefiyeli, M. Anomaly-Based Intrusion Detection from Network Flow Features Using Variational Autoencoder. IEEE Access 2020, 8, 108346–108358.
Vu, L.; Nguyen, Q.U.; Nguyen, D.N.; Hoang, D.T.; Dutkiewicz, E. Deep Generative Learning Models for Cloud Intrusion Detection Systems. IEEE Trans. Cybern. 2023, 53, 565–577.
Long, C.; Xiao, J.; Wei, J.; Zhao, J.; Wan, W.; Du, G. Autoencoder ensembles for network intrusion detection. In Proceedings of the 2022 24th International Conference on Advanced Communication Technology (ICACT), Pyeongchang, Republic of Korea, 13–16 February 2022; pp. 323–333.
Yang, J.; Chen, X.; Chen, S.; Jiang, X.; Tan, X. Conditional Variational Auto-Encoder and Extreme Value Theory Aided Two-Stage Learning Approach for Intelligent Fine-Grained Known/Unknown Intrusion Detection. IEEE Trans. Inf. Forensics Secur. 2021, 16, 3538–3553.
Abdalgawad, N.; Sajun, A.; Kaddoura, Y.; Zualkernan, I.A.; Aloul, F. Generative Deep Learning to Detect Cyberattacks for the IoT-23 Dataset. IEEE Access 2022, 10, 6430–6441.
Jin, D.; Chen, S.; He, H.; Jiang, X.; Cheng, S.; Yang, J. Federated Incremental Learning based Evolvable Intrusion Detection System for Zero-Day Attacks. IEEE Netw. 2023, 37, 125–132.
Yang, L.; Song, Y.; Gao, S.; Hu, A.; Xiao, B. Griffin: Real-Time Network Intrusion Detection System via Ensemble of Autoencoder in SDN. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2269–2281.
Zahoora, U.; Rajarajan, M.; Pan, Z.; Khan, A. Zero-Day Ransomware Attack Detection Using Deep Contractive Autoencoder and Voting Based Ensemble Classifier. Appl. Intell. 2022, 52, 13941–13960.
Boppana, T.K.; Bagade, P. GAN-AE: An unsupervised intrusion detection system for MQTT networks. Eng. Appl. Artif. Intell. 2023, 119, 105805.
Kim, C.; Chang, S.Y.; Kim, J.; Lee, D.; Kim, J. Automated, Reliable Zero-Day Malware Detection Based on Autoencoding Architecture. IEEE Trans. Netw. Serv. Manag. 2023, 20, 3900–3914.
Li, R.; Li, Q.; Zhou, J.; Jiang, Y. ADRIoT: An Edge-Assisted Anomaly Detection Framework against IoT-Based Network Attacks. IEEE Internet Things J. 2022, 9, 10576–10587.
Li, Z.; Chen, S.; Dai, H.; Xu, D.; Chu, C.K.; Xiao, B. Abnormal Traffic Detection: Traffic Feature Extraction and DAE-GAN with Efficient Data Augmentation. IEEE Trans. Reliab. 2023, 72, 498–510.
Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP 2018, 1, 108–116.

Figure 7. Change in model loss as the number of epochs increases.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
