1. Introduction
With the development of the Internet of Things (IoT), big data, and related technologies, connecting vehicles to the Internet is becoming increasingly common, and the automobile industry is evolving towards an intelligent Internet of Vehicles (IoV) [1]. IoV refers to the network system that connects a vehicle’s internal devices, vehicles and people, vehicles and vehicles, vehicles and roads, and vehicles and cloud platforms through various mobile communication technologies [2]. As an important part of the smart city network, IoV plays a clear role in intelligent transportation and autonomous driving [3]. Governments and enterprises in many countries are working together towards an intelligent IoV [4,5].
As the early automobile was a relatively closed and independent system, automobile network security did not attract much attention. Nowadays, with the continuous development of IoV technology, network attacks against cars are increasingly frequent. Attackers can attack a vehicle through either physical or remote access and take control of the car, which seriously threatens the normal running of the car and the life of the driver [6]. Frequent information exchange between vehicles and the outside world leads to more and more types of external vehicle interfaces, which in turn leads to an increasing number of attack paths against vehicle-mounted networks (VMNs) [7].
Several well-known experimental attacks have been conducted in the past few years, including an attack on the control systems of the Ford Escape and Toyota Prius in 2013 [8] and a remote attack on more than 20 models, carried out to estimate the difficulty of remote exploitation for these vehicles [9]. Later, a demonstration of remote control of steering and braking on Fiat-Chrysler vehicles was presented, forcing Fiat to urgently recall 1.4 million vehicles [10]. In 2016, a Jeep’s powertrain and steering wheel were successfully interfered with by injecting attack messages through its on-board diagnostic (OBD) interface [11]. A Tesla was also attacked through a security vulnerability that allowed the attacker to acquire the location of the vehicle and remotely control it [12,13]. In 2018, both physical-contact and remote attacks were carried out on a number of BMW models to control the vehicles [14]. In 2020, a new key-cloning technique, called the Relay Attack, was developed for Tesla cars and demonstrated on a Tesla Model X electric car [15]. From 2016 to 2019, the number of security incidents involving intelligently connected vehicles increased sevenfold, with the incidents in 2019 increasing by 99% compared with those in 2018. In 2019, 82% of security incidents were caused by remote attacks.
In addition to the VMN, the intelligently connected vehicle platform, and the network-level platform and terminals, the IoV system also includes various ECUs (Electronic Control Units). Common ECUs include body control, engine control, and airbag units [16], which are connected through internal communication buses and constitute an on-board network system. Common on-board network protocols include the CAN (Controller Area Network) bus, the FlexRay bus, and the MOST (Media Oriented Systems Transport) bus [17], among which the CAN bus is the most widely used at present and has become the de facto standard [18].
Attack messages sent by attackers are transmitted through the VMN, so preventing intrusion via the VMN is the most important task in the area of vehicle information security [19,20]. Since data for different functions are transmitted periodically over the CAN bus, an intruder can attack a function and control the car through replay, without needing to master the CAN protocol [21]. Currently, attacks on the vehicle-mounted CAN network include discarding, tampering, reading, spoofing, flooding, and replaying. Discarding means that the attacker deletes key message data on the CAN bus, interfering with the normal operation of the vehicle. Tampering means that the attacker modifies the data content of a CAN message, causing the car to follow the wrong instructions. Reading means that the attacker obtains real data on the CAN bus through a compromised node. Spoofing means that the attacker uses a compromised node to send diagnostic and attack information in order to occupy ECU resources. Flooding means that high-priority messages are sent to the CAN bus at high frequency, occupying the bus, preventing other nodes from sending messages normally, and eventually causing the bus network to collapse. Replaying means that the attacker takes control of an ECU and reloads its message data onto the CAN bus.
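The effect of flooding can be illustrated with a toy arbitration model. The sketch below relies only on the CAN rule that, when several nodes start transmitting at once, the frame with the numerically lowest identifier wins the bus. The function name `simulate_bus`, the one-frame-per-slot timing, and the example identifiers are illustrative assumptions and are not taken from the datasets used in this paper.

```python
def simulate_bus(normal_queues, attacker_id=None, slots=10):
    """Toy CAN bus: one frame is transmitted per slot (a simplification).

    normal_queues maps each legitimate CAN ID to the number of frames it
    wants to send. If attacker_id is given, the attacker offers a frame in
    every slot (flooding). Returns the IDs that actually reached the bus.
    """
    remaining = dict(normal_queues)
    sent = []
    for _ in range(slots):
        contenders = [cid for cid, n in remaining.items() if n > 0]
        if attacker_id is not None:
            contenders.append(attacker_id)
        if not contenders:
            break
        # CAN arbitration: the numerically lowest ID has the highest priority.
        winner = min(contenders)
        sent.append(winner)
        if winner != attacker_id:
            remaining[winner] -= 1
    return sent


# Hypothetical IDs: two legitimate nodes and an attacker using ID 0x000.
normal_traffic = simulate_bus({0x2C0: 2, 0x316: 2})
flooded_traffic = simulate_bus({0x2C0: 2, 0x316: 2}, attacker_id=0x000)
print([hex(i) for i in normal_traffic])
print([hex(i) for i in flooded_traffic])
```

In this model the legitimate frames are never transmitted while the attacker keeps offering a lower identifier, which is the starvation effect described above.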
The methods and technologies of vehicle-network information security can be divided into encryption and authentication technology, security architecture standards, and network intrusion detection technologies. In this paper, a method based on network intrusion detection is adopted to detect attacks on the vehicle network.
Intrusion-detection systems can be divided into anomaly-based and misuse-based ones, according to the technology adopted. A misuse-detection system extracts the characteristics and rules of attack behavior to establish a feature rule. During intrusion detection, if the behavior of the system matches the feature rule, it is considered an attack; otherwise, it is normal. An anomaly-detection system first establishes a characteristic rule of normal behavior and sets a threshold value. At detection time, the system compares the observed behavior with the normal profile. If the difference is greater than the threshold, it is an attack; otherwise, it is normal. A misuse-based method must update its library of attack-behavior characteristics in real time; otherwise, it may not be able to detect attacks that are not present in the library [22,23]. In contrast, anomaly detection does not require periodic updates and can detect attacks as long as the normal behavior of the VMN can be successfully defined. Therefore, it is more suitable for a vehicle-mounted CAN network. When the difference between the received message and the predicted result is greater than a threshold value, the message is recognized as an anomaly.
Several anomaly-detection methods have been proposed for the CAN bus. By using detection sensors, a structured anomaly-detection method was proposed to detect the CAN identifier and message frequency [24]. In another piece of research, attackers were prevented from analyzing and tampering with ECU code by using a Markov-chain decision model on the encrypted storage system of the ECU [25]. An anomaly-detection method based on information entropy [26] and a frequency-based anomaly-detection method [27] were also proposed. To detect malicious attack messages in time, a lightweight intrusion-detection algorithm based on analyzing the messages’ time intervals was proposed, with a response time of less than 1 ms [28]. Larson et al. proposed a CAN bus intrusion-detection method based on protocol-level security rules between ECUs to detect ECU exceptions [29]. Murvay et al. proposed a method to identify the sending ECU by analyzing the characteristics of the CAN bus signals [30]. In addition to traditional methods, neural networks have also been used to detect anomalous intrusions. Taylor et al. proposed an anomaly-detection method based on the LSTM (Long Short-Term Memory) neural network, in which the network was trained on message content to predict the content of the next message [31]. One-dimensional (1D) CNN models can also be used to process 1D data, and good results have been obtained [32,33,34,35]. However, the 1D convolution operation is not able to identify the time connections between the data. Therefore, in intrusion detection, only 2D CNNs were used to process 1D sequential data. For example, an intrusion-detection model based on a deep convolutional neural network was proposed by Song et al. [36]. However, there are still problems when using a CNN for this task, such as how to convert the 1D sequential data into 2D grid data that are easier for the CNN to recognize [37].
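The threshold principle of anomaly detection described above can be sketched for periodic CAN traffic as follows. This is a simplified illustration, not the algorithm of any cited work: it assumes that each CAN ID is sent with a roughly constant period, learns that period from attack-free timestamps, and flags a message when its inter-arrival gap deviates from the learned period by more than a relative tolerance. The names `learn_periods`, `is_anomalous`, and `tolerance` are hypothetical.

```python
def learn_periods(timestamps_by_id):
    """Estimate the mean inter-arrival time of each CAN ID from normal traffic."""
    periods = {}
    for can_id, stamps in timestamps_by_id.items():
        if len(stamps) < 2:
            continue  # not enough data to estimate a period
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        periods[can_id] = sum(gaps) / len(gaps)
    return periods


def is_anomalous(can_id, gap, periods, tolerance=0.5):
    """Return True if the observed gap deviates too much from the normal profile.

    tolerance is a relative threshold (0.5 means 50% of the learned period);
    its value here is an arbitrary assumption for illustration.
    """
    expected = periods.get(can_id)
    if expected is None:
        return True  # an ID never seen in normal traffic is treated as anomalous
    return abs(gap - expected) > tolerance * expected


# Hypothetical normal traffic: ID 0x316 arrives every 10 ms.
profile = learn_periods({0x316: [0.00, 0.01, 0.02, 0.03]})
print(is_anomalous(0x316, 0.0101, profile))  # close to the learned period
print(is_anomalous(0x316, 0.0010, profile))  # far too fast, e.g. injected frames
```

A real detector would also need to handle jitter, event-triggered messages, and bus load, which this sketch ignores.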
In view of this, this paper focuses on the security of a vehicle-mounted CAN network from the perspective of intrusion-detection technology, combining a deep learning model with a novel Mosaic pattern-based coding method. As a CNN is good at dealing with grid data, such as images, whereas CAN traffic is typically sequential data, a novel pattern-based coding method is proposed in this paper so that the CNN can extract the data characteristics more effectively. The main contributions of this paper include:
A 2D Mosaic-coding method was proposed for converting the 1D attack data into 2D grid data, to make full use of the ability of a CNN to extract features from grid data while maintaining the sequential time relationship of the data.
Different thresholds higher than 0.5 were set to effectively test the reliability of the model.
Extensive experiments were carried out to show that our method achieves better performance with higher classification capability, and that it is more reliable and stable in identifying intruders’ attacks than the previous method.
The rest of this paper is organized as follows. The next section briefly describes the background of the CNN, the CAN data, and the previous CNN model for this problem. Section 3 describes our proposed method. In Section 4, the effectiveness of the proposed method is evaluated, and the performance of the proposed and existing methods is compared and discussed. Section 5 concludes the paper.
4. Experimental Result and Discussions
Our CNN model, based on Mosaic pattern coding strategy, was tested in a series of experiments. We evaluated the effectiveness of the proposed method, made comparisons with other methods for the same problem, and discussed the accuracy as well as performance of the existing method and ours.
4.1. Experimental Result
The data were tested over all thresholds from 0.5 to 0.9, and the numbers of TP, TN, FP, and FN in the confusion matrix were then worked out. After that, each indicator expressed in Equations (7)–(12) was calculated. For example, when we used the 4 × 4 Mosaic 6 × 6 data-grid-coding method to classify the DoS attack dataset, the results were TP = 16853, TN = 40373, FP = 0, and FN = 52. From these values, we can further evaluate the quality of the model.
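As a worked illustration of how such counts translate into indicators, the sketch below computes Precision, Recall, F1_score, and FNR from the DoS example. It uses the standard textbook definitions, which are assumed here to agree with Equations (7)–(12); Accuracy and UR are omitted because they also depend on the number of unrecognized samples, which is not given for this example. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard indicators derived from confusion-matrix counts.

    tn is accepted for completeness; the four indicators below do not use it.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # share of attacks that were missed
    return {"precision": precision, "recall": recall, "f1": f1, "fnr": fnr}


# Counts reported for the 4 x 4 Mosaic 6 x 6 data-grid coding on the DoS dataset.
m = confusion_metrics(tp=16853, tn=40373, fp=0, fn=52)
for name, value in m.items():
    print(f"{name}: {value:.4%}")
```

With FP = 0 the Precision is exactly 1, so the remaining error comes entirely from the 52 missed attack samples, which is why FNR is the most informative indicator in this case.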
Table 2 and Table 3 show the results only for the threshold values of 0.5 and 0.9. The coding methods used in the tests can be divided into the following four categories: (1) sequential coding of size 16 × 16; (2) sequential coding of size 29 × 29; (3) Mosaic coding with 4 × 4, 6 × 6, and 8 × 8 Mosaic patterns, with each ID being converted to a 4 × 4 Mosaic pattern; and (4) Mosaic coding with 4 × 4, 6 × 6, and 8 × 8 Mosaic patterns, with each ID being converted to a 6 × 6 Mosaic pattern. Coding methods (1) and (3) correspond to an 11-bit ID, whereas methods (2) and (4) correspond to a 29-bit ID. In the sequential-coding method, taking the 16 × 16 sequence as an example, each 11-bit CAN ID was first extended to 16 bits by appending 5 zeros, and then 16 such CAN IDs were put together sequentially to form a 16 × 16 sequence data grid. In the Mosaic-coding method, taking the 4 × 4 Mosaic 4 × 4 data grid as an example, each 11-bit CAN ID was first extended to 16 bits by appending 5 zeros and then grid-coded as a 4 × 4 Mosaic pattern; finally, 16 such Mosaic patterns were grid-coded again to form a 4 × 4 data grid.
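The two coding schemes can be sketched in a few lines of plain Python. The sketch assumes most-significant-bit-first ordering, row-major filling inside each Mosaic pattern, and row-major placement of the patterns in the data grid; these ordering choices and the function names are assumptions for illustration, since the exact layout is defined in Section 3.

```python
def id_to_bits(can_id, id_bits=11, padded_bits=16):
    """Convert a CAN ID to a bit list (MSB first) and append zeros up to padded_bits."""
    bits = [(can_id >> (id_bits - 1 - i)) & 1 for i in range(id_bits)]
    return bits + [0] * (padded_bits - id_bits)


def sequential_grid(can_ids, id_bits=11, padded_bits=16):
    """Sequential coding: one padded ID per row, e.g. 16 IDs give a 16 x 16 grid."""
    return [id_to_bits(cid, id_bits, padded_bits) for cid in can_ids]


def mosaic_grid(can_ids, id_bits=11, block=4, blocks_per_side=4):
    """Mosaic coding: each ID becomes a block x block pattern, and the patterns
    are tiled into a blocks_per_side x blocks_per_side data grid."""
    assert len(can_ids) == blocks_per_side ** 2, "need one ID per Mosaic pattern"
    side = block * blocks_per_side
    grid = [[0] * side for _ in range(side)]
    for k, cid in enumerate(can_ids):
        bits = id_to_bits(cid, id_bits, block * block)
        top = (k // blocks_per_side) * block    # row offset of the k-th pattern
        left = (k % blocks_per_side) * block    # column offset of the k-th pattern
        for i, bit in enumerate(bits):
            grid[top + i // block][left + i % block] = bit
    return grid


# Hypothetical window of 16 consecutive 11-bit IDs (first ID 0x7FF, the rest 0).
ids = [0x7FF] + [0x000] * 15
seq = sequential_grid(ids)
mosaic = mosaic_grid(ids)
for row in mosaic[:4]:
    print(row[:4])
```

Under these assumptions, a 29-bit ID would use `id_bits=29` with `block=6`, so that each ID is padded to 36 bits. In the sequential grid, one ID occupies a single row, whereas in the Mosaic grid it occupies a compact square region, which is the property the 2D convolution is meant to exploit.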
As can be seen from the experimental results in Table 2 and Table 3, although different Mosaic-coding methods led to slightly different results, our method generally achieved better results than the direct 16 × 16 or 29 × 29 sequential coding, with the best results always given by our method. Regardless of whether the 11-bit or 29-bit CAN ID was used, the 4 × 4 data-grid-coding method showed the best results for the DoS attack, the 8 × 8 data-grid-coding method gave the best results for the Fuzzy attack, and the 6 × 6 data-grid-coding method was the best for the Gear and RPM attacks. This is because the Fuzzy attack is the most difficult to detect, followed by the Gear and RPM attacks, while the DoS attack is the easiest to detect. Therefore, a more difficult attack may need a more complicated network to detect it.
To show the good performance of our method, Table 4 gives the percentage improvement of our method over the sequential-coding method for each performance index, with different ID lengths and threshold values. We can see from this table that our method always achieved better results on all the performance indexes. The improvement is more prominent for UR and FNR, as the other indexes obtained by the sequential-coding method were already quite good and could not be improved much further. Therefore, the coding method proposed in this paper is superior to the direct sequential-coding method in overall performance. We also found that, when Mosaic coding is adopted, the 29-bit coding method is slightly better than the 11-bit coding method, which may be because the 29-bit CAN ID is processed with a 6 × 6 grid size, resulting in larger model inputs and more model parameters.
When the threshold was increased from 0.5 to 0.6, 0.7, 0.8, or 0.9, the outputs that did not meet the judgment conditions could not be classified and were discarded as unrecognized samples. The higher the threshold, the more samples are discarded, resulting in a larger UR value. According to our results, as the threshold increased, the values of Precision, Recall, and F1_score all increased, while the values of Accuracy and FNR decreased. Accuracy decreased because it was computed as the sum of TP and TN divided by the total number of patterns processed in that test. When the threshold increased, the sum of TP and TN decreased while the total stayed the same, leading to a reduction in Accuracy. According to the experimental results, our coding method led to a smaller UR value, indicating that our model had a better discrimination rate.
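The interaction between the threshold, UR, and Accuracy can be sketched as follows. The decision rule is an assumption made for illustration: the model is taken to output a probability that a sample is an attack, and a sample is accepted only if the probability of one of the two classes reaches the threshold; otherwise it is counted as unrecognized. The function names and the example scores are hypothetical.

```python
def classify_with_rejection(p_attack, threshold=0.5):
    """Return 'attack', 'normal', or 'unrecognized' for one model output."""
    if p_attack >= threshold:
        return "attack"
    if 1.0 - p_attack >= threshold:
        return "normal"
    return "unrecognized"


def evaluate(scores, labels, threshold):
    """Return (accuracy, ur); labels use 1 for attack and 0 for normal.

    Accuracy is (TP + TN) divided by all processed samples, so unrecognized
    samples lower it even though they are not misclassified.
    """
    correct = unrecognized = 0
    for p, y in zip(scores, labels):
        decision = classify_with_rejection(p, threshold)
        if decision == "unrecognized":
            unrecognized += 1
        elif (decision == "attack") == (y == 1):
            correct += 1
    total = len(labels)
    return correct / total, unrecognized / total


# Hypothetical model outputs for two attack and two normal samples.
scores = [0.95, 0.70, 0.20, 0.05]
labels = [1, 1, 0, 0]
for t in (0.5, 0.9):
    acc, ur = evaluate(scores, labels, t)
    print(f"threshold={t}: accuracy={acc:.2f}, UR={ur:.2f}")
```

In this toy example, every sample is classified correctly at a threshold of 0.5, whereas at 0.9 the two less confident outputs are rejected, so UR rises and Accuracy falls without any sample being misclassified.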
To give an overall view of how the performance of the models changes under different thresholds, the changes of each performance index on the Fuzzy dataset are shown in Figure 9.
We can see from Figure 9 that the performance indexes change less across thresholds for the Mosaic pattern coding than for the sequential coding. Precision was nearly unchanged with the threshold when the Mosaic pattern coding method was used, but it changed for the sequential-coding method, especially on the 11-bit ID dataset. The other performance indexes changed slightly with the Mosaic pattern coding method, but much more with the sequential-coding method. In all cases, the Mosaic pattern coding method performed better than the sequential-coding method, and the performance of the sequential-coding method on the 11-bit ID dataset changed the most with the threshold. These results show that the proposed Mosaic pattern coding method not only has better performance and a higher discrimination rate but also depends less on the threshold value.
4.2. Discussion
A convolutional neural network is good at processing 2D grid data, such as images. To make the CAN ID data meet the 2D grid structure required as input to the CNN model, some researchers directly packed the data in sequence. This kind of data processing is unable to make full use of the advantages of the CNN in pattern extraction. Therefore, this paper proposed a novel data-coding method that converts each 1D ID to a 2D data block and joins these blocks together in time order to form a Mosaic pattern that reflects the time association. This makes it easier for the CNN to extract the data patterns effectively, while also maintaining the time characteristics among the data.
To evaluate the reliability of different models, our model was compared with the sequential-coding model. As the UR value indicates how well a model can distinguish patterns under a stricter standard, Figure 10 shows how UR changes with the threshold for both the Mosaic-coding and the direct sequential-coding methods. It can be seen from this figure that the UR value increases with the threshold in both methods: the larger the threshold, the higher the UR value, meaning that more data were unrecognized under the stricter standard. However, the increase in UR with the Mosaic-coding method is much lower than that with the sequential-coding method. For example, even when the threshold equals 0.9, the UR value with the Mosaic-coding method is still lower than 0.2%. This means that the classification capability of the proposed model is much higher than that of the previous one, which indicates that the Mosaic-coding method is more reliable and stable in identifying intruders’ attacks.
In recent years, with the development of machine learning, more and more researchers have used machine-learning methods for intrusion-detection research [47]. To further verify the feasibility of the proposed method, the Mosaic-coded CNN method was also compared with some other classical machine-learning algorithms, using the 29-bit CAN ID. The results obtained from an Artificial Neural Network (ANN) with 2 hidden layers, an LSTM with 256 hidden units, a Support Vector Machine (SVM), a K-Nearest Neighbor (KNN) classifier with K = 5, Naive Bayes (NB), and a Decision Tree (DT) [36] were used for comparison in this paper. The Markov-Transition-Field (MTF) method [48] was also used to convert each binary CAN ID sequence into a 2 × 2 grid image for comparison. The experimental results are shown in Figure 11a–d.
The experimental results showed that, in general, our method performed slightly better than or as well as the other machine-learning algorithms on Precision, Recall, F1_score, and Accuracy. For FNR, our method always performed much better than KNN, NB, and DT. However, it was sometimes slightly worse than ANN, LSTM, or SVM on some datasets. This is because we used only the simplest CNN model in this paper, which has certain limitations and could be further optimized.
Finally, to compare the running time of the Mosaic-pattern-coding method with that of the sequential-coding method, Table 5 gives the time (in seconds) taken to run the program. The program was run on a laptop with an Intel(R) Core(TM) i7-4510U CPU @ 2.00 GHz; the machine has two cores and four threads, and the program ran on its CPU.
It can be seen from Table 5 that the running time differs with the Mosaic size and coding method. Compared with the sequential-coding method, the Mosaic-coding method generally takes a little longer to run. This is because, regardless of whether the 4 × 4 data grid (for an 11-bit CAN ID) or the 6 × 6 data grid (for a 29-bit CAN ID) was used in our method, there were always some redundant bits in the grid, which consumed some of the running time. We also find from Table 5 that the 6 × 6 Mosaic pattern took slightly longer to run than the 4 × 4 or 8 × 8 Mosaic patterns, for both the 11-bit and 29-bit CAN IDs. This is mainly because the total number of running cycles, the number of samples, and the number of convolutions differed between coding methods, and their product was the largest for the 6 × 6 Mosaic patterns. However, this only reflects the training time. After the networks have been trained, the test time is much shorter. The test time for one sample, along with the number of parameters in the model for each method, is shown in Figure 12, which shows that all test times were on the order of sub-milliseconds. If a slightly higher-performance computer were used, they would be further reduced. Therefore, our model can meet the timing requirement of detecting intrusion attacks in real time. We also find in this figure that the test time of the model is mainly determined by the size of the input data and the complexity of the model itself. When the structure of the model changes, the test time will inevitably change. For example, when the number of convolution layers or the number of fully connected layers increases, the test time of the model will also increase.
4.3. Some Limitations of the Method
The first limitation is related to the training dataset. As the data used in this paper are composed of four independent datasets, each containing only one attack type, we conducted model training and testing on each dataset separately. As a result, a model trained on a specific attack dataset is only guaranteed to be effective against that type of attack. To test the performance of a model trained on one dataset and tested on another, we list in Table 6 the accuracy of the model trained using the 8 × 8 Mosaic 6 × 6 data-grid-coding method on one dataset and tested on all four datasets. We can see from this table that a model trained on only one type of attack data detects other types of attacks with decreased accuracy. Therefore, we should be careful when planning to use a model trained on one type of attack data to detect other types of attacks. We shall improve the design in our future work to include all types of attacks.
The other limitation is the possibility of adversarial attacks, in which artificially designed “noise” is added to the samples. Although the modified sample is difficult for human eyes to distinguish directly, such an attack may degrade the model’s classification ability. In fact, a number of studies have shown that adversarial attacks can affect performance in image recognition, natural language processing, malware detection, and other fields [49]. The CNN model would also be vulnerable to adversarial attacks, in which carefully designed input perturbations, at either the training or the test stage, could divert its predictions. Adversarial training is an effective method to defend against adversarial samples. It generates an adversarial sample set by adding various perturbations to the original samples and trains the deep-neural-network model with both the adversarial and original sample sets [50]. As most of the vehicle’s CAN IDs are periodic, the messages on the CAN bus differ when the vehicle is in different driving states. Therefore, we can train the detection model by collecting CAN ID data in different driving states of the vehicle (such as static, braking, and accelerating), so that the model can learn richer and more complex information and become more robust. As the detection models can be trained off-line, in which case adversarial attacks will be classified as attack messages, we did not consider this issue in this paper. However, this is an important issue for us to explore further in the future.