Article

Driving Decisions for Autonomous Vehicles in Intersection Environments: Deep Reinforcement Learning Approaches with Risk Assessment

School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
* Author to whom correspondence should be addressed.
World Electr. Veh. J. 2023, 14(4), 79; https://0-doi-org.brum.beds.ac.uk/10.3390/wevj14040079
Submission received: 4 March 2023 / Revised: 16 March 2023 / Accepted: 20 March 2023 / Published: 23 March 2023
(This article belongs to the Special Issue Recent Advance in Intelligent Vehicle)

Abstract

Intersection scenarios are among the most complex and high-risk traffic scenarios, so it is important to propose a vehicle driving decision algorithm suited to them. Most related studies have focused on explicit collision risks while neglecting potential driving risks. This study therefore proposes a deep-reinforcement-learning-based driving decision algorithm to address these problems. A non-deterministic vehicle driving risk assessment method is proposed for intersection scenarios and introduced into a learning-based intelligent driving decision algorithm, and an attention network based on state information is designed. A typical intersection scenario was constructed in simulation software, and experiments were conducted. The experimental results show that the proposed algorithm can derive a driving strategy that balances driving efficiency and driving safety in the intersection scenario. They also demonstrate that the designed attention network helps the intelligent vehicle perceive its surroundings more accurately, improves its performance, and accelerates convergence.

1. Introduction

Due to rising living standards and economic development, as well as growing urbanization and transportation needs, car ownership is increasing in most countries around the world. According to the International Energy Agency (IEA) [1], global car ownership grew from about 560 million in 2000 to about 1.32 billion in 2020, with China's car ownership already exceeding 300 million. With this growth, society faces many challenges, one of the most serious being the increasing frequency of traffic accidents. According to the World Health Organization's Global Status Report on Road Safety 2018, more than 1.3 million people worldwide die each year in traffic accidents, which are the leading cause of death among children and young people aged 5–29 years [2], and this number is increasing year by year. Among the various driving scenarios, intersections are one of the most frequent sites of traffic accidents due to their complex traffic environment. According to the German In-Depth Accident Study (GIDAS) and other organizations, 40% of road injury accidents occur at intersections [3]. In this context, it is therefore important to propose a vehicle driving decision algorithm applicable to intersection scenarios. The problem of improving traffic safety is complicated by the fact that drivers differ in physical and psychological state, so they perceive and react to risks differently while driving. Automated driving essentially changes the closed-loop human–vehicle–road system, reducing or removing the driver's influence in the system and making it more efficient and safer. Therefore, replacing the driver with an autonomous driving decision algorithm has become a hot research topic.
There have been many studies on intersection driving problems [4]. For example, Li et al. [5] proposed a deep-reinforcement-learning-based driving decision framework that uses convolutional neural networks to build an end-to-end decision pipeline, taking traffic images as input to derive driving strategies at unsignalized intersections. Seong et al. [6] proposed an attention-based deep reinforcement learning method for driving decisions that uses local vehicle perception data as input to derive driving strategies at unsignalized intersections. Many driving decision methods have attempted to use various kinds of vehicle perception data as input. However, driving strategies that rely on vehicle perception data are difficult to apply in natural or more complex driving environments because such data are affected by changes in the driving environment. Many researchers instead use vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) technologies to derive driving strategies by sharing the state and environment information of individual vehicles at intersections through communication [7]. V2V and V2I technologies can transmit traffic information from intersections to intelligent vehicles regardless of weather conditions. This study assumes that the status and environmental information of each vehicle at the intersection is transmitted to the intelligent vehicle through such communication.
As shown in Figure 1, current intelligent driving decision methods can be categorized into three types according to the technical approach: rule-based decision methods, risk-assessment-based decision methods, and learning-based decision methods. The rule-based decision method, as the most traditional and common approach, can handle most regular driving scenarios but lacks the flexibility and applicability to deal with unexpected situations. On this basis, researchers have proposed intelligent driving decision methods based on risk assessment. In recent years, with the rapid development of machine learning technology, researchers have begun to explore learning-based decision methods to solve the intelligent driving decision-making problem. However, learning-based decision methods lack consideration of uncertainty issues such as traffic rules and driving risks. Therefore, this paper integrates risk-assessment-based and learning-based driving decision methods and proposes an intelligent driving decision algorithm based on deep reinforcement learning, aiming to derive an autonomous driving decision strategy with low expected risk and high driving efficiency to meet the driving challenges of intersection scenarios.

1.1. Rule-Based Decision Methods

The rule-based approach is the most traditional and common approach in driving decision-making. It builds a rule base from regular driving habits and traffic regulations and develops corresponding driving strategies for different driving situations. Furda et al. [8] proposed a method based on Petri nets and multi-criteria decision-making to solve the problem of real-time decision-making for autonomous driving. The proposed method obtains a set of feasible alternative driving decisions through Petri nets and then uses a multi-criteria decision-making method to select the best automatic driving decision from them. Chong et al. [9] proposed a rule-based neural network model to simulate the driver's driving behavior, using a fuzzy rule-based neural network to extract rules for driver behavior decisions from the driver's vehicle trajectory. Li et al. [10] developed a decision model for automatic driving behavior in emergency situations based on the T-S fuzzy neural network. Although the rule-based driving decision method can satisfy regular driving situations, it cannot cope with unexpected situations that fall outside the rule base. Therefore, it lacks a certain degree of flexibility and adaptability [11].

1.2. Risk-Assessment-Based Decision Methods

Decision-making methods based on risk assessment are generally used to determine driving strategies by assessing the risk profile of the current driving state. Currently, risk assessment is divided into two main categories: deterministic risk assessment and uncertainty risk assessment.
Deterministic risk assessment typically combines multiple parameters to obtain a numerical value or a set of risk-level regions representing the risk magnitude. Hillenbrand et al. [12] assessed risk for driving decisions by tracking vehicles in the area relevant to the main vehicle's lane change and calculating the time to collision (TTC) between them. Glaser et al. [13] used TTC and time headway (THW) to assess the risk of the primary vehicle's possible driving behaviors in the current driving environment and ranked the candidate behaviors according to the assessment results to select the best driving strategy. Lee et al. [14] similarly used the relative speed and position information between vehicles to calculate risk coefficients and, on this basis, incorporated the acceleration and speed-change characteristics of surrounding vehicles into the risk assessment to help vehicles make reasonable lane change decisions. Many other researchers have established risk assessment methods based on different risk factors and from different perspectives [15,16,17].
Uncertainty risk assessment usually represents the degree of risk with mathematical probability models. Many risk assessment studies assume that the state of the surroundings of the autonomous vehicle is constant, but the driving states of surrounding vehicles in a realistic environment are uncertain. It is this uncertainty that makes probabilistic model-based risk assessment methods important. Schubert et al. [18] proposed a driving risk assessment method that considers sensor uncertainty, evaluates the risk of vehicle lane change behavior based on Bayesian networks, and provides the driver with suggestions for lane change operations, thus improving the safety and reliability of vehicle driving. Kim et al. [19] proposed a probabilistic threat assessment method based on road traffic information to predict and avoid possible collisions in multi-vehicle traffic. The proposed collision probability algorithm follows the basic idea of particle filtering and implements the numerical calculation of collision probability. Noh et al. [20] proposed an automated driving decision framework for highway environments. The framework robustly assesses the potential risk of a collision for a given highway condition and determines the appropriate driving strategy for that situation. The proposed risk assessment method takes into account the uncertainty of the input data and infers the potential crash risk of the driving environment based on Bayesian networks. In addition, there are probabilistic risk assessment methods based on Markov models, Gaussian processes, and deep learning [21,22].

1.3. Learning-Based Decision Methods

With the rapid development of machine learning in recent years, researchers have started to experiment with learning-based approaches to solve the problem of intelligent vehicle driving decisions. These approaches are further divided into imitation learning approaches and reinforcement learning approaches depending on the learning objectives. In the area of imitation-based learning research, Xu et al. [23] conducted a large-scale study on driving behavior learning and tested it using the BDDV dataset with image segmentation as an additional task of the network. The study achieved good accuracy in action mapping. However, in real driving situations, there are multiple solutions. For example, when crossing an intersection, there may be multiple ways of driving a vehicle. Codevilla et al. [24] proposed a conditional imitation learning approach, which inputs higher-level decision commands as conditions into the imitation learning framework to derive driving strategies for autonomous vehicles. It has been tested in both simulation and realistic environments with good results. However, imitation learning-based approaches usually rely on a large amount of data for training, which is costly to collect on the one hand. On the other hand, the collected data are usually related to the subjective judgment of the driver, and therefore, optimal driving decisions are not always obtained.
In recent years, reinforcement learning has been increasingly used for autonomous driving behavior decision-making. Mircheveska et al. [25] proposed a reinforcement learning method applying random forests for autonomous driving in highway scenarios. Mukadam et al. [26] proposed a deep Q-learning-based method to solve the problem of automatic vehicle lane changing in a multi-lane, multi-vehicle environment. The proposed method takes the vehicle state and the surrounding environment state as network inputs, and the output is a score for each of five driving behaviors. The performance and generalization capability of the algorithm were verified in a SUMO simulation environment. Hu et al. [27] further proposed a training method based on a multi-agent framework, aiming to obtain a more diverse training environment, and trained and validated it on a lane merging task. The method extends the driving environment to more complex traffic scenarios by introducing multiple agents to control the other vehicles, thus improving the diversity and realism of the training data. Bouton et al. [28] proposed a reinforcement learning method with probabilistic guarantees that constrains the agent's action selection using a desired probability specification expressed in linear temporal logic (LTL), enabling a more realistic training environment for autonomous driving at intersections involving multiple participants. Most current reinforcement learning work is still explored in fairly idealized lane scenarios and lacks consideration of traffic rules, driving risks, and other uncertainty issues. Further research is still needed on how to introduce reinforcement learning into more complex traffic environments and realistic driving scenarios.

1.4. Contribution

This paper aims to design a risk-aware intelligent driving decision algorithm based on deep reinforcement learning to derive an autonomous driving decision strategy with low expected risk and high driving efficiency in intersection scenarios. To this end, this paper integrates risk-assessment-based and learning-based driving decision methods and introduces a non-deterministic vehicle driving risk assessment method into the learning-based driving decision method to address its lack of consideration of uncertainty issues such as driving risk. The research in this paper constructs an intersection driving scenario in the RoadRunner simulation software and conducts experiments to verify the effectiveness of the proposed method. The main work of this paper is as follows.
  • Using a driving simulation platform, a typical intersection driving scenario is established and used for training and testing of reinforcement learning for autonomous driving;
  • Based on the Bayesian probability theory, this paper proposes a vehicle driving risk assessment method for intersection driving scenarios and incorporates the method into a learning-based driving decision method;
  • To improve driving safety, this paper proposes a driving policy learning algorithm based on state-action value distributed deep q-networks and introduces an attention network based on state information;
  • This paper is divided into six main sections. Section 1 is an introduction that describes the research background and the current state of research on rule-based, risk-assessment-based, and learning-based driving decision methods. Section 2 introduces the intersection driving scenario constructed using a driving simulation platform. Section 3 introduces the proposed vehicle driving risk assessment method based on the Bayesian probability theory for the intersection scenario. Section 4 presents the proposed driving strategy learning algorithm based on state-action value distributional deep Q-networks, in which the relevant state space, action space, reward function, and neural network are systematically analyzed and designed. Section 5 presents the experimental design and result analysis. Section 6 concludes the paper, summarizing the research results and their limitations and looking toward future work.

2. Simulation Environment

In this paper, we study vehicle driving strategies in intersection scenarios through deep reinforcement learning methods. As a self-learning method, reinforcement learning requires an agent to interact continuously with the environment in order to learn strategies and achieve specific goals. We simulated the vehicle driving environment using the RoadRunner simulation software from MathWorks. This software can simulate various real-world traffic scenarios and also supports developers in building custom driving scenarios and interaction behaviors by writing scripts, helping developers test and optimize autonomous driving algorithms and advancing autonomous driving technology. As one of the most complex traffic environments and accident-prone traffic scenarios, intersections present complex and diverse causes of traffic accidents, but according to relevant studies, driver error is one of the most important. According to statistics, about 96% of traffic accidents are caused by driver misoperation, such as misunderstanding of traffic signs, negligence, and violation of traffic rules [29]. Therefore, considering the impact that misoperation by the other vehicles in the scenario has on the intelligent vehicle, the intersection under study is assumed to be an unprotected intersection in this paper.

2.1. Experimental Scene Construction

The complexity of an intersection scenario comes from its dense traffic flow and complex traffic conditions. Therefore, the intersection simulation design requires careful consideration of various factors. In order to improve the simulation effect of the experiment and enhance the generalization ability of the intelligent driving strategy obtained from training, the intersection scenario designed in this paper is a cross-shaped intersection connecting eight roads from different driving directions, each containing two parallel sub-lanes in the same direction, as shown in Figure 2. The starting and ending points of the intelligent vehicle are set according to three path scenarios: left turn, straight ahead, and right turn. Other surrounding vehicles are randomly generated in the lanes at random speeds and pass through the intersection. The intelligent vehicle needs to learn to pass through the designed intersection scenario safely and efficiently.
This study proposes a hierarchical framework for vehicle motion behavior control in the intersection scenario. The intelligent driver model (IDM) controls the surrounding vehicles in the intersection scenario. However, this model only considers the interaction of vehicles within the same lane. Therefore, when constructing the intersection scenario, special attention must be paid to avoiding lateral collisions among surrounding vehicles. We used the following simplification strategy to address this issue: using a fixed-speed model, each surrounding vehicle predicts the positions of its neighboring vehicles over the next 3 s, and when a risk of collision with a neighboring vehicle is detected, it decides whether to yield based on road priority and braking performance until the risk of collision is eliminated. Yielding is essential to traffic law in that it helps ensure safe traffic flow and prevents accidents. The lower-level model also includes speed and steering controllers that enable surrounding vehicles to travel according to the target speed and target lane given by the upper-level control model.
A hierarchical framework is used for the control of the motion behavior of intelligent vehicles, with the upper layer being the driving policy managed by the deep reinforcement learning model and the lower layer being responsible for controlling the controllers for longitudinal and lateral motion, respectively, which together with the surrounding vehicles form the entire control system.

2.2. Vehicle Behavior Control

In this study, the driving motions of the intelligent vehicle and its surrounding vehicles are managed by a hierarchical control framework. The IDM model in the upper layer of the surrounding vehicle framework manages the longitudinal driving behavior.
In the environment constructed in this paper, the surrounding vehicles use a simple model that matches actual driving behavior and governs their acceleration and steering. The IDM model, a typical representative of microscopic traffic models, is used here to achieve collision-free car following. The longitudinal behavior of the surrounding vehicles is therefore controlled by the IDM model, whose longitudinal acceleration is given as follows [30].
$$\dot{v} = a \left[ 1 - \left( \frac{v}{v_0} \right)^{\delta} - \left( \frac{d^*}{d} \right)^{2} \right]$$
In the equation, $v$ and $\dot{v}$ represent the travel speed and acceleration of the surrounding vehicle, respectively; $a$ is the maximum acceleration of the surrounding vehicle; $d$ denotes the relative distance between the surrounding vehicle and the vehicle in front of it; $\delta$ is the acceleration index; $v_0$ is the desired target speed; and $d^*$ is the desired target relative distance, which depends on the vehicle ahead:
$$d^* = d_0 + T v + \frac{v \, \Delta v}{2 \sqrt{a b}}$$
where $d_0$ represents the safe distance from the vehicle ahead; $T$ represents the driver's reaction time; $\Delta v$ represents the speed difference relative to the vehicle ahead; and $b$ represents the maximum deceleration rate of the vehicle.
In this paper, the relative velocities and distances in the IDM model are predefined to induce velocities and accelerations within each time step. The default parameters are set as follows: maximum acceleration $a = 6\ \mathrm{m/s^2}$; maximum deceleration $b = 5\ \mathrm{m/s^2}$; acceleration index $\delta = 4$; relative safety distance from the vehicle in front $d_0 = 10\ \mathrm{m}$; and driver's reaction time $T = 1.5\ \mathrm{s}$.
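To make the car-following behavior concrete, the following Python sketch evaluates the IDM acceleration of a surrounding vehicle using the two equations and the default parameters above; the function name and the example inputs are illustrative assumptions rather than the actual simulation code used in this paper.

```python
import math

def idm_acceleration(v, v_lead, d, v0=9.0, a_max=6.0, b=5.0, delta=4, d0=10.0, T=1.5):
    """IDM longitudinal acceleration sketch using the default parameters above.

    v      : speed of the surrounding vehicle [m/s]
    v_lead : speed of the vehicle ahead [m/s]
    d      : gap to the vehicle ahead [m]
    """
    dv = v - v_lead                                              # closing speed (delta v)
    d_star = d0 + T * v + (v * dv) / (2 * math.sqrt(a_max * b))  # desired gap d*
    return a_max * (1 - (v / v0) ** delta - (d_star / d) ** 2)

# Example: following a slower vehicle 30 m ahead
print(idm_acceleration(v=8.0, v_lead=5.0, d=30.0))
```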
In the environment constructed in this study, the intelligent driving vehicle motion control system consists of a decision layer and a control layer. The decision layer uses deep reinforcement learning algorithms to generate driving decisions based on the input environmental information to determine the driving behavior of the intelligent vehicle (including accelerating, decelerating, and maintaining the current state). The control layer uses the relevant speed controller to realize the intelligent vehicle driving according to the target speed given by the decision layer. The following mathematical expression can represent its relevant mathematical model.
$$v_{target} = v + i \, \Delta v$$
where $v_{target}$ represents the target driving speed that the intelligent vehicle needs to achieve through acceleration or braking behavior; $v$ represents the current driving speed of the intelligent vehicle; $i$ represents the speed variation coefficient, with $i = 1$ when the decision layer selects acceleration, $i = -1$ when it selects deceleration, and $i = 0$ otherwise; and $\Delta v$ represents the speed increment under intelligent vehicle control, which is related to the policy frequency of the algorithm and determines the granularity of intelligent vehicle control.
The above is the upper framework of the motion behavior control framework for intelligent vehicles and surrounding vehicles in the environment constructed in this paper; the upper framework determines the driving behavior, and the lower layer controls the vehicle driving through the motion controller according to the determined driving behavior.

2.3. Vehicle Motion Control

In the environment constructed in this paper, the framework for controlling the motion behavior of intelligent vehicles and surrounding vehicles is divided into an upper and a lower layer. The lower layer is the vehicle motion controller, which controls the vehicle motion according to the driving behavior, target driving speed, and target lane determined by the upper layer. The motion controller is a longitudinal controller that uses a proportional–integral–derivative (PID) control strategy with the following mathematical expression.
$$a = K_p (v_t - v) + K_i \int (v_t - v) \, dt + K_d \frac{d(v_t - v)}{dt}$$
where $a$ represents the vehicle's acceleration; $v$ represents the vehicle's travel speed; $v_t$ is the reference speed; and $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative gains of the controller, respectively.
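As an illustration of this control law, the sketch below implements a discrete-time PID speed controller; the gains and time step are placeholder assumptions, since the tuned values are not reported in this paper.

```python
class PIDSpeedController:
    """Discrete-time sketch of the longitudinal PID speed controller described above."""

    def __init__(self, kp=1.0, ki=0.1, kd=0.05, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, v_ref, v):
        error = v_ref - v                                   # speed tracking error (v_t - v)
        self.integral += error * self.dt                    # accumulated integral term
        derivative = (error - self.prev_error) / self.dt    # finite-difference derivative term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative  # acceleration a

# Example: track a 9 m/s reference speed from a current speed of 7 m/s
controller = PIDSpeedController()
acceleration = controller.step(v_ref=9.0, v=7.0)
```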
In the experimental scenario constructed in this paper, the intelligent vehicle and the surrounding vehicles realize the driving control of the vehicle through the designed two-layer control framework. In this experimental environment, the intelligent vehicle will master how to achieve safe and efficient driving in the designed scenario by continuously trying and learning.

2.4. Experimental Scene Setting

In order to increase the diversity of cases in the experimental scenario and enhance the simulation effect and the generalization of the derived driving strategies, multiple surrounding vehicles are set up according to the scenario characteristics when constructing the experimental scenario; these surrounding vehicles drive in the scenario following the hierarchical vehicle motion behavior control framework proposed above.
The intersection scenario has a high traffic flow density and a complex traffic situation. The scenario designed in this paper includes a cross-shaped intersection and several surrounding vehicles. The intersection connects eight roads from different driving directions, each containing two parallel sub-lanes in the same direction, and covers roughly a 50 m × 50 m square area. Specifically, the intersection scenario designed in this paper includes the following.
  • Intelligent vehicles are generated from the starting point at the beginning of the simulation at a speed of 7 m/s and travel along the road toward the set end point;
  • In the intersection scenario designed in this paper, two to three surrounding vehicles are set up at each road bordering the intersection. The peripheral vehicles are generated and driven at random locations on the road with a random speed in the interval of [0 m/s, 9 m/s];
  • Considering the characteristics of the intersection scenario and the driving task that the intelligent vehicle needs to complete, this paper sets the termination conditions for each round of the intelligent vehicle training process in the intersection scenario as follows: the intelligent vehicle collides, the intelligent vehicle reaches the endpoint, or the driving time reaches 30 s (a minimal sketch of these settings follows this list).
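The sketch below collects these scenario settings and the termination check in Python; the constant names and the spawning helper are illustrative assumptions rather than the actual simulation interface.

```python
import random

EGO_INITIAL_SPEED = 7.0       # m/s, initial speed of the intelligent vehicle
SV_SPEED_RANGE = (0.0, 9.0)   # m/s, random speed interval of surrounding vehicles
MAX_EPISODE_TIME = 30.0       # s, maximum driving time per training round

def spawn_surrounding_vehicle_speed():
    """Draw a random initial speed for a surrounding vehicle."""
    return random.uniform(*SV_SPEED_RANGE)

def episode_done(collided: bool, reached_goal: bool, elapsed_time: float) -> bool:
    """Termination conditions of one training round: collision, goal reached, or timeout."""
    return collided or reached_goal or elapsed_time >= MAX_EPISODE_TIME
```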

3. Vehicle Driving Risk Assessment Methods for Intersection Scenarios

Intersection scenarios are among the most complex and high-risk traffic scenarios. Therefore, it is challenging to propose an evaluation method that can robustly and correctly assess vehicle driving risk. In this paper, we assume that the intersection scenario under study is an unprotected intersection and design a vehicle driving risk assessment method for the intersection scenario based on the Bayesian probability theory, which has good robustness and applicability.

3.1. Scene Analysis

Intersection scenarios are more complex than other scenarios. However, the structured characteristics of the road network, road markings, and traffic regulations make it possible to predict the movement patterns of vehicles within intersection scenarios in advance. For example, by identifying a vehicle's position on a digital map, its driving intention can be inferred. Therefore, this paper determines a finite set of future driving paths for a vehicle based on its location, the intersection geometry, and topological features by projecting the vehicle in the intersection scene onto a digital map, as shown in Figure 3. In the figure, a finite set of future paths is given for the vehicle as it passes through the intersection, and the proposed method uses this finite set to identify potential threats.
When an autonomous vehicle passes through an intersection, it predicts the future paths of all vehicles within the scenario. Furthermore, to achieve the main task of safely crossing the intersection, the autonomous vehicle pays special attention to the vehicles whose predicted paths intersect with its own; these are referred to as relevant vehicles. In this process, the autonomous vehicle identifies the locations of the intersection points on the paths of the relevant vehicles and tracks their kinematic information. All remaining vehicles are defined as irrelevant vehicles, as they do not threaten the autonomous vehicle, and they are not evaluated, which improves computational efficiency. Finally, the information on the relevant vehicles is used by the autonomous vehicle to evaluate the possibility of a collision when crossing the intersection in its current motion state.
The potential threat identification schematic for the intersection scenario is shown in Figure 4. In the figure, the yellow, orange, and blue vehicles represent the intelligent vehicle, related vehicles, and unrelated vehicles, respectively. The intelligent vehicle and related vehicles are denoted by $a$ and $r_i$, respectively, where $i$ denotes the serial number of the related vehicle. The red areas in the figure are the potential collision areas, which indicate where a related vehicle's possible future path intersects with the intelligent vehicle's future path. In the study, the potential collision regions are denoted by $c_j$, where $j$ indexes the potential collision regions by distance along the future path of the intelligent vehicle. In addition, each potential collision region $c_j$ has a safety line and an end line corresponding to the intelligent vehicle and the related vehicles. In the risk assessment process, the probability of a two-vehicle collision is evaluated using the times at which the intelligent vehicle and the related vehicle drive past the safety line and out of the end line. $v$ indicates the driving speed of the corresponding vehicle.

3.2. Vehicle Driving Risk Evaluation Metrics for Intersection Scenarios

For the intersection scenario, this study uses time to enter (TTE) to evaluate the driving condition of intelligent vehicles through the intersection scenario. The TTE indicates the time that the vehicle continues to travel in its current state of motion from its current position until it enters the collision zone. The calculation process is as follows.
$$t_j^{TTE} = \frac{\sqrt{2 a_{AV} d_j + v_{AV}^2} - v_{AV}}{a_{AV}}$$
In the above equation, $t_j^{TTE}$ denotes the TTE value of the intelligent vehicle relative to the potential collision region $c_j$; $a_{AV}$ denotes the acceleration of the intelligent vehicle; $d_j$ denotes the distance of the intelligent vehicle from its current position along the road curvature to the safety line corresponding to the potential collision region $c_j$; and $v_{AV}$ denotes the travel speed of the intelligent vehicle. However, the existence of an intersection region between the future path of the related vehicle and the future path of the intelligent vehicle does not necessarily mean that a collision will occur between them. Therefore, assessing the driving risk in the current traffic environment requires using the kinematic information of the intelligent vehicle and the related vehicle to determine the possibility of collision in the corresponding potential collision region. In this study, we use the temporal overlap of the intelligent and related vehicles passing through the potential collision region to determine whether they will collide there. The mathematical expression of this temporal overlap is as follows.
$$\begin{cases} t_j^s = \dfrac{\sqrt{2 a d_j^s + v^2} - v}{a} \\[2ex] t_j^e = \dfrac{\sqrt{2 a (d_j^e + l) + v^2} - v}{a} \end{cases}$$
In the above formula, $t_j^s$ indicates the time when the vehicle crosses the safety line corresponding to the potential collision area $c_j$; $t_j^e$ indicates the time when the vehicle crosses the end line corresponding to the potential collision area $c_j$; $d_j^s$ is the distance from the vehicle's current position along the road curvature to the safety line corresponding to the potential collision area $c_j$; $d_j^e$ is the distance from the vehicle's current position along the road curvature to the end line corresponding to the potential collision area $c_j$; $v$ is the current driving speed of the vehicle; $a$ is the acceleration of the vehicle; and $l$ is the body length of the vehicle.
By substituting the information from the map, the intelligent vehicle, and the related vehicles into Equation (6), we can obtain the time intervals $[t_j^{s,AV}, t_j^{e,AV}]$ and $[t_j^{s,SV}, t_j^{e,SV}]$ during which the intelligent vehicle and the related vehicle, respectively, occupy the potential collision region $c_j$. By analyzing the overlap of these two time intervals, we can determine whether a collision occurs between the intelligent vehicle and the related vehicle in the potential collision region $c_j$. According to the different temporal orderings of $t_j^{s,AV}$, $t_j^{e,AV}$, $t_j^{s,SV}$, and $t_j^{e,SV}$, six cases can be distinguished, as shown in Figure 5. Except for cases ⑤ and ⑥, the time intervals of the intelligent vehicle and the related vehicle in the potential collision area $c_j$ overlap, and the intelligent vehicle needs to make reasonable driving decisions to avoid a collision. In cases ⑤ and ⑥, there is no time interval overlap between the intelligent vehicle and the related vehicle in the potential collision area $c_j$, and the intelligent vehicle can pass safely.
The time overlap between the intelligent vehicle passing through each potential collision area and the related vehicle passing through the potential collision area is calculated and analyzed to determine whether a collision will occur in each potential collision area for the intelligent vehicle. In the case of a collision, the driving risk of the intelligent vehicle regarding the potential collision region c j is evaluated based on the value of t j T T E .
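The following sketch illustrates how the TTE-style travel times and the time-interval overlap test could be computed; the zero-acceleration fallback and the example numbers are assumptions added for robustness and illustration.

```python
import math

def time_to_reach(distance, v, a, eps=1e-6):
    """Time to cover `distance` from speed v with constant acceleration a;
    falls back to distance / v when the acceleration is (close to) zero."""
    if abs(a) < eps:
        return distance / max(v, eps)
    return (math.sqrt(max(2 * a * distance + v ** 2, 0.0)) - v) / a

def occupancy_interval(d_safety, d_end, v, a, body_length):
    """Interval [t_s, t_e] during which a vehicle occupies collision region c_j:
    t_s when it crosses the safety line, t_e when its rear clears the end line."""
    t_s = time_to_reach(d_safety, v, a)
    t_e = time_to_reach(d_end + body_length, v, a)
    return t_s, t_e

def intervals_overlap(av_interval, sv_interval):
    """True for the overlapping cases (1)-(4) in Figure 5, False for the disjoint cases (5)-(6)."""
    (s1, e1), (s2, e2) = av_interval, sv_interval
    return s1 <= e2 and s2 <= e1

# Example: ego occupies c_j during [2.1, 3.0] s, related vehicle during [2.8, 3.6] s -> overlap
print(intervals_overlap((2.1, 3.0), (2.8, 3.6)))
```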

3.3. Risk Probability Reasoning

In the intersection scenario, this paper assesses vehicle driving risk by defining three risk levels associated with the $t_j^{TTE}$ values. These three risk levels represent the likelihood of a collision, as shown below.
$$z_j \in Z = \{ \mathrm{Dangerous}, \mathrm{Attentive}, \mathrm{Safe} \} = \{ D, A, S \}$$
where $z_j$ is defined as a random variable that represents the risk level of the intelligent vehicle in the potential collision area $c_j$ with respect to the relevant vehicle. In order to determine the boundaries of these three risk levels, two thresholds $t_d$ and $t_a$ are defined, which represent the TTE thresholds for the "dangerous" and "attentive" risk levels, respectively. According to international safety standards [31,32] and vehicle tracking performance [33], $t_d$ and $t_a$ are set to 4 s and 7 s, respectively. In addition, to account for the uncertainty in the driving scenario, we introduce the uncertainty measure $\sigma_t$ when constructing the likelihood functions, which are defined over $t_j^{TTE}$ as follows:
$$p(t_j^{TTE} \mid z_j = D) \propto \begin{cases} \exp\left( -\dfrac{(t_j^{TTE} - t_d)^2}{2 \sigma_t^2} \right), & \text{for } t_j^{TTE} > t_d \\ 1, & \text{otherwise} \end{cases}$$

$$p(t_j^{TTE} \mid z_j = A) \propto \begin{cases} \exp\left( -\dfrac{(t_j^{TTE} - t_d)^2}{2 \sigma_t^2} \right), & \text{for } t_j^{TTE} < t_d \\ \exp\left( -\dfrac{(t_j^{TTE} - t_a)^2}{2 \sigma_t^2} \right), & \text{for } t_j^{TTE} > t_a \\ 1, & \text{otherwise} \end{cases}$$

$$p(t_j^{TTE} \mid z_j = S) \propto \begin{cases} \exp\left( -\dfrac{(t_j^{TTE} - t_a)^2}{2 \sigma_t^2} \right), & \text{for } t_j^{TTE} < t_a \\ 1, & \text{otherwise} \end{cases}$$
In this paper, based on the assumption that the prior probability of the vehicle risk level $P(z_j)$ follows a uniform distribution, the Bayesian probability theory is used to determine the probability distribution over the risk levels of the intelligent vehicle for a potential collision area $c_j$ under a given traffic situation, calculated as follows.

$$P(z_j \mid t_j^{TTE}) = \frac{p(t_j^{TTE} \mid z_j) \, P(z_j)}{\sum_{z_j \in Z} P(z_j) \, p(t_j^{TTE} \mid z_j)}$$
This paper uses a numerical approach to measure the degree of risk. Specifically, the values 2, 1, and 0 denote the different risk levels. Therefore, this paper defines the risk level value $\varepsilon$ as follows.
$$\varepsilon \in Z = \{ \mathrm{Dangerous}, \mathrm{Attentive}, \mathrm{Safe} \} = \{ 2, 1, 0 \}$$
Based on the calculated probability distribution over the risk levels of the intelligent vehicle with respect to the potential collision area $c_j$, the expected risk level value $\varepsilon_{c_j}$ of the intelligent vehicle for the potential collision area $c_j$ at the current moment can be calculated as follows:
$$\varepsilon_{c_j} = \sum_{\varepsilon \in \{2, 1, 0\}} \varepsilon \, P(z_j \mid t_j^{TTE})$$
In an intersection scenario, the threat to the intelligent vehicle may come from multiple possible future paths of multiple related vehicles. For example, in Figure 4, the related vehicle $r_1$ may have multiple future paths before entering the intersection, but only the straight-ahead path intersects with the planned path of the intelligent vehicle. Therefore, the likelihood of each future trajectory of the relevant vehicle also needs to be considered when assessing the driving risk of the intelligent vehicle. In addition, the intelligent vehicle passes through each potential collision area in turn when crossing the intersection, and its level of attention to each potential collision area is not equal. In order to measure the overall expected risk level value of the intelligent vehicle in the intersection scenario, the relevant vehicle trajectory likelihood weight $w_j$ and the attenuation factor $\gamma$ are introduced, as follows:
$$\varepsilon_{risk} = \frac{\sum_{j=1}^{n} \gamma^{j-1} w_j \, \varepsilon_{c_j}}{\sum_{j=1}^{n} \gamma^{j-1} w_j \, \varepsilon_{\max}}$$
where $\varepsilon_{risk}$ represents the overall expected risk level value of the intelligent vehicle in the intersection scenario and $w_j$ represents the likelihood of the future trajectory of the relevant vehicle corresponding to the potential collision region $c_j$. In this paper, we assume that the multiple future trajectories of a relevant vehicle follow a uniform distribution. $\gamma \in [0, 1]$ is the attenuation factor used to balance the importance of the potential collision regions that will be passed soon against those that will be passed later. When $\gamma$ is close to 0, the risk assessment method pays more attention to the potential collision areas that are about to be passed; when $\gamma$ is close to 1, it gives nearly equal attention to the potential collision areas that will be passed later.
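A minimal sketch of this risk inference is given below, assuming a uniform prior, the thresholds $t_d$ = 4 s and $t_a$ = 7 s, and an illustrative value for the uncertainty measure $\sigma_t$, which is not reported in this paper; the attenuation factor in the aggregation is likewise an assumed value.

```python
import math

T_D, T_A, SIGMA_T = 4.0, 7.0, 1.0   # thresholds from above; SIGMA_T is an assumed value

def likelihood(tte, level):
    """Likelihood p(t_TTE | z) for risk levels 'D', 'A', 'S' as defined above."""
    g = lambda t, c: math.exp(-(t - c) ** 2 / (2 * SIGMA_T ** 2))
    if level == 'D':
        return g(tte, T_D) if tte > T_D else 1.0
    if level == 'A':
        if tte < T_D:
            return g(tte, T_D)
        if tte > T_A:
            return g(tte, T_A)
        return 1.0
    return g(tte, T_A) if tte < T_A else 1.0      # level 'S'

def risk_posterior(tte):
    """Posterior P(z | t_TTE) under a uniform prior over the three risk levels."""
    lik = {z: likelihood(tte, z) for z in ('D', 'A', 'S')}
    total = sum(lik.values())
    return {z: lik[z] / total for z in lik}

def expected_risk_level(tte):
    """Expected risk level value for one collision region, with D = 2, A = 1, S = 0."""
    post = risk_posterior(tte)
    return 2 * post['D'] + 1 * post['A']

def overall_risk(eps_values, weights, gamma=0.9, eps_max=2.0):
    """Overall expected risk aggregated over the potential collision regions;
    gamma is an assumed attenuation factor."""
    num = sum(gamma ** j * w * e for j, (w, e) in enumerate(zip(weights, eps_values)))
    den = sum(gamma ** j * w * eps_max for j, w in enumerate(weights))
    return num / den if den else 0.0

print(expected_risk_level(5.0))   # a TTE between t_d and t_a gives an intermediate risk value
```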

4. Intelligent Driving Decision Algorithm Based on Deep Reinforcement Learning

4.1. Deep Q-Network Learning Based on State-Action Value Distribution

Q-learning-based algorithms suffer from the problem of overestimation due to valuation errors, which can introduce a significant bias into the final model and degrade algorithm performance. To solve this problem, H. V. Hasselt et al. proposed the theory of double Q-learning and applied it to the Deep Q-Network (DQN) learning algorithm [34]. The Double Deep Q-Network (DDQN) learning algorithm solves the overestimation problem by decoupling the selection of the target Q-value action from the computation of the target Q-value. Although the DDQN algorithm improves performance by optimizing the computation of the target Q-value, there are still exceptional cases during the learning process, such as when an agent chooses an action that may be accompanied by a significant risk in order to obtain a greater reward. This situation is perilous; in intelligent vehicle driving strategy learning, it manifests as the intelligent vehicle improving driving efficiency by driving dangerously. Therefore, safety should be considered along with reward maximization in the policy learning process. To this end, this paper uses a learning algorithm based on the state-action value distribution, the distributional DQN learning algorithm, to derive the optimal driving strategies for intelligent vehicles.
The distributional DQN learning algorithm aims to improve the estimation of the distribution of possible returns, making learning more stable when combined with a neural network. Specifically, the distributional DQN algorithm replaces the point estimate of the Q-value with a probability distribution, so that the agent can choose the best action based on the Q-value distributions of the different actions. This change not only allows the agent to effectively perceive the risks present in the environment and improve its exploration of the environment, it also allows the agent to avoid risks when deciding on actions: the agent is more inclined to choose the action with a better worst case rather than simply the action with a larger Q-value. The basic structure of the distributional DQN algorithm used in this paper is shown in Figure 6.
In the distributional DQN algorithm used in this paper, the neural network's output is the probability distribution of Q-values for each action, as shown in Figure 7. In the figure, the x-axis represents the normalized Q-values, and the y-axis represents the probability that the agent achieves the corresponding Q-value after taking the action. To model the Q-value distribution, we use a discrete distribution represented by the parameters $N \in \mathbb{N}$ and $V_{max}, V_{min} \in \mathbb{R}$. In parameterizing the distribution, the Q-value range $[V_{min}, V_{max}]$ is discretized into $N$ branches separated by $N - 1$ equal intervals, with each branch represented as follows:
$$\left\{ z_i = V_{\min} + i \, \Delta z \; ; \; 0 \le i < N \right\}, \quad \Delta z = \frac{V_{\max} - V_{\min}}{N - 1}$$
The branch set is fixed, and the neural network outputs the probability that each action Q-value takes the current branch value. Therefore, the Q-value of each action is calculated as follows:
$$Q(s_{t+1}, a) = \sum_{i=0}^{N-1} z_i \, p_i(s_{t+1}, a)$$
The neural network of the distributional DQN algorithm takes the predicted Q-value distribution as the output. Accordingly, the training target should be obtained as a distribution as well. Therefore, the Bellman update for each branch is calculated as follows:
$$T z_i = r_t + \gamma z_i$$
After the Bellman update, the distribution may take values outside the range of the original support. In order to obtain the target probability ($m_i, \; i = 0, \ldots, N-1$) for each branch, the updated distribution over $T z_i$ needs to be projected back onto the original branch set $\{ z_i \}$. The specific procedure is as follows:
$$\Phi T z_i = \sum_{j=0}^{N-1} \left[ 1 - \frac{\left| \left[ T z_j \right]_{V_{\min}}^{V_{\max}} - z_i \right|}{\Delta z} \right]_{0}^{1} p_j$$
In the distributional DQN algorithm, since both the estimation network and the target network use the distribution as the output, this paper uses a measure of the similarity between the two distributions as the loss. Therefore, the loss function of the Distributional DQN algorithm can be implemented by calculating the cross-entropy term of the Kullback–Leibler divergence. Precisely, the loss function is calculated as follows:
$$L = - \sum_{i=0}^{N-1} m_i \log p_i(s_t, a_t)$$
The distributional DQN algorithm has the same structure as the DQN algorithm, including experience replay and a separate target network, and uses a sampling process with an ϵ-greedy strategy. However, the neural network output of the distributional DQN algorithm is no longer a Q-value for each action but a probability distribution of Q-values for each action. As a result, action selection in the distributional DQN algorithm applies the ϵ-greedy strategy to the expected Q-values. In the DQN algorithm, the Q-network is trained with the mean squared error as the loss function by finding the action corresponding to the maximum Q-value in the estimation network. The distributional DQN algorithm, in contrast, trains the network with the KL divergence as the loss function by finding the action that corresponds to the best probability distribution of Q-values in the estimation network. This change allows the agent to draw on the full Q-value distribution when making action decisions. In specific risk scenarios, the agent is more inclined to choose the action with smaller variance or a better worst case rather than simply the action with a larger Q-value. Such a strategy allows the agent to deal with uncertainty more robustly and improves its performance in risky environments.
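To make the projection and loss concrete, the sketch below shows a C51-style categorical projection and cross-entropy loss in PyTorch; the number of branches, the value range, and the tensor shapes are illustrative assumptions rather than the hyperparameters used in this paper.

```python
import torch

N_ATOMS, V_MIN, V_MAX = 51, 0.0, 1.0                 # branch count and Q-value range (assumed)
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)
SUPPORT = torch.linspace(V_MIN, V_MAX, N_ATOMS)      # branch values z_i

def expected_q(prob):
    """Q(s, a) = sum_i z_i * p_i(s, a); prob has shape [batch, actions, N_ATOMS]."""
    return (prob * SUPPORT).sum(dim=-1)

def project_target(next_prob, rewards, dones, gamma=0.99):
    """Project the Bellman-updated branches T z_i = r + gamma * z_i back onto the fixed support.
    next_prob: [batch, N_ATOMS] distribution of the greedy action in the next state."""
    tz = rewards.unsqueeze(1) + gamma * (1.0 - dones.unsqueeze(1)) * SUPPORT
    tz = tz.clamp(V_MIN, V_MAX)
    b = (tz - V_MIN) / DELTA_Z                        # fractional branch index of each projected atom
    lower = b.floor()
    upper_w = b - lower                               # mass sent to the upper neighbouring branch
    lower_w = 1.0 - upper_w                           # mass sent to the lower neighbouring branch
    upper = (lower + 1).clamp(max=N_ATOMS - 1)
    m = torch.zeros_like(next_prob)
    m.scatter_add_(1, lower.long(), next_prob * lower_w)
    m.scatter_add_(1, upper.long(), next_prob * upper_w)
    return m

def distributional_loss(pred_log_prob, target_m):
    """Cross-entropy loss L = -sum_i m_i log p_i for the chosen actions."""
    return -(target_m * pred_log_prob).sum(dim=-1).mean()

# Example with random tensors for a batch of 4 transitions
next_prob = torch.softmax(torch.randn(4, N_ATOMS), dim=-1)
m = project_target(next_prob, rewards=torch.rand(4), dones=torch.zeros(4))
loss = distributional_loss(torch.log_softmax(torch.randn(4, N_ATOMS), dim=-1), m)
```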

4.2. Structure of the Intelligent Driving Decision Algorithm

In this paper, we propose an intelligent vehicle driving decision learning algorithm based on deep reinforcement learning that combines the vehicle driving risk assessment method and the distributional DQN learning algorithm described above. The algorithm consists of three components: a perception layer, a decision layer, and a control layer. The perception layer fuses information from the intelligent vehicle and the surrounding environment and constructs it as the deep reinforcement learning state space. The decision layer uses a deep neural network to output driving decision commands. Finally, the control layer implements the longitudinal control of the intelligent vehicle using a PID model to control the vehicle according to the decision commands output by the decision layer. The structure of the intelligent driving decision algorithm is shown in Figure 8.

4.3. Markov Decision Process for Intelligent Driving

In this study, the driving decision problem of an intelligent vehicle is formulated as a Markov decision process. In this process, the intelligent vehicle decides its action at the next time step based on the current state and is rewarded accordingly. For the driving decision problem in the straight-ahead and intersection scenarios, this paper uses the tuple $(S, A, R, \gamma)$ to describe the Markov decision process, where $s \in S$ is a state in the intelligent vehicle state space, $a \in A$ is an action in the action space, $R$ is the model of immediate rewards $r(s, a, s')$, and $\gamma \in [0, 1]$ is a discount factor for delayed rewards, which reduces the weight of distant rewards when balancing the importance of current and future rewards.

4.4. State Space

This paper proposes a deep-reinforcement-learning-based algorithm for intelligent vehicle driving decision-making. The algorithm describes the driving decision problem as a Markov decision process represented by the tuple $(S, A, R, \gamma)$. In the proposed algorithm, the state information represents the environmental information perceived by the intelligent vehicle and the changes caused by its actions. Therefore, the state space designed in this study contains the location information, driving state information, and environmental information of the intelligent vehicle and the other vehicles around it. The joint state information of the autonomous vehicle $s_0$ and the location and driving state information of the surrounding $N$ other vehicles is represented as follows:
$$s_v = (s_k)_{k \in [0, N]}, \quad s_k = \left[ x_k \;\; y_k \;\; v_k^x \;\; v_k^y \;\; a_k^x \;\; a_k^y \;\; yaw_k \right]^T$$
where $x$ and $y$ represent the position of the vehicle along the lateral and longitudinal axes, respectively; $v^x$ and $v^y$ represent the velocity of the vehicle along the lateral and longitudinal axes, respectively; $a^x$ and $a^y$ represent the acceleration of the vehicle along the lateral and longitudinal axes, respectively; and $yaw$ represents the heading of the vehicle.
Such a representation keeps the amount of information needed to describe the traffic environment to a minimum. However, it has two problems. First, the dimensionality of the information vector varies with the number of vehicles present in the environment, which is detrimental to approximating an input function that expects a constant size. Second, driving decisions trained in this form are affected by the order in which the state information of the surrounding vehicles is arranged, which limits the generalizability of the derived driving strategies. Therefore, when designing the state space, the filtered state information must be processed to ensure its formal and logical uniformity and thus improve the generality of the driving decisions derived in this paper.
In order to ensure the uniformity of the designed state space form and to guarantee the constant length of the output state information vector, this paper preprocesses the filtered state information. Specifically, as shown in Figure 9, the map information is gridded, and the vehicle state information in the environment is represented in the grid as a tensor. This processing not only ensures the stability of the state space structure but also makes the intelligent vehicle better perceive the relative position relationships of other vehicles around. At the same time, to learn a more general driving strategy, the designed state space must ensure that the state information has logical uniformity. Therefore, this paper takes the center of the intelligent vehicle as the origin and defines the state information of other surrounding vehicles as the position information and motion state information relative to the intelligent vehicle. The specific state information definition is shown in Table 1.
The road information in the state space follows the same formal and logical consistency described above. Specifically, as shown in Figure 10, the road information is represented in the grid as 0 s and 1 s to ensure its uniformity within the state space.
In summary, the state space in the study of this paper is the set of the joint state information of the position information, motion state information, and road information of the vehicles in the environment, and the state information of the environment at the moment is represented as follows (Figure 11):
$$s_t = \{ s_m, s_v \}$$
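The sketch below illustrates one way to build such a gridded, ego-centred state tensor; the grid size, cell size, and channel layout are illustrative assumptions and do not reproduce the exact definitions of Table 1.

```python
import numpy as np

GRID_H, GRID_W, CELL_SIZE = 11, 11, 5.0        # grid resolution and metres per cell (assumed)
CHANNELS = ("presence", "rel_x", "rel_y", "rel_vx", "rel_vy")

def build_state_tensor(ego, others, road_mask):
    """Return s_t = {s_m, s_v} as a (1 + len(CHANNELS)) x H x W tensor.

    ego / others : dicts with keys 'x', 'y', 'vx', 'vy' (absolute values)
    road_mask    : H x W array of 0/1 road occupancy (the s_m channel)
    """
    s_v = np.zeros((len(CHANNELS), GRID_H, GRID_W), dtype=np.float32)
    for veh in others:
        # map the other vehicle's position, relative to the ego, onto a grid cell
        col = int(round((veh["x"] - ego["x"]) / CELL_SIZE)) + GRID_W // 2
        row = int(round((veh["y"] - ego["y"]) / CELL_SIZE)) + GRID_H // 2
        if 0 <= row < GRID_H and 0 <= col < GRID_W:
            s_v[0, row, col] = 1.0                           # presence flag
            s_v[1, row, col] = veh["x"] - ego["x"]           # relative position
            s_v[2, row, col] = veh["y"] - ego["y"]
            s_v[3, row, col] = veh["vx"] - ego["vx"]         # relative velocity
            s_v[4, row, col] = veh["vy"] - ego["vy"]
    s_m = road_mask[np.newaxis, :, :].astype(np.float32)     # road information channel
    return np.concatenate([s_m, s_v], axis=0)
```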

4.5. Action Space

As an intelligent vehicle passes through an intersection, the intelligent vehicle can adjust the throttle and brake operations to accelerate, decelerate, or maintain a constant speed for safety, depending on the surrounding traffic environment. Therefore, the action space of the intersection driving decision learning algorithm is defined as three speed-related operations: acceleration, deceleration, and constant speed driving.
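A minimal sketch of this discrete action space and its mapping to the target speed of Section 2.2 is shown below; the speed increment per decision step is an assumed placeholder value.

```python
from enum import IntEnum

class Action(IntEnum):
    DECELERATE = 0
    KEEP_SPEED = 1
    ACCELERATE = 2

DELTA_V = 1.0   # m/s per decision step; the actual increment is tied to the policy frequency

def target_speed(v_current: float, action: Action) -> float:
    """Map a discrete action to the target speed v_target = v + i * delta_v."""
    i = {Action.ACCELERATE: 1, Action.KEEP_SPEED: 0, Action.DECELERATE: -1}[action]
    return v_current + i * DELTA_V

print(target_speed(7.0, Action.ACCELERATE))   # -> 8.0
```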

4.6. Reward Functions

Reinforcement learning seeks a policy that maximizes the designed reward function. Therefore, an appropriate reward function must be designed for an agent that is to accomplish a specific task. In this paper, the intelligent vehicle must drive safely and efficiently in an intersection scenario. To address the reward sparsity problem, we decompose the target task of the intelligent vehicle into multiple sub-goal tasks with appropriate rewards or penalties to ensure that the intelligent vehicle receives timely feedback at each time step. The goal task is therefore decomposed into the following sub-goal tasks: avoid collisions, drive at high speed, and keep the driving risk as low as possible. Based on these sub-goal tasks, we define the primary reward function as follows:
$$r_{total} = \lambda_1 r_{collision} + \lambda_2 r_{risk} + \lambda_3 r_{velocity}$$
where $r_{collision}$ denotes the penalty for a collision of the intelligent vehicle; $r_{risk}$ is the reward for the driving risk of the intelligent vehicle assessed from the current traffic environment; $r_{velocity}$ is the reward evaluating the driving efficiency of the intelligent vehicle; and $\lambda_1$, $\lambda_2$, and $\lambda_3$ denote the relative weight coefficients of the reward terms $r_{collision}$, $r_{risk}$, and $r_{velocity}$ in the total immediate reward, respectively.
The penalty function $r_{collision}$ for a collision of the intelligent vehicle is defined as follows:
$$r_{collision} = \begin{cases} 1, & \text{if the ego vehicle collides} \\ 0, & \text{otherwise} \end{cases}$$
In order to ensure the safety of the intelligent vehicle and avoid possible collisions, the intelligent vehicle should assess the risk level of its driving state according to the surrounding traffic environment and adjust its driving strategy accordingly. Therefore, using the Bayesian vehicle driving risk assessment method proposed in Section 3, this paper designs a reward function $r_{risk}$ based on the risk assessment of the intelligent vehicle's current driving state, defined as follows:
$$r_{risk} = 1 - \frac{\varepsilon_{risk}}{\varepsilon_{max}}$$
where $\varepsilon_{risk}$ denotes the intelligent vehicle's expected driving risk level value at the current time step and $\varepsilon_{max}$ denotes the defined maximum risk level value.
In this paper, it is proposed that excessive conservatism should be avoided in the driving safety of intelligent vehicles because it may not only negatively affect the efficiency of driving but also cause traffic accidents, delays, and gridlock [35]. Therefore, to maximize driving efficiency while ensuring safety, this paper designs the reward function r v e l o c i t y on driving efficiency, defined as follows:
$$r_{velocity} = \begin{cases} 0, & \text{if } v_x < v_{min} \\ 1, & \text{if } v_x > v_{max} \\ \dfrac{v_x - v_{min}}{v_{max} - v_{min}}, & \text{otherwise} \end{cases}$$
where $v_x$ represents the driving speed of the intelligent vehicle along the road direction, and $v_{max}$ and $v_{min}$ represent the maximum and minimum desired speeds of the intelligent vehicle in the driving scenario studied in this paper, which are set to 9 m/s and 7 m/s, respectively.
The total immediate reward $r_{total}$ designed in this paper is the weighted sum of the collision-avoidance reward term $\lambda_1 r_{collision}$, the driving-risk reward term $\lambda_2 r_{risk}$, and the driving-efficiency reward term $\lambda_3 r_{velocity}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weighting coefficients of the three reward terms in the total immediate reward. As described above, $r_{collision}$ takes values in $\{0, 1\}$, $r_{risk}$ takes values in $[0, 1]$, and $r_{velocity}$ takes values in $[0, 1]$, so the total immediate reward $r_{total}$ takes values in $[\lambda_1, \lambda_2 + \lambda_3]$. In deep reinforcement learning, it is beneficial to use normalized rewards [36]. Therefore, in this paper, the total immediate reward $r_{total}$ is normalized from the range $[\lambda_1, \lambda_2 + \lambda_3]$ to the range $[0, 1]$ at each time step as follows:
$$r = \frac{r_{total} - \lambda_1}{\lambda_2 + \lambda_3 - \lambda_1}$$
In setting the three reward weight coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$, this paper avoids using negative values as much as possible, so that the intelligent vehicle does not learn to end a training episode early by colliding with surrounding vehicles in order to escape accumulating large negative rewards. The weight coefficient of the collision reward term is set to $-1$. In addition, this paper considers that intelligent vehicles should pay more attention to driving risk than driving efficiency during driving, so the weight coefficients of the driving risk and driving efficiency reward terms are set to 0.3 and 0.2, respectively.
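Putting the reward terms, weights, and normalization together, the following sketch computes the normalized per-step reward; the function interface is an illustrative assumption.

```python
LAMBDA_COLLISION, LAMBDA_RISK, LAMBDA_VELOCITY = -1.0, 0.3, 0.2
V_MIN_DES, V_MAX_DES = 7.0, 9.0   # m/s, desired speed range defined above
EPS_MAX = 2.0                      # maximum risk level value

def step_reward(collided: bool, eps_risk: float, v_x: float) -> float:
    """Normalized total immediate reward combining collision, risk, and efficiency terms."""
    r_collision = 1.0 if collided else 0.0
    r_risk = 1.0 - eps_risk / EPS_MAX
    if v_x < V_MIN_DES:
        r_velocity = 0.0
    elif v_x > V_MAX_DES:
        r_velocity = 1.0
    else:
        r_velocity = (v_x - V_MIN_DES) / (V_MAX_DES - V_MIN_DES)
    r_total = (LAMBDA_COLLISION * r_collision
               + LAMBDA_RISK * r_risk
               + LAMBDA_VELOCITY * r_velocity)
    # normalize from [lambda_1, lambda_2 + lambda_3] to [0, 1]
    return (r_total - LAMBDA_COLLISION) / (LAMBDA_RISK + LAMBDA_VELOCITY - LAMBDA_COLLISION)

print(step_reward(collided=False, eps_risk=0.5, v_x=8.0))
```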

4.7. Neural Network Structure

The model in this study consists of a value function network comprising an online Q-value network and a target Q-value network with identical structures. The network's input comes from the state space, and its output is the distribution of state-action values. This paper argues that, while driving, an intelligent vehicle should focus on the vehicles relevant to its own driving rather than on all surrounding vehicles. Therefore, this paper proposes an attention network based on state information to realize this idea, as shown in Figure 12. The network introduces a particular attention module consisting of two parts: a channel attention module and a spatial attention module.
The attention network designed in this paper takes the state space $S \in \mathbb{R}^{C \times H \times W}$ as input; the state information in each dimension of the state space is normalized and then processed by the channel attention module $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and the spatial attention module $M_s \in \mathbb{R}^{1 \times H \times W}$, as shown in Figure 13. The attentional process of the designed attention network can be summarized as follows:
$$ S \rightarrow S_n $$
$$ S'_n = M_c(S_n) \otimes S_n \otimes M_s(S_n) $$
where $\otimes$ denotes element-wise multiplication and $S_n$ is the normalized state space. In this study, the state information of each dimension of the state space $S \in \mathbb{R}^{C \times H \times W}$ is normalized by batch normalization. $M_c(S_n)$ is the output of $S_n$ after the channel attention module $M_c$, and $M_s(S_n)$ is the final output of $S_n$ after the spatial attention module $M_s$.
The channel attention module implements the channel attention mechanism by analyzing the relationships between feature channels. Each information dimension in the state space can be considered a channel, and each channel can be regarded as a feature extractor. The channel attention module uses the attention mechanism to focus on meaningful information dimensions, thus improving the model's focus on essential features and enhancing its expressive power. Woo et al. argue that combining average and maximum pooling, which collect information about different object features, leads to more accurate channel attention [37]. Therefore, with reference to that work, this paper uses both average and maximum pooling. The computation of the two attention modules is shown in Figure 14 and given as follows:
$$ M_c(S_n) = \sigma\big(\mathrm{MLP}(\mathrm{MaxPool}(S_n)) + \mathrm{MLP}(\mathrm{AvgPool}(S_n))\big) = \sigma\big(W_1(W_0(S_{n,max}^{c})) + W_1(W_0(S_{n,avg}^{c}))\big) $$
$$ M_s(S_n) = \sigma\big(f^{3 \times 3}([\mathrm{AvgPool}(S_n); \mathrm{MaxPool}(S_n)])\big) = \sigma\big(f^{3 \times 3}([S_{n,avg}^{s}; S_{n,max}^{s}])\big) $$
where $f^{3 \times 3}$ denotes a convolution operation with a kernel size of 3 × 3 and $\sigma$ denotes the sigmoid function.
The attention network designed in this paper uses the channel and spatial attention modules to analyze the state information in the current driving scenario and to identify the important information and the surrounding vehicles relevant to its driving. This attention network helps the intelligent driving vehicle understand its surroundings more accurately and make more effective driving decisions.
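As an illustration, the following PyTorch sketch implements a CBAM-style channel-plus-spatial attention block consistent with the equations above. The reduction ratio, layer sizes, and exact composition order are assumptions for the sketch and do not reproduce the authors' network in detail.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of a CBAM-style attention block over the state tensor (dimensions illustrative)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Shared MLP, i.e., W_1(W_0(.)), applied to max- and average-pooled features.
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # 3x3 convolution over the concatenated avg/max maps for spatial attention.
        self.conv = nn.Conv2d(2, 1, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)  # normalization of the state channels (S -> S_n)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        s_n = self.bn(s)
        b, c, h, w = s_n.shape
        # Channel attention M_c(S_n) = sigma(MLP(MaxPool(S_n)) + MLP(AvgPool(S_n)))
        mx = self.mlp(s_n.amax(dim=(2, 3)))   # (b, c)
        av = self.mlp(s_n.mean(dim=(2, 3)))   # (b, c)
        m_c = torch.sigmoid(mx + av).view(b, c, 1, 1)
        s_c = m_c * s_n
        # Spatial attention M_s = sigma(f3x3([AvgPool; MaxPool] along the channel axis))
        sp = torch.cat([s_c.mean(dim=1, keepdim=True),
                        s_c.amax(dim=1, keepdim=True)], dim=1)
        m_s = torch.sigmoid(self.conv(sp))    # (b, 1, h, w)
        return m_s * s_c
```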

5. Results

The algorithms studied in this paper are implemented in PyTorch, and the models are trained with CUDA acceleration. CUDA is NVIDIA's parallel computing platform and programming model for accelerating high-performance computing on NVIDIA GPUs; it achieves speed-ups over traditional CPUs by distributing parallel computation across the thousands of compute cores in the GPU. All experiments in this chapter are conducted on the same experimental platform. The network is trained using the Adam optimizer, and the network parameters are updated with learning rate $\eta$. All hyperparameters are detailed in Table 2.
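A minimal training-setup sketch is shown below, using the hyperparameter values of Table 2. The stand-in network body and the flattened state size are placeholders rather than the attention network described in Section 4.7.

```python
import torch
import torch.nn as nn

# Hyperparameter values taken from Table 2; the network body is a placeholder MLP.
GAMMA, LR, BATCH_SIZE, TARGET_UPDATE_FREQ, N_ATOMS = 0.99, 1e-3, 128, 10, 51
NUM_ACTIONS = 3  # Ac, Dc, Keep
STATE_DIM = 64   # assumed flattened state size, for illustration only

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def make_net() -> nn.Module:
    # One output per (action, atom) pair, forming the state-action value distribution.
    return nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                         nn.Linear(256, NUM_ACTIONS * N_ATOMS))

q_online = make_net().to(device)
q_target = make_net().to(device)
q_target.load_state_dict(q_online.state_dict())  # identical structure and weights

optimizer = torch.optim.Adam(q_online.parameters(), lr=LR)

# Periodic target-network synchronization (every TARGET_UPDATE_FREQ updates):
# if step % TARGET_UPDATE_FREQ == 0:
#     q_target.load_state_dict(q_online.state_dict())
```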
Based on the constructed simulation environment, the experimental platform, and the network structure and parameter settings described above, the algorithm is trained and evaluated as follows.

5.1. Algorithm Performance Analysis

In this section, intelligent vehicles are trained in the intersection scenario designed in this study using the distributional DQN, DDQN, and DQN algorithms, all of which use the state-information-based attention network. This paper uses the intersection passing rate and the normalized reward of the intelligent vehicle in the scenario as quantitative metrics to verify the effectiveness and performance of the driving decision algorithms. The intersection passing rate is defined as follows:
$$ \text{throughput rate} = \frac{\sum_{i=n-9}^{n} I_i(\text{throughput} = \text{true})}{10} $$
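In other words, the passing rate is the fraction of the last 10 episodes in which the intelligent vehicle successfully crossed the intersection. A minimal sketch of this rolling metric, assuming a simple boolean per-episode outcome, is given below.

```python
from collections import deque

class ThroughputRate:
    """Rolling intersection passing rate over the last 10 episodes, as defined above."""
    def __init__(self, window: int = 10):
        self.results = deque(maxlen=window)

    def update(self, passed: bool) -> float:
        # Record the latest episode outcome and return the current passing rate.
        self.results.append(1.0 if passed else 0.0)
        return sum(self.results) / self.results.maxlen
```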
Figure 15 shows the comparison of different quantitative metrics during the training process for the distributional DQN algorithm, DDQN algorithm, and DQN algorithm. The vertical axis indicates the specific values of the quantified metrics, and the horizontal axis indicates the number of episodes in the training process.
As shown in Figure 15, the intelligent vehicle trained with reinforcement learning showed significant performance improvement in the intersection scenario. In terms of reward value, all three methods exhibited a clear upward trend during training and reached a stable level after a period of training. In terms of the intersection passing rate, the intelligent vehicle improved gradually as training progressed, indicating that it progressively mastered the driving skills needed to pass the intersection.
As can be seen from the intersection passing rate curves in Figure 15, the curves of the intelligent vehicles trained with reinforcement learning show sudden drops during their rise. The analysis reveals that, early in training, the intelligent vehicle has not yet fully explored the state space, so it prioritizes increasing its driving speed without regard to surrounding vehicles in order to obtain a higher driving-efficiency reward, which leads to frequent collisions. After experiencing the collision penalty and exploring the environment more fully, the intelligent vehicle learns how to act when encountering obstacles and acquires the skill of avoiding them, which increases the reward value. This shows that the reward function designed in this study plays an effective guiding role in training the driving behavior of intelligent vehicles.
The experimental analysis of the three algorithms in the intersection scenario shows that the distributional DQN algorithm converges to the optimal policy relatively quickly: it converged at about 970 training episodes, much faster than the other two. Regarding the intersection passing rate, all three algorithms show a gradually increasing success rate of the intelligent vehicle passing the intersection. Specifically, after 1910 episodes, the passing rate of the intelligent vehicle trained with the distributional DQN algorithm stabilized at about 100%, whereas the passing rate of the vehicle trained with the DDQN algorithm was slightly lower, stabilizing at around 80%, and that of the vehicle trained with the DQN algorithm was around 70%. These results show that the distributional DQN algorithm performs better in complex intersection scenarios, achieving the optimal strategy in a shorter time and with a higher intersection passing rate.
Under the experimental conditions designed in this section, the intelligent vehicles are trained for 2000 episodes. To further verify the effectiveness of their strategies, we create an independent test set of 100 episodes with the same environment settings as the training process. Evaluation on this test set allows us to compare the policies learned by the different algorithms in the intersection scenario. Table 3 shows the average driving performance and intersection passing rate of the intelligent vehicles using the distributional DQN, DDQN, and DQN algorithms during testing, together with the relative rate of change. Specifically, the average passing rate of the intelligent vehicle using the distributional DQN algorithm proposed in this paper is 98% during the test, a 22% improvement over the vehicle trained with the DQN algorithm. In terms of driving efficiency, its average driving speed during the test is 8.71 m/s, 10.39% higher than that of the DQN-trained vehicle. These results indicate that the driving strategy derived using the proposed distributional DQN algorithm achieves higher driving efficiency and a higher intersection passing success rate than the DDQN and DQN algorithms.
Combining the above, the distributional DQN algorithm proposed in this paper outperforms the DDQN and DQN algorithms, enabling the trained intelligent vehicle to master driving strategies in intersection scenarios faster and better. This research proposes a deep Q-network learning method based on the state-action value distribution to derive the best driving strategy for intelligent vehicles in intersection scenarios while considering driving safety. The proposed algorithm reconstructs the neural network to output the value probability distribution of each state-action pair. This change allows the algorithm to predict expected returns more accurately than traditional deep Q-learning algorithms that rely on a single scalar value prediction, and it improves learning stability. As an example, Figure 16 shows the process of selecting an action from the state-action value distribution for each state encountered by the intelligent vehicle trained with the distributional DQN algorithm in an intersection scenario, where Ac, Dc, and Keep denote "accelerate", "decelerate", and "keep current state", respectively. In this traffic situation, the intelligent vehicle selects actions in the order Keep, Dc, Ac, and Ac. This further validates that the proposed algorithm can help the intelligent vehicle avoid risky maneuvers and make optimal driving decisions in scenarios with many uncertainties and risks.
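The sketch below illustrates how such an action can be selected from a categorical value distribution, assuming a standard C51-style parameterization with the atom settings of Table 2 (N = 51, value range [−10, 10]); whether the softmax is applied inside or outside the network is an implementation detail assumed here.

```python
import torch

N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0              # values from Table 2
support = torch.linspace(V_MIN, V_MAX, N_ATOMS)       # fixed return atoms z_i

def greedy_action(logits: torch.Tensor) -> int:
    """logits: (num_actions, N_ATOMS) raw network output for one state (e.g., Ac, Dc, Keep)."""
    probs = torch.softmax(logits, dim=-1)              # per-action value distribution
    expected_q = (probs * support).sum(dim=-1)         # E[Z(s, a)] for each action
    return int(expected_q.argmax().item())             # pick the action with the highest expected return
```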
The experimental results show that using deep reinforcement learning algorithms in intersection scenarios enables intelligent vehicles to learn practical and intelligent driving decisions. Continuously exploring driving skills under the guidance of the reward function enables the vehicle to learn more diverse driving skills, to cope with complex and random traffic environments, and to keep improving as the scenario changes.

5.2. Risk Assessment Method Analysis

The vehicle driving risk assessment method proposed in this paper applies continuous risk assessment values to the reward function to discourage behaviors that may lead the intelligent vehicle into higher-risk situations during training. By penalizing such behaviors to different degrees, the intelligent vehicle becomes aware, when making driving decisions, of behaviors that may put it in dangerous situations. Even without the risk assessment method, training with a reward function based only on collision penalties allows the intelligent vehicle to learn to avoid collisions after some time; however, the resulting driving strategy tends to put the vehicle into potentially dangerous driving states. Figure 17 shows the per-second driving risk states over successful test episodes for intelligent vehicles whose deep reinforcement learning driving decision algorithm was trained with and without the driving risk assessment method. The vehicle whose strategy was derived using the risk assessment method maintains a lower risk state during driving, while the vehicle trained without it remains in a higher risk state. In summary, applying the risk assessment method more accurately reflects the potential risks of intelligent vehicles in driving scenarios and helps them find the best driving strategy in the current driving environment.

5.3. Neural Network Structure Analysis

To demonstrate the superiority of the state-information-based attention network proposed in this paper, we designed experiments in which the distributional DQN algorithm was trained in the intersection scenario with either the proposed attention network or a multilayer perceptron (MLP) network. A comparison of their training results is shown in Figure 18, where the horizontal axis represents the number of training episodes and the vertical axis represents the episode reward.
The state-information-based attention network proposed in this paper contains a channel attention module and a spatial attention module, which focus on the useful information in the state space and on the spatial location of that information, respectively. The channel attention module helps the intelligent vehicle identify critical state information in the current driving scenario, such as relative speed information, while the spatial attention module helps it identify which surrounding vehicles are important for understanding the environment. Analyzing the experimental results of the two network structures, we find that the intelligent vehicle using the attention network performs better than the one using the MLP network.
The specific behavior of the intelligent vehicle during the experiments also shows that the vehicle with the state-information-based attention network completes the driving task more safely in the complex intersection scenario, as illustrated in Figure 19. In this figure, green lines connect the intelligent vehicle with the surrounding vehicles, and the width of each line is proportional to the corresponding attention weight. The list on the right side shows the attention weight assigned to each dimension of state information output by the attention network. In Figure 19a, the intelligent vehicle has not yet entered the intersection, the surrounding vehicles 1, 2, 3, and 4 are far away, and their destinations are not yet clear; therefore, the intelligent vehicle assigns roughly equal attention to the different dimensions of state information and to these vehicles, and it assigns no attention to vehicles that pose no threat. In Figure 19b, vehicles 1 and 2, whose destinations were previously uncertain, reveal their directions of travel, with vehicle 1 turning left and vehicle 2 going straight, while vehicles 3 and 4 remain far away with unclear destinations; therefore, the intelligent vehicle pays more attention to vehicles 1 and 2 than to vehicles 3 and 4. As the intelligent vehicle drives into the intersection, the attention assigned to the map information increases, and as the relative distance between vehicles 1 and 2 and the intelligent vehicle along the x-axis gradually decreases, the attention network pays increasing attention to the state information in the x-axis direction.
In summary, this section demonstrates the superiority of the state-information-based attention network through experimental results. Introducing the attention mechanism makes the training process more stable, accelerates convergence, and improves the performance of the intelligent vehicle.

6. Conclusions

In this paper, a non-deterministic vehicle driving risk assessment method is proposed for the intersection scenario and introduced into a learning-based intelligent driving decision algorithm to derive an automated driving strategy with low expected risk and high driving efficiency. The experimental results show that the proposed algorithm can effectively derive a driving strategy that balances driving efficiency and safety in intersection driving scenarios. The designed risk assessment method improves driving safety while preserving the driving efficiency of the intelligent vehicle, and the designed attention neural network helps the intelligent vehicle perceive the surrounding environment more accurately and identify the information and surrounding vehicles relevant to its driving, thus improving the training stability of the algorithm, enhancing the performance of the intelligent vehicle, and accelerating convergence. The research in this paper is still limited to an idealized lane environment, whereas many other driving scenarios exist in real traffic. Therefore, introducing transfer learning to transfer knowledge or skills learned in a single scenario to other environments, or to transfer knowledge gained in virtual environments to real scenarios, will be the direction of continued research in future work.

Author Contributions

Conceptualization, W.Y. and Y.Q.; methodology, W.Y.; software, W.Y.; validation, W.Y., Y.Q. and H.S.; formal analysis, W.Y.; investigation, J.W.; resources, J.X.; data curation, J.X.; writing—original draft preparation, W.Y.; writing—review and editing, Y.Q.; visualization, H.S.; supervision, J.X.; project administration, Y.Q.; funding acquisition, Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bigra, E.M.; Connelly, E.; Gorner, M.; Lowans, C.; Paoli, L.; Tattini, J.; Teter, J.; LeCroy, C.; MacDonnell, O.; Welch, D.; et al. Global EV Outlook 2021|Accelerating Ambitions Despite the Pandemic; IEA: Paris, France, 2021. [Google Scholar]
  2. World Health Organization. Global Status Report on Road Safety: Summary; World Health Organization: Geneva, Switzerland, 2018. [Google Scholar]
  3. GIDAS: German In-Depth Accident Study. Available online: http://www.gidas.org/ (accessed on 1 February 2018).
  4. Namazi, E.; Li, J.; Lu, C. Intelligent intersection management systems considering autonomous vehicles: A systematic literature review. IEEE Access 2019, 7, 91946–91965. [Google Scholar] [CrossRef]
  5. Li, G.; Li, S.; Li, S.; Qin, Y.; Cao, D.; Qu, X.; Cheng, B. Deep reinforcement learning enabled decision-making for autonomous driving at intersections. Automot. Innov. 2020, 3, 374–385. [Google Scholar] [CrossRef]
  6. Seong, H.; Jung, C.; Lee, S.; Shim, D.H. Learning to drive at unsignalized intersections using attention-based deep reinforcement learning. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 559–566. [Google Scholar]
  7. Jurgen, R.K. V2V/V2I Communications for Improved road Safety and Efficiency; SAE International: Warrendale, PA, USA, 2012; Volume 154. [Google Scholar]
  8. Furda, A.; Vlacic, L. Enabling safe autonomous driving in real-world city traffic using multiple criteria decision making. IEEE Intell. Transp. Syst. Mag. 2011, 3, 4–17. [Google Scholar] [CrossRef] [Green Version]
  9. Chong, L.; Abbas, M.M.; Flintsch, A.M.; Higgs, B. A rule-based neural network approach to model driver naturalistic behavior in traffic. Transp. Res. Part C Emerg. Technol. 2013, 32, 207–223. [Google Scholar] [CrossRef]
  10. Li, S.; Zhang, J.; Wang, S.; Li, P.; Liao, Y. Ethical and legal dilemma of autonomous vehicles: Study on driving decision-making model under the emergency situations of red light-running behaviors. Electronics 2018, 7, 264. [Google Scholar] [CrossRef] [Green Version]
  11. Bai, Z.; Shangguan, W.; Cai, B.; Chai, L. Deep reinforcement learning based high-level driving behavior decision-making model in heterogeneous traffic. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8600–8605. [Google Scholar]
  12. Hillenbrand, J.; Spieker, A.M.; Kroschel, K. A multilevel collision mitigation approach—Its situation assessment, decision making, and performance tradeoffs. IEEE Trans. Intell. Transp. Syst. 2006, 7, 528–540. [Google Scholar] [CrossRef]
  13. Glaser, S.; Vanholme, B.; Mammar, S.; Gruyer, D.; Nouveliere, L. Maneuver-based trajectory planning for highly autonomous vehicles on real road with traffic and driver interaction. IEEE Trans. Intell. Transp. Syst. 2010, 11, 589–606. [Google Scholar] [CrossRef]
  14. Lee, H.; Kang, C.M.; Kim, W.; Choi, W.Y.; Chung, C.C. Predictive risk assessment using cooperation concept for collision avoidance of side crash in autonomous lane change systems. In Proceedings of the 2017 17th International Conference on Control, Automation and Systems (ICCAS), Jeju, Republic of Korea, 18–21 October 2017; pp. 47–52. [Google Scholar]
  15. Mar, J.; Lin, H.-T. The car-following and lane-changing collision prevention system based on the cascaded fuzzy inference system. IEEE Trans. Veh. Technol. 2005, 54, 910–924. [Google Scholar] [CrossRef]
  16. Wang, C.; Stamatiadis, N. Surrogate safety measure for simulation-based conflict study. Transp. Res. Rec. 2013, 2386, 72–80. [Google Scholar] [CrossRef]
  17. Li, Y.; Lu, J.; Xu, K. Crash risk prediction model of lane-change behavior on approaching intersections. Discret. Dyn. Nat. Soc. 2017, 2017, 7328562. [Google Scholar] [CrossRef] [Green Version]
  18. Schubert, R.; Schulze, K.; Wanielik, G. Situation assessment for automatic lane-change maneuvers. IEEE Trans. Intell. Transp. Syst. 2010, 11, 607–616. [Google Scholar] [CrossRef]
  19. Kim, B.; Park, K.; Yi, K. Probabilistic threat assessment with environment description and rule-based multi-traffic prediction for integrated risk management system. IEEE Intell. Transp. Syst. Mag. 2017, 9, 8–22. [Google Scholar] [CrossRef]
  20. Noh, S.; An, K. Decision-making framework for automated driving in highway environments. IEEE Trans. Intell. Transp. Syst. 2017, 19, 58–71. [Google Scholar] [CrossRef]
  21. Laugier, C.; Paromtchik, I.E.; Perrollaz, M.; Yong, M.; Yoder, J.-D.; Tay, C.; Mekhnacha, K.; Nègre, A. Probabilistic analysis of dynamic scenes and collision risks assessment to improve driving safety. IEEE Intell. Transp. Syst. Mag. 2011, 3, 4–19. [Google Scholar] [CrossRef] [Green Version]
  22. Lambert, A.; Gruyer, D.; Saint Pierre, G. A fast monte carlo algorithm for collision probability estimation. In Proceedings of the 2008 10th International Conference on Control, Automation, Robotics and Vision, Hanoi, Vietnam, 17–20 December 2008; pp. 406–411. [Google Scholar]
  23. Xu, H.; Gao, Y.; Yu, F.; Darrell, T. End-to-end learning of driving models from large-scale video datasets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2174–2182. [Google Scholar]
  24. Codevilla, F.; Müller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-end driving via conditional imitation learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QSL, Australia, 21–25 May 2018; pp. 4693–4700. [Google Scholar]
  25. Mirchevska, B.; Blum, M.; Louis, L.; Boedecker, J.; Werling, M. Reinforcement learning for autonomous maneuvering in highway scenarios. In Proceedings of the Workshop for Driving Assistance Systems and Autonomous Driving, Yokohama, Japan, 16–19 October 2017; pp. 32–41. [Google Scholar]
  26. Mukadam, M.; Cosgun, A.; Nakhaei, A.; Fujimura, K. Tactical decision making for lane changing with deep reinforcement learning. In Proceedings of the NIPS Workshop on Machine Learning for Intelligent Transportation Systems, Long Beach, CA, USA, 9 December 2017. [Google Scholar]
  27. Hu, Y.; Nakhaei, A.; Tomizuka, M.; Fujimura, K. Interaction-aware decision making with adaptive strategies under merging scenarios. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 151–158. [Google Scholar]
  28. Bouton, M.; Nakhaei, A.; Fujimura, K.; Kochenderfer, M.J. Cooperation-aware reinforcement learning for merging in dense traffic. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 3441–3447. [Google Scholar]
  29. Choi, E.H. Crash Factors in Intersection-Related Crashes: An On-Scene Perspective; Technical Report No. DOT HS 811 366; U.S. Department of Transportation, National Highway Traffic Safety Administration (NHTSA): Washington, DC, USA, 2010.
  30. Zhou, M.; Qu, X.; Jin, S. On the impact of cooperative autonomous vehicles in improving freeway merging: A modified intelligent driver model-based approach. IEEE Trans. Intell. Transp. Syst. 2016, 18, 1422–1428. [Google Scholar] [CrossRef]
  31. ISO 22179:2009; Intelligent Transport Systems—Full Speed Range Adaptive Cruise Control (FSRA) Systems—Performance Requirements and Test Procedures. ISO: Geneva, Switzerland, 2009.
  32. ISO 26684:2015; Intelligent Transport Systems (ITS)—Cooperative Intersection Signal Information and Violation Warning Systems (CIWS)—Performance Requirements and Test Procedures. ISO: Geneva, Switzerland, 2015.
  33. Na, K.; Byun, J.; Roh, M.; Seo, B. RoadPlot-DATMO: Moving object tracking and track fusion system using multiple sensors. In Proceedings of the 2015 International Conference on Connected Vehicles and EXPO (ICCVE), Shenzhen, China, 19–23 October 2015; pp. 142–143. [Google Scholar]
  34. Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
  35. Zhan, W.; Liu, C.; Chan, C.-Y.; Tomizuka, M. A non-conservatively defensive strategy for urban autonomous driving. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 459–464. [Google Scholar]
  36. Min, K.; Kim, H.; Huh, K. Deep distributional reinforcement learning based high-level driving policy determination. IEEE Trans. Intell. Veh. 2019, 4, 416–424. [Google Scholar] [CrossRef]
  37. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Figure 1. Advantages and disadvantages of different driving decision methods (* The green boxes represent advantages, and the red boxes represent disadvantages).
Figure 2. Intersection simulation environment: (a) left turn and straight ahead; (b) right turn and straight ahead.
Figure 3. Projecting the vehicle onto the digital map.
Figure 4. Potential threat identification.
Figure 5. Classification of TTE sequence arrangement in the collision area.
Figure 6. Framework of the distributional DQN algorithm.
Figure 7. Example output of expected future earnings in the distributional DQN.
Figure 8. Intelligent driving decision algorithm framework.
Figure 9. Vehicle status information pre-processing process.
Figure 10. Map status information pre-processing process.
Figure 11. Environment status information.
Figure 12. Structure of the attention network based on state information.
Figure 13. Attention network.
Figure 14. Attention modules.
Figure 15. Intersection scenario training results: (a) average reward variation for the three methods during training and (b) intersection pass rate variation for the three methods during training.
Figure 16. Intelligent vehicles make driving decisions through state-action value distribution.
Figure 17. Intersection scenario test results: RA denotes that the algorithm uses a risk assessment method; 0 denotes that the risk level is safe; 1 denotes that the risk level is attentive; and 2 denotes that the risk level is dangerous.
Figure 18. Training results of algorithms using different network structures: (a) using the attention network and (b) not using the attention network.
Figure 19. Changes in attention distribution during driving of the intelligent vehicle: (a) the intelligent vehicle has not yet entered the intersection; (b) the intelligent vehicle has entered the intersection.
Table 1. Definition of vehicle status information.

Status Information | Scope (Unit) | Description
$x_k$ | [−200, 200] (m) | Relative distance along the longitudinal axis to the intelligent vehicle
$y_k$ | [−200, 200] (m) | Relative distance along the transverse axis to the intelligent vehicle
$v_k^x$ | [−40, 40] (m/s) | Travel speed along the longitudinal axis relative to the intelligent vehicle
$v_k^y$ | [−40, 40] (m/s) | Travel speed along the transverse axis relative to the intelligent vehicle
$a_k^x$ | [−8, 8] (m/s²) | Acceleration along the longitudinal axis relative to the intelligent vehicle
$a_k^y$ | [−8, 8] (m/s²) | Acceleration along the transverse axis relative to the intelligent vehicle
$yaw_k$ | [−π, π] (rad) | Vehicle direction of travel
Table 2. Network training hyperparameters.

Hyperparameter | Value
Discount factor $\gamma$ | 0.99
Learning rate $\eta$ | 0.001
Experience replay memory size $M_{replay}$ | 500,000
Number of samples collected per batch $M_{batch}$ | 128
Target network update frequency $F$ | 10
Number of discrete support points of the Q-value distribution $N$ | 51
Q-value range $[V_{min}, V_{max}]$ | [−10, 10]
Table 3. Test results of using different algorithms in the intersection scenario.

Algorithm | Passing Rate (%) | Relative Rate of Change (%) | Average Speed (m/s) | Relative Rate of Change (%)
DQN | 77 | - | 7.89 | -
DDQN | 86 | 9 | 8.27 | 4.82
Distributional DQN | 98 | 22 | 8.71 | 10.39
