Risk Identification and Conflict Prediction from Videos Based on TTC-ML of a Multi-Lane Weaving Area

Xia, Yulan; Qin, Yaqin; Li, Xiaobing; Xie, Jiming

doi:10.3390/su14084620

¹ School of Traffic Engineering, Kunming University of Science and Technology, Kunming 650500, China

² Center for Urban Transportation Research (CUTR), 3808 USF Alumni Drive, Tampa, FL 33620, USA

* Author to whom correspondence should be addressed.

Sustainability 2022, 14(8), 4620; https://0-doi-org.brum.beds.ac.uk/10.3390/su14084620

Submission received: 22 February 2022 / Revised: 30 March 2022 / Accepted: 7 April 2022 / Published: 12 April 2022

(This article belongs to the Special Issue Sustainable Transportation for the Future: Automated Vehicles and Big Data on Traffic Operations)

Abstract

Crash risk identification and prediction are expected to play an important role in traffic accident prevention. However, most existing studies focus only on highways, not on multi-lane weaving areas. This paper proposes a potential collision risk identification and conflict prediction model for multi-lane weaving areas based on extended Time-to-Collision and Machine Learning (TTC-ML). The model can accurately learn various features, such as vehicle operation characteristics, risk and conflict distributions, and physical zoning characteristics in the weaving area. Specifically, TTC was used to capture the collision risk severity, and ML extracted vehicle trajectory features. After normalization and dimensionality reduction of the vehicle trajectory dataset, Naive Bayes, Logistic Regression, and Gradient Boosting Decision Tree (GBDT) models were selected for traffic conflict prediction, and the experiments showed that the GBDT model outperforms the other two models in terms of prediction accuracy, precision, false-positive rate (FPR), and Area Under Curve (AUC). The research findings of this paper help traffic management departments develop and optimize traffic control schemes, and they can be applied to dynamic warning in Intelligent Vehicle Infrastructure Cooperative Systems (IVICS).

Keywords: intelligent transportation; conflict prediction; extended time to collision; multi-lane weaving area; risk identification; micro-trajectory data

1. Introduction

1.1. Background

The multi-lane weaving area plays a vital role in providing access to expressways with an uninterrupted traffic flow. A bottleneck could occur if there is a crash in the multi-lane weaving area. Evidence showed that multi-lane weaving areas have a significantly higher crash risk in the expressway system regarding crash frequency and severity [1,2,3]. One possible reason for such high crash risk is that drivers have a different perception ability of risks, which results in various mandatory weaving decisions to get into the expressways [4,5,6]. Hence, it is important to investigate drivers’ weaving behavior and crash occurrence mechanisms to help enhance safety and prioritize the countermeasures of the multi-lane weaving areas.

1.2. Literature Review

1.2.1. Crash Risks Based on Historical Crash Data

Unsurprisingly, much attention and effort have been paid to exploring the crash occurrence mechanism from various roadway entities, with the major focus on highways [7,8], urban roads [9], intersections [10], and tunnels [11]. In particular, studies are conducted with the help of historical accident data to identify and rank hazardous locations, predict crash frequency and severity, and analyze influencing factors [12,13,14]. For instance, Li et al. (2021) developed Safety Performance Functions (SPFs) to predict crash frequencies for different contexts by using Georgia state-wide crash data and estimated Crash Modification Factors (CMFs) [15]. Sun (2016) presented a hybrid model combining a support vector machine model with a k-means clustering algorithm to predict the possibility of crashes, based on the crash data and traffic flow detection data of Shanghai expressways. The results showed that the accuracy of the accident prediction model could reach 78% [16,17]. In addition, for the factors influencing the crash, data unavailability due to some of the unobserved factors would also lead to the unobserved heterogeneity issue, which would potentially lead to biased estimation and predictions of the crash risk. Wang et al. (2019) developed two Bayesian logistic regression models to identify the significant variables in crash analysis. However, it may not be able to explain the specific impact of unobserved heterogeneity [18]. To improve the accuracy and consistency of parameter estimation and crash predictions, Waseem and Saeed (2019) explored the heterogeneous effects of independent factors across the crashes and the cross-correlations among the random parameter estimates with the help of historical crash data. The results can serve as a point of reference in developing countermeasures and design policies aimed at reducing crash frequency [19,20,21,22].

1.2.2. Crash Risks Based on Trajectory Data

With the advancement of traffic surveillance and image recognition, recent studies have gradually employed microscopic vehicle trajectory data. The trajectory data includes dynamic information, such as vehicle’s speed, position, and acceleration rate per 0.1 s [23,24,25]. Based on the data, a variety of surrogate safety measures based on traffic conflicts could be calculated to assess traffic safety. Although conflicts might not be identical to real traffic crash occurrences, the flexibility of data collection and the richness of microscopic information led to the widespread application of trajectory data [26,27]. The traffic conflict-based approach utilizes microscopic trajectory data and applies surrogate safety measures to establish the relationship between traffic safety and trajectory data. There are various surrogate safety measures, among which the TTC is the most commonly used one with the advantage of simple and reasonable meaning [28,29]. Oh and Kim (2010) used modified TTC to estimate the crash probability between two adjacent vehicles. According to previous studies, TTC is a proper measurement to reflect the crash risk [30]. Yang (2011) used modified TTC to evaluate the probabilistic risk of merging vehicles involved in a rear-end crash on freeway merging areas [31].

1.2.3. Driving Behavior and Risk in Weaving Areas

Existing research has conducted preliminary studies about the crash occurrence mechanism. However, previous literature is limited in the sense that the weaving area received little attention in road crash studies. In recent years, many researchers have shifted their research focus to urban expressway weaving areas [32,33,34]. Chen et al. (2016) presented an in-depth analysis of the weaving behaviors using an analytical model developed by a microscopic simulation method [35]. Wang et al. (2015) presented a multilevel Bayesian logistic regression model to study the crash risk for the following 5–10 min of weaving segments related to the mainline speed at the beginning of a weaving segment. The speed difference at the beginning and end of a weaving segment and the volume logarithm were analyzed [36]. Wang et al. (2021) considered conservative and radical driving behaviors to conduct quantitative analysis and evaluation of the confluence behavior of freeway on-ramp congested traffic. Based on the analysis of actual data, it is concluded that the interactive decision-making among autonomous vehicles should consider the social preferences of surrounding vehicles and filter out irrelevant information to improve their performance [37].

1.3. Challenges

As aforementioned, there are some limitations in both historical crash data and trajectory data, as shown in Table 1. Although the findings from studies based on crash data are significant for a better understanding of the crash occurrence mechanism, it is hard to record an adequate number of crashes over a short period of time to conduct an effective safety evaluation, and the publicly available crash data are not reliable and accurate in many countries, including China. As an alternative data source, video data has been widely used to collect vehicle trajectories. A traditional way to obtain videos of a weaving segment is to install a fixed camera on a high building or a high pole near the study segment. However, there are some limitations: (i) the required height is great, while few buildings near interchange areas are available for recording videos; (ii) these videos are recorded at a tilt angle, which could cause more errors when processing the video data; and (iii) the shooting range of the camera is restricted by the height of the building or pole.

Moreover, most of the previous studies emphasize conventional road sections, which may neglect the risk of weaving areas. The studies for the weaving area also mostly focused on driving behavior (e.g., lane change behavior, merging behavior, and diverging behavior) and road alignment analysis. However, few studies address the active fine risk management for weaving areas of expressways. Table 1 summarizes the research gap in those studies.

1.4. Objective

To address the above research gaps, this study aims to propose a model to identify and predict drivers’ crash risk during the entire weaving duration by considering the driver’s weaving behaviors. It is expected to provide advanced driving warnings for drivers in inter-weaving areas. To this end, a framework for traffic risk identification and conflict prediction in a weaving area is proposed by combining the weaving behaviors of vehicles with a safe approach based on TICs. To achieve the research goal, a UAV video data acquisition and processing approach is introduced to obtain vehicle trajectory data (e.g., speed, acceleration, following distance, etc.). It provides a better understanding of detailed and real driving behaviors [38,39]. By establishing the relationship between microscopic vehicle trajectory data and crash risk, the distribution of crash risk is obtained. Finally, factors that have a significant impact on drivers’ weaving behaviors and safety will be identified, and conflict prediction results are also provided based on the results.

The organization of this paper is as follows. The methodology section gives a framework for the identification and prediction of collision risk considering inter-weaving behaviors. Next, the data collection and processing steps are presented, followed by the section that presents the results and discussion of the model. Section five discusses the limitation of this study, and the last section concludes this paper and provides suggestions for future work.

2. Methodology

2.1. Research Framework

This study mainly applies the method of combining extended Time-to-Collision (TTC) theory and machine learning to predict traffic conflicts in the weaving area. Figure 1 presents the research framework of this study.

2.1.1. Video Data Processing

A multi-scale Kernelized Correlation Filters (KCF) optimization algorithm was used to extract vehicle data from the video. The data were further cleaned by coordinate transformation, data validation, and error elimination. We obtained (1) vehicle behavioral information such as frame ID, time ID, vehicle ID, and vehicle centroid coordinates, and (2) traffic operation information such as speed, acceleration, speed angle, and following distance of the corresponding vehicles.

2.1.2. Traffic Risk Identification

Based on the microscopic trajectory information and two-dimensional TTC theory, the vector positions and velocity can be processed. We construct a driving risk discrimination model applicable to the special geometric configuration of the weaving area. In addition, we classified the characteristic data obtained from potential risks, which are used as the input to the prediction model.

2.1.3. Traffic Risk Prediction

We evaluated the impact of major model structure parameters (i.e., data dimensions) on the prediction results. Naive Bayes and Logistic Regression are classical machine learning methods based on statistical principles that are widely used in text classification, intrusion detection, fault diagnosis, and other fields. They have the advantages of simple form, stable performance, and robustness. In terms of traffic data, they can explain the discrete multi-dimensional data information of traffic vectors such as speed and acceleration. The Gradient Boosting Decision Tree (GBDT), an ensemble learning model that has become popular in recent years, has high model complexity and strong parameter randomness, can flexibly handle various mixed data without complex feature engineering and feature transformation, and has powerful prediction capability. Therefore, we used the three models mentioned above for training and testing according to the classification labels that characterize the risk state. Ultimately, we aim to achieve crash risk prediction in the weaving area using the model that performs best in a comprehensive evaluation.

• Naive Bayes

The Naive Bayes is one of the classical supervised learning classification models, which is based on the Bayesian network model proposed by Pearl [40]. It is widely used in the fields of text classification and intrusion detection, with the advantage of stable performance. The model is trained by discrete-time data for effective prediction of traffic events. The Naive Bayes predicts the classification through feature probabilities [41]. The collected data contain independent variables $\hat{X}_{N}$ of different dimensions, and each trajectory data $\hat{X}_{N,q}$ in $\hat{X}_{N}$ is labelled with a category $\pi_{q} \in \{c_{1}, c_{2}, \dots, c_{k}\}$, where $c_{k}$ is the $k$-th risk category. Here $\pi_{q} \in \{0, 1\}$, indicating that the data can be classified into two categories. Thus, we build a training set $U = \{(\hat{X}_{N,1}, \pi_{1}), (\hat{X}_{N,2}, \pi_{2}), \dots, (\hat{X}_{N,q}, \pi_{q}), \dots, (\hat{X}_{N,Q}, \pi_{Q})\}$ to train the model and obtain the characteristic probabilities to classify the sample categories. The following describes the modeling process:

We calculate the a priori probability $P(Y = c_{k})$ and the conditional probability $P(\hat{X}_{N,q} = \sigma_{ij} \mid Y = c_{k})$ of the potential collision risk.

$$P(Y = c_{k}) = \frac{\sum_{q=1}^{Q} I(\pi_{q} = c_{k})}{Q} \tag{1}$$

$$P(\hat{X}_{N,q} = \sigma_{ij} \mid Y = c_{k}) = \frac{\sum_{q=1}^{Q} I(\hat{X}_{N,q} = \sigma_{ij} \cap \pi_{q} = c_{k})}{\sum_{q=1}^{Q} I(\pi_{q} = c_{k})} \tag{2}$$

where $I(\cdot)$ is the indicator function and $\hat{X}_{N,q} = (\hat{x}_{1,q}, \hat{x}_{2,q}, \dots, \hat{x}_{l,q}, \dots, \hat{x}_{\Upsilon_{N},q})$.

Using the $q$-th trajectory data as an example, we can calculate the posterior probability $P(Y = c_{k} \mid X = \hat{X}_{N})$ of whether a potential collision risk exists for a particular data $\hat{X}_{N,q} = (\hat{x}_{1,q}, \hat{x}_{2,q}, \dots, \hat{x}_{l,q}, \dots, \hat{x}_{\Upsilon_{N},q})$.

$$P(Y = c_{k} \mid X = \hat{X}_{N}) = P(Y = c_{k}) \prod_{i=1}^{j} P(\pi_{q} = \hat{x}_{l} \mid Y = c_{k}) \tag{3}$$

Based on the risk discrimination results of the preceding example data $\hat{X}_{N,q}$, it is possible to predict whether a vehicle is potentially at risk of collision under the influence of a combination of location, traffic, and other circumstances.

$$\pi_{q} = \arg\max_{c_{k}} P(Y = c_{k}) \prod_{i=1}^{j} P(\pi_{q} = \hat{x}_{l} \mid Y = c_{k}) \tag{4}$$

To avoid the possibility that the maximum likelihood estimate yields a probability value of zero, which would affect the calculation of the posterior probability and bias the classification, we use Bayesian estimation with a smoothing parameter $\lambda$ to calculate the specific probability value; $\lambda$ is taken as 1.

$$P(\hat{X}_{N,q} = \sigma_{ij} \mid Y = c_{k}) = \frac{\sum_{q=1}^{Q} I(\hat{X}_{N,q} = \sigma_{ij} \cap \pi_{q} = c_{k}) + \lambda}{\sum_{q=1}^{Q} I(\pi_{q} = c_{k}) + \lambda} \tag{5}$$
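The estimation steps in Equations (1)–(5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study. It assumes that every feature has already been discretized, and it uses the common Laplace form of smoothing, in which the denominator grows by λ times the number of distinct values of the feature, so it differs slightly from Equation (5) as printed. The function names `fit_naive_bayes` and `predict` and the toy feature values are hypothetical.

```python
from collections import Counter


def fit_naive_bayes(samples, labels, lam=1.0):
    """Estimate class priors and smoothed conditional probabilities.

    samples: list of tuples of discrete feature values
    labels:  list of class labels (0 = no risk, 1 = potential risk)
    """
    n = len(labels)
    class_counts = Counter(labels)
    # Prior probability of each class, as in Equation (1).
    priors = {c: class_counts[c] / n for c in class_counts}

    n_features = len(samples[0])
    # Distinct values observed for each feature (used by the smoothing term).
    values = [set(x[l] for x in samples) for l in range(n_features)]

    # counts[c][l][v] = number of class-c samples whose feature l equals v.
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for x, c in zip(samples, labels):
        for l, v in enumerate(x):
            counts[c][l][v] += 1

    def cond_prob(l, v, c):
        # Laplace-smoothed conditional probability (assumed standard form).
        return (counts[c][l][v] + lam) / (class_counts[c] + lam * len(values[l]))

    return priors, cond_prob


def predict(x, priors, cond_prob):
    """Return the class with the largest posterior score, as in Equation (4)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for l, v in enumerate(x):
            score *= cond_prob(l, v, c)
        scores[c] = score
    return max(scores, key=scores.get)
```

With λ = 1, a feature value that never co-occurs with a class still receives a small non-zero probability, so a single unseen value cannot force the whole product in Equation (3) to zero.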

• Logistic Regression

Compared with the Naive Bayes, Logistic Regression [42] is a discriminative model, which is more suitable for describing the interaction between the complex influencing factors of traffic conflicts in weaving areas and for explaining the discrete characteristics of multidimensional traffic vector data such as speed and acceleration. It is more advantageous in terms of model generalization and robustness. Therefore, using Logistic Regression to build a dynamic prediction model for traffic risk in multi-lane weaving areas can well solve the binary risk prediction problem. Based on the training set samples of the weaving area conflict prediction, the Sigmoid function is chosen to realize the non-linear transformation. Decision boundaries between the two sample classes $\pi_{q} \in \{0, 1\}$ are established according to the polynomial characteristics to obtain the traffic conflict prediction results. The modeling steps are as follows:

Using the $q$-th trajectory data as an example, we can calculate the conditional probability $P(\pi_{q})$ of the potential collision risk of the vehicle as

$$P(\pi_{q}) = \frac{\exp[g(\pi_{q})]}{1 + \exp[g(\pi_{q})]} \tag{6}$$

The Logistic Regression model $g(\pi_{q})$ for the instance-specific data $\hat{X}_{N,q}$ is derived as

$$g(\pi_{q}) = \ln \frac{P(\pi_{q})}{1 - P(\pi_{q})} = \beta \hat{X}_{N,q} + \varepsilon = \beta_{0} + \beta_{1} \hat{x}_{1,q} + \beta_{2} \hat{x}_{2,q} + \dots + \beta_{rN} \hat{x}_{rN,q} + \varepsilon_{rN} \tag{7}$$

where $\hat{X}_{N,q}$ is the $q$-th trajectory data, $\beta$ is the parameter vector of the independent variables for the $q$-th data, $\beta_{0}, \dots, \beta_{rN}$ are the corresponding parameter values, and $\varepsilon_{rN}$ is the normally distributed error term.

Adding random parameters $\acute{\beta}_{i}$. Taking into account the unobserved heterogeneity of the complex behavior of different vehicles and the multidimensional nature of the observed data, we introduce a stochastic parameter in Equation (7) to more precisely characterize the stochastic nature of the potential risk of vehicles traveling through the weaving area, namely

$$\acute{\beta}_{i} = \beta_{i} + \omega_{i}, \quad \omega_{i} \sim N(0, \sigma_{i}^{2}) \tag{8}$$

where $\acute{\beta}_{i}$ is a random parameter that varies with the vehicle, $\beta_{i}$ is a fixed parameter for the $i$-th vehicle, $\omega_{i}$ is a random term that follows $N(0, \sigma_{i}^{2})$, and $\sigma_{i}^{2}$ is the variance of the normal distribution, which is taken as 1.
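To make Equations (6)–(8) concrete, the following Python sketch evaluates the linear predictor, maps it to a probability with the Sigmoid function, and draws vehicle-specific random parameters. It is a minimal illustration under stated assumptions: the error term of Equation (7) is omitted, no parameter estimation is performed, and the function names and any coefficient values passed to them are hypothetical rather than fitted values from the study.

```python
import math
import random


def linear_predictor(beta0, betas, x):
    """g = beta_0 + sum_l beta_l * x_l, as in Equation (7) without the error term."""
    return beta0 + sum(b * v for b, v in zip(betas, x))


def conflict_probability(g):
    """Sigmoid transform exp(g) / (1 + exp(g)) of Equation (6).

    The two branches are algebraically identical; splitting on the sign of g
    avoids floating-point overflow for large |g|.
    """
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


def draw_random_parameters(betas, sigma=1.0, rng=None):
    """Add N(0, sigma^2) noise to each fixed parameter, as in Equation (8)."""
    rng = rng or random.Random()
    return [b + rng.gauss(0.0, sigma) for b in betas]
```

A sample would then be labelled as a potential conflict when `conflict_probability(...)` exceeds a chosen cut-off such as 0.5; the cut-off is an assumption here, since the text does not state one.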

• Gradient Boosting Decision Tree

Risk identification and prediction are non-linear and complex problems, and linear models or single models may ignore some of the factors that influence risk discrimination. Ensemble learning achieves better risk prediction capability by combining multiple learners into one learner with stronger learning performance, and by fusing trained models in different ways. In this paper, the implicit relationship between risk and explanatory variables is explored in depth with the Gradient Boosting Decision Tree (GBDT).

GBDT is a classification and regression tree ensemble algorithm proposed by Friedman [43]. It is based on the principle of learning sample paths to achieve classification by finding the best split features, and it is a learner with high model complexity and parameter randomization. Based on the Gradient Boosting learning strategy, GBDT creates a new weak decision tree at each iteration by minimizing its loss function $L(\Theta, f(\hat{x}_{\Upsilon_{N},q}))$ in the direction of the gradient that reduces the residuals, and finally sums up the conclusions of all trees to obtain the final prediction results. The modeling process is as follows:

Initializing the learner

$$f_{0}(\hat{x}_{\Upsilon_{N},q}) = \arg\min_{\Gamma} \sum_{i=1}^{rN} L(\Theta, \Gamma) \tag{9}$$

where $\Gamma$ is the constant value estimated to minimize the loss function (that is, the initial learner is a tree with only one root node) and $L(\Theta, \Gamma)$ is the loss function. In the GBDT, the loss function used for the regression problem is the mean square error loss function.

$$L(\Theta, f(\hat{x}_{\Upsilon_{N},q})) = \left(\Theta - f(\hat{x}_{\Upsilon_{N},q})\right)^{2} \tag{10}$$

For the iteration rounds $m = 1, 2, \dots, M$, we can calculate the negative gradient $r_{mi}$ as

$$r_{mi} = -\left[\frac{\partial L(\Theta, f(\hat{x}_{\Upsilon_{N},q}))}{\partial f(\hat{x}_{\Upsilon_{N},q})}\right]_{f(\hat{x}_{\Upsilon_{N},q}) = f_{m-1}(\hat{x}_{\Upsilon_{N},q})} = 2\left(\Theta - f_{m-1}(\hat{x}_{\Upsilon_{N},q})\right) \tag{11}$$

which is proportional to the current residual.

Based on all the samples and their negative gradient directions $(\hat{x}_{\Upsilon_{N},q}, r_{mi})$, $i = 1, 2, \dots, N$, we obtain a decision tree consisting of $J$ leaf nodes, whose corresponding leaf node regions are $R_{mj}$, $j = 1, 2, \dots, J$, and the best residual fit value of each leaf node is

$$\Gamma_{mj} = \arg\min_{\Gamma} \sum_{\hat{x}_{\Upsilon_{N},q} \in R_{mj}} L(\Theta, f_{m-1}(\hat{x}_{\Upsilon_{N},q}) + \Gamma) \tag{12}$$

The learner is obtained as

$$f_{m}(\hat{x}_{\Upsilon_{N},q}) = f_{m-1}(\hat{x}_{\Upsilon_{N},q}) + \sum_{j=1}^{J} \Gamma_{mj} I(\hat{x}_{\Upsilon_{N},q} \in R_{mj}) \tag{13}$$

$$I(\hat{x}_{\Upsilon_{N},q} \in R_{mj}) = \begin{cases} 1, & \hat{x}_{\Upsilon_{N},q} \in R_{mj} \\ 0, & \hat{x}_{\Upsilon_{N},q} \notin R_{mj} \end{cases} \tag{14}$$

After $M$ rounds of iterations, the final decision model is obtained as

$$f(\hat{x}_{\Upsilon_{N},q}) = f_{M}(\hat{x}_{\Upsilon_{N},q}) = f_{0}(\hat{x}_{\Upsilon_{N},q}) + \sum_{m=1}^{M} \sum_{j=1}^{J} \Gamma_{mj} I(\hat{x}_{\Upsilon_{N},q} \in R_{mj}) \tag{15}$$
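The boosting loop in Equations (9)–(15) can be illustrated with a deliberately small Python sketch. It is not the model trained in the study: it assumes a single numeric feature, uses two-leaf trees (stumps, so J = 2), and fits each tree to the residuals, which equal the negative gradient of the squared loss up to a constant factor. The names `fit_stump`, `fit_gbdt`, and `predict_gbdt` are hypothetical, and the `learning_rate` argument is an added shrinkage factor that does not appear in the equations (the default of 1.0 reproduces them).

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree on a 1-D feature.

    Returns (threshold, left_value, right_value). Each leaf value is the mean
    residual in its region, which minimizes the squared loss in that region.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:
        # All feature values are identical: fall back to a single constant.
        mean = sum(residuals) / len(residuals)
        return max(xs), mean, mean
    return best[1:]


def fit_gbdt(xs, ys, n_rounds=20, learning_rate=1.0):
    """Gradient boosting with squared loss on a single feature."""
    f0 = sum(ys) / len(ys)  # constant that minimizes the squared loss
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Residuals play the role of the negative gradient.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Additive update of the learner.
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
    return f0, stumps, learning_rate


def predict_gbdt(model, x):
    """Sum the initial constant and the contribution of every tree."""
    f0, stumps, lr = model
    return f0 + sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)
```

For binary conflict labels, the regression output can be turned into a class by thresholding, for example at 0.5; that cut-off is an assumption of this sketch.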

2.2. Identification of Traffic Risks

2.2.1. Basic Time-to-Collision

Time-to-Collision (TTC) in Traffic Conflict Techniques (TCTs) has been widely used as an effective safety indicator in estimating the individual vehicle collision risk [44,45]. TTC is defined as the time required for two vehicles to collide if they continue at their present speed and along the same path [46]. The TTC of a following vehicle $i$ at time step $t$ with respect to the leading vehicle $i - 1$ can be calculated as Equation (16):

$$T_{i}(t) = \begin{cases} \dfrac{X_{i-1}(t) - X_{i}(t) - L_{i}}{V_{i}(t) - V_{i-1}(t)}, & V_{i}(t) > V_{i-1}(t) \\ +\infty, & V_{i}(t) \leq V_{i-1}(t) \end{cases} \tag{16}$$

where $X_{i}(t)$ and $X_{i-1}(t)$ denote the positions of vehicles $i$ and $i - 1$ at time $t$, $V_{i}(t)$ and $V_{i-1}(t)$ are the speeds of vehicles $i$ and $i - 1$ at time $t$, $L_{i}$ is the length of vehicle $i$, and $T_{i}(t)$ is the TTC. According to the definition, a larger TTC could provide more time for a driver to avoid a collision, which results in a smaller probability of a traffic collision. In contrast, a smaller TTC could result in a higher probability of risks [47,48].
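Equation (16) translates directly into code. The sketch below is a minimal illustration with a hypothetical function name and argument order; positions are assumed to be in metres along the lane and speeds in metres per second.

```python
import math


def basic_ttc(x_lead, x_follow, v_lead, v_follow, length):
    """One-dimensional TTC of the following vehicle, as in Equation (16).

    `length` is the vehicle length L_i used in the equation. The function
    returns math.inf when the follower is not closing in on the leader.
    """
    closing_speed = v_follow - v_lead
    if closing_speed <= 0:
        return math.inf
    return (x_lead - x_follow - length) / closing_speed
```

For example, a follower at 20 m travelling at 20 m/s behind a leader at 50 m travelling at 15 m/s, with a 5 m vehicle length, closes a 25 m gap at 5 m/s, so `basic_ttc(50.0, 20.0, 15.0, 20.0, 5.0)` evaluates to 5.0 s.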

2.2.2. Extended Time-to-Collision

The calculation of TTC usually describes the car-following behavior as a one-dimensional problem [49]. For the multi-lane weaving area, drivers need to merge in or out of the arterials in the weaving area to reach the target lane within a limited distance, resulting in constrained driving conditions and frequent abnormal behaviors, such as rapid acceleration, rapid deceleration, forced lane change, etc. At the same time, because it is generally a stretching segment in the merging and diverging areas, and there is a certain angle between the stretching segment and the main lanes, the conventional TTC method is not fully applicable to the conflict analysis in multi-lane weaving areas. To solve this issue, we introduce a more widely applicable two-dimensional extended TTC theory [50] to analyze the complex behavioral conflicts in multi-lane weaving areas and to more accurately measure the vehicle driving risk in weaving areas (see Figure 2).

Assume that the centroid position vectors of the leading vehicle $i$ and the following vehicle $j$ are $O_{i}$ and $O_{j}$, and the distance between the vehicles’ centroids is $D_{ij}$. The position vectors of the nearest points of the two vehicles are $C_{i}$ and $C_{j}$, and the distance between the nearest points is $d_{ij}$. The vector positions of the two vehicles can be expressed as $(O_{i}, O_{j}, C_{i}, C_{j})$, and the vector speeds can be expressed as $(V_{i}, V_{j})$.

Since the collision points are at the outer edges of the vehicles, the collision first occurs at the closest points $C_{i}$ and $C_{j}$ of the front and rear vehicles. The distance $d_{ij}$ between the nearest points is given in Equation (17):

$$d_{ij} = \sqrt{(C_{i} - C_{j})^{\top} (C_{i} - C_{j})} \tag{17}$$

Then, $d_{ij}^{2}$ is calculated as:

$$d_{ij}^{2} = \lVert C_{i} - C_{j} \rVert^{2} = (C_{i} - C_{j})^{\top} (C_{i} - C_{j}) \tag{18}$$

We differentiate both sides of Equation (18) with respect to time (the common factor of 2 cancels) to obtain:

$$d_{ij} d_{ij}^{\prime} = (C_{i} - C_{j})^{\top} (V_{i} - V_{j}) \tag{19}$$

where $d_{ij}^{\prime}$ is the first derivative of the interval distance $d_{ij}$, that is, the change rate of the interval distance. Therefore, the relative speed at which the two vehicles approach each other is $d_{ij}^{\prime}$, called the vehicle approach rate.

Based on Equation (19), the first derivative of the interval distance between vehicles $i$ and $j$ is:

$$d_{ij}^{\prime} = \frac{1}{d_{ij}} (C_{i} - C_{j})^{\top} (V_{i} - V_{j}) \tag{20}$$

Given that the calculation of TTC is based on a fixed time step of 0.1 s [26] in the present work, we assume that the approach rate between the two vehicles is constant. According to the definition of TTC, $d_{ij} + d_{ij}^{\prime} \cdot T_{i} = 0$, so the TTC when two vehicles collide is:

$$T_{i} = -\frac{d_{ij}}{d_{ij}^{\prime}} \tag{21}$$

The positions of vehicles extracted from the video are the centroid positions $O_{i}$ and $O_{j}$, so the centroid distance $D_{ij}$ is greater than the nearest point distance $d_{ij}$ [26]. In this study, we consider the sizes of vehicles $i$ and $j$ and assume that the difference between $d_{ij}$ and $D_{ij}$ is the sum of half the length of the leading vehicle ($L_{i}$) and half the length of the following vehicle ($L_{j}$). Then, $d_{ij}$ can be calculated as:

$$d_{ij} = D_{ij} - 0.5 L_{j} - 0.5 L_{i} \tag{22}$$

and $D_{ij}$ can be calculated as:

$$D_{ij} = \sqrt{(O_{i} - O_{j})^{\top} (O_{i} - O_{j})} \tag{23}$$

Assuming that the approach rate $D_{ij}^{\prime}$ of the centroids of the two vehicles is equal to the approach rate $d_{ij}^{\prime}$ of the nearest points of the two vehicles, then:

$$D_{ij}^{\prime} = d_{ij}^{\prime} = \frac{1}{D_{ij}} (O_{i} - O_{j})^{\top} (V_{i} - V_{j}) \tag{24}$$

In summary, the two-dimensional TTC risk discrimination model is obtained as:

$$T_{i} = -\frac{d_{ij}}{d_{ij}^{\prime}} = -\frac{D_{ij} - 0.5 L_{i} - 0.5 L_{j}}{\frac{1}{D_{ij}} (O_{i} - O_{j})^{\top} (V_{i} - V_{j})} = -\frac{\sqrt{(O_{i} - O_{j})^{\top} (O_{i} - O_{j})} - 0.5 L_{i} - 0.5 L_{j}}{\frac{1}{\sqrt{(O_{i} - O_{j})^{\top} (O_{i} - O_{j})}} (O_{i} - O_{j})^{\top} (V_{i} - V_{j})} \tag{25}$$

In order to describe the fluctuation of the velocity direction before the collision, the velocity angle $\theta$ between the current velocity direction and the horizontal direction is given in Equation (26):

$$\theta = \arctan\left(-\frac{v_{y}}{v_{x}}\right) \tag{26}$$

where $v_{x}$ and $v_{y}$ are the velocities in the $x$ direction and the $y$ direction, respectively.

Finally, we compare the TTC with the TTC threshold $T^{*}$ to determine whether the vehicle is operating with potential crash risk [4].

(1) If $\mathrm{TTC} \leq T^{*}$, then $\pi_{q} = 1$ and there is a potential collision risk;
(2) If $\mathrm{TTC} > T^{*}$, then $\pi_{q} = 0$ and there is no risk.

In this paper, $T^{*}$ is set to 4 s and the time step is set to 0.1 s [26].
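The two-dimensional TTC of Equation (25) and the threshold rule above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, centroids and velocities are assumed to be given as (x, y) pairs in metres and metres per second, and returning infinity when the vehicles are not approaching each other is an assumption made by analogy with the second branch of Equation (16).

```python
import math


def extended_ttc(o_i, o_j, v_i, v_j, len_i, len_j):
    """Two-dimensional TTC from centroid positions and velocities (Equation (25)).

    o_i, o_j: (x, y) centroids of the leading and following vehicle, in m.
    v_i, v_j: (vx, vy) velocities of the two vehicles, in m/s.
    """
    dx, dy = o_i[0] - o_j[0], o_i[1] - o_j[1]
    dvx, dvy = v_i[0] - v_j[0], v_i[1] - v_j[1]
    centroid_dist = math.hypot(dx, dy)  # Equation (23)
    if centroid_dist == 0:
        return 0.0  # centroids coincide: the vehicles already overlap
    approach_rate = (dx * dvx + dy * dvy) / centroid_dist  # Equation (24)
    if approach_rate >= 0:
        return math.inf  # distance is not shrinking (assumed convention)
    gap = centroid_dist - 0.5 * len_i - 0.5 * len_j  # Equation (22)
    return -max(gap, 0.0) / approach_rate


def risk_label(ttc, threshold=4.0):
    """Return 1 for a potential collision risk (TTC <= T*), otherwise 0."""
    return 1 if ttc <= threshold else 0
```

As a check on the arithmetic, take a leader at (30, 0) moving at (10, 0) and a follower at (0, 0) moving at (15, 0), both 5 m long. The centroid distance is 30 m, the approach rate is −5 m/s, and the gap is 25 m, so the TTC is 5 s and the label is 0. Raising the follower's speed to 20 m/s halves the TTC to 2.5 s and the label becomes 1.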

2.3. Traffic Conflict Prediction

To objectively evaluate the model performance without ignoring the potential information of different dimension indicators, and reduce the interference of sample data size and uniformity, we use the extreme value method to normalize the sample data after data pre-processing. The dimensions of the obtained data are reduced by stepwise regression. Additionally, 80% of the sample data is randomly selected as the training dataset, and the rest is the testing dataset. We input the divided dataset into the traffic conflict prediction model for the complex weaving area, including the Logistic Regression, Naive Bayes, and GBDT models.

$$\hat{X}_{N} = (\hat{X}_{N,1}, \hat{X}_{N,2}, \dots, \hat{X}_{N,q}, \dots, \hat{X}_{N,Q}) \tag{27}$$

In Equation (27), $\hat{X}_{N,q}$ is the extracted $q$-th trajectory detection data, $\hat{X}_{N,q} = (\hat{x}_{1,q}, \hat{x}_{2,q}, \dots, \hat{x}_{l,q}, \dots, \hat{x}_{\Upsilon_{N},q})$; $\Upsilon_{N}$ is the independent variable of different dimensions; $N$ represents the sample dimension, including velocity, acceleration, etc.; $\hat{x}_{l,q}$ is the $l$-th element value of the $q$-th sample, $\hat{x}_{l,q} \in \{\sigma_{l1}, \sigma_{l2}, \dots, \sigma_{lj}\}$; $\sigma_{lj}$ is the $j$-th possible value of the $l$-th feature, that is, the value of attributes such as position and speed; and $Q$ is the sample size of the filtered trajectories dataset.
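The normalization and the 80/20 split described in this section can be sketched as below. The sketch assumes that the extreme value method refers to min-max scaling of each feature to the range [0, 1]; the stepwise-regression step is not reproduced. The function names and the `seed` argument are hypothetical.

```python
import random


def min_max_normalize(rows):
    """Scale every column of `rows` to [0, 1] using its minimum and maximum."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def train_test_split(rows, labels, train_fraction=0.8, seed=0):
    """Randomly assign `train_fraction` of the samples to the training set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])
```

In practice the minima and maxima would be computed on the training portion only and then reused for the test portion, so that the test data do not influence the scaling; the text does not say which order was used.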

3. Data

3.1. Data Introduction

We selected the representative Jiangnan weaving area of the Chongqing Inner Ring Expressway for research (see Figure 3). The Inner Ring Expressway is a major traffic corridor in Chongqing, and the Jiangnan weaving area is the main bottleneck of the Inner Ring. Vehicles in four arterial lanes (the innermost lane is a directional lane with stable vehicle speed, and the following traffic is evenly distributed) and two ramp lanes are involved in the weaving lane changes, and the length of the weaving area is about 70 m. At the same time, the number of lanes available to weaving vehicles is most restricted in a Type A weaving area, where weaving vehicles can use at most 1.4 lanes. Even though restrictions on trucks and motorcycles have been implemented, in peak hours the number of lanes in the weaving area is high, the weaving distance is short, and the area undertakes the function of transferring traffic between several groups.

Data collection was initiated during peak hours on a sunny weekday morning. The UAV was set up at a height of 120 m above the weaving area to collect video and extract vehicle trajectories. The UAV shoots 4K ultra-high-resolution video at a frame rate of 30 frames per second. As shown in Figure 3a, the green box represents the currently tracked vehicle, and the green line is the vehicle trajectory. The vehicle trajectory extraction process is shown in Figure 3b. The specific data processing steps are described in the following part.

3.2. Data Processing

3.2.1. Feature Detection and Tracking

Video target tracking technology belongs to the popular research field of computer vision. Firstly, we perform data model analysis based on video image sequences and solve to get the corresponding position of the target tracking in each image frame. Then, the corresponding data features of the target location obtained from the current image are correlated with the continuous video coordinate positions, and the corresponding relationship of the motion target is established in the continuous video sequence to achieve target tracking.

According to the characteristics of the video samples, this paper adopts the classical KCF (Kernelized Correlation Filter) algorithm, as available in OpenCV, for video target tracking. Furthermore, we optimize the algorithm for the complex conditions of the weaving area, such as occlusion, complex backgrounds, small target spacing, multiple similar targets, and light changes. The main idea of the optimization is to replace the original fixed-scale detection window with a multi-scale window for tracking and to obtain multiple groups of Histogram of Oriented Gradients (HOG) features through iterative calculation over the multi-scale window (enlarged/normal/reduced detection scale), which reduces the impact of environmental interference and scale changes on the targets.

Based on the multi-scale KCF optimization algorithm for automatic vehicle tracking, we extract microscopic vehicle trajectories. The algorithm can automatically detect frame by frame and pixel by pixel by framing the tracking target, and extracting and recording the vehicle position and status information.

To verify the accuracy of the automatic tracking algorithm, we randomly sampled the position data of 800 vehicles as the validation set. The manually framed vehicle coordinates are used as the actual value M_(x,y) and are paired frame by frame with the position M′_(x,y) predicted by the automatic tracking algorithm to calculate the distance D between the anchor frame centers (centers of mass), as given in Equation (28). We evaluate the accuracy of the algorithm in terms of the mean relative error (MRE). The results show that the position error of the automatic tracking algorithm is only 3.32%, so its robustness and accuracy meet the detection requirements. To ensure the integrity of the vehicle trajectories, whenever the automatic tracker lost a target, the target was manually recalibrated and re-detected. Complete trajectories of 2912 vehicles were finally obtained.

D = \sqrt{{(M_{x} - M_{x}^{'})}^{2} + {(M_{y} - M_{y}^{'})}^{2}}

(28)
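As a minimal sketch of Equation (28), the following Python snippet pairs manual and tracked box centers frame by frame and computes the center distance D. The coordinate values and helper names are hypothetical. The paper does not state how the relative error is normalized, so dividing D by the diagonal of the manually framed box is only an assumption made for illustration.

```python
import math

def center_distance(manual, tracked):
    """Euclidean distance D between manual center M and tracked center M' (Eq. 28)."""
    (mx, my), (tx, ty) = manual, tracked
    return math.hypot(mx - tx, my - ty)

def mean_relative_error(manual_centers, tracked_centers, box_diagonals):
    """Average of D divided by a reference length (assumed: manual box diagonal)."""
    ratios = [
        center_distance(m, t) / diag
        for m, t, diag in zip(manual_centers, tracked_centers, box_diagonals)
    ]
    return sum(ratios) / len(ratios)

# Hypothetical pixel coordinates for three frames of one vehicle.
manual = [(100.0, 200.0), (106.0, 200.0), (112.0, 201.0)]
tracked = [(103.0, 204.0), (106.0, 200.0), (112.0, 204.0)]
diagonals = [50.0, 50.0, 50.0]  # hypothetical manual box diagonals, in pixels

distances = [center_distance(m, t) for m, t in zip(manual, tracked)]
error = mean_relative_error(manual, tracked, diagonals)
print(distances, error)
```

Any other reference length (for example, vehicle length in pixels) could be substituted for the box diagonal without changing the structure of the calculation.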

3.2.2. Error Elimination

Although the UAV keeps itself stable by combining GPS, inertial measurement units, a geomagnetic compass, and other positioning technologies, in a real environment it still encounters disturbances that interfere with image acquisition. Therefore, to remove the video jitter that occurs when the camera shakes or drifts slightly, this paper first extracts feature points with the FAST algorithm. Fast scene matching is then performed on FREAK features to estimate the geometric transformation parameters. Finally, the video is corrected by an affine transformation to smooth the trajectories and eliminate tracking noise. This reduces the impact of noise on the TTC calculation.

To verify the accuracy of the video anti-shake algorithm, we hovered the UAV 120 m directly above the road, shooting vertically downward, while testers used a handheld radar speed meter to record the speed of each vehicle passing through the test section. Speed data for 116 groups covering different running states were randomly selected, and Figure 4 compares the two sets of speed measurements. The vehicle speed extracted by the multi-scale KCF-optimized automatic tracking algorithm closely matches the radar measurements, indicating that vehicle speed is detected accurately.

Figure 5 shows the distribution of the error relative to the radar test data. The error is approximately normally distributed, with a mean of −0.39 km/h (−0.11 m/s). The 25% quantile of the error is about −2.3 km/h (−0.64 m/s) and the 75% quantile is about 1.4 km/h (0.39 m/s), which verifies the accuracy of the vehicle trajectory extraction method in this paper.

Using the multi-scale Kernel Correlation Filter (KCF) algorithm built on the OpenCV implementation, and after feature detection, vehicle tracking, error elimination, and verification, we obtain a total of 363,600 microscopic trajectory records. The time resolution is 0.1 s (10 Hz), and the position resolution is 0.1 m (0.1 m px⁻¹). From these records, the velocity, acceleration, following distance, density, coordinates, and lane-change position and direction of each vehicle are calculated to support model construction, parameter setting, and validation, as shown in Table 2.
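To illustrate how speed follows from the trajectory records, the sketch below applies a backward difference to the pixel positions of vehicle 1 in Table 2, using the 0.1 s time step and 0.1 m per pixel scale stated above. The function name is hypothetical, and the backward-difference scheme is an assumption about how the speeds were derived rather than a documented step of the authors' pipeline.

```python
import math

DT = 0.1        # s, time resolution stated in the text (10 Hz)
M_PER_PX = 0.1  # m per pixel, position resolution stated in the text

# Pixel positions of vehicle 1 from Table 2 (longitudinal, lateral), t = 0.1 to 0.5 s.
positions_px = [(1409, 600), (1412, 600), (1418, 600), (1425, 602), (1430, 605)]

def speeds_from_positions(points, dt=DT, scale=M_PER_PX):
    """Backward-difference speed (m/s) between consecutive trajectory points."""
    speeds = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step_m = math.hypot(x1 - x0, y1 - y0) * scale  # displacement in metres
        speeds.append(step_m / dt)
    return speeds

speeds = [round(v, 3) for v in speeds_from_positions(positions_px)]
print(speeds)
```

With the Table 2 sample this yields 3.000, 6.000, 7.280, and 5.831 m/s, which matches the speeds listed in Table 2 for t = 0.2 s to 0.5 s.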

4. Results and Discussion

4.1. Analysis of Potential Risk Identification Results

4.1.1. Analysis of Results in Different Merging Directions

Figure 6 shows the distribution of the vehicles’ T* across the different lanes. Note that the northernmost lane of the weaving area in this study is a directional lane, so its potential collision risk is not analyzed separately:

(1): The T* distribution differs significantly between lanes. The risk distributions of inner and outer lane 1 and ramp 2 are relatively concentrated, whereas those of the middle three lanes are relatively dispersed. This indicates that the inner and outer lanes are less involved in weaving and have higher traffic priority, which is consistent with actual operating characteristics;
(2): From upstream to downstream, the trajectories evolve from stable to persistently disordered. This shows that diverging demand causes vehicle conflict and interference, forming a traffic bottleneck whose congestion spreads upstream. Closer to the downstream end of the weaving area, the traffic state gradually stabilizes, so the potential collision duration shows a gradient: the closer to downstream, the shorter the duration;
(3): Overall, the potential risks of consecutive lane crossings are accurately recorded, and the risky maneuvers of individual vehicles entering the directional lane are accurately captured. This indicates that the model describes the characteristics of vehicle conflict in the weaving area well and verifies the effectiveness of the risk discrimination model in this paper.

In general, vehicles in the multi-lane weaving area may be forced into risky maneuvers. However, most vehicles eliminate the potential risk after one to two lane changes into their target lane. Traffic engineers can therefore consider installing variable message signs upstream, or using emerging technologies such as Intelligent Vehicle Infrastructure Cooperative Systems (IVICS) to warn vehicles to change lanes early and in an orderly manner. Either measure would reduce the frequency of lane changes inside the weaving area and thus the driving risk.

4.1.2. Analysis of Results from Different Zones

In order to accurately explore the differences in traffic conflicts in each zone of the weaving area, we adopt the idea of zonal modeling [51], and use HALCON software to delineate, crop, and extract the Region of Interest (ROI) area based on the traffic operation characteristics such as vehicle behavior and geometric conditions. Finally, we divide the weaving area into 6 zones ($Z_{1} - Z_{6}$) along the driving direction (see Figure 7). In addition, to further analyze the degree of aggregation of the risk of lane change, a heat map of the density of the risk of lane change after normalization is drawn, as shown in Figure 8. We use the following three definitions to describe traffic conflicts. (i) Potential collision per-unit area: the ratio of the volume of conflicts per zone to the area of each zone; the larger the ratio, the higher the risk of potential collisions. (ii) Risk density: we normalize the risk density of lane change to a heat map of 0–10 levels; the higher the level, the darker the color, and the higher the risk. (iii) The ratio of TTC*: the ratio of the number of TTC* in each zone with potential conflicts to volume; the larger the ratio, the higher the risk of potential collisions.

(1): In terms of lanes, the potential risk density of the inner and outer lanes is greater than that of the middle lanes, because speeds in the inner and outer lanes are higher than in the middle area. Stable speed, even car-following, and infrequent lane-changing therefore do not necessarily indicate low risk; coordinated “spacing-speed” is the key to safe driving;
(2): In terms of zoning, $Z_{2} - Z_{3}$ form the core of the risk, which radiates outward in the $x$ direction and gradually weakens. In addition, the risk associated with merging vehicles in $Z_{5} - Z_{6}$ cannot be ignored;
(3): Interestingly, an accident occurred at coordinates (800, 550) during our data collection (see Figure 8b). Figure 8a shows that this location has the highest conflict density, which supports the effectiveness of our risk identification model.

Distinct characteristics between the zones lead to different severity of conflicts in different zones. We calculated the ratio of $Z_{1} - Z_{6}$ based on their area sizes, and the ratio of TTC* based on the number of conflicts per unit zone, to accurately describe the conflict severity in different zones. Table 3 and Figure 9 present the comparison results of conflict severity in $Z_{1} - Z_{6}$:

(1): The zones rank by conflict severity as $Z_{3}$, $Z_{2}$, $Z_{1}$, $Z_{5}$, $Z_{4}$, $Z_{6}$. The per-unit-area conflict of $Z_{2} - Z_{3}$ (23.59~24.97) is the highest, about 2–4 times the conflict risk of the other zones;
(2): For $Z_{1}$ and $Z_{5}$, the zone area ratio differs little from the conflict ratio. However, because many vehicles in $Z_{5}$ need to merge into the arterials, its local potential collision risk density is significantly greater than that of $Z_{1}$ (see Figure 8a). This reflects weaving conflict characteristics that differ from those of basic road sections, and it also shows that the model describes the distribution of vehicle conflict density well.
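The ranking above can be checked directly against Table 3. The sketch below divides each zone's share of TTC* by its share of area; this index is an illustrative assumption, not a quantity defined in the paper, but it should order the zones in the same way as the per-unit-area conflict column. Values above 1 mean a zone holds more conflicts than its area share alone would suggest.

```python
# Shares taken from Table 3 (percent of total area, percent of total TTC* count).
area_share = {"Z1": 16.97, "Z2": 13.14, "Z3": 8.69, "Z4": 21.79, "Z5": 14.69, "Z6": 24.72}
ttc_share = {"Z1": 19.12, "Z2": 23.58, "Z3": 16.50, "Z4": 13.61, "Z5": 15.96, "Z6": 11.23}

# Illustrative index: TTC* share divided by area share for each zone.
intensity = {zone: ttc_share[zone] / area_share[zone] for zone in area_share}

# Zones sorted from most to least conflict-dense.
ranking = sorted(intensity, key=intensity.get, reverse=True)
print(ranking)
print({zone: round(value, 2) for zone, value in intensity.items()})
```

For the Table 3 values the resulting order is $Z_{3}$, $Z_{2}$, $Z_{1}$, $Z_{5}$, $Z_{4}$, $Z_{6}$, the same as the severity ranking reported in item (1).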

In general, in addition to $Z_{2} - Z_{3}$, where conflicts are serious, early warning, prevention, and control are also essential for the zones close to $Z_{1}$ and $Z_{4}$. Moreover, because vehicles in $Z_{5} - Z_{6}$ must yield to arterial vehicles, lane changes there are difficult and the risks are concentrated. In the future, the risk distribution characteristics of each zone can be combined with connected-vehicle technology to coordinate speed, spacing, merging, and diverging requirements and achieve safe and efficient traffic.

4.2. Analysis of Traffic Conflict Prediction Results

4.2.1. Definition of Variables

We selected 22 candidate independent variables from two aspects (i.e., vehicle parameters and zone parameters) based on the characteristics of vehicles and traffic flow observed in the multi-lane weaving area. Some of these 22 variables are likely to have very little influence on the dependent variable, and some may be correlated with one another, which would make the coefficients of the subsequent model unstable and could even lead to absurd conclusions. In other words, multicollinearity would push the model far from the actual estimates, and such a model could not explain changes in the dependent variable well [52,53]. Therefore, to improve prediction accuracy, we use T* as the dependent variable and perform a multicollinearity diagnosis on the 22 candidate independent variables, removing those that are strongly correlated with one another.

Multicollinearity is commonly diagnosed with the tolerance or the variance inflation factor (VIF). The VIF is the more widely used of the two and is calculated as

\mathrm{VIF} = \frac{1}{1 - R_{i}^{2}}

(29)

where $R_{i}$ is the multiple correlation coefficient obtained by regressing the $i$-th independent variable on the remaining independent variables.

The larger the VIF, the greater the possibility of multicollinearity among the independent variables. When $\mathrm{VIF} < 10$, there is no multicollinearity; when $10 \leq \mathrm{VIF} < 100$, there is strong multicollinearity; when $\mathrm{VIF} \geq 100$, there is severe multicollinearity. By establishing regression equations for the selected variables and testing them one by one, five variables, $y_{i}(t)$, $\overline{θ}_{i}$, $\overline{a}_{i}$, $θ_{k,S}$, and $\overline{v}_{k}$, were finally excluded. The remaining 17 variables are the input variables of the model (see Table 4).
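The VIF screening can be sketched in a few lines of Python. The example below covers only the simplest case of two predictors, where the $R_{i}^{2}$ of regressing one on the other equals their squared Pearson correlation. The sample values and function names are hypothetical, and the thresholds are the ones quoted in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif(r_squared):
    """Variance inflation factor from R_i^2, as in Equation (29)."""
    return 1.0 / (1.0 - r_squared)

def diagnose(vif_value):
    """Multicollinearity level using the thresholds quoted in the text."""
    if vif_value < 10:
        return "none"
    if vif_value < 100:
        return "strong"
    return "severe"

# Hypothetical samples of two candidate variables (not data from the paper).
mean_speed = [5.0, 6.0, 7.0, 8.0, 9.0]
speed_std = [1.0, 1.3, 0.9, 1.6, 1.2]

r = pearson_r(mean_speed, speed_std)
value = vif(r * r)  # two-predictor case: R_i^2 equals r^2
print(round(value, 3), diagnose(value))
```

With more than two predictors, $R_{i}^{2}$ would instead come from a multiple regression of variable $i$ on all remaining variables; the `vif` and `diagnose` steps stay the same.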

4.2.2. Validity Test of Traffic Conflict Prediction Model

To objectively evaluate the performance of the prediction model, this study first normalizes the data to reduce the interference of data volume, sample uniformity, and magnitude. The confusion matrix is selected to measure the effectiveness of the model.

Figure 10 and Table 5 show that the Naive Bayes model performs the worst in accuracy, precision, and False Positive Rate (FPR), probably because of the unbalanced distribution of the risk data samples: the Naive Bayes model cannot adequately capture the correlations and distributions within the data, which shows up as lower accuracy and precision and a higher FPR. The Logistic Regression model performs better than Naive Bayes on these three indicators when faced with such complex samples, although its sensitivity (48.04%) is the lowest of the three models. The GBDT model is the best: relative to Naive Bayes, it improves accuracy, precision, and sensitivity by 20.24, 48.74, and 3.73 percentage points, respectively, and lowers the FPR by 23.69 percentage points. Relative to Logistic Regression, the corresponding improvements are 8.00, 19.84, 35.56, and 2.26 percentage points. This shows that the GBDT model can effectively overcome the difficulty of predicting risk and conflict in the weaving area, and that it has the potential to be applied to TTC-ML risk identification and conflict prediction there.
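For reference, the four indicators follow from the confusion matrix as sketched below. The confusion-matrix counts are hypothetical and serve only to show the formulas; the comparison at the end uses the percentages reported in Table 5 and expresses the gain of GBDT over each baseline in percentage points (for FPR, the gain is the reduction).

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and FPR (as fractions) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }

# Hypothetical counts, for illustration only (not the paper's confusion matrix).
example = confusion_metrics(tp=80, fp=10, tn=890, fn=20)

# Table 5 values in percent: (accuracy, precision, sensitivity, FPR).
table5 = {
    "Naive Bayes": (74.86, 38.86, 79.87, 26.19),
    "Logistic Regression": (87.10, 67.76, 48.04, 4.76),
    "GBDT": (95.1, 87.6, 83.6, 2.5),
}

def gbdt_gain(baseline):
    """Percentage-point gain of GBDT over a baseline; the FPR entry is the reduction."""
    acc, prec, sens, fpr = table5["GBDT"]
    b_acc, b_prec, b_sens, b_fpr = table5[baseline]
    diffs = (acc - b_acc, prec - b_prec, sens - b_sens, b_fpr - fpr)
    return tuple(round(d, 2) for d in diffs)

print(example)
print(gbdt_gain("Naive Bayes"))
print(gbdt_gain("Logistic Regression"))
```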

This paper further uses the AUC (Area Under Curve) as an integrated evaluation index that summarizes the trade-off between sensitivity and FPR; the larger the AUC, the better the model performance. As shown in Figure 11, the AUC of all three prediction models is greater than 0.8, and the AUC of the GBDT model is as high as 0.99, indicating that the TTC-ML risk identification and prediction model constructed for the multi-lane weaving area predicts well. The AUC of the GBDT model is 0.15 higher than that of the Naive Bayes model (0.84) and 0.11 higher than that of the Logistic Regression model (0.88), showing better prediction capability.
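One way to see what the AUC measures is its rank interpretation: it equals the probability that a randomly chosen conflict sample receives a higher predicted score than a randomly chosen non-conflict sample. The sketch below computes the AUC this way from hypothetical labels and scores; it illustrates the metric and is not the authors' evaluation code.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (conflict, non-conflict) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = conflict) and predicted conflict scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_from_scores(labels, scores)
print(auc)

# AUC gaps between GBDT (0.99) and the two baselines reported in the paper.
gaps = (round(0.99 - 0.84, 2), round(0.99 - 0.88, 2))
print(gaps)
```

The pairwise loop is quadratic in the sample size, which is fine for a small illustration; for a dataset of several hundred thousand records, a sort-based computation would be the practical choice.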

5. Limitations

This study has several limitations. Crash risk identification is affected by many factors that could not be collected in this study. The unavailability of data on some unobserved factors (e.g., the driver's state while driving) also introduces unobserved heterogeneity, which could bias the estimation and prediction of crash risk. Future work should incorporate more behavioral data, and more advanced regression models (e.g., random parameters models) will be explored to improve predictive power. In addition, deep learning and computer vision-based methods can be applied to further analyze driving risks in complex driving environments, which would support real-time traffic risk prediction, one of our future research directions.

6. Conclusions

This study identifies collision risk and predicts traffic conflicts in a multi-lane weaving area. The analysis was based on high-precision vehicle trajectory data collected in the weaving area by a UAV. To allow a detailed evaluation, partition modeling was used to divide the multi-lane weaving area into six zones, such as upstream, downstream, and weaving influence areas. To investigate the effects of vehicle and decision behaviors, two-dimensional TTC theory was used to construct a traffic risk discrimination model suited to the special geometric configuration of the weaving area. Finally, the TTC-ML model was established to predict collision risk based on Naive Bayes, Logistic Regression, and GBDT models.

The results showed that crash risk can be avoided after vehicles change lanes in the weaving area one to two times, and that the duration of potential collision risk gradually decreases as vehicles approach the downstream end. In addition, the prediction accuracy of the GBDT model was 95.1%, much greater than that of the Naive Bayes and Logistic Regression models (74.86% and 87.10%, respectively), and the AUCs of the GBDT, Naive Bayes, and Logistic Regression models were 0.99, 0.84, and 0.88, respectively, indicating that the GBDT model had better prediction performance.

Overall, this study successfully verified the possibility of combining two-dimensional TTC and machine learning methods for crash risk identification and prediction, and the model has good adaptability to different evaluation indicators. It can effectively characterize the strong lane-changing conflict in the weaving area, depict the complex risks in the weaving area, and evaluate the actual vehicle driving safety in the weaving area. This study lays a theoretical foundation for exploring the mechanism of vehicle-road coordination and congestion in the weaving area.

Author Contributions

Conceptualization, Y.Q. and J.X.; methodology, Y.X. and J.X.; software, X.L.; validation, J.X.; formal analysis, Y.X.; investigation, J.X.; writing—original draft preparation, J.X., X.L. and Y.X.; writing—review and editing, J.X., X.L., and Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2018YFB1600500) and the National Natural Science Foundation of China (71861016).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Golob, T.E.; Recker, W.W.; Alvarez, V.M. Safety aspects of freeway weaving sections. Transp. Res. Part A-Policy Pract. 2004, 38, 35–51.
2. Hidas, P. Modelling vehicle interactions in microscopic simulation of merging and weaving. Transp. Res. Part C-Emerg. Technol. 2005, 13, 37–62.
3. Mao, X.; Yuan, C.; Gan, J.; Zhang, S. Risk factors affecting traffic accidents at urban weaving sections: Evidence from China. Int. J. Environ. Res. Public Health 2019, 16, 1542.
4. Gu, X.; Abdel-Aty, M.; Xiang, Q.; Cai, Q.; Yuan, J. Utilizing UAV video data for in-depth analysis of drivers’ crash risk at interchange merging areas. Accid. Anal. Prev. 2019, 123, 159–169.
5. Yuan, J.; Abdel-Aty, M.; Cai, Q.; Lee, J. Investigating drivers’ mandatory lane change behavior on the weaving section of freeway with managed lanes: A driving simulator study. Transp. Res. Part F-Traffic Psychol. Behav. 2019, 62, 11–32.
6. Sun, J.; Zuo, K.; Jiang, S.; Zheng, Z. Modeling and predicting stochastic merging behaviors at freeway on-ramp bottlenecks. J. Adv. Transp. 2018, 2018, 9308580.
7. Xu, J.; Lin, W.; Wang, X.; Shao, Y. Acceleration and deceleration calibration of operating speed prediction models for two-lane mountain highways. J. Transp. Eng. Part A Syst. 2017, 143, 04017024.
8. Savolainen, P.T.; Mannering, F.L.; Lord, D.; Quddus, M.A. The statistical analysis of highway crash-injury severities: A review and assessment of methodological alternatives. Accid. Anal. Prev. 2011, 43, 1666–1676.
9. Kopelias, P.; Papadimitriou, F.; Papandreou, K.; Prevedouros, P. Urban freeway crash analysis: Geometric, operational, and weather effects on crash number and severity. Transp. Res. Rec. 2015, 9, 123–131.
10. Tay, R.; Rifaat, S.M. Factors contributing to the severity of intersection crashes. J. Adv. Transp. 2007, 41, 245–265.
11. Caliendo, C.; De Guglielmo, M.L.; Guida, M. Comparison and analysis of road tunnel traffic accident frequencies and rates using random-parameter models. J. Transp. Saf. Secur. 2016, 8, 177–195.
12. Li, X.; Liu, J.; Khattak, A.; Nambisan, S. Sequential prediction for large-scale traffic incident duration: Application and comparison of survival models. Transp. Res. Rec. 2020, 2674, 79–93.
13. Wang, L.; Abdel-Aty, M. Microscopic safety evaluation and prediction for freeway-to-freeway interchange ramps. Transp. Res. Rec. 2016, 2583, 56–64.
14. Guo, Y.; Li, Z.; Sayed, T. Analysis of crash rates at freeway diverge areas using Bayesian tobit modeling framework. Transp. Res. Rec. 2019, 2673, 652–662.
15. Li, X.; Liu, J.; Yang, C.; Barnett, T. Bayesian approach to developing context-based crash modification factors for medians on rural four-lane roadways. Transp. Res. Rec. 2021, 2675, 1316–1330.
16. Sun, J.; Sun, J. Real-time crash prediction on urban expressways: Identification of key variables and a hybrid support vector machine model. IET Intell. Transp. Syst. 2016, 10, 331–337.
17. Sun, J.; Li, T.; Li, F.; Chen, F. Analysis of safety factors for urban expressways considering the effect of congestion in Shanghai, China. Accid. Anal. Prev. 2016, 95, 503–511.
18. Wang, L.; Abdel-Aty, M.; Lee, J.; Shi, Q. Analysis of real-time crash risk for expressway ramps using traffic, geometric, trip generation, and socio-demographic predictors. Accid. Anal. Prev. 2019, 122, 378–384.
19. Waseem, M.; Ahmed, A.; Saeed, T.U. Factors affecting motorcyclists’ injury severities: An empirical assessment using random parameters logit model with heterogeneity in means and variances. Accid. Anal. Prev. 2019, 123, 12–19.
20. Saeed, T.U.; Hall, T.; Baroud, H.; Volovski, M.J. Analyzing road crash frequencies with uncorrelated and correlated random-parameters count models: An empirical assessment of multilane highways. Anal. Methods Accid. Res. 2019, 23, 100101.
21. Chen, S.; Saeed, T.U.; Alqadhi, S.D.; Labi, S. Safety impacts of pavement surface roughness at two-lane and multi-lane highways: Accounting for heterogeneity and seemingly unrelated correlation across crash severities. Transp. A 2019, 15, 18–33.
22. Chen, S.; Saeed, T.U.; Alinizzi, M.; Lavrenz, S.; Labi, S. Safety sensitivity to roadway characteristics: A comparison across highway classes. Accid. Anal. Prev. 2019, 123, 39–50.
23. Chen, Q.; Gu, R.; Huang, H.; Lee, J.; Zhai, X.; Li, Y. Using vehicular trajectory data to explore risky factors and unobserved heterogeneity during lane-changing. Accid. Anal. Prev. 2021, 151, 105871.
24. Chen, Q.H.; Huang, H.; Li, Y.; Lee, J.; Long, K.J.; Gu, R.; Zhai, X. Modeling accident risks in different lane-changing behavioral patterns. Anal. Methods Accid. Res. 2021, 30, 100159.
25. Sharma, K.; Poonia, R.; Sunda, S. Accurate Real-Time Location Map Matching Algorithm for Large Scale Trajectory Data. In Proceedings of the 7th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 29–31 August 2018; pp. 646–651.
26. Xing, L.; He, J.; Abdel-Aty, M.; Cai, Q.; Li, Y.; Zheng, O. Examining traffic conflicts of up stream toll plaza area using vehicles’ trajectory data. Accid. Anal. Prev. 2019, 125, 174–187.
27. Yang, Z.; Yu, Q.; Zhang, W.; Shen, H. A comparison of experienced and novice drivers’ rear-end collision avoidance maneuvers under urgent decelerating events. Transp. Res. Part F-Traffic Psychol. Behav. 2021, 76, 353–368.
28. Schwarz, C. On computing time-to-collision for automation scenarios. Transp. Res. Part F 2014, 27, 283–294.
29. Schleinitz, K.; Petzoldt, T.; Gehlert, T. Drivers’ gap acceptance and time to arrival judgements when confronted with approaching bicycles, e-bikes, and scooters. J. Transp. Saf. Secur. 2020, 12, 3–16.
30. Oh, C.; Kim, T. Estimation of rear-end crash potential using vehicle trajectory data. Accid. Anal. Prev. 2010, 42, 1888–1893.
31. Yang, H.; Ozbay, K. Estimation of traffic conflict risk for merging vehicles on highway merge section. Transp. Res. Rec. 2011, 2236, 58–65.
32. Li, H.; Zhang, J.; Zhang, Z.; Huang, Z. Active lane management for intelligent connected vehicles in weaving areas of urban expressway. J. Intell. Connect. Veh. 2021, 4, 52–67.
33. Abdel-Aty, M.; Wang, L. Implementation of variable speed limits to improve safety of congested expressway weaving segments in microsimulation. Transp. Res. Procedia 2017, 27, 577–584.
34. Hao, W.; Zhang, Z.; Gao, Z.; Yi, K.; Liu, L.; Wang, J. Research on mandatory lane-changing behavior in highway weaving sections. J. Adv. Transp. 2020, 2020, 3754062.
35. Chen, X.; Yu, L.; Jia, X. Capacity Modeling for Weaving, Merge, and Diverge Sections with Median Exclusive Bus Lanes on an Urban Expressway: Microsimulation Approach. Transp. Res. Rec. 2016, 2553, 99–107.
36. Wang, L.; Abdel-Aty, M.; Shi, Q.; Park, J. Real-time crash prediction for expressway weaving segments. Transp. Res. Part C-Emerg. Technol. 2015, 61, 1–10.
37. Wang, H.; Wang, W.; Yuan, S.; Li, X.; Sun, L. On social interactions of merging behaviors at highway on-Ramps in congested traffic. IEEE Trans. Intell. Transp. Syst. 2021, 7, 1–12.
38. Hu, Y.; Li, Y.; Huang, H.; Lee, J.; Yuan, C.; Zou, G. A high-resolution trajectory data driven method for real-time evaluation of traffic safety. Accid. Anal. Prev. 2021, 165, 106503.
39. Chen, X.; Li, Z.; Yang, Y.; Qi, L.; Ke, R. High-resolution vehicle trajectory extraction and denoising from aerial videos. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3190–3202.
40. Pearl, J. Reasoning under Uncertainty. Annu. Rev. Comput. Sci. 2003, 4, 37–72.
41. Tang, L.L.; Yang, X.; Kan, Z.H. Traffic Lane Numbers Detection Based on the Naive Bayesian Classification. China J. Highw. Transp. 2016, 29, 116–123.
42. Yang, W.C.; Xie, B.S.; Fang, R. Comparative Analysis and Prediction of Motor Vehicle Crash Severity on Mountainous Two-lane Highways. J. Transp. Syst. Eng. Inf. Technol. 2021, 21, 190–195.
43. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
44. Mahmud, S.M.S.; Ferreira, L.; Hoque, M.S.; Tavassoli, A. Application of proximal surrogate indicators for safety evaluation: A review of recent developments and research needs. IATSS Res. 2017, 41, 153–163.
45. Cai, Q.; Saad, M.; Abdel-Aty, M.; Yuan, J.; Lee, J. Safety impact of weaving distance on freeway facilities with managed lanes using both microscopic traffic and driving simulations. Transp. Res. Rec. 2018, 2672, 130–141.
46. Hayward, J.C. Near-miss determination through use of a scale of danger. Highw. Res. Rec. 1972, 384, 24–34.
47. Li, Y.; Li, Z.; Wang, H.; Wang, W.; Xing, L. Evaluating the safety impact of adaptive cruise control in traffic oscillations on freeways. Accid. Anal. Prev. 2017, 104, 137–145.
48. Li, Y.; Xu, C.C.; Xing, L.; Wang, W. Integrated cooperative adaptive cruise and variable speed limit controls for reducing rear-end collision risks near freeway bottlenecks based on micro-simulations. IEEE Trans. Intell. Transp. Syst. 2017, 18, 3157–3167.
49. Li, Y.; Wu, D.; Lee, J.; Yang, M.; Shi, Y. Analysis of the transition condition of rear-end collisions using time-to-collision index and vehicle trajectory data. Accid. Anal. Prev. 2020, 144, 105676.
50. Ward, J.R.; Agamennoni, G.; Worrall, S.; Bender, A.; Nebot, E. Extending Time to Collision for probabilistic reasoning in general traffic scenarios. Transp. Res. Part C-Emerg. Technol. 2015, 51, 66–82.
51. Peng, B.; Wang, Y.T.; Xie, J.M.; Zhang, Y.; Tang, J. Multi-stage Lane Changing Decision Model of Urban Trunk Road’s Short Weaving Area Based on Cellular Automata. J. Transp. Syst. Eng. Inf. Technol. 2020, 20, 41–48.
52. Han, S.; Lan, D.; Gui, Q.; Du, L. Characteristics Analysis Approach for Multicollinearity Diagnosis and Its Applications in Orbit Determination of GEO Satellites. Acta Geod. Cartogr. Sin. 2013, 42, 19–26.
53. Zhu, Y.; Zheng, Y.; Yin, M. Multicollinearity Test Under Statistical Significance. Stat. Decis. 2020, 36, 3.

Figure 1. Study Framework.

Figure 2. Vector position and velocity of the vehicles.

Figure 3. Data extraction from UAV videos. (a) UAV video recording scene, (b) Vehicle trajectory extraction process.

Figure 4. Comparison of the data obtained by the two methods.

Figure 5. The error distribution with the radar data.

Figure 6. Risk distribution from different lane directions.

Figure 7. Zoning structure.

Figure 8. Risk density distribution in different zones. (a) Heat map of the density of the risk of lane change, (b) A coincidental accident during data collection.

Figure 9. Zone size vs. TTC* volume.

Figure 10. Results of confusion matrix. (a) Naive Bayes; (b) Logistic Regression; and (c) GBDT.

Figure 11. ROC curves. (a) Naive Bayes; (b) Logistic Regression; and (c) GBDT.

Table 1. Summary of previous studies.

Research Gap		Studies
Data	Costly and unreliable access to publicly available crash data.	Guo et al., (2019) [14]; Wang et al., (2019) [18]
Data	The sample size in the surveillance camera is not comprehensive and it is difficult to extract the trajectory data.	Chen et al., (2021) [24]; Xing et al., (2019) [26]; Yang et al., (2021) [27]
Subjects	Incomplete studies of crash risk in weaving area.	Sun et al., (2016) [15]; Wang et al., (2015) [36]

Table 2. Examples of trajectory data.

Time (s)	Lane ID	Vehicle ID	Longitude Position (Pixel)	Latitude Position (Pixel)	Gap (m)	Speed (m·s⁻¹)	Acceleration (m·s⁻²)	Density (pcu·Lane⁻¹·km⁻¹)
0.1	2	1	1409	600	2.95	5.578	0.156	90
0.2	2	1	1412	600	2.43	3.000	−2.578	90
0.3	2	1	1418	600	1.90	6.000	3.000	91
0.4	2	1	1425	602	1.06	7.280	1.280	91
0.5	2	1	1430	605	0.53	5.831	−1.449	91

Table 3. Number of potential collisions per-unit area.

Zones	Potential Collision Per-Unit Area/(Sub m⁻²)	The Ratio of $Z_{1} - Z_{6}$	The Ratio of TTC*
$Z_{1}$	14.81	16.97%	19.12%
$Z_{2}$	23.59	13.14%	23.58%
$Z_{3}$	24.97	8.69%	16.50%
$Z_{4}$	8.21	21.79%	13.61%
$Z_{5}$	14.28	14.69%	15.96%
$Z_{6}$	5.97	24.72%	11.23%

TTC* is TTC threshold.

Table 4. Classification and explanation of variables.

Classification of Variables	Meaning and Symbols of Variables
$Z_{k}$	The average vehicle speed $\overline{v}_{k}$ and standard deviation of vehicle speed $v_{k,S}$ in $Z_{k}$
	The average vehicle acceleration $\overline{a}_{k}$ and standard deviation of vehicle acceleration $a_{k,S}$ in $Z_{k}$
	The average vehicle speed angle $\overline{θ}_{k}$ and standard deviation of vehicle speed angle $θ_{k,S}$ in $Z_{k}$
Vehicle $i$	Arterial (ramp) $\tilde{ϑ}_{i}$ at which vehicle $i$ enters (leaves) the research area
	Lane $ξ_{i}$ in which vehicle $i$ enters (leaves) the research area
	The coordinates $x_{i}(t)$, $y_{i}(t)$ of the corresponding vehicle $i$ at moment $ℑ_{i}(t)$
	Instantaneous speed $v_{i,x}(t)$, $v_{i,y}(t)$, $v_{i}(t)$ of vehicle $i$; average speed $\overline{v}_{i}$ and standard deviation $v_{i,S}$ of speed
	Instantaneous acceleration $a_{i}(t)$ of vehicle $i$; average acceleration $\overline{a}_{i}$ and standard deviation $a_{i,S}$ of acceleration
	Instantaneous angular velocity $θ_{i}(t)$ of vehicle $i$; average angular velocity $\overline{θ}_{i}$ and standard deviation $θ_{i,S}$ of angular velocity

Table 5. Comparison of model prediction accuracy.

Metric Name	Naive Bayes	Logistic Regression	Gradient Boosting Decision Tree
Accuracy/%	74.86	87.10	95.1
Precision/%	38.86	67.76	87.6
Sensitivity/%	79.87	48.04	83.6
FPR/%	26.19	4.76	2.5
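For reference, the four metrics in Table 5 follow from the counts of a binary confusion matrix, with conflict as the positive class. The sketch below uses the standard definitions; the counts in the usage example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity, and false-positive rate
    from confusion-matrix counts (positive class = conflict)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all samples classified correctly
        "precision": tp / (tp + fp),     # share of predicted conflicts that are real
        "sensitivity": tp / (tp + fn),   # share of real conflicts that are detected
        "fpr": fp / (fp + tn),           # share of non-conflicts flagged as conflicts
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Read this way, the Naive Bayes column combines high sensitivity with low precision and a high FPR, which means the model detects most conflicts but raises many false alarms, whereas GBDT keeps sensitivity high while holding the FPR low.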

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xia, Y.; Qin, Y.; Li, X.; Xie, J. Risk Identification and Conflict Prediction from Videos Based on TTC-ML of a Multi-Lane Weaving Area. Sustainability 2022, 14, 4620. https://0-doi-org.brum.beds.ac.uk/10.3390/su14084620
