Article

Travel Characteristics Identification Method for Expressway Passenger Cars Based on Electronic Toll Collection Data

1
College of Smart City, Chongqing Jiaotong University, Chongqing 400074, China
2
Chongqing Key Laboratory of Mountain City Traffic System and Safety, Chongqing 400074, China
3
College of Traffic & Transportation, Chongqing Jiaotong University, Chongqing 400074, China
4
Cmcu Engineering Co., Ltd., Chongqing 400039, China
*
Author to whom correspondence should be addressed.
Sustainability 2023, 15(15), 11619; https://0-doi-org.brum.beds.ac.uk/10.3390/su151511619
Submission received: 19 June 2023 / Revised: 22 July 2023 / Accepted: 23 July 2023 / Published: 27 July 2023

Abstract

Passenger cars have emerged as a substantial segment of the vehicles traversing expressways, generating extensive traffic data on a daily basis. Accurately identifying individual vehicles and their travel patterns and characteristics is crucial in addressing the issues that impede the sustainable development of expressways, including traffic accidents, congestion, environmental pollution, and losses of both personnel and property. Regrettably, the utilization of electronic toll collection (ETC) data on expressways is currently not adequate, and data analysis and feature mining methods are underdeveloped, leading to the undervaluation of data potential. Focusing on ETC data from expressways, this study deeply analyzes the spatiotemporal characteristics of travel by passenger car users. Here, we propose an advanced user classification model by combining the traditional clustering algorithm with the feature grouping recognition model based on a back propagation neural network (BPNN) algorithm. Real-world data on expressway vehicle travel are used to validate our models. The results show a significant improvement in iteration efficiency of over 26.4% and a 23.17% accuracy improvement compared to traditional algorithms. The travel feature grouping recognition model yielded an accuracy of 95.23%. Furthermore, among the identified groups, such as “Public and commercial affairs” and “Commuting”, there is a notable characteristic of high travel frequency and concentrated travel periods. This indicates that these groups have placed significant pressure on the construction of a safe, efficient, and sustainable urban transportation system.

1. Introduction

Passenger cars are playing an increasingly prominent role on expressways. Traditionally, expressway management and operational services have treated passenger cars similarly to other individual vehicles, leading to inaccurate identification of similar group travel characteristics and a lack of precise understanding of the diverse travel needs of different individuals. This increases the risk of expressway traffic congestion and accidents, posing a threat to the safety of travelers and exacerbating environmental pollution, including exhaust emissions and noise pollution [1]. Consequently, it impacts the sustainable development of the expressway economy and society. To fully exploit the value of ETC data on expressways, a scientific and efficient method is necessary. The use of an objective, accurate, and highly applicable algorithm model to classify and identify the heterogeneity of travel behavior among ETC passenger car users on expressways should be explored.
Expressways are an essential part of transportation infrastructure and play a significant role in meeting the high-speed and high-efficiency travel demands of society, with passenger cars representing the primary user group. The travel behavior of passenger car users directly affects the work mode of expressway management and service departments and has an impact on the operational status of expressways. Therefore, a fundamental approach to enhancing the operational environment of expressways, addressing the differentiated travel demands of diverse groups, and achieving sustainable development in urban transportation is to explore the travel characteristics and patterns of passenger car users on expressways, accurately and meticulously classify heterogeneous groups, and develop targeted management measures and service plans. This study contributes specifically to the literature in the following ways:
(1) This study explores ETC data on expressways, analyzes the spatiotemporal distribution patterns of vehicle travel, and identifies the travel characteristics of ETC passenger car users. The findings can serve as a reference for future research on the travel behavior of vehicles on expressways.
(2) We developed a precise feature identification model using machine learning methods to overcome the shortcomings of traditional clustering algorithms. The model accurately “partitions–recognizes” different types of passenger car travel groups. This model can provide data support for the design of governance programs for expressway traffic management departments and enable the identification of the needs of different user groups. This, in turn, can facilitate the development of personalized ETC value-added service products and improve overall satisfaction with the expressway travel services.
This paper is structured as follows. Section 2 provides an overview of the existing literature, followed by Section 3, which introduces the travel feature identification combination model. The combination model includes the user classification model based on an improved traditional clustering algorithm and the group recognition model based on a back propagation neural network. Section 4 focuses on verifying and examining the results of the model. Finally, the conclusions and future prospects of the research are discussed in Section 5.

2. Literature Review

In recent years, digital and intelligent highways have rapidly developed, and big data have played an increasingly significant role in related research on highways. The kinds of data on highways mainly include construction, toll, traffic, facility maintenance, and traffic accident data. The research and application fields are mainly concerned with traffic flow prediction, traffic operation status analysis, and traffic safety risk assessment and prediction.
In terms of traffic flow prediction, He et al. [2] analyzed the distribution law of traffic flow through ETC toll data and used the Gaussian mixture regression (GMR) model to predict short-term traffic flow in ETC lanes. Shuai et al. [3] proposed a mixed two-layered model for predicting highway traffic flow and assessed its effectiveness by using data from 51 toll stations. In terms of traffic operation status judgment, Wang et al. [4] used actual highway data for parameter regression analysis and evaluated the traffic operation status between the mainline and ramp systems. Wang et al. [5] considered the sequential nature of traffic states and established an ordered Logit model to achieve traffic state prediction. In terms of traffic safety risk assessment and prediction, Yang et al. [6] proposed a method for accident detection and classification on highways based on deep convolutional neural networks using Global Navigation Satellite System (GNSS) positioning data. Jung et al. [7] used binary regression of random forests based on freeway tunnel accident data to evaluate the strategies recommended by the Korea Ministry of Land, Infrastructure and Transport (KMOLIT).
Studies of vehicle travel behavior mainly analyze how people choose travel modes, plan itineraries, and use transportation tools. Various methods have been proposed for studying vehicle travel behavior. These methods are generally divided into two categories: travel characteristics mining and travel characteristics identification.
Extracting travel characteristics usually involves processing large volumes of transportation data, such as vehicle travel trajectories, Global Positioning System (GPS) data, and Radio Frequency Identification (RFID) data, to identify rational and scientific indicators for characterizing travel behavior. Chen et al. [8] analyzed the number of trips, travel distances, and travel time of local vehicles, non-local vehicles, and ride-hailing vehicles to understand the travel characteristics of these three different types of vehicles. Gong et al. [9] extracted the travel features of motor vehicles from license plate recognition system data, identifying four types of vehicles with unique structures while accounting for the number of trips and travel time. Lv et al. [10] used vehicle travel trajectory data to extract feature indicators including travel frequency, travel time, and travel distance. Cluster analysis, as the main research method of data mining, has become a focus of researchers’ attention. Magnana et al. [11] analyzed path selection rules based on travel characteristics and used a density clustering algorithm to analyze GPS data, obtaining the original candidate path set. Gao et al. [12] used the traveler’s job type as a basis, analyzed trajectory data, and combined hierarchical clustering with a random forest algorithm to classify and predict travel purpose. Chang et al. [13] determined three characteristic parameters by analyzing vehicle license plate data, completed the identification of commuting vehicles using the K-means algorithm, and obtained individual vehicles with commuting characteristics and concentrated distribution areas.
Mode recognition is an important method for identifying vehicle travel characteristics. It is a mathematical method of summarizing and classifying information from data. Researchers usually use machine learning methods to construct feature recognition models for recognizing the travel modes of distinct groups. Xu [14] and Zhang [15] built recognition models with support vector machine (SVM) algorithms. Some studies also utilize the random forest model to identify travel modes [16,17]. Furthermore, existing studies indicate that utilizing neural networks to identify travel characteristics tends to yield favorable outcomes [18,19].
We have also reviewed studies aimed at improving the K-means algorithm. The K-means algorithm [20] has three main limitations: sensitivity to the initial clustering centers, manual determination of the number of clusters, and the varying degrees to which data outliers affect the results. Existing studies have identified the selection of initial clustering centers as a crucial aspect to improve [21,22]. On the issue of determining the optimal number of clusters, several methods have been proposed. Guo et al. [23] utilized sample density and the crown algorithm to cluster sample data, which enabled them to obtain the number of clusters and the cluster centers. Liu et al. [24] developed a clear hierarchical clustering objective function by combining Bayesian theory analysis with the K-means algorithm. Zhang et al. [25], on the other hand, identified the number of clusters based on the similarity in the data and applied iterative clustering to find the best cluster number. Three methods proposed in recent literature have addressed the problem of outlier influence in K-means clustering. Zhang et al. [26] introduced a K-means local search algorithm with a relaxed objective function for outlier detection. Chen et al. [27] defined outliers as a new cluster for detection and elimination. Yu et al. [28] proposed a genetic algorithm-based three-layer and two-layer K-means algorithm that overcomes the influence of outliers, noise data, and initial cluster centers.

3. Materials and Methods

3.1. Overview of The Travel Feature Identification Combination Model

Figure 1 displays the technical roadmap of the combination model used for recognizing travel characteristics of ETC passenger car users on expressways.
This study analyzes the overall travel patterns of ETC passenger car users on expressways and proposes feature indicators that can accurately describe differences in travel. It combines a user feature grouping model based on an improved clustering algorithm with a feature grouping recognition model based on a BP neural network in order to achieve group classification and individual recognition of the travel behavior of ETC passenger car users on expressways. Identifying the travel characteristics of ETC passenger cars consists of three parts.
(1) Firstly, the ETC data should be pre-processed, which includes handling missing, duplicate, and abnormal values. The effective passing records can then be extracted, and the spatiotemporal characteristics of vehicle travel can be analyzed to select the travel characteristic indicators.
(2) Secondly, this part focuses on classifying user feature groups, primarily through the use of an improved K-means algorithm for clustering selected travel feature indicators.
(3) Finally, based on the clustering analysis results, the BP neural network algorithm can be utilized to learn and train the user classification data, thereby establishing a recognizable model for user travel feature groups.
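The pre-processing in part (1) can be illustrated with a minimal sketch. The record layout (the field names `plate`, `entry_station`, `exit_station`, `entry_time`, `exit_time`) and the rule that a record is abnormal when its exit time is not later than its entry time are assumptions made for illustration only; the actual ETC schema and cleaning rules are not specified in this paper.

```python
from datetime import datetime

# Hypothetical field names; the real ETC schema is not given in the paper.
REQUIRED = ("plate", "entry_station", "exit_station", "entry_time", "exit_time")

def clean_records(records):
    """Drop records with missing fields, exact duplicates, or exit <= entry."""
    seen = set()
    cleaned = []
    for rec in records:
        # Missing values: any required field absent or empty.
        if any(not rec.get(field) for field in REQUIRED):
            continue
        # Duplicates: identical values in all required fields.
        key = tuple(rec[field] for field in REQUIRED)
        if key in seen:
            continue
        # Abnormal values (assumed rule): exit must be later than entry.
        t_in = datetime.fromisoformat(rec["entry_time"])
        t_out = datetime.fromisoformat(rec["exit_time"])
        if t_out <= t_in:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

sample = [
    {"plate": "A1", "entry_station": "S1", "exit_station": "S2",
     "entry_time": "2021-05-01T08:00:00", "exit_time": "2021-05-01T08:40:00"},
    {"plate": "A1", "entry_station": "S1", "exit_station": "S2",
     "entry_time": "2021-05-01T08:00:00", "exit_time": "2021-05-01T08:40:00"},
    {"plate": "A2", "entry_station": "S1", "exit_station": None,
     "entry_time": "2021-05-01T09:00:00", "exit_time": "2021-05-01T09:30:00"},
    {"plate": "A3", "entry_station": "S3", "exit_station": "S4",
     "entry_time": "2021-05-01T09:00:00", "exit_time": "2021-05-01T08:30:00"},
]
effective = clean_records(sample)
```

Only the first record survives in this toy input: the second is a duplicate, the third has a missing exit station, and the fourth has an exit time earlier than its entry time.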

3.2. Analysis of Temporal and Spatial Characteristics of Passenger Car Travel on Expressways

3.2.1. Data Preparation

The electronic toll collection (ETC) data used in this study were obtained from the Chongqing Expressway Group and are characterized by their large volume, diversity, and wide range of sources. To conduct big data mining and research on passenger car users’ travel behavior more effectively, ETC transaction, vehicle passage, and vehicle information data were ultimately chosen as the fundamental data for this research. The data cover four months: May to June 2021 and September to October 2021. Before analysis, the data underwent preprocessing to address missing values, duplicate values, and outliers, and to establish the associations among the dataset variables.

3.2.2. Analysis of Travel Time Characteristics

(1) Analysis of Vehicle Travel Days
In order to gain a comprehensive and clear understanding of the characteristics of vehicle travel behavior with respect to travel days, the data were divided into three distinct categories for analysis: monthly total travel days, workday travel days, and weekend and holiday travel days.
Figure 2 displays the number of vehicles associated with each number of travel days for electronic toll collection (ETC) passenger car users recorded in May and June 2021. The distribution of vehicles over the total number of monthly travel days follows the same trend in both months. About 40% of vehicles travel on only 1–2 days per month, and vehicles with 5 or fewer monthly travel days represent around 75% of the sample.
Approximately 65% of vehicles travel on three or fewer workdays. Furthermore, the percentage of vehicles remains largely unchanged once the number of travel days exceeds about 10, as shown in Figure 3, suggesting a group of users who commute daily by expressway. This statistical pattern aligns with the common understanding of commuter groups in daily life.
Figure 4 shows that, on weekends and holidays, the largest share of vehicles travels on only one day, and approximately 80% of vehicles travel on fewer than 5 days.
(2) Analysis of Vehicle Travel Frequency
The number of vehicles corresponding to each travel frequency of ETC passenger car users in May and June 2021 is shown in Figure 5. Most vehicles made fewer than four trips within a month, and vehicles that traveled six or fewer times accounted for approximately 60% of the dataset. To better characterize the uneven distribution of travel frequency among vehicles, the Gini coefficient was introduced and the Lorenz curve relating the number of vehicles to the toll collection data was produced.
$$G = \frac{S_1}{S_1 + S_2} \tag{1}$$
where $G$ represents the Gini coefficient, $S_1$ represents the area between the Lorenz curve and the line of equality, and $S_2$ represents the area between the Lorenz curve and the horizontal axis, as shown in Figure 6.
Approximately 20% of the vehicles have a higher frequency of monthly trips and tend to use the highway more often. The highway toll data of these vehicles account for 60% of the total toll data. The calculated value of G = 0.59 indicates a noticeably unbalanced distribution of travel frequency among vehicles.
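As a hedged illustration of the Gini calculation, the sketch below builds a Lorenz curve from per-vehicle trip counts and evaluates G = S1 / (S1 + S2) with the trapezoidal rule. The function names and the toy trip counts are hypothetical; the paper does not state how the areas were computed numerically.

```python
def lorenz_points(values):
    """Return Lorenz curve points (cumulative share of vehicles, cumulative share of trips)."""
    xs = sorted(values)
    total = sum(xs)
    if not xs or total <= 0:
        raise ValueError("values must be non-empty with a positive sum")
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(xs, start=1):
        running += v
        points.append((i / n, running / total))
    return points

def gini(values):
    """Gini coefficient G = S1 / (S1 + S2)."""
    pts = lorenz_points(values)
    # S2: area between the Lorenz curve and the horizontal axis (trapezoidal rule).
    s2 = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # S1: area between the line of equality and the Lorenz curve; S1 + S2 = 0.5.
    s1 = 0.5 - s2
    return s1 / (s1 + s2)

# Toy monthly trip counts for four vehicles (illustrative only).
equal_use = gini([3, 3, 3, 3])
concentrated_use = gini([0, 0, 0, 10])
```

A perfectly even distribution gives G = 0, while concentrating all trips in one of four vehicles gives G = 0.75, so larger values indicate that a small share of vehicles accounts for most of the travel.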
(3) Analysis of Vehicle Travel Time
The difference in expressway traffic flow between workdays and weekends or holidays is noticeable in daily life. To understand the distribution of vehicle travel times more accurately, dates are divided into workdays and non-workdays (weekends and holidays). For each category, the distribution of the number of vehicles across travel times is studied.
Figure 7 shows a notable bimodal distribution of average vehicle counts across time periods on workdays, and the overall trend is similar across workdays. On Fridays and the day before holidays, many vehicles travel on highways in the afternoon, which significantly increases traffic demand. By contrast, on the first working day after a holiday, traffic demand during the morning peak is much higher than during the evening peak.
Unlike workdays, weekends and holidays have only one obvious peak in the distribution of vehicle counts across travel times. In particular, passenger cars traveling on highways have a distinct morning peak during weekends and holidays, which occurs about 1–2 h later than on workdays, as shown in Figure 8.
(4) Analysis of Vehicle Travel Duration
It is apparent from Figure 9a that, during working days, most vehicles make short trips. The majority of them complete a single trip within half an hour or less, whereas only a small fraction takes over two hours, accounting for only 10% of the total. On the other hand, during weekends and holidays, around 30% of vehicles spend over six hours travelling, illustrating a significant difference in the duration of vehicle travel between working days and non-working days.

3.2.3. Analysis of Travel Space Characteristics

(1) Analysis of Vehicle Travel Distance
This paper examines the spatial characteristics of overall travel for passenger cars by analyzing the distribution of the distance and trajectory repetition rate of ETC passenger car users on highways over two months.
As shown in Figure 10, on workdays, trips of less than 50 km accounted for approximately 42.71% of all trips, whereas mid-range trips from 50 km to 200 km accounted for approximately 45.67%. On non-workdays, although the proportion of short-distance trips was lower than on workdays, the percentage of mid-range trips was higher, reaching approximately 52%.
(2) Analysis of Vehicle Travel Trajectory Repetition Rate
In this paper, the trajectory repetition rate is defined as the ratio of the number of instances where vehicles follow the same travel path during the study period to the total number of travel instances for each individual vehicle. The determination of vehicle trajectories is based on the entry and exit toll station information derived from the traffic data. Figure 11 shows that there is a positive correlation between the vehicle travel trajectory repetition rate and the number of trips taken by the vehicle. ETC passenger car users on expressways mostly show low trajectory repetition rates, accompanied by a subset of vehicles exhibiting high trajectory repetition rates.
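The trajectory repetition rate can be sketched as follows. This is one possible reading of the definition, assumed here for illustration: a trip counts as repeated when its (entry station, exit station) pair occurs at least twice for the same vehicle in the study period. The function name and the station labels are hypothetical.

```python
from collections import Counter

def trajectory_repetition_rate(trips):
    """Share of a vehicle's trips whose (entry, exit) pair occurs at least twice.

    `trips` is a list of (entry_station, exit_station) tuples for one vehicle.
    """
    if not trips:
        return 0.0
    counts = Counter(trips)
    # Assumed interpretation: every trip on a path used more than once is "repeated".
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trips)

# One vehicle with four trips: the path S1 -> S2 is used twice.
rate = trajectory_repetition_rate([("S1", "S2"), ("S1", "S2"), ("S2", "S1"), ("S1", "S3")])
```

Under this reading, the vehicle above has a repetition rate of 0.5, because two of its four trips share the same entry and exit toll stations.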
Table 1 illustrates that the trends of the four indicators remain consistent across different months as well as during weekends and holidays. Table 2 indicates the variations in travel time and distance between workdays and non-workdays (weekends and holidays).
In summary, there are significant variations in the travel patterns and durations of vehicles between workdays and non-workdays. On workdays, most trips are for commuting or temporary business affairs, typically involving short distances and travel times. Conversely, during weekends and holidays, travel activities usually involve longer distances and longer travel times, in contrast to commuting or temporary business affairs.

3.2.4. Travel Characteristic Index Extraction

From the data analysis, we identified six indicators that can describe the characteristics of passenger car travel on highways: monthly number of travel days (X1), monthly travel frequency (X2), average travel distance per trip (X3), trajectory repetition rate (X4), travel preference during peak hours (X5), and travel preference on weekends and holidays (X6). Subsequently, we performed a correlation analysis on these indicators, the results of which are presented in Table 3. The results revealed a significant positive correlation among monthly number of travel days (X1), monthly travel frequency (X2), and trajectory repetition rate (X4). Because these three indicators carry largely overlapping information, only X1 was retained, and X2 and X4 were removed. Consequently, the final travel characteristic indicators were monthly number of travel days (X1), average travel distance per trip (X3), travel preference during peak hours (X5), and travel preference on weekends and holidays (X6).
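The indicator screening described above can be sketched with a Pearson correlation filter. The threshold of 0.8, the function names, and the toy values are assumptions for illustration; the paper reports the correlation results in Table 3 but does not state a numerical cut-off.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def prune_correlated(columns, threshold=0.8):
    """Keep indicators in order, dropping any that correlate strongly with a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy data: X2 is a multiple of X1, X3 is nearly uncorrelated with X1.
indicators = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 4, 6, 8, 10, 12],
    "X3": [5, 1, 4, 2, 6, 3],
}
selected = prune_correlated(indicators)
```

Because the indicators are visited in order, the first member of a correlated group is the one retained, which mirrors keeping X1 while dropping the indicators that correlate with it.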

3.3. Development of a User Classification Model using an Improved Clustering Algorithm

3.3.1. Canopy-K-Means Clustering Algorithm Construction

McCallum et al. [29] first proposed the Canopy algorithm in 2000. The algorithm is commonly utilized for clustering analysis of high-dimensional datasets. The algorithm principle is illustrated in Figure 12. Unlike traditional clustering algorithms, the Canopy algorithm does not require the number of clusters to be specified in advance. Instead, the number of clusters is obtained after a single traversal of the sample dataset. The input of this algorithm is an n-sample dataset, and the output consists of k cluster centers.
The process of the improved K-means clustering algorithm based on the Canopy algorithm has two main parts [30,31]. The first part involves Canopy pre-clustering, which utilizes the Canopy algorithm to determine the number of clusters (‘k’), and obtain the initial clustering centers, also known as the centroids of each Canopy sub-group. The second part involves the iterative process of the K-means algorithm, which starts with the initial clustering centers obtained from pre-clustering, iterates the algorithm until the clustering centers converge and stop changing, and finally outputs the results.
Figure 13 is the flow chart of the Canopy-K-means algorithm. The detailed algorithm steps are as follows:
Step 1: Build the target data sample set S, and use the Euclidean distance to calculate the distance sample set D.
Step 2: Generate the distance distribution histogram based on the numerical features of the distance sample set, and obtain the initial distance thresholds $T_1$ and $T_2$.
Step 3: Determine the sample mean point of the target data sample set and designate it as the initial clustering center for the Canopy algorithm. Output the resulting clustering center as well as the k-value obtained from pre-clustering.
Step 4: Employ the clustering result obtained from the Canopy algorithm as an input parameter for the K-means algorithm. Calculate the distance between each sample point and every clustering center based on Euclidean distance calculation formula. Then, assign every sample point to its closest clustering center category.
Step 5: After all sample points are assigned, recalculate the center of each cluster.
Step 6: Compare the newly obtained clustering center with the previous clustering center. If they are different, go to Step 4 to continue calculation. Otherwise, go to Step 7.
Step 7: Output the final results.
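Steps 1 to 7 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the thresholds satisfy T1 > T2, the first Canopy seed is the sample mean point (following Step 3), later seeds are simply the first remaining sample, and the function names and toy data are hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def canopy_centers(points, t1, t2):
    """Canopy pre-clustering (t1 > t2); returns one centroid per canopy."""
    remaining = list(points)
    seed = mean(points)  # Step 3: start from the sample mean point
    centers = []
    while remaining:
        members = [p for p in remaining if math.dist(p, seed) <= t1]
        if members:
            centers.append(mean(members))
        # Samples within t2 of the seed can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(p, seed) > t2]
        if remaining:
            seed = remaining[0]  # assumed rule for choosing the next seed
    return centers

def kmeans(points, centers, max_iter=100):
    """Steps 4 to 7: iterate assignment and center update until centers stop changing."""
    centers = list(centers)
    labels = []
    for iteration in range(1, max_iter + 1):
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            return centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter

# Two well-separated toy groups of samples.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
init = canopy_centers(data, t1=3.0, t2=2.0)
final_centers, final_labels, n_iter = kmeans(data, init)
```

In this toy run, the pre-clustering yields two canopies, so k is obtained without being specified in advance, and K-means converges in its first iteration because the Canopy centroids are already stable.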
The Canopy algorithm enhances the K-means clustering method by resolving two weaknesses of the traditional method: the uncertain choice of initial centers and the difficulty of determining the number of clusters k. Nevertheless, this enhancement only affects the initial input parameters; it does not optimize the internal framework of the K-means algorithm. Because the K-means algorithm employs a partitioning strategy, it cannot avoid the impact of anomalous data on the clustering output and is prone to local optima. The following section therefore proposes a specific optimization scheme to tackle this issue.

3.3.2. Construction of the Canopy-K-Means Clustering Algorithm Based on Ant Colony Optimization

In 1991, Dorigo [32] first introduced the ant colony algorithm and established its fundamental principles and core concepts through extensive research. The stochastic search characteristic of the ant colony algorithm can improve clustering outcomes, alleviate local optima problems, and increase the overall accuracy of user segmentation models. Consequently, this paper uses an ant colony optimization technique to create a hybrid clustering method based on Canopy-K-means.
Assuming that $X = \{X_i \mid i = 1, 2, \ldots, N\}$ is a set of data samples, where $X_i = \{x_{i1}, x_{i2}, \ldots, x_{iZ}\}$ is a Z-dimensional vector and the number of clusters is $K$, the objective function is defined as follows:
$$\min F = \sum_{k=1}^{K} \sum_{i=1}^{N} y_{ik}\, d_{ik} \tag{2}$$
$$d_{ik} = d(X_i, m_k) = \left( \sum_{j=1}^{Z} \left( x_{ij} - m_{kj} \right)^2 \right)^{1/2} \tag{3}$$
$$\text{s.t.} \quad \sum_{k=1}^{K} y_{ik} = 1,\ i = 1, 2, \ldots, N; \qquad \sum_{i=1}^{N} y_{ik} \ge 1,\ k = 1, 2, \ldots, K \tag{4}$$
$$y_{ik} = \begin{cases} 1, & X_i \in M_k \\ 0, & X_i \notin M_k \end{cases} \tag{5}$$
where $d_{ik}$ symbolizes the distance between the sample point $X_i$ and the cluster center $m_k$; $y_{ik}$ denotes the affiliation of $X_i$ to $M_k$, and $M_k$ is the k-th class.
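For concreteness, the objective function and its constraints can be evaluated as in the following sketch. The representation of a solution as a list of cluster indices (one per sample) and the function names are assumptions made for illustration.

```python
import math

def objective(samples, labels, centers):
    """F: sum over samples of the Euclidean distance to the assigned cluster center."""
    return sum(math.dist(x, centers[k]) for x, k in zip(samples, labels))

def is_feasible(labels, n_clusters):
    """Each sample has exactly one label by construction; every cluster must be non-empty."""
    return set(labels) == set(range(n_clusters))

# Toy example with two clusters.
samples = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
f_value = objective(samples, [0, 0, 1], centers)
```

Storing one label per sample enforces the first constraint automatically (each sample belongs to exactly one class), so only the non-empty-cluster constraint needs an explicit check.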
The calculation of pheromone concentration on each path is as follows:
$$\tau_{ik}(t+1) = (1 - \rho)\, \tau_{ik}(t) + \sum_{l=1}^{L} \Delta\tau_{ik}^{l} \tag{6}$$
$$\tau_{ik}(0) = \frac{1}{d_{ik}} \tag{7}$$
where $\tau_{ik}(t)$ represents the residual pheromone concentration from sample point $X_i$ to clustering center $m_k$ on the t-th iterative path; $\Delta\tau_{ik}^{l} = 1/F_l$, where $F_l$ is the minimum value of the objective function; $L$ is a constant; and $\rho$ is the volatility factor. $\tau_{ik}(0) = 1/d_{ik}$ is the initialization of the pheromone matrix.
The detailed process of the Canopy-K-means clustering algorithm optimized by the ant colony algorithm is shown in Figure 14. The first part is the Canopy-K-means clustering algorithm described in the previous section (Steps 1 to 7). The second part is the ant colony algorithm, which optimizes the clustering results through the following steps:
Step 8: Initialize the pheromone matrix and set the following parameter values: $\rho$, $q_0$, $L$, $P_s$, $K$, $A$, $t_{max}$.
Step 9: To determine the category of all samples in each ant, each ant generates a corresponding random number $q \in (0, 1)$, where $q_0$ is a preset value. If $q \le q_0$, the ant calculates the mobility probability $p$ using Equation (8) to assign sample $X_i$ to $M_k$. If $q > q_0$, the ant randomly assigns sample $X_i$ to $M_k$ using the normalized probability in Equation (9).
$$p = \max\{p_{ik}\}, \quad p_{ik} = \frac{\tau_{ik}\, \eta_{ik}}{\sum_{k=1}^{K} \tau_{ik}\, \eta_{ik}}, \quad \eta_{ik} = \frac{1}{d_{ik}} \tag{8}$$
$$p_{ij} = \frac{\tau_{ij}}{\sum_{k=1}^{K} \tau_{ik}}, \quad j = 1, 2, \ldots, K \tag{9}$$
Step 10: Obtain the cluster center by calculating the mean attribute values of each category of samples in each ant’s corresponding solution. Calculate the objective function value using Equation (2).
Step 11: After assigning sample points to their corresponding categories, arrange the value $F$ for each ant in ascending order, and perform a simple local search on the top $L$ ant solutions. For the random number $r \in (0, 1)$ allocated to the element corresponding to the sample $X_i$ in the solution, if $r < P_s$, assign sample $X_i$ to another category. Recalculate the objective value $F'$; if $F' < F$, replace the original solution.
Step 12: Calculate and update the global pheromone concentration on each path according to Equation (6).
Step 13: If the number of iterations $t = t_{max}$, output the optimal solution; otherwise set $t = t + 1$ and go to Step 9.
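The assignment rule of Step 9 and the pheromone update of Step 12 can be sketched as follows. Several details are assumptions rather than statements from the paper: pheromone is deposited only on the (sample, cluster) pairs that appear in each of the L best solutions, a small constant guards against division by a zero distance, and all function and variable names are hypothetical.

```python
import random

def assign_sample(tau_i, d_i, q0, rng):
    """Choose a cluster index for one sample.

    tau_i[k] is the pheromone between the sample and cluster k,
    d_i[k] is the distance from the sample to cluster center k.
    """
    n_clusters = len(tau_i)
    if rng.random() <= q0:
        # Exploitation: the largest tau * eta with eta = 1 / d. The shared
        # normalizing denominator does not change which cluster is largest.
        scores = [tau_i[k] / max(d_i[k], 1e-12) for k in range(n_clusters)]
        return max(range(n_clusters), key=scores.__getitem__)
    # Exploration: draw a cluster with probability proportional to its pheromone.
    return rng.choices(range(n_clusters), weights=tau_i, k=1)[0]

def update_pheromone(tau, best_solutions, rho):
    """Evaporate all trails, then add 1 / F_l along each of the L best solutions.

    best_solutions is a list of (labels, objective_value) pairs.
    """
    new_tau = [[(1.0 - rho) * t for t in row] for row in tau]
    for labels, f_value in best_solutions:
        for i, k in enumerate(labels):
            new_tau[i][k] += 1.0 / f_value  # assumed: deposit only on used pairs
    return new_tau

rng = random.Random(0)
# With q0 = 1.0 the ant always exploits, so the nearer cluster (index 0) is chosen.
chosen = assign_sample([1.0, 1.0], [1.0, 4.0], q0=1.0, rng=rng)
updated = update_pheromone([[1.0, 1.0]], [([1], 2.0)], rho=0.1)
```

A large q0 makes ants follow the strongest trail most of the time, while the occasional proportional draw keeps some randomness in the search, which is what allows the method to move away from a local optimum of plain K-means.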

3.4. Development of a BP Neural Network Algorithm for the Identification of Travel Characteristic Groups

The BP (back propagation) neural network, also known as the error back propagation network, uses the error back propagation algorithm as its core rule for model training. The essence of the neural network training process is to minimize the loss function, and each optimization step corresponds to an iteration of the network. Neural network learning and training consist of iterating forward information propagation and backward error propagation. The process stops when the error of the model output meets the expected error target set in advance or the maximum number of training iterations is reached. The operating structure of the BP neural network model is shown in Figure 15.
The BP neural network recognition model in this study is constructed as follows:
The number of layers of the neural network (1 ≤ L ≤ n) and the number of hidden layer neurons (1 ≤ L ≤ m) are selected based on experience to obtain varied combinations of layer numbers and neuron numbers. The model takes four feature indicators, i.e., monthly number of travel days, average travel distance per trip, travel preference during peak hours, and travel preference on weekends and holidays as inputs. The samples are labeled based on the user group partitioning results of the classification model. A BP neural network is then constructed to identify travel feature groups. The expected model recognition error is set as a threshold and the iteration is stopped to complete the training when the error between the predicted output and the actual value is lower than the expected error. The parameter combination with the shortest training time is recorded as the optimum parameter combination. This determination finally defines the number of hidden layers and neurons in the BP neural network. The specific model design process is shown in Figure 16.
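The training procedure can be illustrated with a small back propagation network written in plain Python. This is a sketch under stated assumptions, not the authors' model: it uses a single hidden layer, sigmoid activations, a squared-error loss, one-hot group labels, and hypothetical hyperparameters (`lr`, `target_error`, `max_epochs`); the toy inputs have two features rather than the four indicators used in the study.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    # Each neuron stores n_in weights followed by one bias term.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, inputs):
    return [sigmoid(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1])
            for neuron in layer]

def train(samples, n_hidden, lr=0.5, target_error=0.01, max_epochs=5000, seed=0):
    """Train until the mean squared error per sample drops below target_error."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    hidden = init_layer(n_in, n_hidden, rng)
    output = init_layer(n_hidden, n_out, rng)
    history = []
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, y in samples:
            h = forward(hidden, x)   # forward information propagation
            o = forward(output, h)
            # Backward error propagation: output deltas, then hidden deltas.
            d_out = [(o[k] - y[k]) * o[k] * (1 - o[k]) for k in range(n_out)]
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * output[k][j] for k in range(n_out))
                     for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden):
                    output[k][j] -= lr * d_out[k] * h[j]
                output[k][-1] -= lr * d_out[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    hidden[j][i] -= lr * d_hid[j] * x[i]
                hidden[j][-1] -= lr * d_hid[j]
            sq_err += sum((o[k] - y[k]) ** 2 for k in range(n_out))
        history.append(sq_err / len(samples))
        if history[-1] < target_error:   # expected error reached: stop training
            break
    return hidden, output, history

def predict(hidden, output, x):
    o = forward(output, forward(hidden, x))
    return max(range(len(o)), key=o.__getitem__)

# Toy data: two groups with one-hot labels.
toy = [((0.0, 0.0), (1, 0)), ((0.1, 0.9), (1, 0)),
       ((0.9, 0.1), (0, 1)), ((1.0, 1.0), (0, 1))]
hidden_layer, output_layer, errors = train(toy, n_hidden=3)
```

To mimic the parameter search described above, `train` could be called for several values of `n_hidden` while recording the wall-clock time of each run, keeping the configuration that reaches the error threshold fastest.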

4. Empirical Analysis

4.1. Model Validation

The study collected 61 days of ETC data from September to October 2021, comprising 39 workdays and 22 weekend days and holidays, including two major statutory holidays, the Mid-Autumn Festival and National Day. The total number of target objects collected was 1,642,920. The dataset contains 21,649,767 effective travel records for the two months; the data volume statistics for different time ranges are shown in Figure 17.

4.1.1. Model Validation for Classifying Expressway ETC Passenger Car Users

This article applies the Canopy algorithm for pre-clustering. We selected a random 2% sample from the dataset to obtain the sample distances. We drew a histogram based on the sample distances and obtained the corresponding $T_1$ and $T_2$ values. To ensure sampling consistency, we repeated the sampling ten times and averaged the resulting initial threshold values of $T_1$ and $T_2$. This average served as the final initial threshold for the study. As shown in Table 4, the final initial thresholds are $T_1 = 4.22$ and $T_2 = 3.11$.
Using these thresholds, the model outputs six Canopy subsets and their corresponding centers. These are shown in Figure 18 and Table 5; the resulting cluster number is $k = 6$.
We input the six initial clustering centers obtained by Canopy into the K-means clustering model for iterative computation. We obtained a final output of six clustering centers and the number of iterations at which the model converges. The results are displayed in Figure 19 and Table 6.
To further improve the accuracy of the clustering results, the ant colony algorithm was used to optimize the Canopy-K-means clustering results. In this paper, the volatility factor was set to $\rho = 0.1$, the thresholds $q_0$ and $P_s$ were set to 0.9, and the constant $L = 50$. The number of ants ($A$) was set to 200. The maximum number of iterations ($t_{max}$) was determined by analyzing and comparing the clustering results with different numbers of iterations, as shown in Table 7. We found that the clustering effect was optimal when the maximum number of iterations was set to 300.
Table 8 shows the parameter settings of the ant colony algorithm used in this study. Figure 20 and Table 9 display the optimized clustering results.
To clarify the actual meaning of each feature index of the clustering centers and to study how the categories differ, we restored the standardized clustering centers to their original scales, obtaining the actual values of the feature clustering centers presented in Table 10. Figure 21 provides an intuitive comparison of the differences in travel characteristics between groups.
Group 0 shows a prominent preference for traveling on weekends and holidays and has few travel days per month (1.52 on average). We define this group as the “visiting and traveling” group.
Group 1 is defined as the “long-distance” travel group, as its most significant characteristic is its average single-trip distance of about 532 km.
With an average of 3.79 travel days per month, Group 2 primarily makes short- to medium-distance trips and shows no apparent preference for weekend and holiday travel. We identify Group 2 as the “official business” group.
Group 3 has a higher average number of monthly travel days (9.23). It shows no apparent preference for weekend and holiday travel, and its travel characteristics resemble those of official or commercial vehicles. We therefore define Group 3 as the “public and commercial affairs” group.
Group 4 is identified as the “commuting” group: it travels frequently during workday peak periods, over relatively short distances, and travels less on weekends and holidays.
Group 5 has few travel days per month and rarely travels on weekends or holidays. Its members seldom use expressways, which implies mostly city-based daily travel. We define this group as the “sporadic” travel group.
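As a rough check on how the centers in Table 9 map to the actual values in Table 10, the sketch below assumes the features were z-score standardized, so that the actual value is a linear function of the standardized one. That assumption is ours; this section does not state the scaling. Two groups fix the line for monthly travel frequency, and a third group tests it.

```python
# (standardized center from Table 9, actual center from Table 10), monthly travel frequency.
z2, x2 = -0.0313, 3.7935   # Group 2
z5, x5 = -0.6048, 1.2014   # Group 5

# Assumption: x = mu + sigma * z (inverse of z-score standardization).
sigma = (x2 - x5) / (z2 - z5)
mu = x2 - sigma * z2

def restore(z):
    """Map a standardized center back to the original scale."""
    return mu + sigma * z

# Group 4: standardized 2.7889 in Table 9; Table 10 reports 16.5405.
group4 = restore(2.7889)
print(round(group4, 2))
```

The restored value agrees with Table 10 to within rounding, which supports reading the data reduction as an inverse standardization.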
Figure 21 reveals that the “public and commercial affairs” and the “commuting” groups have higher travel frequency and shorter travel distance, especially on working days. The “commuting” group is particularly concentrated during morning and evening peak hours on working days, causing the most traffic pressure on expressways and easily resulting in traffic congestion. This will increase commuting time and costs, leading to environmental pollution and energy waste. The expressway management department can formulate effective traffic control measures specifically for this group to improve the operational efficiency and sustainability of expressways.

4.1.2. Model Validation for Identifying Travel Characteristics Groups

Based on the results of a user classification model, classification tags are assigned to all expressway ETC passenger car users. First, 80% of randomly selected individuals are assigned as the training set, and a BP neural network is trained to identify the travel characteristics of the expressway ETC passenger car user group. The remaining 20% of individuals are designated as the test set to calculate the identification accuracy of the recognition model. The construction of the characteristic group identification model is illustrated in Figure 22.
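A minimal sketch of the 80/20 split follows. The function name, the fixed seed, and the use of integer user IDs are assumptions made for illustration; the paper does not describe how the random selection was implemented.

```python
import random

def split_users(user_ids, train_fraction=0.8, seed=42):
    """Shuffle the IDs reproducibly and split them into training and test sets."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

train_ids, test_ids = split_users(range(1000))
print(len(train_ids), len(test_ids))

# Size of the test set for the 1,642,920 users in the study (arithmetic only).
n_test_study = 1_642_920 - round(1_642_920 * 0.8)
print(n_test_study)
```

For the study's 1,642,920 users this gives 328,584 test users, which equals the sum of all entries in Table 14.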
The detailed parameters of the neural network model are set as shown in Table 11.
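For orientation, the learning rate and momentum factor in Table 11 would enter the standard back-propagation weight update with momentum as shown below. This is the textbook form and an assumption on our part, since this section does not give the update rule actually used. Here $\eta = 0.01$ is the learning rate, $\alpha = 0.9$ the momentum factor, $E$ the network error, and $\sigma$ the sigmoid activation.

$$\Delta w(t) = -\eta \frac{\partial E}{\partial w} + \alpha \, \Delta w(t-1), \qquad w(t+1) = w(t) + \Delta w(t), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

Under this form, training stops once $E$ falls below the expected error of 0.05 listed in Table 11.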
By comparing the predicted results of the model with the actual values, the recognition results of each characteristic group are determined, as shown in Figure 23.

4.2. Effect Test

The validity of cluster analysis can be evaluated using common indicators such as the SSE (Sum of Squared Errors), CH (Calinski–Harabasz Index), and DB (Davies–Bouldin Index) [33]. The SSE index sums the squared distances between each sample and the centroid of its cluster, so a smaller value indicates more compact clusters. The CH index is the ratio of inter-cluster dispersion to intra-cluster dispersion, obtained from the between-class variance and the within-class variance. A larger CH value indicates a better clustering effect. The DB index, also known as the classification appropriateness index, takes for each cluster the largest ratio of the summed average intra-class distances of two clusters to the distance between their cluster centers, and averages this ratio over all clusters. A smaller DB value indicates a better clustering effect, reflecting smaller intra-class distances and larger inter-class distances.
$$I_{SSE} = \sum_{i=1}^{k} I_{SSE}(i)$$
where $k$ represents the number of clusters, and $I_{SSE}(i)$ represents the sum of squared distances between the data samples in the $i$-th cluster and the cluster centroid.
$$I_{CHI} = \frac{BGSS_k / (k - 1)}{WGSS_k / (N - k)}$$
$$WGSS_k = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - Z_i \rVert^2$$
$$BGSS_k = \sum_{i=1}^{k} n_i \lVert Z_i - Z \rVert^2$$
where $N$ represents the total number of samples in the dataset, $k$ represents the number of clusters, $BGSS_k$ represents the between-class variance, $WGSS_k$ represents the within-class variance, $Z_i$ is the center of the $i$-th cluster, and $Z$ is the center of all samples.
$$DBI_k = \frac{1}{k} \sum_{i=1}^{k} \max_{j = 1, \dots, k,\; j \neq i} \frac{W_i + W_j}{C_{ij}}$$
$$W_i = \frac{1}{n_i} \sum_{x \in C_i} \lVert x - Z_i \rVert_2$$
$$C_{ij} = \lVert Z_i - Z_j \rVert_2$$
where $k$ represents the number of clusters, $C_i$ represents the $i$-th class object set, $C_{ij}$ represents the distance between the cluster centers of the $i$-th class and the $j$-th class, $W_i$ and $W_j$ respectively represent the average distance between the sample points in the $i$-th class and the $j$-th class and their respective cluster centers, and $n_i$ represents the number of samples in the $i$-th class.
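The three indices can be computed in a few lines. The sketch below follows the definitions above (SSE as the within-cluster sum of squares, CH from between- and within-class variance, DB from average intra-class distances and center distances); the function names and the toy clusters are illustrative, and the code assumes at least two non-empty clusters with some spread.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def validity_indices(clusters):
    """Return SSE, CH, and DB for a list of clusters (each a list of points)."""
    k = len(clusters)
    all_points = [p for c in clusters for p in c]
    n = len(all_points)
    z = centroid(all_points)                  # center of all samples
    centers = [centroid(c) for c in clusters]
    # Within-class sum of squares; this is also the SSE.
    wgss = sum(math.dist(p, zc) ** 2 for c, zc in zip(clusters, centers) for p in c)
    # Between-class sum of squares, weighted by cluster size.
    bgss = sum(len(c) * math.dist(zc, z) ** 2 for c, zc in zip(clusters, centers))
    ch = (bgss / (k - 1)) / (wgss / (n - k))
    # W_i: average distance of the samples in cluster i to its center.
    w = [sum(math.dist(p, zc) for p in c) / len(c) for c, zc in zip(clusters, centers)]
    db = sum(
        max((w[i] + w[j]) / math.dist(centers[i], centers[j]) for j in range(k) if j != i)
        for i in range(k)
    ) / k
    return {"SSE": wgss, "CH": ch, "DB": db}

# Two tight, well-separated toy clusters (hypothetical values).
toy_clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
scores = validity_indices(toy_clusters)
print(scores)
```

On the toy data the clusters are compact and far apart, so CH is large and DB is small, matching the interpretation given in the text.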
The performance of the identification model for travel characteristic groups is tested using TP (true positives), FN (false negatives), FP (false positives), and TN (true negatives) as evaluation indicators, based on the combination of each sample's actual category and predicted category [17]. The precision rate is the proportion of samples correctly identified as a travel group among all samples that the model identified as that group.
$$P_i = \frac{TP_i}{TP_i + FP_i}$$
where $P_i$ denotes the precision rate of group $i$, $TP_i$ represents the number of samples of group $i$ that the model identified correctly, and $FP_i$ represents the number of samples from other groups that the model incorrectly identified as group $i$.
The recall rate is the proportion of samples correctly identified as a travel group among all samples that actually belong to that group.
$$R_i = \frac{TP_i}{TP_i + FN_i}$$
where $R_i$ denotes the recall rate of group $i$, and $FN_i$ stands for the number of samples of group $i$ that the model incorrectly assigned to other groups.
The F1-score is the harmonic mean of the precision and recall rates, and the accuracy is the proportion of samples that the model classifies correctly.
$$F1\text{-}score = \frac{2 \times P_i \times R_i}{P_i + R_i}$$
$$Accuracy = \frac{TP_i + TN_i}{TP_i + FN_i + FP_i + TN_i}$$
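These indicators can be recomputed from the confusion matrix in Table 14. In the sketch below the counts are read from the flattened rows of that table, rows are taken as actual groups and columns as predicted groups (an assumption, chosen because it reproduces the rates in Table 15), and the overall accuracy is computed as the share of samples on the diagonal. Function and variable names are illustrative.

```python
groups = ["Visiting and traveling", "Long-distance", "Official business",
          "Public and commercial affairs", "Commuting", "Sporadic"]

# Counts read from Table 14. Assumption: rows = actual group, columns = predicted group.
cm = [
    [83631, 414, 665, 8, 6, 3548],
    [184, 16978, 21, 12, 3, 208],
    [832, 32, 99459, 3227, 31, 498],
    [67, 3, 2026, 45181, 326, 64],
    [5, 2, 6, 52, 6959, 3],
    [2409, 553, 433, 28, 5, 60705],
]

def evaluate(cm):
    """Per-group (precision, recall, F1) and the overall accuracy."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    per_group = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                         # group i samples sent elsewhere
        fp = sum(cm[r][i] for r in range(n)) - tp    # other samples labeled as group i
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        per_group.append((precision, recall, f1))
    return per_group, correct / total

metrics, accuracy = evaluate(cm)
for name, (p, r, f1) in zip(groups, metrics):
    print(f"{name}: P={p:.2%} R={r:.2%} F1={f1:.2%}")
print(f"accuracy={accuracy:.2%}")
```

With these counts the “commuting” group obtains the highest F1-score and the overall accuracy comes out at about 95.23%, in line with Table 15.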
First, the method used to determine the initial thresholds $T_1$ and $T_2$ of the Canopy algorithm is validated by checking the number of clusters that the model generates. Figure 24 shows the evaluation indicator values for different cluster numbers. When the number of clusters is $k = 6$, all evaluation indicators take their optimal values, in line with the cluster number produced by the Canopy algorithm.
The performance improvement provided by the Canopy-K-means algorithm is verified by comparing its clustering outcomes with those of the traditional K-means algorithm. The output of the traditional K-means clustering model is presented in Table 12, and Figure 25 compares the iteration behavior of the two models. The Canopy-K-means algorithm reaches its optimal solution after 368 iterations, whereas the traditional K-means algorithm, owing to the randomness of its initial values, has still not converged when it reaches the maximum of 500 iterations.
In summary, using the Canopy algorithm for pre-clustering to obtain initial cluster centers not only sped up the convergence of the K-means algorithm but also improved the accuracy of its results. Canopy-K-means converged in 368 iterations while traditional K-means exhausted the 500-iteration limit, so computational efficiency improved by over 26.4%. As shown in Table 13 and Figure 26, the CH index of the Canopy-K-means algorithm is 13.48% higher than that of the traditional K-means algorithm, which reflects smaller intra-class distances and larger inter-class distances: individuals of the same category are more similar to one another, while the differences between categories are more pronounced. After optimization by the ant colony algorithm, the CH index rose by a further 8.54% relative to the Canopy-K-means algorithm and by 23.17% relative to the traditional K-means algorithm, a significant enhancement of the clustering effect.
To evaluate the model's recognition performance, a confusion matrix of the recognition outcomes for the travel groups was assembled (Table 14), and the evaluation indicators were calculated from it (Table 15).
As shown in Figure 27, the “commuting” group has the highest F1-score, 96.94%, followed by the “official business” group at 96.24%. The overall identification accuracy of the travel characteristic group recognition model for expressway ETC passenger car users is 95.23%. In general, the model performs well and can accurately identify the various travel characteristic groups.
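The percentages quoted in this paragraph can be reproduced from the iteration counts (Tables 6 and 12) and the CH values in Table 13. The short calculation below is an arithmetic check only; the helper name `pct_gain` is illustrative.

```python
# CH values from Table 13.
ch_kmeans = 6846927.0771   # traditional K-means
ch_canopy = 7769583.5691   # Canopy-K-means
ch_aco = 8433489.2158      # ant colony optimization-based Canopy-K-means

def pct_gain(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100

# Canopy-K-means converged in 368 iterations; K-means hit the 500-iteration cap
# without converging, so this saving is a lower bound.
iteration_saving = (500 - 368) / 500 * 100

print(round(iteration_saving, 1))
print(round(pct_gain(ch_canopy, ch_kmeans), 2))
print(round(pct_gain(ch_aco, ch_canopy), 2))
print(round(pct_gain(ch_aco, ch_kmeans), 2))
```

The three CH-based figures match the 13.48%, 8.54%, and 23.17% reported in the text, which indicates that these percentages are relative changes in the CH index.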

5. Conclusions

This study aims to assist expressway operation departments in formulating management measures and service plans for different travel groups. To achieve this, we extracted vehicle travel characteristic indicators from electronic toll collection (ETC) data, analyzed the temporal and spatial characteristics of passenger car travel, optimized the K-means clustering algorithm, and proposed a combined model for recognizing feature groups based on a neural network algorithm.
(1) The traditional K-means algorithm's problem of determining the initial clustering centers and the number of clusters is addressed by using the Canopy algorithm for pre-clustering. This improvement makes the K-means clustering algorithm at least 26.4% more efficient.
(2) Optimization with the ant colony algorithm improved the clustering results by 8.54% relative to the Canopy-K-means algorithm and by 23.17% relative to the traditional K-means clustering algorithm, as measured by the CH index. The accuracy of clustering improved significantly.
(3) We used the group classification results as labels and trained a neural network to identify the various travel feature groups efficiently. The model achieved a recognition accuracy of 95.23%.
The results demonstrate that the model proposed in this paper accurately identifies the travel characteristics of passenger cars, offering valuable insights for traffic management departments to develop effective traffic control measures, thereby contributing to the achievement of sustainable development in expressway traffic. For instance, official vehicles are prone to wear and tear due to their frequent usage. To prevent vehicle-related traffic accidents, traffic management departments can disseminate network information [34] to remind users to inspect their vehicles daily before driving. Additionally, highway law enforcement agencies should intensify their oversight of social commercial operating vehicles, enhance the scrutiny of their operational qualifications, standardize their driving behavior, ensure the safety of drivers and passengers, and elevate the level of highway safety. In terms of the commuter travel group, implementing time-sharing and regional travel guidance can mitigate the concentration of vehicle travel, thereby reducing traffic congestion at expressway toll stations during peak morning and evening hours. This approach ensures the smooth operation of the expressway and enhances overall vehicle traffic efficiency.
The research data used in this paper are mainly ETC traffic data and user basic information, which are offline historical data. In the future, we will incorporate the following two aspects:
(1) To achieve secondary division of ETC passenger car user groups, we will utilize additional ETC internet operation data and ETC consumption data to further investigate the diversity of user group characteristics and needs.
(2) To continuously improve the division criteria for different user groups with distinctive characteristics, an interface for real-time data transmission will be established, which utilizes data that are more current and relevant to update user feature identification models. This approach enables us to provide a more targeted and refined service to meet user needs, resulting in the improvement of efficiency in expressway management and service levels.

Author Contributions

Methodology, X.C. and Y.Z.; validation, B.P.; writing—original draft preparation, X.Z.; writing—review and editing, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by Science and Technology Research Major Project of Chongqing Municipal Education Commission: Research and application of AI-driven traffic congestion mechanism and core model algorithm of circle layer intelligent control in super-large mountainous cities (KJZD-M202300702).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The numerical data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ge, W.; Zhang, G. Resilient Public Transport Construction in Mega Cities from the Perspective of Ecological Environment Governance. J. Environ. Public Health 2022, 2022, 9143618.
  2. He, M.; Gao, L.; Shuai, C.; Lee, J.; Luo, J. Distribution Analysis and Forecast of Traffic Flow of an Expressway Electronic Toll Collection Lane. J. Transp. Eng. Part A Syst. 2021, 147, 04021043.
  3. Shuai, C.; Wang, W.; Xu, G.; He, M.; Lee, J. Short-Term Traffic Flow Prediction of Expressway Considering Spatial Influences. J. Transp. Eng. Part A Syst. 2022, 148, 04022026.
  4. Wang, Y.; Fu, Q.; Wang, X. A Traffic Status Evaluation Method of Expressway Merging Area Based on Improved Coupling Theory. Mod. Phys. Lett. B 2022, 36, 2150616.
  5. Wang, K.; Wang, L.; Ma, W. Real-Time Traffic State Prediction and Congestion Mechanism Analysis for Expressways. In Proceedings of the CICTP 2022, Changsha, China, 8–11 July 2022.
  6. Yang, D.; Wu, Y.; Sun, F.; Chen, J.; Zhai, D.; Fu, C. Freeway Accident Detection and Classification Based on the Multi-Vehicle Trajectory Data and Deep Learning Model. Transp. Res. Part C Emerg. Technol. 2021, 130, 103303.
  7. Jung, S.; Qin, X. A Data-Driven Approach to Strengthening Policies to Prevent Freeway Tunnel Strikes by Motor Vehicles. Accid. Anal. Prev. 2021, 157, 106171.
  8. Chen, Z.-C. Spatio-Temporal Analysis and Cost Modeling of Trip Data Based on License Plate Recognition BigData: An Case Study of Shenzhen City, China. Master’s Thesis, School of Civil and Traffic Engineering, Shenzhen University, Shenzhen, China, 2020.
  9. Gong, Y.-G.; Li, J.; Wang, Y.; Ye, H. Analysis of Vehicle Travel Characteristics Based on License Plate Recognition Data. Traffic Transp. 2020, 36, 54–58.
  10. Lv, M.; Chen, L.; Chen, T.; Zeng, D.; Cao, B. Discovering Individual Movement Patterns from Cell-Id Trajectory Data by Exploiting Handoff Features. Inf. Sci. 2019, 474, 18–32.
  11. Magnana, L.; Rivano, H.; Chiabaut, N. Implicit GPS-based Bicycle Route Choice Model Using Clustering Methods and a LSTM Network. PLoS ONE 2022, 17, e0264196.
  12. Gao, Q.; Molloy, J.; Axhausen, K.-W. Trip Purpose Imputation Using GPS Trajectories with Machine Learning. ISPRS Int. J. Geo-Inf. 2021, 10, 775.
  13. Chang, Y.-J.; Yang, D.-Y. Recognition of Vehicles with Commuting Property Using License Plate Data. J. Transp. Syst. Eng. Inf. Technol. 2016, 16, 77–82.
  14. Xu, Z.; Aghaabbasi, M.; Ali, M.; Macioszek, E. Targeting Sustainable Transportation Development: The Support Vector Machine and the Bayesian Optimization Algorithm for Classifying Household Vehicle Ownership. Sustainability 2022, 14, 11094.
  15. Peng, H.; Wang, J.-P.; Zhang, N. Travel Mode Choice of Commuters in Corridor Valley Pattern City of Loess Plateau Based on SVM. J. Chongqing Jiaotong Univ. Nat. Sci. 2021, 40, 18–23.
  16. Zhao, P.-J.; Cao, Y.-S. Identifying metro trip purpose using multi-source geographic big data and machine learning approach. J. Geo-Inf. Sci. 2020, 22, 1753–1765.
  17. Lu, Z.; Long, Z.; Xia, J.; An, C. A random forest model for travel mode identification based on mobile phone signaling data. Sustainability 2019, 11, 5950.
  18. Xia, Y.; Chen, H.; Zimmermann, R. A Random Effect Bayesian Neural Network (RE-BNN) for Travel Mode Choice Analysis Across Multiple Regions. Travel Behav. Soc. 2023, 30, 118–134.
  19. Tang, Y.-L.; Jiang, C.; Zheng, B.-H.; Li, Q.-M. Taxi on Service Trip Characteristics Based on Multi-source Data Fusion: A Case of Yueyang. J. Transp. Syst. Eng. Inf. Technol. 2018, 18, 45–51.
  20. MacQueen, J.; Plaut, D.; Blanchard, R. A Simplified Colorimetric Method for Serum Isocitrate Dehydrogenase. Am. J. Med. Technol. 1972, 38, 377–380.
  21. Zhao, S.; Xiao, Y.; Ning, Y.; Zhou, Y.; Zhang, D. An Optimized K-Means Clustering for Improving Accuracy in Traffic Classification. Wireless Pers. Commun. 2021, 120, 81–93.
  22. Kumar, K.-M.; Reddy, A.-R.-M. An efficient k-means clustering filtering algorithm using density based initial cluster centers. Inf. Sci. 2017, 418, 286–301.
  23. Guo, X.-S.; Zhong, J. Optimisation of K-means Algorithm Based on Sample Density Canopy. Int. J. Ad Hoc Ubiquitous Comput. 2021, 38, 62–69.
  24. Liu, Y.; Li, B. Bayesian Hierarchical K-means Clustering. Intell. Data Anal. 2020, 24, 977–992.
  25. Zhang, Y.; Zhou, Y.; Guo, X.; Wu, J.; He, Q.; Liu, X.; Yang, Y. Self-Adaptive K-Means Based on a Covering Algorithm. Complexity 2018, 2018, 7698274.
  26. Zhang, Z.; Feng, Q.; Huang, J.; Guo, Y.; Xu, J.; Wang, J. A Local Search Algorithm for K-Means with Outliers. Neurocomputing 2021, 450, 230–241.
  27. Chen, C.; Wang, Y.; Hu, W.; Zheng, Z. Robust Multi-View K-Means Clustering with Outlier Removal. Knowl.-Based Syst. 2020, 210, 106518.
  28. Yu, S.-S.; Chu, S.-W.; Wang, C.-M.; Chan, Y.-K.; Chang, T.-C. Two Improved K-Means Algorithms. Appl. Soft Comput. 2018, 68, 747–755.
  29. McCallum, A.; Nigam, K.; Ungar, L.-H. Efficient clustering of high-dimensional data sets with application to reference matching. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 20–23 August 2000.
  30. Zhang, G.; Zhang, C.; Zhang, H. Improved K-means Algorithm Based on Density Canopy. Knowl.-Based Syst. 2018, 145, 289–297.
  31. Xia, D.; Ning, F.; He, W. Research on Parallel Adaptive Canopy-K-Means Clustering Algorithm for Big Data Mining Based on Cloud Platform. J. Grid Comput. 2020, 18, 263–273.
  32. Dorigo, M.; Maniezzo, V.; Colorni, A. Ant system: Optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern. Part B Cybern. 1996, 26, 29–41.
  33. Maulik, U.; Bandyopadhyay, S. Performance evaluation of some clustering algorithms and validity indices. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1650–1654.
  34. Ouallane, A.A.; Bakali, A.; Bahnasse, A.; Broumi, S.; Talea, M. Fusion of engineering insights and emerging trends: Intelligent urban traffic management system. Inf. Fusion 2022, 88, 218–248.
Figure 1. Technical roadmap of ETC passenger car user travel characteristics identification combined model on expressway.
Figure 2. Statistics of the number of vehicles corresponding to the total number of days of monthly travel. (a) May; (b) June.
Figure 3. Statistics of the number of vehicles corresponding to the number of days of travel on workdays. (a) May; (b) June.
Figure 4. Statistics of the number of vehicles corresponding to the number of travel days on weekends and holidays.
Figure 5. Statistics of the number of vehicles corresponding to the number of monthly trips: (a) May; (b) June.
Figure 6. Lorenz curve of the number of travel vehicles and toll records.
Figure 7. Traffic flow chart for different travel periods on workdays.
Figure 8. Traffic flow chart for different travel periods on weekends and holidays.
Figure 9. Statistics of the number of vehicles corresponding to travel duration. (a) Workdays; (b) weekends and holidays.
Figure 10. Statistics of the number of vehicles corresponding to travel distance. (a) Workdays; (b) weekends and holidays.
Figure 11. Statistics of the number of vehicles corresponding to trajectory repetition rate.
Figure 12. Schematic diagram of the principle of the Canopy algorithm.
Figure 13. Flow chart of Canopy-k-means algorithm.
Figure 14. Flow chart of Canopy-K-means clustering algorithm based on ant colony algorithm optimization.
Figure 15. BP neural network model operation diagram.
Figure 16. Design framework of BP neural network recognition model.
Figure 17. Statistics of effective traffic data.
Figure 18. Clustering results of the Canopy algorithm.
Figure 19. Clustering results of the Canopy-K-means algorithm.
Figure 20. Clustering results after optimization with ant colony algorithm.
Figure 21. Radar map of travel characteristics of expressway ETC passenger car users.
Figure 22. Architecture of identification model for travel characteristics group of expressways ETC passenger car users.
Figure 23. Identification results chart of different travel groups. (a) “Visiting and traveling” group, (b) “long-distance” group, (c) “official business” group, (d) “public and commercial affairs” group, (e) “commuting” group, and (f) “sporadic” group.
Figure 24. Testing of clustering effectiveness indicators under different cluster numbers using Canopy algorithm. (a) SSE (Sum of Squared Error), (b) CH (Calinski–Harabasz Index), and (c) DB (Davies–Bouldin Index).
Figure 25. Comparison chart of iterative effects. The dotted line is the CH value when Canopy-K-means converges.
Figure 26. Comparison of clustering CH index before and after optimization.
Figure 27. Evaluation of model recognition performance.
Table 1. Correlation analysis of characteristic indicators in different periods.
| Indicator | Time Period | Pearson Correlation Coefficient |
| --- | --- | --- |
| Monthly total travel days | May / June | 0.9939 |
| Travel days on workdays | May / June | 0.9993 |
| Monthly travel trips | May / June | 0.9928 |
| Travel periods | Weekends / Holidays | 0.9683 |
Table 2. Summary of characteristic indicators in different periods.
| Time Period | Mean Travel Duration (h) | Mean Travel Distance (km) |
| --- | --- | --- |
| Workdays | 2.25 | 100.03 |
| Non-workdays | 4.33 | 118.71 |
Table 3. Correlation analysis results of travel characteristic indicators.
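| Measure | Indicator | X1 | X2 | X3 | X4 | X5 | X6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation coefficient | X1 | 1.000 | 0.909 ** | −0.157 | 0.707 ** | 0.018 | −0.160 |
| | X2 | 0.909 ** | 1.000 | −0.108 | 0.604 ** | 0.088 | −0.121 |
| | X3 | −0.157 | −0.108 | 1.000 | −0.098 | −0.040 | −0.006 |
| | X4 | 0.707 ** | 0.604 ** | −0.098 | 1.000 | 0.133 | 0.061 |
| | X5 | 0.018 | 0.088 | −0.040 | 0.133 | 1.000 | −0.041 |
| | X6 | −0.160 | −0.121 | −0.006 | 0.061 | −0.041 | 1.000 |
| Significance (one-tailed) | X1 | | 0.000 | 0.021 | 0.000 | 0.049 | 0.008 |
| | X2 | 0.000 | | 0.018 | 0.000 | 0.022 | 0.010 |
| | X3 | 0.021 | 0.018 | | 0.021 | 0.029 | 0.018 |
| | X4 | 0.000 | 0.000 | 0.021 | | 0.009 | 0.024 |
| | X5 | 0.049 | 0.022 | 0.029 | 0.009 | | 0.032 |
| | X6 | 0.008 | 0.010 | 0.018 | 0.024 | 0.032 | |

**. Significant correlation at a confidence level of 0.01 (one-tailed).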
Table 4. Sampling results of the initial distance thresholds $T_1$ and $T_2$.

| Sampling Round | $T_1$ | $T_2$ |
| --- | --- | --- |
| 1 | 4.3 | 2.9 |
| 2 | 4.5 | 3.4 |
| 3 | 4.6 | 3.5 |
| 4 | 3.9 | 2.8 |
| 5 | 4.1 | 2.9 |
| 6 | 4.1 | 3.0 |
| 7 | 4.2 | 3.3 |
| 8 | 4.3 | 3.1 |
| 9 | 4.2 | 3.2 |
| 10 | 4.0 | 3.0 |
| Mean | 4.22 | 3.11 |
Table 5. Clustering results of the Canopy algorithm.
Values are the centroids of the Canopy subsets.

| Cluster Category | Monthly Travel Frequency | Average Travel Distance per Trip | Travel Preference during Peak Hours | Travel Preference on Weekends and Holidays |
| --- | --- | --- | --- | --- |
| 0 | −0.3745 | −0.0697 | −0.8747 | −0.6868 |
| 1 | −0.5675 | 4.9653 | −0.0172 | −0.2047 |
| 2 | 3.3059 | −0.0677 | 0.9569 | −0.7106 |
| 3 | −0.7481 | −0.2738 | 0.0210 | −0.3106 |
| 4 | −0.4742 | 0.0310 | −0.0576 | 1.1288 |
| 5 | 2.0318 | −0.3831 | 0.5309 | −0.7557 |
Table 6. Clustering results of the Canopy-K-means algorithm.
| Cluster Category | Monthly Travel Frequency | Average Travel Distance per Trip | Travel Preference during Peak Hours | Travel Preference on Weekends and Holidays |
| --- | --- | --- | --- | --- |
| 0 | −0.1289 | −0.1876 | 0.0285 | −0.3056 |
| 1 | −0.6874 | 3.6852 | −0.0578 | −0.1678 |
| 2 | −0.8547 | 0.3256 | −0.1103 | 1.2587 |
| 3 | 1.4587 | −0.0985 | 0.1035 | −0.7058 |
| 4 | −0.9875 | −0.5089 | −0.0238 | −0.5269 |
| 5 | 3.0167 | −0.6712 | 1.7265 | −0.9537 |

Number of iterations: 368
Table 7. Comparison of clustering effects with different iterations.
| Number of Iterations | Cluster Number | CH Value |
| --- | --- | --- |
| 100 | 6 | 8,185,487.3264 |
| 200 | 6 | 8,398,752.5481 |
| 300 | 6 | 8,433,489.2158 |
| 400 | 6 | 8,296,325.589 |
| 500 | 6 | 8,133,256.6584 |
Table 8. Parameter setting of ant colony algorithm.
| Parameter | Value |
| --- | --- |
| Volatile factor ($\rho$) | 0.1 |
| Threshold 1 ($q_0$) | 0.9 |
| Constant ($L$) | 50 |
| Threshold 2 ($P_s$) | 0.9 |
| Cluster number ($K$) | 6 |
| Ant quantity ($A$) | 200 |
| Maximum iteration times ($t_{max}$) | 300 |
Table 9. Optimization results of ant colony algorithm.
| Cluster Category | Sample Size | Monthly Travel Frequency | Average Travel Distance per Trip | Travel Preference during Peak Hours | Travel Preference on Weekends and Holidays |
| --- | --- | --- | --- | --- | --- |
| 0 | 441,360 | −0.5338 | 0.1866 | −0.0887 | 1.1134 |
| 1 | 87,030 | −0.4807 | 3.5016 | −0.0397 | −0.1508 |
| 2 | 520,395 | −0.0313 | −0.1295 | 0.0043 | −0.2916 |
| 3 | 238,333 | 1.1706 | −0.0022 | 0.0734 | −0.5551 |
| 4 | 35,136 | 2.7889 | −0.4816 | 1.6572 | −0.9112 |
| 5 | 320,666 | −0.6048 | −0.0714 | −0.0652 | −0.7548 |
Table 10. Clustering centers of different feature groups.
| Cluster Category | Sample Size | Monthly Travel Frequency | Average Travel Distance per Trip | Travel Preference during Peak Hours | Travel Preference on Weekends and Holidays |
| --- | --- | --- | --- | --- | --- |
| 0 | 441,360 | 1.5222 | 123.6126 | 0.3588 | 0.9017 |
| 1 | 87,030 | 1.7623 | 531.9398 | 0.3713 | 0.4902 |
| 2 | 520,395 | 3.7935 | 84.6775 | 0.3825 | 0.4444 |
| 3 | 238,333 | 9.2266 | 7.4591 | 0.4001 | 0.3586 |
| 4 | 35,136 | 16.5405 | 41.3079 | 0.8035 | 0.2427 |
| 5 | 320,666 | 1.2014 | 91.8284 | 0.3648 | 0.2936 |
Table 11. Neural network model parameter setting.
| Parameter | Value |
| --- | --- |
| Number of layers in neural network | 5 |
| Number of neurons in hidden layer | 7 |
| Expected error | 0.05 |
| Learning rate | 0.01 |
| Momentum factor | 0.9 |
| Activation function | Sigmoid |
Table 12. Clustering results of traditional K-means algorithm.
| Cluster Category | Monthly Travel Frequency | Average Travel Distance per Trip | Travel Preference during Peak Hours | Travel Preference on Weekends and Holidays |
| --- | --- | --- | --- | --- |
| 0 | 2.5523 | −0.3595 | 1.0325 | −0.8512 |
| 1 | −0.1864 | −0.1146 | 0.2641 | −0.3887 |
| 2 | −0.5650 | 4.9311 | −0.0402 | −0.2561 |
| 3 | −0.5004 | −0.0963 | 0.0184 | −0.6978 |
| 4 | 1.0772 | −0.0656 | −0.2013 | −0.5878 |
| 5 | −0.4516 | −0.0471 | −0.2379 | 1.0767 |

Number of iterations: when the maximum number of iterations was set to 500, the model did not converge.
Table 13. Comparison of clustering CH index before and after optimization.
| | Traditional K-Means | Canopy-K-Means | Ant Colony Optimization-Based Canopy-K-Means |
| --- | --- | --- | --- |
| Cluster number | 6 | 6 | 6 |
| CH value | 6,846,927.0771 | 7,769,583.5691 | 8,433,489.2158 |
Table 14. Confusion matrix of model recognition results.
| | Visiting and Traveling | Long-Distance | Official Business | Public and Commercial Affairs | Commuting | Sporadic |
| --- | --- | --- | --- | --- | --- | --- |
| Visiting and traveling | 83,631 | 414 | 665 | 8 | 6 | 3548 |
| Long-distance | 184 | 16,978 | 21 | 12 | 3 | 208 |
| Official business | 832 | 32 | 99,459 | 3227 | 31 | 498 |
| Public and commercial affairs | 67 | 3 | 2026 | 45,181 | 326 | 64 |
| Commuting | 5 | 2 | 6 | 52 | 6959 | 3 |
| Sporadic | 2409 | 553 | 433 | 28 | 5 | 60,705 |
Table 15. Evaluation of model recognition performance.
| Group | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- |
| Visiting and traveling | 95.99% | 94.74% | 95.36% | 95.23% |
| Long-distance | 94.42% | 97.54% | 95.95% | |
| Official business | 96.93% | 95.56% | 96.24% | |
| Public and commercial affairs | 93.14% | 94.78% | 93.96% | |
| Commuting | 94.94% | 99.03% | 96.94% | |
| Sporadic | 93.35% | 94.65% | 94.00% | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cai, X.; Zhang, Y.; Zhang, X.; Peng, B. Travel Characteristics Identification Method for Expressway Passenger Cars Based on Electronic Toll Collection Data. Sustainability 2023, 15, 11619. https://0-doi-org.brum.beds.ac.uk/10.3390/su151511619
