Article

A Framework to Analyze Function Domains of Autonomous Transportation Systems Based on Text Analysis

1 School of Traffic and Transportation Engineering, Central South University, Changsha 410075, China
2 School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510275, China
* Author to whom correspondence should be addressed.
Submission received: 13 November 2022 / Revised: 16 December 2022 / Accepted: 23 December 2022 / Published: 28 December 2022
(This article belongs to the Special Issue Mathematical Optimization in Transportation Engineering)

Abstract

With the development of information and communication technologies, the current intelligent transportation systems (ITSs) will gradually become automated and connected and can be treated as autonomous transportation systems (ATSs). Functions, which unite cutting-edge technologies with ATS services and are a fundamental component of ATS operation, should be categorized into function domains to show more clearly how an ATS operates. Existing ITS function domains are classified mostly based on the experience of experts or the needs of practitioners, using vague classification criteria. To ensure tractability, we aim to categorize ATS functions into function domains based on text analysis, minimizing the reliance on subjective experience. First, we introduce the Latent Dirichlet Allocation (LDA) topic model to extract text features of functions as distribution weights, reflecting the semantics of the text data. Second, based on the LDA model, we categorize ATS functions into twelve function domains with the k-means method. The comparison between the proposed function domains and their counterparts in existing ITS frameworks demonstrates the effectiveness of the LDA-based classification method. This study provides a reference for the text processing and function classification of ATS architecture. The proposed functions and function domains reveal the objectives of future transportation systems, which could guide urban planners or engineers in designing better control strategies when facing new technologies.

1. Introduction

1.1. Background

The current intelligent transportation systems (ITSs) will evolve towards connectivity and automaticity, paving the way for autonomous transportation systems (ATSs) to meet increasingly complex information needs in the future [1,2]. Through the logic of autonomous perception, learning, decision, and response, ATS will operate autonomously to satisfy the needs of users in the system, including travelers and managers. The development of technology has not only provided new services for the transportation system but also brought about a massive amount of data [3], which must be carefully managed. As society evolves, the mobility demands of people and goods, as well as the mobility supply provided by carriers and infrastructures, become more numerous and diverse. Such rapid expansion can greatly complicate efforts to meet the demand for transportation. Increasing pressure is placed on modern transportation systems to provide more intelligent and autonomous services to alleviate traffic congestion, optimize resource allocation, minimize user costs, and enhance service quality. ATS can accommodate increasingly autonomous and intelligent services due to the seamless connectivity of the next-generation network and its broad integration with developing technologies, such as big data, artificial intelligence, and federated learning. Consequently, operation costs can be decreased and service quality can be enhanced to enable mass and individual mobility, as well as the management of intelligent carriers and infrastructures.
Automation is the core feature of ATS. On the one hand, ATS will operate autonomously, for example dispatching public transportation to satisfy travelers and managing traffic without staff. The essence is to shift away from the status quo of relying on human management of the transportation system and to increase autonomy. The objective of ATS is to provide a transportation system that is safe, effective, convenient, low-carbon, and economical. On the other hand, as technologies such as cloud computing and artificial intelligence advance, functions will become more sophisticated to dynamically orchestrate services. Therefore, system participants (e.g., users, carriers, and supporters) will be better integrated to enable cloud-to-edge service interoperability. Consequently, mobility services (such as operation, maintenance, and emergency response) will become more autonomous. ATS will adapt to such variations autonomously, which is its most important characteristic.
To construct ATS, an architecture is required to clarify the relationship among the elements (i.e., component, demand, service, function, technology) through their function [1]. Services provided by ATS, e.g., travel planning, automatic driving, and automatic parking, are supported by corresponding ATS functions to satisfy the demand from users. Functions control equipment and manage resources to support services, e.g., collect travel information, analyze road information, generate travel plans, and convey information to users. The classification of functions, called the function domain, is an essential part of ATS, since it can further illustrate the connection among the functions and demonstrate how ATS operates. The function domain contributes to the arrangement of ATS functions and the resolution of intra-domain and inter-domain correlation. Moreover, it facilitates the identification of essential ATS components.
Therefore, it is necessary to obtain functions and function domains when constructing transportation systems. Functions not only connect other elements but also could provide a reference for engineers to design specific systems. Function domains could further clarify the structure and operation mechanism of systems. Among various ITS architectures, as well as in the ATS architecture, functions are generally characterized by text data. To classify functions into function domains objectively, it is necessary to analyze the textual features of functions.

1.2. Paper Contribution

This study proposes an analytical framework to construct the function architecture of ATS with reference to ITS architectures. Considering that functions are characterized by text data, we introduce a topic model to extract features of functions. The features are clustered by the k-means and k-medoids algorithms, and the results are evaluated by the silhouette coefficient. According to the topic of each cluster, we define function domains, which can further clarify the relationship among ATS elements. In all, the contributions of this study are as follows:
(1) We propose a framework to obtain ATS functions by analyzing the requirements of ATS.
(2) We classify ATS functions into twelve function domains based on the LDA model.
(3) The comparison between ATS function domains and those from traditional ITS architectures demonstrates the applicability of the proposed function domains.

1.3. Organization

The remaining sections of this paper are structured as follows. Section 2 provides a literature review of ITS architectures and text analysis, and indicates the research gap. Section 3 provides an illustration of ATS with a specific example. Section 4 presents data and methodology using a flow chart, and explains the methods in detail. Section 5 describes the results, including the silhouette coefficient of each cluster and topic of each domain. Section 6 discusses the results compared with ITS architectures in China, America, and Europe, which demonstrates the rationality of ATS function domains. Section 7 concludes the article and proposes future work.

2. Literature Review

2.1. Traditional Classifications of Functions

Traditionally, different countries and regions classify ITS functions based on different objectives and criteria, formulating distinct ITS frameworks [4]. The International Organization for Standardization (ISO) has established standards referring to different representative frameworks [5], such as the Architecture Reference for Cooperative and Intelligent Transportation (ARC-IT), Connected Vehicle Reference Implementation Architecture, and European ITS Framework Architecture. These ITS architectures have provided practical guidance in both the research field and engineering applications. However, individual countries and regions still maintain their own frameworks, of which ARC-IT is a representative example. The United States proposed ARC-IT, taking advice from transportation practitioners, systems engineers, system developers, technical experts, and consultants. ARC-IT consists of four views: user view, function view, physical view, and communications view [6]. User view describes the organizations in the system and the relationships among them. Function view demonstrates function elements (also known as processes) and their logical interaction modes in the system. Physical view describes physical objects such as systems and equipment and their relations with functions. Communications view illustrates the set of protocols needed to communicate between physical objects.
The ITS framework of the European Union, known as the FRAME architecture, has been proposed since the 1990s, and it has been updated to version 4.1 [7]. The architecture contains a number of views, including functional view, physical view, communications view, organizational view, and information view. Regarding the functional view, the architecture arranges functions according to their complexity in a hierarchy. Besides this, the description of function domains consists of the functionality of domains and the links among domains. Specific function domains are shown in the discussion section.
The ITS framework in China is mainly composed of user services, logical framework, physical framework, and standards [8]. The designers define eight service domains based on ISO and then propose users’ demand for ITS. The function set is divided into three levels: system function, process, and sub-process, which are used to illustrate the connection between the physical framework and the logical framework. The Chinese ITS framework serves as a foundation for the development of local transportation systems [9,10,11,12]. With the development of autonomous driving, Chinese scholars have proposed a corresponding framework, which integrates a cooperative vehicle infrastructure system (CVIS) into the ITS framework [13].
Based on ITS, many subsystems have been developed considering emerging technologies, such as vehicle tracking in ITS [14], accident management systems [15], and communication systems [16]. However, few studies focus on the whole transportation system, neglecting its adaptability to future technologies and requirements.
In addition, although the elements in different ITS frameworks are similar and functions are the core among them, the traditional classifications of functions are still based on the experience of designers and lack specific criteria.

2.2. Text Analysis

Clustering is a traditional method to divide objects into different categories without preset labels. Multiple clustering algorithms exist, including hierarchical clustering [17], division-based k-value clustering [18], density-based spatial clustering, grid-based statistical information clustering, and model-based latent class analysis [19]. The selection of a clustering algorithm is commonly determined by the data type.
However, functions are characterized by text data, which cannot be clustered directly by traditional algorithms, since text features must first be extracted to compute similarity. Latent Dirichlet Allocation is a three-level hierarchical Bayesian model [20,21], which has been widely applied to various fields of text analysis [22,23,24]. Guo et al. used LDA for a cluster analysis of 266,544 user comments on hotels to dig out the factors that have an important influence on hotel operation [25]. Tirunillai et al. used LDA to analyze user reviews of 15 companies' products, mining potential information to develop business strategies [26]. After feature extraction by LDA, a weight distribution is obtained to denote the original function text. Text similarity can then be measured by different metrics, including the Euclidean distance, Manhattan distance, and Hamming distance.
Inspired by the above studies, we introduce LDA to extract features of the function text, which allows us to measure the similarity among functions. We then calculate the similarity of the weight distributions obtained from LDA and utilize the k-value clustering algorithm to obtain function domains.

3. Autonomous Transportation Systems

Referring to the existing transportation system, ATS is composed of five elements, i.e., component, demand, service, function, and technology. Component consists of four parts: user, facility, vehicle, and environment. Demand, produced by the users in the component element, covers the activities in ATS. Service meets users' needs based on demand. Function is supported by technology and related equipment to implement services. These five elements work together to ensure that ATS can operate autonomously in the logic of autonomy and adapt to changes in technology and demand. To meet the requirements of the whole transportation system, the elements will change significantly in both expression forms and operation mechanisms across different generations of technology.
To further illustrate ATS, we take a trip as an example, as shown in Figure 1. The components involved in a trip include users, vehicles, the highway, and the environment. The basic demand of travelers, the main users, is to get somewhere. The goals of this demand are diverse, including safety, convenience, efficiency, environmental protection, and economy. A series of services are provided to satisfy these needs, such as travel planning, automatic driving, facilities maintenance, and traffic management. To provide these services, ATS integrates technologies to support functions, including sensor technology, data calculation, and communication. Functions operate in the logic of perception, learning, decision, and response, which are the characteristics of ATS. For instance, to provide a travel-planning service, it is necessary to collect users’ travel intention and traffic network data, which are analyzed to generate plans for users. Function is directly driven by technology, which is the key element in the framework. Functions will update as technology advances, and other elements will change accordingly. Therefore, it is necessary to explore the mechanisms of interaction among functions based on the above architecture to make sure that ATS operates and iterates autonomously.

4. Methodology

This study consists of four parts, namely, data collection, data processing, data clustering, and results analysis, as indicated in Figure 2. Aiming at supporting services, in the first part this paper derives functions following the logic of autonomy by combining related frameworks and literature. In the second part, we employ the topic model Latent Dirichlet Allocation (LDA) to extract text features after tokenization and obtain topic distribution weights. In the clustering part, we input the distribution-weight matrix into the k-means and k-medoids algorithms, and the results are compared through a word cloud. To better analyze the clusters in the fourth part, we visualize the results in low-dimensional space. The final clustering results are output as function domains after comparison with their counterparts in ITS frameworks. The pseudo-code of the whole methodology is given in Algorithm 1 (a compact code sketch follows the algorithm):
Algorithm 1: The pseudo-code of the whole methodology
  • Input: function set {f_1, f_2, ..., f_n}
  • Output: clustering labels for each function, C = {C_1, C_2, ..., C_k}
    1. Divide the function text into words with Jieba
    2. Extract features with LDA according to Equations (1)–(4)
    3. Obtain the weight matrix G from LDA, i.e., Equation (5)
    4. Calculate the similarity among samples in G using Equation (7)
    5. Cluster the samples with the k-means and k-medoids methods separately, using Equations (6) and (8)
    6. Reduce the dimensionality of G using Equations (9)–(14)
    7. Repeat Steps 4 and 5 using the low-dimensional G
    8. Compare the results from Step 5 and Step 7
    9. Obtain the final clustering results C
  • End
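As a concrete but purely illustrative stand-in for Algorithm 1, the sketch below composes the same steps with off-the-shelf tools; it substitutes scikit-learn's vectorizer and LDA for Jieba and the Baidu LDA topic library, and the toy texts and parameter values are placeholders rather than the study's data.

# Illustrative stand-in for Algorithm 1 (not the authors' code): scikit-learn
# replaces Jieba + Baidu LDA, and the texts below are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

function_texts = [
    "monitor real-time surroundings with on-board sensors",
    "store import and analyze surrounding information",
    "generate parking schemes from analysis results",
    "control the parking device to execute the scheme",
]

# Steps 1-3: tokenize, fit LDA, and obtain the topic-weight matrix G (functions x topics).
bow = CountVectorizer().fit_transform(function_texts)
G = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(bow)

# Steps 4-5: cluster the rows of G (KMeans uses Euclidean distance internally).
labels_high = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(G)

# Steps 6-7: reduce G with t-SNE, then cluster again in the low-dimensional space.
G_low = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(G)
labels_low = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(G_low)

# Steps 8-9: compare the two labelings and keep the preferred one as the final result C.
print(labels_high, labels_low)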

4.1. Data

According to the definition of ATS services, we build an analytical framework to obtain functions following the logic of autonomy, i.e., perception, learning, decision, and response. The function set can be denoted by {f_1, f_2, ..., f_n}. Denoting the service set as {s_1, s_2, ..., s_u}, any service s_i requires a combination of functions {f_i1, f_i2, ..., f_it}. When analyzing the functions, we use the framework considering related references and externalize each layer of logic, as shown in Figure 3. To better utilize the data, we define ATS functions with six attributes: function provider P, process information I, service object O, realization approach A, logic L, and technology T, as shown in Table 1. Based on these attributes, one function f_j can be described as {P_j, I_j, O_j, A_j, L_j, T_j}, where the symbols denote the six attributes, each with a corresponding text value. P_j and O_j come from one of the basic elements in ATS (i.e., component), while T_j belongs to the technology used in ATS.
Regarding how to obtain functions using an analytical framework, we take an automatic parking service as an example, as shown in Figure 3. To support this service, perception is the first thing to consider, which requires acquiring surrounding information, i.e., monitoring real-time surroundings. Learning is a process of transferring and analyzing data to provide detailed information to decision units in vehicles, which needs the functions store, import, and analyze surrounding information. ‘Decision’ denotes making a decision or generating plans according to analysis results, which requires the function generate scheme in this service. ‘Response’ represents carrying out decisions through a series of activities including releasing orders and controlling devices. In the service automatic parking, a vehicle needs to control the parking device to perform the scheme generated from decision units. The mentioned functions are shown in Table 1 as semi-structured text data. Each function is determined by six attributes to form redefinitions using a unified paradigm. For instance, to support the automatic parking service, the vehicle provides the perception function, i.e., monitoring real-time surroundings. The surrounding environmental information is utilized by the subsequent functions through data collection, data analysis, equipment control, etc. Sensor technology is essential technology in this function. The perception module in automatic parking is accomplished through the above process.
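To make the six-attribute description concrete, the following sketch shows one possible data structure for a function record; the class name and field names are hypothetical choices for illustration, and the values follow the first row of Table 1.

# Illustrative record for one ATS function with the six attributes {P, I, O, A, L, T};
# values follow the "monitor real-time surroundings" row of Table 1.
from dataclasses import dataclass

@dataclass
class ATSFunction:
    provider: str             # P: function provider
    process_information: str  # I: information processed by the function
    service_object: str       # O: object served by the function
    approach: str             # A: realization approach
    logic: str                # L: stage in the perception-learning-decision-response logic
    technology: str           # T: supporting technology

f1 = ATSFunction("vehicle", "ambient information", "vehicle",
                 "data collection", "perception", "sensor")
print(f1)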

4.2. Feature Extraction

In order to extract features from the function data, we use a tokenization tool named Jieba to process the unstructured part of the text. Tokenization is a basic processing step in text mining [27], which divides sentences into words or phrases. The Natural Language Toolkit and Stanford CoreNLP are commonly used in English text mining [28]. However, Chinese text differs from English text in that there are no spaces between words in a sentence, and different combinations of characters may convey very different meanings. Tokenization of Chinese text is therefore more complicated and needs more training. Jieba is the most widely used Chinese tokenization tool and offers high accuracy [29,30,31]. Therefore, we use Jieba in the following process.
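A minimal usage sketch of Jieba is shown below; the sentence is an invented stand-in for an actual function description.

# Tokenize a Chinese function description with Jieba (precise mode).
import jieba

text = "实时监测车辆周围环境信息"  # illustrative: "monitor real-time surrounding information of the vehicle"
words = jieba.lcut(text)          # returns the segmentation as a list of words
print(words)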
Latent Dirichlet Allocation is a three-level hierarchical Bayesian model. LDA models each item as a finite mixture over underlying topics [20,21] and has produced insightful experimental findings in various fields [32,33,34,35]. In LDA, a word, the basic unit of discrete data, is defined to be an item from a vocabulary denoted by {1, ..., V}. An N-word document is characterized by the notation w = (w_1, w_2, ..., w_N). The corpus is a set of M documents represented by D = {w_1, w_2, ..., w_M}. As shown in Figure 4, LDA models documents as a random mix of potential topics with unique word distributions.
The generation process for each document w in a corpus D is as follows [22]:
  • Choose N ~ Poisson(ξ);
  • Choose θ ~ Dir(α);
  • For each of the N words w_n:
    (a) Choose a topic z_n ~ Multinomial(θ);
    (b) Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.
A k-dimensional Dirichlet random variable θ can take values in the (k − 1)-simplex (a k-vector θ lies in the (k − 1)-simplex if θ_i ≥ 0 and Σ_i θ_i = 1), with the following probability density on the simplex:
p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}    (1)
where α is a k-vector with components α_i > 0, and Γ(x) is the gamma function.
Given the parameters α and β, the joint distribution of a topic mixture θ, a set of N topics z, and a set of N words w is:
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)    (2)
where p(z_n | θ) is simply θ_i for the unique i such that z_n^i = 1. Integrating over θ and summing over z, the marginal distribution of a document is:
p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta    (3)
The above equation denotes the probability distribution of a document given α and β. Taking the product of the marginal probabilities of single documents, the probability of a corpus is:
p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d    (4)
This paper uses the LDA topic database built by Baidu for feature extraction. When enough experimental data are available, it is preferable to train LDA on part of the data and build a dedicated topic library for higher accuracy. However, the amount of function text data is smaller than what a self-built topic database requires. Therefore, we select a developed database in a related field, the Baidu LDA news topic database, which is trained on massive news data with a vocabulary of 294,657 words and 2000 topics. It is extensively utilized in both research and industry, so we adopt it for function-feature extraction.
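Because the Baidu topic library is accessed through Baidu's own tooling, the sketch below uses gensim's LdaModel as a generic stand-in to show how per-function topic weights (rows of the matrix G) can be obtained; the toy corpus and the choice of two topics are assumptions for illustration only.

# Extract per-document topic weights with gensim as a stand-in for the Baidu LDA library.
from gensim import corpora, models

tokenized = [
    ["monitor", "surrounding", "information"],
    ["analyze", "surrounding", "information"],
    ["generate", "parking", "scheme"],
]  # output of the tokenization step (toy example)

dictionary = corpora.Dictionary(tokenized)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized]
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary, random_state=0)

# One row of the weight matrix G: the topic distribution of the first function.
weights = lda.get_document_topics(bow_corpus[0], minimum_probability=0.0)
print(weights)  # e.g., [(0, 0.47), (1, 0.53)]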

4.3. Clustering Method

After processing by LDA, the feature of one function f_j can be expressed as the row vector [g_{j1}, g_{j2}, ..., g_{jm}], and the weight matrix of the whole function set can be denoted by:
G = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1m} \\ g_{21} & g_{22} & \cdots & g_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nm} \end{bmatrix}    (5)
This study inputs the weight matrix G into the clustering algorithms. Assuming that each dimension is weighted equally, we use the spatial distance between feature vectors as the similarity between functions. Various distance formulas can represent the similarity between points in a spatial coordinate system, including the Euclidean distance and the cosine distance. Because the topic distribution is high-dimensional, this paper employs the Euclidean distance as the similarity index to simplify the calculation.
The k-means and k-medoids methods are utilized for clustering before and after dimensionality reduction [36,37,38]. The extracted weight matrix is high-dimensional, which has a great impact on clustering results [39]. Therefore, this paper adopts t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimension of the weight matrix [40]. The silhouette coefficient is used to evaluate the results [41]. These methods are demonstrated as follows.
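Before detailing the individual methods, the sketch below shows how the pairwise Euclidean distances of Equation (7) can be computed on the weight matrix G; the matrix here is randomly generated for illustration.

# Pairwise Euclidean distances between function feature vectors (rows of G).
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
G = rng.dirichlet(np.ones(20), size=10)       # 10 functions, 20 topic weights per function

pairwise = cdist(G, G, metric="euclidean")    # smaller distance = more similar functions
print(pairwise.shape)                         # (10, 10)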

4.3.1. K-Means

K-means, a traditional unsupervised algorithm, partitions objects into k clusters, enabling each object to cluster around the nearest centroid [42]. The k-means method outputs clusters with centroids that minimize the sum of errors over all k clusters, i.e., the objective function below:
E = \sum_{i=1}^{k} \sum_{x \in C_i} d(x, \mu(C_i))    (6)
where C_1, C_2, ..., C_k denote the clusters, μ(C_i) denotes the mean of cluster C_i, and d(x, μ(C_i)) represents the dissimilarity between the observation x and μ(C_i).
There are different ways to define the dissimilarity d(x, μ(C_i)); a Euclidean metric is typically used and is defined as follows:
d = \sqrt{\sum_{j=1}^{n} (x_j - \mu_j)^2}    (7)
where x_j and μ_j denote the components of a point and of the centroid of a cluster, respectively.
The steps of the k-means algorithm can be described as follows [43] (a minimal code sketch is given after the list):
  • Step 1: Choose k points C_1, C_2, ..., C_k from the dataset D randomly as the initial centroids of the k clusters.
  • Step 2: Allocate the remaining data to the clusters with the closest centroids by calculating the distance between the points and the centroids.
  • Step 3: Take the mean value of each cluster as the new centroid.
  • Step 4: Repeat Steps 2 and 3 until there are no changes in the error function or the loop reaches the pre-set number of iterations.
  • Step 5: Output the final centroids and clusters as results.
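A minimal NumPy sketch of Steps 1–5 is given below; library implementations such as sklearn.cluster.KMeans add smarter initialization and convergence checks, and the sketch assumes no cluster becomes empty.

# From-scratch k-means mirroring Steps 1-5 above (Euclidean distance, random init).
import numpy as np

def kmeans_sketch(D, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k points from the dataset at random as the initial centroids.
    centroids = D[rng.choice(len(D), size=k, replace=False)]
    for _ in range(max_iter):                                   # Step 4: repeat Steps 2 and 3
        # Step 2: assign each point to the cluster with the closest centroid.
        dists = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 3: take the mean of each cluster as the new centroid.
        new_centroids = np.array([D[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centroids, centroids):               # error no longer changes
            break
        centroids = new_centroids
    return labels, centroids                                    # Step 5: output the results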

4.3.2. K-Medoids

Unlike k-means, the k-medoids algorithm uses representative data points, called medoids, instead of the mean. The error function sums the dissimilarities between each data point and its corresponding medoid as follows:
E = \sum_{i=1}^{k} \sum_{x \in C_i} \left| x - m(C_i) \right|    (8)
where C_1, C_2, ..., C_k denote the clusters and m(C_i) represents the medoid of cluster C_i.
Several algorithms are derived from k-medoids, and partitioning around medoids (PAM) is one representative among them. Its basic steps can be described as follows [42] (a simplified code sketch is given after the list):
  • Step 1: Select k points from the dataset D randomly as the initial medoids of the k clusters.
  • Step 2: Allocate the remaining data to the clusters with the closest medoids by calculating the distance between the points and the medoids.
  • Step 3: Choose a non-medoid point from D randomly and compute the new sum of errors.
  • Step 4: If the new error calculated in Step 3 is lower than the old value, then exchange the old medoid with the new one.
  • Step 5: Repeat Steps 2 to 4 until there are no changes in the error function or the loop reaches the pre-set number of iterations.
  • Step 6: Output the final medoids and clusters as results.
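The sketch below is a simplified alternating k-medoids variant (Voronoi-iteration style) rather than full PAM: instead of testing random swaps as in Steps 3–4, each medoid is replaced by the in-cluster point with the smallest total distance to its cluster members.

# Simplified k-medoids sketch (not full PAM): alternate assignment and medoid update.
import numpy as np

def kmedoids_sketch(D, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    medoid_idx = rng.choice(len(D), size=k, replace=False)          # Step 1: random medoids
    dist = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=-1)   # pairwise distances
    for _ in range(max_iter):
        labels = dist[:, medoid_idx].argmin(axis=1)                 # Step 2: nearest medoid
        new_idx = medoid_idx.copy()
        for i in range(k):
            members = np.where(labels == i)[0]
            if len(members) == 0:
                continue                                            # keep the old medoid
            # Replace the medoid by the member minimizing the within-cluster distance sum.
            new_idx[i] = members[dist[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(np.sort(new_idx), np.sort(medoid_idx)):   # medoids unchanged
            break
        medoid_idx = new_idx
    return labels, D[medoid_idx]                                    # clusters and medoids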

4.3.3. t-Distributed Stochastic Neighbor Embedding

The t-distributed stochastic neighbor embedding is adopted to reduce the dimension of the weight matrix, and its main steps are as follows [44] (a code sketch follows the list):
  1. For the data X = {x_1, x_2, ..., x_n}, the conditional probability p_{j|i} represents the similarity of data point x_j to x_i and is calculated by:
p_{j|i} = \frac{\exp\left(-\| x_i - x_j \|^2 / 2\delta_i^2\right)}{\sum_{k \neq i} \exp\left(-\| x_i - x_k \|^2 / 2\delta_i^2\right)}    (9)
where δ_i is the variance of the Gaussian centered on data point x_i.
  2. In the high-dimensional space, the joint probability p_{ij} is the symmetrized conditional probability, defined as:
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}    (10)
where n denotes the number of data points.
  3. The initial low-dimensional data are characterized as y^{(0)} = {y_1, y_2, ..., y_n}.
  4. In the low-dimensional space, the joint probability q_{ij} is defined based on a Student's t-distribution with one degree of freedom:
q_{ij} = \frac{\left(1 + \| y_i - y_j \|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \| y_k - y_l \|^2\right)^{-1}}    (11)
  5. To measure the mismatch between the joint probability P in the high-dimensional space and the joint probability Q in the low-dimensional space, the Kullback–Leibler divergence between P and Q is used as the cost function:
C = \sum_{i} KL(P_i \,\|\, Q_i) = \sum_{i} \sum_{j} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}    (12)
A gradient descent algorithm is then used to minimize this cost function, and the gradient is computed by:
\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \| y_i - y_j \|^2\right)^{-1}    (13)
  6. The low-dimensional data points are updated by:
y^{(t)} = y^{(t-1)} + \eta \frac{\partial C}{\partial y} + \alpha(t)\left(y^{(t-1)} - y^{(t-2)}\right)    (14)
where the learning rate η and the momentum α(t) are optimization parameters that must be pre-set.
  7. Assuming T is the pre-set number of iterations, Steps 4 to 6 are repeated from t = 1 to t = T. Eventually, the low-dimensional data y^{(T)} = {y_1, y_2, ..., y_n} are obtained.
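The sketch below implements the listed update rules directly in NumPy for illustration; for simplicity the Gaussian bandwidths δ_i are fixed to 1 instead of being tuned to a target perplexity, and the update moves against the gradient to minimize C. A production implementation such as sklearn.manifold.TSNE handles both properly.

# Minimal t-SNE sketch following Equations (9)-(14); fixed bandwidth, plain momentum.
import numpy as np

def tsne_sketch(X, n_components=2, T=200, eta=100.0, momentum=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Equation (9) with delta_i = 1: conditional similarities in the high-dimensional space.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                          # exclude k = i from the sum
    p_cond = np.exp(-d2 / 2.0)
    p_cond /= p_cond.sum(axis=1, keepdims=True)
    P = (p_cond + p_cond.T) / (2 * n)                     # Equation (10)
    Y = rng.normal(scale=1e-4, size=(n, n_components))    # Step 3: initial y(0)
    Y_prev = Y.copy()
    for _ in range(T):
        # Equation (11): Student-t joint probabilities in the low-dimensional space.
        num = 1.0 / (1.0 + np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1))
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()
        # Equation (13): gradient of the KL cost in Equation (12).
        PQ = (P - Q) * num
        grad = 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Y
        # Equation (14): update with learning rate and momentum (descent direction).
        Y_new = Y - eta * grad + momentum * (Y - Y_prev)
        Y_prev, Y = Y, Y_new
    return Y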

4.3.4. Silhouette Coefficient

In both the k-means and k-medoids algorithms, the most critical choice is the number of clusters K, and we use the silhouette coefficient to evaluate the results. The silhouette coefficient ranges from −1 to 1, with larger values indicating better intra-cluster homogeneity and inter-cluster separation [42]. The silhouette coefficient is computed as follows (a sketch of using it to select K follows the definitions below):
S(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
where:
i is an object in a cluster;
S(i) is the silhouette coefficient of i;
a(i) is the average dissimilarity of i to the remaining objects within the same cluster;
b(i) is the minimum value of the average dissimilarity of i to objects in other clusters.
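The silhouette coefficient can be used to sweep over candidate values of K, as in Figure 6; the sketch below uses scikit-learn and a randomly generated stand-in for the reduced weight matrix.

# Choose the number of clusters K by the average silhouette coefficient.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
G_low = rng.random((120, 2))      # placeholder for the t-SNE-reduced weight matrix

scores = {}
for k in range(2, 16):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(G_low)
    scores[k] = silhouette_score(G_low, labels)   # in [-1, 1]; larger is better

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))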
We calculate the similarities among functions through the topic distribution weights G (Equation (5)) as the input of the clustering algorithm. We obtain a clustering label C for each function and represent function f_j as {P_j, I_j, O_j, A_j, L_j, T_j, C_j}. Therefore, the function set can be described as:
F = \{ (P_1, I_1, O_1, A_1, L_1, T_1, C_1), (P_2, I_2, O_2, A_2, L_2, T_2, C_2), \ldots, (P_j, I_j, O_j, A_j, L_j, T_j, C_j), \ldots, (P_n, I_n, O_n, A_n, L_n, T_n, C_n) \}
The value of C varies over {C_1, C_2, ..., C_k}, where k represents the number of clusters in the algorithm. To analyze the characteristics of the function domains, we use C as the label to divide the function set F into different categories {F_1, F_2, ..., F_k}, which are the basis of the function domains in the following process.
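Splitting the labeled function set F into the categories {F_1, ..., F_k} is then a simple grouping step, sketched below with invented attribute tuples and labels.

# Group functions into categories F_1, ..., F_k by their cluster label C.
from collections import defaultdict

functions = [
    ("vehicle", "ambient information", "vehicle", "data collection", "perception", "sensor"),
    ("vehicle", "ambient information", "vehicle", "data analysis", "learning", "computing"),
]
labels = [0, 1]                     # illustrative cluster labels C_j for each function f_j

categories = defaultdict(list)
for f, c in zip(functions, labels):
    categories[c].append(f)         # F_c collects all functions assigned to cluster c
print(dict(categories))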

5. Results

After processing with Jieba and LDA models, we obtain tokenized text and topic keywords. Figure 5a,b show the word cloud of function-text words and topic keywords, respectively. The most frequently appearing texts are traffic, product, service, information, management, and data. From Figure 5b, we can observe that traffic, information, data, analysis, management, etc., also appear frequently. The similarity between the two figures demonstrates the rationality of LDA processing.
The optimal clustering number K = 12 and the clustering method k-means are determined by the silhouette coefficient. The average silhouette coefficient of each cluster is shown in Figure 6a. We can draw the following observations from the figure:
  • The results after dimension reduction are better than those obtained from the original data.
  • Results from the k-means method are superior to the results from the k-medoids method both before and after dimension reduction.
  • When the cluster number lies in the interval [7,12], the variation in the index from k-means clustering is small.
To further compare the results from the two algorithms, we record the index changes during the study. In Figure 6b, k-means clustering results are rather stable at the beginning; only when there are more than 12 clusters will there be fluctuations.
According to the above results, we determine K = 12 and select the k-means algorithm. We visualize the results in two-dimensional space in Figure 7. It is evident that all clusters are evenly distributed and that the cluster partition is clear, with distinct cluster centers and perimeters. We also compare the results by mapping the cluster labels back to the function text. In conclusion, the results obtained by the k-means algorithm after dimensionality reduction are more reasonable.
According to the results, we extract keywords of function in 12 clusters. To establish the topic of each cluster, we plot the word cloud of keywords in each topic according to the frequency and mark the topic keywords that occur frequently as shown in Table 2. The combination of keywords can suggest the topics of clusters.

6. Discussion

According to the results above, we infer the function domains from the keyword frequencies and the functions within each cluster. From topic 1 to topic 12, the function domains can be named as shown in Table 3.
In Table 3, we describe each function domain from the perspective of its components and the connections among domains. Within a single domain, functions have something in common, such as a theme, target, effect, or logic of realization. Supporting one service normally requires functions from different domains to collaborate. For instance, when dealing with emergencies, the system needs to arrange ambulance routes and lead traffic flow, which involves functions in transportation management and vehicle operation. The interaction among domains is therefore inevitable, as seen in Figure 8.
We compared the descriptions in Table 3 with other architectures. In Figure 8, three architectures of transportation systems are shown as references. The same color indicates that the function domains are the same or similar, while the domains in red have no match. From the figure, we can see that most domains defined by us are similar to those in the existing architectures, such as public transport information management, transportation infrastructure management, traffic information collection, commercial traffic management, freight transport, emergency response, data management and collaboration, and traffic management. Compared with other classifications, the most obvious difference concerns the domains related to vehicles. As the vehicle is one of the most significant elements in transportation, each architecture sets an independent domain for vehicle operation. In ATS, the vehicle carries more functions with autonomy, which could replace some functions provided by humans. Additionally, as technology advances, ATS may need additional autonomous functions equipped on vehicles. Therefore, it makes sense to set three domains for vehicles according to the operation of the vehicle. The other distinction is environmental information management, which other architectures do not include. Besides the more detailed function domain classification in ATS, this is because the influence of the environment on traffic is becoming increasingly important.
The relationships between the functions become more apparent when the function set is divided into 12 function domains. In the function set, service is the basic starting point. There is only a top-down relationship between functions that work together to support a service. In function domains, there are obvious similarities between functions. Functions in the same domain have competitive relationships, while functions in different domains have cooperative, competitive, or completely unrelated relationships. As systems evolve, some functions will grow stronger while some may disappear without competitiveness. However, it is hard to classify such relationships without function domains. The function domain based on function similarity is necessary for the evolution of the ATS framework, since domains could indicate the intrinsic operation of the system.

7. Conclusions

In this study, we introduced the ATS architecture briefly and proposed a method to construct function domains composed of similar functions. To obtain ATS functions, we designed a framework to analyze services in ATS and reconstructed functions with six attributes defined by us. Based on the function text, we introduced the LDA model to extract function features and clustered them into 12 categories. Compared with the function domains of other architectures from China, America, and Europe, we defined 12 ATS function domains that are essential to the whole architecture. Function domains are connected by functions, as shown in Section 6, which further reflects the logic of system operation.
By analyzing the basic elements of ATS, we provide an approach for developing the architecture with mathematical techniques, which could be applied in various fields involving text data. The proposed functions and function domains are fundamental to designing equipment in ATS, since they provide the objectives and classify the connections among ATS elements. With respect to functions, this could provide guidance for designers or researchers to better design future transportation systems. The function domains classify functions into different fields, identifying the relationships among functions as well as with other ATS elements. Future research should focus on the quantitative analysis of relationships among different ATS elements, as well as the data transformation between different ATS elements.

Author Contributions

Conceptualization, X.H. and X.C.; methodology, X.H.; software, X.H. and R.Z.; validation, X.H. and X.C.; formal analysis, X.H.; data curation, R.Z.; writing—original draft preparation, X.H.; writing—review and editing, X.C.; supervision, M.C.; funding acquisition, M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Key R&D Program of China (2020YFB1600400).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. You, L.; He, J.; Wang, W.; Cai, M. Autonomous Transportation Systems and Services Enabled by the Next-Generation Network. IEEE Netw. 2022, 36, 66–72.
  2. Hancock, P.A.; Nourbakhsh, I.; Stewart, J. On the Future of Transportation in an Era of Automated and Autonomous Vehicles. Proc. Natl. Acad. Sci. USA 2019, 116, 7684–7691.
  3. Lian, Y.; Zhang, G.; Lee, J.; Huang, H. Review on Big Data Applications in Safety Research of Intelligent Transportation Systems and Connected/Automated Vehicles. Accid. Anal. Prev. 2020, 146, 105711.
  4. He, J.W.; Zeng, Z.X.; Li, Z.H. Benefit Evaluation Framework of Intelligent Transportation Systems. J. Transp. Syst. Eng. Inf. Technol. 2010, 10, 81–87.
  5. ISO 14813-5:2020; Intelligent Transport Systems-Reference Model Architecture(s) for the ITS Sector—Part 5: Requirements for Architecture Description in ITS Standards. International Standard Organization: Geneva, Switzerland, 2020. Available online: https://www.iso.org/standard/73746.html (accessed on 1 November 2020).
  6. Architecture Reference for Cooperative and Intelligent Transportation. Available online: https://www.arc-it.net/ (accessed on 11 November 2022).
  7. The European Intelligent Transport Systems (ITS) Framework Architecture. 2011. Available online: https://frame-online.eu/ (accessed on 1 December 2020).
  8. Subject Groups of National Intelligent Transport System Architecture. National Intelligent Transport System Architecture, 1st ed.; China Communications Press: Beijing, China, 2003.
  9. Yang, Q.; Wang, X.; Qi, T. Intelligent transport systems standards architecture research. J. Highw. Transp. Res. Dev. 2004, 21, 91–94.
  10. Chen, X.; Yu, L.; Geng, Y.; Gao, Y. Research on development method of regional ITS architecture. China J. Highw. Transp. 2006, 19, 84.
  11. Zhang, K.; Qi, Y.; Jin, L.; Liu, H.; Shen, H.; Liu, D. Development of Regional ITS Architecture for Jiangsu Province. J. Transp. Syst. Eng. Inf. Technol. 2007, 7, 141.
  12. Yang, D.; Lin, Q. A General Description of Developing the Shenzhen ITS System. Urban Transp. China 2007, 005, 13–21.
  13. Zhang, Y.; Yao, D. Architecture for Intelligent Transportation Systems Based on Intelligent Vehicle-Infrastructure Cooperation Systems; Publishing House of Electronics Industry: Beijing, China, 2015.
  14. Salazar-Cabrera, R.; de la Cruz, Á.P.; Molina, J.M.M. Design of a public vehicle tracking service using long-range (LoRa) and intelligent transportation system Architecture. J. Inf. Technol. Res. JITR 2021, 14, 147–166.
  15. Al-Mayouf, Y.R.B.; Mahdi, O.A.; Taha, N.A.; Abdullah, N.F.; Khan, S.; Alam, M. Accident management system based on vehicular network for an intelligent transportation system in urban environments. J. Adv. Transp. 2018, 2018, 6168981.
  16. Cheng, Y.; Zhou, T.; Liang, P. The Improved Precoding Method in the VLC-Based Intelligent Transportation System. J. Adv. Transp. 2022, 2022, 5951389.
  17. Lomakina, L.S.; Rodionov, V.; Surkova, A.S. Hierarchical clustering of text documents. Autom. Remote Control 2014, 75, 1309–1315.
  18. Xiong, C.; Hua, Z.; Lv, K.; Li, X. An Improved K-means text clustering algorithm By Optimizing initial cluster centers. In Proceedings of the 2016 7th International Conference on Cloud Computing and Big Data (CCBD), Macau, China, 16–18 November 2016; pp. 265–268.
  19. Goodman, L.A. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 1974, 61, 215–231.
  20. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
  21. Blei, D.M.; Lafferty, J.D. A Correlated Topic Model of Science. Ann. Appl. Stat. 2007, 1, 17–35.
  22. Cao, J.; Xia, T.; Li, J.; Zhang, Y.; Tang, S. A Density-Based Method for Adaptive LDA Model Selection. Neurocomputing 2009, 72, 1775–1781.
  23. Hassanpour, S.; Langlotz, C.P. Unsupervised Topic Modeling in a Large Free Text Radiology Report Repository. J. Digit. Imaging 2016, 29, 59–62.
  24. McLaurin, E.; McDonald, A.D.; Lee, J.D.; Aksan, N.; Dawson, J.; Tippin, J.; Rizzo, M. Variations on a theme: Topic modeling of naturalistic driving data. Proc. Hum. Factors. Ergon. Soc. Annu. Meet. 2014, 58, 2107–2111.
  25. Guo, Y.; Barnes, S.J.; Jia, Q. Mining Meaning from Online Ratings and Reviews: Tourist Satisfaction Analysis Using Latent Dirichlet Allocation. Tour. Manag. 2017, 59, 467–483.
  26. Tirunillai, S.; Tellis, G.J. Mining marketing meaning from online chatter: Strategic brand analysis of big data using Latent Dirichlet Allocation. J. Mark. Res. 2014, 51, 463–479.
  27. Tseng, Y.H.; Lin, C.J.; Lin, Y.I. Text Mining Techniques for Patent Analysis. Inf. Process. Manag. 2007, 43, 1216–1247.
  28. Islam, M.R.; Zibran, M.F. SentiStrength-SE: Exploiting Domain Specificity for Improved Sentiment Analysis in Software Engineering Text. J. Syst. Softw. 2018, 145, 125–146.
  29. Peng, K.H.; Liou, L.H.; Chang, C.S.; Lee, D.S. Predicting Personality Traits of Chinese Users Based on Facebook Wall Posts. In Proceedings of the 2015 24th Wireless & Optical Communication Conference, Taipei, Taiwan, 23–24 October 2015; pp. 9–14.
  30. Lin, B.-S.; Wang, C.-M.; Yu, C.-N. The establishment of human-computer interaction based on Word2Vec. In Proceedings of the 2017 IEEE International Conference on Mechatronics and Automation (ICMA), Takamatsu, Japan, 6–9 August 2017; pp. 1698–1703.
  31. Liu, Q.; Zheng, Z.; Zheng, J.; Chen, Q.; Liu, G.; Chen, S.; Chu, B.; Zhu, H.; Akinwunmi, B.; Huang, J.; et al. Health Communication through News Media during the Early Stage of the COVID-19 Outbreak in China: Digital Topic Modeling Approach. J. Med. Internet Res. 2020, 22, e19118.
  32. Wang, Z.; Liao, J.; Cao, Q.; Qi, H.; Wang, Z. Friendbook: A Semantic-Based Friend Recommendation System for Social Networks. IEEE Trans. Mob. Comput. 2015, 14, 538–551.
  33. Sun, L.; Yin, Y. Discovering Themes and Trends in Transportation Research Using Topic Modeling. Transp. Res. Part C Emerg. Technol. 2017, 77, 49–66.
  34. Hwang, S.; Cho, E. Exploring Latent Topics and Research Trends in Mathematics Teachers’ Knowledge Using Topic Modeling: A Systematic Review. Mathematics 2021, 9, 2956.
  35. Escobar, K.M.; Vicente-Villardon, J.L.; de la Hoz-M, J.; Useche-Castro, L.M.; Alarcón Cano, D.F.; Siteneski, A. Frequency of Neuroendocrine Tumor Studies: Using Latent Dirichlet Allocation and Hj-Biplot Statistical Methods. Mathematics 2021, 9, 2281.
  36. Macqueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Los Angeles, CA, USA, 1967; Volume 1, pp. 281–297.
  37. Park, H.S.; Jun, C.H. A Simple and Fast Algorithm for K-Medoids Clustering. Expert Syst. Appl. 2009, 36, 3336–3341.
  38. Zhao, X.; Yu, Q.; Ma, J.; Wu, Y.; Yu, M.; Ye, Y. Development of a Representative EV Urban Driving Cycle Based on a K-Means and SVM Hybrid Clustering Algorithm. J. Adv. Transp. 2018, 2018, 22–25.
  39. Giraud, C. Introduction to High-Dimensional Statistics; Chapman and Hall/CRC: London, UK, 2021.
  40. Van der Maaten, L.; Hinton, G. Visualizing Data Using T-SNE. J. Mach. Learn. Res. 2014, 219, 187–202.
  41. Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
  42. Jain, A.K. Data Clustering: 50 Years beyond K-Means. Pattern Recognit. Lett. 2010, 31, 651–666.
  43. Mohammadnazar, A.; Arvin, R.; Khattak, A.J. Classifying Travelers’ Driving Style Using Basic Safety Messages Generated by Connected Vehicles: Application of Unsupervised Machine Learning. Transp. Res. Part C Emerg. Technol. 2021, 122, 102917.
  44. Huang, Y.; Yang, D.; Wang, L.; Jieren, G.; Xiaoyong, Z.; Wang, K. Classification of Weld Seam Width Based on Detrended Fluctuation Analysis, t-Distributed Stochastic Neighbor Embedding, and Support Vector Machine. J. Mater. Eng. Perform. 2022, 31, 3975–3984.
Figure 1. The relationship among ATS elements in a trip.
Figure 2. Flow chart of the study.
Figure 3. Flow diagram of data collection and process.
Figure 4. Schematic diagram of the LDA model.
Figure 5. Word cloud of function text (a) and topic keywords (b).
Figure 6. (a) Silhouette coefficient variation of different clustering methods; (b) silhouette coefficient variation of the k-means method after dimensionality reduction, where different colors represent different scenarios.
Figure 7. Visualization of the 12 clusters in two-dimensional space, where different colors represent different categories.
Figure 8. The relationship among function domains.
Table 1. Examples of function text data.

Service Name | Function Name | Provider | Process Information | Service Object | Approach | Logic | Technology
automatic parking | 1. monitor real-time surroundings | vehicle | ambient information | vehicle | data collection | perception | sensor
automatic parking | 2. store, import, and analyze surrounding information | vehicle | ambient information | vehicle | data analysis | learning | computing
automatic parking | 3. generate schemes | vehicle | device operation information | vehicle | data analysis | decision | computing
automatic parking | 4. control parking device | vehicle | device operation information | driver | device control | response | smart car
Table 2. Topic keyword distribution of each cluster (word clouds omitted).

Topic 1 | Tool, Data, Technology
Topic 2 | Analysis, Product, Service
Topic 3 | Management, Infrastructure
Topic 4 | Management, System
Topic 5 | Environment, Center
Topic 6 | Information, Management
Topic 7 | Transportation, Service
Topic 8 | Analysis, Management, Event
Topic 9 | Tool, Analysis, Operation
Topic 10 | Vehicle
Topic 11 | Information, Service
Topic 12 | Data, Management, Analysis
Table 3. Description of function domains.

Topic No. | Function Domain | Description
1 | Vehicle Data Collection | This domain mostly provides perception-related functions to collect the data of vehicles. Functions provided by vehicle equipment and roadside equipment capture data on the ego vehicle, driver, and surrounding vehicles. This domain's function technology is strongly tied to sensor technology. The functions are mostly used to support services within the Vehicle Operation service domain. This function domain delivers data to other domains, such as vehicle operation, vehicle assistance and safety, and public transport information management.
2 | Public Transport Information Management | This domain serves analysis services for public information to various service domains, such as travel information, traffic management, and freight transport. Technology, such as big data analysis and network computing, supports the functions. This domain provides an analysis of public transport data to other function domains such as traffic information collection, commercial traffic management, and traffic data management.
3 | Transportation Infrastructure Management | This domain provides functions to serve infrastructure including management, analysis, and maintenance. Targeting infrastructure, functions primarily process data relating to big data analysis, computation, and control. This domain delivers data for transportation management and vehicle operations in coordination with data management and collaboration.
4 | Traffic Information Collection | This domain covers functions linked to perception for data collection in ATS, excluding vehicle functions. Using sensing and analyzing technology, the function provides various data to support services related to traffic information. This domain is crucial for other domains as it complements vehicle data collection.
5 | Environmental Information Management | This domain provides functions that manage information about the environment related to traffic. Functions collect and analyze environmental data, and output information to other functions, supporting services together. This domain delivers environment data to the function domains vehicle operation, emergency response, and traffic management.
6 | Transportation Management | This domain provides functions to manage traffic comprehensively, such as adjusting traffic signals and leading traffic flow. Using data from the entire system, functions in this domain analyze, decide and respond directly to influence other functions. Functions of management are fundamental to other domains such as public transport information management and freight transport.
7 | Freight Transport | This domain provides functions to transport freight including the whole logic, such as locating and monitoring freight and planning routes. The functions not only require freight information but also need to cooperate with functions in other domains. This domain necessitates information from environmental information management, transportation management, and domains about vehicles.
8 | Emergency Response | This domain provides functions that enable the system to respond appropriately to emergencies. Functions in this domain could recognize, evaluate events, and respond by specific plans, which requires collaboration with other domains. This domain usually interacts with environmental information management, transportation management, and vehicle operation.
9 | Vehicle Operation | This domain mainly includes functions that facilitate the vehicle's safe operation. The functions within the domain are provided by vehicles' onboard equipment. The functions within the domain are extremely pertinent to vehicle operation service domains. This domain is closely related to vehicle data collection, vehicle assistance, and safety. Supporting freight transport, the functions in the domain require data provided by the vehicle data collection, transport management, and other function domains.
10 | Vehicle Assistance and Safety | This domain encompasses the functions associated with electronic payment, performance testing, maintenance, and rescue of the vehicle, which is distinctly different from the vehicle data collection and operation. Functions process vehicle information by collecting, transmitting, analyzing, and outputting data. The functions operate across a variety of service domains, such as freight transport, traffic management, and control, and traffic safety management. This domain is a crucial assurance for the function domain vehicle operation.
11 | Commercial Traffic Management | This domain provides functions that manage commercial traffic, such as taxis. Functions gather traffic data, monitor commercial vehicles, and generate plans. This domain has a particular focus and requires collaboration with traffic management and the domains of vehicles.
12 | Data Management and Collaboration | This domain provides functions that manage data in ATS, which is the core of all other domains. Functions collect and analyze data from other basic domains. When certain functions require data, this domain can provide exact data.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, X.; Cen, X.; Cai, M.; Zhou, R. A Framework to Analyze Function Domains of Autonomous Transportation Systems Based on Text Analysis. Mathematics 2023, 11, 158. https://0-doi-org.brum.beds.ac.uk/10.3390/math11010158

