Multi-Scale Residual Depthwise Separable Convolution for Metro Passenger Flow Prediction

Li, Taoying; Liu, Lu; Li, Meng

doi:10.3390/app132011272

by Taoying Li *, Lu Liu and Meng Li

School of Maritime Economics and Management, Dalian Maritime University, Dalian 116026, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(20), 11272; https://0-doi-org.brum.beds.ac.uk/10.3390/app132011272

Submission received: 18 September 2023 / Revised: 9 October 2023 / Accepted: 12 October 2023 / Published: 13 October 2023

(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)

Abstract

Accurate prediction of metro passenger flow helps operating departments optimize scheduling plans, alleviate passenger flow pressure, and improve service quality. However, existing passenger flow prediction models tend to only consider the historical passenger flow of a single station while ignoring the spatial relationships between different stations and correlations between passenger flows, resulting in low prediction accuracy. Therefore, a multi-scale residual depthwise separable convolution network (MRDSCNN) is proposed for metro passenger flow prediction, which consists of three pivotal components, including residual depthwise separable convolution (RDSC), multi-scale depthwise separable convolution (MDSC), and attention bidirectional gated recurrent unit (AttBiGRU). The RDSC module is designed to capture local spatial and temporal correlations leveraging the diverse temporal patterns of passenger flows, and then the MDSC module is specialized in obtaining the inter-station correlations between the target station and other heterogeneous stations throughout the metro network. Subsequently, these correlations are fed into AttBiGRU to extract global interaction features and obtain passenger flow prediction results. Finally, the Hangzhou metro passenger inflow and outflow data are employed to assess the model performance, and the results show that the proposed model outperforms other models.

Keywords: metro passenger flow prediction; spatiotemporal dependencies; graph convolutional network; residual network

1. Introduction

Along with the rapid development of urban transportation, the metro has become an important choice for urban residents due to its convenience and efficiency. The expansion of the metro network and the continuous growth in passenger numbers have made metro operation more difficult, which in turn results in station congestion and prolonged commuting times. In this context, precise prediction of passenger flow has emerged as a pivotal solution to alleviate the operational pressure posed by the ever-increasing demand [1], thereby ensuring a secure and comfortable travel experience for passengers [2]. In addition, accurate passenger flow prediction can also be used to respond to emergency situations or traffic interruptions [3], reduce train delays [4], improve overall punctuality, and optimize system operations [5,6], thereby improving the overall efficiency and service quality of the transportation system. Table 1 lists part of the relevant literature on passenger flow prediction and its role in operation. Nevertheless, existing passenger flow prediction methods struggle to handle complex nonlinear interactions and integrate multi-source data, which leads to poor prediction results. Therefore, it is particularly important to improve the prediction accuracy of the models.

In response to the above issues, a considerable amount of research has been conducted on passenger flow prediction methods. The earliest passenger flow prediction methods were statistical methods, followed by machine learning methods. Statistical methods primarily rely on extracting time series patterns from historical data [7,8]. These methods struggle to meet the requirements for real-time performance and prediction accuracy. Compared with statistical methods, machine learning approaches have higher prediction precision and are better suited to multi-source data integration problems. Moreover, some models, such as the back-propagation neural network (BPNN) [9], gradient boosting decision tree (GBDT) [10], and multilayer perceptron (MLP) [11], can model deep nonlinear relationships. In recent years, deep learning methods have attracted considerable attention as a branch of machine learning, since they not only automatically extract intricate temporal and spatial features but also effectively model high-dimensional data. In the early stage of deep learning, many models based on recurrent neural networks (RNNs) emerged and were widely employed in passenger flow prediction tasks for their ability to handle temporal features [12,13,14]. As one of the variants of the RNN, the gated recurrent unit (GRU) distinguishes itself through its gating mechanism [15,16]. However, despite the progress made in contemporary passenger flow prediction models, a fundamental limitation of these models is their inability to capture spatial features. Thanks to its ability to extract spatial connections, the convolutional neural network (CNN) is widely applied to spatial data with regular grid structures, i.e., Euclidean data [17]. As a supplement to the CNN, the graph convolutional network (GCN) [18] is valuable in capturing network-based spatial dependencies from graph-structured data, and it aggregates the edge features of adjacent nodes to promote global modeling [19,20,21].

Spatial dependencies are important for metro passenger flow prediction since they capture the flow patterns between stations [22]. Considering this, we need to capture not only the local dependencies within each station but also the global interactions between stations throughout the entire metro network. Some studies therefore focus on multi-station passenger flow prediction, which individually predicts passenger flow for each cluster comprising multiple stations. For instance, Dong et al. [23] employed K-Means to classify all stations into different categories and created a prediction model for each category in the network. Liu et al. [24] proposed a novel two-step model that predicted the passenger flow for each type of classification result. Unlike stand-alone prediction strategies that only consider interdependencies within a cluster, network-level prediction focuses on interactions across clusters and the flow dynamics of the entire network. In addition to temporal and spatial correlations among stations, the stations are also affected by inter-station correlations measured by geographical connectivity and passenger travel behavior patterns [25]. The inter-station correlations not only exist in local areas but can also occur between geographically distant stations. Nevertheless, some recent models only consider the physical topology of metro networks and neglect the diversity of inter-station dependencies. For example, Zhang et al. [26] constructed a geographic topology graph based on the connection relationships in the metro network. Overall, these methods are limited to learning local spatial dependencies between adjacent stations and cannot fully capture the spatial dependencies of long-term passenger flow trends. This leaves substantial room to improve the accuracy and robustness of passenger flow prediction.

To address these limitations and improve the performance of passenger flow prediction, we propose a deep learning framework for metro passenger flow prediction, named MRDSCNN (Multi-scale Residual Depthwise Separable Convolutional Neural Network). Based on the travel behavior and operational principles of the metro, we extract the historical passenger flow from the smart card data as input. The proposed approach leverages the residual network to capture spatiotemporal dependency details from multiple temporal patterns of passenger flows and utilizes multi-scale convolution to extract inter-station correlations from two graphs. Subsequently, the spatiotemporal dependency features and inter-station correlation features are fused and then fed into the AttBiGRU to integrate the comprehensive global information. The passenger inflow and outflow prediction results are obtained through a fully connected layer. The main contributions of this study can be summarized as follows:

(1): We employ the RDSC module to capture spatiotemporal dependencies between stations from various temporal patterns (real-time, daily, and weekly), which is a residual network structure with a channel attention mechanism.
(2): We model inter-station interactions through a network structure correlation graph and passenger flow similarity graph and utilize the MDSC module to enhance multi-scale spatial correlations on these graphs.
(3): Experimental results based on real data of metro passenger inflow and outflow prove that our approach outperforms other baseline models in terms of prediction performance.

The remainder of this study is organized as follows. In Section 2, we define the problem and describe the architecture of the proposed model, providing a detailed overview of the different components of the framework. In Section 3, we elaborate on the experimental details and discuss the results of the case study. The conclusion is given in Section 4.

2. Methodology

2.1. Overall Framework

This study aims to predict the inflow and outflow of passengers based on historical metro passenger flow data and predefined graph structure data. Assuming there are $N$ metro stations in the network, the passenger flow at time interval $t$ is denoted as $X_{t} \in \mathbb{R}^{N \times 1}$. The previous $\tau$ time step historical passenger flow data across the entire metro system are represented as a signal $X_{T} \in \mathbb{R}^{N \times \tau} = (X_{t - \tau}, X_{t - \tau + 1}, X_{t - \tau + 2}, \dots, X_{t})$, where $T = \{t - \tau, t - \tau + 1, t - \tau + 2, \dots, t\}$ is a series of time intervals, whereby $\tau$, $\tau + 1$, and $\tau + 2$ represent different time offset values from the current time interval $t$ to the past. Aiming to efficiently capture the trends of passenger flow across various time periods (real-time, daily, and weekly), our model contains three different types of temporal patterns $X_{T}^{p}$ $(p = r, d, w)$. The passenger flow data of $N$ metro stations in $T$ time intervals can be defined as $X = (X_{T}^{r}, X_{T}^{d}, X_{T}^{w})$, whereby $X_{T}^{r}$, $X_{T}^{d}$, and $X_{T}^{w}$ represent the passenger inflow or outflow in the three temporal patterns mentioned above, respectively.

The inter-station correlation not only represents the connection between adjacent stations but also represents the correlation between station passenger flow trends in the metro network. Station interaction graphs are constructed to capture inter-station correlations based on the network structure correlation graph $G_{c} = (V, E_{c}, A_{c})$ and the passenger flow similarity graph $G_{s} = (V, E_{s}, A_{s})$, where $V = (v_{1}, v_{2}, \dots, v_{n})$ is the set of stations, and $E_{c}$ and $E_{s}$ are the sets of edges that depict connections between stations in the two graphs. $A = \{(i, j, a(i, j)) \mid i, j \in V, i \neq j\}$ denotes the adjacency matrix, indicating the weight of the edge from station $v_{i}$ to station $v_{j}$. $A_{c}$ is the adjacency matrix of $G_{c}$, and the adjacency matrix of $G_{s}$ is represented by $A_{s} \in \mathbb{R}^{N \times N}$, which is the normalized result of the passenger flow similarity matrix. The main purpose of this study is to perform network-level prediction of passenger inflow and outflow from the data matrices $X$ and the station interaction graphs $G_{c}$ and $G_{s}$.

The framework of the MRDSCNN model is depicted in Figure 1 and comprises the RDSC module, the MDSC module, and the AttBiGRU module. The RDSC module uses a residual structure with skip connections and stacks residual blocks for each of the three temporal patterns. Each residual block incorporates a channel attention mechanism, which focuses on the important temporal dynamics of metro passenger flow. Meanwhile, the MDSC module utilizes multi-scale convolution to extract complex inter-station correlations from the network structure correlation graph and the passenger flow similarity graph. This strategy enables the model to capture features at different scales and enhances fine-grained feature modeling. Then, the outputs of the above modules are integrated through the fusion layer and fed into AttBiGRU to learn the global evolutionary features of all stations. Notably, the adoption of depthwise separable convolution (DSC) reduces parameter overhead while maintaining predictive performance.

2.2. Construction of Relationship Graphs between Metro Stations

In this section, we explore the relationship between metro stations from two perspectives. One is the track connectivity of adjacent stations, and the other is the correlation of passenger flow patterns between different stations. Considering the station-to-station relevant information from the network map and historical passenger flow, network structure correlation graph and passenger flow similarity graph are constructed to fully explore the complex inter-station correlations.

2.2.1. Network Structure Correlation Graph

When two stations are situated in close proximity, their passenger flow interactions tend to exhibit stronger correlations. That means the topological association between adjacent stations is a critical spatial relationship within the metro network. This spatial relationship exerts a notable influence on the trajectory and speed of passenger traffic, thereby shaping the dynamic behavior of the metro system. In order to effectively capture the relationship among stations, we establish a network structure correlation graph, denoted as $G_{c}$, based on the real-world network map. The entry $a_{c}(i, j)$ of the adjacency matrix $A_{c}$ is set to 1 if the two stations $i$ and $j$ are adjacent, and 0 otherwise, as given in Equation (1). Additionally, the diagonal entries of the matrix are set to 0 to avoid self-loops, thus reducing redundant information and computational burden.

$$a_{c}(i, j) = \begin{cases} 1, & \text{when stations } v_{i} \text{ and } v_{j} \text{ are adjacent}, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$
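As an illustration of Equation (1), the following minimal sketch builds a 0/1 adjacency matrix from a list of adjacent station pairs. The function name `build_adjacency`, the integer station indices, and the four-station toy line are assumptions made for this example and are not taken from the Hangzhou network.

```python
def build_adjacency(num_stations, adjacent_pairs):
    """Return the 0/1 adjacency matrix A_c of Equation (1) as a list of lists."""
    a_c = [[0] * num_stations for _ in range(num_stations)]
    for i, j in adjacent_pairs:
        if i == j:
            continue  # diagonal entries stay 0, i.e., no self-loops
        a_c[i][j] = 1
        a_c[j][i] = 1  # assumption: track connections are bidirectional
    return a_c


# Hypothetical four-station line: 0 - 1 - 2 - 3
A_c = build_adjacency(4, [(0, 1), (1, 2), (2, 3)])
for row in A_c:
    print(row)
```

Treating the matrix as symmetric is a modeling choice of this sketch; the text only states that adjacent stations receive the value 1.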

2.2.2. Passenger Flow Similarity Graph

Although the network structure correlation graph can depict the spatial position of stations, it falls short of capturing the intricate interaction of long-term trends in passenger flow. At the same time, a single graph is limited in fully describing inter-station interactions. Thus, we introduce a passenger flow similarity graph $G_{s}$ to provide rich insights and an adequate understanding of the high-order spatiotemporal relationships among stations. Unlike the network structure correlation graph, which is based on the network map, the passenger flow similarity graph uses historical passenger flow to model the inter-station correlations through the dynamic time warping (DTW) algorithm. The DTW algorithm captures the matching relationships between passenger flows at different time points, which can identify stations with similar travel patterns in passenger flow prediction [27]. Based on this approach, we align and flexibly extend the passenger flow data to determine optimal matching paths and then calculate the distances between data points at the corresponding locations on these paths.

Suppose there are two stations, $v_{i}$ and $v_{j}$, which have the passenger flow series $X_{i} = (x_{i,1}, x_{i,2}, \dots, x_{i,m})$ and $X_{j} = (x_{j,1}, x_{j,2}, \dots, x_{j,n})$, respectively. We generate a distance matrix of dimensions $(m \times n)$, where $m$ and $n$ represent the lengths of the two time series being compared, as shown in Figure 2. The set $L = \{(l_{1}^{i}, l_{1}^{j}), (l_{2}^{i}, l_{2}^{j}), \dots, (l_{k}^{i}, l_{k}^{j})\}$ represents the indices of the matrix elements along a warping path, and $\mathit{dis}_{i,j}(m, n)$ denotes the value of each element. The unit of each element in the DTW distance matrix is persons.

The path consisting of gray grids in Figure 2 is the optimal warping path, which depicts how the data points of $X_{i}$ are matched with $X_{j}$ to achieve the best alignment. Overall differences resulting from alignment are measured by the cumulative warp distance $D_{i,j}(m, n)$, which is defined as the distance between the first $m$ data points of $X_{i}$ and the first $n$ data points of $X_{j}$. In other words, it is the shortest path length from the lower left corner $(1, 1)$ to any point $(m, n)$ in the matrix of Figure 2. Moreover, the computation of the shortest distance adheres strictly to the boundary, monotonicity, and continuity constraints, ensuring the reliability and accuracy of the results, as expressed in Equation (2).

$$\begin{cases} l_{1}^{i} = l_{1}^{j} = 1, \quad l_{k}^{i} = m, \quad l_{k}^{j} = n, \\ l_{k}^{i} \leq l_{k+1}^{i}, \quad l_{k}^{j} \leq l_{k+1}^{j}, \\ l_{k+1}^{i} - l_{k}^{i} \leq 1, \quad l_{k+1}^{j} - l_{k}^{j} \leq 1 \end{cases} \tag{2}$$

where $l_{1}^{i}$ represents the index of the first point $x_{i,1}$ in $X_{i}$, and $l_{1}^{j}$ represents the index of the first point $x_{j,1}$ in $X_{j}$.

During the path movement of the DTW algorithm, three potential directions are available: horizontal, vertical, and diagonal. For instance, a path that reaches the point $(m, n)$ must arrive from one of three neighboring points: $(m - 1, n)$, $(m, n - 1)$, or $(m - 1, n - 1)$. Consequently, the warped distance resulting from the alignment of the path can be computed recursively, as shown in Equation (3).

$$D_{i,j}(m, n) = \mathit{dis}_{i,j}(m, n) + \min\{D_{i,j}(m - 1, n), D_{i,j}(m, n - 1), D_{i,j}(m - 1, n - 1)\} \tag{3}$$

where $\mathit{dis}_{i,j}(m, n)$ represents the calculated value of point $(m, n)$ in the matrix, and the Euclidean distance is used to measure the distance between the points of the two time series.

The smaller the warped distance, the more similar the passenger flow patterns between two stations, as shown in Equation (4).

$$D(X_{i}, X_{j}) = \min\{D_{i,j}(m, n)\} \tag{4}$$
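The recursion in Equations (3) and (4) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes scalar passenger counts, for which the Euclidean point distance reduces to an absolute difference, and the function name `dtw_distance` is chosen for this example.

```python
def dtw_distance(x, y):
    """Cumulative warp distance between two sequences via dynamic programming."""
    m, n = len(x), len(y)
    inf = float("inf")
    # D[a][b] holds the cost of the best path aligning x[:a] with y[:b];
    # the extra row and column of infinities enforce the boundary condition.
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for a in range(1, m + 1):
        for b in range(1, n + 1):
            dis = abs(x[a - 1] - y[b - 1])  # pointwise distance for scalars
            # Arrive from the vertical, horizontal, or diagonal neighbor.
            D[a][b] = dis + min(D[a - 1][b], D[a][b - 1], D[a - 1][b - 1])
    return D[m][n]


# Two toy flow series of different lengths with the same shape
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```

Because a repeated value can be absorbed by a horizontal or vertical step, the first pair aligns at zero cost even though the series differ in length, which is the property that makes DTW suitable for comparing flow patterns that are shifted or stretched in time.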

Subsequently, we take $1 / D(X_{i}, X_{j})$ as the value of the entry $a_{s}(i, j)$ of the passenger flow similarity matrix $A_{s}$ [28], as shown in Equation (5). Considering the different impacts of time granularities and directional distinctions inherent in historical passenger flow data, a more detailed division of the passenger flow similarity matrix is required in the construction process (presented in Section 3.4.1). In this way, these graphs can help to identify stations that share characteristics (e.g., similar passenger flow patterns, functions, and network structures), thereby responding to different passenger flow changes and complex network dynamics.

$$A_{s} = \begin{bmatrix} 1 / D(X_{1}, X_{1}) & 1 / D(X_{1}, X_{2}) & 1 / D(X_{1}, X_{3}) & \dots & 1 / D(X_{1}, X_{n}) \\ 1 / D(X_{2}, X_{1}) & 1 / D(X_{2}, X_{2}) & 1 / D(X_{2}, X_{3}) & \dots & 1 / D(X_{2}, X_{n}) \\ 1 / D(X_{3}, X_{1}) & 1 / D(X_{3}, X_{2}) & 1 / D(X_{3}, X_{3}) & \dots & 1 / D(X_{3}, X_{n}) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 / D(X_{m}, X_{1}) & 1 / D(X_{m}, X_{2}) & 1 / D(X_{m}, X_{3}) & \dots & 1 / D(X_{m}, X_{n}) \end{bmatrix} \tag{5}$$
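A small sketch may help to show how a matrix of pairwise DTW distances could be turned into similarity weights as in Equation (5). Two points are assumptions of this example rather than statements from the text: the DTW distance of a series to itself is zero, so the diagonal (and any other zero-distance pair) is set to 0 here instead of dividing by zero, and the "normalized result" is interpreted as row normalization. The function name `similarity_matrix` and the toy distances are hypothetical.

```python
def similarity_matrix(dist):
    """Convert pairwise DTW distances into row-normalized similarity weights."""
    n = len(dist)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumption: skip the diagonal and zero distances (1/0 is undefined).
            if i != j and dist[i][j] > 0:
                sim[i][j] = 1.0 / dist[i][j]
    # Assumption: normalize each row so that its weights sum to 1.
    for row in sim:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return sim


# Hypothetical DTW distances between three stations
dist = [[0.0, 2.0, 4.0],
        [2.0, 0.0, 4.0],
        [4.0, 4.0, 0.0]]
A_s = similarity_matrix(dist)
for row in A_s:
    print([round(v, 3) for v in row])
```

Other normalizations (for example, dividing by the global maximum) would fit the description equally well; the choice mainly affects how the weights are scaled before graph convolution.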

2.3. Residual Depthwise Separable Convolution Module

A traditional CNN may suffer from performance degradation and overfitting as the network depth increases, especially when training data are limited. The residual network (ResNet) emerged as a solution to improve performance and optimize deeper models [29]. In this study, we employ an improved residual module, RDSC, that leverages the synergy between depthwise separable convolution and an attention mechanism. As depicted in Figure 3, the input passenger flow passes through a series of operations to obtain the output features: DSConv represents a depthwise separable convolution, BN indicates a batch normalization layer, ReLU denotes the activation function, and the channel attention mechanism includes squeeze and excitation operations. The main effect of the attention mechanism is to make the model adaptively learn the importance of each channel, thus reducing the focus on non-essential feature channels. Through the residual (skip) connections, information can flow directly from one layer to another, thus alleviating the vanishing gradient problem and facilitating the training of deep networks. Figure 1 illustrates two RDSC modules being used to extract inflow or outflow features. Subsequently, the extracted features are flattened and fed into the fully connected layer.

In this module, the structure of traditional residual networks is improved to simultaneously reduce the number of model parameters and increase the prediction accuracy. Specifically, we introduce a depthwise separable convolution to divide the convolution process into two distinct stages, depthwise convolution and pointwise convolution, as depicted in Figure 4. The numbers of input channels and output channels are $C_{in}$ and $C_{out}$, the input height and width are $H_{in}$ and $W_{in}$, the output height and width are $H_{out}$ and $W_{out}$, the size of the convolutional kernel is $K \times K$, and its spatial dimension is $D_{k}$. First, depthwise convolution applies one convolution kernel to each input channel and obtains $C_{in}$ feature maps of size $H_{out} \times W_{out} \times 1$. Subsequently, pointwise convolution takes the output feature maps obtained from depthwise convolution as input and applies $C_{out}$ convolution kernels of size $1 \times 1 \times C_{in}$ across the channels, thus obtaining a new feature map of size $H_{out} \times W_{out} \times C_{out}$. This improvement significantly reduces the number of parameters and the computational complexity while retaining the capacity of the model for intricate feature representation.
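The parameter saving can be checked with simple arithmetic. The sketch below counts weights only (biases are ignored) for a standard convolution and for the depthwise plus pointwise pair described above; the channel numbers are hypothetical and are not the settings used in the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard K x K convolution: one K x K x C_in kernel per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a depthwise (K x K per input channel) plus pointwise (1 x 1 x C_in) pair."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64  # hypothetical layer sizes
std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)
print(std, dsc, round(dsc / std, 4))
```

The ratio equals $1 / C_{out} + 1 / K^{2}$, so for a $3 \times 3$ kernel the separable form needs roughly one eighth to one ninth of the weights once the number of output channels is large.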

We also introduce the concept of dilation rates within the framework of depthwise separable convolution. The dilation rate permits convolution kernels to skip over selected pixels or feature points while performing convolution, thus extending the receptive field. Standard convolution is the special case in which the dilation rate equals 1, as shown in Figure 5a, which demonstrates the standard convolutional receptive field for a filter size of $3 \times 3$. For the same filter size of $3 \times 3$, the receptive fields obtained with different dilation rates can be observed in Figure 5b,c. The receptive field that extends beyond the range of the standard convolution contributes to a comprehensive understanding of the data distribution and improves the ability to discern complex spatiotemporal relationships.
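For a single dilated convolution layer, the side length of the receptive field follows the well-known relation $K + (K - 1)(d - 1)$ for kernel size $K$ and dilation rate $d$. The short sketch below evaluates it for a $3 \times 3$ filter; the dilation rates 2 and 3 are chosen for illustration and are not necessarily those shown in Figure 5b,c.

```python
def effective_kernel_size(k, dilation):
    """Side length covered by a K x K kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


for d in (1, 2, 3):
    size = effective_kernel_size(3, d)
    print(f"dilation={d}: receptive field {size} x {size}")
```

The number of weights stays at $K \times K$ in every case, so the larger receptive field comes without additional parameters.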

2.4. Multi-Scale Depthwise Separable Convolution Module

Based on the two predefined graphs, we construct the MDSC module to achieve a balance between feature propagation and extraction, which complements the spatial fine-grained information. Since GCN can capture spatial relationships from graphical data (presented in Section 2.2), two parallel GCNs are used to extract inter-station correlation of graphs. The GCN model can obtain the topological relationship between the center node and its neighboring nodes to obtain the spatial features reflected by the network, which mainly includes self-looping, neighbor aggregation, and normalization, as illustrated in Figure 6.

Taking the central node in Figure 6 as an example, a node’s neighbors are the nodes directly connected to it in the graph. Self-looping means that central node can consider its own features when aggregating information from its neighbors during convolution operations. Subsequently, GCN aggregates information from these neighboring nodes to update the features of the central node. Finally, GCN normalizes the adjacency matrix to ensure the stability and controllability of information transmission. In this study, GCN revolves around the convolution operation in the spatial domain, i.e., spatial-based GCN. It performs convolution on the adjacency matrices of network structure correlation graph and passenger flow similarity graph. Each station accumulates and assigns weights to its own features and its adjacent stations’ features. The functionality of the GCN can be mathematically expressed as shown in Equation (6).

$$H^{(l + 1)} = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{6}$$

where $H^{(l)}$ and $H^{(l + 1)}$ represent the feature matrices of the $l$-th layer and the $(l + 1)$-th layer. $\hat{A} \in \mathbb{R}^{N \times N}$ represents the adjacency matrix with self-connections, that is, $\hat{A} = A + I$. $A \in \mathbb{R}^{N \times N}$ is the predefined adjacency matrix of the network structure correlation graph or the passenger flow similarity graph, and $I$ is the identity matrix. $\hat{D} \in \mathbb{R}^{N \times N}$ is the degree matrix of the graph. $W^{(l)}$ represents the weight matrix of the $l$-th layer, and $\sigma(\cdot)$ is the sigmoid activation function.
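Equation (6) translates almost directly into NumPy. The following is a minimal sketch of one propagation step under stated assumptions: the degree matrix is taken as the row sums of $\hat{A}$, the weights are random rather than trained, and the three-station graph and feature sizes are hypothetical.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One propagation step: sigmoid(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-connections
    deg = A_hat.sum(axis=1)                  # assumption: degrees are row sums of A_hat
    d_inv_sqrt = 1.0 / np.sqrt(deg)          # safe: self-loops make every degree >= 1
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    Z = A_norm @ H @ W
    return 1.0 / (1.0 + np.exp(-Z))          # sigmoid activation, as stated in the text


rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])              # hypothetical 3-station line
H = rng.normal(size=(3, 2))                  # 2 input features per station
W = rng.normal(size=(2, 4))                  # maps 2 features to 4
out = gcn_layer(A, H, W)
print(out.shape)
```

The same function can be applied to either $A_{c}$ or $A_{s}$, which is how the two parallel GCN branches described above would differ in this sketch.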

However, several studies have shown that increasing the number of GCN layers not only raises the computational cost of backpropagation but also leads to vanishing gradients. Therefore, we input the inter-station correlation features $H^{(l + 1)}$ obtained through the GCN into the multi-scale convolution to extract multiple spatial features. A single convolutional kernel cannot capture dependencies at various scales, while several convolutional kernels of different sizes can be viewed as feature extractors that capture different levels of features. Multi-scale convolution is able to extend the multidimensional relationships between input and output, as shown in Figure 7; it consists of four branches that combine single convolutional layers with different kernel sizes and a $3 \times 3$ maximum pooling layer. Each branch uses a $1 \times 1$ filter to compress the number of channels and improve the nonlinear fitting of the model.

2.5. AttBiGRU

In view of the various influences that may lead to changes in spatiotemporal correlations, we have embraced a weighted fusion methodology in our framework to amplify the important information from the input data and avoid limitations arising from over-reliance on a single feature, as shown in Equation (7).

$$\mathit{Out} = W_{1} * \mathit{Out}_{1} + W_{2} * \mathit{Out}_{2} + W_{3} * \mathit{Out}_{3} \tag{7}$$

where $\mathit{Out}$ represents the fused output of the multiple branches. $\mathit{Out}_{1}$, $\mathit{Out}_{2}$, and $\mathit{Out}_{3}$ represent the outputs of the RDSC modules, the MDSC module based on $G_{c}$, and the MDSC module based on $G_{s}$, respectively. $W_{1}$, $W_{2}$, and $W_{3}$ are the weight parameters of each branch, which are automatically updated through backpropagation during the training process. $*$ represents the Hadamard product.

The AttBiGRU takes $\mathit{Out}$ as input and provides bidirectional modeling and prediction capability for the proposed model by combining an attention mechanism with BiGRU. As a gated variant of the RNN, the gated recurrent unit (GRU) inherits the advantages of the RNN and has the potential to surpass it in various sequence-processing applications. Nevertheless, the unidirectional nature of GRU limits its capacity to access global information, which can cause losses or errors in information accumulation. The bidirectional gated recurrent unit (BiGRU) addresses this by processing input sequences in both the forward and reverse directions, enabling it to identify critical factors or patterns that might be missed when the data are processed in a single direction only. The attention mechanism of AttBiGRU can capture key bidirectional dependencies and context information, thus allowing the model to retain useful information over long horizons in prediction tasks, as depicted in Figure 8.

The fused features are passed through AttBiGRU to obtain $\mathit{Out}_{AttBiGRU}$. Then, via flattening and a fully connected layer, we obtain the final predicted passenger flow results, as shown in Equation (8).

$$\mathit{Prediction} = f(W_{fc} * \mathit{Out}_{AttBiGRU} + b_{fc}) \tag{8}$$

where $\mathit{Out}_{AttBiGRU}$ is the output of AttBiGRU, and $f$ denotes the fully connected network. $W_{fc}$ and $b_{fc}$ are the weights and biases of the fully connected layer, respectively.

3. Experiments

In this section, the passenger inflow and outflow data from Hangzhou Metro are employed to assess the performance of our proposed model.

3.1. Data Description

The passenger flow dataset is collected from the Hangzhou Metro system, which contains approximately 70 million pieces of data from 3 lines covering 80 stations, as illustrated in Figure 9. Due to different operating schedules at different stations, we uniformly analyzed passenger flow data for a total of 18 h from 5:30 to 23:30 every day. The time granularities used in this section are 10, 15, and 30 min. Taking the 10 min time granularity as an example, a station has 144 time slots per day, as shown in Table 2. The original dataset covers metro data from 1st to 25th January 2019, which includes time, line ID, station ID, device ID, status, user ID, and payment type. The status in the dataset represents the passenger’s entry and exit status, where 0 represents the exit, and 1 represents the entry. A total of 80% of the dataset is chosen as training data and validation data, and the remaining 20% is used as the testing data.

The passenger flow is preprocessed with the min-max normalization method, as defined in Equation (9).

$$x_{i}' = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}} \tag{9}$$

where $x_{i}'$ is the normalized value, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of passenger flow data, respectively.
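Equation (9) and its inverse, which is needed to convert predictions back to passenger counts, can be sketched as follows. The handling of a constant series and the function names are assumptions of this example; the text does not state how that edge case is treated.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return (x_min, x_max) for later inversion."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: map a constant series to zeros to avoid division by zero.
        return [0.0 for _ in values], (x_min, x_max)
    scaled = [(x - x_min) / (x_max - x_min) for x in values]
    return scaled, (x_min, x_max)


def denormalize(scaled, bounds):
    """Invert the min-max scaling using the stored bounds."""
    x_min, x_max = bounds
    return [s * (x_max - x_min) + x_min for s in scaled]


flows = [10, 20, 30]  # hypothetical passenger counts
scaled, bounds = min_max_normalize(flows)
print(scaled)
print(denormalize(scaled, bounds))
```

In practice the bounds would usually be computed on the training portion only and reused for validation and test data, so that no information from the test period leaks into preprocessing.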

By analyzing the correlation of passenger flow at different stations, the characteristics of the entire metro network can be revealed. These characteristics not only help to understand the interactions between different stations but also help determine if there are common patterns and trends in passenger flow. As shown in Figure 10, the spatial correlation between adjacent stations is significantly strong due to the proximity of their geographical locations, facilitating flow and interchange among passengers. Furthermore, it is crucial to recognize that some stations are not physically adjacent but might exhibit similarities owing to common characteristics. These commonalities may be caused by factors such as population distribution, station planning, and urban structure, resulting in similar trend and demand patterns between non-adjacent stations.

In order to further visualize the fluctuations in passenger flow, we have chosen three stations for analysis during the same time period, including Fengqi Road Station, Wulin Square Station, and Qingchun Square Station. Among these three stations, Fengqi Road Station and Wulin Square Station are adjacent, while Fengqi Road Station and Qingchun Square Station are not adjacent to each other. As illustrated in Figure 11, in spite of the differences in geographic locations, all of these three stations demonstrate similar passenger inflow and outflow patterns, which suggests that it is essential to consider the inter-station interactions.

3.2. Evaluation Metrics

All models are trained and all figures are generated on a desktop computer with an AMD Ryzen 7 3800X 8-core CPU and an NVIDIA GeForce RTX 2060 SUPER GPU, using the TensorFlow 2.3 framework. The Hangzhou Metro dataset mentioned above is divided into a training set, validation set, and test set according to the ratio of 7:1:2. For the different time granularities (10, 15, and 30 min), we employ 18, 12, and 6 time steps, respectively. The Mean Squared Error (MSE) is used as the loss function with a learning rate of 0.001, the dropout is set to 0.3, and the number of training epochs is set to 390. More details about the hyperparameters of our model are given in Table 3.

In this study, three evaluation indicators are adopted to assess the performance of prediction models: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Weighted Mean Absolute Percentage Error (WMAPE).

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|

(10)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \hat{y}_{i})}^{2}}

(11)

\mathrm{WMAPE} = \frac{\sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|}{\sum_{i = 1}^{n} y_{i}}

(12)

where y_{i} is the true value and \hat{y}_{i} is the predicted value of the i-th test sample, and n is the number of all predicted values.
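Equations (10)–(12) translate directly into code. The following is a minimal sketch using only the Python standard library; the function names and the toy values are illustrative, and the paper's own implementation is not shown in the text.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error, Equation (10)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error, Equation (11)."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

def wmape(y_true, y_pred):
    """Weighted Mean Absolute Percentage Error, Equation (12), as a fraction."""
    total = sum(y_true)
    if total == 0:
        raise ValueError("WMAPE is undefined when the true values sum to zero")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / total

# Toy example: absolute errors are 2, 2, 3, 3.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]

print(mae(y_true, y_pred))    # 2.5
print(wmape(y_true, y_pred))  # 0.1, i.e. 10%
```

Unlike a per-sample percentage error, WMAPE divides by the total true flow, so intervals with zero or near-zero passengers (common early and late in the day) do not blow up the metric.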

3.3. Baseline Models

In this study, we assess the performance of the proposed MRDSCNN model by comparing it with the following baseline models.

HA [7]: this is a simple baseline model that predicts future passenger flow based on historical average values.

CNN [17]: this applies convolutional operations to extract spatial and temporal features from the passenger flow time series. By stacking multiple convolutional layers, the CNN can capture complex and hierarchical representations of the input sequence.

GRU [16]: this has gating mechanisms that control the flow of information within the network, allowing it to selectively retain and update important information from past time steps.

BiGRU [30]: this is a model extended from GRU to deal with sequence data by introducing a bidirectional loop structure. It effectively captures bidirectional dependencies in sequence data by combining two GRU modules, forward and backward.

TCN [31]: this employs residual connections and causal convolutions to accelerate model convergence and improve modeling capabilities.

ResLSTM [32]: this is a multi-branch deep learning architecture that combines the residual module, GCN, and attention LSTM.

Conv-GCN [26]: this is a fusion of multi-graph GCN and 3D CNN, allowing it to effectively capture high-level spatiotemporal features among different patterns of passenger flow.
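Of the baselines listed above, HA is simple enough to sketch in a few lines. The version below assumes the forecast for each time slot is the mean of the same slot over previous days; the text only states that HA uses historical average values, so the exact averaging scheme and the function name `historical_average` are assumptions.

```python
def historical_average(history):
    """Forecast the next day as the per-slot mean over past days.

    history: list of days, each a list of flow counts per time slot.
    """
    if not history:
        raise ValueError("at least one day of history is required")
    n_slots = len(history[0])
    if any(len(day) != n_slots for day in history):
        raise ValueError("all days must have the same number of time slots")
    n_days = len(history)
    return [sum(day[s] for day in history) / n_days for s in range(n_slots)]

# Toy history: three days, four time slots each.
past_days = [
    [0, 5, 12, 30],
    [2, 7, 10, 34],
    [1, 6, 14, 26],
]
forecast = historical_average(past_days)
print(forecast)  # [1.0, 6.0, 12.0, 30.0]
```

Such a baseline uses neither recent observations nor other stations.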

3.4. Results and Discussion

In this section, we demonstrate the effectiveness of the network structure correlation graph and passenger flow similarity graph through ablation experiments. Then, the MRDSCNN model is compared with a series of baseline models with different time granularities. The experimental results indicate that the proposed model performs effectively in predicting both passenger inflow and outflow.

3.4.1. Analysis of Relationship Graphs between Metro Stations

We visualize the adjacency matrices as heatmaps to better understand relationships between stations in the graphs. Darker colors in the heatmap represent higher weights. For better comprehension, we use alphanumeric code to represent each station. Figure 12 illustrates the heatmap of the network structure correlation adjacency matrix. We can see that the matrix exhibits elevated weights for neighboring stations, indicating that the neighboring relationship fosters frequent passenger movement and generates distinct spatial dependencies.

Unlike a single network structure correlation adjacency matrix, there are six adjacency matrices for the passenger flow similarity graphs, which arise from variations in flow direction and time granularities. As illustrated in Figure 13, the passenger flows are divided into inbound and outbound directions and visualized with three time granularities. For the passenger flow relationships between different stations, we can see that the higher weighted stations have similar passenger flow trends. The darker color of each cell in the heatmaps indicates higher similarity, and the lighter color indicates lower similarity.

Figure 13a–c show the passenger inflow similarity adjacency matrices at time granularities of 10 min, 15 min, and 30 min, respectively. Some stations share similar passenger inflow patterns, probably because they serve similar demographics during specific time periods. Meanwhile, Figure 13d–f show the passenger outflow similarity adjacency matrices at the same time granularities. Examining where high- and low-intensity colors are distributed in the heatmaps helps identify hotspot passenger flow patterns. High-intensity areas usually correspond to specific passenger hotspots, such as busy commercial areas, important transfer hubs, or densely populated residential areas. In these areas, passengers exhibit consistent travel behavior, resulting in a more stable and densely packed passenger flow pattern. Conversely, low-intensity areas indicate weak spatial correlation or low similarity between stations. These stations are spatially dispersed and differ in the transportation options and character of their surrounding areas.

3.4.2. Effectiveness of Graph Construction

According to the analysis of relationship graphs between metro stations in the previous section, the network structure correlation graph and the passenger flow similarity graph are two important components in passenger flow prediction. In this section, we conduct ablation experiments to assess the significance of introducing these two components. To isolate the contribution of each graph, we compare the following four model variants.

MRDSCNN (No G): MRDSCNN without graphs G.

MRDSCNN (with G_{c}): MRDSCNN with passenger flow similarity graph G_{c}.

MRDSCNN (with G_{s}): MRDSCNN with network structure correlation graph G_{s}.

MRDSCNN: Complete MRDSCNN.

Table 4 and Table 5 report the results of the ablation experiments on the pre-defined graph components. Each graph component matters: removing either one degrades the performance of the whole model. Specifically, the passenger flow similarity graph performs well, especially on the RMSE, indicating that incorporating information from non-neighboring stations substantially enhances model stability. Compared with the model without graph structures, the complete model reduces the RMSE of passenger inflow prediction by 6.63%, 7.54%, and 8.03% at the 10, 15, and 30 min granularities, respectively. Passenger outflow prediction benefits similarly, with RMSE reductions of 7.27%, 4.00%, and 8.24%. Each graph component on its own improves the MRDSCNN model, and combining both yields the most stable and accurate predictions.
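The percentage reductions quoted above are relative differences between the model without graphs and the complete model. The sketch below recomputes them from the RMSE columns of Table 4 and Table 5 as printed; the helper name `relative_reduction` is illustrative. With these table values, five of the six figures reproduce the quoted percentages, whereas the 15 min inflow pair (38.60 versus 35.99) gives about 6.76% rather than 7.54%.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0

# RMSE pairs (MRDSCNN-No G, complete MRDSCNN) copied from Table 4 and Table 5,
# keyed by time granularity in minutes.
rmse_inflow = {10: (29.71, 27.74), 15: (38.60, 35.99), 30: (67.14, 61.75)}
rmse_outflow = {10: (33.72, 31.27), 15: (40.20, 38.59), 30: (68.10, 62.49)}

inflow_gain = {g: relative_reduction(*pair) for g, pair in rmse_inflow.items()}
outflow_gain = {g: relative_reduction(*pair) for g, pair in rmse_outflow.items()}

for g in (10, 15, 30):
    print(f"{g} min: inflow {inflow_gain[g]:.2f}%, outflow {outflow_gain[g]:.2f}%")
```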

Figure 14 shows the observed passenger inflow and outflow at Fengqi Road Station throughout the day, together with the predictions from the ablation experiments. The proposed model and its variants all agree closely with the actual passenger flow. The models with graph components perform better, especially during the evening peak in passenger entry and the morning peak in passenger exit.

3.4.3. Comparative Analysis of Different Models

In this experiment, all models are given the same passenger inflow and outflow data as input. On the three metrics above (as shown in Figure 15 and Figure 16), the MRDSCNN model consistently outperforms the other models in both passenger inflow and outflow predictions.

As shown in Table 6 and Table 7, deep learning methods perform better than traditional models in predicting metro passenger flow at different time granularities. The traditional statistical approach HA performs the worst because of its limited ability to capture complex spatiotemporal dynamics. The second-worst model is the CNN, which generally trails the other six deep learning methods, mainly because its fixed receptive field does not allow for the extraction of global information. GRU and BiGRU perform better than CNN in most settings. This indicates that recurrence-based approaches can model long-term dependencies in sequential data, allowing them to capture and update crucial information from past time steps. Unlike the single models described above, TCN, as an ensemble model, can expand the receptive field and capture multi-scale features through the residual structure. However, ResLSTM performs better than TCN because it is a multi-branch architecture whose GCN component processes graph-structured data. Furthermore, as the best baseline model in this paper, Conv-GCN integrates the inflow and outflow information to extract high-level spatiotemporal features between near and far stations. Although all the models show small performance gaps when the time granularity is 10 min, MRDSCNN shows increasingly superior prediction performance over the baseline models as the time granularity increases. It is worth mentioning that our proposed model integrates multiple graphs to cover the spatiotemporal complexity inherent in different inter-station connections, resulting in superior prediction accuracy and the best model fit. Moreover, we also compared the proposed model with the results reported for STGCN [33], DCRNN [33], GBDT [22], TSTFN [34], and MGSTCN [35] on the Hangzhou Metro dataset, which indicated that the proposed model has better predictive performance.

We also assessed the performance of the proposed model at different time granularities, as illustrated in Figure 17. As the time granularity increases, both the MAE and RMSE rise while the WMAPE decreases. The absolute errors grow because each interval contains more passengers, whereas the relative error falls mainly because the influence of short-term fluctuations on passenger flow is reduced when the time granularity increases from 10 to 30 min, making it easier to capture general trends and patterns of passenger flow.
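The opposite trends of MAE and WMAPE under coarser granularity can be reproduced with a toy simulation. The sketch below is not based on the Hangzhou data: it assumes a constant expected flow per 10 min interval with independent Gaussian noise, and every name and parameter in it is illustrative.

```python
import random

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def wmape(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(y_true)

def aggregate(series, k):
    """Sum consecutive groups of k values (three 10 min bins -> one 30 min bin)."""
    return [sum(series[i:i + k]) for i in range(0, len(series) - k + 1, k)]

random.seed(0)
n = 3000
expected = [100.0] * n  # forecast: the constant expected flow
observed = [max(0.0, random.gauss(100.0, 10.0)) for _ in range(n)]  # noisy truth

mae_10 = mae(observed, expected)
wmape_10 = wmape(observed, expected)

obs_30, exp_30 = aggregate(observed, 3), aggregate(expected, 3)
mae_30 = mae(obs_30, exp_30)
wmape_30 = wmape(obs_30, exp_30)

print(f"10 min: MAE={mae_10:.2f}, WMAPE={wmape_10:.4f}")
print(f"30 min: MAE={mae_30:.2f}, WMAPE={wmape_30:.4f}")
```

Under these assumptions, summing three intervals triples the total flow, but independent errors partly cancel and grow only by about a factor of √3, so the absolute error rises while the error relative to total flow falls.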

3.4.4. Prediction Performance of Typical Stations

In addition to network scale prediction, we also explored the effect of individual station scale prediction through typical stations. Inspired by Ref. [36], we selected transfer stations, terminal stations, and regular stations as representatives to comprehensively understand the passenger flow characteristics of different types of metro stations. The selected stations include Xianghu Station, Fengqi Road Station, and Qingchun Square Station. Xianghu Station is a terminal station on Metro Line 1. Fengqi Road Station is a transfer station on Metro Line 2. Qingchun Square Station is a regular station on Metro Line 2. Time series comparison charts were generated by integrating the ground truth with predicted results derived from our proposed model, as shown in Figure 18.

Xianghu Station is a vital link between the urban center and suburban regions, carrying passengers between the two. Its passenger inflow is characteristic of a station serving suburban areas, while its passenger outflow exhibits notable morning and evening peaks. Fengqi Road Station is surrounded by many commercial and residential areas, so it exhibits a pronounced concentrated peak pattern, especially during the morning and evening commuting hours. In contrast, Qingchun Square Station is characterized by a large passenger flow. As a significant transportation hub located within a commercial district, this station experiences a pronounced influx of passengers during weekday mornings, which is closely associated with the surrounding urban dynamics. These three stations cover diverse geographic contexts and are therefore highly representative. In general, the proposed model can achieve accurate predictions at both the network level and station level.

4. Conclusions

As an important aspect of urban transportation planning and management, accurate passenger flow prediction is crucial for ensuring efficient and reliable metro services. Predicting metro passenger flow is difficult because passenger flows are shaped by intricate and uncertain spatiotemporal factors. In this study, we propose an MRDSCNN model based on the residual structure, multi-scale convolution, and AttBiGRU to predict the passenger inflow and outflow in the metro. Considering changes in passenger flow at different time scales, we take three different passenger flow patterns as input to the residual network and capture the spatiotemporal characteristics. Moreover, we develop two graphs based on prior knowledge and model the inter-station interaction correlations through multi-scale convolution operations. Then, AttBiGRU is utilized for global information fusion. The proposed model is evaluated on the Hangzhou metro passenger flow data, and the results demonstrate that the model effectively captures the global dynamic dependencies of the entire metro system. In the future, we will explore more inter-station correlations and external factors (such as weather and emergencies) to further improve the accuracy of passenger flow prediction, as well as extend its application to other urban transportation domains.

Author Contributions

Conceptualization, T.L. and M.L.; methodology, T.L. and L.L.; validation, L.L.; formal analysis, T.L.; investigation, L.L.; resources, L.L.; data curation, M.L.; writing (original draft preparation), T.L. and L.L.; writing (review and editing), T.L. and M.L.; visualization, L.L.; supervision, T.L. and M.L.; project administration, T.L.; funding acquisition, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanities and Social Science Foundation of the Ministry of Education of China under Grant 21YJC630066, the Liaoning Revitalization Talents Program under Grant XLYC1907084, and the Key Research and Development Project in Liaoning Province under Grant 2020JH2/10100042.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study can be obtained through the following link: https://tianchi.aliyun.com/competition/entrance/231708/information, accessed on 18 February 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cavone, G.; Montaruli, V.; Van Den Boom, T.; Dotoli, M. Demand-Oriented Rescheduling of Railway Traffic in Case of Delays. In Proceedings of the 2020 7th International Conference on Control, Decision and Information Technologies (CoDIT), Prague, Czech Republic, 29 June–2 July 2020; pp. 1040–1045.
2. Luo, J.; Tong, Y.; Cavone, G.; Dotoli, M. A Service-Oriented Metro Traffic Regulation Method for Improving Operation Performance. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 3533–3538.
3. Cavone, G.; Blenkers, L.; Van Den Boom, T.; Dotoli, M.; Seatzu, C.; De Schutter, B. Railway disruption: A bi-level rescheduling algorithm. In Proceedings of the 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, 23–26 April 2019; pp. 54–59.
4. Ghasempour, T.; Nicholson, G.L.; Kirkwood, D.; Fujiyama, T.; Heydecker, B. Distributed approximate dynamic control for traffic management of busy railway networks. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3788–3798.
5. Liu, J.; Chen, L.; Roberts, C.; Gemma, N.; Bo, A. Algorithm and peer-to-peer negotiation strategies for train dispatching problems in railway bottleneck sections. IET Intell. Transp. Syst. 2019, 13, 1717–1725.
6. Hou, Z.; Dong, H.; Gao, S.; Nicholson, G.; Chen, L.; Roberts, C. Energy-saving metro train timetable rescheduling model considering ATO profiles and dynamic passenger flow. IEEE Trans. Intell. Transp. Syst. 2019, 20, 2774–2785.
7. Smith, B.L.; Demetsky, M.J. Traffic flow forecasting: Comparison of modeling approaches. J. Transp. Eng. 1997, 123, 261–266.
8. Bai, Y.; Sun, Z.; Zeng, B.; Deng, J.; Li, C. A multi-pattern deep fusion model for short-term bus passenger flow forecasting. Appl. Soft Comput. 2017, 58, 669–680.
9. Wei, Y.; Chen, M.-C. Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks. Transp. Res. Part C Emerg. Technol. 2012, 21, 148–162.
10. Tang, T.; Liu, R.; Choudhury, C. Incorporating weather conditions and travel history in estimating the alighting bus stops from smart card data. Sustain. Cities Soc. 2020, 53, 101927.
11. Lin, L.; Gao, Y.; Cao, B.; Wang, Z.; Jia, C.; Guo, L. Passenger Flow Scale Prediction of Urban Rail Transit Stations Based on Multilayer Perceptron (MLP). Complexity 2023, 2023, 1430449.
12. Peng, H.; Wang, H.; Du, B.; Bhuiyan, M.Z.A.; Ma, H.; Liu, J.; Wang, L.; Yang, Z.; Du, L.; Wang, S.; et al. Spatial temporal incidence dynamic graph neural networks for traffic flow forecasting. Inf. Sci. 2020, 521, 277–290.
13. Hou, Z.; Du, Z.; Yang, G.; Yang, Z. Short-Term Passenger Flow Prediction of Urban Rail Transit Based on a Combined Deep Learning Model. Appl. Sci. 2022, 12, 7597.
14. Zhai, X.; Shen, Y. Short-Term Bus Passenger Flow Prediction Based on Graph Diffusion Convolutional Recurrent Neural Network. Appl. Sci. 2023, 13, 4910.
15. Wu, J.; Li, X.; He, D.; Li, Q.; Xiang, W. Learning spatial-temporal dynamics and interactivity for short-term passenger flow prediction in urban rail transit. Appl. Intell. 2023, 53, 19785–19806.
16. Sha, S.; Li, J.; Zhang, K.; Yang, Z.; Wei, Z.; Li, X.; Zhu, X. RNN-Based Subway Passenger Flow Rolling Prediction. IEEE Access 2020, 8, 15232–15240.
17. Niu, K.; Cheng, C.; Chang, J.; Zhang, H.; Zhou, T. Real-Time Taxi-Passenger Prediction With L-CNN. IEEE Trans. Veh. Technol. 2019, 68, 4122–4129.
18. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
19. Zhang, X.; Wang, C.; Chen, J.; Chen, D. A deep neural network model with GCN and 3D convolutional network for short-term metro passenger flow forecasting. IET Intell. Transp. Syst. 2023, 17, 1599–1607.
20. Chen, P.; Fu, X.; Wang, X. A Graph Convolutional Stacked Bidirectional Unidirectional-LSTM Neural Network for Metro Ridership Prediction. IEEE Trans. Intell. Transp. Syst. 2022, 23, 6950–6962.
21. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858.
22. Liu, L.; Chen, J.; Wu, H.; Zhen, J.; Li, G.; Lin, L. Physical-Virtual Collaboration Modeling for Intra- and Inter-Station Metro Ridership Prediction. IEEE Trans. Intell. Transp. Syst. 2022, 23, 3377–3391.
23. Dong, N.; Li, T.; Liu, T.; Tu, R.; Lin, F.; Liu, H.; Bo, Y. A method for short-term passenger flow prediction in urban rail transit based on deep learning. Multimed. Tools Appl. 2023.
24. Liu, L.; Wu, M.; Chen, R.-C.; Zhu, S.; Wang, Y. A Hybrid Deep Learning Model for Multi-Station Classification and Passenger Flow Prediction. Appl. Sci. 2023, 13, 2899.
25. Ke, J.; Qin, X.; Yang, H.; Zheng, Z.; Zhu, Z.; Ye, J. Predicting origin-destination ride-sourcing demand with a spatio-temporal encoder-decoder residual multi-graph convolutional network. Transp. Res. Part C Emerg. Technol. 2021, 122, 102858.
26. Zhang, J.; Chen, F.; Guo, Y.; Li, X. Multi-graph convolutional network for short-term passenger flow forecasting in urban rail transit. IET Intell. Transp. Syst. 2020, 14, 1210–1217.
27. Li, H. Time works well: Dynamic time warping based on time weighting for time series data mining. Inf. Sci. 2021, 547, 592–608.
28. Ni, Q.; Zhang, M. STGMN: A gated multi-graph convolutional network framework for traffic flow prediction. Appl. Intell. 2022, 52, 15026–15039.
29. Zhang, J.; Zheng, Y.; Qi, D.; Li, R.; Yi, X.; Li, T. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artif. Intell. 2018, 259, 147–166.
30. Chen, D.; Yan, X.; Liu, X.; Li, S.; Wang, L.; Tian, X. A multiscale grid-based stacked bidirectional GRU neural network model for predicting traffic speeds of urban expressways. IEEE Access 2021, 9, 1321–1337.
31. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271.
32. Zhang, J.; Chen, F.; Cui, Z.; Guo, Y.; Zhu, Y. Deep Learning Architecture for Short-Term Passenger Flow Forecasting in Urban Rail Transit. IEEE Trans. Intell. Transp. Syst. 2021, 22, 7004–7014.
33. Yin, D.; Jiang, R.; Deng, J.; Li, Y.; Xie, Y.; Wang, Z.; Zhou, Y.; Song, X.; Shang, J. MTMGNN: Multi-time multi-graph neural network for metro passenger flow prediction. GeoInformatica 2023, 27, 77–105.
34. Zhang, W.; Zhang, C.; Tsung, F. Transformer Based Spatial-Temporal Fusion Network for Metro Passenger Flow Forecasting. In Proceedings of the 2021 17th IEEE International Conference on Automation Science and Engineering (CASE), Lyon, France, 23–27 August 2021; pp. 1515–1520.
35. Yang, J.; Liu, T.; Li, C.; Tong, W.; Zhu, Y.; Ai, W. MGSTCN: A multi-graph spatio-temporal convolutional network for metro passenger flow prediction. In Proceedings of the 2021 7th International Conference on Big Data Computing and Communications (BigCom), Deqing, China, 13–15 August 2021; pp. 164–171.
36. Lu, Y.; Ding, H.; Ji, S.; Sze, N.N.; He, Z. Dual attentive graph neural network for metro passenger flow prediction. Neural Comput. Appl. 2021, 33, 13417–13431.

Figure 1. Framework of the proposed MRDSCNN model.

Figure 2. Distance matrix of DTW algorithm.

Figure 3. Residual depthwise separable convolution.

Figure 4. Depthwise separable convolution.

Figure 5. Dilated convolution. (a) Dilation rate = 1; (b) Dilation rate = 2; (c) Dilation rate = 3.

Figure 6. The process of graph convolution. (a) Self-looping; (b) Neighbor connections; (c) Normalization.

Figure 7. Multi-scale convolution.

Figure 8. The structure of AttBiGRU.

Figure 9. Hangzhou metro system map.

Figure 10. Pearson correlation coefficient between different passenger flows at all stations. (a) Passenger inflow; (b) Passenger outflow.

Figure 11. Passenger flow at three stations. (a) Flow of Fengqi Road Station; (b) Flow of Wulin Square Station; (c) Flow of Qingchun Square Station.

Figure 12. The heatmap of network structure correlation adjacency matrix.

Figure 13. The heatmaps of passenger flow similarity adjacency matrices. (a) 10 min passenger inflow; (b) 15 min passenger inflow; (c) 30 min passenger inflow; (d) 10 min passenger outflow; (e) 15 min passenger outflow; (f) 30 min passenger outflow.

Figure 14. Ablation experiment for passenger flow at Fengqi Road Station. (a) Passenger inflow at Fengqi Road Station; (b) Passenger outflow at Fengqi Road Station.

Figure 15. Comparison of model performance for passenger inflow. (a) MAE; (b) RMSE; (c) WMAPE.

Figure 16. Comparison of model performance for passenger outflow. (a) MAE; (b) RMSE; (c) WMAPE.

Figure 17. Comparison of model performance at different time granularities. (a) Prediction performance of passenger inflow; (b) Prediction performance of passenger outflow.

Figure 18. Comparison of ground truth and predicted values of three typical stations. (a) Inflow of Xianghu Station; (b) Inflow of Fengqi Road Station; (c) Inflow of Qingchun Square Station; (d) Outflow of Xianghu Station; (e) Outflow of Fengqi Road Station; (f) Outflow of Qingchun Square Station.

Table 1. Overview of studies on passenger flow prediction and its application.

Reference	Description
Smith et al. [7]	The HA model forecasted Lisbon Airport’s passenger numbers to determine the demand in a new airport using non-causal methods.
Bai et al. [8]	This paper proposed a multi-pattern deep fusion approach which used the AP algorithm for clustering analysis and fused the output of the multi-pattern DBNs to obtain the final prediction results.
Wei et al. [9]	The EMD-BPN model decomposed the short-term passenger flow series data into a number of intrinsic mode function (IMF) components as inputs to the BPN and then applied BPN to passenger flow prediction.
Tang et al. [10]	The paper incorporated weather variables, personal travel history, and network features into the GBDT algorithm.
Lin et al. [11]	The multilayer perceptron (MLP)-based passenger flow prediction model was developed to predict the passenger flow at key stations.
Peng et al. [12]	Dynamic-GRCNN model integrated the relationship of passengers and GCN and LSTM units to learn complex traffic spatial–temporal features.
Hou et al. [13]	The TCN-LSTM model solved the difficulty of accurate prediction due to the large fluctuation and randomness of short-term passenger flow in rail transit.
Zhai et al. [14]	Graph-based DCRNN integrated graph features into the RNN to capture the spatiotemporal dependencies in the bus network. The diffusion convolution recurrent neural network (DCRNN) architecture was adopted to forecast the future number of passengers on each bus line.
Wu et al. [15]	The MFGCN model included the GCN with a spatial-attention mechanism and the LSTM with a temporal-attention mechanism to extract higher-order spatial–temporal interaction characteristics.
Sha et al. [16]	The GRU model can selectively retain and update important information about past time steps due to its gating mechanism of information flow within the network.
Niu et al. [17]	Authors applied convolutional operations to extract spatial and temporal features from the passenger flow time series. By stacking multiple convolutional layers, the CNN can capture complex and hierarchical representations of the input sequence.
Kipf et al. [18]	The GCN presented a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs.
Zhang et al. [19]	The ARConv-GCN model combined GCN and the attention residual 3DCNN for more accurate short-term passenger flow prediction in rail transit.
Chen et al. [20]	This paper proposed a parallel-structured deep learning model that consists of a GCN and a stacked BiLSTM for predicting passenger inflow.
Zhao et al. [21]	Authors introduced GCN to capture the geographical connectivity relationships between sites on the basis of GRU.
Liu et al. [22]	The PVCGN is a Seq2Seq model including a graph convolution gated recurrent unit (GC-GRU) for spatial–temporal representation learning and a fully connected gated recurrent unit (FC-GRU) for capturing the global evolution tendency.
Dong et al. [23]	This paper incorporated the convolution operation between LSTM cells and predicted passenger flow for three different types of stations: terminal station, transfer station, and regular station.
Liu et al. [24]	This paper proposed a novel two-step strategy (Transformer-K-Means)-(ResNet-GCN-AttLSTM), which includes classification block based on the K-Means, and prediction block, which includes ResNet, GCN, and AttLSTM.
Ke et al. [25]	The ST-ED-RMGC model was proposed, which is an encoder–decoder framework used to study the OD-based ride-sourcing demand prediction problem.
Zhang et al. [26]	The Conv-GCN model is a fusion of multi-graph GCN and 3D CNN, allowing it to effectively capture high-level spatiotemporal features among different patterns of passenger flow.
Cavone et al. [1]	The predicted passenger flow results can serve as a key solution to alleviate the operational pressure brought about by increasing demand.
Luo et al. [2]	Passenger flow prediction can provide reference for optimizing traffic operations in disturbed environments and ensuring a secure and comfortable travel experience for passengers.
Cavone et al. [3]	Passenger flow prediction can also be used to respond to emergency situations or traffic interruptions.
Ghasempour et al. [4]	Passenger flow prediction results can help to rearrange subway schedules and reduce delays.
Liu et al. [5]	Accurate passenger flow prediction can contribute to limiting train delays, especially in a bottleneck area of railway traffic management.
Hou et al. [6]	The results of passenger flow prediction are helpful for train scheduling in bottleneck sections and reducing train energy consumption.

Table 2. Data format of metro passenger flow.

Station	5:30–5:40	5:40–5:50	5:50–6:00	…	23:00–23:10	23:10–23:20	23:20–23:30
0	1	4	3	…	1	0	0
1	0	5	5	…	2	1	1
2	0	9	11	…	6	4	1
…	…	…	…	…	…	…	…
79	0	0	4	…	5	7	1

Table 3. Hyperparameters of MRDSCNN model.

Parameters	Passenger Inflow Prediction	Passenger Outflow Prediction	AttBiGRU
Kernel Size	3 × 3	3 × 3	-
Dilation rate	3	3	-
Residual Module1	32 filters	32 filters	-
Residual Module2	64 filters	64 filters	-
Batch Size	64	64	64
Activation Function	ReLU	ReLU	Linear
Number of Layers	-	-	1
Number of Neurons	-	-	128
Number of Neurons in FC1	80	80	-
Number of Neurons in FC2	80	80	-
Number of GCN Layers	1	1	-

Table 4. Results of ablation experiments on passenger inflow.

Model	10 min MAE	10 min RMSE	10 min WMAPE	15 min MAE	15 min RMSE	15 min WMAPE	30 min MAE	30 min RMSE	30 min WMAPE
MRDSCNN-No G	19.23	29.71	13.07%	23.92	38.60	10.80%	40.55	67.14	9.26%
MRDSCNN-G_c	17.41	28.58	11.86%	22.25	36.76	10.09%	39.72	65.14	9.08%
MRDSCNN-G_s	17.35	28.39	11.81%	22.24	36.75	10.08%	39.69	65.99	9.06%
MRDSCNN	16.91	27.74	11.53%	21.84	35.99	9.89%	37.83	61.75	8.65%

Table 5. Results of ablation experiments on passenger outflow.

Model	10 min MAE	10 min RMSE	10 min WMAPE	15 min MAE	15 min RMSE	15 min WMAPE	30 min MAE	30 min RMSE	30 min WMAPE
MRDSCNN-No G	20.52	33.72	13.97%	24.55	40.20	11.06%	40.26	68.10	9.07%
MRDSCNN-G_c	20.32	33.56	13.85%	24.00	39.12	10.85%	38.93	64.64	8.75%
MRDSCNN-G_s	20.28	33.19	13.78%	23.83	38.65	10.75%	37.68	63.70	8.48%
MRDSCNN	19.75	31.27	13.48%	23.83	38.59	10.73%	37.46	62.49	8.45%

Table 6. Comparison of different models for passenger inflow.

Model	10 min MAE	10 min RMSE	10 min WMAPE	15 min MAE	15 min RMSE	15 min WMAPE	30 min MAE	30 min RMSE	30 min WMAPE
HA [7]	31.57	56.50	21.53%	58.59	104.87	26.55%	178.53	311.32	38.96%
CNN [17]	22.40	33.46	15.23%	26.54	42.20	12.03%	47.00	71.44	10.74%
GRU [16]	20.64	31.94	14.04%	26.39	41.15	11.94%	44.33	69.85	10.14%
BiGRU [30]	21.72	33.30	14.74%	28.28	43.29	12.80%	45.28	71.31	10.35%
TCN [31]	21.29	32.25	14.50%	26.41	40.76	11.97%	43.52	66.53	9.95%
ResLSTM [32]	20.10	32.08	13.68%	24.13	39.17	10.96%	40.70	65.89	9.31%
Conv-GCN [26]	18.11	29.75	12.24%	23.41	39.22	10.68%	40.20	64.75	9.34%
MRDSCNN	16.91	27.74	11.53%	21.84	35.99	9.89%	37.83	61.75	8.65%
STGCN [33]	16.74	30.36	11.30%	25.11	45.54	16.95%	50.22	91.08	33.90%
DCRNN [33]	18.00	32.24	12.14%	27.00	48.36	18.21%	54.00	96.72	36.42%
GBDT [22]	20.59	34.33	-	30.88	51.50	-	36.48	61.94	-
TSTFN [34]	-	-	-	23.56	39.70	-	47.12	79.40	-

Table 7. Comparison of different models for passenger outflow.

Model	10 min MAE	10 min RMSE	10 min WMAPE	15 min MAE	15 min RMSE	15 min WMAPE	30 min MAE	30 min RMSE	30 min WMAPE
HA [7]	50.34	104.47	22.32%	74.24	154.63	27.41%	146.55	303.63	35.73%
CNN [17]	25.50	39.03	17.35%	27.18	42.06	12.28%	48.09	73.95	10.85%
GRU [16]	23.48	36.49	15.94%	26.76	41.40	12.06%	48.02	72.86	10.79%
BiGRU [30]	23.96	37.35	16.27%	27.21	41.95	12.22%	45.48	72.15	10.24%
TCN [31]	23.85	36.68	16.22%	26.98	41.54	12.19%	41.20	65.10	9.30%
ResLSTM [32]	21.57	35.78	14.69%	25.17	40.55	11.41%	38.30	65.44	8.64%
Conv-GCN [26]	20.31	33.12	13.82%	24.01	39.02	10.82%	38.41	64.62	8.65%
MRDSCNN	19.75	31.27	13.48%	23.83	38.59	10.73%	37.46	62.49	8.45%
STGCN [33]	20.25	34.72	13.68%	30.38	52.08	20.52%	60.76	104.16	41.04%
DCRNN [33]	21.77	37.84	14.69%	32.66	56.76	22.04%	65.32	113.52	44.07%
MGSTCN [35]	-	-	-	24.10	36.50	-	48.20	73.00	-
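
For readers who want to reproduce the three columns reported in Tables 5 to 7, the sketch below shows one common way to compute MAE, RMSE, and WMAPE. It assumes the usual definitions (WMAPE as the sum of absolute errors divided by the sum of actual values); the function names and the toy data are hypothetical and are not taken from the authors' code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def wmape(y_true, y_pred):
    """Weighted MAPE: total absolute error over total actual flow (assumed definition)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(y_true)


# Hypothetical toy passenger counts, for illustration only.
actual = [100, 200, 300]
predicted = [110, 190, 330]

print(f"MAE   = {mae(actual, predicted):.2f}")
print(f"RMSE  = {rmse(actual, predicted):.2f}")
print(f"WMAPE = {wmape(actual, predicted):.2%}")
```

Under this definition WMAPE is normalized by the total actual flow, which is one plausible reason why WMAPE can fall at coarser time granularities even though MAE and RMSE rise: the aggregated counts in the denominator grow along with the absolute errors.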

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Li, T.; Liu, L.; Li, M. Multi-Scale Residual Depthwise Separable Convolution for Metro Passenger Flow Prediction. Appl. Sci. 2023, 13, 11272. https://0-doi-org.brum.beds.ac.uk/10.3390/app132011272



Station	5:30–5:40	5:40–5:50	5:50–6:00	…	23:00–23:10	23:10–23:20	23:20–23:30
0	1	4	3	…	1	0	0
1	0	5	5	…	2	1	1
2	0	9	11	…	6	4	1
…	…	…	…	…	…	…	…
79	0	0	4	…	5	7	1
