4.1. Alpha-Extended Kalman Filter
In order to not only filter the target position, Doppler, and covariance, but also continually evaluate the number of reflection points and extension information, we propose an AEKF in the prediction and updating process. The C-K equation, which is computationally intensive, is used in Bayesian filtering to determine the prior information [
28]. To lower the computing expenses, the following assumptions were adopted: (a) prior density follows a Gaussian distribution with a mean of
and a covariance of
; (b) the detection probability is constant; (c) measurement likelihood follows a Gaussian distribution with a mean of
and a covariance of
; (d) clutter intensity
. The specific calculation process of the above parameters will be given later in this subsection.
Suppose that there are active targets in the previous frame and the posterior estimate of the j-th target state is , including the target position ( and ) and the target velocity ( and ). The target extension is estimated to be , containing the number of reflection points and extension . The target posterior state covariance matrix is . In addition, we define the m-th reflection point taken from the sparse point cloud generation module in this frame to be with .
For the target moving model, a constant-velocity motion (CV) model [
29] was selected. However, even if the CV model is employed, the target is not required to move at a constant speed. The velocity change in one frame can be fully characterized by process noise since the acceleration of the human target is low. The target state transition matrix is defined as follows:
which describes the expected state change between two adjacent frames,
is the time of one frame, and
is the identity matrix.
The prior information of the target state and covariance matrix for target
j is calculated as
where
, and
represents the covariance matrix of process noise, which follows a Gaussian distribution with zero mean. Moreover, in the prediction step, assume that the prior information of the number of points and expansion remains the same, i.e.,
,
.
The collected point cloud is in Cartesian coordinates, and the target state must be mapped from the Cartesian space to the polar coordinate space using an observation equation with a nonlinear function. The Jacobian matrix [
30] is used to linearize the nonlinear function by Taylor expanding and retaining the first-order component of the expansion terms to obtain the observation matrix
.
During the update phase, a series of reflection points associated with the target is gathered. The sequence numbers of reflection points related to the
j-th prior expectation are defined by
with a length of
. The association process will be elaborated upon in
Section 4.2.
First, we updated the group expansion and the estimated amount of points. The extension is defined as the covariance matrix of the SNR-weighted mixture Gaussian distribution consisting of all correlation points, which can be obtained using the EM algorithm [
31]:
where
represents the mean value of all associated points.
The values of
and
were set as constants, and the number of the target reflection points was updated to
The extension is calculated as
Meanwhile, the target posterior state is expressed as
In the above equations, the Kalman gain is written as , and stands for the reference point of the group, which is the SNR-weighted mean of related points.
Moreover, conventional PTT often assigns a constant value for
that is of interest. By considering the time-varying character of ETT,
is defined as an equation involving the number of expansion and reflection points:
The first part of (
17) represents the observed noise covariance of the single reflection point, where
is the difference between the maximum and minimum values of each dimension in the associated sequence; the second part of (
17) is the estimation of the expansion, whose physical significance is that, as the expansion
becomes larger, the uncertainty of the measurement increases, which leads directly to the target posterior covariance becoming large. Furthermore, a constraint is placed on the value of
to avoid situations where convergence is impossible. In practice, it should be less than the approximate length of the open arms of the human body and larger than the width of the human shoulders.
The complete process of the algorithm is demonstrated in Algorithm 1.
Algorithm 1: AEKF. |
- Input:
, , , - Output:
, , , - 1:
for each do - 2:
Compute , , , , and - 3:
end for - 4:
Get the association result from group association module - 5:
Get new tracks from the track initialization module - 6:
for each do - 7:
Compute and - 8:
Compute and - 9:
Compute and - 10:
end for
|
Compared to the standard extended Kalman method, the proposed method benefits from considering the “many-to-one” association in the case of ETT, which ensures that the posterior density covariance can accurately model the uncertainty and enhances the accuracy of tracking associations.
4.2. Points-to-Prior Association
Target association is a crucial step that substantially decreases the number of posterior mixing density components. In other words, only the major components of the Gaussian mixture density are maintained, while the minor parts are eliminated. It can significantly reduce the algorithm’s complexity while maintaining its efficacy. In PTT-like association processes, the Hungarian algorithm [
32], Murty algorithm [
33], and Gibbs sampling method [
34] are frequently employed to determine single or many optimum allocation strategies. They are characterized by their “one-to-one” relationship, which is supplemented by a clustering technique before tracking in each frame to accomplish a priori cluster matching. This approach relies primarily on clustering, which might result in many association failures as the population grows and the quality of the point cloud deteriorates.
The non-normalized weight of the posterior mixture-density component resulting from the
j-th prior expectation associated with the reflection point
is
Under the assumption that the detection probability and clutter intensity are constants, we have
where
,
, and the calculation processes was described in detail in
Section 4.1.
The entire general pseudocode to the algorithm is shown in Algorithm 2. After the procedure, a series of correlation sequences are obtained. In contrast to the standard PTT-like approach, we associated and matched each reflection point that satisfies the gating requirements to choose the best-matching track, so the number of reflection points associated with each track might be zero or larger than one. The gating parameter
G changes in each frame based on the expected number of people on the field. Furthermore,
is the association result, where
stands for that for the
c-th measurement whose optimal pairing is the
j-th prior;
means that there is no matching target for this reflection point. Finally, we obtain the associated sequence for each subject:
When it is not equal to 0, the sequence length is
, which represents the number of reflected points matched by the track
j, and
is the index vector of these points.
Algorithm 2: Group association. |
- Input:
, , gating parameter G - Output:
- 1:
for each do - 2:
for each do - 3:
Compute - 4:
if then - 5:
Compute - 6:
end if - 7:
end for - 8:
Find - 9:
end for
|
4.3. Track Initialization
For points that , the probability of a new target entering the field of view is considered. In our method, clustering is used to determine the possible new targets, while a false target removal method is applied to distinguish unreal targets within them.
DBSCAN can divide sufficiently dense areas into clusters and form arbitrary clusters in a noisy spatial database, where clusters are defined as the greatest group of densely linked points. This has benefits over k-means of not requiring the number of clusters to be specified in advance, not initializing the cluster center, and not being sensitive to noise. Thus, in crowded indoor situations, DBSCAN is better suited for human target initialization than k-means. Traditional DBSCAN requires just the parameters and , which describe the neighborhood radius and the minimum number of points, to classify and cluster a set of points.
However, on the one hand, there are cases of target splitting in mmWave tracking situations due to target expansion or inadequate angular resolution, which is shown in
Figure 8a. A single human target may generate two distinct high-density clusters in the X-Y plane. Increasing the value of
can mitigate this problem, but it will result in a single group when numerous new targets approach, which is shown in
Figure 8b. On the other hand, multipath effects may generate ghost targets. These fake targets are related to the actual target movement for numerous frames in succession, as shown in
Figure 9, finally generating a ghost trace.
To address these two issues, we examined a vast quantity of measured data and investigated reducing the appearance of false targets from three perspectives: cluster location, cluster centroid height, and cluster SNR.
Suppose that the unassociated points can be written as
, whose length is equal to
. They are entered into the new target initialization module, and DBSCAN and false target removal algorithm are executed. The methods are shown in Algorithm 3. After the algorithm has completed, the track management module initializes temporary tracks for the new targets since
are considered new human targets in this frame.
Algorithm 3: DBSCAN-based false target suppression. |
- Input:
, , , - Output:
- 1:
Perform the DBSCAN algorithm on , and are obtained - 2:
Initialize , - 3:
for each do - 4:
for each do - 5:
Compute reference centroid - 6:
if then - 7:
if then - 8:
- 9:
end if - 10:
Compute group , azimuth difference A - 11:
if , , and then - 12:
- 13:
end if - 14:
end if - 15:
- 16:
end for - 17:
end for
|
4.4. Track Re-Association and Missing Track Estimation
Given a track variable , where represents the short track start time, denotes the short track stop time, and is the posterior estimate of successive target locations, the possibility of track matching was considered. Duplicate tracks reserved by track management were not considered, as these reserved tracks will change the natural features of the trajectory. Moreover, considering the complexity of human target movement, the value of is not the start time of the complete global track, but rather the artificially set start time of the local short track. The analysis was restricted to the latest 40 distinct track states, from which we obtained the of each .
At the moment
k, a sequence of trajectories
is produced and entered into the re-association module. We assumed that the current target state
is related to the target states of the historical
M frames. Then, the target state model can be represented by the linear regression relationship:
where
M is also referred to as the regression model order, which determines the state model complexity, and
is the weight of the order
i. The re-association method is based on this premise of the regression model. The procedure is displayed in
Figure 10.
The motion state (static or moving) of tracks needs to be identified first. The state variation of each short track is calculated, and we labeled it as stationary if the variance was less than the threshold; otherwise, we identified it as a moving track. Next, the moving and stationary tracks will be separately re-assigned.
4.4.1. Track Re-Association for Moving Target
For tracks
and
identified as moving targets with non-overlapping time, the time, slope, and state differences (
,
, and
) are calculated. If these three values exceed the thresholds, they are marked as possible homologous track pairings. After obtaining a sequence of potential homology track pairs, the completed track variable is denoted by
, and the trajectory is
, where
stands for the estimated missing trajectories. The
n-order Hankel matrix of
can be obtained as follows:
where
n is produced under the condition that the Hankel matrix arrangement becomes a square matrix; otherwise, the matrix can be interpolated.
As the minimized regression model can be converted to reduce the rank of the Hankel matrix [
35], the lost track of the chosen track pair can be calculated by
. This issue can be translated into a semi-positive definite problem as follows:
where
X and
Z are free variables and
is the trace of the matrix.
Then, for these completed tracks, define the correlation probability of the short track variables
and
:
The rank of the matrix is solved using SVD decomposition, and the cost matrix
is derived according to the correlation probability, with the likelihood that the track is associated with itself set to infinity. Finally, we used the Hungarian algorithm [
32] to calculate the following optimal allocation problem:
where
represents the allocation matrix and
p is the number of short tracks participating in the allocation. Once tracks
and
are successfully re-associated,
will be deleted and
will be set in the track management module.
4.4.2. Track Re-Association for Near-Static Target
Short tracks
and
, marked as stationary targets with non-overlapping time, are identified as possible homologous pairs when their mean states and time difference are both below the thresholds. Indeed, practically stationary tracks frequently break several times. Thus, we decreased the correlation cost by the number of track re-associations
. Assume the cost of inactive target association as the Euclidean distance divided by the number of re-associations, which can be expressed as
As the time of the re-associations of a stationary target grows, the target is more likely to be an actual, stationary target instead of clutter and, hence, has a lower correlation cost. The optimum allocation problem is then solved using the Hungarian method to produce the association result. In addition, because it is of little necessity to estimate the lost track of a stationary target, linear interpolation is used to calculate the estimated track and trajectory variable . After assignment, in the track management module, the track will be deleted, and then, we set and .
4.5. Track Management
To guarantee that the tracking method can retain a high degree of resilience in the case of false tracks, target entrance, exit, etc., the track management module separates the tracking status into five stages based on a combined evaluation of multi-frame association and re-association information: temporary, active, reserved, leaving, and released tracks.
When a new cluster is formed, the new target track is initialized as temporary until it has been successfully associated
M times in the previous
N frames. When an active trajectory fails to be associated continuously in the last
K frames, its status is changed to be reserved to ensure that the two tracks can be re-associated after beginning a new track of the same source. Once it is determined that the track cannot be re-associated in the previous
frames, it will be released. The state of the trajectory determines the values of
K and
, which is evaluated in
Section 4.4. To be specific, these two thresholds have lower values for moving targets and larger values for stationary targets. Moreover, the values are lowest for targets at the field of view bounds, which are referred to as leaving tracks.