
LRPL-VIO: A Lightweight and Robust Visual–Inertial Odometry with Point and Line Features

1 College of Artificial Intelligence, Nankai University, Tianjin 300350, China
2 Shenzhen Research Institute, Nankai University, Shenzhen 518081, China
* Author to whom correspondence should be addressed.
Submission received: 4 January 2024 / Revised: 26 January 2024 / Accepted: 15 February 2024 / Published: 18 February 2024
(This article belongs to the Section Sensors and Robotics)

Abstract

Visual–inertial odometry (VIO) algorithms that fuse multiple feature types, such as points and lines, improve their performance in challenging scenes, but their running time increases severely. In this paper, we propose a novel lightweight point–line visual–inertial odometry algorithm, called LRPL-VIO, to solve this problem. Firstly, a fast line matching method is proposed based on the assumption that the photometric values of endpoints and midpoints are invariant between consecutive frames, which greatly reduces the time consumption of the front end. Then, an efficient filter-based state estimation framework is designed to fuse point, line, and inertial information. Fresh measurements of line features with good tracking quality are selected for state estimation using a unique feature selection scheme, which improves the efficiency of the proposed algorithm. Finally, validation experiments are conducted on public datasets and in real-world tests to evaluate the performance of LRPL-VIO, and the results show that it outperforms other state-of-the-art algorithms, especially in terms of speed and robustness.

1. Introduction

State estimation is crucial for unmanned mobile platforms, especially when operating in GPS-denied areas. Simultaneous localization and mapping (SLAM) algorithms provide real-time pose estimation and build consistent maps; thus, SLAM is a crucial technique for robots, self-driving cars, and augmented reality (AR) devices [1]. Pure visual SLAM algorithms [2,3,4], which use cameras as the sole sensor, are lightweight and low-cost and have gained popularity over the past decade. However, they lack robustness because they are sensitive to illumination changes and motion blur.
Many researchers have found that combining a camera with an inertial measurement unit (IMU) offers complementary advantages [5]. IMUs output high-frequency but biased inertial measurements, while cameras produce images with rich information. Based on this, numerous visual–inertial odometry and SLAM systems have been designed to obtain accurate and robust pose estimation. According to the estimation strategy, they can be divided into two categories: optimization-based methods and filter-based methods. The former, such as OKVIS [6] and VINS-Mono [7], construct a factor graph with visual re-projection errors and IMU pre-integration errors to optimize poses and feature landmarks; the computational load is managed with a sliding window and marginalization to achieve real-time performance. The latter, such as MSCKF [8] and HybVIO [9], maintain a state vector consisting of body states (position, speed, orientation, and inertial biases) and a fixed number of past poses; state propagation follows the IMU kinematic model, and the visual update provides multi-frame constraints to produce an accurate trajectory. However, the aforementioned algorithms rely solely on points for visual constraints, which can lead to divergence or failure in low-texture environments.
As line features are abundant in human-made environments, more and more VIO frameworks fuse both points and lines to improve their performance. PL-VIO [10] is the first optimization-based point–line visual–inertial odometry framework. Points, lines, and IMU pre-integration terms are integrated into the optimization window to recover trajectories and scene appearance. Hence, it can outperform its predecessor VINS-Mono in some large, difficult environments, but at a severe cost in running time. To speed up the processing of line features, the effect of the hidden parameters in the LSD algorithm [11] was studied in PL-VINS [12]. The authors modified a proper set of parameters to balance the speed and quality of line feature extraction in the original LSD for pose estimation tasks. In this way, PL-VINS is capable of outputting estimated poses in real time. FPL-VIO [13] applied two methods to make the front end lightweight: it uses the fast line detection algorithm FLD [14] instead of LSD to extract line features and BRIEF descriptors [15] of midpoints to perform line matching, which greatly reduces the running time of the front end. The authors in [16] presented a similar solution, choosing EDLines [17] with gamma correction for rapid detection of long line features. They tracked a certain number of points on each line, instead of the entire segment, using the sparse KLT algorithm for line matching. As a result, the time consumed by line features in the front end is reduced. However, the back end of these optimization-based methods remains a heavy module because of the repeated linearization of visual and inertial error terms, which becomes worse after fusing both point and line features [10].
Since filter-based methods avoid this re-linearization, they are considered more efficient [5]. Trifo-VIO [18] is a stereo point–line VIO algorithm based on MSCKF. After state propagation, both point and line features are used for the visual update. However, line features are parameterized with a 3D point and a normal vector in this system, which is an over-parameterized representation because a space line has only four degrees of freedom. Another MSCKF-with-lines framework is proposed in [19]. This system adopts the closest-point method to represent line features and shows good performance in real-world experiments. However, its front end uses LBD [20] to match line features; thus, its real-time performance is severely limited. A hybrid point–line MSCKF algorithm is proposed in [21]. Based on the sparse KLT algorithm, it tracks sampled points on the line over three consecutive frames in a predicting–matching fashion; thus, a new line can be recovered if the original one is lost. However, extra memory and operations are required in the hybrid framework since line feature landmarks are preserved in the state vector.
Most SLAM and odometry algorithms run on small-sized devices with limited available resources. How to provide accurate and high-frequency pose estimation with low computational consumption for multiple feature frameworks is still an open problem. To solve this, we propose a novel lightweight point–line visual–inertial odometry algorithm which can robustly track the poses of moving platforms. The main contributions of this paper are as follows:
  • A novel filter-based point–line VIO framework with a unique feature selection scheme is proposed to produce high-frequency and accurate pose estimation results. The whole system is fast, robust, and accurate enough to work in complex environments with weak texture and motion blur.
  • A fast line matching method is proposed to reduce the running time of the front end. Lines are matched with an endpoint–midpoint tracking strategy and a complete prediction–tracking–rejection scheme, which ensures matching quality at high speed.
  • Validation experiments on public datasets and in real-world tests are conducted to evaluate the proposed LRPL-VIO. The results prove the better performance of LRPL-VIO compared with other state-of-the-art systems (HybVIO [9], VINS-Mono [7], PL-VIO [10], and PL-VINS [12]), especially in terms of speed and robustness.
The rest of this paper is organized as follows. Section 2 describes our filter-based point–line VIO system. The proposed fast line matching method is detailed in Section 3. The experimental results are presented and discussed in Section 4. Finally, conclusions and future work are given in Section 5.

2. Filter-Based Point–Line Visual–Inertial Odometry

While point-only visual–inertial odometry algorithms can produce accurate pose estimates in environments with constant illumination and rich texture, they tend to diverge or fail in more challenging scenes. Fusing multiple feature types is a good solution, but it makes the whole system heavy. In this paper, we design a lightweight and efficient point–line VIO system based on HybVIO [9] to tackle this issue. The working flowchart of LRPL-VIO is shown in Figure 1.

2.1. State Definition

Similar to most filters derived from MSCKF [8], the state vector in our system consists of the body states and a window of past poses. At timestamp k, the state vector is constructed as:
$$x_k = \big(p_k^T,\ v_k^T,\ q_k^T,\ b_k^T,\ \tau_k,\ \Pi_k^T\big)^T,$$
where $p_k$ and $q_k$ denote the current pose of the body and $v_k$ is the velocity. The vector
$$b_k = \big(b_k^{a\,T},\ b_k^{\omega\,T},\ \mathrm{diag}(T_k^{a})^T\big)^T$$
collects the terms related to the inertial biases. Only the diagonal elements of $T_k^{a}$ are used for the multiplicative correction of the accelerometer. $\tau_k$ represents the IMU-camera time shift. A fixed-length window
$$\Pi_k = \big(p_k^{(1)T},\ q_k^{(1)T},\ \ldots,\ p_k^{(n_a)T},\ q_k^{(n_a)T}\big)^T$$
holds $n_a$ poses of past moments.
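For concreteness, the sketch below shows one way the flat state vector of Equation (1) could be laid out in code; the block ordering, the window length $n_a$, and the helper name are illustrative assumptions rather than the exact layout of our implementation.

```python
import numpy as np

NA = 8  # illustrative window length n_a

# body block: p (3), v (3), q (4), b_a (3), b_w (3), diag(T_a) (3), tau (1) = 20
BODY_DIM = 20
POSE_DIM = 7  # p (3) + q (4) per past pose
STATE_DIM = BODY_DIM + NA * POSE_DIM

x = np.zeros(STATE_DIM)
x[6] = 1.0      # identity quaternion (w, x, y, z) for the current orientation
x[16:19] = 1.0  # diag(T_a) starts at ones (no accelerometer scale error)

def window_pose(x, i):
    """Return (p, q) of the i-th past pose stored in the sliding window."""
    start = BODY_DIM + i * POSE_DIM
    return x[start:start + 3], x[start + 3:start + 7]
```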

2.2. Filter Propagation

The states are initialized as $m_{1|1}$ after obtaining the initial orientation $q_0$ from the first inertial measurement. The initial covariance matrix $P_{1|1}$ is diagonal. The system is then propagated with each subsequent inertial measurement in the prediction step of the core filter:
$$x_{k|k-1} = f_k\big(x_{k-1|k-1}, \varepsilon_k\big),$$
where $\varepsilon_k \sim N(0, Q_k)$ is the Gaussian process noise. This propagation is performed in discrete time by a mechanization equation [22]:
$$\begin{bmatrix} p_k \\ v_k \\ q_k \end{bmatrix} = \begin{bmatrix} p_{k-1} + v_{k-1}\,\Delta t_k \\ v_{k-1} + \big(q_k (\tilde{a}_k + \varepsilon_k^{a}) q_k^{*} - g\big)\Delta t_k \\ \Omega\big((\tilde{\omega}_k + \varepsilon_k^{\omega})\Delta t_k\big)\, q_{k-1} \end{bmatrix},$$
where $\Delta t_k$ is the current time increment. The bias-corrected gyroscope and accelerometer inputs are $\tilde{a}_k = T_k^{a} a_k - b_k^{a}$ and $\tilde{\omega}_k = \omega_k - b_k^{\omega}$. $\varepsilon_k^{a} \sim N(0, \Sigma_k^{a}\Delta t_k)$ and $\varepsilon_k^{\omega} \sim N(0, \Sigma_k^{\omega}\Delta t_k)$ are i.i.d. Gaussian noises, and $g$ is the gravity vector. Rotation by the quaternion is expressed as $q_k\,(\cdot)\,q_k^{*}$, and the quaternion is updated by the function $\Omega: \mathbb{R}^3 \rightarrow \mathbb{R}^{4\times 4}$ [23]. The bias vector is propagated by
$$\begin{bmatrix} b_k^{a} \\ b_k^{\omega} \\ T_k^{a} \end{bmatrix} = \begin{bmatrix} \exp(-\alpha^{a}\Delta t_k)\, b_{k-1}^{a} + \epsilon_k^{a} \\ \exp(-\alpha^{\omega}\Delta t_k)\, b_{k-1}^{\omega} + \epsilon_k^{\omega} \\ T_{k-1}^{a} \end{bmatrix},$$
where $\epsilon_k \sim N\big(0, \tfrac{\sigma^2}{2\alpha}\big(1 - \exp(-2\alpha\Delta t_k)\big)\big)$ follows an Ornstein–Uhlenbeck random walk [24] to better match the characteristics of the IMU sensor.
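As a reference, a minimal NumPy sketch of the prediction step of Equation (5) is given below; the process noise is omitted, a (w, x, y, z) quaternion convention is assumed, and the function names are hypothetical.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of two (w, x, y, z) quaternions."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([w0*w1 - x0*x1 - y0*y1 - z0*z1,
                     w0*x1 + x0*w1 + y0*z1 - z0*y1,
                     w0*y1 - x0*z1 + y0*w1 + z0*x1,
                     w0*z1 + x0*y1 - y0*x1 + z0*w1])

def rotate(q, v):
    """Rotate vector v by quaternion q, i.e. q v q*."""
    qv = np.concatenate(([0.0], v))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mult(quat_mult(q, qv), q_conj)[1:]

def propagate(p, v, q, b_a, b_w, T_a, a_m, w_m, dt,
              g=np.array([0.0, 0.0, 9.81])):
    """One discrete-time prediction step following Equation (5), noise omitted."""
    a_hat = T_a * a_m - b_a      # scale- and bias-corrected accelerometer input
    w_hat = w_m - b_w            # bias-corrected gyroscope input
    ang = w_hat * dt
    angle = np.linalg.norm(ang)
    if angle < 1e-12:
        dq = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        dq = np.concatenate(([np.cos(angle / 2.0)],
                             np.sin(angle / 2.0) * ang / angle))
    q_new = quat_mult(dq, q)
    q_new /= np.linalg.norm(q_new)
    p_new = p + v * dt
    v_new = v + (rotate(q_new, a_hat) - g) * dt
    return p_new, v_new, q_new
```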

2.3. Image Processing

For points, we use the Good Features to Track (GFTT) algorithm [25] to extract new features and the sparse KLT optical flow algorithm [26] to perform feature tracking. The inertial measurements between consecutive frames are integrated to obtain the instantaneous rotation, from which initial values for the feature tracker are derived using two-view geometry (see Equation (28)); this enhances the tracking quality during rapid camera motions. After that, a hybrid 2-point [27] and 5-point [28] RANSAC is performed to reject outliers.
For lines, we use the modified LSD algorithm [11,12] to detect new line segments and a fixed length threshold to discard short lines. Line matching is performed with the proposed fast line matching method (see Section 3), which greatly decreases the execution time of the front end and provides higher accuracy for our VIO system than the traditional descriptor-based method LBD [20].
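A minimal OpenCV sketch of this detection stage is shown below. It assumes an OpenCV build in which cv2.createLineSegmentDetector is available (LSD was missing from some releases for licensing reasons), and the thresholds are placeholder values rather than the tuned parameters of the modified LSD in [12].

```python
import cv2
import numpy as np

def detect_features(gray, max_corners=150, min_line_len=40.0):
    # point features: Shi-Tomasi corners (Good Features to Track)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=20)

    # line features: LSD segments, discarding segments shorter than the threshold
    lsd = cv2.createLineSegmentDetector()
    lines, _, _, _ = lsd.detect(gray)
    kept = []
    if lines is not None:
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            if np.hypot(x2 - x1, y2 - y1) >= min_line_len:
                kept.append((x1, y1, x2, y2))
    return corners, np.array(kept)
```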

2.4. Feature Selection

In addition to feature detection and matching, visual update in filter-based VIO methods is another time-consuming module. Paying more attention to the most informative features is an efficient way of decreasing computational load. Another novelty of the proposed LRPL-VIO is that we do not use all the tracked features (both points and lines) but a subset of them to perform visual updates.
For a visual feature j, its whole track is a set of pose indices $\{i_{\min}^{j}, \ldots, i_{\max}^{j}\}$, where $i_{\min}^{j}$ denotes its first detection frame and $i_{\max}^{j}$ its last tracked frame. As the system moves, old poses are abandoned; thus, the oldest pose in the window, denoted $b(i)$, may no longer be $i_{\min}^{j}$. We use $b(i,j) = \max(i_{\min}^{j}, b(i))$ to represent the oldest tracked frame in the window. Not all measurements but a subset of them are used for triangulation and linearization:
$$S(i,j) = \{b(i,j)\} \cup \{\max(S(i',j)) + 1, \ldots, i\},$$
where $i' < i$ is the newest frame used in the last update. In short, we always choose the freshest information for efficiency.
For each newly received frame, we also select, at random, a subset of all available visual feature tracks (denoted $U(i)$) for the visual update from those whose score is above the median,
$$\{\, j \in U(i) \mid L(i,j) > \mathrm{median}_{U(i)}(L(i,\cdot)) \,\},$$
where the implementation of $L(i,j)$ differs between points and lines in LRPL-VIO. Points are scored by their tracking length:
$$L(i,j)_{point} = \sum_{l \in S(i,j)\setminus\{b(i,j)\}} \min\big(\lVert y_l^{j} - y_{l-1}^{j} \rVert,\ 1\big),$$
where $y^{j}$ is the pixel coordinate. Lines are less sensitive to changes in tracking length than points; thus, we use the frame count as their scoring policy:
$$L(i,j)_{line} = i - \max(S(i',j)) - 1,$$
which preserves the update accuracy even when only a small number of line features is used.
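The selection logic can be summarized by the following sketch; the score functions follow Equations (8)–(10), while the containers, the target count, and the helper names are simplified stand-ins for the real bookkeeping inside the filter.

```python
import numpy as np

def score_point_track(pixels):
    """Equation (9): sum of per-frame pixel motion, each term capped at 1."""
    diffs = np.linalg.norm(np.diff(pixels, axis=0), axis=1)
    return float(np.sum(np.minimum(diffs, 1.0)))

def score_line_track(i, last_used_frame):
    """Equation (10): frame count since the line's measurements were last used."""
    return i - last_used_frame - 1

def select_tracks(scores, n_target=5, rng=None):
    """Equation (8): randomly pick up to n_target tracks scoring above the median."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    idx = np.flatnonzero(scores > np.median(scores))
    rng.shuffle(idx)
    return idx[:n_target]
```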

2.5. Feature Triangulation and Update

The visual update is triggered track by track until the target number is reached:
$$h_{k,j}\big(x_{k|k-1,\,j-1}\big) = \gamma_{k,j} \sim N\big(0,\ \sigma_{visu}^{2} I\big)$$
with
$$h_{k,j}(x) = d\big(r_S(\xi_S(x, \tilde{y}_S^{j}), x),\ \tilde{y}_S^{j}\big),$$
where
$$\xi_S(x, \tilde{y}_S^{j}) = \mathrm{Tri}\big(\Pi(S), \tilde{y}_S^{j}\big)$$
denotes the landmark triangulated from its tracked feature measurements $\tilde{y}_S^{j}$, $r(\cdot)$ is the re-projection process, and $d(\cdot)$ is the error calculation.

2.5.1. Point Feature

The point error is the difference between the re-projected landmark and tracked measurements:
$$h_{k,j}(x)_{point} = r_S\big(p_S(x, \tilde{y}_S^{j}), x\big) - \tilde{y}_S^{j},$$
where the point triangulation is the minimization of the re-projection error
$$\mathrm{RMSE}_j(p_S, x) = \big\lVert r_S(p_S, x) - \tilde{y}_S^{j} \big\rVert$$
using the Gauss–Newton method. Since the Jacobian of $p_S$ with respect to $x$ is available once the initial value is provided by a two-frame triangulation, the whole optimization process of Equation (15) can be differentiated, which yields the direct linearization of Equation (14) with respect to $x$:
$$h_{k,j}(x)_{point} \approx J_{h,k,j}(x_0)\,(x - x_0) + h_{k,j}(x_0)_{point},$$
which avoids the null space projection operation and can be used directly for the visual update.

2.5.2. Line Feature

The line error is defined as the distance between the endpoints of tracked measurements and the re-projected line:
$$h_{k,j}(x)_{line} = \begin{bmatrix} d\big(r_S(l_S(x, \tilde{y}_S^{j}), x),\ \tilde{y}_{S,s}^{j}\big) \\ d\big(r_S(l_S(x, \tilde{y}_S^{j}), x),\ \tilde{y}_{S,e}^{j}\big) \end{bmatrix}$$
with
$$d(l, e) = \frac{e^{T} l}{\sqrt{l_1^2 + l_2^2}},$$
where $l = [l_1, l_2, l_3]^{T}$ is the re-projected line. For the space line representation, the Plücker coordinate [29] $\mathcal{L} = [n^{T}, d^{T}]^{T}$ is used in our system. On the basis of two camera poses $(p_j^{(1)}, q_j^{(1)}, p_j^{(2)}, q_j^{(2)})$ and their corresponding measurements $(e_{s,j}^{(1)}, e_{e,j}^{(1)}, e_{s,j}^{(2)}, e_{e,j}^{(2)})$, we can obtain the dual Plücker matrix of a line feature [30] as
$$\mathcal{L}_{dual} = \pi^{(1)} \pi^{(2)T} - \pi^{(2)} \pi^{(1)T} = \begin{bmatrix} [d]_{\times} & n \\ -n^{T} & 0 \end{bmatrix},$$
where
$$\pi = \begin{bmatrix} (e_{s,j}^{w} - p_j) \times (e_{e,j}^{w} - p_j) \\ -p_j^{T}\,(e_{s,j}^{w} \times e_{e,j}^{w}) \end{bmatrix}$$
is the measurement plane determined by the two endpoints and the camera optical center. Triangulation from only two frames is not reliable enough; thus, we adopt an n-view method proposed in [31]. Specifically, for the $n_L$ measurements of a line $\mathcal{L}$, we stack all relevant planes:
$$W = \begin{bmatrix} \pi^{(1)} & \pi^{(2)} & \cdots & \pi^{(n_L)} \end{bmatrix}^{T}$$
and perform a singular value decomposition of Equation (21), $\mathrm{svd}(W) = [s, d, v]$. The two main planes $\pi_1$ and $\pi_2$ are obtained from the columns of $v$ corresponding to the two largest singular values. If the singular values are reasonable, we use Equation (19) to obtain the initial value of $\mathcal{L}$ and perform a nonlinear optimization to further improve the accuracy of the triangulation. Based on the above, the linearization of Equation (17) is
$$h_{k,j}(x, \mathcal{L})_{line} \approx J_{h,k,j}(x_0)\,(x - x_0) + J_{h,k,j}(\mathcal{L}_0)\,(\mathcal{L} - \mathcal{L}_0) + h_{k,j}(x_0, \mathcal{L}_0)_{line}$$
and the null space projection operation [19] is unavoidable for the visual update because the feature positions are not maintained in the state vector.
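For illustration, the plane construction of Equation (20) and the SVD-based initialization of the Plücker line can be sketched as follows; endpoint coordinates are assumed to be expressed in the world frame already, and the degeneracy threshold is an arbitrary illustrative value.

```python
import numpy as np

def plane_from_view(p_cam, e_start, e_end):
    """Equation (20): plane through the camera center and the two line endpoints."""
    normal = np.cross(e_start - p_cam, e_end - p_cam)
    return np.concatenate((normal, [-p_cam @ np.cross(e_start, e_end)]))

def triangulate_line(planes, min_singular=1e-6):
    """Stack the measurement planes (Equation (21)) and recover a Plucker line."""
    W = np.stack(planes)                    # n_L x 4 matrix of plane coefficients
    _, s, vt = np.linalg.svd(W)
    if s[1] < min_singular:                 # degenerate: planes nearly coincide
        return None
    pi1, pi2 = vt[0], vt[1]                 # the two dominant planes
    # dual Plucker matrix, Equation (19): L* = pi1 pi2^T - pi2 pi1^T
    L_dual = np.outer(pi1, pi2) - np.outer(pi2, pi1)
    d = np.array([L_dual[2, 1], L_dual[0, 2], L_dual[1, 0]])  # direction, from [d]x
    n = L_dual[:3, 3]                                          # moment vector
    return n, d
```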

2.6. Pose Augmentation and Stationary Detection

Every time a new camera frame is received, its predicted pose is inserted into the window and an old pose is removed. This process is performed as an EKF prediction step:
$$x_{k+1|k} = A_d^{Aug}\, x_{k|k}$$
with
$$A_d^{Aug} = \begin{bmatrix} I_{20} & & & \\ I_{7} & & & \\ & I_{7(d-1)} & & \\ & & & I_{7(n_a-d)} \end{bmatrix},$$
where the blank entries are zero.
The choice of d can be exploited for efficiency; following [9], we combine a fixed-size FIFO of length $n_{FIFO}$ with a Towers-of-Hanoi scheme:
$$d_i = \max\big(n_{FIFO},\ n_a - \mathrm{LSB}(i)\big),$$
where $\mathrm{LSB}(i)$ is the index of the least-significant zero bit of i. In this way, the maximum stride between stored poses grows exponentially, and old and new poses are refreshed at different frequencies.
When the moving platform stays still, the poses in the window quickly become identical due to Equation (23), which makes the VIO unstable. Thus, an unaugmentation step is performed whenever a stationary signal is received:
$$x_{k+1|k} = \big(A_d^{Aug}\big)^{T} x_{k|k} + \begin{bmatrix} 0_{\dim(x)-7} \\ \varepsilon_u \end{bmatrix},$$
which pops the newly inserted frame and keeps most of the old poses. The stationary condition is judged by the maximum pixel change of the tracked point features:
$$m_k = \max_j \big\lVert y_k^{j,L} - y_{k-1}^{j,L} \big\rVert < m_{\min},$$
where $m_{\min}$ is a fixed threshold. A zero-velocity update (ZUPT) [32] is also performed to correct the pose estimation results.
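A small sketch of the slot-selection rule of Equation (25) and the stationarity test of Equation (27) follows; $n_a$, $n_{FIFO}$, and $m_{\min}$ are placeholder values, not the tuned parameters of LRPL-VIO.

```python
import numpy as np

def lsb_zero_index(i):
    """Index of the least-significant zero bit of i, as used in Equation (25)."""
    idx = 0
    while i & 1:
        i >>= 1
        idx += 1
    return idx

def choose_discard_slot(i, n_a=24, n_fifo=8):
    """Equation (25): which window slot d_i to discard when frame i is augmented."""
    return max(n_fifo, n_a - lsb_zero_index(i))

def is_stationary(pts_curr, pts_prev, m_min=1.0):
    """Equation (27): stationary if the largest tracked-pixel motion is below m_min."""
    motion = np.linalg.norm(pts_curr - pts_prev, axis=1)
    return motion.size > 0 and float(motion.max()) < m_min
```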

3. Fast Line Matching

The complex pixel distribution of line features makes their matching more challenging and time-consuming compared to point features. In this section, we propose a novel fast line matching method to break this bottleneck. An overview of our method is shown in Algorithm 1 and details are explained below.
Algorithm 1 Fast Line Matching
Require: I_1, I_2, IMU_{1→2}, K, L_1
Ensure: L_2
 1: R_21 ← Integrate(IMU_{1→2})
 2: for l_i ∈ L_1 do
 3:     s_i, m_i, e_i ← Extract(l_i)
 4:     Predict(R_21, K, s_i, m_i, e_i)
 5:     s_i', m_i', e_i' ← Track(I_1, I_2, s_i, m_i, e_i)
 6:     l_i' ← OutlierReject(I_1, I_2, s_i, m_i, e_i, s_i', m_i', e_i')
 7:     L_2 ← L_2 ∪ {l_i'}
 8: end for
 9: L_2 ← RANSAC(L_1, L_2)
10: L_2 ← MatchAndRemoveShortLines(L_2)
11: return L_2
Extraction: For each line feature, tracking is focused on its two endpoints and midpoint, rather than the entire line or other sampled points. In other words, for n line features, we have 3 n points in total.
Prediction: To counteract aggressive motions, inertial measurements between two camera frames are used to determine the initial positions of the points for tracking. Specifically, for two consecutive frames, I 1 and I 2 , a point transformation between them is:
$$\lambda_2 K^{-1} v_2 = \lambda_1 R_{21} K^{-1} v_1 + t_{21},$$
where $v_1$ and $v_2$ are the pixel coordinates of the same point in these frames, $\lambda_1$ and $\lambda_2$ are the corresponding depths, and K is the camera intrinsic matrix, which is assumed constant. The pose between $I_1$ and $I_2$ is represented by $R_{21}$ and $t_{21}$. Under the assumption that the translation $t_{21}$ between two consecutive frames is small enough to be ignored, $\lambda_1$ and $\lambda_2$ can be eliminated from Equation (28), giving the simplified form:
$$v_2 = K R_{21} K^{-1} v_1.$$
We obtain the rotation $R_{21}$ by integrating the gyroscope measurements and then compute the predicted positions of the points using Equation (29).
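In effect, the prediction warps pixels by the infinite homography $K R_{21} K^{-1}$ of Equation (29); a minimal NumPy sketch with hypothetical function and argument names is given below.

```python
import numpy as np

def predict_points(pts, R_21, K):
    """Equation (29): warp pixel coordinates from frame 1 to frame 2 using the
    gyro-integrated rotation R_21, assuming negligible translation."""
    H = K @ R_21 @ np.linalg.inv(K)                     # rotation-induced homography
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])    # to homogeneous coordinates
    warped = (H @ pts_h.T).T
    return warped[:, :2] / warped[:, 2:3]               # back to pixel coordinates
```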
Tracking: After the above stages, the line matching task becomes the tracking of the points, which is finished based on the photometric invariance assumption in LRPL-VIO. Take a single line endpoint as an example. With its original pixel coordinate ( x , y ) in I 1 , our idea is to find the target pixel coordinate ( x + d x , y + d y ) in I 2 to satisfy Equation (30):
$$I_1(x, y) = I_2(x + dx,\ y + dy),$$
where $I_i(a, b)$ is the photometric value of pixel $(a, b)$ in $I_i$. Clearly, $(dx, dy)$ cannot be determined from a single equation; thus, we further assume that all pixels in a local window share the same motion. That is, we have
$$\begin{aligned} I_1(x_1, y_1) &= I_2(x_1 + dx,\ y_1 + dy) \\ I_1(x_2, y_2) &= I_2(x_2 + dx,\ y_2 + dy) \\ &\;\;\vdots \\ I_1(x_w, y_w) &= I_2(x_w + dx,\ y_w + dy) \end{aligned}$$
for all w pixels in the window. To solve Equation (31), a nonlinear optimization problem is constructed:
$$(dx, dy) = \operatorname*{arg\,min}_{dx,\,dy}\ \lVert g(dx, dy) \rVert^{2},$$
where
$$g(dx, dy) = \sum_{i=1}^{w} \big(I_1(x_i, y_i) - I_2(x_i + dx,\ y_i + dy)\big).$$
Equation (32) is a typical least squares problem and is solved iteratively with the initial values provided by Equation (29). In addition, image pyramids are used to improve the tracking quality.
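In practice, this step can be realized with a pyramidal Lucas–Kanade tracker seeded by the predicted positions, for example via OpenCV as sketched below; the window size and pyramid levels are illustrative rather than the values tuned in LRPL-VIO.

```python
import cv2
import numpy as np

def track_line_points(img1, img2, pts1, pts_pred, win=21, levels=3):
    """Track endpoints/midpoints from img1 to img2, using the rotation-predicted
    positions as the initial guess of the optical flow."""
    prev_pts = pts1.astype(np.float32).reshape(-1, 1, 2)
    init_pts = pts_pred.astype(np.float32).reshape(-1, 1, 2).copy()
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        img1, img2, prev_pts, init_pts,
        winSize=(win, win), maxLevel=levels,
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
    return next_pts.reshape(-1, 2), status.ravel().astype(bool)
```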
Outlier Rejection: Once the points of a line feature have been tracked, we first check the average photometric difference at its two endpoints. An endpoint track is considered an inlier if
$$\Delta \bar{I} = \frac{1}{w} \Big| \sum_{i=1}^{w} \big(I_1(x_i, y_i) - I_2(x_i + dx,\ y_i + dy)\big) \Big| < \varepsilon_I,$$
where $\varepsilon_I$ is a threshold. However, Equation (34) alone is not enough to reject outliers when the image contains a large area of repeated texture. For this reason, an angle variation check is also performed if both endpoints pass Equation (34). Namely, if a line matching pair $[(s_i, e_i), (s_i', e_i')]$ satisfies
$$\Delta \theta_i = \lvert \theta_i - \theta_i' \rvert < \varepsilon_\theta,$$
where $\theta_i$ and $\theta_i'$ are the angles of the line in the consecutive frames, then $(s_i', e_i')$ is accepted as a candidate line.
Generally, endpoints have the potential to move out of view or be tracked unsuccessfully. Hence, after obtaining the first batch of candidate lines by checking endpoints, we take tracked midpoints as new endpoints of the line features which failed to pass the above tests. For example, if [ ( s i , e i ) , ( s i , e i ) ] is not an acceptable tracking result, it will be replaced by [ ( s i , m i ) , ( s i , m i ) ] or [ ( m i , e i ) , ( m i , e i ) ] . Certainly, the replaced line pairs have to satisfy both Equations (34) and (35). This scheme is able to improve the tracking length of line features with no additional sampled points. Finally, an 8-point RANSAC is performed to further reject outliers in these candidates.
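The rejection logic, including the midpoint fallback, can be sketched as follows; the tracking-status and photometric checks (Equation (34)) are passed in as placeholder callables, and the angle threshold of Equation (35) is an illustrative value.

```python
import numpy as np

def line_angle(p1, p2):
    return np.arctan2(p2[1] - p1[1], p2[0] - p1[0])

def angle_check(s1, e1, s2, e2, eps_theta=np.deg2rad(5.0)):
    """Equation (35): the line direction must stay consistent between frames."""
    d = abs(line_angle(s1, e1) - line_angle(s2, e2))
    return min(d, 2.0 * np.pi - d) < eps_theta

def accept_line(track_ok, photo_ok, s, m, e, s2, m2, e2):
    """Try (start, end) first; fall back to (start, mid) or (mid, end) pairs."""
    for a, b, a2, b2 in [(s, e, s2, e2), (s, m, s2, m2), (m, e, m2, e2)]:
        if track_ok(a2, b2) and photo_ok(a, b, a2, b2) and angle_check(a, b, a2, b2):
            return a2, b2       # accepted endpoints of the matched line in frame 2
    return None
```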
Matching: After all this, we build the matched line features by connecting the reserved endpoints and remove short ones, which are useless for pose estimation.

4. Experiments

4.1. Dataset and Evaluation

To validate the necessity of fusing point–line features and the performance of our LRPL-VIO in different scenes, we conduct various experiments on three public academic datasets (EuRoC [33], UMA-VI [34], and VIODE [35]) and a collected real-world dataset. Four state-of-the-art algorithms (point-based VINS-Mono [7] and HybVIO [9], point–line-based PL-VIO [10] and PL-VINS [12]) are selected for comparison.
For the evaluation criteria, we choose the root mean square error (RMSE) of the absolute trajectory error (ATE) to assess the estimation accuracy of the different algorithms. For EuRoC, VIODE, and our collected dataset, which provide groundtruth poses over the whole run, we use the evo [36] toolbox to compute the RMSE ATE between the entire estimated trajectory and the groundtruth poses. For the UMA-VI dataset, whose groundtruth poses are only available at the start and end segments of each run, we use its Python tool to compute the RMSE ATE between these segments of the estimated trajectory and the groundtruth poses (the alignment error [34,37]). We report the average value over five runs.
A desktop computer with an Intel Core i7-9750H processor @2.60 GHz and 15.5 GB of RAM, running Ubuntu 18.04 with ROS Melodic, is used as the main experiment platform.
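Although the reported numbers are computed with the evo toolbox, the metric itself reduces to a rigid alignment followed by an RMSE over position errors; the NumPy sketch below illustrates a translation-only ATE with a no-scale Umeyama/Kabsch alignment, which is a simplification of what evo provides.

```python
import numpy as np

def rmse_ate(est_xyz, gt_xyz):
    """RMSE ATE of estimated vs. groundtruth positions (N x 3 arrays) after a
    rigid, no-scale alignment of the estimate to the groundtruth."""
    mu_e, mu_g = est_xyz.mean(axis=0), gt_xyz.mean(axis=0)
    E, G = est_xyz - mu_e, gt_xyz - mu_g
    U, _, Vt = np.linalg.svd(E.T @ G)                 # 3x3 cross-covariance
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                                # optimal rotation (Kabsch)
    aligned = (R @ E.T).T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1))))
```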

4.2. Accuracy

In this subsection, we conduct an accuracy experiment on the EuRoC [33] dataset. It was recorded by a micro aerial vehicle (MAV) in three different indoor scenes. The sequences in each scene are divided into three modes, easy, medium, and difficult, according to image quality and MAV motion speed. The results are shown as follows.

4.2.1. Ablation Experiment

In order to validate the effectiveness of point–line fusion, the fast front end, and the feature track selection in LRPL-VIO, we first conduct an ablation experiment on five sequences of the EuRoC dataset: MH_02_easy, MH_03_medium, MH_05_difficult, V1_03_difficult, and V2_02_medium. For the matching comparison, we replace the fast line matching method with the LBD matching module of PL-VINS (denoted LRPL-VIO (LBD)), and the line feature selection module is disabled (denoted LRPL-VIO (All Line Track)) to prove its necessity. The results are shown in Table 1.
First, it can be seen from Table 1 that the point–line fusion strategy brings more visual constraints to the VIO system; thus, LRPL-VIO produces more accurate trajectories than the point-only HybVIO (an 11% improvement on average). Second, the proposed fast line matching method finishes line matching more efficiently than LBD, with higher matching quality (LRPL-VIO obtains a lower RMSE ATE than LRPL-VIO (LBD) on all five sequences) and less running time (see Table 6). Finally, the feature track selection scheme avoids using all tracked line features and their measurements while preserving the pose estimation accuracy (a 2% improvement on average), even with a small number of features (at most five successful line updates per frame in our implementation).

4.2.2. Accuracy Experiment

We use all 11 sequences on the EuRoC dataset to test the pose estimation accuracy of LRPL-VIO and compare it with four SOTA open-source algorithms. The results are shown in Table 2.
Compared with the two point-only methods, VINS-Mono and HybVIO, LRPL-VIO outperforms them on most sequences because of successful point–line fusion. Using visual constraints from multiple feature types, visual–inertial navigation systems can perform pose estimation more accurately; the average RMSE of LRPL-VIO is more than 10% lower than theirs. With the improved line matching quality of the proposed method and the feature selection scheme, line features are used in LRPL-VIO in a more efficient way. Thus, compared with the LBD-based PL-VIO and PL-VINS, LRPL-VIO achieves a more than 7% lower average RMSE with less computational resource consumption (see Table 6).

4.3. Robustness

To further validate the robustness of the proposed LRPL-VIO, we select some challenging sequences from the following two datasets:
The UMA-VI dataset [34] is recorded by a custom handheld visual–inertial sensor suite. The images recorded in different scenes are severely affected by many challenging factors including low texture, illumination change, sun overexposure, and motion blur, which makes it a difficult dataset for VIO algorithms.
The VIODE dataset [35] is recorded by a simulated unmanned aerial vehicle (UAV) in dynamic environments. The novelty of this dataset is that the UAV navigates the same path in four sub-sequences (none, low, mid, high) of each scene, and the only difference between them is the number of dynamic objects.
The sequence features are listed in Table 3 and the results are shown in Table 4.
An effective point–line fusion strategy can improve the robustness of visual–inertial odometry algorithms. From Table 4, we can see that PL-VINS and LRPL-VIO perform successful pose estimation on all these challenging sequences. However, LRPL-VIO shows a better performance, with a lower error on each sequence, which validates its better robustness. We also provide the alignment error figures and heat maps of the estimated trajectories of PL-VINS and LRPL-VIO in Figure 2. For the alignment error figures, the smaller the translational error, the better the accuracy the VIO provides. For the heat maps, the difference between the estimated trajectory and the groundtruth poses is marked with different colors. Figure 2 thus further confirms the better robustness of LRPL-VIO compared with PL-VINS.

4.4. Real-World Performance

To test the performance of LRPL-VIO in real-world applications, we collected a custom dataset in a challenging indoor scene. A sensor suite with an Intel RealSense D455 camera (gray images, 30 Hz) and an Xsens MTi-680G IMU (inertial measurements, 200 Hz) is used as the collection platform. Two motion modes (normal and fast rotation) are applied to produce different evaluation sequences, which are shown in Figure 3a,b. The results are shown in Table 5.
From Table 5, it can be seen that LRPL-VIO performs pose estimation more accurately than HybVIO in these experiments: the RMSE ATE of LRPL-VIO is 35.4% lower in Lab_Normal and 26.5% lower in Lab_FastRotation. Fusing multiple feature types brings more constraints; thus, the estimated trajectories of LRPL-VIO are closer to the groundtruth poses, as Figure 3c–j illustrates more intuitively.

4.5. Runtime

To evaluate the real-time performance of LRPL-VIO, we divide it into three main modules, point processing (front end), line processing (front end), and VIO (back end), for convenience of comparison with PL-VIO and PL-VINS. The MH_04_difficult sequence of the EuRoC dataset is used for this test. The results are shown in Table 6.
As shown in Table 6, the LBD matcher and the heavy optimization back end are the most time-consuming modules of PL-VIO and PL-VINS. In contrast, the fast line matching method proposed in Section 3 gives our system high efficiency: the execution time of the line detection and tracking process in LRPL-VIO is much lower. In addition, our core pose estimation scheme is an efficient EKF with a unique feature selection scheme, which makes our total processing time for a single frame nearly three times shorter than that of PL-VINS.

5. Conclusions and Future Work

In this paper, a novel point–line visual–inertial odometry algorithm is proposed to address positioning issues in complex environments with weak texture and dynamic features. The short runtime of feature correspondence is maintained by a fast line matching method; thus, the whole system can work at a high frequency. A line feature selection scheme is utilized to further improve the efficiency of the core filter. Validation experiments on the EuRoC, UMA-VI, and VIODE datasets have shown the better performance and efficiency of our system compared with other SOTA open-source algorithms (HybVIO [9], VINS-Mono [7], PL-VIO [10], and PL-VINS [12]). In the future, we will try to introduce the structural constraints of 3D line features and plane features to further improve the accuracy.

Author Contributions

Conceptualization, F.Z. and L.S.; investigation, F.Z.; methodology, F.Z., L.Z. and L.S.; software, F.Z., W.L. and J.L.; writing—original draft preparation, F.Z.; writing—review and editing, F.Z., L.Z., W.L., J.L. and L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation under Grant 62173192 and Shenzhen Natural Science Foundation under Grant JCYJ20220530162202005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tourani, A.; Bavle, H.; Sanchez-Lopez, J.L.; Voos, H. Visual SLAM: What Are the Current Trends and What to Expect? Sensors 2022, 22, 9297. [Google Scholar] [CrossRef]
  2. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast Semi-Direct Monocular Visual Odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May 2014–7 June 2014; pp. 15–22. [Google Scholar]
  3. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
  4. Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 611–625. [Google Scholar] [CrossRef] [PubMed]
  5. Huang, G. Visual-Inertial Navigation: A Concise Review. In Proceedings of the 2019 IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 9572–9582. [Google Scholar]
  6. Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based Visual-Inertial Odometry Using Nonlinear Optimization. Int. J. Robot. Res. 2015, 34, 314–334. [Google Scholar] [CrossRef]
  7. Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020. [Google Scholar] [CrossRef]
  8. Mourikis, A.I.; Roumeliotis, S.I. A Multi-State Constraint Kalman Filter for Vision-Aided Inertial Navigation. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation (ICRA), Rome, Italy, 10–14 April 2007; pp. 3565–3572. [Google Scholar]
  9. Seiskari, O.; Rantalankila, P.; Kannala, J.; Ylilammi, J.; Rahtu, E.; Solin, A. HybVIO: Pushing the Limits of Real-time Visual-Inertial Odometry. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 701–710. [Google Scholar]
  10. He, Y.; Zhao, J.; Guo, Y.; He, W.; Yuan, K. PL-VIO: Tightly-Coupled Monocular Visual-Inertial Odometry using Point and Line Features. Sensors 2018, 18, 1159. [Google Scholar] [CrossRef]
  11. Von Gioi, R.G.; Jakubowicz, J.; Morel, J.M.; Randall, G. LSD: A Fast Line Segment Detector with A False Detection Control. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 32, 722–732. [Google Scholar] [CrossRef]
  12. Fu, Q.; Wang, J.; Yu, H.; Ali, I.; Guo, F.; He, Y.; Zhang, H. PL-VINS: Real-Time Monocular Visual-Inertial SLAM with Point and Line Features. arXiv 2020, arXiv:2009.07462. [Google Scholar]
  13. Li, W.; Cai, H.; Zhao, S.; Liu, Y.; Liu, C. A Fast Visual-Inertial Odometry Based on Line Midpoint Descriptor. Int. J. Autom. Comput. 2021, 18, 667–679. [Google Scholar] [CrossRef]
  14. Li, J.H.; Li, S.; Zhang, G.; Lim, J.; Chung, W.K.; Suh, I.H. Outdoor Place Recognition in Urban Environments Using Straight Lines. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 5550–5557. [Google Scholar]
  15. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary Robust Independent Elementary Features. In Proceedings of the 2010 European Conference on Computer Vision (ECCV), Heraklion, Crete, Greece, 5–11 September 2010; pp. 778–792. [Google Scholar]
  16. Kuang, Z.; Wei, W.; Yan, Y.; Li, J.; Lu, G.; Peng, Y.; Li, J.; Shang, W. A Real-time and Robust Monocular Visual Inertial SLAM System Based on Point and Line Features for Mobile Robots of Smart Cities Toward 6G. IEEE Open J. Commun. Soc. 2022, 3, 1950–1962. [Google Scholar] [CrossRef]
  17. Akinlar, C.; Topal, C. EDLines: A Real-time Line Segment Detector with A False Detection Control. Pattern Recognit. Lett. 2011, 32, 1633–1642. [Google Scholar] [CrossRef]
  18. Zheng, F.; Tsai, G.; Zhang, Z.; Liu, S.; Chu, C.C.; Hu, H. Trifo-VIO: Robust and Efficient Stereo Visual Inertial Odometry using Points and Lines. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3686–3693. [Google Scholar]
  19. Yang, Y.; Geneva, P.; Eckenhoff, K.; Huang, G. Visual-Inertial Navigation with Point and Line Features. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; p. 3. [Google Scholar]
  20. Zhang, L.; Koch, R. An Efficient and Robust Line Segment Matching Approach Based on LBD Descriptor and Pairwise Geometric Consistency. J. Vis. Commun. Image Represent. 2013, 24, 794–805. [Google Scholar] [CrossRef]
  21. Wei, H.; Tang, F.; Xu, Z.; Zhang, C.; Wu, Y. A Point-Line VIO System With Novel Feature Hybrids and With Novel Line Predicting-Matching. IEEE Robot. Automat. Lett. 2021, 6, 8681–8688. [Google Scholar] [CrossRef]
  22. Solin, A.; Cortes, S.; Rahtu, E.; Kannala, J. PIVO: Probabilistic Inertial-Visual Odometry for Occlusion-Robust Navigation. In Proceedings of the 2018 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 616–625. [Google Scholar]
  23. Titterton, D.; Weston, J.L. Strapdown Inertial Navigation Technology; The Institution of Electrical Engineers: Stevenage, UK, 2004; pp. 42–45. [Google Scholar]
  24. Uhlenbeck, G.E.; Ornstein, L.S. On The Theory of The Brownian Motion. Phys. Rev. 1930, 36, 823. [Google Scholar] [CrossRef]
  25. Shi, J.; Tomasi, C. Good Features To Track. In Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 21–23 June 1994; pp. 593–600. [Google Scholar]
  26. Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with An Application to Stereo Vision. In Proceedings of the 1981 International Joint Conference on Artificial Intelligence (IJCAI), Vancouver, BC, Canada, 24–28 August 1981; pp. 674–679. [Google Scholar]
  27. Kanatani, K.i. Analysis of 3-D Rotation Fitting. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 543–549. [Google Scholar] [CrossRef]
  28. Nistér, D. An Efficient Solution to The Five-Point Relative Pose Problem. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 756–770. [Google Scholar] [CrossRef]
  29. Zhang, G.; Lee, J.H.; Lim, J.; Suh, I.H. Building a 3-D Line-Based Map Using Stereo SLAM. IEEE Trans. Robot. 2015, 31, 1364–1377. [Google Scholar] [CrossRef]
  30. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003; pp. 322–323. [Google Scholar]
  31. Lee, S.; Hwang, S. Elaborate Monocular Point and Line SLAM with Robust Initialization. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1121–1129. [Google Scholar]
  32. Solin, A.; Cortes, S.; Rahtu, E.; Kannala, J. Inertial Odometry on Handheld Smartphones. In Proceedings of the 2018 International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; pp. 1–5. [Google Scholar]
  33. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC Micro Aerial Vehicle Datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
  34. Zuñiga-Noël, D.; Jaenal, A.; Gomez-Ojeda, R.; Gonzalez-Jimenez, J. The UMA-VI Dataset: Visual-Inertial Odometry in Low-textured and Dynamic Illumination Environments. Int. J. Robot. Res. 2020, 39, 1052–1060. [Google Scholar] [CrossRef]
  35. Minoda, K.; Schilling, F.; Wüest, V.; Floreano, D.; Yairi, T. VIODE: A Simulated Dataset to Address The Challenges of Visual-Inertial Odometry in Dynamic Environments. IEEE Robot. Automat. Lett. 2021, 6, 1343–1350. [Google Scholar] [CrossRef]
  36. Grupp, M. EVO: Python Package for The Evaluation of Odometry and SLAM. 2017. Available online: https://github.com/MichaelGrupp/evo (accessed on 3 January 2024).
  37. Engel, J.; Usenko, V.; Cremers, D. A Photometrically Calibrated Benchmark For Monocular Visual Odometry. arXiv 2016, arXiv:1607.02555. [Google Scholar]
Figure 1. The working flowchart of LRPL-VIO.
Figure 2. The pose estimation error of PL-VINS and LRPL-VIO on the UMA-VI and VIODE dataset. (a) The alignment error of PL-VINS in class_csc2. (b) The alignment error of PL-VINS in parking_csc2. (c) The RMSE ATE of PL-VINS in cd3_high. (d) The alignment error of LRPL-VIO in class_csc2. (e) The alignment error of LRPL-VIO in parking_csc2. (f) The RMSE ATE of LRPL-VIO in cd3_high.
Figure 3. The figures of real-world experiments. (a) An example image of sequence Lab_Normal. (b) An example image of sequence Lab_FastRotation. (c) The 3D error map of HybVIO in Lab_Normal. (d) The X-Y plane of 3D error map of HybVIO in Lab_Normal. (e) The 3D error map of HybVIO in Lab_FastRotation. (f) The X-Y plane of 3D error map of HybVIO in Lab_FastRotation. (g) The 3D error map of LRPL-VIO in Lab_Normal. (h) The X-Y plane of 3D error map of LRPL-VIO in Lab_Normal. (i) The 3D error map of LRPL-VIO in Lab_FastRotation. (j) The X-Y plane of 3D error map of LRPL-VIO in Lab_FastRotation.
Table 1. The results of the ablation experiment, evaluated using RMSE ATE in meters.

| Sequence         | HybVIO  | LRPL-VIO (LBD) | LRPL-VIO (All Line Track) | LRPL-VIO |
|------------------|---------|----------------|---------------------------|----------|
| MH_02_easy       | 0.213   | 0.186          | 0.160 ¹                   | 0.178 ²  |
| MH_03_medium     | 0.319   | 0.315          | 0.307 ²                   | 0.281 ¹  |
| MH_05_difficult  | 0.368   | 0.362          | 0.355 ¹                   | 0.358 ²  |
| V1_03_difficult  | 0.110 ¹ | 0.140          | 0.133                     | 0.118 ²  |
| V2_02_medium     | 0.127   | 0.077 ²        | 0.080                     | 0.076 ¹  |
| Mean             | 0.227   | 0.216          | 0.207 ²                   | 0.202 ¹  |
| Enhance          | 11%     | 6%             | 2%                        | -        |

¹ best result; ² second-best result.
Table 2. The results of the pose estimation accuracy test, evaluated using RMSE ATE in meters.

| Sequence         | PL-VIO  | VINS-Mono | HybVIO  | PL-VINS | LRPL-VIO |
|------------------|---------|-----------|---------|---------|----------|
| MH_01_easy       | 0.136 ¹ | 0.155 ²   | 0.288   | 0.157   | 0.212    |
| MH_02_easy       | 0.141 ¹ | 0.178     | 0.213   | 0.170 ² | 0.178    |
| MH_03_medium     | 0.264   | 0.194 ¹   | 0.319   | 0.227 ² | 0.281    |
| MH_04_difficult  | 0.363   | 0.364     | 0.218 ¹ | 0.275 ² | 0.218 ¹  |
| MH_05_difficult  | 0.276 ¹ | 0.303     | 0.368   | 0.288 ² | 0.358    |
| V1_01_easy       | 0.083   | 0.089     | 0.084   | 0.075 ² | 0.053 ¹  |
| V1_02_medium     | *       | 0.112     | 0.104 ² | 0.123   | 0.088 ¹  |
| V1_03_difficult  | 0.199   | 0.187     | 0.110 ¹ | 0.182   | 0.118 ²  |
| V2_01_easy       | 0.088   | 0.087     | 0.057 ¹ | 0.081   | 0.058 ²  |
| V2_02_medium     | 0.135   | 0.152     | 0.127   | 0.124 ² | 0.076 ¹  |
| V2_03_difficult  | 0.281   | 0.293     | 0.127 ¹ | 0.210   | 0.144 ²  |
| Mean             | 0.197   | 0.192     | 0.183   | 0.174 ² | 0.162 ¹  |

* means failure. ¹ best result; ² second-best result.
Table 3. The features of the selected challenging sequences.

| Dataset | Sequence        | Features                                      |
|---------|-----------------|-----------------------------------------------|
| UMA-VI  | class_csc2      | low texture, indoor–outdoor change            |
| UMA-VI  | parking_csc2    | low texture, dark scene, illumination change  |
| UMA-VI  | third_floor_eng | low texture, illumination change, fast motion |
| VIODE   | cd3_high        | dynamic objects                               |
| VIODE   | cn3_high        | dark scene, dynamic objects                   |
Table 4. The results of the robustness experiment. For evaluation, the alignment error in meters is calculated on the UMA-VI dataset and the RMSE ATE in meters is calculated on the VIODE dataset.

| Sequence        | PL-VIO  | VINS-Mono | HybVIO  | PL-VINS  | LRPL-VIO |
|-----------------|---------|-----------|---------|----------|----------|
| class_csc2      | *       | *         | 2.491 ² | 3.507    | 2.161 ¹  |
| parking_csc2    | *       | 7.869     | 5.101   | 4.216 ²  | 2.479 ¹  |
| third_floor_eng | *       | *         | *       | 14.511 ² | 4.929 ¹  |
| cd3_high        | 1.358   | 1.653     | 0.619 ² | 5.946    | 0.391 ¹  |
| cn3_high        | 0.861 ² | 0.639 ¹   | 0.971   | 1.225    | 0.864    |

* means failure. ¹ best result; ² second-best result.
Table 5. The results of the real-world experiments, evaluated using RMSE ATE in meters.

| Sequence         | HybVIO | LRPL-VIO | Enhance |
|------------------|--------|----------|---------|
| Lab_Normal       | 0.130  | 0.084 ¹  | 35.4%   |
| Lab_FastRotation | 0.215  | 0.158 ¹  | 26.5%   |

¹ best result.
Table 6. The results of the runtime analysis, evaluated in milliseconds.

| Algorithm | Point Processing | Line Processing | VIO  | Sum   |
|-----------|------------------|-----------------|------|-------|
| PL-VIO    | 22.1             | 115.2           | 28.7 | 166.0 |
| PL-VINS   | 15.0             | 32.0            | 46.0 | 93.0  |
| LRPL-VIO  | 9.5              | 8.8             | 12.2 | 30.5  |