A Visual SLAM Robust against Dynamic Objects Based on Hybrid Semantic-Geometry Information
Abstract
1. Introduction
- (1) We propose a hybrid RGB-D SLAM that integrates two deep learning networks: DeepLabv3 for semantic segmentation and PWC-Net for optical flow prediction. By combining the geometry and semantic modules, the hybrid SLAM suppresses the interference of dynamic elements in complicated scenes while retaining enough static elements for accurate pose estimation.
- (2) We propose an efficient optimization strategy based on geometric relationships to synthesize the coarse moving area of the current frame. Leveraging the depth image and the matched keypoints, we employ bundle adjustment to compute the initial transform matrix and to smooth the coarse optical flow generated by PWC-Net. We then apply the k-means algorithm to cluster the regions with relatively high flow values and refine the result with an epipolar check. Additionally, as a by-product of the optical flow residuals, our mechanism provides the speed of common dynamic instances, in particular the person studied in this paper.
- (3) To verify the effectiveness of the proposed method, we performed a systematic evaluation on both benchmark sequences and datasets recorded by our experimental quadcopter. Compared with state-of-the-art SLAM systems (e.g., ORB-SLAM2) and other prominent dynamic SLAM systems, our approach demonstrated superior robustness and accuracy in challenging scenarios.
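The geometry step of contribution (2) can be illustrated with a minimal sketch. The NumPy code below clusters per-pixel optical-flow residual magnitudes with a small 1-D k-means and returns the coarse moving area; the function names, the two-cluster choice, and the "highest-centre cluster is dynamic" rule are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20):
    """Minimal 1-D k-means used to split flow-residual magnitudes."""
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign each value to its nearest centre, then recompute centres.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for i in range(k):
            if np.any(labels == i):
                centers[i] = values[labels == i].mean()
    return labels, centers

def dynamic_mask_from_flow_residual(flow, rigid_flow, k=2):
    """Coarse moving area: residual = observed flow minus the flow explained
    by camera motion; the cluster with the largest mean residual is treated
    as dynamic."""
    residual = np.linalg.norm(flow - rigid_flow, axis=-1)   # (H, W)
    labels, centers = kmeans_1d(residual.reshape(-1), k)
    dynamic_cluster = int(np.argmax(centers))
    return (labels == dynamic_cluster).reshape(residual.shape)
```

For instance, a synthetic frame in which only a small patch moves relative to the camera-induced flow yields a mask covering exactly that patch.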
2. Overview of the Framework
3. Methodology
3.1. Optical Flow Residual Clustering
3.2. Geometric Segmentation
- Points clustered to the static background but exhibiting large epipolar distances should be judged dynamic through the joint cost function.
- Some points lack a valid epipolar distance yet have strong optical flow residuals; theoretically, these also belong to the geometry mask.
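The epipolar check above can be sketched as follows. `epipolar_distance` is a hypothetical helper that measures, for each matched pair, the distance from the current-frame point to the epipolar line induced by the fundamental matrix F (which the system would estimate from the camera motion):

```python
import numpy as np

def epipolar_distance(F, pts1, pts2):
    """Point-to-epipolar-line distance for matched keypoints.
    F: (3, 3) fundamental matrix mapping frame-1 points to frame-2 lines.
    pts1, pts2: (N, 2) pixel coordinates in the previous/current frame."""
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])      # homogeneous coordinates
    x2 = np.hstack([pts2, ones])
    lines = x1 @ F.T                  # epipolar lines l = F x1 in image 2
    num = np.abs(np.sum(x2 * lines, axis=1))          # |x2^T l|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / den
```

As a sanity check: for a purely x-translating camera the epipolar lines are horizontal, so a static point keeps its row coordinate (distance 0), while a point that drifts vertically accumulates exactly its vertical displacement as epipolar distance.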
3.3. Semantic Segmentation
3.4. Outlier Rejection
Algorithm 1 Outlier Rejection Algorithm
Output: local feature map M
for … do
    if … then
        instance count …
        if … then
            dynamic count …
        end if
    end if
end for
if … then
    …
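Algorithm 1's per-instance vote can be sketched in Python. Here `ratio_thresh` is an assumed parameter (the excerpt does not give the paper's exact threshold), and the instance and dynamic masks are assumed to come from DeepLabv3 and the geometry module, respectively:

```python
import numpy as np

def reject_outlier_instances(instance_mask, dynamic_mask, ratio_thresh=0.5):
    """Per-instance vote (sketch of Algorithm 1): a semantic instance whose
    share of geometrically dynamic pixels exceeds ratio_thresh is rejected
    as a whole, so its keypoints never enter tracking.
    instance_mask: (H, W) int labels, 0 = background.
    dynamic_mask:  (H, W) bool mask from the geometry module.
    Returns the set of instance ids judged dynamic."""
    dynamic_ids = set()
    for inst in np.unique(instance_mask):
        if inst == 0:
            continue                                   # skip background
        region = instance_mask == inst
        instance_count = region.sum()                  # pixels in instance
        dynamic_count = np.logical_and(region, dynamic_mask).sum()
        if dynamic_count / instance_count > ratio_thresh:
            dynamic_ids.add(int(inst))
    return dynamic_ids
```

Voting per instance rather than per pixel is what lets the semantic prior override noisy geometric evidence: a person who momentarily stands still is still rejected as one object once enough of their pixels vote dynamic.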
4. Experiments
Algorithm 2 Dynamic keypoint filtering within the original ORB-SLAM2 system
Input: image sequence H, depth sequence D, frames …
Output: local feature frame …
for … in H do
    … = Img_pairs_predict from PWC-Net
    … = Img_predict from DeepLabv3
    …
    Optimization: …
    …
    for each matched pair in … do
        …
        if … then
            append … to …
        end if
    end for
end for
execute the Outlier Rejection Algorithm (Algorithm 1)
final …
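A condensed, hypothetical Python rendering of Algorithm 2's per-frame loop. Network inference and the dynamic test are abstracted behind callables, so this is a structural sketch rather than the paper's implementation:

```python
def filter_dynamic_keypoints(frames, depths, flow_net, seg_net,
                             keypoint_fn, is_dynamic_fn):
    """Structural sketch of Algorithm 2: for each consecutive frame pair,
    predict dense optical flow (PWC-Net) and a semantic mask (DeepLabv3),
    then keep only the keypoints that pass both the geometric and the
    semantic dynamic tests. All callables are hypothetical stand-ins."""
    static_keypoints = []
    for prev, curr, depth in zip(frames, frames[1:], depths[1:]):
        flow = flow_net(prev, curr)       # dense optical flow prediction
        sem_mask = seg_net(curr)          # per-pixel semantic labels
        for kp in keypoint_fn(curr):      # ORB keypoints of current frame
            if not is_dynamic_fn(kp, flow, sem_mask, depth):
                static_keypoints.append(kp)
    return static_keypoints
```

Because the filter sits in front of ORB-SLAM2's tracking thread, the rest of the pipeline (local mapping, loop closing) runs unmodified on the surviving static keypoints.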
4.1. Experiment on Public Datasets
4.2. Comparison with Other Dynamic SLAM Systems
4.3. Robustness Test in Real Environments
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kanellakis, C.; Nikolakopoulos, G. Survey on Computer Vision for UAVs: Current Developments and Trends. J. Intell. Robot. Syst. 2017, 87, 141–168.
- Liu, Y.; Wang, C. Hybrid real-time stereo visual odometry for unmanned aerial vehicles. Opt. Eng. 2018, 57, 073104.
- Deng, J.; Wu, S.; Zhao, H.; Cai, D. Measurement model and observability analysis for optical flow-aided inertial navigation. Opt. Eng. 2019, 58, 083102.
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067.
- Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-Scale Direct Monocular SLAM. In Proceedings of the 13th European Conference on Computer Vision, Zürich, Switzerland, 6–12 September 2014; pp. 834–849.
- Engel, J.; Koltun, V.; Cremers, D. Direct Sparse Odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 611–625.
- Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014.
- Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
- Sun, Y.; Liu, M.; Meng, M.Q. Motion removal from moving platforms: An RGB-D data-based motion detection, tracking and segmentation approach. In Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China, 6–9 December 2015; pp. 1377–1382.
- Wang, R.; Wan, W.; Wang, Y.; Di, K. A New RGB-D SLAM Method with Moving Object Detection for Dynamic Indoor Scenes. Remote Sens. 2019, 11, 1143.
- Zhang, T.; Zhang, H.; Li, Y.; Nakamura, Y.; Zhang, L. FlowFusion: Dynamic Dense RGB-D SLAM Based on Optical Flow. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 7322–7328.
- Cheng, J.; Sun, Y.; Meng, M.Q. Improving monocular visual SLAM in dynamic environments: An optical-flow-based approach. Adv. Robot. 2019, 33, 576–589.
- Scona, R.; Jaimez, M.; Petillot, Y.R.; Fallon, M.; Cremers, D. StaticFusion: Background Reconstruction for Dense RGB-D SLAM in Dynamic Environments. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 3849–3856.
- Li, S.; Lee, D. RGB-D SLAM in dynamic environments using static point weighting. IEEE Robot. Autom. Lett. 2017, 2, 2263–2270.
- Dai, W.; Zhang, Y.; Li, P.; Fang, Z.; Scherer, S. RGB-D SLAM in Dynamic Environments Using Point Correlations. IEEE Trans. Pattern Anal. Mach. Intell. 2020.
- Yu, C.; Liu, Z.; Liu, X.-J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174.
- Bescos, B.; Facil, J.M.; Civera, J.; Neira, J. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes. IEEE Robot. Autom. Lett. 2018, 3, 4076–4083.
- Zhong, F.; Wang, S.; Zhang, Z.; Chen, C.; Wang, Y. Detect-SLAM: Making Object Detection and SLAM Mutually Beneficial. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1001–1010.
- Lv, X.; Wang, B.; Ye, D.; Wang, S. Semantic Flow-guided Motion Removal Method for Robust Mapping. arXiv 2020, arXiv:2010.06876.
- Li, A.; Wang, J.; Xu, M.; Chen, Z. DP-SLAM: A visual SLAM with moving probability towards dynamic environments. Inf. Sci. 2021, 556, 128–142.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Sun, D.; Yang, X.; Liu, M.-Y.; Kautz, J. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8934–8943.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An Open-Source Robot Operating System. Available online: http://robotics.stanford.edu/~ang/papers/icraoss09-ROS.pdf (accessed on 30 June 2021).
- Farnebäck, G. Two-frame motion estimation based on polynomial expansion. In Proceedings of the 13th Scandinavian Conference on Image Analysis (SCIA 2003), Halmstad, Sweden, 29 June–2 July 2003; pp. 363–370.
- Bouguet, J.Y. Pyramidal implementation of the affine Lucas-Kanade feature tracker: Description of the algorithm. Intel Corp. 2001, 5, 4.
- Fernando, W.; Udawatta, L.; Pathirana, P. Identification of moving obstacles with pyramidal Lucas-Kanade optical flow and k-means clustering. In Proceedings of the 2007 Third International Conference on Information and Automation for Sustainability, Melbourne, Australia, 4–6 December 2007; pp. 111–117.
- Gujunoori, S.; Oruganti, M. Tracking and Size Estimation of Objects in Motion using Optical Flow and K-means Clustering. In Proceedings of the 2017 2nd International Conference on Emerging Computation and Information Technologies (ICECIT), Tumakuru, India, 15–16 December 2017; pp. 1–6.
- Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 2009, 81, 155.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 573–580.
| Category | Sequence | ORB-SLAM2 Median | ORB-SLAM2 RMSE | ORB-SLAM2 Std | Hybrid SLAM Median | Hybrid SLAM RMSE | Hybrid SLAM Std | Improvement Median | Improvement RMSE | Improvement Std |
|---|---|---|---|---|---|---|---|---|---|---|
| Low dynamic | fr3/s/xyz | 0.0068 | 0.0085 | 0.0043 | 0.0078 | 0.0100 | 0.0050 | −14.71% | −17.65% | −16.28% |
| Low dynamic | fr3/s/static | 0.0067 | 0.0084 | 0.0039 | 0.0046 | 0.0060 | 0.0030 | 31.34% | 28.57% | 23.08% |
| Low dynamic | fr3/s/rpy | 0.0132 | 0.0213 | 0.0130 | 0.0115 | 0.0160 | 0.0085 | 12.88% | 24.88% | 34.62% |
| Low dynamic | fr3/s/halfsphere | 0.0136 | 0.0207 | 0.0132 | 0.0113 | 0.0137 | 0.0061 | 16.91% | 33.82% | 53.79% |
| High dynamic | fr3/w/xyz | 0.4598 | 0.6711 | 0.3752 | 0.0114 | 0.0152 | 0.0075 | 97.52% | 97.74% | 98.00% |
| High dynamic | fr3/w/static | 0.2812 | 0.3948 | 0.1650 | 0.0052 | 0.0069 | 0.0034 | 98.15% | 98.25% | 97.94% |
| High dynamic | fr3/w/rpy | 0.5655 | 0.7005 | 0.2866 | 0.0781 | 0.1050 | 0.0504 | 86.19% | 85.01% | 82.41% |
| High dynamic | fr3/w/halfsphere | 0.3512 | 0.4320 | 0.1764 | 0.0208 | 0.0280 | 0.0144 | 94.08% | 93.52% | 91.84% |
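The Improvement columns follow the usual relative-error reduction against the ORB-SLAM2 baseline; for example, the fr3/w/xyz RMSE entry:

```python
def improvement(baseline, ours):
    """Relative improvement of the hybrid system over the baseline, in
    percent; negative values mean the baseline was better."""
    return (baseline - ours) / baseline * 100.0

# fr3/w/xyz RMSE from the table above: 0.6711 -> 0.0152
print(round(improvement(0.6711, 0.0152), 2))  # 97.74
```

The same formula reproduces the negative entries for the low-dynamic sequences, where removing features can slightly hurt an already-static scene.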
| Category | Sequence | ORB-SLAM2 Median | ORB-SLAM2 RMSE | ORB-SLAM2 Std | Hybrid SLAM Median | Hybrid SLAM RMSE | Hybrid SLAM Std | Improvement Median | Improvement RMSE | Improvement Std |
|---|---|---|---|---|---|---|---|---|---|---|
| Low dynamic | fr3/s/xyz | 0.0064 | 0.0082 | 0.0041 | 0.0064 | 0.0086 | 0.0046 | 0.00% | −4.88% | −12.20% |
| Low dynamic | fr3/s/static | 0.0040 | 0.0056 | 0.0029 | 0.0033 | 0.0045 | 0.0024 | 17.50% | 19.64% | 17.24% |
| Low dynamic | fr3/s/rpy | 0.0076 | 0.0126 | 0.0082 | 0.0072 | 0.0105 | 0.0060 | 5.26% | 16.67% | 26.83% |
| Low dynamic | fr3/s/halfsphere | 0.0060 | 0.0082 | 0.0045 | 0.0068 | 0.0090 | 0.0047 | −13.33% | −9.76% | −4.44% |
| High dynamic | fr3/w/xyz | 0.0176 | 0.2726 | 0.0164 | 0.0077 | 0.0113 | 0.0064 | 56.25% | 95.85% | 60.98% |
| High dynamic | fr3/w/static | 0.0053 | 0.0137 | 0.1074 | 0.0037 | 0.0056 | 0.0033 | 30.19% | 59.12% | 96.93% |
| High dynamic | fr3/w/rpy | 0.0162 | 0.0493 | 0.0437 | 0.0126 | 0.0237 | 0.0162 | 22.22% | 51.93% | 62.93% |
| High dynamic | fr3/w/halfsphere | 0.0136 | 0.0372 | 0.0320 | 0.0084 | 0.0135 | 0.0083 | 38.24% | 63.71% | 74.06% |
| Category | Sequence | ORB-SLAM2 Median | ORB-SLAM2 RMSE | ORB-SLAM2 Std | Hybrid SLAM Median | Hybrid SLAM RMSE | Hybrid SLAM Std | Improvement Median | Improvement RMSE | Improvement Std |
|---|---|---|---|---|---|---|---|---|---|---|
| Low dynamic | fr3/s/xyz | 0.0058 | 0.0079 | 0.0041 | 0.0058 | 0.0077 | 0.0040 | 0.00% | 2.53% | 2.44% |
| Low dynamic | fr3/s/static | 0.0030 | 0.0041 | 0.0021 | 0.0025 | 0.0037 | 0.0020 | 16.67% | 9.76% | 4.76% |
| Low dynamic | fr3/s/rpy | 0.0071 | 0.0099 | 0.0056 | 0.0071 | 0.0092 | 0.0045 | 0.00% | 7.07% | 19.64% |
| Low dynamic | fr3/s/halfsphere | 0.0065 | 0.0089 | 0.0046 | 0.0067 | 0.0089 | 0.0045 | −3.08% | 0.00% | 2.17% |
| High dynamic | fr3/w/xyz | 0.0100 | 0.0157 | 0.0096 | 0.0057 | 0.0095 | 0.0067 | 43.00% | 39.49% | 30.21% |
| High dynamic | fr3/w/static | 0.0040 | 0.0070 | 0.0047 | 0.0031 | 0.0041 | 0.0022 | 22.50% | 41.43% | 53.19% |
| High dynamic | fr3/w/rpy | 0.0107 | 0.0196 | 0.0143 | 0.0085 | 0.0135 | 0.0082 | 20.56% | 31.12% | 42.66% |
| High dynamic | fr3/w/halfsphere | 0.1001 | 0.0240 | 0.0204 | 0.0072 | 0.0098 | 0.0052 | 92.81% | 59.17% | 74.51% |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Miao, S.; Liu, X.; Wei, D.; Li, C. A Visual SLAM Robust against Dynamic Objects Based on Hybrid Semantic-Geometry Information. ISPRS Int. J. Geo-Inf. 2021, 10, 673. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi10100673