ORBTSDF-SCNet: An Online 3D Reconstruction Method for Dynamic Scenes
Abstract: 3D reconstruction is an important way to produce 3D models, and reconstructing scenes that contain moving objects is currently a research hotspot: moving objects cause image smear, lower accuracy, or even reconstruction failure. To address this problem, this paper proposes a 3D reconstruction framework named ORBTSDF-SCNet, which combines SLAM (Simultaneous Localization and Mapping), TSDF (Truncated Signed Distance Function), and SCNet (Sample Consistency Networks) to reconstruct 3D scenes under moving-object interference. First, since a SLAM system outputs only a point cloud and cannot directly generate a 3D model, the framework introduces an online reconstruction method, ORBTSDF: a depth camera or binocular camera captures RGB-D images of the objects and scene, the tracking thread of ORB_SLAM2 estimates the camera pose in real time, and the TSDF surface reconstruction algorithm fuses the depth images into a 3D model. Second, to eliminate the interference of moving objects, the deep instance segmentation network SCNet detects and segments them, and optimization strategies reduce detection and instance segmentation errors as well as the alignment error between the depth map and the RGB map. Once the instances of moving objects are removed, the RGB-D images are passed back to the ORBTSDF module, producing a 3D reconstruction of the scene without moving objects. Comparative experiments on the ICL-NUIM and TUM datasets show the effectiveness of the proposed method.
Key words:
- 3D reconstruction /
- RGB-D SLAM /
- TSDF /
- SCNet
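The abstract outlines a per-frame loop: SCNet masks moving objects out of each RGB-D frame, the ORB_SLAM2 tracking thread supplies the camera pose, and TSDF fusion integrates the masked depth into the model. Below is a minimal sketch of that loop, assuming Open3D for the TSDF integration; the `mask` and `pose` inputs of the hypothetical `integrate_frame` helper stand in for the outputs of the SCNet segmentation network and the ORB_SLAM2 tracking thread, neither of which is shown here.

```python
import cv2
import numpy as np
import open3d as o3d

# Scalable TSDF volume for online fusion (voxel size and truncation are
# illustrative values, not the paper's settings).
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01,
    sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault
)

def integrate_frame(color, depth, mask, pose):
    """Fuse one RGB-D frame, ignoring pixels covered by moving objects.

    color: HxWx3 uint8, depth: HxW uint16, mask: HxW bool (SCNet instances
    of movable classes), pose: 4x4 camera-to-world matrix from ORB_SLAM2.
    """
    # Dilate the instance mask slightly; this is one way to absorb
    # segmentation boundary errors and depth/RGB alignment errors, in the
    # spirit of the optimization strategies described in the abstract.
    kernel = np.ones((15, 15), np.uint8)
    dilated = cv2.dilate(mask.astype(np.uint8), kernel).astype(bool)

    depth = depth.copy()
    depth[dilated] = 0  # zero depth is treated as invalid and not fused

    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(color),
        o3d.geometry.Image(depth),
        depth_scale=5000.0,  # TUM RGB-D encoding: 5000 units per metre
        convert_rgb_to_intensity=False,
    )
    # integrate() expects the world-to-camera extrinsic.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))
```

After the sequence is processed, `volume.extract_triangle_mesh()` yields the reconstructed model with the moving objects carved out.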
Table 1. Comparative results of ATE RMSE (mm) on the ICL-NUIM dataset

| Sequence | DS | RSV2 | MRS | KS | FTM | EF | OS2 | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| kt0 | 104 | 26 | 204 | 72 | 497 | 9 | 8 | 6 |
| kt1 | 29 | 8 | 228 | 5 | 9 | 9 | 162 | 74 |
| kt2 | 191 | 18 | 189 | 10 | 20 | 14 | 18 | 19 |
| kt3 | 152 | 433 | 1090 | 355 | 243 | 106 | 19 | 11 |
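Tables 1 through 5 report the root-mean-square error (RMSE) of the absolute trajectory error (ATE) or the relative pose error (RPE). As a reference for how the ATE numbers are obtained, here is a minimal sketch assuming both trajectories are already time-associated N x 3 position arrays; it first aligns them with the closed-form least-squares rigid transform (Horn's method), as the TUM benchmark tools do, and then takes the RMSE.

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE of the absolute trajectory error between two time-associated
    trajectories given as N x 3 arrays of camera positions."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    # Closed-form least-squares rigid alignment (Horn/Kabsch).
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    # Per-frame translational error, then RMSE (units follow the input;
    # the tables here are in millimetres).
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```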
Table 2. Comparative results of RMSE (mm) of different methods on the TUM dataset

| Metric | Sequence | OF | SM | SD |
| --- | --- | --- | --- | --- |
| ATE | walk_hsp | 572 | 143 | 55 |
| ATE | walk_static | 39 | 8 | 14 |
| ATE | walk_xyz | 667 | 29 | 25 |
| ATE | sit_hsp | 29 | 26 | 23 |
| ATE | sit_static | 8 | 7 | 7 |
| ATE | sit_xyz | 11 | 15 | 18 |
| ATE | average | 221 | 38 | 24 |
| RPE | walk_hsp | 39 | 125 | 61 |
| RPE | walk_static | 667 | 14 | 31 |
| RPE | walk_xyz | 29 | 38 | 20 |
| RPE | sit_hsp | 8 | 37 | 32 |
| RPE | sit_static | 11 | 15 | 10 |
| RPE | sit_xyz | 8 | 19 | 25 |
| RPE | average | 127 | 41 | 30 |
Table 3. Comparative results of RMSE (mm) of different instance segmentation networks on the TUM dataset

| Metric | Sequence | MR | CMR | HTC | Ours |
| --- | --- | --- | --- | --- | --- |
| ATE | walk_hsp | 391 | 265 | 182 | 143 |
| ATE | walk_static | 10 | 12 | 9 | 8 |
| ATE | walk_xyz | 45 | 29 | 33 | 29 |
| ATE | sit_hsp | 31 | 25 | 42 | 26 |
| ATE | sit_static | 7 | 8 | 8 | 7 |
| ATE | sit_xyz | 20 | 21 | 26 | 15 |
| ATE | average | 84 | 60 | 50 | 38 |
| RPE | walk_hsp | 463 | 351 | 126 | 125 |
| RPE | walk_static | 13 | 25 | 15 | 14 |
| RPE | walk_xyz | 55 | 58 | 39 | 38 |
| RPE | sit_hsp | 41 | 25 | 40 | 37 |
| RPE | sit_static | 10 | 10 | 10 | 15 |
| RPE | sit_xyz | 26 | 21 | 33 | 19 |
| RPE | average | 101 | 82 | 44 | 41 |
Table 4. Comparative results of RMSE (mm) on two sequences of the TUM dataset

| Metric | Sequence | SF | FF | PF | Ours |
| --- | --- | --- | --- | --- | --- |
| ATE | walk_static | 350 | 37 | 72 | 14 |
| ATE | walk_xyz | 510 | 210 | 41 | 25 |
| ATE | average | 430 | 124 | 57 | 19 |
| RPE | walk_static | 180 | 97 | 72 | 31 |
| RPE | walk_xyz | 680 | 290 | 130 | 20 |
| RPE | average | 430 | 194 | 101 | 25 |
Table 5. Comparative results of RMSE (mm) on six sequences of the TUM dataset

| Metric | Sequence | VS | EF | CF | MF | Ours |
| --- | --- | --- | --- | --- | --- | --- |
| ATE | walk_hsp | 739 | 209 | 803 | 106 | 55 |
| ATE | walk_static | 327 | 62 | 551 | 35 | 14 |
| ATE | walk_xyz | 874 | 216 | 696 | 104 | 25 |
| ATE | sit_hsp | 180 | 138 | 36 | 52 | 23 |
| ATE | sit_static | 29 | 9 | 11 | 21 | 7 |
| ATE | sit_xyz | 111 | 26 | 27 | 31 | 18 |
| ATE | average | 155 | 87 | 182 | 55 | 24 |
| RPE | walk_hsp | 335 | 163 | 400 | 93 | 61 |
| RPE | walk_static | 101 | 58 | 224 | 39 | 31 |
| RPE | walk_xyz | 335 | 163 | 400 | 93 | 20 |
| RPE | sit_hsp | 75 | 102 | 30 | 41 | 32 |
| RPE | sit_static | 24 | 10 | 11 | 17 | 10 |
| RPE | sit_xyz | 57 | 28 | 27 | 46 | 25 |
| RPE | average | 155 | 87 | 182 | 55 | 30 |
[1] 危双丰, 刘振彬, 赵江洪, et al. A Survey of SLAM-Based Indoor 3D Reconstruction[J]. 测绘科学 (Science of Surveying and Mapping), 2018, 43(7): 15-26.
[2] NEWCOMBE R A, IZADI S, HILLIGES O, et al. KinectFusion: Real-Time Dense Surface Mapping and Tracking[C]//2011 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR). New York: IEEE, 2011: 127-136.
[3] ENDRES F, HESS J, STURM J, et al. 3-D Mapping With an RGB-D Camera[J]. IEEE Transactions on Robotics, 2014, 30(1): 177-187. doi: 10.1109/TRO.2013.2279412
[4] WHELAN T, KAESS M, FALLON M, et al. Kintinuous: Spatially Extended KinectFusion[J/OL]. Robotics & Autonomous Systems, 2012. http://hdl.handle.net/1721.1/71756.
[5] WHELAN T, SALAS-MORENO R F, GLOCKER B, et al. ElasticFusion: Real-Time Dense SLAM and Light Source Estimation[J]. The International Journal of Robotics Research, 2016, 35(14): 1697-1716. doi: 10.1177/0278364916669237
[6] ZHANG T, ZHANG H, LI Y, et al. FlowFusion: Dynamic Dense RGB-D SLAM Based on Optical Flow[C]//2020 IEEE International Conference on Robotics and Automation (ICRA). Paris: IEEE, 2020: 7322-7328.
[7] SCONA R, JAIMEZ M, PETILLOT Y R, et al. StaticFusion: Background Reconstruction for Dense RGB-D SLAM in Dynamic Environments[C]//2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane: IEEE, 2018: 3849-3856.
[8] LONG R, RAUCH C, ZHANG T, et al. RigidFusion: Robot Localisation and Mapping in Environments With Large Dynamic Rigid Objects[J]. IEEE Robotics and Automation Letters, 2021, 6(2): 3703-3710. doi: 10.1109/LRA.2021.3066375
[9] GIRSHICK R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 2015: 1440-1448.
[10] HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[J/OL]. CoRR, 2017, abs/1703.06870. http://arxiv.org/abs/1703.06870.
[11] RUNZ M, BUFFIER M, AGAPITO L. MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects[C]//Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). New York: IEEE, 2018: 10-20.
[12] LI Y, ZHANG T, NAKAMURA Y, et al. SplitFusion: Simultaneous Tracking and Mapping for Non-Rigid Scenes[C]//2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). New York: IEEE, 2020: 5128-5134.
[13] MUR-ARTAL R, TARDÓS J D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras[J]. IEEE Transactions on Robotics, 2017, 33(5): 1255-1262. doi: 10.1109/TRO.2017.2705103
[14] VU T, KANG H, YOO C D. SCNet: Training Inference Sample Consistency for Instance Segmentation[C]//Thirty-Fifth AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 2701-2709.
[15] CAI Z, VASCONCELOS N. Cascade R-CNN: High Quality Object Detection and Instance Segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(5): 1483-1498.
[16] CHEN K, PANG J, WANG J, et al. Hybrid Task Cascade for Instance Segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos: IEEE Computer Society, 2019: 4969-4978.
[17] HANDA A, WHELAN T, MCDONALD J, et al. A Benchmark for RGB-D Visual Odometry, 3D Reconstruction and SLAM[C]//2014 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2014: 1524-1531.
[18] STURM J, MAGNENAT S, ENGELHARD N, et al. Towards a Benchmark for RGB-D SLAM Evaluation[C/OL]//RGB-D Workshop on Advanced Reasoning with Depth Cameras at Robotics: Science and Systems Conf. (RSS), 2011. https://hal.archives-ouvertes.fr/hal-01142608.
[19] KERL C, STURM J, CREMERS D. Dense Visual SLAM for RGB-D Cameras[C]//2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo: IEEE, 2013: 2100-2106.
[20] STÜCKLER J, BEHNKE S. Multi-Resolution Surfel Maps for Efficient Dense 3D Modeling and Tracking[J]. Journal of Visual Communication and Image Representation, 2014, 25(1): 137-147. doi: 10.1016/j.jvcir.2013.02.008
[21] RUNZ M, AGAPITO L. Co-Fusion: Real-Time Segmentation, Tracking and Fusion of Multiple Objects[C]//2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017: 4471-4478.
[22] ZHANG T, NAKAMURA Y. PoseFusion: Dense RGB-D SLAM in Dynamic Human Environments[C]//Proceedings of the 2018 International Symposium on Experimental Robotics. Cham: Springer International Publishing, 2020: 772-780.