  • ISSN 1006-3080
  • CN 31-1691/TQ

Semantic Segmentation Method of Indoor Scene Based on Perceptual Attention and Lightweight Pyramid Fusion Network Model

LI Yu, YUAN Qinglong, XU Shaoming, HE Jiapeng

Citation: LI Yu, YUAN Qinglong, XU Shaoming, HE Jiapeng. Semantic Segmentation Method of Indoor Scene Based on Perceptual Attention and Lightweight Pyramid Fusion Network Model[J]. Journal of East China University of Science and Technology (Natural Science Edition). doi: 10.14135/j.cnki.1006-3080.20210928002


doi: 10.14135/j.cnki.1006-3080.20210928002
Details
    Author biography:

    LI Yu (b. 1973), male, a native of Zhuji, Zhejiang Province; Ph.D., associate professor. His main research interest is signal and information processing. E-mail: liyu@ecust.edu.cn

  • CLC number: TP391

Semantic Segmentation Method of Indoor Scene Based on Perceptual Attention and Lightweight Pyramid Fusion Network Model

  • Abstract: To address the complex backgrounds and variable illumination encountered in laboratory scene understanding, and to exploit the complementarity of RGB and depth information, a perception attention and lightweight spatial pyramid fusion network (Perception Attention and Lightweight Spatial Fusion Network, PLFNet) is proposed. In the model's perception attention module, the different weights that RGB and depth images carry in the network are used to provide multi-level, weighted assistance of the RGB information by the depth information. In the lightweight spatial pyramid pooling module, cascaded atrous convolutions not only aggregate multi-scale features effectively but also reduce the parameter count by about 92% compared with the traditional spatial pyramid pooling module, allowing a fuller fusion of the RGB and depth information. Experiments on two public indoor-scene datasets show that the model outperforms classic algorithms on both. Ablation experiments show that the two modules raise the model's mean intersection over union by 4.3% and 3.5%, respectively. Finally, tests on a more complex biological-laboratory dataset show that the model can effectively achieve scene understanding of the biological laboratory.
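    To make the two modules concrete, a minimal PyTorch sketch of both ideas follows. This is an illustration under our own assumptions, not the paper's code: the class names, channel widths, reduction factor, and dilation rates are all hypothetical, and the attention block is modeled as squeeze-and-excitation-style channel gating [19] in which depth features produce the weights that assist the RGB stream.

```python
import torch
import torch.nn as nn

class PerceptionAttention(nn.Module):
    """Hypothetical sketch: depth-derived channel weights reweight RGB features,
    so depth information assists the RGB stream in a weighted manner."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global context of depth
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        w = self.gate(self.pool(depth))              # (B, C, 1, 1) weights
        return rgb + rgb * w                         # weighted depth assistance

class LightweightSPP(nn.Module):
    """Hypothetical sketch of a lightweight pyramid: a cascade of small atrous
    (dilated) 3x3 convolutions replaces the parallel full-width branches of a
    traditional ASPP, aggregating multi-scale features with far fewer weights."""
    def __init__(self, in_ch: int, mid_ch: int = 64, rates=(1, 2, 4, 8)):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.cascade = nn.ModuleList(
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.project = nn.Conv2d(mid_ch * (len(rates) + 1), in_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.reduce(x)
        feats = [y]
        for conv in self.cascade:                    # each stage feeds the next,
            y = conv(y)                              # growing the receptive field
            feats.append(y)                          # collect multi-scale features
        return self.project(torch.cat(feats, dim=1))
```

    Because every cascaded stage runs at a reduced width (64 channels here) instead of the full input width used by the parallel branches of a traditional ASPP, the weight count drops sharply; the roughly 92% saving quoted in the abstract depends on the paper's actual channel configuration.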


  • Figure  1.  Overall framework of the PLFNet network structure

    Figure  2.  Standard residual block

    Figure  3.  Standard residual block used in PLFNet

    Figure  4.  Perception attention module

    Figure  5.  Traditional ASPP module

    Figure  6.  Lightweight spatial pyramid pooling module

    Figure  7.  Encoder-decoder perception module

    Figure  8.  Visual comparison of the segmentation results of each stage of the proposed algorithm on the NYU-Depth V2 dataset

    Figure  9.  External structure of the camera

    Figure  10.  Internal structure of the camera

    Figure  11.  RGB checkerboard image

    Figure  12.  Infrared checkerboard image

    Figure  13.  Original RGB image

    Figure  14.  Original depth image

    Figure  15.  Depth image after registration

    Figure  16.  RGB image after registration

    Figure  17.  Comparison before and after depth image restoration

    Figure  18.  Test results of the biological laboratory scene

    Table  1.  Comparison of PA, MPA and MIoU of different algorithms on the NYU-Depth V2 dataset

    Algorithm              PA/%   MPA/%   MIoU/%
    Bayesian SegNet[28]    68.0   45.8    32.4
    3DGNN[29]              -      55.7    43.1
    Context[30]            70.0   53.6    40.6
    LSD-GF[31]             71.9   60.7    45.9
    CFN(VGG-16)[32]        -      -       41.7
    Dilated FCN[33]        62.6   47.1    32.3
    RefineNet-LW-152[34]   -      -       44.4
    TD2-PSP50[36]          -      55.2    43.5
    DACNN-DSPP[35]         61.7   38.7    28.0
    PLFNet                 72.2   60.8    46.2
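    The PA, MPA and MIoU columns in Tables 1-3 are the standard confusion-matrix metrics: pixel accuracy, mean per-class pixel accuracy, and mean intersection over union. Below is a minimal NumPy sketch of the three definitions (the function and variable names are ours, not the paper's):

```python
import numpy as np

def segmentation_scores(conf: np.ndarray):
    """conf[i, j] = number of pixels whose true class is i and predicted class is j."""
    tp = np.diag(conf).astype(float)         # correctly labelled pixels per class
    pa = tp.sum() / conf.sum()               # PA: fraction of all pixels correct
    mpa = np.nanmean(tp / conf.sum(axis=1))  # MPA: mean per-class accuracy
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp
    miou = np.nanmean(tp / union)            # MIoU: mean IoU over classes
    return pa, mpa, miou
```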

    Table  2.  Comparison of PA, MPA and MIoU of different algorithms on the SUN RGB-D dataset

    Algorithm              PA/%   MPA/%   MIoU/%
    Bayesian SegNet[28]    71.2   45.9    30.7
    FuseNet[17]            76.3   48.3    37.3
    SegNet[37]             72.6   44.8    31.9
    Context[30]            78.4   53.4    42.3
    CFN(VGG-16)[32]        -      -       42.5
    Depth-Aware[38]        -      53.5    42.0
    IndexNet[39]           -      -       33.5
    DACNN-DSPP[35]         72.9   42.0    32.5
    PLFNet                 79.6   57.9    45.5

    Table  3.  Impact of the two modules on the network's PA, MPA and MIoU

    Model     RGB   Depth   PAM   LSPP   PA/%   MPA/%   MIoU/%
    Model_0   ✓     ×       ×     ×      58.7   48.7    32.2
    Model_1   ✓     ✓       ×     ×      64.3   54.9    38.4
    Model_2   ✓     ✓       ✓     ×      69.3   57.3    42.7
    PLFNet    ✓     ✓       ✓     ✓      72.2   60.8    46.2

    Table  4.  Intrinsic parameters of the RGB and depth cameras

    Camera   fx           fy           cx          cy
    RGB      1045.91588   1046.39491   944.06530   542.89044
    Depth    362.07166    362.06795    257.70148   206.12828
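    Figures 13-16 illustrate registration of the depth image into the RGB frame, which is what the intrinsics in Table 4 enable via the pinhole model. Below is a minimal sketch of that reprojection, assuming an already-calibrated rotation R and translation t between the two cameras (the identity/zero defaults are placeholders, not values from the paper, and the lens distortion handled by Zhang calibration [41] is ignored):

```python
import numpy as np

# Intrinsic matrices built from Table 4
K_RGB = np.array([[1045.91588, 0.0, 944.06530],
                  [0.0, 1046.39491, 542.89044],
                  [0.0, 0.0, 1.0]])
K_DEPTH = np.array([[362.07166, 0.0, 257.70148],
                    [0.0, 362.06795, 206.12828],
                    [0.0, 0.0, 1.0]])

def register_depth_to_rgb(depth, R=np.eye(3), t=np.zeros(3),
                          rgb_shape=(1080, 1920)):
    """Reproject every valid depth pixel into the RGB image plane."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    m = z > 0                                    # keep measured pixels only
    pix = np.vstack([u.ravel()[m], v.ravel()[m], np.ones(m.sum())])
    pts = (np.linalg.inv(K_DEPTH) @ pix) * z[m]  # back-project to 3D points
    pts = R @ pts + t[:, None]                   # move into the RGB camera frame
    uvw = K_RGB @ pts                            # project onto the RGB plane
    u2 = np.round(uvw[0] / uvw[2]).astype(int)
    v2 = np.round(uvw[1] / uvw[2]).astype(int)
    out = np.zeros(rgb_shape)
    ok = (uvw[2] > 0) & (u2 >= 0) & (u2 < rgb_shape[1]) \
         & (v2 >= 0) & (v2 < rgb_shape[0])
    out[v2[ok], u2[ok]] = z[m][ok]               # nearest-pixel splat
    return out
```

    A nearest-pixel splat of this kind leaves holes in the registered depth map; artifacts of that sort are what the depth-image restoration of Figure 17 addresses (cf. guided anisotropic diffusion [42]).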
  • [1] GARCIA-GARCIA A, ORTS-ESCOLANO S, OPREA S, et al. A survey on deep learning techniques for image and video semantic segmentation[J]. Applied Soft Computing, 2018, 70: 41-65. doi: 10.1016/j.asoc.2018.05.018
    [2] CORTINHAL T, TZELEPIS G, AKSOY E E. SalsaNext: Fast, uncertainty-aware semantic segmentation of LiDAR point clouds[C]//International Symposium on Visual Computing. Cham: Springer, 2020: 207-222.
    [3] KANG M M, YANG H, GU X J, et al. Multi-band image semantic segmentation based on fusion path supervision[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2021, 47(2): 233-240.
    [4] TEICHMANN M, WEBER M, ZOELLNER M, et al. Multinet: Real-time joint semantic reasoning for autonomous driving[C]//2018 IEEE Intelligent Vehicles Symposium (IV). USA: IEEE, 2018: 1013-1020.
    [5] TESO-FZ-BETOÑO D, ZULUETA E, SÁNCHEZ-CHICA A, et al. Semantic segmentation to develop an indoor navigation system for an autonomous mobile robot[J]. Mathematics, 2020, 8(5): 855. doi: 10.3390/math8050855
    [6] MIYAMOTO R, ADACHI M, NAKAMURA Y, et al. Accuracy improvement of semantic segmentation using appropriate datasets for robot navigation[C]//2019 6th International Conference on Control, Decision and Information Technologies (CoDIT). Paris: IEEE, 2019: 1610-1615.
    [7] ALONSO I, RIAZUELO L, MURILLO A C. Mininet: An efficient semantic segmentation ConvNet for real-time robotic applications[J]. IEEE Transactions on Robotics, 2020, 36(4): 1340-1347. doi: 10.1109/TRO.2020.2974099
    [8] TANZI L, PIAZZOLLA P, PORPIGLIA F, et al. Real-time deep learning semantic segmentation during intra-operative surgery for 3D augmented reality assistance[J]. International Journal of Computer Assisted Radiology and Surgery, 2021, 16(9): 1435-1445. doi: 10.1007/s11548-021-02432-y
    [9] ZHANG H, HAN B, IP C Y, et al. Slimmer: Accelerating 3D semantic segmentation for mobile augmented reality[C]//2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). USA: IEEE, 2020: 603-612.
    [10] AN Z, XU X P, YANG J H, et al. Design of augmented reality head-up display system based on image semantic segmentation[J]. Acta Optica Sinica, 2018, 38(7): 0710004. doi: 10.3788/AOS201838.0710004
    [11] VALENTIN J P C, SENGUPTA S, WARRELL J, et al. Mesh based semantic modelling for indoor and outdoor scenes[C]// IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2013: 2067-2074.
    [12] SHAO L, HAN J, KOHLI P, et al. Computer Vision and Machine Learning with RGB-D Sensors[M]. UK: Springer International Publishing, 2014.
    [13] ZHOU T, FAN D P, CHENG M M, et al. RGB-D salient object detection: A survey[J]. Computational Visual Media, 2021, 7: 37-69. doi: 10.1007/s41095-020-0199-z
    [14] COUPRIE C, FARABET C, NAJMAN L, et al. Indoor semantic segmentation using depth information[C]//First International Conference on Learning Representations (ICLR 2013). [S. l.]: [s. n.], 2013: 1-8.
    [15] GUPTA S, GIRSHICK R, ARBELÁEZ P, et al. Learning rich features from RGB-D images for object detection and segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2014: 345-360.
    [16] HE J, ZHANG C Q, LI X Z, et al. Survey of research on multimodal fusion technology for deep learning[J]. Computer Engineering, 2020, 46(5): 1-11.
    [17] HAZIRBAS C, MA L, DOMOKOS C, et al. Fusenet: Incorporating depth into semantic segmentation via fusion-based CNN architecture[C]//Asian Conference on Computer Vision. Cham: Springer, 2016: 213-228.
    [18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2016: 770-778.
    [19] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2018: 7132-7141.
    [20] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Cham: Springer, 2018: 801-818.
    [21] RONNEBERGER O, FISCHER P, BROX T. U-net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
    [22] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from RGBD images[C]//European Conference on Computer Vision. Berlin, Heidelberg: Springer, 2012: 746-760.
    [23] SONG S, LICHTENBERG S P, XIAO J. Sun RGB-D: A RGB-D scene understanding benchmark suite[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2015: 567-576.
    [24] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2015: 3431-3440.
    [25] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
    [26] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[J]. Journal of Machine Learning Research, 2010, 9: 249-256.
    [27] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327. doi: 10.1109/TPAMI.2018.2858826
    [28] KENDALL A, BADRINARAYANAN V, CIPOLLA R. Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding[EB/OL]. (2015-11-09)[2021-08-30]. https://arxiv.org/pdf/1511.02680.pdf.
    [29] QI X, LIAO R, JIA J, et al. 3D graph neural networks for RGBD semantic segmentation[C]//Proceedings of the IEEE International Conference on Computer Vision. Italy: IEEE, 2017: 5199-5208.
    [30] LIN G, SHEN C, VAN DEN HENGEL A, et al. Exploring context with deep structured models for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(6): 1352-1366.
    [31] CHENG Y, CAI R, LI Z, et al. Locality-sensitive deconvolution networks with gated fusion for RGB-D indoor semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2017: 3029-3037.
    [32] LIN D, CHEN G, COHEN-OR D, et al. Cascaded feature network for semantic segmentation of RGB-D images[C]//Proceedings of the IEEE International Conference on Computer Vision. Italy: IEEE, 2017: 1311-1319.
    [33] KAMRAN S A, SABBIR A S. Efficient yet deep convolutional neural networks for semantic segmentation[C]//2018 International Symposium on Advanced Intelligent Informatics (SAIN). Indonesia: IEEE, 2018: 123-130.
    [34] NEKRASOV V, SHEN C, REID I. Light-weight RefineNet for real-time semantic segmentation[EB/OL]. (2018-10-08)[2021-09-20]. https://arxiv.org/abs/1810.03272.
    [35] YANG S J, QIU Z A, GAO X N, et al. RGBD semantic segmentation based on depth-sensitive spatial pyramid pooling[J]. Electronics Optics & Control, 2020, 27(12): 84-89. doi: 10.3969/j.issn.1671-637X.2020.12.018
    [36] HU P, CABA F, WANG O, et al. Temporally distributed networks for fast video semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2020: 8818-8827.
    [37] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
    [38] WANG W, NEUMANN U. Depth-aware CNN for RGB-D segmentation[C]//Proceedings of the European Conference on Computer Vision (ECCV). UK: Springer, 2018: 135-150.
    [39] LU H, DAI Y, SHEN C, et al. Index networks[EB/OL]. (2019-08-11)[2021-09-20]. https://arxiv.org/abs/1908.09895.
    [40] CUI W, ZHU M Y, YAN K, et al. Application of Kinect-based 3D reconstruction technology in 3D display[J]. Electronic Measurement Technology, 2017, 40(3): 113-116. doi: 10.3969/j.issn.1002-7300.2017.03.024
    [41] ZHANG Z. Flexible camera calibration by viewing a plane from unknown orientations[C]//Proceedings of the Seventh IEEE International Conference on Computer Vision. Greece: IEEE, 1999: 666-673.
    [42] LIU J, GONG X. Guided depth enhancement via anisotropic diffusion[C]//Pacific-Rim Conference on Multimedia. Cham: Springer, 2013: 408-417.
Figures (18) / Tables (4)
Publication history
  • Received: 2021-09-28
  • Accepted: 2021-12-06
  • Published online: 2022-04-16
