    LI Yu, YUAN Qinglong, XU Shaoming, HE Jiapeng. Semantic Segmentation Method of Indoor Scene Based on Perceptual Attention and Lightweight Pyramid Fusion Network Model[J]. Journal of East China University of Science and Technology, 2023, 49(1): 116-127. DOI: 10.14135/j.cnki.1006-3080.20210928002

    Semantic Segmentation Method of Indoor Scene Based on Perceptual Attention and Lightweight Pyramid Fusion Network Model

    Abstract: Laboratory scene understanding is hampered by complex backgrounds and variable lighting. Exploiting the complementarity of RGB information and depth information in scene understanding, this paper proposes a perception attention and lightweight spatial pyramid fusion network model (Perception Attention and Lightweight Spatial Fusion Network, PLFNet). In the perceptual attention module, the RGB image and the depth image carry different weights in the network, and this weighting lets depth information assist the RGB information at multiple levels. In the lightweight spatial pyramid pooling module, cascaded atrous (dilated) convolutions are added; this aggregates multi-scale features effectively and reduces the number of parameters by about 92% compared with the traditional spatial pyramid pooling module, so that RGB and depth information are fused more thoroughly. Experiments on two public indoor-scene datasets show that the proposed model outperforms classic algorithms on both. Ablation experiments show that adding the perceptual attention module and the lightweight spatial pyramid pooling module increases the mean intersection over union (mIoU) by 4.3% and 3.5%, respectively. Finally, the model is tested on a biological laboratory dataset with more complex scenes, and the results show that it can effectively achieve scene understanding of the biological laboratory.
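
The ablation results above are reported as mean intersection over union. As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the abstract does not state the exact evaluation protocol used in the paper.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes present in pred or target.

    pred and target are equal-length sequences of integer class labels,
    one label per pixel (hypothetical toy format, not the paper's pipeline).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 3))  # per-class IoU is 1/3, 2/3, 1/2, so the mean is 0.5
```

In this reading, a gain of a few points in the metric corresponds to a better average overlap between predicted and ground-truth regions across classes, rather than to a gain in raw pixel accuracy.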

       
