Abstract:
To address the complex backgrounds and variable lighting encountered in laboratory scene understanding, this paper proposes PLFNet, a network model with perceptual attention and lightweight spatial fusion that exploits the complementary characteristics of RGB and depth image information. In the perceptual attention module, the RGB and depth images in the network are weighted so that depth information assists the RGB information at multiple levels. In the lightweight spatial pyramid pooling module, cascaded atrous (dilated) convolutions are added, which not only aggregates multi-scale features effectively but also reduces the number of parameters by about 92% compared with the traditional spatial pyramid pooling module, allowing RGB and depth information to be fused more fully. Experiments on two public indoor-scene datasets show that the proposed model outperforms several classic algorithms. Ablation experiments verify that the mean intersection over union (mIoU) of the model increases by 4.3% and 3.5%, respectively. Finally, a test on a biological laboratory dataset with more complex scenes shows that the model can effectively realize scene understanding in biological laboratories.
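
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean intersection over union is commonly computed for a segmentation result. It assumes predictions and ground truth are given as flattened lists of per-pixel class labels. The function name `mean_iou` and the sample labels are illustrative and not taken from the paper. Skipping classes that appear in neither list is one common convention and may differ from the evaluation protocol actually used.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Illustrative labels for six pixels and three classes (not real data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))
```

Here the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is approximately 0.5.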