  • ISSN 1006-3080
  • CN 31-1691/TQ

Identification Method of Cylinder in Laboratory Dangerous Scene Based on Image Caption

FU Xujia, ZHOU Jiale, GU Zhen, YAN Bingyong, WANG Huifeng

Citation: FU Xujia, ZHOU Jiale, GU Zhen, YAN Bingyong, WANG Huifeng. Identification Method of Cylinder in Laboratory Dangerous Scene Based on Image Caption[J]. Journal of East China University of Science and Technology. doi: 10.14135/j.cnki.1006-3080.20220124002

doi: 10.14135/j.cnki.1006-3080.20220124002
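Funding: Young Scientists Fund project (61906068); National Key Research and Development Program (2018YFC1803306)
Details
    About the author:

    FU Xujia (born 1996), male, from Luoyang, Henan; master's student; main research interests: image captioning and deep learning. E-mail: lyfuxujia@163.com

    Corresponding author:

    ZHOU Jiale, E-mail: zhou.jiale@ecust.edu.cn

  • CLC number: TP183; TP391.9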

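  • Abstract: Gas cylinders are common laboratory equipment. They are numerous, their hazards are well hidden, and accidents involving them cause severe harm, so cylinder supervision has long been a pain point in laboratory safety management. Video surveillance is an effective means of laboratory safety management, but the footage must be watched by dedicated staff whose skill levels vary, so there is no guarantee that they will recognize the hazardous information in the video frames. For laboratory cylinder scenes, this paper proposes, for the first time, an image caption generation method that combines object detection with text recognition; it identifies potentially dangerous information in cylinder scenes and alerts monitoring staff in text form. The method first extracts the features of the objects in the scene and of the text on the cylinder body, then maps these features into a multimodal embedding space, next uses a Transformer structure to generate the caption, and finally judges from the caption whether the scene is dangerous. Experimental results show that the captions generated by this method can effectively identify the dangerous items in laboratory cylinder scenes and the reasons for the danger.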

     

  • Figure 1. Structure diagram of the image caption model

    Figure 2. Improved Faster R-CNN structure

    Figure 3. Flow chart of the text detection network

    Figure 4. Text recognition network structure

    Figure 5. Multimodal fusion prediction based on Transformer

    Figure 6. Example of a cylinder image label

    Figure 7. Results of object detection in cylinder scenes

    Figure 8. Example of an image caption in a cylinder scene

    Table 1. Classification of dangerous cylinder scenes

    Class  Cause of danger
    I      The cylinder is not fixed
    II     Two cylinders can't be placed together

    Table 2. Selected network parameter settings

    Parameter  Batch size  Momentum  Decay  Learning rate
    Value      4           0.9       0.001  0.0005

    Table 3. Comparison of different object detection algorithms in cylinder scenes

    Scene  baseline  ResNet  FPN  AP                                   mAP
                                  cylinder  carrier  strap  cabinet
    I                             0.746     0.754    0.625  0.748     0.718
                                  0.759     0.751    0.628  0.774     0.728
                                  0.817     0.827    0.687  0.817     0.787
    II                            0.754     -        0.636  0.782     0.724
                                  0.767     -        0.641  0.821     0.743
                                  0.826     -        0.713  0.892     0.810
    III                           0.752     0.735    0.630  0.770     0.722
                                  0.763     0.730    0.634  0.812     0.734
                                  0.821     0.796    0.702  0.884     0.801
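
The mAP column in Table 3 appears to be the unweighted mean of the per-class AP values in the same row, with "-" entries left out. That reading is inferred from the numbers and is not stated on this page; the function name `mean_average_precision` is illustrative only. A minimal sketch under that assumption, using the first rows of scenes I and II:

```python
from statistics import mean


def mean_average_precision(ap_by_class):
    """Average per-class AP values, skipping classes with no AP (None).

    Assumption: mAP is the plain, unweighted mean over the classes that
    are present in the scene. This is inferred from Table 3, not stated.
    """
    present = [ap for ap in ap_by_class.values() if ap is not None]
    if not present:
        raise ValueError("no per-class AP values to average")
    return mean(present)


# First row of scene I and first row of scene II from Table 3.
scene_1 = {"cylinder": 0.746, "carrier": 0.754, "strap": 0.625, "cabinet": 0.748}
scene_2 = {"cylinder": 0.754, "carrier": None, "strap": 0.636, "cabinet": 0.782}

map_scene_1 = mean_average_precision(scene_1)
map_scene_2 = mean_average_precision(scene_2)
print(f"scene I mAP:  {map_scene_1:.3f}")
print(f"scene II mAP: {map_scene_2:.3f}")
```

Recomputing from the rounded AP values can differ from the tabulated mAP in the last digit, since the table was presumably computed from unrounded values.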

    Table 4. Rules for assigning positive and negative samples

    Sample class  Rule
    Positive      The candidate box has the highest IoU with the ground truth (GT), and the included angle is less than 15°
                  The IoU between the candidate box and the GT is greater than 0.7, and the included angle is less than 15°
    Negative      The IoU between the candidate box and the GT is less than 0.3
                  The IoU between the candidate box and the GT is greater than 0.7, and the included angle is greater than 15°
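
The rules in Table 4 can be sketched as a small labeling function. This is only an illustration: the function name, its parameters, and the `"ignored"` label for candidates that match no rule are assumptions, as is the choice to check the positive rules first, so that a highest-IoU candidate with a small angle counts as positive even when its IoU is low. The page does not say how such cases are handled.

```python
def label_candidate(iou, angle_diff_deg, is_best_match=False):
    """Label a rotated candidate box against one ground-truth (GT) box.

    iou            -- IoU between the candidate box and the GT box
    angle_diff_deg -- included angle between candidate and GT, in degrees
    is_best_match  -- True if this candidate has the highest IoU with the GT

    Returns "positive", "negative", or "ignored". The "ignored" outcome and
    the rule precedence are assumptions, not taken from Table 4.
    """
    small_angle = angle_diff_deg < 15.0

    # Positive rules: best match or IoU > 0.7, both requiring a small angle.
    if small_angle and (is_best_match or iou > 0.7):
        return "positive"

    # Negative rules: low overlap, or high overlap with a large angle.
    if iou < 0.3 or (iou > 0.7 and angle_diff_deg > 15.0):
        return "negative"

    # Everything else matches no rule in Table 4.
    return "ignored"


print(label_candidate(0.8, 5.0))    # high IoU, small angle
print(label_candidate(0.8, 30.0))   # high IoU, large angle
print(label_candidate(0.5, 5.0))    # mid IoU, matches no rule
```

A candidate with an IoU between 0.3 and 0.7 that is not the best match falls through both rule sets, which is why the sketch needs a third label.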

    Table 5. Experimental results of text detection and recognition in cylinder scenes

    Detection results    [image]  [image]  [image]
    Recognition results
      Text               CO2      OXYGEN   N2
      Confidence         0.807    0.917    0.772

    Table 6. Comparison between our method and other algorithms

    Algorithm       BLEU-1  BLEU-4  ROUGE  CIDEr
    Soft-Attention  0.630   0.248   -      0.653
    Adaptive        0.642   0.345   0.539  0.788
    Ours            0.792   0.572   0.724  1.068
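  • [1] He Liu, Shi Rongming, Chen Yan, Gao Weiyin. Analysis of gas cylinder management problems in university laboratories[J]. China Special Equipment Safety, 2021, 37(7): 51-54. doi: 10.3969/j.issn.1673-257X.2021.07.011
    [2] Tao Yahui, Feng Yuru. Discussion on gas cylinder management in university laboratories[J]. Chemical Enterprise Management, 2019(22): 10-12. doi: 10.3969/j.issn.1008-4800.2019.22.008
    [3] Hossain M D Z, Sohel F, Shiratuddin M F, et al. A comprehensive survey of deep learning for image captioning[J]. ACM Computing Surveys (CSUR), 2019, 51(6): 1-36.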
    [4] Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. USA: IEEE, 2015: 3156-3164.
    [5] Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// International conference on machine learning. France: PMLR, 2015: 2048-2057.
    [6] Lu J, Xiong C, Parikh D, et al. Knowing when to look: Adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. USA: IEEE, 2017: 375-383.
    [7] Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. USA: IEEE, 2018: 6077-6086.
    [8] Wang Z, Bao R, Wu Q, et al. Confidence-aware Non-repetitive Multimodal Transformers for TextCaps[C]// Proceedings of the AAAI Conference on Artificial Intelligence. USA: AAAI, 2021, 35(4): 2835-2843.
    [9] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
    [10] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. USA: IEEE, 2016: 770-778.
    [11] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]// Proceedings of the IEEE conference on computer vision and pattern recognition. USA: IEEE, 2017: 2117-2125.
    [12] Ma J, Shao W, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122. doi: 10.1109/TMM.2018.2818020
    [13] Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 39(11): 2298-2304.
    [14] Jin S, Jang H, Kim W. Improving Bidirectional LSTM-CRF model Of Sequence Tagging by using Ontology knowledge based feature[J]. Journal of intelligence and information systems, 2018, 24(1): 253-266.
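    [15] Chen Yingcheng, Chen Ning. Cover song identification model based on fusion of audio content and lyric text similarity[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2021, 47(1): 74-80.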
    [16] Joulin A, Grave É, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Spain: ACL, 2017: 427-431.
    [17] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. USA: MIT Press, 2017: 6000-6010.
    [18] Lin T Y, Maire M, Belongie S, et al. Microsoft coco: Common objects in context[C]//European conference on computer vision. Springer, Cham, Switzerland: IEEE, 2014: 740-755.
    [19] Neumann L, Matas J. Efficient scene text localization and recognition with local character refinement[C]//2015 13th international conference on document analysis and recognition (ICDAR). USA: IEEE, 2015: 746-750.
    [20] Sidorov O, Hu R, Rohrbach M, et al. Textcaps: a dataset for image captioning with reading comprehension[C]// European Conference on Computer Vision. Springer, Cham, UK: IEEE, 2020: 742-758.
    [21] Kingma D P, Ba J L. Adam: A method for stochastic optimization[C]//3rd International Conference on Learning Representations. USA: ICLR, 2015: 273-297.
    [22] Vedantam R, Lawrence Zitnick C, Parikh D. Cider: Consensus-based image description evaluation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2015: 4566-4575.
Publication history
  • Received: 2022-01-24
  • Published online: 2022-06-07
