
  • ISSN 1006-3080
  • CN 31-1691/TQ

Action Detection Based on Region Spatiotemporal Two-in-One Network

TANG Qiang, ZHU Yu, ZHENG Bingbing, ZHENG Jie

Citation: TANG Qiang, ZHU Yu, ZHENG Bingbing, ZHENG Jie. Action Detection Based on Region Spatiotemporal Two-in-One Network[J]. Journal of East China University of Science and Technology. doi: 10.14135/j.cnki.1006-3080.20201126004

doi: 10.14135/j.cnki.1006-3080.20201126004
Article information
    About the author:

    TANG Qiang (1995—), male, born in Suqian, Jiangsu; master's student. His main research interests are video understanding and image processing. E-mail: TQ1508420095@163.com

    Corresponding author:

    ZHU Yu, E-mail: zhuyu@ecust.edu.cn

  • CLC number: TP391.4

  • Abstract: Video action detection builds on action recognition to further determine where and when an action occurs. Combining an RGB spatial stream with an optical-flow temporal stream, this paper proposes an SSD-based region spatiotemporal two-in-one action detection network. The non-local spatiotemporal module is improved: a pixel selector is designed on the optical flow to extract key motion regions, and correlations are computed only for the selected key action regions in the spatial stream. This effectively captures long-range dependencies of actions, alleviates the high computational cost of the non-local module, and reduces interference from video background noise. Experiments on the benchmark dataset UCF101-24 show that the proposed region spatiotemporal two-in-one network achieves better detection performance, with the video-level mean average precision (video_mAP) reaching 43.17% at an IoU threshold of 0.5.
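
To make the region-restricted non-local idea in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code: the optical-flow magnitude is thresholded by a pixel selector to pick key-motion positions, and the non-local correlations over the spatial-stream feature map are restricted to those positions. The class name RegionNonLocal, the threshold tau, and the embedded-Gaussian form of the block are illustrative assumptions.

    import torch
    import torch.nn as nn

    class RegionNonLocal(nn.Module):
        """Non-local block whose correlations are restricted to flow-selected key regions."""

        def __init__(self, channels: int, tau: float = 0.5):
            super().__init__()
            inter = channels // 2
            self.theta = nn.Conv2d(channels, inter, kernel_size=1)
            self.phi = nn.Conv2d(channels, inter, kernel_size=1)
            self.g = nn.Conv2d(channels, inter, kernel_size=1)
            self.out = nn.Conv2d(inter, channels, kernel_size=1)
            self.tau = tau  # pixel-selector threshold on the flow magnitude

        def forward(self, x_rgb: torch.Tensor, flow_mag: torch.Tensor) -> torch.Tensor:
            # x_rgb:    (B, C, H, W) feature map from the RGB spatial stream
            # flow_mag: (B, 1, H, W) optical-flow magnitude at the same resolution
            b, _, h, w = x_rgb.shape
            key = flow_mag.flatten(2) > self.tau                 # (B, 1, H*W) key-motion positions
            theta = self.theta(x_rgb).flatten(2)                 # (B, C', H*W)
            phi = self.phi(x_rgb).flatten(2)                     # (B, C', H*W)
            g = self.g(x_rgb).flatten(2).transpose(1, 2)         # (B, H*W, C')
            logits = theta.transpose(1, 2) @ phi                 # (B, H*W, H*W) pairwise correlations
            # keep only columns at key positions (assumes at least one key position per sample)
            logits = logits.masked_fill(~key, float("-inf"))
            attn = torch.softmax(logits, dim=-1)
            y = (attn @ g).transpose(1, 2).reshape(b, -1, h, w)  # aggregate the selected regions
            return x_rgb + self.out(y)                           # residual connection, as in non-local networks

    # Toy usage: random tensors stand in for an SSD feature map and its flow magnitude.
    feat = torch.randn(2, 256, 38, 38)
    flow = torch.rand(2, 1, 38, 38)
    print(RegionNonLocal(256)(feat, flow).shape)  # torch.Size([2, 256, 38, 38])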

     

  • Figure  1.  Region spatiotemporal two-in-one network structure

    Figure  2.  Region spatiotemporal two-in-one module structure

    Figure  3.  Schematic diagram of the mark selector

    Figure  4.  Example of the "skiing" action in the dataset

    Figure  5.  Visualization of feature maps before and after the region spatiotemporal two-in-one module

    Figure  6.  Detection results of the proposed network on examples from the UCF101-24 dataset

    Table  1.  Comparison of per-class frame_AP on UCF101-24 at an IoU threshold of 0.5

    Class                  frame_AP/% (SSD)   frame_AP/% (This paper)   Δ(diff)
    Basketball             28.91              32.37                      3.46
    Basketball_dunk        49.90              49.61                     -0.29
    Biking                 78.36              78.27                     -0.09
    Cliff_diving           50.19              57.95                      7.76
    Cricket_bowling        27.68              31.44                      3.76
    Diving                 78.97              80.70                      1.73
    Fencing                87.95              88.16                      0.21
    Floor_gymnastics       83.38              85.44                      2.06
    Golf_swing             43.44              44.83                      1.39
    Horse_riding            88.57              88.41                     -0.16
    Ice_dancing            71.61              72.40                      0.79
    Long_jump              56.77              59.44                      2.67
    Pole_vault             55.04              56.72                      1.68
    Rope_climbing          81.36              82.12                      0.76
    Salsa_spin             69.26              69.01                     -0.25
    Skate_boarding         68.63              71.71                      3.08
    Skiing                 68.09              77.73                      9.64
    Skijet                 84.44              87.45                      3.01
    Soccer_juggling        79.97              80.14                      0.17
    Surfing                82.88              86.50                      3.62
    Tennis_swing           37.26              37.18                     -0.08
    Trampoline_jumping     60.63              60.54                     -0.09
    Volleyball_spiking     35.51              36.50                      0.99
    Walking_with_dog       74.26              74.44                      0.18
    frame_mAP              64.29              66.21                      1.92
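
The IoU threshold in Table 1 refers to the spatial overlap test that frame_AP is commonly built on: a frame-level detection counts as a true positive only if its predicted class matches and its box overlaps the ground-truth box with IoU at or above the threshold (0.5 here). Below is a minimal, generic sketch of that test, not the paper's evaluation code.

    # Spatial IoU of two boxes given as (x1, y1, x2, y2) in pixels.
    def box_iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    # At threshold 0.5, only sufficiently overlapping detections are accepted.
    print(box_iou((10, 10, 60, 60), (30, 30, 80, 80)))  # ~0.22 -> rejected at 0.5
    print(box_iou((10, 10, 60, 60), (15, 15, 65, 65)))  # ~0.68 -> accepted at 0.5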

    Table  2.  Comparison of video_mAP of different algorithms on UCF101-24

    Algorithm            video_mAP/% (IoU threshold)
                         0.20      0.50      0.75      0.50:0.95
    Literature [9]       71.80     35.90      1.60      8.80
    Literature [10]      69.8      40.9      15.5      18.7
    Literature [15]      66.70     35.90      7.90     14.40
    Literature [16]      73.5      37.8       -         -
    Literature [17]      71.53     40.07     13.91     17.90
    Literature [18]      56.7      36.6       -         -
    Literature [19]      72.9      41.4       -         -
    This paper           74.22     43.17     14.82     19.05
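
The video_mAP thresholds in Table 2 (0.20, 0.50, 0.75, and the 0.50:0.95 average) are commonly applied to a spatio-temporal tube IoU: the temporal IoU of the two action tubes multiplied by the mean per-frame spatial IoU over the frames they share. The sketch below is a hedged illustration of that metric; the tube format {frame_index: (x1, y1, x2, y2)} is an assumption, not the authors' data structure.

    # Spatio-temporal IoU of two action tubes (dicts mapping frame index to a box).
    def tube_iou(tube_a, tube_b):
        def box_iou(a, b):
            ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
            ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
            return inter / union if union > 0 else 0.0

        shared = set(tube_a) & set(tube_b)
        if not shared:
            return 0.0
        temporal_iou = len(shared) / len(set(tube_a) | set(tube_b))
        spatial_iou = sum(box_iou(tube_a[f], tube_b[f]) for f in shared) / len(shared)
        return temporal_iou * spatial_iou

    # Example: two 3-frame tubes sharing frames 1-2 with moderate box overlap.
    gt = {0: (10, 10, 60, 60), 1: (12, 10, 62, 60), 2: (14, 10, 64, 60)}
    det = {1: (20, 15, 70, 65), 2: (22, 15, 72, 65), 3: (24, 15, 74, 65)}
    print(round(tube_iou(gt, det), 3))  # 0.304: passes the 0.20 threshold, fails 0.50
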
  • [1] HUANG Qingqing, ZHOU Fengyu, LIU Meizhen. A survey of video-based human action recognition algorithms[J]. Application Research of Computers, 2020, 37(11): 3213-3219.
    [2] ZHU Yu, ZHAO Jiangkun, WANG Yining, et al. A review of human action recognition algorithms based on deep learning[J]. Acta Automatica Sinica, 2016, 42(6): 848-857.
    [3] TIAN Y, SUKTHANKAR R, SHAH M. Spatiotemporal deformable part models for action detection[C]//the IEEE International Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2013: 2642-2649.
    [4] TRAN D, YUAN J. Max-margin structured output regression for spatio-temporal action localization[C]//Advances in Neural Information Processing Systems. USA: NIPS, 2012: 350-358.
    [5] YUAN J, LIU Z, WU Y. Discriminative video pattern search for efficient action detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(9): 1728-1743. doi: 10.1109/TPAMI.2011.38
    [6] GAIDON A, HARCHAOUI Z, SCHMID C. Temporal localization of actions with actoms[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(11): 2782-2795. doi: 10.1109/TPAMI.2013.65
    [7] ONEATA D, VERBEEK J J, SCHMID C. Efficient action localization with approximately normalized fisher vectors[C]//the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2014: 2545-2552.
    [8] VAN G J C, JAIN M, GATI E, et al. APT: Action localization proposals from dense trajectories[C]//the British Machine Vision Conference. UK: BMVA Press, 2015: 1-12.
    [9] PENG X, SCHMID C. Multi-region two-stream R-CNN for action detection[C]//European Conference on Computer Vision. Amsterdam, Netherlands: Springer, 2016: 744-759.
    [10] SINGH G, SAHA S, SAPIENZA M, et al. Online real-time multiple spatiotemporal action localisation and prediction[C]//the IEEE International Conference on Computer Vision. Italy: IEEE, 2017: 3637-3646.
    [11] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2018: 7794-7803.
    [12] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. Canada: NIPS, 2014: 568-576.
    [13] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]//the IEEE International Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2016: 1933-1941.
    [14] YANG Tianming, CHEN Zhi, YUE Wenjing. Spatio-temporal two-stream human action recognition model based on video deep learning[J]. Journal of Computer Applications, 2018, 38(3): 895-899, 915. doi: 10.11772/j.issn.1001-9081.2017071740
    [15] SAHA S, SINGH G, SAPIENZA M, et al. Deep learning for detecting multiple space-time action tubes in videos[C]//International Computer Vision Summer School. Italy: ICVSS, 2016: 1-13.
    [16] YANG Z H, GAO J Y, NEVATIA R. Spatio-temporal action detection with cascade proposal and location anticipation[C]//British Machine Vision Conference. UK: BMVC, 2017: 1-12.
    [17] BEHL H S, SAPIENZA M, SINGH G, et al. Incremental tube construction for human action detection[C]//British Machine Vision Conference. Newcastle, UK: BMVC, 2018: 1-12.
    [18] SONG Y, KIM I. Spatio-temporal action detection in untrimmed videos by using multimodal features and region proposals[J]. Sensors, 2019, 19(5): 1085-1103. doi: 10.3390/s19051085
    [19] ALWANDO E H P, CHEN Y T, FANG W H. CNN-Based multiple path search for action tube detection in videos[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(1): 104-116. doi: 10.1109/TCSVT.2018.2887283
Publication history
  • Received:  2020-11-26
  • Published online:  2021-03-24
