Adaptive Graph Convolution and LSTM Action Recognition Based on Skeleton

MAO Xinxin, WU Shengxi, XIAN Bolong, GU Xingsheng

Citation: MAO Xinxin, WU Shengxi, XIAN Bolong, GU Xingsheng. Adaptive Graph Convolution and LSTM Action Recognition Based on Skeleton[J]. Journal of East China University of Science and Technology (Natural Science Edition). doi: 10.14135/j.cnki.1006-3080.20210625001


doi: 10.14135/j.cnki.1006-3080.20210625001
Funding: National Natural Science Foundation of China (61973120); Shanghai Automotive Industry Science and Technology Development Foundation (1837)
Author information:

    MAO Xinxin (1998-), male, from Nantong, Jiangsu; master's student whose research interests include machine learning and artificial intelligence. E-mail: 18321629335@163.com

Corresponding author:

    GU Xingsheng, E-mail: xsgu@ecust.edu.cn

  • CLC number: TP273


  • Abstract: To improve the recognition accuracy of skeleton-based action recognition, a model combining adaptive graph convolution with long short-term memory (AAGC-LSTM) is proposed. Designed to capture the spatio-temporal co-occurrence features of human skeleton motion, the model breaks the constraint of using the natural human skeleton as the fixed adjacency matrix of the graph convolution, and extracts spatio-temporal co-occurrence features by combining adaptive graph convolution with an LSTM network. To capture the joints that are key to the recognition task, a spatial attention module is embedded, combining the skeleton information in a dynamic way; in addition, first-order motion information of the skeleton joints and second-order motion information of the bones are fed into the model as a two-stream architecture whose branches are fused to improve recognition accuracy. The model achieves accuracies of 90.1% and 95.6% under the Cross Subject and Cross View protocols of the NTU RGB+D dataset, respectively, and 93.6% on the Northwestern-UCLA dataset, verifying its advantage in extracting spatio-temporal features of skeleton motion and in action recognition.
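    The abstract names three mechanisms: an adjacency matrix that is not locked to the natural skeleton, a spatial attention module that emphasizes key joints, and LSTM layers that model time. The following is a minimal sketch of the first two in PyTorch, assuming the common adaptive-graph-convolution construction (a fixed skeleton matrix plus a freely learned matrix plus a data-dependent matrix built from embedded dot-products). It is an illustration under those assumptions, not the authors' released code, and all names and sizes are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveGraphConv(nn.Module):
        """Illustrative adaptive graph-convolution block (not the paper's code)."""
        def __init__(self, in_channels, out_channels, A):
            super().__init__()
            num_joints = A.size(0)
            embed = max(in_channels // 4, 1)
            self.register_buffer("A", A)                                # fixed skeleton graph
            self.B = nn.Parameter(torch.zeros(num_joints, num_joints))  # freely learned graph
            self.theta = nn.Conv1d(in_channels, embed, 1)               # embeddings used to
            self.phi = nn.Conv1d(in_channels, embed, 1)                 # build the graph C
            self.proj = nn.Conv1d(in_channels, out_channels, 1)
            self.att = nn.Conv1d(out_channels, 1, 1)                    # spatial attention head

        def forward(self, x):  # x: (batch, channels, joints), one frame
            # Data-dependent graph C: softmax over embedded dot-products, so edges
            # can differ per sample instead of following the fixed skeleton only.
            C = F.softmax(torch.einsum("bcv,bcw->bvw", self.theta(x), self.phi(x)), dim=-1)
            adj = self.A + self.B + C              # adaptive adjacency, broadcast over batch
            out = self.proj(torch.einsum("bcv,bvw->bcw", x, adj))
            alpha = torch.sigmoid(self.att(out))   # per-joint weights in (0, 1)
            return out * alpha                     # emphasize joints key to the action

    # Per-frame outputs of such blocks would then be fed across frames to an LSTM
    # to obtain the spatio-temporal co-occurrence features described above.
    A = torch.eye(25)                              # placeholder graph for 25 NTU joints
    block = AdaptiveGraphConv(3, 64, A)
    features = block(torch.randn(8, 3, 25))        # -> (8, 64, 25)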


  • Figure 1. AAGC-LSTM network model structure

    Figure 2. Graph convolution subset partitioning strategy

    Figure 3. Adaptive graph convolution model structure

    Figure 4. AAGC-LSTM structure

    Figure 5. Two-stream network framework

    Figure 6. Visualization of the adaptive skeleton topology

    Figure 7. Loss curves

    Figure 8. Recognition accuracy curves

    Figure 9. Confusion matrix for actions with classification accuracy below 0.8

    Figure 10. Confusion matrix of the SR-TCL model

    Figure 11. Confusion matrix of the AAGC-LSTM model

    Table 1. Ablation experiment results of adaptive modules

    Module           Accuracy/%
                     CS      CV
    ST-GCN           84.3    92.7
    AAGC-LSTM-B      86.7    93.8
    AAGC-LSTM-C      86.5    93.6
    AAGC-LSTM-ABC    87.7    93.9
    AAGC-LSTM-BC     88.1    94.1

    Table 2. Ablation experiment results of the two-stream network framework

    Network structure    Accuracy/%
                         CS      CV
    J-Stream             88.1    94.1
    B-Stream             88.6    93.8
    2-Stream             90.1    95.6
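    To make the stream names above concrete: the J-Stream consumes first-order joint coordinates, the B-Stream consumes second-order bone vectors, and the 2-Stream fuses the two classifiers' scores. A minimal sketch under stated assumptions follows; the parent list is hypothetical (the real topology comes from the dataset definition), and equal-weight score averaging is an assumption, not necessarily the paper's exact fusion.

    import numpy as np

    def bone_features(joints, parents):
        """Second-order features: vector from each joint to its parent joint."""
        bones = np.zeros_like(joints)              # joints: (frames, joints, 3)
        for j, p in enumerate(parents):
            bones[:, j] = joints[:, j] - joints[:, p]
        return bones

    def fuse_scores(joint_scores, bone_scores):
        """Late fusion: average the per-class softmax scores of the two streams."""
        return 0.5 * (joint_scores + bone_scores)

    # Hypothetical 5-joint chain; entry j holds the parent index of joint j.
    parents = [0, 0, 1, 2, 3]
    joints = np.random.rand(30, 5, 3)              # 30 frames, 5 joints, xyz
    bones = bone_features(joints, parents)         # same shape as the joint-stream input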

    Table 3. Comparison of various models on the NTU RGB+D dataset

    Model              Accuracy/%
                       CS      CV
    Lie group[8]       50.1    82.3
    TG ST-LSTM[22]     69.2    77.7
    STA-LSTM[12]       73.4    81.2
    ST-GCN[14]         81.5    88.3
    BGC-LSTM[23]       81.8    89.0
    SR-TCL[17]         84.8    92.4
    HCN[4]             86.5    91.1
    PB-GCN[15]         87.5    93.2
    SGN[21]            89.0    94.5
    AAGC-LSTM          90.1    95.6

    Table 4. Comparison with state-of-the-art models on the Northwestern-UCLA dataset

    Model                    Accuracy/%
    Lie group[8]             74.2
    HBRNN-L[24]              78.5
    Visualization CNN[25]    86.1
    STA-LSTM[12]             89.2
    EleAtt-GRU[26]           90.7
    TS+MSSFN[27]             88.9
    AAGC-LSTM                93.6
  • [1] POPPE R. A survey on vision-based human action recognition[J]. Image and Vision Computing, 2010, 28(6): 976-990. doi: 10.1016/j.imavis.2009.11.014
    [2] WEINLAND D, RONFARD R, BOYER E. A survey of vision-based methods for action representation, segmentation and recognition[J]. Computer Vision and Image Understanding, 2011, 115(2): 224-241. doi: 10.1016/j.cviu.2010.10.002
    [3] AGGARWAL J K, RYOO M S. Human activity analysis: A review[J]. ACM Computing Surveys, 2011, 43(3): 1-43.
    [4] LI C, ZHONG Q, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence. USA: ACM, 2018: 786-792.
    [5] YAN Y, XU J, NI B, et al. Skeleton aided articulated motion generation[C]//ACM International Conference on Multimedia. USA: ACM, 2017: 199-207.
    [6] ZHANG Z Y. Microsoft kinect sensor and its effect[J]. IEEE Multimedia, 2012, 19(2): 4-10. doi: 10.1109/MMUL.2012.24
    [7] CAO Z, HIDALGO G, SIMON T, et al. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 172-186. doi: 10.1109/TPAMI.2019.2929257
    [8] VEMULAPALLI R, ARRATE F, CHELLAPPA R. Human action recognition by representing 3D skeletons as points in a Lie group[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2014: 588-595.
    [9] FERNANDO B, GAVVES E, ORAMAS J M, et al. Modeling video evolution for action recognition[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2015: 5378-5387.
    [10] XIA L, CHEN C C, AGGARWAL J K. View invariant human action recognition using histograms of 3D joints[C]//2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. USA: IEEE, 2012: 20-27.
    [11] WANG H, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013, 103(1): 60-79. doi: 10.1007/s11263-012-0594-8
    [12] SONG S J, LAN C L, XING J L, et al. An end-to-end spatio-temporal attention model for human action recognition from skeleton data[C]//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. USA: ACM, 2017: 4263-4270.
    [13] KIPF T, FETAYA E, WANG K C, et al. Neural relational inference for interacting systems[C]//35th International Conference on Machine Learning (ICML). [s.l.]: PMLR, 2018: 2688-2697.
    [14] YAN S, XIONG Y, LIN D. Spatial temporal graph convolutional networks for skeleton-based action recognition[EB/OL]. (2018-01-25)[2021-05-25]. https://arxiv.org/abs/1801.07455v1.
    [15] THAKKAR K C, NARAYANAN P J. Part-based graph convolutional network for action recognition[EB/OL]. (2018-09-13)[2021-05-25]. https://arxiv.org/abs/1809.04983.
    [16] FU Zirong, WU Shengxi, WU Xiaoying, et al. BI-LSTM human action recognition based on spatial features[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2021, 47(2): 225-232.
    [17] SI C Y, JING Y, WANG W, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning[C]//European Conference on Computer Vision. UK: Springer, 2018: 106-112.
    [18] SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: A large scale dataset for 3D human activity analysis[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2016: 1010-1019.
    [19] WANG J, NIE X, XIA Y, et al. Cross-view action modeling, learning, and recognition[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2014: 1-8.
    [20] KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. (2014-12-22)[2021-05-25]. https://arxiv.org/abs/1412.6980v5.
    [21] ZHANG P F, LAN C L, ZENG W J, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020: 1109-1118.
    [22] LIU J, SHAHROUDY A, XU D, et al. Spatio-temporal LSTM with trust gates for 3D human action recognition[C]//European Conference on Computer Vision. UK: Springer, 2016: 816-833.
    [23] ZHAO R, WANG K, SU H, et al. Bayesian graph convolution LSTM for skeleton based action recognition[C]//2019 IEEE International Conference on Computer Vision. Seoul, Korea: IEEE, 2019: 6881-6891.
    [24] DU Y, WANG W, WANG L. Hierarchical recurrent neural network for skeleton based action recognition[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2015: 1110-1118.
    [25] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[EB/OL]. (2014-06-09)[2021-05-25]. https://arxiv.org/abs/1406.2199.
    [26] ZHANG P, XUE J, LAN C, et al. Adding attentiveness to the neurons in recurrent neural networks[C]//The 15th European Conference on Computer Vision. UK: Springer, 2018: 136-152.
    [27] MENG F, LIU H, LIANG Y, et al. Sample fusion network: An end-to-end data augmentation network for skeleton-based human action recognition[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5281-5295. doi: 10.1109/TIP.2019.2913544
Publication history
  • Received: 2021-06-25
  • Accepted: 2021-11-05
  • Published online: 2021-11-12
