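[1] HUANG Q Q, ZHOU F Y, LIU M Z. Survey of human action recognition algorithms based on video[J]. Application Research of Computers, 2020, 37(11): 3213-3219. (in Chinese)
[2] ZHU Y, ZHAO J K, WANG Y N, et al. A review of human action recognition based on deep learning[J]. Acta Automatica Sinica, 2016, 42(6): 848-857. (in Chinese)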
[3] TIAN Y, SUKTHANKAR R, SHAH M. Spatiotemporal deformable part models for action detection[C]//the IEEE International Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2013: 2642-2649.
[4] TRAN D, YUAN J. Max-margin structured output regression for spatio-temporal action localization[C]//Advances in Neural Information Processing Systems. USA: NIPS, 2012: 350-358.
[5] YUAN J, LIU Z, WU Y. Discriminative video pattern search for efficient action detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(9): 1728-1743. doi: 10.1109/TPAMI.2011.38
[6] GAIDON A, HARCHAOUI Z, SCHMID C. Temporal localization of actions with actoms[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(11): 2782-2795. doi: 10.1109/TPAMI.2013.65
[7] ONEATA D, VERBEEK J J, SCHMID C. Efficient action localization with approximately normalized fisher vectors[C]//the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2014: 2545-2552.
[8] VAN GEMERT J C, JAIN M, GATI E, et al. APT: Action localization proposals from dense trajectories[C]//the British Machine Vision Conference. UK: BMVA Press, 2015: 1-12.
[9] PENG X, SCHMID C. Multi-region two-stream R-CNN for action detection[C]//European Conference on Computer Vision. Amsterdam, Netherlands: Springer, 2016: 744-759.
[10] SINGH G, SAHA S, SAPIENZA M, et al. Online real-time multiple spatiotemporal action localisation and prediction[C]//the IEEE International Conference on Computer Vision. Italy: IEEE, 2017: 3637-3646.
[11] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2018: 7794-7803.
[12] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. Canada: NIPS, 2014: 568-576.
[13] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]//the IEEE International Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2016: 1933-1941.
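[14] YANG T M, CHEN Z, YUE W J. Spatio-temporal two-stream human action recognition model based on video deep learning[J]. Journal of Computer Applications, 2018, 38(3): 895-899, 915. (in Chinese) doi: 10.11772/j.issn.1001-9081.2017071740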
[15] SAHA S, SINGH G, SAPIENZA M, et al. Deep learning for detecting multiple space-time action tubes in videos[C]//International Computer Vision Summer School. Italy: ICVSS, 2016: 1-13.
[16] YANG Z H, GAO J Y, NEVATIA R. Spatio-temporal action detection with cascade proposal and location anticipation[C]//British Machine Vision Conference. UK: BMVC, 2017: 1-12.
[17] BEHL H S, SAPIENZA M, SINGH G, et al. Incremental tube construction for human action detection[C]//British Machine Vision Conference. Newcastle, UK: BMVC, 2018: 1-12.
[18] SONG Y, KIM I. Spatio-temporal action detection in untrimmed videos by using multimodal features and region proposals[J]. Sensors, 2019, 19(5): 1085-1103. doi: 10.3390/s19051085
[19] ALWANDO E H P, CHEN Y T, FANG W H. CNN-Based multiple path search for action tube detection in videos[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(1): 104-116. doi: 10.1109/TCSVT.2018.2887283