Abstract:
To improve the accuracy of skeleton-based action recognition, this paper proposes a model based on attention-enhanced adaptive graph convolution and long short-term memory (AAGC-LSTM). By capturing the spatial-temporal co-occurrence features of human skeleton motion, the model breaks the constraint of using the natural human skeleton as the fixed adjacency matrix in graph convolution, combining adaptive graph convolution with LSTM to extract spatial-temporal co-occurrence features. To capture the information of the joints most relevant to the recognition task, an attention module is embedded into the model to weight the skeleton information dynamically. Meanwhile, the primary motion information of skeleton joints and the secondary motion information of skeleton edges are fed into the AAGC-LSTM model separately to form two branches, whose outputs are then fused to further improve recognition accuracy. Experiments show that the proposed model achieves 90.1% and 95.6% accuracy on the NTU RGB+D dataset under the Cross-Subject and Cross-View protocols, respectively, and 93.6% accuracy on the Northwestern-UCLA dataset, verifying its superiority in extracting spatial-temporal features of skeleton motion and in the action recognition task.