Abstract:
With the explosive growth of video data, intelligent video analysis has become a research hotspot in both academia and industry. Video action detection builds on action recognition to determine where and when actions occur in a video. By combining the single shot multi-box detector (SSD) with the RGB spatial flow and the optical flow, this paper proposes a region spatiotemporal two-in-one action detection network. To improve the nonlocal spatiotemporal module in the network, a pixel filter is applied to the optical flow to extract the key motion regions, and the correlation calculation is then performed only on those selected regions in the spatial flow. The proposed module effectively captures long-range dependencies of actions while reducing both the computational cost of the nonlocal module and the interference from video background noise. Finally, the proposed network is evaluated on the benchmark dataset UCF101-24 and attains better detection performance.
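
The idea of restricting the nonlocal correlation to key motion regions can be illustrated with a minimal NumPy sketch. It is not the network proposed in this paper: the function names `select_key_pixels` and `region_nonlocal`, the `keep_ratio` parameter, the top-k rule on flow magnitude, and the softmax dot-product attention are all assumptions chosen for illustration, since the abstract does not specify the filter or the correlation function.

```python
import numpy as np


def select_key_pixels(flow, keep_ratio=0.25):
    """Hypothetical pixel filter: keep the pixels with the largest flow magnitude.

    flow: array of shape (H, W, 2) holding per-pixel optical flow vectors.
    Returns the sorted flat indices of the selected pixels.
    """
    mag = np.linalg.norm(flow, axis=-1).ravel()  # (H*W,)
    k = max(1, int(round(keep_ratio * mag.size)))
    # argpartition places the k largest magnitudes in the last k slots.
    return np.sort(np.argpartition(mag, -k)[-k:])


def region_nonlocal(features, flow, keep_ratio=0.25):
    """Nonlocal (self-attention style) update restricted to selected pixels.

    features: array of shape (H, W, C) from the spatial (RGB) stream.
    Pixels that are not selected pass through unchanged.
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)
    idx = select_key_pixels(flow, keep_ratio)
    xs = x[idx]  # (k, C) features of the key motion pixels

    # Pairwise correlation among selected pixels only: k*k instead of (H*W)^2.
    scores = xs @ xs.T / np.sqrt(c)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    out = x.copy()
    out[idx] = xs + weights @ xs  # residual connection
    return out.reshape(h, w, c), idx
```

Under these assumptions, selecting k of the H×W positions shrinks the pairwise correlation matrix from (H·W)² entries to k² entries, and static background pixels never enter the correlation, which mirrors the two benefits claimed above.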