Abstract:
Although human pose estimation has advanced considerably, achieving high-precision, robust estimates remains challenging under dynamic scene changes, occlusion, and complex backgrounds. To address these issues, particularly keypoint occlusion, keypoint overlap, and interference from complex environments, this paper proposes RepLK-HRNet, a high-resolution human pose estimation model that incorporates large kernel convolution. The core innovation lies in the design of the feature extraction network, which introduces a reparameterized large kernel convolution strategy to strengthen the model's ability to capture multi-scale and multi-level feature information. The network architecture is also optimized to significantly reduce the number of parameters and the computational complexity. Experimental results show that, compared with the original HRNet model, RepLK-HRNet improves accuracy by 1.83% on the standard MS COCO 2017 dataset and by 23.7% on the occlusion dataset OCHuman, while reducing the parameter count by 63.84% and GFLOPs by 37.69%. These results indicate that RepLK-HRNet significantly improves pose estimation accuracy under general, occluded, and keypoint-confused conditions, demonstrating strong robustness and generalization. Its computational cost and memory usage also meet the demands of practical applications.
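The abstract does not spell out how the reparameterized large kernel convolution is built, so the following is only a minimal sketch of the general idea behind structural reparameterization, under stated assumptions: a large kernel branch and a parallel small kernel branch are summed during training, and the small kernel is folded into the centre of the large one for inference. The kernel sizes (7x7 and 3x3), the toy integer weights, and the helper names `conv2d_same` and `merge_kernels` are hypothetical and are not taken from the paper; batch normalization folding, channels, and strides are omitted.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def merge_kernels(large, small):
    """Fold a small odd-sized kernel into the centre of a large odd-sized kernel."""
    offset = (len(large) - len(small)) // 2
    merged = [row[:] for row in large]  # copy so the input kernel is not modified
    for i, row in enumerate(small):
        for j, value in enumerate(row):
            merged[offset + i][offset + j] += value
    return merged


# Hypothetical toy weights and input; integers keep the comparison exact.
large = [[(i * 7 + j) % 5 - 2 for j in range(7)] for i in range(7)]
small = [[(i + 2 * j) % 3 - 1 for j in range(3)] for i in range(3)]
image = [[(3 * i + 5 * j) % 11 for j in range(9)] for i in range(9)]

# Training-time form: two parallel branches whose outputs are summed.
branch_large = conv2d_same(image, large)
branch_small = conv2d_same(image, small)
two_branch = [[a + b for a, b in zip(ra, rb)]
              for ra, rb in zip(branch_large, branch_small)]

# Inference-time form: a single convolution with the merged kernel.
single_branch = conv2d_same(image, merge_kernels(large, small))

# Largest element-wise difference between the two forms.
max_abs_diff = max(abs(a - b)
                   for ra, rb in zip(two_branch, single_branch)
                   for a, b in zip(ra, rb))
```

Because convolution is linear in the kernel, the two forms produce the same output, so the auxiliary branch adds no inference cost once merged. Whether RepLK-HRNet uses exactly this branch layout is an assumption here, not something the abstract confirms.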