High-Resolution Pose Estimation Based on Reparameterized Large Kernel Convolution

陈佳艺; 黄晓宇; 吴胜昔; 王学武

doi:10.14135/j.cnki.1006-3080.20240722001

Abstract

Although research on human pose estimation has made significant progress, achieving high-precision, robust pose estimation remains highly challenging under dynamic scene changes, target occlusion, and complex backgrounds. To address these problems, particularly keypoint occlusion, keypoint overlap, and interference from complex environments, this paper proposes RepLK-HRNet, a high-resolution human pose estimation model that incorporates large kernel convolution. The core of the model is the design of its feature extraction network: a reparameterized large kernel convolution strategy strengthens the model's ability to capture multi-scale, multi-level feature information, while adjustments to the network structure significantly reduce the parameter count and computational complexity. Experimental results show that, compared with the traditional high-resolution network (HRNet), RepLK-HRNet improves accuracy by 1.83% on the standard MS COCO 2017 dataset and by 23.7% on the occlusion dataset OCHuman, while reducing Params and GFLOPs by 63.84% and 37.69%, respectively. RepLK-HRNet thus improves pose estimation accuracy significantly under ordinary conditions as well as under occlusion and keypoint confusion, showing strong robustness and generalization while also meeting the computational-efficiency and storage requirements of practical applications.
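The abstract does not spell out how the large kernel convolution is reparameterized. As a rough illustration only, the sketch below shows the general idea behind structural reparameterization: a small-kernel branch trained in parallel with a large-kernel branch can be folded into a single large kernel by zero-padding the small kernel and adding the weights. The kernel sizes (7 and 3), the single-channel setting, and the helper names `pad2d`, `conv2d_same`, and `merge_kernels` are assumptions made for this example, not details taken from the paper, and batch-normalization folding is omitted.

```python
import random

def pad2d(m, p):
    """Zero-pad a 2D list of floats by p cells on every side."""
    width = len(m[0]) + 2 * p
    rows = [[0.0] * width for _ in range(p)]
    rows += [[0.0] * p + list(r) + [0.0] * p for r in m]
    rows += [[0.0] * width for _ in range(p)]
    return rows

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding ('same' size).

    Assumes a square kernel with an odd side length.
    """
    size = len(k)
    xp = pad2d(x, size // 2)
    return [
        [
            sum(xp[i + a][j + b] * k[a][b]
                for a in range(size) for b in range(size))
            for j in range(len(x[0]))
        ]
        for i in range(len(x))
    ]

def merge_kernels(large, small):
    """Fold a small kernel into a large one: centre it with zeros, then add."""
    padded = pad2d(small, (len(large) - len(small)) // 2)
    return [[l + s for l, s in zip(lr, sr)] for lr, sr in zip(large, padded)]

def rand_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
x = rand_matrix(8, rng)          # illustrative 8x8 feature map
k_large = rand_matrix(7, rng)    # hypothetical large kernel
k_small = rand_matrix(3, rng)    # hypothetical parallel small kernel

# Training-time form: two parallel branches whose outputs are summed.
y_large = conv2d_same(x, k_large)
y_small = conv2d_same(x, k_small)
two_branch = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(y_large, y_small)]

# Inference-time form: one convolution with the merged kernel.
merged = conv2d_same(x, merge_kernels(k_large, k_small))

max_abs_diff = max(
    abs(a - b) for ra, rb in zip(two_branch, merged) for a, b in zip(ra, rb)
)
```

Because convolution is linear in its weights, the two forms agree up to floating-point rounding, so the extra branch adds no parameters or computation at inference time under these assumptions.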
