Chinese Dialect Identification Model Based on Local and Global Feature Extraction and Multi-Level Feature Aggregation

孟一凡; 陈宁; 李泓锴

doi:10.14135/j.cnki.1006-3080.20231011003

Chinese Dialect Identification Based on Local and Global Feature Fusion and Multi-Level Feature Aggregation

Abstract

Compared with dialects of other languages, Chinese dialects are numerous, with small inter-class differences and large intra-class differences, which makes Chinese dialect identification highly challenging. Differences between Chinese dialects may appear in the local (short-term) characteristics of speech, in its global (long-term) characteristics, and in features at different hierarchical levels. This paper therefore proposes a Chinese dialect identification model that combines local and global speech feature extraction with multi-level feature aggregation. The model first extracts local features with Res2Block, then extracts global features with Conformer, and finally aggregates multi-level features from the outputs of multiple cascaded Conformers. Experimental results in both cross-domain and in-domain settings show that the proposed model achieves higher identification accuracy than the baseline model.
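The three stages named in the abstract (local feature extraction, global feature extraction, multi-level aggregation) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: `nn.TransformerEncoderLayer` stands in for a Conformer block, aggregation is done by concatenating the outputs of the cascaded blocks, pooling is a plain mean over time, and the names `Res2Conv1d` and `DialectClassifierSketch` as well as all sizes (80 input features, 64 channels, 3 blocks, 10 dialect classes) are hypothetical.

```python
import torch
import torch.nn as nn


class Res2Conv1d(nn.Module):
    """Res2Net-style multi-scale 1-D convolution for local (short-time) context."""

    def __init__(self, channels: int, scale: int = 4, kernel_size: int = 3):
        super().__init__()
        assert channels % scale == 0, "channels must be divisible by scale"
        self.scale = scale
        width = channels // scale
        # One convolution per channel group; the first group passes through unchanged.
        self.convs = nn.ModuleList(
            [nn.Conv1d(width, width, kernel_size, padding=kernel_size // 2)
             for _ in range(scale - 1)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        groups = torch.chunk(x, self.scale, dim=1)
        outputs = [groups[0]]
        prev = None
        for conv, group in zip(self.convs, groups[1:]):
            # Each group also receives the previous group's output,
            # so later groups see a progressively wider receptive field.
            inp = group if prev is None else group + prev
            prev = torch.relu(conv(inp))
            outputs.append(prev)
        return x + torch.cat(outputs, dim=1)  # residual connection


class DialectClassifierSketch(nn.Module):
    """Local features -> cascaded global blocks -> multi-level aggregation."""

    def __init__(self, n_feats: int = 80, d_model: int = 64,
                 n_blocks: int = 3, n_dialects: int = 10):
        super().__init__()
        self.proj = nn.Conv1d(n_feats, d_model, kernel_size=1)
        self.local = Res2Conv1d(d_model)
        # Assumption: a Transformer encoder layer is used as a stand-in
        # for a Conformer block to keep the sketch dependency-free.
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128,
                                        batch_first=True)
             for _ in range(n_blocks)]
        )
        self.classifier = nn.Linear(d_model * n_blocks, n_dialects)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, n_feats), e.g. log-Mel filterbank frames
        x = self.proj(feats.transpose(1, 2))   # (batch, d_model, time)
        x = self.local(x).transpose(1, 2)      # (batch, time, d_model)
        levels = []
        for block in self.blocks:
            x = block(x)
            levels.append(x)                   # keep every level's output
        # Assumption: multi-level aggregation by channel-wise concatenation.
        aggregated = torch.cat(levels, dim=-1)  # (batch, time, d_model * n_blocks)
        pooled = aggregated.mean(dim=1)         # simple mean pooling over time
        return self.classifier(pooled)          # (batch, n_dialects)


if __name__ == "__main__":
    model = DialectClassifierSketch().eval()
    with torch.no_grad():
        logits = model(torch.randn(2, 50, 80))  # 2 utterances, 50 frames each
    print(logits.shape)
```

Keeping the output of every cascaded block, rather than only the last one, is what lets the classifier see both shallow and deep representations; the actual aggregation and pooling operators used in the paper may differ from the concatenation and mean pooling assumed here.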
