GAN-Based Domain Adaptation Algorithm for Speaker Verification

Ji Minfei (季敏飞); Chen Ning (陈宁)

doi:10.14135/j.cnki.1006-3080.20201209001

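Abstract (Chinese version): In speaker verification, recognition accuracy often drops because real-world speech differs from the training corpus in intrinsic characteristics (emotion, language, speaking style, age) or extrinsic characteristics (background noise, transmission channel, microphone, room reverberation). To address this problem, a domain adaptation algorithm for speaker verification based on adversarial networks is proposed. First, an X-Vector speaker verification model is trained on source-domain speech. Next, a domain adaptation method transfers the source-trained X-Vector model to the target-domain training data. Finally, the adapted model is evaluated on the target-domain test data and compared with the model before adaptation. In the experiments, AISHELL1 serves as the source domain, while VoxCeleb1 and CN-Celeb each serve as a target domain. The results show that, after adaptation with the proposed method, the equal error rate (EER) on the VoxCeleb1 and CN-Celeb target-domain test sets falls by 21.46% and 19.24%, respectively.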
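The results are reported as equal error rate (EER), the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). As a minimal sketch, assuming each verification trial yields one score (higher means "more likely the same speaker") and a 0/1 label, EER can be estimated by sweeping every observed score as a threshold. The function name `compute_eer` and the midpoint approximation used when FAR and FRR never coincide exactly are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence, Tuple


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Estimate (eer, threshold) from verification trial scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    A trial is accepted when its score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap = float("inf")
    best = (1.0, 0.0)
    # Try every distinct score as a candidate threshold.
    for t in sorted(set(scores)):
        # False rejection: target trials scored below the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_target
        # False acceptance: non-target trials scored at or above the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_nontarget
        gap = abs(far - frr)
        if gap < best_gap:
            # Assumed approximation: report the midpoint of FAR and FRR
            # at the threshold where they are closest.
            best_gap = gap
            best = ((far + frr) / 2.0, t)
    return best
```

For example, `compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` returns `(0.5, 0.6)`. Speaker-verification toolkits usually derive the same quantity from a ROC curve in O(n log n); the quadratic sweep here is chosen only for readability.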

Abstract: A key problem in speaker verification is the mismatch in conditions between training and testing data, which can significantly degrade verification performance. In most speaker recognition applications, it is usually impossible to obtain enough samples to retrain the model. At the same time, the samples used to train the original model may differ considerably from those encountered in real applications because of variability caused by intrinsic factors (e.g., changes in emotion, language, vocal effort, speaking style, and aging) or extrinsic ones (e.g., background noise, transmission channel, microphone, room acoustics, and distance from the microphone). In this paper, an adversarial domain adaptation strategy is designed and applied to the X-Vector-based speaker verification scheme to improve its ability to adapt across domains. First, the X-Vector model is trained on the source dataset (AISHELL1). Then, the domain adaptation strategy is applied to the trained X-Vector model so that it adapts to the target dataset (VoxCeleb1 or CN-Celeb). Finally, the models obtained before and after adaptation are compared on the target dataset. The results show that the proposed strategy reduces the Equal Error Rate (EER) by 21.46% and 19.24% on VoxCeleb1 and CN-Celeb, respectively.
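The abstract does not spell out the adversarial objective. One common GAN-style formulation for unsupervised domain adaptation, ADDA (Tzeng et al., 2017), trains a domain discriminator D to label source embeddings 1 and target embeddings 0, while the target-side encoder is trained with inverted labels so that D classifies its embeddings as source. The sketch below computes only those two losses from D's output probabilities. It illustrates the general technique under that assumption and is not the authors' exact method; the function names are hypothetical.

```python
import math
from typing import Sequence


def bce(p: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy of probability p against a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def discriminator_loss(d_source: Sequence[float], d_target: Sequence[float]) -> float:
    """Loss for the domain discriminator D (hypothetical helper).

    d_source / d_target hold D's probabilities that each embedding comes from
    the source domain. D is rewarded for outputting 1 on source, 0 on target.
    """
    return mean([bce(p, 1) for p in d_source]) + mean([bce(p, 0) for p in d_target])


def encoder_adversarial_loss(d_target: Sequence[float]) -> float:
    """Loss for the target-domain encoder, the "generator" side (hypothetical helper).

    Inverted labels: the encoder is rewarded when D mistakes target embeddings
    for source ones, which pushes target embeddings toward the source distribution.
    """
    return mean([bce(p, 1) for p in d_target])
```

In ADDA the two losses are minimised alternately: D's parameters on `discriminator_loss`, and the target encoder (initialised from the source-trained network, which stays fixed) on `encoder_adversarial_loss`. When D outputs 0.5 everywhere, the encoder loss equals log 2, the value reached when the discriminator can no longer tell the domains apart.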
