    Citation: JI Minfei, CHEN Ning. GAN-Based Domain Adaptation Algorithm for Speaker Verification[J]. Journal of East China University of Science and Technology, 2022, 48(2): 231-236. DOI: 10.14135/j.cnki.1006-3080.20201209001

    GAN-Based Domain Adaptation Algorithm for Speaker Verification

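    • Abstract (translated from the Chinese): In speaker verification, recognition accuracy often drops because real-world speech differs from the training corpus in intrinsic characteristics (emotion, language, speaking style, age) or extrinsic characteristics (background noise, transmission channel, microphone, room reverberation). To address this problem, an adversarial-network-based domain adaptation algorithm for speaker verification is proposed. First, an X-Vector speaker verification model is trained on source-domain speech. Then, a domain adaptation method is used to adapt the source-trained X-Vector model to the target-domain training data. Finally, the adapted model is evaluated on the target-domain test data and compared with the model before adaptation. In the experiments, AISHELL1 serves as the source domain, and VoxCeleb1 and CN-Celeb each serve as the target domain. The results show that, after adaptation with the proposed method, the equal error rate on the VoxCeleb1 and CN-Celeb target-domain test sets decreases by 21.46% and 19.24%, respectively.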

       

      Abstract: A key problem in speaker verification is the condition mismatch between training data and test data, which can significantly degrade verification performance. In most speaker recognition application scenarios, it is usually impossible to obtain enough samples to retrain the speaker recognition model. At the same time, the samples used to train the original model may differ considerably from those obtained in real applications because of variability caused by intrinsic factors (e.g., changes in emotion, language, vocal effect, speaking style, and aging) or extrinsic factors (e.g., background noise, transmission channel, microphone, room acoustics, and distance from the microphone). In this paper, an adversarial domain adaptation strategy is designed and applied to the X-Vector-based speaker verification scheme to improve its ability to adapt across domains. First, the X-Vector scheme is trained on the source dataset (AISHELL1). Then, the domain adaptation strategy is applied to the trained X-Vector scheme so that it adapts to the target dataset (VoxCeleb1 or CN-Celeb). Finally, the X-Vector schemes before and after adaptation are compared on the target dataset. The comparison shows that the proposed adaptation strategy reduces the Equal Error Rate (EER) by 21.46% and 19.24% on the VoxCeleb1 and CN-Celeb datasets, respectively.
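The Equal Error Rate used as the evaluation metric above is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how it can be estimated from verification scores, assuming a higher score means "more likely the same speaker". The function name `equal_error_rate` and the sample scores are illustrative assumptions and do not come from the paper, whose scoring back end is not described in the abstract.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from two lists of trial scores.

    target_scores:    scores of same-speaker (genuine) trials
    nontarget_scores: scores of different-speaker (impostor) trials
    Returns (eer, threshold), where a trial is accepted if score >= threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, best_eer, best_threshold = None, None, None
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, threshold
    return best_eer, best_threshold


# Illustrative scores only (not experimental data).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

With finite trial lists the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; evaluation toolkits often interpolate along the error curve instead.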

       
