GAN-Based Domain Adaptation for Gender Identification
-
Abstract: Gender identification is an important task in speaker verification and can also serve as an auxiliary tool in automatic speech recognition (ASR) to improve model performance. To increase the accuracy of gender identification, several deep-learning-based schemes have recently been reported. However, compared with the acoustically controlled data used for training, speech data in real application scenarios is usually masked by background noise such as music, environmental sounds, and background chatter. As a result, the performance of audio-based gender identification models degrades severely because of the large mismatch between real speech data and the training data. To solve this problem, we propose a domain-adaptive model that combines a generative adversarial network (GAN) with a GhostVLAD layer. The GhostVLAD layer effectively reduces the interference of noise and irrelevant information in speech, while the GAN-based training method adapts the model to the target-domain data. During adversarial training, an auxiliary loss is introduced to preserve the network's ability to represent gender characteristics. Taking the VoxCeleb1 dataset as the source domain and the Audioset and Movie datasets as the target domains, we evaluate the performance of the proposed domain-adaptive model. The results show that, compared with a gender identification model based on a convolutional neural network, the proposed model improves gender identification accuracy by 5.13% and 7.72%, respectively.
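For concreteness, the adversarial adaptation scheme summarized above can be sketched as follows. This is a minimal PyTorch illustration assuming an ADDA-style split into a feature encoder, a gender classifier, and a domain discriminator; the function names, the loss weight `lam`, and the optimizer handling are illustrative assumptions rather than the paper's exact training recipe.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # domain (source vs. target) discrimination loss
ce = nn.CrossEntropyLoss()     # auxiliary gender classification loss


def adaptation_step(encoder, classifier, discriminator, opt_d, opt_e,
                    src_x, src_y, tgt_x, lam=0.1):
    """One adversarial adaptation step: update the domain discriminator,
    then update the encoder to fool it while keeping gender accuracy."""
    ones = lambda x: torch.ones(len(x), 1, device=x.device)
    zeros = lambda x: torch.zeros(len(x), 1, device=x.device)

    # 1) Discriminator: tell source embeddings (label 1) from target ones (label 0).
    with torch.no_grad():
        src_e, tgt_e = encoder(src_x), encoder(tgt_x)
    d_loss = bce(discriminator(src_e), ones(src_x)) + \
             bce(discriminator(tgt_e), zeros(tgt_x))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Encoder: make target embeddings look like source ones (GAN objective),
    #    while an auxiliary loss on labelled source speech preserves the
    #    gender-discriminative structure of the embedding space.
    adv_loss = bce(discriminator(encoder(tgt_x)), ones(tgt_x))
    aux_loss = ce(classifier(encoder(src_x)), src_y)
    e_loss = adv_loss + lam * aux_loss
    opt_e.zero_grad()
    e_loss.backward()
    opt_e.step()
    return d_loss.item(), e_loss.item()
```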
-
Table 1. Structure of the CNN-GV model
Layer (input size 96 × 64 × 1)        Output size
Conv2d, 3 × 3, 32, stride (1, 1)      96 × 64 × 32
Conv2d, 3 × 3, 32, stride (1, 1)      96 × 64 × 32
Conv2d, 3 × 3, 64, stride (2, 2)      48 × 32 × 64
Conv2d, 3 × 3, 64, stride (1, 1)      48 × 32 × 64
Max pool, 1 × 2, stride (1, 2)        48 × 16 × 64
Conv2d, 3 × 3, 128, stride (2, 2)     24 × 8 × 128
Conv2d, 3 × 3, 128, stride (1, 1)     24 × 8 × 128
Max pool, 2 × 2, stride (2, 2)        12 × 4 × 128
Conv2d, 3 × 3, 256, stride (2, 2)     6 × 2 × 256
GhostVLAD                             5120
Dense, 1024                           1024
Dense, 256                            256
Softmax, 2                            2
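To make Table 1 concrete, the sketch below reimplements the CNN-GV layout in PyTorch, with a GhostVLAD-style aggregation layer standing in for the "GhostVLAD" row. The number of VLAD clusters (20, so that 20 × 256 = 5120), the number of ghost clusters, the batch normalization, and the padding choices are assumptions inferred from the table, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GhostVLAD(nn.Module):
    """NetVLAD with extra 'ghost' clusters that absorb noisy descriptors
    and are dropped from the final representation."""

    def __init__(self, dim=256, clusters=20, ghost_clusters=2):
        super().__init__()
        self.dim, self.clusters = dim, clusters
        total = clusters + ghost_clusters
        self.assign = nn.Conv2d(dim, total, kernel_size=1)      # soft-assignment logits
        self.centroids = nn.Parameter(torch.randn(total, dim) * 0.1)

    def forward(self, x):                        # x: (B, D, H, W)
        a = F.softmax(self.assign(x), dim=1)     # (B, K+G, H, W)
        a = a.flatten(2)                         # (B, K+G, N), N = H*W
        x = x.flatten(2)                         # (B, D, N)
        # residuals to every centroid, weighted by the soft assignments
        res = x.unsqueeze(1) - self.centroids.unsqueeze(0).unsqueeze(-1)  # (B, K+G, D, N)
        vlad = (a.unsqueeze(2) * res).sum(-1)    # (B, K+G, D)
        vlad = vlad[:, : self.clusters]          # drop the ghost clusters
        vlad = F.normalize(vlad, dim=2)          # intra-normalization
        return F.normalize(vlad.flatten(1), dim=1)   # (B, K*D) = (B, 5120)


class CNNGV(nn.Module):
    """CNN front-end from Table 1 followed by GhostVLAD aggregation."""

    def __init__(self, num_classes=2):
        super().__init__()

        def conv(cin, cout, stride):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

        self.features = nn.Sequential(
            conv(1, 32, 1), conv(32, 32, 1),
            conv(32, 64, 2), conv(64, 64, 1),
            nn.MaxPool2d((1, 2), (1, 2)),
            conv(64, 128, 2), conv(128, 128, 1),
            nn.MaxPool2d(2, 2),
            conv(128, 256, 2),
        )                                         # (B, 1, 96, 64) -> (B, 256, 6, 2)
        self.vlad = GhostVLAD(dim=256, clusters=20, ghost_clusters=2)
        self.embed = nn.Sequential(nn.Linear(5120, 1024), nn.ReLU(inplace=True),
                                   nn.Linear(1024, 256), nn.ReLU(inplace=True))
        self.head = nn.Linear(256, num_classes)  # softmax applied inside the loss

    def forward(self, x):
        e = self.embed(self.vlad(self.features(x)))
        return self.head(e), e                    # logits and 256-d embedding


if __name__ == "__main__":
    logits, emb = CNNGV()(torch.randn(4, 1, 96, 64))
    print(logits.shape, emb.shape)                # torch.Size([4, 2]) torch.Size([4, 256])
```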
Table 2. Performance comparison of models
Model                              UWA (VoxCeleb1)    UWA (Audioset)    UWA (Movie)
Literature [9]                     0.9470             0.7988            0.7657
CNN-GV-DA (without GhostVLAD)      0.9656             0.8270            0.8064
CNN-GV                             0.9539             0.8111            0.7734
CNN-GV-DA                          0.9657             0.8501            0.8429
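The comparison above is reported in UWA. Assuming UWA denotes unweighted (class-balanced) accuracy, i.e. the mean of per-class recalls, it can be computed as in the short sketch below; that reading of the abbreviation is an assumption, since the table does not expand it.

```python
import numpy as np

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally regardless of size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

print(unweighted_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))  # (2/3 + 1/1) / 2 ≈ 0.833
```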
[1] HARB H, CHEN L. Gender identification using a general audio classifier[C]//2003 International Conference on Multimedia and Expo. USA: IEEE, 2003: 733-736.
[2] ABDULLA W H, KASABOV N K. Improving speech recognition performance through gender separation[C]//Fifth Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES). USA: IEEE, 2001: 218-222.
[3] ORE B M, SLYH R E, HANSEN E G. Speaker segmentation and clustering using gender information[C]//2006 IEEE Odyssey - The Speaker and Language Recognition Workshop. USA: IEEE, 2006: 1-8.
[4] KUMAR N, NASIR M, GEORGIOU P, et al. Robust multichannel gender classification from speech in movie audio[C]//Interspeech 2016. [s.l.]: [s.n.], 2016: 2233-2237.
[5] WU K, CHILDERS D G. Gender recognition from speech: Part I. Coarse analysis[J]. The Journal of the Acoustical Society of America, 1991, 90(4): 1828-1840. doi: 10.1121/1.401663
[6] YÜCESOY E, NABIYEV V V. Gender identification of a speaker using MFCC and GMM[C]//2013 8th International Conference on Electrical and Electronics Engineering (ELECO). Turkey: IEEE, 2013: 626-629.
[7] SENOUSSAOUI M, KENNY P, BRÜMMER N, et al. Mixture of PLDA models in i-vector space for gender-independent speaker recognition[C]//Interspeech 2011, 12th Annual Conference of the International Speech Communication Association. Italy: DBLP, 2011: 25-28.
[8] EL SHAFEY L, KHOURY E, MARCEL S. Audio-visual gender recognition in uncontrolled environment using variability modeling techniques[C]//IEEE International Joint Conference on Biometrics (IJCB). USA: IEEE, 2014: 1-8.
[9] DOUKHAN D, CARRIVE J, VALLET F, et al. An open-source speaker gender detection framework for monitoring gender equality[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Canada: IEEE, 2018: 5214-5218.
[10] LIEW S S, HANI M K, RADZI S A, et al. Gender classification: A convolutional neural network approach[J]. Turkish Journal of Electrical Engineering and Computer Sciences, 2016, 24(3): 1248-1264.
[11] DU J, NA X, LIU X, et al. AISHELL-2: Transforming Mandarin ASR research into industrial scale[EB/OL]. (2018-08-31)[2020-12-20]. https://arxiv.org/abs/1808.10583.
[12] HEBBAR R, SOMANDEPALLI K, NARAYANAN S S. Improving gender identification in movie audio using cross-domain data[C]//Interspeech 2018. [s.l.]: [s.n.], 2018: 282-286.
[13] GEMMEKE J F, ELLIS D P W, FREEDMAN D, et al. Audio Set: An ontology and human-labeled dataset for audio events[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). USA: IEEE, 2017: 776-780.
[14] ZHONG Y, ARANDJELOVIĆ R, ZISSERMAN A. GhostVLAD for set-based face recognition[C]//Asian Conference on Computer Vision. Cham: Springer, 2018: 35-50.
[15] ARANDJELOVIĆ R, GRONAT P, TORII A, et al. NetVLAD: CNN architecture for weakly supervised place recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2016: 5297-5307.
[16] XIE W, NAGRANI A, CHUNG J S, et al. Utterance-level aggregation for speaker recognition in the wild[C]//2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). UK: IEEE, 2019: 5791-5795.
[17] BEN-DAVID S, BLITZER J, CRAMMER K, et al. Analysis of representations for domain adaptation[C]//Advances in Neural Information Processing Systems. USA: MIT Press, 2007: 137-144.
[18] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27: 2672-2680.
[19] TZENG E, HOFFMAN J, SAENKO K, et al. Adversarial discriminative domain adaptation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. USA: IEEE, 2017: 7167-7176.
[20] SHEN J, QU Y, ZHANG W, et al. Wasserstein distance guided representation learning for domain adaptation[EB/OL]. (2017-07-05)[2020-12-20]. https://arxiv.org/abs/1707.01217.
[21] ODENA A, OLAH C, SHLENS J. Conditional image synthesis with auxiliary classifier GANs[C]//International Conference on Machine Learning (PMLR). [s.l.]: [s.n.], 2017: 2642-2651.
[22] NAGRANI A, CHUNG J S, ZISSERMAN A. VoxCeleb: A large-scale speaker identification dataset[EB/OL]. (2017-06-26)[2020-12-20]. https://arxiv.org/abs/1706.08612.