Abstract:
Gender identification is an important task in speaker verification and can also serve as an auxiliary tool in automatic speech recognition (ASR) to improve model performance. Several deep-learning-based schemes have recently been reported to increase the accuracy of gender identification. However, unlike the acoustically controlled data used in training, speech in real application scenarios is usually masked by background noise such as music, environmental noise, and background chatter. As a result, the performance of audio-based gender identification models degrades severely because of the large mismatch between real speech data and the training data. To solve this problem, we propose a domain-adaptive model that combines a generative adversarial network (GAN) with a GhostVLAD layer. The GhostVLAD layer effectively reduces the interference of noise and irrelevant information in speech, and the GAN-based training method adapts the model to the target-domain data. During adversarial training, an auxiliary loss is introduced to preserve the model's ability to represent gender characteristics. Finally, the domain-adaptive model is evaluated with the VoxCeleb1 data set as the source domain and the AudioSet and Movie data sets as the target domains. The results show that, compared with a gender identification model based on a convolutional neural network, the proposed model improves gender identification accuracy by 5.13% and 7.72%, respectively.
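As an illustration of the GhostVLAD aggregation mentioned above, the following is a minimal NumPy sketch of the general technique, not the authors' implementation. The function name `ghostvlad_pool`, the parameter names, and all shapes and random values are assumptions made for this example. Frames are softly assigned to real and "ghost" clusters, and the ghost clusters are discarded after aggregation, so frames that mostly match a ghost cluster (for example, noisy ones) contribute little to the utterance-level embedding.

```python
import numpy as np

def ghostvlad_pool(frames, centers, assign_w, assign_b, num_ghost):
    """Aggregate frame-level features into one utterance-level vector.

    frames:   (T, D) frame-level descriptors
    centers:  (K + G, D) cluster centers (K real, last G are ghost clusters)
    assign_w: (D, K + G) weights of the soft-assignment layer
    assign_b: (K + G,) biases of the soft-assignment layer
    """
    # Soft assignment of every frame to every cluster (softmax over clusters).
    logits = frames @ assign_w + assign_b
    logits = logits - logits.max(axis=1, keepdims=True)
    assign = np.exp(logits)
    assign = assign / assign.sum(axis=1, keepdims=True)

    # Residual aggregation: V_k = sum_t a_tk * (x_t - c_k).
    vlad = assign.T @ frames - assign.sum(axis=0)[:, None] * centers

    # Drop the ghost clusters; only the K real clusters are kept.
    num_real = centers.shape[0] - num_ghost
    vlad = vlad[:num_real]

    # Intra-normalize each cluster, then L2-normalize the flattened vector.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    out = vlad.ravel()
    return out / (np.linalg.norm(out) + 1e-12)

# Hypothetical sizes: 50 frames, 16-dim features, 4 real + 2 ghost clusters.
rng = np.random.default_rng(0)
T, D, K, G = 50, 16, 4, 2
pooled = ghostvlad_pool(
    rng.normal(size=(T, D)),
    rng.normal(size=(K + G, D)),
    rng.normal(size=(D, K + G)),
    rng.normal(size=(K + G,)),
    num_ghost=G,
)
```

In a trained model the centers and assignment parameters would be learned jointly with the rest of the network; here they are random placeholders used only to show the shapes involved.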