Abstract:
A key problem in speaker verification is the condition mismatch between the training data and the testing data, which may significantly degrade verification performance. In most speaker recognition application scenarios, it is usually impossible to obtain enough samples to retrain the speaker recognition model. At the same time, the samples used to train the original model may be quite different from those obtained in real applications, owing to variability caused by intrinsic factors (e.g., changes in emotion, language, vocal effect, speaking style, and aging) or extrinsic factors (e.g., background noise, transmission channel, microphone, room acoustics, and distance from the microphone). In this paper, an adversarial domain adaptation strategy is designed and applied to the X-Vector-based speaker verification scheme to enhance its domain adaptation ability. First, the X-Vector scheme is trained on the source dataset (AISHELL1). Then, the domain adaptation strategy is applied to the trained X-Vector scheme to enable it to adapt to the target dataset (VoxCeleb1 or CN-Celeb). Finally, the performance of the X-Vector scheme before and after adaptation is compared on the target dataset. The comparison demonstrates that the proposed adaptation strategy achieves 21.46% and 19.24% reductions in Equal Error Rate (EER) on the VoxCeleb1 and CN-Celeb datasets, respectively.
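For readers unfamiliar with the metric, the following is a minimal sketch of how an EER and an EER reduction could be computed from verification trial scores. It is not the evaluation code used in this work. The function names `equal_error_rate` and `relative_reduction` are hypothetical, the scores in the usage note are made up for illustration, and reading the reported reductions as relative (rather than absolute) is an assumption.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores: scores of same-speaker trials (higher = more similar).
    nontarget_scores: scores of different-speaker trials.
    Returns the error rate at the threshold where the false acceptance
    rate and the false rejection rate are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # False rejection: a same-speaker trial scores below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scores at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


def relative_reduction(eer_before, eer_after):
    """Relative EER reduction in percent (assumed interpretation)."""
    return 100.0 * (eer_before - eer_after) / eer_before
```

With illustrative scores `[0.9, 0.8, 0.7, 0.3]` for same-speaker trials and `[0.6, 0.2, 0.1, 0.4]` for different-speaker trials, the two error rates meet at a threshold of 0.6, giving an EER of 0.25. Under the relative interpretation, an EER that drops from 10% to 8% corresponds to a 20% reduction.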