
    Citation: WANG Bin, CHEN Ning. An End-to-End Singing Voice Separation Model Based on Residual Attention U-Net[J]. Journal of East China University of Science and Technology, 2021, 47(5): 619-626. DOI: 10.14135/j.cnki.1006-3080.20200903001

    An End-to-End Singing Voice Separation Model Based on Residual Attention U-Net

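    • Abstract: Singing voice separation is one of the most challenging tasks in music information retrieval. This paper improves the Wave-U-Net-based singing voice separation model to enhance its performance. First, residual units are designed and introduced into the encoding and decoding blocks of Wave-U-Net to improve the effectiveness of feature extraction and the training efficiency. Second, an attention gate mechanism is designed and introduced into the skip connections of Wave-U-Net to reduce the semantic gap between the features extracted from the corresponding layer of the encoding block and the features from the previous layer of the decoding block. Experimental results on the MUSDB18 dataset show that the proposed RA-WaveUNet model outperforms the conventional Wave-U-Net model in separation performance, and that both the residual units and the attention gate mechanism help improve the model's performance.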

       

      Abstract: Music source separation splits a piece of music into its individual sources. As a special case, singing voice separation (SVS) splits the music into vocals and accompaniment. Because of its potential applications in melody extraction, music genre classification, singing voice detection, and singer identification, SVS has become an active topic in music information retrieval in recent years. A variety of convolutional neural network architectures based on U-Net have recently been applied to SVS with improved performance. In addition, Wave-U-Net was proposed to achieve end-to-end SVS by operating directly on the music waveform. However, the performance of time-domain SVS approaches depends heavily on the quality of feature extraction. In this paper, the conventional Wave-U-Net-based SVS scheme is modified to enhance its performance. First, in the encoding and decoding blocks, a residual unit is designed and adopted in place of the plain neural unit to alleviate the degradation problem. Second, at the skip connections, an attention gate mechanism is introduced to reduce the semantic gap between the output of the previous layer in the decoding block and that of the corresponding layer in the encoding block. To verify the effectiveness of the proposed scheme, termed RA-WaveUNet, on the SVS task, its performance is compared with that of state-of-the-art schemes on MUSDB18, the largest open dataset. Experimental results show that the proposed scheme outperforms Wave-U-Net-based schemes and other SVS schemes, and that both modifications contribute to the performance gain.
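
      The two modifications named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the layer details, so everything below is an assumption made for illustration: the residual unit is taken to be a single same-length 1-D convolution with LeakyReLU plus an identity shortcut, and the attention gate follows the common additive formulation (1x1 projections, ReLU, sigmoid) with decoder features assumed to be already upsampled to the encoder's time resolution. All function names, shapes, and parameters are hypothetical and are not taken from the RA-WaveUNet implementation.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation (the slope value is an assumption)."""
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_same(x, w, b):
    """Zero-padded 1-D convolution (cross-correlation) that keeps the length.

    x: (C_in, T) features, w: (C_out, C_in, K) with odd K, b: (C_out,).
    """
    k = w.shape[2]
    pad = k // 2
    t = x.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows[:, j, :] holds the input shifted by tap j: shape (C_in, K, T)
    windows = np.stack([xp[:, j:j + t] for j in range(k)], axis=1)
    return np.einsum("oik,ikt->ot", w, windows) + b[:, None]


def residual_unit(x, w, b):
    """Hypothetical residual unit: y = x + LeakyReLU(conv(x)).

    The identity shortcut requires C_out == C_in.
    """
    return x + leaky_relu(conv1d_same(x, w, b))


def attention_gate(skip, gate, w_x, w_g, b, psi, b_psi):
    """Hypothetical additive attention gate on a skip connection.

    skip: (C_x, T) encoder features, gate: (C_g, T) decoder features,
    w_x: (C_int, C_x), w_g: (C_int, C_g), b: (C_int,), psi: (C_int,).
    Returns the re-weighted skip features and the weights alpha of shape (T,).
    """
    q = np.maximum(w_x @ skip + w_g @ gate + b[:, None], 0.0)  # (C_int, T)
    alpha = sigmoid(psi @ q + b_psi)                           # one weight per time step
    return skip * alpha, alpha


# Toy usage with random weights; shapes only, no trained model.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))                 # 4 channels, 16 samples
enc = residual_unit(x, 0.1 * rng.standard_normal((4, 4, 5)), np.zeros(4))
dec = rng.standard_normal((6, 16))               # decoder features, same length
gated, alpha = attention_gate(
    enc, dec,
    rng.standard_normal((3, 4)), rng.standard_normal((3, 6)),
    np.zeros(3), rng.standard_normal(3), 0.0,
)
print(enc.shape, gated.shape, alpha.shape)
```

      In this sketch the gate produces one weight in (0, 1) per time step, so encoder features that disagree with the decoder context are attenuated before being merged. With all-zero convolution weights the residual unit reduces to the identity map, which is the property usually credited with easing the degradation problem.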

       
