Cover Song Identification Model Based on Similarity Fusion of Audio Content and Lyrics Text

陈颖呈; 陈宁

doi:10.14135/j.cnki.1006-3080.20191029002

Abstract

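Summary: Cover song identification (CSI) is one of the most challenging tasks in music information retrieval (MIR). To improve CSI accuracy, researchers have proposed a model that fuses the similarities of multiple audio features through tensor product graphs. However, learning the geometric structure of the high-dimensional similarity space greatly increases the model's time complexity, and the model does not take into account the importance of lyrics for cover song identification. This paper proposes a cover song identification model based on the fusion of audio-content similarity and lyrics-text similarity. Deep learning methods are used to extract audio features and lyrics features separately, and a similarity network fusion model fuses the similarities of the two kinds of features. To verify the effectiveness of the algorithm, the multimodal Covers2326 database was constructed. Experimental results show that, compared with the model based on tensor product graph fusion of multiple audio-feature similarities, the proposed model achieves higher identification accuracy and lower time complexity.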

Abstract: Cover song identification (CSI) is one of the most challenging tasks in music information retrieval (MIR). To improve the recognition accuracy of CSI, some researchers have recently proposed an identification scheme based on non-linear graph fusion and tensor product graph (TPG) diffusion, which reduces the influence of noise by learning the geometric structure of the high-dimensional similarity space. However, this kind of model has three problems. First, the audio features it uses, e.g., harmonic pitch class profile (HPCP), main melody (MLD), and beat-synchronous chroma (BSC), are all hand-crafted, and they have difficulty conveying the nonlinear structure of music. Second, the algorithm considers only the audio content and ignores the importance of lyrics information in cover song identification. Third, learning the geometric structure of the high-dimensional similarity space greatly increases the computational complexity. To address these problems, this paper proposes a cover song identification algorithm based on both the audio content and the lyrics text. To extract high-level audio features, the deep pitch class profile (DPCP) scheme is introduced to analyze the audio content. The bi-directional long short-term memory (BiLSTM) architecture and the TF·IDF scheme are adopted to extract the contextual semantics and the term frequencies of the song lyrics. The audio similarity and the lyrics similarity are then fused by the similarity network fusion (SNF) algorithm to obtain the final similarity. Finally, the Covers2326 dataset is constructed to verify the effectiveness of the proposed algorithm. The experimental results show that, compared with the existing algorithm, the proposed algorithm improves both the identification accuracy and the computational speed.
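To make the fusion step concrete, the following is a minimal sketch of how two similarity matrices (one from audio, one from lyrics) might be combined in the spirit of similarity network fusion. It is not the authors' implementation: the function names (`full_kernel`, `local_kernel`, `snf`, `best_match`), the toy matrices, the per-iteration row normalisation, and the parameter values are all assumptions made for illustration, and the sketch assumes strictly positive similarities.

```python
# Simplified, dependency-free sketch of similarity network fusion (SNF).
# All names and numbers here are illustrative assumptions, not the paper's code.


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def row_normalize(a):
    """Scale every row so that it sums to 1."""
    out = []
    for row in a:
        total = sum(row)
        out.append([v / total for v in row])
    return out


def full_kernel(w):
    """Global kernel: 1/2 on the diagonal, the other 1/2 shared off-diagonal."""
    n = len(w)
    p = []
    for i in range(n):
        off = sum(w[i][j] for j in range(n) if j != i)
        p.append([0.5 if j == i else w[i][j] / (2.0 * off) for j in range(n)])
    return p


def local_kernel(w, k):
    """Local kernel: keep each row's k strongest neighbours, renormalised."""
    n = len(w)
    s = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: w[i][j], reverse=True)
        keep = set(others[:k])
        total = sum(w[i][j] for j in keep)
        s.append([w[i][j] / total if j in keep else 0.0 for j in range(n)])
    return s


def snf(w_audio, w_lyrics, k=1, iterations=10):
    """Cross-diffuse the two similarity networks and return their average."""
    p_a, p_l = full_kernel(w_audio), full_kernel(w_lyrics)
    s_a, s_l = local_kernel(w_audio, k), local_kernel(w_lyrics, k)
    for _ in range(iterations):
        # Each modality is updated from the other modality's current state.
        next_a = row_normalize(matmul(matmul(s_a, p_l), transpose(s_a)))
        next_l = row_normalize(matmul(matmul(s_l, p_a), transpose(s_l)))
        p_a, p_l = next_a, next_l
    n = len(p_a)
    avg = [[(p_a[i][j] + p_l[i][j]) / 2.0 for j in range(n)] for i in range(n)]
    # Symmetrise so that similarity(i, j) == similarity(j, i).
    return [[(avg[i][j] + avg[j][i]) / 2.0 for j in range(n)] for i in range(n)]


def best_match(fused, i):
    """Index of the song most similar to song i, excluding i itself."""
    return max((j for j in range(len(fused)) if j != i),
               key=lambda j: fused[i][j])


# Made-up similarities for four songs: (0, 1) and (2, 3) are cover pairs.
W_AUDIO = [[1.0, 0.9, 0.1, 0.1],
           [0.9, 1.0, 0.1, 0.1],
           [0.1, 0.1, 1.0, 0.8],
           [0.1, 0.1, 0.8, 1.0]]
W_LYRICS = [[1.0, 0.8, 0.2, 0.1],
            [0.8, 1.0, 0.1, 0.2],
            [0.2, 0.1, 1.0, 0.9],
            [0.1, 0.2, 0.9, 1.0]]

fused = snf(W_AUDIO, W_LYRICS, k=1, iterations=10)
```

In a retrieval setting, `best_match(fused, query_index)` would return the top-ranked cover candidate for a query song. The neighbourhood size `k` and the iteration count are tuning choices that the abstract does not specify.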
