
    Citation: YANG Mei, CHEN Ning. Cover Song Identification Based on Fusion of Deep Learning and Manual Design Features[J]. Journal of East China University of Science and Technology, 2018, (5): 752-759. DOI: 10.14135/j.cnki.1006-3080.20170704003

    Cover Song Identification Based on Fusion of Deep Learning and Manual Design Features

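    • Abstract (translated from the Chinese): In cover song identification, hand-crafted features are highly customizable, but the shallow, linear structure they rely on struggles to represent the nonlinear long-term structure of music. Feature extraction algorithms based on deep learning, which analyze the nonlinear dynamics of music, can compensate for this weakness. Building on a study of the complementarity between the two, this paper proposes a cover song identification algorithm that fuses hand-crafted and deep features. The algorithm uses a deep learning model and a hand-crafted algorithm to extract the pitch class profile feature and the melody feature of a song, respectively. The similarities based on these two features are then combined into a similarity vector and fed to an improved SVM model, and the probability that the input songs form a cover pair is taken as the fused similarity. To evaluate the algorithm, it was tested on two public datasets (covers80 and covers1212). The experimental results show that it achieves a higher identification rate and classification accuracy than algorithms based on a single feature and algorithms based on similarity fusion.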
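The fusion step described in the abstract (two similarity scores combined into a vector, then mapped to a cover-pair probability) can be illustrated with a minimal sketch. This is not the paper's model: a plain logistic model stands in for the improved SVM, whose details the abstract does not give, and the function names and toy similarity values are invented for illustration.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_fusion(pairs, labels, lr=1.0, epochs=5000):
    """Fit weights w and bias b so that sigmoid(w . s + b) estimates
    P(cover pair | similarity vector s).

    ASSUMPTION: logistic regression trained by batch gradient descent is a
    stand-in for the paper's improved SVM, which is not specified here.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (s_dpcp, s_mld), y in zip(pairs, labels):
            err = _sigmoid(w[0] * s_dpcp + w[1] * s_mld + b) - y
            grad_w[0] += err * s_dpcp
            grad_w[1] += err * s_mld
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def fused_similarity(model, s_dpcp, s_mld):
    """Return the fused similarity: the estimated cover-pair probability."""
    w, b = model
    return _sigmoid(w[0] * s_dpcp + w[1] * s_mld + b)


# Hypothetical (DPCP similarity, MLD similarity) vectors; 1 = cover pair.
# These numbers are made up for the example, not taken from the paper.
toy_pairs = [(0.9, 0.8), (0.8, 0.3), (0.2, 0.9), (0.7, 0.7),
             (0.1, 0.2), (0.3, 0.1)]
toy_labels = [1, 1, 1, 1, 0, 0]

model = train_fusion(toy_pairs, toy_labels)
print(fused_similarity(model, 0.8, 0.3))  # pair supported mainly by DPCP
print(fused_similarity(model, 0.1, 0.2))  # pair supported by neither feature
```

The toy data include pairs where only one of the two features is high, which is the situation in which combining complementary features is expected to help.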

       

      Abstract: A cover version may differ from the original in timbre, tempo, structure, key, arrangement, and even the language of the vocals, so automatically identifying all cover versions of a given original is challenging. Most conventional cover song identification (CSI) schemes adopt hand-crafted features, which are highly customizable and effective. However, their shallow processing and linear mapping cannot precisely describe the complex dynamic characteristics of music. To address this problem, deep-learning architectures have recently been introduced into some music feature extraction algorithms, with good results. However, the performance of deep-learning-based schemes depends heavily on the size of the training set, and they can easily fall into a local optimum. In this paper, after analyzing the complementarity between hand-crafted and deep-learning features experimentally, we propose a feature fusion model. First, a deep learning model is trained to extract the deep pitch class profile (DPCP) feature. Meanwhile, a hand-crafted model is used to extract the main melody (MLD) feature. Then, the DPCP-based and MLD-based similarity scores are calculated via Dmax and used to construct a similarity function. Furthermore, the two similarity scores are combined into a similarity vector, which is fed to an improved support vector machine (SVM) to obtain the probability that the input track pair is a reference/cover pair. Finally, in terms of the receiver operating characteristic (ROC) curve and the area under the curve (AUC), the proposed model is compared with state-of-the-art CSI schemes based on a single feature and on multiple features. The experimental results show that the proposed scheme outperforms the CSI schemes based on the hand-crafted feature alone and on the deep-learning feature alone, and that it captures both the common and the complementary properties of the two features.
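The ROC/AUC evaluation mentioned in the abstract can be sketched as follows. This is a generic, minimal implementation under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the scores stand for fused similarities of track pairs, and a label of 1 marks a true reference/cover pair.

```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points traced
    by sweeping a decision threshold from high to low over the scores.

    Pairs with tied scores are consumed together, so they produce a single
    diagonal segment instead of an arbitrary ordering.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for four track pairs; two are true cover pairs.
example = auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
print(example)  # 0.75: three of the four (cover, non-cover) pairs are ranked correctly
```

An AUC of 1.0 means every cover pair scores above every non-cover pair, while 0.5 corresponds to a ranking no better than chance.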

       
