Abstract:
Cover song identification is one of the most challenging tasks in music information retrieval (MIR). To improve the recognition accuracy of cover song identification, researchers have recently proposed identification schemes based on non-linear graph fusion and tensor product graph (TPG) diffusion, which reduce the influence of noise by learning the geometric structure of the high-dimensional similarity space. However, this kind of model suffers from the following problems. Firstly, the audio features, e.g., harmonic pitch class profile (HPCP), main melody (MLD), and beat-synchronous chroma (BSC), are all hand-crafted and can hardly convey the nonlinear structure of music. Secondly, these algorithms consider only the audio content and ignore the importance of lyric information in cover song identification. Thirdly, learning the geometric structure of the high-dimensional similarity space greatly increases the computational complexity. To address these problems, this paper proposes a cover song identification algorithm based on both audio content and lyric text. To extract high-level audio features, a deep pitch class profile (DPCP) scheme is introduced to analyze the audio content. By adopting a bi-directional long short-term memory (BiLSTM) architecture and the TF·IDF scheme, the contextual semantics and term frequencies of the song lyrics are extracted. In addition, the audio similarity and the lyric similarity are fused by the similarity network fusion (SNF) algorithm to obtain the final similarity. Finally, the Covers2326 dataset is constructed to verify the effectiveness of the proposed algorithm. Experimental results show that, compared with existing algorithms, the proposed algorithm improves both identification accuracy and computational speed.
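As a minimal illustrative sketch of the TF·IDF portion of the lyric-similarity step only (not the paper's implementation: the BiLSTM semantic branch, the DPCP audio features, and the SNF fusion are omitted, and the lyric strings below are placeholders), the term-frequency similarity between a query song and candidate songs could be computed as follows:

```python
# Sketch: term-frequency lyric similarity via TF-IDF vectors and cosine similarity.
# Only the TF-IDF branch mentioned in the abstract is shown; BiLSTM semantics and
# SNF fusion with the audio similarity are not part of this example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder lyrics for a query song and two candidates (hypothetical data).
lyrics = [
    "yesterday all my troubles seemed so far away",   # query
    "yesterday all my troubles seemed so far away",   # candidate cover (same lyrics)
    "imagine all the people living life in peace",    # unrelated candidate
]

# Build TF-IDF vectors over the lyric corpus.
tfidf = TfidfVectorizer().fit_transform(lyrics)

# Pairwise cosine similarity; row 0 gives the query's similarity to each candidate.
sim = cosine_similarity(tfidf)
print(sim[0])  # high similarity for the cover, low for the unrelated song
```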