Music Emotion Recognition Based on the Broad and Deep Learning Network
Abstract: With the development of artificial intelligence and digital audio technology, music information retrieval (MIR) has gradually become a research hotspot, and music emotion recognition (MER) is becoming an important research direction owing to its value for applications such as video soundtracks. Although some researchers have combined the Mel-frequency cepstral coefficient (MFCC) and residual phase (RP) to extract musical emotion features and improve classification accuracy, training the models used in traditional deep learning takes a long time. To improve the efficiency of mining musical emotion features, MFCC and RP are weighted and combined in this work. At the same time, to improve the classification accuracy of music emotion and shorten model training time, the long short-term memory network (LSTM) and the broad learning system (BLS) are integrated: using LSTM as the feature-mapping node of BLS, a new broad and deep learning network (LSTM-BLS) is built to train music emotion recognition and classification. This structure makes full use of the ability of BLS to process complex data quickly, since its simple structure and short training time improve recognition efficiency, while LSTM excels at extracting temporal features from sequential data, so the temporal structure of the music can be extracted and its emotional characteristics preserved to the greatest extent.
Finally, experimental results on the Emotion dataset show that the proposed algorithm achieves higher recognition accuracy than other, more complex networks, providing a new feasible approach for music emotion recognition.
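The weighted combination of MFCC and RP features described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes the per-frame MFCC and RP feature matrices have already been extracted, and the weight `alpha` and the z-score normalization are assumptions for the sketch rather than values from the paper.

```python
import numpy as np

def fuse_features(mfcc, rp, alpha=0.6):
    """Weighted fusion of MFCC and residual-phase (RP) features.

    mfcc, rp: (n_frames, n_coeffs) arrays of per-frame features.
    alpha:    weight given to the MFCC stream (illustrative value).
    """
    # Normalize each stream so the weighting is not dominated
    # by their differing dynamic ranges.
    def zscore(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    return alpha * zscore(mfcc) + (1.0 - alpha) * zscore(rp)

# Toy example: 100 frames, 13 coefficients per stream.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))
rp = rng.normal(size=(100, 13))
fused = fuse_features(mfcc, rp)
print(fused.shape)  # (100, 13)
```

The fused matrix keeps the per-frame layout, so it can feed a sequence model such as the LSTM mapping nodes directly.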
Key words:
- music emotion recognition /
- residual phase /
- broad learning /
- deep learning /
- long short-term memory
Table 1. Parameters for models

Model      Parameters
LSTM-BLS   k1 = 40, k2 = 80; OutputDim_L1 = 400, OutputDim_L2 = 200; N1 = 10, N2 = 10, N3 = 100
LSTM       OutputDim_L1 = 400, OutputDim_L2 = 200, OutputDim_L3 = 100
BLS        N1 = 10, N2 = 10, N3 = 500
CCFBLS     F = 3 × 3; N1 = 10, N2 = 10, N3 = 500

Table 2. Classification accuracy comparison of different models

Model         Classification accuracy/%
CNN           52.36 ± 2.31
LSTM          56.17 ± 3.53
MCCLSTM       56.33 ± 2.15
MCCBL         55.81 ± 2.64
RCNNLSTM      57.33 ± 3.03
RCNNBL        59.56 ± 2.16
MCCLSTM+BLS   60.71 ± 1.39
CCFBLS        62.33 ± 2.03
LSTM-BLS      66.78 ± 2.12

Table 3. Training efficiency comparison of different models

Model         Training time/s
MCCLSTM       247.96
RCNNLSTM      615.57
MCCLSTM+BLS   285.65
CCFBLS        123.39
LSTM-BLS      169.32