Abstract:
With the development of artificial intelligence and digital audio technology, music information retrieval (MIR) has become a research hotspot. At the same time, music emotion recognition (MER) is becoming an important research direction because of its value for video soundtracks. Although some researchers have combined Mel-frequency cepstral coefficients (MFCC) and residual phase (RP) to extract music emotion features and improve classification accuracy, training traditional deep learning models takes a long time. To improve the efficiency of mining music emotion features, this work extracts them using a weighted combination of MFCC and RP. To improve the classification accuracy of music emotion and shorten model training time, Long Short-Term Memory (LSTM) is integrated with the Broad Learning System (BLS): a new wide and deep learning network (LSTM-BLS) is built for music emotion recognition and classification, with LSTM serving as the feature mapping node of BLS. This structure makes full use of the ability of BLS to process complex data quickly; its simple structure and short training time improve recognition efficiency. It also exploits the strong performance of LSTM in extracting temporal features from time series data, so that the temporal relationships in music are captured and its emotional characteristics are preserved as far as possible. Finally, experimental results on the emotion dataset show that the proposed algorithm achieves higher recognition accuracy than other complex networks and provides a new feasible approach to music emotion recognition.
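As an illustration of the two ideas summarized above, the following minimal NumPy sketch fuses two feature streams with a weight and fits a BLS-style output layer in closed form. It is not the implementation used in this work. The function names, the weight `w`, the layer sizes, and the random data are all hypothetical, and the feature mapping node is replaced by a fixed random projection averaged over time instead of an LSTM, so that the example has no deep learning dependency. The weighted combination is assumed here to be a weighted concatenation of frame-aligned arrays.

```python
import numpy as np

def fuse_features(mfcc, rp, w=0.6):
    # Weighted concatenation along the feature axis (assumed fusion rule;
    # w is a hypothetical weight, not a value from the paper).
    return np.concatenate([w * mfcc, (1.0 - w) * rp], axis=-1)

def bls_nodes(x, w_map, w_enh):
    # Stand-in for the feature mapping node: the paper uses an LSTM here;
    # this sketch uses a random tanh projection averaged over time.
    z = np.tanh(x @ w_map).mean(axis=1)   # mapped features, (n_samples, n_map)
    h = np.tanh(z @ w_enh)                # enhancement nodes, (n_samples, n_enh)
    return np.hstack([z, h])              # broad layer A = [Z | H]

def fit_readout(a, y_onehot, lam=1e-3):
    # Closed-form ridge regression: W = (A^T A + lam * I)^{-1} A^T Y
    k = a.shape[1]
    return np.linalg.solve(a.T @ a + lam * np.eye(k), a.T @ y_onehot)

# Synthetic placeholder data: n clips, t frames, 13 features per stream.
rng = np.random.default_rng(0)
n, t, d_mfcc, d_rp, n_cls = 40, 20, 13, 13, 4
mfcc = rng.normal(size=(n, t, d_mfcc))
rp = rng.normal(size=(n, t, d_rp))
labels = rng.integers(0, n_cls, size=n)

x = fuse_features(mfcc, rp)                    # (n, t, 26)
w_map = rng.normal(size=(x.shape[-1], 16))     # fixed random mapping weights
w_enh = rng.normal(size=(16, 32))              # fixed random enhancement weights
a = bls_nodes(x, w_map, w_enh)                 # (n, 48)
w_out = fit_readout(a, np.eye(n_cls)[labels])  # (48, n_cls)
pred = (a @ w_out).argmax(axis=1)              # predicted class per clip
```

In this sketch only `w_out` is learned, and it is obtained from a single linear solve rather than iterative gradient descent, which is the property of BLS that keeps training time short.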