
  • ISSN 1006-3080
  • CN 31-1691/TQ

Music Emotion Recognition Based on the Broad and Deep Learning Network

WANG Jingjing, HUANG Ru

Citation: WANG Jingjing, HUANG Ru. Music Emotion Recognition Based on the Broad and Deep Learning Network[J]. Journal of East China University of Science and Technology. doi: 10.14135/j.cnki.1006-3080.20210225007


doi: 10.14135/j.cnki.1006-3080.20210225007
Funding: National Natural Science Foundation of China (61673178, 61922063); Natural Science Foundation of Shanghai (20ZR1413800)
Detailed information
    Author biography:

    WANG Jingjing (1995—), female, from Henan Province, master's student; research interests: audio signal processing and Internet of Things technology. E-mail: 2448402346@qq.com

    Corresponding author:

    HUANG Ru, E-mail: huangrabbit@163.com

  • CLC number: TP391

Music Emotion Recognition Based on the Broad and Deep Learning Network

  • Abstract: With the development of artificial intelligence and digital audio technology, music information retrieval (MIR) has become a research hotspot. Within MIR, music emotion recognition (MER) is of great value for applications such as video soundtrack selection and has become an important research direction, yet it remains relatively under-studied. This paper combines Mel-frequency cepstral coefficients (MFCC) and residual phase (RP) features by weighting to extract music emotion features, improving the efficiency of emotion-feature mining. Meanwhile, to raise classification accuracy and shorten model training time, long short-term memory (LSTM) networks and the broad learning system (BLS) are combined: LSTM units serve as the feature-mapping nodes of BLS, yielding a new broad-and-deep learning network (LSTM-BLS) that is trained for music emotion classification. Experimental results on the Emotion dataset show that the proposed algorithm achieves higher recognition accuracy than other, more complex networks, providing a new feasible approach for music emotion recognition.
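The weighted MFCC/RP fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the fusion weight `w`, the per-stream normalization, and the frame/coefficient shapes are assumptions.

```python
import numpy as np

def fuse_features(mfcc, rp, w=0.6):
    """Weighted combination of MFCC and residual-phase (RP) features.

    mfcc, rp : 2-D arrays of shape (n_frames, n_coeffs)
    w        : fusion weight (hypothetical value; the paper does not state it here)
    """
    # Normalize each feature stream so neither dominates the weighted sum
    mfcc_n = (mfcc - mfcc.mean(axis=0)) / (mfcc.std(axis=0) + 1e-8)
    rp_n = (rp - rp.mean(axis=0)) / (rp.std(axis=0) + 1e-8)
    return w * mfcc_n + (1.0 - w) * rp_n

# Toy usage with random frames standing in for real audio features
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))   # 100 frames, 13 MFCC coefficients
rp = rng.normal(size=(100, 13))     # residual phase, same frame layout
fused = fuse_features(mfcc, rp)
print(fused.shape)                  # (100, 13)
```

The fused frame sequence would then be fed to the LSTM mapping nodes of the network.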

     

  • Figure  1.  Flow diagram of feature extraction

    Figure  2.  Structure of LSTM-BLS model

    Figure  3.  Block diagram of LSTM-BLS model

    Figure  4.  Timing features extracted from four music emotions

    Figure  5.  Classification accuracy comparison of different numbers of LSTM mapping nodes

    Figure  6.  Comparison of classification accuracy distribution obtained by different schemes

    Table  1.  Parameters for models

    Model       Parameters
    LSTM-BLS    k1 = 40, k2 = 80; OutputDim_L1 = 400, OutputDim_L2 = 200; N1 = 10, N2 = 10, N3 = 100
    LSTM        OutputDim_L1 = 400, OutputDim_L2 = 200, OutputDim_L3 = 100
    BLS         N1 = 10, N2 = 10, N3 = 500
    CCFBLS      F = 3 × 3; N1 = 10, N2 = 10, N3 = 500
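The broad-learning half of LSTM-BLS can be illustrated with a small numpy sketch: feature-mapping nodes and enhancement nodes are nonlinear projections, and the output weights are solved in closed form by ridge regression, following the standard BLS recipe of ref. [16]. This is not the paper's implementation: here the mapping nodes are random projections, whereas in LSTM-BLS they would be LSTM outputs; the node counts loosely follow Table 1 and the regularization constant is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_fit(X, Y, n_map=10, map_dim=10, n_enh=100, lam=1e-3):
    """Minimal broad learning system: n_map groups of random feature-mapping
    nodes, one group of enhancement nodes, ridge-regression output weights."""
    maps, Ws = [], []
    for _ in range(n_map):
        W = rng.normal(size=(X.shape[1] + 1, map_dim))
        Z = np.tanh(np.hstack([X, np.ones((len(X), 1))]) @ W)
        maps.append(Z)
        Ws.append(W)
    Zall = np.hstack(maps)                              # all mapping nodes
    We = rng.normal(size=(Zall.shape[1] + 1, n_enh))
    H = np.tanh(np.hstack([Zall, np.ones((len(X), 1))]) @ We)  # enhancement nodes
    A = np.hstack([Zall, H])
    # Closed-form ridge solution: Wout = (A^T A + lam*I)^-1 A^T Y
    Wout = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return Ws, We, Wout

# Toy 4-class problem mirroring the four emotion classes
X = rng.normal(size=(200, 26))
labels = rng.integers(0, 4, size=200)
Y = np.eye(4)[labels]                                   # one-hot targets
Ws, We, Wout = bls_fit(X, Y)

# Predict on the training data to sanity-check the fit
Zall = np.hstack([np.tanh(np.hstack([X, np.ones((len(X), 1))]) @ W) for W in Ws])
H = np.tanh(np.hstack([Zall, np.ones((len(X), 1))]) @ We)
pred = (np.hstack([Zall, H]) @ Wout).argmax(axis=1)
acc = (pred == labels).mean()
print(round(acc, 2))
```

Because the output layer is solved analytically rather than by backpropagation, training is fast, which is consistent with the training-time advantage reported in Table 3.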

    Table  2.  Classification accuracy comparison of different models

    Scheme          Classification accuracy/%
    CNN             52.36 ± 2.31
    LSTM            56.17 ± 3.53
    MCCLSTM [12]    56.33 ± 2.15
    MCCBL [12]      55.81 ± 2.64
    RCNNLSTM [12]   57.33 ± 3.03
    RCNNBL [12]     59.56 ± 2.16
    MCCLSTM+BLS     60.71 ± 1.39
    CCFBLS          62.33 ± 2.03
    LSTM-BLS        66.78 ± 2.12

    Table  3.  Training efficiency comparison of different models

    Scheme         Training time/s
    MCCLSTM        247.96
    RCNNLSTM       615.57
    MCCLSTM+BLS    285.65
    CCFBLS         123.39
    LSTM-BLS       169.32
  • [1] CHEN Yingcheng, CHEN Ning. Cover song identification model based on fusion of audio content and lyrics text similarity[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2021, 47(1): 74-80.
    [2] WENINGER F, EYBEN F, SCHULLER B. On-line continuous-time music mood regression with deep recurrent neural networks[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. Italy: IEEE, 2014: 5412-5416.
    [3] MARKOV K, MATSUI T. Music genre and emotion recognition using Gaussian processes[J]. IEEE Access, 2014, 2: 688-697. doi: 10.1109/ACCESS.2014.2333095
    [4] CHEN S H, LEE Y S, HSIEH W C, et al. Music emotion recognition using deep Gaussian process[C]//2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). Hong Kong, China: IEEE, 2015: 495-498.
    [5] LI X X, XIANYU H, TIAN J, et al. A deep bidirectional long short-term memory based multi-scale approach for music dynamic emotion prediction[C]//2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). China: IEEE, 2016: 544-548.
    [6] WEI Chen, CHEN Lanlan, ZHANG Ao. EEG emotion recognition based on ensemble convolutional neural networks[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2019, 45(4): 614-622.
    [7] SONG Zhenzhen, CHEN Lanlan, LOU Xiaoguang. Emotion recognition algorithm based on temporal convolutional networks[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2020, 46(4): 564-572.
    [8] SARKAR R, CHOUDHURY S, DUTTA S, et al. Recognition of emotion in music based on deep convolutional neural network[J]. Multimedia Tools and Applications, 2020, 79(10): 765-783.
    [9] TANG Xia, ZHANG Chenxi, LI Jiangfeng. Music emotion recognition based on deep learning[J]. Computer Knowledge and Technology, 2019, 15(11): 232-237.
    [10] ISSA D, DEMIRCI M F, YAZICI A. Speech emotion recognition with deep convolutional neural networks[J]. Biomedical Signal Processing and Control, 2020, 59: 101894. doi: 10.1016/j.bspc.2020.101894
    [11] NALINI N J, PALANIVEL S. Music emotion recognition: The combined evidence of MFCC and residual phase[J]. Egyptian Informatics Journal, 2016, 17(1): 1-10. doi: 10.1016/j.eij.2015.05.004
    [12] TANG H, CHEN N. Combining CNN and broad learning for music classification[J]. IEICE Transactions on Information and Systems, 2020, E103-D(3): 695-703. doi: 10.1587/transinf.2019EDP7175
    [13] CHEN C L P, LIU Z L, FENG S. Universal approximation capability of broad learning system and its structural variations[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(4): 1191-1204. doi: 10.1109/TNNLS.2018.2866622
    [14] KAMINSKAS M, RICCI F. Contextual music information retrieval and recommendation: State of the art and challenges[J]. Computer Science Review, 2012, 6(2/3): 89-119.
    [15] KOOLAGUDI S G, RAO K S. Emotion recognition from speech: A review[J]. International Journal of Speech Technology, 2012, 15: 99-117. doi: 10.1007/s10772-011-9125-1
    [16] CHEN C L P, LIU Z L. Broad learning system: An effective and efficient incremental learning system without the need for deep architecture[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(1): 10-24. doi: 10.1109/TNNLS.2017.2716952
    [17] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
    [18] CHEN N, WANG S. High-level music descriptor extraction algorithm based on combination of multi-channel CNNs and LSTM[C]// 18th International Society of Music Information Retrieval (ISMIR). Suzhou, China: National University of Singapore, 2017: 509-514.
    [19] PONS J, LIDY T, SERRA X. Experimenting with musically motivated convolutional neural networks[C]//14th IEEE International Workshop on Content-Based Multimedia Indexing (CBMI). Romania: IEEE, 2016: 1-6.
Publication history
  • Received: 2021-02-25
  • Available online: 2021-06-16
