Multi-Task Learning 3D CNN-BLSTM with Attention Mechanism for Speech Emotion Recognition
Abstract: Speech emotion recognition is widely used in fields such as in-vehicle driving systems, the service industry, education, and medical care. To enable computers to recognize a speaker's emotion more accurately, this paper proposes a speech emotion recognition method, 3D CNN-BLSTM, that combines a multi-task three-dimensional convolutional neural network (Convolutional Neural Network, CNN) and a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BLSTM) with an attention mechanism. Deep speech emotion features are extracted by the 3D CNN from a fused group map of multiple spectral features, and a multi-task learning mechanism with gender classification as an auxiliary task is combined to improve recognition accuracy. Experimental results on the CASIA Chinese emotional corpus show that the proposed method achieves higher accuracy.
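To make the architecture concrete, the sketch below outlines the overall pipeline in PyTorch. It is a minimal illustration under assumptions: the channel counts, kernel sizes, hidden width, six-class emotion output, and the additive-attention pooling are placeholders rather than the paper's exact configuration; only the overall flow (3D CNN over the fused spectral maps, then BLSTM, then attention pooling, then an emotion head plus an auxiliary gender head) follows the method described above.

```python
import torch
import torch.nn as nn

class MultiTask3DCNNBLSTM(nn.Module):
    """Minimal sketch of the 3D CNN-BLSTM with attention and two task
    heads. Layer sizes are illustrative assumptions."""

    def __init__(self, n_bins=64, n_emotions=6, n_genders=2, hidden=128):
        super().__init__()
        # 3D convolutions over (spectral map, time, frequency); the input
        # is the fused group map of 3 spectral features: (B, 1, 3, T, F)
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(3, 2, 2)),  # merge the 3 maps
        )
        feat = 32 * (n_bins // 4)  # channels x pooled frequency bins
        self.blstm = nn.LSTM(feat, hidden, batch_first=True,
                             bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)                    # scores
        self.emotion_head = nn.Linear(2 * hidden, n_emotions)   # main task
        self.gender_head = nn.Linear(2 * hidden, n_genders)     # auxiliary

    def forward(self, x):                 # x: (B, 1, 3, T, F)
        h = self.cnn(x)                   # (B, 32, 1, T/4, F/4)
        b, c, d, t, f = h.shape
        h = h.permute(0, 3, 1, 2, 4).reshape(b, t, c * d * f)  # time-major
        h, _ = self.blstm(h)              # (B, T/4, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # weights over time steps
        ctx = (w * h).sum(dim=1)          # attention-pooled utterance vector
        return self.emotion_head(ctx), self.gender_head(ctx)
```

A batch shaped (B, 1, 3, T, F), i.e. the three stacked spectral maps, yields a pair of logits: `e, g = MultiTask3DCNNBLSTM()(torch.randn(2, 1, 3, 64, 64))`.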
Table 1. Data augmentation method parameter settings
Method            Min            Max           Probability/%
AddGaussianNoise  0.0005 times   0.001 times   30
TimeStretch       0.9 times      1.1 times     30
PitchShift        −2 semitones   2 semitones   30
Shift             −0.3 s         0.3 s         30
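The method names in Table 1 match the transforms of the audiomentations library; to keep the parameter semantics explicit, the sketch below reimplements them with numpy and librosa. Treating the noise amplitude as relative to the waveform's peak, and applying each transform independently with probability 0.3, are assumptions.

```python
import numpy as np
import librosa

rng = np.random.default_rng()

def augment(y, sr):
    """Apply each Table 1 augmentation independently with probability 0.3.
    Ranges follow Table 1; uniform sampling within each range and the
    librosa-based transforms are assumptions."""
    if rng.random() < 0.3:  # AddGaussianNoise: 0.0005x to 0.001x amplitude
        amp = rng.uniform(0.0005, 0.001) * np.abs(y).max()
        y = y + rng.normal(0.0, amp, size=y.shape)
    if rng.random() < 0.3:  # TimeStretch: rate 0.9x to 1.1x
        y = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))
    if rng.random() < 0.3:  # PitchShift: -2 to +2 semitones
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-2, 2))
    if rng.random() < 0.3:  # Shift: circular shift by -0.3 s to +0.3 s
        y = np.roll(y, int(rng.uniform(-0.3, 0.3) * sr))
    return y.astype(np.float32)
```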
Table 2. Influence of data augmentation on speech emotion recognition accuracy
Experiment  Data augmentation  Accuracy/%
1           No-ops             84.10
2           AddGaussianNoise   84.08
3           TimeStretch        84.75
4           PitchShift         87.83
5           Shift              84.33
Table 3. Comparison results of different voiceprint inputs
Input        Accuracy/%  Recall/%  Precision/%  F1/%
LPC          84.08       84.08     84.15        84.02
SPC          80.75       80.75     80.71        80.63
Mel          87.67       87.67     87.75        87.63
LPC+SPC      88.25       84.50     88.37        88.16
LPC+Mel      90.25       90.25     90.56        90.27
SPC+Mel      90.08       90.08     90.10        90.08
Mel+LPC+SPC  91.08       91.08     91.15        91.10
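The three voiceprint maps of Table 3 can be stacked into the fusion group map that forms the 3D CNN input. The sketch below assumes SPC denotes the linear-frequency log-magnitude spectrogram and that the LPC map is the per-frame LPC spectral envelope; the bin count, window sizes, and LPC order are illustrative choices, not the paper's.

```python
import numpy as np
import librosa
from scipy.signal import freqz

def fusion_group_map(y, sr, n_bins=64, frame=1024, hop=256, lpc_order=12):
    """Stack Mel, spectrogram (SPC) and LPC-envelope maps into the
    3 x T x n_bins fusion group map fed to the 3D CNN."""
    # 1) log-Mel spectrogram with n_bins mel bands
    mel = librosa.power_to_db(librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=frame, hop_length=hop, n_mels=n_bins))
    # 2) log-magnitude STFT spectrogram with n_bins frequency rows
    spc = librosa.amplitude_to_db(np.abs(
        librosa.stft(y, n_fft=2 * (n_bins - 1), hop_length=hop)))
    # 3) LPC spectral envelope per frame, sampled at n_bins frequencies
    frames = librosa.util.frame(y, frame_length=frame, hop_length=hop)
    lpc = []
    for fr in frames.T:
        # tiny dither guards librosa.lpc against all-zero (silent) frames
        a = librosa.lpc(fr + 1e-6 * np.random.randn(fr.size), order=lpc_order)
        _, h = freqz([1.0], a, worN=n_bins)
        lpc.append(20 * np.log10(np.abs(h) + 1e-9))
    lpc = np.stack(lpc, axis=1)                             # (n_bins, T_lpc)
    # align frame counts and stack the maps as channels
    T = min(mel.shape[1], spc.shape[1], lpc.shape[1])
    maps = np.stack([mel[:, :T], spc[:, :T], lpc[:, :T]])   # (3, n_bins, T)
    return maps.transpose(0, 2, 1).astype(np.float32)       # (3, T, n_bins)
```

Adding batch and channel axes, e.g. with np.expand_dims(maps, (0, 1)), gives the (B, 1, 3, T, F) input assumed by the model sketch above.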
Table 4. Speech emotion recognition accuracy of different α values
α    Accuracy/%
0.9  87.67
0.8  86.92
0.7  88.00
0.6  86.83
0.5  91.08
0.4  87.25
0.3  88.02
0.2  87.33
0.1  86.00
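Table 4 sweeps the coefficient α that weights the main emotion task against the auxiliary gender task, with α = 0.5 giving the best accuracy. A minimal sketch of such a weighted joint objective, assuming cross-entropy losses for both tasks (the paper's exact loss form is not reproduced here):

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def multitask_loss(emotion_logits, gender_logits,
                   emotion_labels, gender_labels, alpha=0.5):
    # alpha weights the main emotion task; (1 - alpha) the gender task
    return alpha * ce(emotion_logits, emotion_labels) \
        + (1.0 - alpha) * ce(gender_logits, gender_labels)
```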
Table 5. Accuracy comparison of different models on the CASIA Chinese emotional corpus
Table 6. Comparison of the five models
Model                                    Accuracy/%  Recall/%  Precision/%  F1/%
Modified CNN-BLSTM                       82.50       82.50     82.63        82.49
3D CNN-BLSTM                             83.50       83.50     83.71        83.51
CNN-BLSTM+multi-tasking                  85.17       82.33     85.37        85.18
CNN-BLSTM+augmentation                   87.92       87.92     88.12        87.91
3D CNN-BLSTM+multi-tasking+augmentation  91.08       91.08     91.15        91.10
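For reference, the four metrics reported in Tables 3 and 6 can be computed with scikit-learn; macro averaging over the emotion classes is an assumption, suggested by the accuracy and recall columns usually coinciding.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def report(y_true, y_pred):
    """Accuracy plus macro-averaged recall/precision/F1, in percent."""
    return {
        "Accuracy/%":  100 * accuracy_score(y_true, y_pred),
        "Recall/%":    100 * recall_score(y_true, y_pred, average="macro"),
        "Precision/%": 100 * precision_score(y_true, y_pred, average="macro"),
        "F1/%":        100 * f1_score(y_true, y_pred, average="macro"),
    }
```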