Text Emotion Analysis Based on TF-IDF and Multihead Attention Transformer Model

GAO Jiaxi (高佳希); HUANG Haiyan (黄海燕)

doi:10.14135/j.cnki.1006-3080.20221218002

Abstract

Text emotion analysis aims to analyze, process, summarize, and reason about subjective text that carries emotional coloring, and it is an important task in natural language processing. To address the problem that existing computational methods cannot adequately handle text datasets with high complexity and a high degree of confusion, this paper proposes a text emotion analysis model based on TF-IDF (Term Frequency-Inverse Document Frequency) and a multihead attention Transformer. In the text preprocessing stage, the TF-IDF algorithm performs a preliminary screening for the words that most influence a text's emotional orientation, discarding common stop words as well as proper nouns specific to the text's domain that have little influence on that orientation. The encoder of the multihead attention Transformer then extracts features, capturing the important semantic information within the text and improving the model's semantic analysis and generalization ability. Experimental results show that the model achieves 98.17% accuracy on a multi-domain, multi-type review corpus, a significant improvement over the comparison models.
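The TF-IDF screening step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace-tokenized English toy corpus, the unsmoothed `log(N / df)` form of the inverse document frequency, the threshold value, and the helper names `tfidf_scores` and `screen_tokens` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # tf = relative frequency in the document; idf = log(N / df),
        # an unsmoothed variant chosen here as an assumption.
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def screen_tokens(docs, threshold):
    """Keep tokens whose TF-IDF score exceeds the threshold, in original order."""
    return [
        [term for term in doc if score[term] > threshold]
        for doc, score in zip(docs, tfidf_scores(docs))
    ]


# Hypothetical toy corpus of already-tokenized reviews.
docs = [
    ["the", "phone", "screen", "is", "great"],
    ["the", "phone", "battery", "is", "terrible"],
    ["the", "hotel", "room", "is", "great"],
]

# The threshold is an illustrative value, not one taken from the paper.
filtered = screen_tokens(docs, threshold=0.05)
for tokens in filtered:
    print(tokens)
```

In this toy corpus, "the" and "is" occur in every document, so their inverse document frequency is zero and they are screened out, while the remaining words are kept. In the pipeline the abstract describes, the screened token sequences would then be passed to the Transformer encoder for feature extraction.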
