
    Citation: GAO Jiaxi, HUANG Haiyan. Text Emotion Analysis Based on TF-IDF and Multihead Attention Transformer Model[J]. Journal of East China University of Science and Technology, 2024, 50(1): 129-136. DOI: 10.14135/j.cnki.1006-3080.20221218002

    Text Emotion Analysis Based on TF-IDF and Multihead Attention Transformer Model

    • Abstract: Text emotion analysis, an important task in natural language processing, aims to analyze, process, summarize, and reason about subjective text that carries emotional coloring. Because existing methods cannot adequately handle text datasets with high complexity and confusion, a text emotion analysis model based on TF-IDF (Term Frequency-Inverse Document Frequency) and a multi-head attention Transformer is proposed. In the text preprocessing stage, the TF-IDF algorithm is used to preliminarily select the words that most influence a text's emotional orientation, discarding common stop words and proper nouns specific to the text's domain that have little influence on that orientation. The encoder of the multi-head attention Transformer is then used for feature extraction, capturing the important semantic information within the text and improving the model's semantic analysis and generalization ability. Experimental results show that the model achieves 98.17% accuracy on a multi-domain, multi-type review corpus dataset, a significant improvement over the comparison models.
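    The TF-IDF screening step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact TF-IDF variant, the selection threshold, or the tokenizer, so everything below is an assumption made for illustration: the function name `tfidf_filter`, the `keep_ratio` parameter, the toy corpus, and the smoothed IDF formula (the same smoothing that scikit-learn applies by default) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_filter(docs, stop_words, keep_ratio=0.5):
    """Keep the highest-scoring TF-IDF tokens of each tokenized document.

    Hypothetical helper: the scoring variant and keep_ratio are assumptions.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    filtered = []
    for doc in docs:
        # Step 1: drop common stop words.
        tokens = [t for t in doc if t not in stop_words]
        if not tokens:
            filtered.append([])
            continue
        tf = Counter(tokens)
        # Step 2: score each distinct token; the smoothed IDF stays finite
        # and positive even for tokens that occur in every document.
        score = {
            t: (tf[t] / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t in tf
        }
        # Step 3: keep the top fraction of distinct tokens (ties broken
        # alphabetically so the result is deterministic).
        n_keep = max(1, math.ceil(keep_ratio * len(score)))
        ranked = sorted(score, key=lambda t: (-score[t], t))
        kept = set(ranked[:n_keep])
        # Preserve the original word order for a downstream encoder.
        filtered.append([t for t in tokens if t in kept])
    return filtered


# Toy review corpus (invented for this example, already tokenized).
docs = [
    ["the", "phone", "battery", "is", "terrible"],
    ["the", "phone", "screen", "is", "gorgeous"],
    ["the", "phone", "camera", "is", "disappointing"],
]
stop_words = {"the", "is"}
result = tfidf_filter(docs, stop_words, keep_ratio=0.5)
print(result[0])  # ['battery', 'terrible']
```

    In this toy corpus the word "phone" appears in every review, so its IDF is the lowest and it is discarded, which mirrors the abstract's point about dropping domain terms that carry little emotional signal. In the model described above, the filtered token sequences would then be passed to the Transformer encoder for feature extraction.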

       
