Chinese Named Entity Recognition Based on Hierarchical Adjustment of Lexicon Information

李宝昌; 郭卫斌

doi:10.14135/j.cnki.1006-3080.20211105003

Chinese Named Entity Recognition Based on Hierarchical Adjustment of Lexicon Information

Abstract

In Chinese named entity recognition, fusing word information into character information enriches the text features. However, a single character may match several candidate words, which easily causes word conflicts, and fusing irrelevant word information degrades recognition performance. To address this, we propose a Chinese named entity recognition method based on hierarchical adjustment of lexicon information. First, all potential words are divided into layers by word length, and higher-layer words feed back to adjust the weights of lower-layer words so that more useful information is retained, which alleviates semantic deviation and reduces the impact of word conflicts. Then, the word information is concatenated to the character information to enhance the text feature representation. Experiments on the Resume and Weibo datasets show that the proposed method outperforms traditional methods.
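The two steps named in the abstract (layering matched words by length and letting longer words adjust the weights of shorter ones, then concatenating word information to character information) can be illustrated with a small sketch. The abstract does not give the actual weighting formula, so everything below is an assumption made for illustration: the helper names, the multiplicative `penalty`, the rule that a shorter word is down-weighted when it crosses the boundary of a longer word, and the `toy_vec` stand-in for pretrained embeddings are all hypothetical and are not the authors' model.

```python
from collections import defaultdict


def match_words(sentence, lexicon, max_len=4):
    """Collect every lexicon word of length >= 2 as (start, end, word)."""
    matches = []
    for start in range(len(sentence)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(sentence):
                break
            if sentence[start:end] in lexicon:
                matches.append((start, end, sentence[start:end]))
    return matches


def layer_by_length(matches):
    """Group matched words into layers keyed by word length."""
    layers = defaultdict(list)
    for start, end, word in matches:
        layers[end - start].append((start, end, word))
    return dict(layers)


def adjust_weights(layers, penalty=0.5):
    """Assumed rule: each longer word that a shorter word partially
    overlaps (crosses, without containing it) multiplies the shorter
    word's weight by `penalty`."""
    weights = {}
    for length, spans in layers.items():
        for start, end, word in spans:
            weight = 1.0
            for longer_len, longer_spans in layers.items():
                if longer_len <= length:
                    continue
                for s2, e2, _ in longer_spans:
                    overlaps = start < e2 and s2 < end
                    contained = s2 <= start and end <= e2
                    if overlaps and not contained:
                        weight *= penalty
            weights[(start, end, word)] = weight
    return weights


def toy_vec(token, dim=3):
    """Hypothetical stand-in for a pretrained embedding lookup."""
    seed = sum(ord(ch) for ch in token)
    return [((seed * (i + 1)) % 7) / 7.0 for i in range(dim)]


def char_features(sentence, weights, dim=3):
    """Concatenate each character vector with the weighted mean of the
    vectors of all matched words that cover that character."""
    features = []
    for idx, ch in enumerate(sentence):
        covering = [(s, w) for s, w in weights.items() if s[0] <= idx < s[1]]
        total = sum(w for _, w in covering)
        pooled = [0.0] * dim
        if total > 0:
            for (_, _, word), w in covering:
                vec = toy_vec(word, dim)
                for i in range(dim):
                    pooled[i] += w * vec[i] / total
        features.append(toy_vec(ch, dim) + pooled)
    return features


sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
layers = layer_by_length(match_words(sentence, lexicon))
weights = adjust_weights(layers)
features = char_features(sentence, weights)
```

In this toy sentence, 市长 crosses the boundaries of both 南京市 and 长江大桥, so under the assumed rule its weight drops to 0.25 while every other matched word keeps a weight of 1.0. Each character then receives a six-dimensional feature: three dimensions for the character itself and three for the weighted word information.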
