
    Dynamic Hierarchical Cascade Tagging Model for Chinese Overlapping Relation Extraction

    • Abstract: This paper builds a dynamic hierarchical cascade tagging model for Chinese overlapping relation extraction (RWG-LSA). First, a dynamic character-word fusion feature learning model (RWG) is constructed from pre-trained language models and a gated mechanism, effectively avoiding problems such as missing features in the subject tagging module and the inability to compute in parallel. Second, dynamic weight local self-attention (LSA) is introduced to autonomously learn subject-level semantic features. Finally, on the basis of effectively fusing the global features of the input sequence with the local features of the subjects, the RWG-LSA model extracts entity pairs and relations from the text. Experiments on the SKE Chinese dataset show that the model is significantly effective for overlapping relation extraction, reaching an F1 score of 82.44%.
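    The gated character-word fusion at the core of RWG can be pictured with a short sketch. The PyTorch code below is a minimal illustration only: it assumes the character-level encoder (e.g. RoBERTa-wwm-ext) and the word-level encoder (e.g. WoBERT) produce aligned sequences of the same hidden size, and the single-linear-layer sigmoid gate is an assumed form, since the abstract does not specify the gate's parameterization.

        import torch
        import torch.nn as nn

        class GatedFusion(nn.Module):
            # Sketch of a gated mechanism (assumed form): a token-wise gate g
            # blends character-level features with word-level features.
            def __init__(self, hidden_size):
                super().__init__()
                self.gate = nn.Linear(2 * hidden_size, hidden_size)

            def forward(self, char_feats, word_feats):
                # Both inputs: (batch, seq_len, hidden_size), assumed to be
                # aligned to the same tokenization of the sentence.
                g = torch.sigmoid(self.gate(torch.cat([char_feats, word_feats], dim=-1)))
                return g * char_feats + (1.0 - g) * word_feats

    A sigmoid gate keeps each fused dimension a convex combination of the two encoders, which is one common way to realize such a gate; the paper's actual formulation may differ.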

       

      Abstract: Relation extraction is a key task in text data mining; its goal is to mine triples composed of entities and the semantic relations between them. Overlapping relations are currently a difficult problem in relation extraction, and the hierarchical cascade tagging strategy has shown remarkable performance in solving it. In that strategy, however, BERT (Bidirectional Encoder Representations from Transformers) is used only as the feature input to the subject tagging module, and no further feature mining is performed on the identified subjects, resulting in missing features. To address this shortcoming, this paper proposes a dynamic hierarchical cascade tagging model for Chinese overlapping relation extraction (RWG-LSA). First, a dynamic character-word fusion feature learning model (RWG) is constructed from RoBERTa-wwm-ext (A Robustly Optimized BERT Pre-training Approach, whole word masking, extended), WoBERT (Word BERT) and a gated mechanism, effectively avoiding the missing features in the subject tagging module and the inability to compute in parallel. Second, dynamic weight local self-attention (LSA) is introduced to autonomously learn subject-level semantic information. Finally, by effectively integrating the global features of the input text with the local features of the subjects, the RWG-LSA model extracts entity pairs and relations from the text. Experimental results on the SKE Chinese dataset show that the model is significantly effective at extracting overlapping relations, achieving an F1 score of 82.44%.
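      As a rough illustration of the local self-attention idea, the sketch below restricts standard scaled dot-product attention to a fixed window around each position, so each token attends only to its local, subject-level context. The window size, the band mask and the single-head form are assumptions for illustration; the paper's dynamic weighting scheme is not detailed in the abstract.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class LocalSelfAttention(nn.Module):
            # Sketch: each position attends only to neighbours within
            # +/- `window` tokens of itself (a banded attention mask).
            def __init__(self, hidden_size, window=5):
                super().__init__()
                self.window = window
                self.q = nn.Linear(hidden_size, hidden_size)
                self.k = nn.Linear(hidden_size, hidden_size)
                self.v = nn.Linear(hidden_size, hidden_size)

            def forward(self, x):
                # x: (batch, seq_len, hidden_size)
                b, n, d = x.shape
                q, k, v = self.q(x), self.k(x), self.v(x)
                scores = torch.matmul(q, k.transpose(-1, -2)) / d ** 0.5
                # Mask out all positions farther than `window` tokens away.
                idx = torch.arange(n, device=x.device)
                band = (idx[None, :] - idx[:, None]).abs() <= self.window
                scores = scores.masked_fill(~band, float("-inf"))
                return torch.matmul(F.softmax(scores, dim=-1), v)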

       
