Chinese Medical Named Entity Recognition Based on RoBERTa and Adversarial Training

GUO Rui, ZHANG Huanhuan

Citation: GUO Rui, ZHANG Huanhuan. Chinese Medical Named Entity Recognition Based on RoBERTa and Adversarial Training[J]. Journal of East China University of Science and Technology. doi: 10.14135/j.cnki.1006-3080.20210909003


doi: 10.14135/j.cnki.1006-3080.20210909003
Article information
    About the author: GUO Rui (1996-), female, from Lankao, Henan, master's student; her research focuses on natural language processing. E-mail: hedgehog_r@163.com

    Corresponding author: ZHANG Huanhuan, E-mail: hzhang@ecust.edu.cn

  • CLC number: TP391.1


  • Abstract: Methods that combine BERT (Bidirectional Encoder Representations from Transformers) with neural network models are now widely used for Chinese medical named entity recognition. However, BERT tokenizes Chinese text at the character level and therefore ignores Chinese word segmentation. Moreover, neural network models are often locally unstable: even tiny perturbations can mislead them, leaving the models poorly robust. To address these two problems, this paper proposes AT-RBC, a Chinese medical named entity recognition model based on RoBERTa (A Robustly Optimized BERT Pre-training Approach) and adversarial training. First, the pre-trained RoBERTa-wwm-ext-large model (A Robustly Optimized BERT Pre-training Approach with whole word masking and extended data, large version) produces the initial vector representation of the input text. Second, perturbations are added to this initial representation to generate adversarial examples. Finally, the initial representations and the adversarial examples are fed together, in sequence, into a bidirectional long short-term memory network and a conditional random field to obtain the final predictions. Experimental results show that AT-RBC achieves an F1 score of 88.96% on the CCKS 2019 dataset and 97.14% on the Resume dataset, demonstrating the effectiveness of the model.
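The abstract fixes the pipeline precisely enough to sketch: RoBERTa-wwm-ext-large supplies the initial vector representation, a perturbation on that representation yields the adversarial example, and both pass through a BiLSTM and a CRF. The sketch below is a minimal reading of that description, not the authors' released code; the checkpoint name "hfl/chinese-roberta-wwm-ext-large", the hidden size, and the pytorch-crf dependency are our assumptions.

    import torch
    import torch.nn as nn
    from transformers import BertModel  # the HFL RoBERTa-wwm checkpoints use the BERT architecture
    from torchcrf import CRF            # pip install pytorch-crf

    class ATRBC(nn.Module):
        """RoBERTa-wwm-ext-large -> BiLSTM -> CRF, as outlined in the abstract."""

        def __init__(self, num_tags, hidden=256,
                     pretrained="hfl/chinese-roberta-wwm-ext-large"):
            super().__init__()
            self.encoder = BertModel.from_pretrained(pretrained)
            dim = self.encoder.config.hidden_size  # 1024 for the large model
            self.bilstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, num_tags)
            self.crf = CRF(num_tags, batch_first=True)

        def embed(self, input_ids, attention_mask):
            # Initial vector representation of the input text
            return self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state

        def loss(self, emb, attention_mask, tags):
            # Negative CRF log-likelihood of the gold tag sequence
            out, _ = self.bilstm(emb)
            return -self.crf(self.fc(out), tags, mask=attention_mask.bool(), reduction="mean")

        def decode(self, input_ids, attention_mask):
            # Most likely tag sequence at inference time
            out, _ = self.bilstm(self.embed(input_ids, attention_mask))
            return self.crf.decode(self.fc(out), mask=attention_mask.bool())

A training step under the same reading, with an FGM-style perturbation in the spirit of Refs. [8]-[9]; eps, the single batch-level gradient norm, and the $ \alpha $-weighted sum of clean and adversarial losses (the $ \alpha $ of Fig. 2, on our assumption) are illustrative choices:

    def train_step(model, optimizer, input_ids, attention_mask, tags,
                   eps=1.0, alpha=0.5):
        optimizer.zero_grad()
        emb = model.embed(input_ids, attention_mask)
        emb.retain_grad()  # keep the gradient on this non-leaf tensor
        (alpha * model.loss(emb, attention_mask, tags)).backward(retain_graph=True)

        # FGM-style perturbation: a step of size eps along the normalized gradient.
        # One batch-level norm is a simplification; per-sample norms are also common.
        r_adv = eps * emb.grad.detach() / (emb.grad.norm() + 1e-12)

        # Adversarial example = initial representation + perturbation
        ((1 - alpha) * model.loss(emb + r_adv, attention_mask, tags)).backward()
        optimizer.step()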

     

  • Figure 1  AT-RBC model diagram

    Figure 2  Impact of different values of $ \alpha $
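This page does not define $ \alpha $. A common formulation, consistent with Ref. [8] and with the training-step sketch above (assumed here rather than taken from the paper), weights the clean loss against the adversarial one:

    $ r_{\mathrm{adv}} = \epsilon \, g / \lVert g \rVert_2, \qquad g = \nabla_{v} L(v, y; \theta) $

    $ L_{\mathrm{total}} = \alpha \, L(v, y; \theta) + (1 - \alpha) \, L(v + r_{\mathrm{adv}}, y; \theta) $

where $ v $ is the initial vector representation and $ \epsilon $ bounds the perturbation size; both symbols are ours.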

    Table 1  Masking strategies of BERT and RoBERTa-wwm-ext-large

    Illustration                                            Sample
    Original text                                           胃溃疡的症状为
    Segmented text                                          胃溃疡 的 症状 为
    BERT's masking strategy                                 胃 溃 [MASK] 的 [MASK] 状 为
    RoBERTa-wwm-ext-large's whole word masking strategy     [MASK] [MASK] [MASK] 的 [MASK] [MASK] 为
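A toy contrast of the two strategies in Table 1: character-level masking can cut through the word 胃溃疡 ("gastric ulcer"), whereas whole word masking always masks all of a word's characters together. The 30% ratio and the omission of BERT's 80/10/10 replacement rule are simplifications for illustration.

    import random

    def char_masking(chars, ratio=0.3, seed=0):
        # BERT-style: each character can be masked independently.
        rng = random.Random(seed)
        return ["[MASK]" if rng.random() < ratio else c for c in chars]

    def whole_word_masking(words, ratio=0.3, seed=0):
        # wwm-style: a selected word has all of its characters masked.
        rng = random.Random(seed)
        out = []
        for w in words:
            out.extend(["[MASK]"] * len(w) if rng.random() < ratio else list(w))
        return out

    print(" ".join(char_masking(list("胃溃疡的症状为"))))
    print(" ".join(whole_word_masking(["胃溃疡", "的", "症状", "为"])))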

    Table 2  Types and numbers of entities in the CCKS 2019 dataset

    Entity type              Training set    Validation set    Test set
    Disease and diagnosis    3645            567               1808
    Operation                908             121               162
    Drug                     1593            229               485
    Anatomy                  7158            1268              3094
    Image inspection         888             81                348
    Laboratory inspection    991             204               590

    Table 3  Types and numbers of entities in the Resume dataset

    Entity type                Training set    Validation set    Test set
    Citizenship                260             33                28
    Educational institution    858             106               112
    Address                    47              2                 6
    Person name                952             110               112
    Organization name          4611            523               553
    Specialty                  287             18                33
    Nation                     115             15                14
    Job title                  6308            690               772

    Table 4  Comparison with existing methods on the CCKS 2019 dataset

    Model                               P/%      R/%      F1/%
    Word2Vec+BiLSTM+CRF[16]             73.61    69.90    71.71
    Stroke ELMo+BiLSTM+CRF[5]           -        -        85.16
    BERT+BiLSTM+CRF[6]                  87.37    85.80    86.57
    BERT-wwm+BiLSTM+CRF[18]             87.33    86.75    87.04
    RoBERTa-wwm-ext-large+BiLSTM+CRF    87.95    88.18    88.06
    AT-RBC                              88.52    89.41    88.96
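For reference, the F1 values in Tables 4-6 are the harmonic mean of precision P and recall R, $ F_1 = 2PR/(P+R) $; e.g. for AT-RBC in Table 4, $ 2 \times 88.52 \times 89.41 / (88.52 + 89.41) \approx 88.96 $.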

    Table 5  F1 scores (%) of different models for each entity type and overall

    Entity type              Word2Vec+         Stroke ELMo+      BERT+             BERT-wwm+         RoBERTa-wwm-ext-     AT-RBC
                             BiLSTM+CRF[16]    BiLSTM+CRF[5]     BiLSTM+CRF[6]     BiLSTM+CRF[18]    large+BiLSTM+CRF
    Disease and diagnosis    67.34             82.81             85.41             86.00             85.96                88.15
    Operation                77.97             86.79             82.74             85.45             85.99                87.42
    Drug                     71.18             94.49             86.41             89.39             93.40                93.68
    Anatomy                  72.50             85.99             88.97             88.72             89.63                90.08
    Image inspection         81.54             88.01             83.28             84.81             86.97                86.34
    Laboratory inspection    68.47             75.65             78.66             79.66             80.34                82.94
    Overall                  71.71             85.16             86.57             87.04             88.06                88.96

    Table 6  Comparison with existing methods on the Resume dataset

    Model               P/%      R/%      F1/%
    Word baseline       93.72    93.44    93.58
    Char baseline       93.66    93.31    93.48
    Lattice-LSTM[23]    94.81    94.11    94.46
    FLAT[24]            -        -        94.93
    BERT+FLAT[24]       -        -        95.86
    AT-RBC              97.23    97.06    97.14
  • [1] QIU J, ZHOU Y, WANG Q, et al. Chinese clinical named entity recognition using residual dilated convolutional neural network with conditional random field[J]. IEEE Transactions on NanoBioscience, 2019, 18(3): 306-315. doi: 10.1109/TNB.2019.2908678
    [2] YANG X, HUANG W. A conditional random fields approach to clinical name entity recognition[C]//Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing(CCKS 2018). Tianjin, China: CCKS, 2018: 1-6.
    [3] PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. New Orleans, USA: ACL, 2018: 2227-2237.
    [4] DEVLIN J, CHANG M W, LEE K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, Minnesota: ACL, 2019: 4171-4186.
    [5] LI N, LUO L, DING Z, et al. DUTIR at the CCKS-2019 Task1: Improving Chinese clinical named entity recognition using stroke ELMo and transfer learning[C]//4th China Conference on Knowledge Graph and Semantic Computing (CCKS 2019). Hangzhou, China: CCKS, 2019: 24-27.
    [6] ZHANG M, WANG J, ZHANG X. Using a pre-trained language model for medical named entity extraction in Chinese clinic text[C]//2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). Beijing, China: IEEE, 2020: 312-317.
    [7] CUI Y, CHE W, LIU T, et al. Pre-training with whole word masking for Chinese BERT[EB/OL]. (2019-06-19) [2021-08-30]. https://arXiv.org/abs/1906.08101.
    [8] MIYATO T, DAI A M, GOODFELLOW I. Adversarial training methods for semi-supervised text classification[EB/OL]. (2016-05-25) [2021-08-30]. https://arXiv.org/abs/1605.07725.
    [9] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. (2014-11-20) [2021-08-30]. https://arXiv.org/abs/1412.6572.
    [10] LIU P, QIU X, HUANG X. Adversarial multi-task learning for text classification[C]//55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada: ACL, 2017: 1-10.
    [11] YASUNAGA M, KASAI J, RADEV D. Robust multilingual part-of-speech tagging via adversarial training[EB/OL]. (2017-10-14)[2021-08-30]. https://arXiv.org/abs/1711.04903.
    [12] SHEN D, ZHANG J, ZHOU G, et al. Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain[C]//Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine. Sapporo, Japan: ACL, 2003: 49-56.
    [13] LEE K, HWANG Y, KIM S, et al. Biomedical named entity recognition using two-phase model based on SVMs[J]. Journal of Biomedical Informatics, 2004, 37(6): 436-447. doi: 10.1016/j.jbi.2004.08.012
    [14] LIU J, HUANG M, ZHU X. Recognizing biomedical named entities using skip-chain conditional random fields[C]//2010 Workshop on Biomedical Natural Language Processing. Uppsala, Sweden: ACL, 2010: 10-18.
    [15] WU F, LIU J, WU C, et al. Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation[C]//The World Wide Web Conference. San Francisco, USA: ACM, 2019: 3342-3348.
    [16] GRIDACH M. Character-level neural network for biomedical named entity recognition[J]. Journal of Biomedical Informatics, 2017, 70: 85-91. doi: 10.1016/j.jbi.2017.05.002
    [17] WEI H, GAO M, ZHOU A, et al. Named entity recognition from biomedical texts using a fusion attention-based BiLSTM-CRF[J]. IEEE Access, 2019, 7: 73627-73636. doi: 10.1109/ACCESS.2019.2920734
    [18] ZHOU S, LIU J, ZHONG X, et al. Named entity recognition using BERT with whole world masking in cybersecurity domain[C]//2021 IEEE 6th International Conference on Big Data Analytics (ICBDA). Xiamen, China: IEEE, 2021: 316-320.
    [19] GUI T, ZHANG Q, HUANG H, et al. Part-of-speech tagging for twitter with adversarial neural networks[C]//2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: ACL, 2017: 2411-2420.
    [20] CHEN X, CARDIE C. Multinomial adversarial networks for multi-domain text classification[C]//2018 Conference of the North American Chapter of the Association for Computational Linguistics. New Orleans, Louisiana: ACL, 2018: 1226-1240.
    [21] BEKOULIS G, DELEU J, DEMEESTER T, et al. Adversarial training for multi-context joint entity and relation extraction[C]//2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: ACL, 2018: 2830-2836.
    [22] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
    [23] ZHANG Y, YANG J. Chinese NER using lattice LSTM[C]//56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia: ACL, 2018: 1554-1564.
    [24] LI X, YAN H, QIU X, et al. FLAT: Chinese NER using flat-lattice transformer[C]//58th Annual Meeting of the Association for Computational Linguistics. Washington, USA: ACL, 2020: 6836-6842.
Publication history
  • Received: 2021-09-09
  • Accepted: 2021-12-07
  • Published online: 2022-04-12
