Abstract:
Recently, combining BERT (Bidirectional Encoder Representations from Transformers) with neural network models has become a widely used approach to Chinese medical named entity recognition. However, BERT tokenizes Chinese text at the character level and does not take Chinese word segmentation into account. In addition, neural network models are often locally unstable: even small perturbations of the input may mislead them, which results in poor robustness. To address these two problems, this paper proposes a Chinese medical named entity recognition model based on RoBERTa (A Robustly Optimized BERT Pre-training Approach) and adversarial training, namely AT-RBC (Adversarial Training with RoBERTa-wwm-ext-large+BiLSTM+CRF). First, the RoBERTa-wwm-ext-large (A Robustly Optimized BERT Pre-training Approach-whole word masking-extended data-large) pre-trained model is used to obtain the initial vector representation of the input text. Second, perturbations are added to the initial vector representation to generate adversarial samples. Third, the initial vector representation and the adversarial samples are fed in turn into a bidirectional long short-term memory network and a conditional random field to obtain the final prediction. Experiments show that the F1 value of the AT-RBC model reaches 88.96% on the CCKS 2019 dataset and 97.14% on the Resume dataset.
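
The perturbation step can be illustrated with a minimal sketch. The abstract does not state how the perturbations are computed, so the sketch assumes an FGM-style (Fast Gradient Method) update, in which the perturbation is the loss gradient with respect to the embedding, rescaled to a fixed L2 norm epsilon. A toy logistic classifier stands in for the BiLSTM+CRF layers, and all function names and values are hypothetical and chosen only for illustration.

```python
import math


def loss_and_grad(embedding, weights, label):
    """Logistic loss of a toy linear classifier (a stand-in for BiLSTM+CRF)
    and the gradient of that loss with respect to the embedding."""
    z = sum(e * w for e, w in zip(embedding, weights))
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    grad = [(p - label) * w for w in weights]
    return loss, grad


def fgm_perturbation(grad, epsilon=1.0):
    """Assumed FGM-style perturbation: the gradient rescaled to L2 norm epsilon.
    Returns a zero vector if the gradient vanishes."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def adversarial_example(embedding, weights, label, epsilon=1.0):
    """Add the perturbation to the clean embedding to form an adversarial sample."""
    _, grad = loss_and_grad(embedding, weights, label)
    r = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, r)]


if __name__ == "__main__":
    emb = [0.5, -0.2, 0.1]      # hypothetical embedding of one token
    w = [1.0, 2.0, -1.0]        # hypothetical classifier weights
    clean_loss, _ = loss_and_grad(emb, w, 1)
    adv = adversarial_example(emb, w, 1, epsilon=0.5)
    adv_loss, _ = loss_and_grad(adv, w, 1)
    # In adversarial training, both losses would contribute to the parameter update.
    print(clean_loss, adv_loss)
```

In this toy setting the adversarial sample yields a higher loss than the clean embedding, which is the property adversarial training exploits: the model is optimised on both inputs so that small perturbations no longer change its prediction.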