Text Classification Method of Accident Cases Based on BERT Pre-Training Model

涂远来; 周家乐; 王慧锋

doi:10.14135/j.cnki.1006-3080.20220223002

Abstract

The large volume of accident information in accident case databases, including the time, location, cause, and course of each accident, provides rich and valuable experience for the design of safety-critical systems. This information plays a vital role in hazard identification, but it is usually scattered across the paragraphs of accident documents, which makes manual extraction inefficient and costly. This paper proposes a text classification method for accident cases based on the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model, which classifies accident case texts into four categories: ACCIDENT, CAUSE, CONSEQUENCE, and RESPONSE. In addition, a text dataset of accident cases is collected and constructed for training the model. Experimental results show that the method can automatically classify accident case text, with a classification accuracy of 73.44%, a recall of 69.13%, and an F1 score of 0.71. Multiple groups of experimental parameters are set up, and the effect of parameter settings on classification is explored experimentally to find the best settings. The proposed method helps mine the semantic information in accident case text and provides technical support for the subsequent construction of an expert knowledge base and an efficient accident retrieval platform.
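The three reported figures can be cross-checked against each other. The sketch below recomputes F1 as the harmonic mean of precision and recall, under the assumption that the reported 73.44% plays the role of precision (the abstract calls it classification accuracy; the harmonic-mean relation is defined for precision and recall). The function name `f1_score` and the variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract.
# Assumption: the reported 73.44% is treated as precision here.
reported_precision = 0.7344
reported_recall = 0.6913

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.71
```

Under that assumption, the recomputed value agrees with the reported F1 of 0.71.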
