
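    ZENG Lu, GAO Da-qi, RUAN Tong, WANG Qi, GAO Ju, HE Ping. Analysis and Annotation of Symptom Composition Based on CRF[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2018, (2): 277-282. DOI: 10.14135/j.cnki.1006-3080.20170308002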

    Analysis and Annotation of Symptom Composition Based on CRF

    Abstract: Chinese symptom descriptions are rich and varied, and their constituent elements are complex and highly variable. As an important step in converting unstructured electronic medical records into structured ones, recognizing the composition of Chinese symptoms helps capture the full information a symptom carries, identify synonyms of symptom names, and quantitatively analyze a patient's condition. In this paper, we present a composition model of Chinese symptoms in which a symptom name is treated as a sequence of one or more of 16 element types, such as atomic symptoms, conjunctions, and negation words, and we use a conditional random field (CRF) model to label these composition sequences automatically. First, we collect 5,645 Chinese symptoms from eight healthcare websites and annotate them semi-automatically. Next, a CRF is used to recognize the symptom composition elements. By selecting appropriate feature templates for composition recognition, we evaluate the effect of the CRF features and analyze the composition elements that remain unrecognized. We also design hand-crafted rules, together with a symptom composition dictionary, that target entities assigned the wrong type and correct the recognition results. Experimental results show that the proposed method effectively identifies the composition elements of Chinese symptoms, reaching labeling accuracies of 90.53% at the symptom level and 93.91% at the composition-element level.
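To make the composition model and the two evaluation granularities concrete, here is a minimal Python sketch under stated assumptions. It encodes each symptom's composition elements as character-level BIO tags (a common input format for CRF sequence labelers, although the abstract does not say which tagging scheme or toolkit the authors used), decodes tags back into element spans, and computes symptom-level and element-level accuracy. The element names `ATOM`, `CONJ`, and `NEG`, the functions `spans_to_bio`, `bio_to_spans`, and `evaluate`, the example segmentations, and the exact accuracy definitions are all hypothetical; the CRF model itself, its feature templates, and the rule-based correction step are not shown.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, element_type). Element names are hypothetical:
# the abstract names only atomic symptom (ATOM), conjunction (CONJ) and
# negation word (NEG) among the 16 element types.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Encode composition-element spans as per-character BIO tags."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode per-character BIO tags back into element spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # still inside the current element
        if start is not None:
            spans.append((start, i, label))
        # A B- tag (or an I- tag with no matching open span) starts a new element.
        start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def evaluate(gold: List[List[Span]], pred: List[List[Span]]) -> Tuple[float, float]:
    """Return (symptom-level accuracy, element-level accuracy).

    Assumed definitions: a symptom is correct only if its whole predicted
    element sequence equals the gold one; a gold element is correct if a
    predicted span with the same boundaries and type exists.
    """
    symptom_hits = sum(g == p for g, p in zip(gold, pred))
    element_hits = sum(len(set(g) & set(p)) for g, p in zip(gold, pred))
    element_total = sum(len(g) for g in gold)
    return symptom_hits / len(gold), element_hits / element_total


if __name__ == "__main__":
    # Illustrative gold annotations (our own segmentation, not the paper's data).
    gold = [
        [(0, 1, "NEG"), (1, 3, "ATOM")],                   # 无发热: "no fever"
        [(0, 2, "ATOM"), (2, 3, "CONJ"), (3, 5, "ATOM")],  # 咳嗽伴咳痰: "cough with sputum"
    ]
    # A hypothetical tagger output that merges 伴咳痰 into one atomic symptom.
    pred = [
        [(0, 1, "NEG"), (1, 3, "ATOM")],
        [(0, 2, "ATOM"), (2, 5, "ATOM")],
    ]
    print(spans_to_bio("咳嗽伴咳痰", gold[1]))
    # ['B-ATOM', 'I-ATOM', 'B-CONJ', 'B-ATOM', 'I-ATOM']
    print(evaluate(gold, pred))  # (0.5, 0.6)
```

Under these assumed definitions the symptom-level figure is the stricter one, since a single mislabeled element makes the whole symptom count as wrong; this is consistent with the abstract reporting a lower symptom-level accuracy (90.53%) than element-level accuracy (93.91%).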

       
