Abstract:
Descriptions of Chinese symptoms are rich and varied, and the elements that make up a symptom are complex and changeable. Recognizing the composition of Chinese symptoms is an important step in transforming unstructured electronic medical records into structured ones: it helps capture the full information carried by a symptom, distinguish synonymous symptom names, and quantitatively analyze a patient's condition. In this paper, we present a composition model of Chinese symptoms in which a symptom name is treated as a sequence of one or more of 16 element types, e.g., atomic symptom, conjunction, and negative word. Conditional random fields (CRF) are then used to recognize these symptom sequences automatically. First, we collect 5,645 Chinese symptoms from eight healthcare websites and annotate them semi-automatically. Then, a CRF model is used to recognize the symptom composition elements. By selecting appropriate feature templates for symptom composition recognition, we verify the effect of the CRF features and analyze the composition elements that remain unrecognized in the results. We also design hand-crafted rules, backed by a symptom composition dictionary, that target entities assigned the wrong type in order to correct the recognition results. Experimental results show that the proposed method effectively identifies the composition elements of Chinese symptoms, achieving recognition accuracies of 90.53% for symptoms and 93.91% for composition elements.
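The pipeline described above (element annotation, CRF feature templates, dictionary-based correction) can be sketched in a few pure-Python helpers. This is a minimal illustration under stated assumptions, not the authors' implementation: the label names `ATOM`, `CONJ`, and `NEG` are hypothetical stand-ins for three of the 16 element types, the character-window features only imitate a CRF++-style unigram template, and the correction rule (relabel any span whose text appears in the dictionary with a different type) is a hypothetical simplification of the paper's rules.

```python
from typing import Dict, List, Tuple

# Hypothetical element labels: ATOM (atomic symptom), CONJ (conjunction),
# NEG (negative word). The paper defines 16 types; these are illustrative only.
Segment = Tuple[str, str]  # (element text, element type)


def to_bio(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Convert an element-annotated symptom into per-character BIO tags."""
    tagged = []
    for text, label in segments:
        for i, ch in enumerate(text):
            tagged.append((ch, ("B-" if i == 0 else "I-") + label))
    return tagged


def char_features(chars: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Features resembling a CRF++ unigram template: the characters in a
    window around position i, plus the two character bigrams touching i."""
    def at(j: int) -> str:
        if j < 0:
            return "<BOS>"
        if j >= len(chars):
            return "<EOS>"
        return chars[j]

    feats = {"bias": "1"}
    for off in range(-window, window + 1):
        feats[f"c[{off}]"] = at(i + off)
    feats["c[-1]c[0]"] = at(i - 1) + at(i)
    feats["c[0]c[1]"] = at(i) + at(i + 1)
    return feats


def sentence_features(text: str, window: int = 2) -> List[Dict[str, str]]:
    """Feature dicts for every character of a symptom string."""
    chars = list(text)
    return [char_features(chars, i, window) for i in range(len(chars))]


def bio_to_spans(chars: List[str], tags: List[str]) -> List[Segment]:
    """Group per-character BIO tags back into (text, type) element spans."""
    spans: List[List[str]] = []
    for ch, tag in zip(chars, tags):
        prefix, _, label = tag.partition("-")
        label = label or "O"  # a bare "O" tag has no type suffix
        if prefix == "B" or not spans or spans[-1][1] != label:
            spans.append([ch, label])  # start a new element
        else:
            spans[-1][0] += ch  # continue the current element
    return [(text, label) for text, label in spans]


def correct_with_dictionary(spans: List[Segment],
                            lexicon: Dict[str, str]) -> List[Segment]:
    """Hypothetical correction rule: if a recognized span exactly matches a
    dictionary entry of a different type, replace its type with the entry's."""
    return [(text, lexicon.get(text, label)) for text, label in spans]
```

For example, "咳嗽伴发热" (cough accompanied by fever) annotated as `[("咳嗽", "ATOM"), ("伴", "CONJ"), ("发热", "ATOM")]` yields the tags `B-ATOM I-ATOM B-CONJ B-ATOM I-ATOM`. If a model mislabels "伴" as `ATOM`, a dictionary entry `{"伴": "CONJ"}` restores the correct type. The lists of feature dicts have the shape accepted by CRF toolkits such as sklearn-crfsuite, though the paper does not state which toolkit was used.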