
    Citation: LUAN Wei-feng, ZHANG Huan-huan. An Expanded Pattern Set-Based Approach to Chinese Name Recognition[J]. Journal of East China University of Science and Technology, 2018, (3): 425-430. DOI: 10.14135/j.cnki.1006-3080.20170509001

    An Expanded Pattern Set-Based Approach to Chinese Name Recognition

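    • Abstract (translated from the Chinese original): Chinese personal names take complex and varied forms, including non-standard forms such as short names and aliases, and traditional Chinese name recognition methods still handle such incomplete forms poorly. To address this, the paper proposes a Chinese name recognition method based on an expanded pattern set: expanding the set of name recognition patterns improves recognition of incomplete forms of Chinese names. Experimental results show that the method achieves good precision and recall and is reasonably effective on incomplete forms of Chinese names in particular, making name recognition more complete.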

       

      Abstract: Named entity recognition is a foundational task in Chinese information processing: it extracts proper nouns and numeric information from text and classifies them into categories such as person, organization, and location. Personal names occur frequently in Chinese text, so Chinese name recognition is an important subtask of named entity recognition, and improving it can significantly improve the quality of Chinese information processing. Chinese names take complex and diverse forms, including short names, aliases, and other non-standard forms, and traditional Chinese name recognition methods handle these incomplete forms poorly. We therefore propose a recognition method based on an expanded pattern set, which improves recognition accuracy for incomplete Chinese names by expanding the set of recognition patterns. The method uses role labeling to recognize names. First, a role labeler is trained on a corpus and used to label the text automatically, producing its role sequence. The role of each word reflects the part it plays in forming or surrounding a person's name, such as family name, given name, preceding context, or following context. Second, given the role sequence and the name recognition pattern set, a pattern matching algorithm finds the strings in the text whose roles match a pattern in the set and identifies them as names. Incomplete forms of names are considered explicitly, and the pattern set is extended to cover more complex names. Experimental results show that the method is especially effective at recognizing incomplete Chinese names, making name recognition more complete.
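
The second stage described in the abstract, matching a role sequence against a pattern set, can be illustrated with a minimal Python sketch. It does not cover the trained role labeler: the role sequence is supplied by hand. The one-letter role tags, both pattern lists, the sample sentence, and the function name `find_names` are assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical one-letter role tags (assumed for illustration only):
#   F = family name, G = given-name character, T = familiar prefix such as 老 or 小,
#   P = preceding context, N = following context, O = other
# Hypothetical pattern sets over role strings. The base set covers complete names;
# the expanded set adds two patterns standing in for incomplete forms.
BASE_PATTERNS = ["FGG", "FG"]
EXPANDED_PATTERNS = BASE_PATTERNS + ["TF", "GG"]

# Hand-labeled sample: "老王说张伟来了" ("Old Wang said Zhang Wei has arrived").
EXAMPLE_TOKENS = ["老", "王", "说", "张", "伟", "来", "了"]
EXAMPLE_ROLES = ["T", "F", "N", "F", "G", "N", "O"]


def find_names(tokens, roles, patterns):
    """Return (start, end, text) for each non-overlapping token span whose
    role string matches one of the patterns, scanning left to right."""
    if len(tokens) != len(roles):
        raise ValueError("tokens and roles must have the same length")
    # One letter per token, so string offsets equal token indices.
    role_string = "".join(roles)
    # Try longer patterns first so that "FGG" is preferred over "FG".
    ordered = sorted(patterns, key=len, reverse=True)
    matcher = re.compile("|".join(re.escape(p) for p in ordered))
    return [
        (m.start(), m.end(), "".join(tokens[m.start():m.end()]))
        for m in matcher.finditer(role_string)
    ]


if __name__ == "__main__":
    print(find_names(EXAMPLE_TOKENS, EXAMPLE_ROLES, BASE_PATTERNS))
    print(find_names(EXAMPLE_TOKENS, EXAMPLE_ROLES, EXPANDED_PATTERNS))
```

With these assumed patterns, the base set finds only the complete name 张伟, while the expanded set also picks up the incomplete form 老王. A real system would also need a policy for overlapping or competing matches, which this sketch resolves simply by leftmost match and longest pattern first.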

       
