Abstract:
Named entity recognition is a foundational task in Chinese information processing. It extracts proper nouns and numeric expressions from text and classifies them into categories such as person, organization, and location. Personal names occur with high frequency in Chinese text, so Chinese name recognition is an important subtask of named entity recognition, and improving it can significantly raise the quality of Chinese information processing. Chinese names take complex and diverse forms, including short names, aliases, and other non-standard forms. Because traditional Chinese name recognition methods remain imperfect, we propose a new recognition method based on an expanded pattern set, which improves recognition accuracy for incomplete Chinese names by enlarging the set of recognition patterns. The main idea of the method is to use role labeling to recognize Chinese names. First, by training on a corpus, we perform automatic role labeling and obtain the role sequence of the text. The role of each word is determined mainly by the part it plays in the composition of a person's name, such as surname, given name, preceding context, or following context. Second, given the role sequence and the name recognition pattern set, a pattern matching algorithm finds the strings in the text that match the name patterns defined by the pattern set and identifies them as names. In this paper, incomplete forms of names are fully considered, and the name recognition pattern set is extended to cover more complex names. The experimental results demonstrate that the method is especially effective at recognizing incomplete Chinese names, thereby improving the completeness of name recognition.
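
The second step, matching a role sequence against a pattern set, can be illustrated with a minimal Python sketch. It assumes role labeling has already been done and is not the paper's implementation: the single-letter role tags, the contents of `NAME_PATTERNS`, and the function name `find_names` are all hypothetical, chosen only to show how an expanded pattern set could admit incomplete names such as a prefixed surname or a bare given name.

```python
import re

# Hypothetical single-letter role tags (assumed for illustration only):
#   S = surname, G = given-name character, P = informal prefix (e.g. 老, 小),
#   L = word preceding a name, R = word following a name, O = other
# Hypothetical pattern set: "SGG" and "SG" are full names; "PS" and "GG"
# stand in for expanded patterns covering incomplete names.
NAME_PATTERNS = ["SGG", "SG", "PS", "GG"]


def find_names(tokens, roles, patterns=NAME_PATTERNS):
    """Return (start, end, text) for each token span whose roles match a pattern."""
    # One letter per token, so string offsets equal token indices.
    role_string = "".join(roles)
    # Try longer patterns first so that "SGG" is preferred over "SG".
    ordered = sorted(patterns, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(p) for p in ordered))
    return [
        (m.start(), m.end(), "".join(tokens[m.start():m.end()]))
        for m in regex.finditer(role_string)
    ]


# Example input with hand-assigned roles (in the method these would come
# from the trained role labeler).
tokens = ["记者", "张", "小", "明", "说", "，", "老", "王", "也", "来", "了"]
roles = ["L", "S", "G", "G", "R", "O", "P", "S", "O", "O", "O"]
names = find_names(tokens, roles)
```

With this input, `names` contains the full name 张小明 (pattern `SGG`) and the incomplete form 老王 (pattern `PS`). Removing `PS` from the pattern set would cause the second name to be missed, which is the effect that expanding the pattern set is meant to address.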