[1] YU Shi-yuan, GUO Shu-ming, HUANG Rui-yang, et al. Layered Regional Exhaustive Model for Chinese Nested Named Entity Recognition[J]. Computer Technology and Development, 2022, 32(09): 161-166. [doi:10.3969/j.issn.1673-629X.2022.09.025]

Layered Regional Exhaustive Model for Chinese Nested Named Entity Recognition

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
32
Issue:
2022, No. 09
Pages:
161-166
Column:
New Computing Application Systems
Publication date:
2022-09-10

Article Information

Title:
Layered Regional Exhaustive Model for Chinese Nested Named Entity Recognition
Article ID:
1673-629X(2022)09-0161-06
Author(s):
YU Shi-yuan 1,2; GUO Shu-ming 2; HUANG Rui-yang 2; ZHANG Jian-peng 2; HU Nan 1,2
1. School of Software, Zhengzhou University, Zhengzhou 450001, China
2. National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China
Keywords:
nested named entity recognition; layered regional exhaustive model; convolutional neural network; bi-directional long short-term memory network; information extraction
CLC number:
TP18
DOI:
10.3969/j.issn.1673-629X.2022.09.025
Abstract:
Nested named entities carry rich semantic relationships and structural information, so algorithms that can identify them accurately are of significant research value. Existing Chinese nested named entity datasets contain mislabeled and missing annotations, and most existing recognition methods ignore the internal information associations among nested entities, which lowers accuracy. To address these problems, a new Chinese nested named entity dataset, NEPD, is constructed by combining automatic generation with manual annotation, and on this basis a layered regional exhaustive model (LREM) for Chinese nested named entity recognition is designed. First, the model traverses the text to combine candidate entities and obtains the word embedding information of the lower encoding layers. Next, to enable information exchange between adjacent encoding layers, the word embedding information of a lower encoding layer is incorporated into the higher encoding layer. Finally, multiple decoding layers are used so that a named entity of length L is predicted only at layer L, which effectively prevents error propagation and thus improves recognition accuracy. Experimental results show that, without external knowledge resources, LREM achieves F1 scores of 87.19% and 86.27% on nested and non-nested named entities, respectively; the F1 score on non-nested named entities is 1.18% higher than that of the traditional BiLSTM+CRF model, which verifies the reliability of the model.
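The layering idea in the abstract (a candidate entity of length L is handled only at layer L, and each layer draws on the layer below it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `enumerate_layered_regions` and `build_layer_vectors`, the `max_len` parameter, and the use of plain averaging to pass lower-layer information upward are all assumptions made for illustration. The abstract does not specify how the encoding layers are actually combined.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def enumerate_layered_regions(tokens: List[str], max_len: int) -> Dict[int, List[Span]]:
    """Group every candidate span by length: layer L holds only spans of length L."""
    layers: Dict[int, List[Span]] = {}
    for length in range(1, min(max_len, len(tokens)) + 1):
        layers[length] = [(i, i + length) for i in range(len(tokens) - length + 1)]
    return layers


def build_layer_vectors(
    token_vectors: List[List[float]], max_len: int
) -> Dict[int, List[List[float]]]:
    """Build one vector per region, layer by layer.

    Layer 1 is the token embeddings themselves. The region of length L that
    starts at i is derived from the two length-(L-1) regions starting at i
    and i+1. Averaging is an illustrative stand-in (an assumption), not the
    combination used in the paper.
    """
    layers: Dict[int, List[List[float]]] = {1: [list(v) for v in token_vectors]}
    for length in range(2, min(max_len, len(token_vectors)) + 1):
        prev = layers[length - 1]
        layers[length] = [
            [(a + b) / 2.0 for a, b in zip(prev[i], prev[i + 1])]
            for i in range(len(prev) - 1)
        ]
    return layers


if __name__ == "__main__":
    # Character-level tokens for a nested mention taken from the affiliation line.
    tokens = list("郑州大学软件学院")
    regions = enumerate_layered_regions(tokens, max_len=8)
    for length, spans in regions.items():
        print(length, ["".join(tokens[s:e]) for s, e in spans])
```

In this sketch, the span covering 郑州 would be scored only at layer 2, 郑州大学 only at layer 4, and the full string only at layer 8, so a decision about a short inner span never has to be made before an enclosing longer span can be considered. That separation is one way to read the abstract's claim about avoiding error propagation.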
Last Update: 2022-09-10