[1]崔从敏,施运梅,袁博,等.面向政府公文的关系抽取方法研究[J].计算机技术与发展,2021,31(12):26-32.[doi:10.3969/j.issn.1673-629X.2021.12.005]
 CUI Cong-min, SHI Yun-mei, YUAN Bo, et al. Research on Relation Extraction Method for Government Documents[J]. Computer Technology and Development, 2021, 31(12): 26-32. [doi:10.3969/j.issn.1673-629X.2021.12.005]

面向政府公文的关系抽取方法研究 (Research on Relation Extraction Method for Government Documents)

《计算机技术与发展》(Computer Technology and Development) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 12
Pages:
26-32
Column:
Big Data Analysis and Mining
Publication Date:
2021-12-10

文章信息/Info

Title:
Research on Relation Extraction Method for Government Documents
Article ID:
1673-629X(2021)12-0026-07
作者:
崔从敏 1,2; 施运梅 1,2; 袁博 1,2; 李云汉 1,2; 李源华 1,2; 周楚围 1,2
1. 北京信息科技大学 网络文化与数字传播北京市重点实验室, 北京 100101;
2. 北京信息科技大学, 北京 100101
Author(s):
CUI Cong-min 1,2; SHI Yun-mei 1,2; YUAN Bo 1,2; LI Yun-han 1,2; LI Yuan-hua 1,2; ZHOU Chu-wei 1,2
1. Beijing Key Laboratory of Internet Culture Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China;
2. Beijing Information Science and Technology University,Beijing 100101,China
关键词:
实体关系抽取; 远程监督; ALBERT; 预训练语言模型; 胶囊网络
Keywords:
entity relation extraction; distant supervision; ALBERT; pre-trained language model; capsule network
CLC Number:
TP391
DOI:
10.3969/j.issn.1673-629X.2021.12.005
摘要 (translated):
Government documents are voluminous and wide-ranging, and mining valuable information from them, for example by applying entity relation extraction to personnel information, can ease the workload of government staff. A distantly supervised relation extraction approach reduces manual annotation cost and improves extraction efficiency, thereby ensuring both the quality and the timeliness of the information obtained. This paper proposes a distantly supervised entity relation extraction method that combines the ALBERT pre-trained language model with a capsule network to extract person-position relations from official documents. ALBERT extracts deep semantic information from text through character embedding and position embedding, while the capsule network improves relation classification by propagating low-level features to high-level ones. Experimental results show that the precision, recall, and F1 score of the proposed model are all higher than those of the baseline methods; the model effectively improves relation extraction performance and alleviates the shortage of labeled data in the official-document domain. The extracted instances can expand the existing domain knowledge base and help government staff quickly obtain personnel information when drafting documents, avoiding errors in information transmission.
Abstract:
Government documents contain rich content and cover a wide range of topics. Mining valuable information from them, such as using entity relation extraction to mine personnel information, can relieve the pressure on government staff. Distant supervision for relation extraction reduces the cost of manual labeling, improves the efficiency of relation extraction, and ensures the quality and timeliness of the important information obtained. We propose a distantly supervised entity relation extraction method that combines the ALBERT pre-trained language model with a capsule network to extract person-position relations from official documents. ALBERT extracts deep semantic information from the text by way of character embedding and position embedding, while the capsule network improves relation classification by transferring low-level features to high-level ones. Experiments show that the precision, recall, and F1 score of the proposed model are all higher than those of the baseline methods; the model effectively improves relation extraction performance and alleviates the shortage of labeled datasets in the official-document domain. The instances obtained can expand the existing document-domain knowledge base and help government staff quickly obtain personnel information when writing documents, so as to avoid errors in information transmission.
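The paper itself includes no code on this page. As a rough, hypothetical illustration of the capsule-network component the abstract refers to, the standard "squash" non-linearity from the general capsule-network literature (not taken from this paper) can be sketched in plain Python; it is what lets a capsule's vector length be read as a class score when routing low-level features to high-level relation capsules:

```python
import math

def squash(vec):
    # Capsule "squash" non-linearity: shrinks short vectors toward zero
    # and long vectors toward (but never past) unit length, while
    # preserving the vector's direction.
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0] * len(vec)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / norm
    return [scale * x for x in vec]

# A long input vector keeps its direction; its length approaches 1.
v = squash([3.0, 4.0])  # input length 5 -> output length 25/26
length = math.sqrt(sum(x * x for x in v))
```

Because the squashed length stays in [0, 1), the length of each top-level capsule can be interpreted as the probability that its relation class (here, a person-position relation) is present.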


更新日期/Last Update: 2021-12-10