XIONG Jian-hua, HAN Yong-guo, LIAO Jing, et al. Chinese Open Relation Extraction Based on Long Sentence Simplification[J]. Computer Technology and Development, 2023, 33(02): 203-207. [doi:10.3969/j.issn.1673-629X.2023.02.030]
Chinese Open Relation Extraction Based on Long Sentence Simplification
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]
- Volume: 33
- Issue: 2023(02)
- Pages: 203-207
- Column: New Computing Application Systems
- Publication Date: 2023-02-10
Article Info
- Title: Chinese Open Relation Extraction Based on Long Sentence Simplification
- Article ID: 1673-629X(2023)02-0203-05
- Authors: 熊建华 (XIONG Jian-hua); 韩永国 (HAN Yong-guo); 廖竞 (LIAO Jing); 寇露彦 (KOU Lu-yan); 吴昌述 (WU Chang-shu)
- Affiliation: School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China
- Keywords: open relation extraction; long sentence simplification; dependency syntactic analysis; sequence-to-sequence model; BERT model
- CLC Number: TP391.1
- DOI: 10.3969/j.issn.1673-629X.2023.02.030
- Abstract:
At present, the mainstream method for Chinese open relation extraction is to craft extraction rules over syntactic analysis results, an approach that depends heavily on the quality of the underlying natural language processing tools. When text sentences are long, the accuracy of these tools drops, and the quality of relation extraction decreases with it. We therefore propose an open relation extraction method based on long sentence simplification. The method first simplifies long sentences in the text with a sequence-to-sequence model, and then applies lexical and syntactic rules to extract relations from each simplified sub-sentence. For the long sentence simplification part, the bidirectional Transformer structure of BERT serves as the body of the sequence-to-sequence model: the input side obtains sentence text vectors through the BERT-WWM pre-trained model, and the decoder uses the Seq2Seq Mask mechanism of UniLM. For the relation extraction part, we first extract basic Subject-Predicate-Object relation data from the dependency syntactic analysis results, and then supplement entities and relation words using lexical and syntactic information. Experimental results show that the proposed method effectively improves the precision and recall of open relation extraction on complex long sentences. Finally, we analyze the errors in the extracted relation data and summarize the error types, providing a reference for future research on open relation extraction.
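The Seq2Seq Mask mechanism of UniLM referred to in the abstract can be illustrated with a minimal sketch (using NumPy; this is an illustrative construction, not the authors' implementation): source tokens attend bidirectionally over the whole source, while each target token attends to the full source plus only the preceding target tokens, which lets a single BERT-style bidirectional Transformer serve as both encoder and decoder.

```python
import numpy as np

def seq2seq_attention_mask(src_len: int, tgt_len: int) -> np.ndarray:
    """Build a UniLM-style Seq2Seq attention mask.

    Source tokens attend bidirectionally to all source tokens;
    target tokens attend to the whole source plus preceding
    target tokens (causal). 1 = may attend, 0 = masked.
    """
    n = src_len + tgt_len
    mask = np.zeros((n, n), dtype=int)
    # Every position (source or target) may attend to the source.
    mask[:, :src_len] = 1
    # Within the target, attention is causal (lower-triangular).
    mask[src_len:, src_len:] = np.tril(np.ones((tgt_len, tgt_len), dtype=int))
    return mask

# Example: 3 source tokens, 2 target tokens
print(seq2seq_attention_mask(3, 2))
```

Source rows never see target columns, so the encoding of the input sentence is unaffected by the partially generated output.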
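The rule-based Subject-Predicate-Object step can likewise be sketched over a dependency parse. The token format below (`id`/`word`/`head`/`rel` dicts with 1-based heads) and the SBV/VOB/HED labels follow common Chinese dependency schemes such as LTP's; both are assumptions for illustration, not the paper's actual pipeline, which further supplements entities and relation words with lexical and syntactic information.

```python
def extract_spo(tokens):
    """Extract basic Subject-Predicate-Object triples from a dependency parse.

    `tokens` is a list of dicts: {"id", "word", "head", "rel"}, where
    `head` is the 1-based index of the governing token and `rel` uses
    LTP-style labels (SBV = subject-of-verb, VOB = verb-object,
    HED = sentence head).
    """
    triples = []
    for tok in tokens:
        if tok["rel"] == "SBV":                  # subject found; its head is the predicate
            pred = tokens[tok["head"] - 1]
            for obj in tokens:                   # objects governed by the same predicate
                if obj["rel"] == "VOB" and obj["head"] == pred["id"]:
                    triples.append((tok["word"], pred["word"], obj["word"]))
    return triples

# Toy parse of "熊建华 提出 方法" ("XIONG Jian-hua proposed a method")
parse = [
    {"id": 1, "word": "熊建华", "head": 2, "rel": "SBV"},
    {"id": 2, "word": "提出",   "head": 0, "rel": "HED"},
    {"id": 3, "word": "方法",   "head": 2, "rel": "VOB"},
]
print(extract_spo(parse))  # [('熊建华', '提出', '方法')]
```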
- Last Update: 2023-02-10