[1] WEI Yuan-yuan, NI Jian-cheng, GAO Feng, et al. A Text Abstract Summarization Model Combined with Theme Information Clustering Coding[J]. Computer Technology and Development, 2021, 31(01): 30-34. [doi:10.3969/j.issn.1673-629X.2021.01.006]

A Text Abstract Summarization Model Combined with Theme Information Clustering Coding
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 01
Pages:
30-34
Column:
Big Data Analysis and Mining
Publication date:
2021-01-10

Article Info

Title:
A Text Abstract Summarization Model Combined with Theme Information Clustering Coding
Article ID:
1673-629X(2021)01-0030-05
Author(s):
WEI Yuan-yuan, NI Jian-cheng, GAO Feng, WU Jun-qing
School of Software, Qufu Normal University, Jining, Shandong 272000, China
Keywords:
sequence-to-sequence model; generative text abstract; word vector clustering; theme coding; cosine similarity
Classification number:
TP391.1
DOI:
10.3969/j.issn.1673-629X.2021.01.006
Abstract:
Sequence-to-sequence models with attention are widely used in abstractive text summarization, but summaries generated this way still suffer from insufficient information encoding and tend to drift away from the topic of the source text. To address this, we propose TICTS (theme information clustering coding text summarization), a summarization model that incorporates clustering-based encoding of topic information. The model combines traditional extractive summarization with deep-learning-based abstractive summarization: a clustering algorithm over word vectors extracts topic information, and the cosine similarity between the input text and the extracted key information measures their topic relevance. This relevance serves as the weight of the topic encoding and is used to adjust the attention mechanism, so that the summary is generated from both topic information and attention on top of the sequence-to-sequence model. Evaluated on the LCSTS dataset with ROUGE as the metric, the model improves on the baseline by 1.1 in ROUGE-1, 1.3 in ROUGE-2, and 1.1 in ROUGE-L. The results show that summaries generated with topic information clustering coding are more relevant to the topic and of higher quality.
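
The pipeline outlined in the abstract (cluster word vectors, weight each cluster by its cosine similarity to the input text, then use the weighted topic encoding to adjust attention) might be sketched roughly as below. This is an illustrative sketch only, not the authors' TICTS implementation: the abstract does not specify the clustering algorithm or how the topic encoding enters the attention computation, so plain k-means, the mean word vector as the text representation, and a topic-shifted dot-product query are all assumptions, and every function name here is hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means over word vectors (assumed clustering method)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centroids does not touch `vectors`.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids


def topic_context(word_vecs, k):
    """Topic encoding: cluster centroids weighted by similarity to the text."""
    centroids = kmeans(word_vecs, k)
    text_vec = word_vecs.mean(axis=0)  # assumption: text = mean word vector
    weights = np.array([cosine(text_vec, c) for c in centroids])
    return (weights[:, None] * centroids).sum(axis=0), weights


def topic_attention(enc_states, dec_state, topic_vec):
    """Dot-product attention whose query is shifted by the topic encoding.

    Assumption: encoder states, decoder state, and topic vector share one
    dimensionality; a real model would use learned projections instead.
    """
    scores = enc_states @ (dec_state + topic_vec)
    scores = scores - scores.max()  # numerical stability for the softmax
    probs = np.exp(scores)
    return probs / probs.sum()
```

Calling `topic_context(word_vecs, k)` on an array of shape `(n_words, dim)` returns the topic encoding together with the per-cluster cosine weights; `topic_attention` then yields a distribution over encoder positions that leans toward positions aligned with the dominant topics.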

Last Update: 2020-01-10