[1] YIN Chun-yong, SHEN Zi-ning. Research on Text Similarity Based on Interactive Features and Multi-scale Features[J]. Computer Technology and Development, 2024, 34(08): 86-92. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0140]

Research on Text Similarity Based on Interactive Features and Multi-scale Features

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
34
Issue:
2024, No. 08
Pages:
86-92
Section:
Artificial Intelligence
Publication date:
2024-08-10

Article Info

Title:
Research on Text Similarity Based on Interactive Features and Multi-scale Features
Article ID:
1673-629X(2024)08-0086-07
Author(s):
YIN Chun-yong, SHEN Zi-ning
School of Computer Science and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
Keywords:
text similarity; bidirectional long short-term memory; interactive features; multi-scale features; channel attention
Classification number (CLC):
TP391
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0140
Abstract:
Text similarity analysis often suffers from low accuracy because information is not passed between the two texts and multiple kinds of semantic information are ignored. To address this, a novel text similarity model based on interactive features and multi-scale features (IF-MSF) is proposed, built on bidirectional long short-term memory (BiLSTM). First, BiLSTM encodes each sentence and extracts a global feature matrix; a soft attention mechanism and cosine similarity are then each used to make the two feature matrices interact, so that the semantic information inside each matrix is passed to the other. Second, the two groups of interactive features are weighted to combine all of the interaction information, and BiLSTM re-encodes the weighted interactive features together with the initial encoded features to capture the difference information between them. Third, multi-scale convolution extracts multiple semantic features from the difference information, and a channel attention mechanism enhances the important features. Finally, the two groups of enhanced features are fused to judge whether the text pair is similar. Experiments on two data sets show that the model reaches the highest F1 values, 88.15% and 85.03%, outperforming the other methods compared.
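The first two steps of the abstract (interaction through soft attention and through cosine similarity, followed by weighting) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the formulas, so the softmax over cosine scores, the scalar mixing weight `alpha`, and all function names here are assumptions made for illustration. The inputs `A` and `B` stand in for the BiLSTM feature matrices of the two sentences (one row per token).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot(u, v) / (nu * nv)

def weighted_sum(weights, rows):
    width = len(rows[0])
    return [sum(w * r[k] for w, r in zip(weights, rows)) for k in range(width)]

def soft_attention_interaction(A, B):
    # For each row of A, attend over the rows of B using dot-product scores.
    return [weighted_sum(softmax([dot(a, b) for b in B]), B) for a in A]

def cosine_interaction(A, B):
    # Assumed variant: same alignment, but scored by cosine similarity.
    return [weighted_sum(softmax([cosine(a, b) for b in B]), B) for a in A]

def combine(F1, F2, alpha):
    # Hypothetical fixed scalar weighting of the two interactive features.
    return [[alpha * x + (1.0 - alpha) * y for x, y in zip(r1, r2)]
            for r1, r2 in zip(F1, F2)]

if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]  # toy "sentence 1" features
    B = [[1.0, 0.0], [0.0, 1.0]]  # toy "sentence 2" features
    att = soft_attention_interaction(A, B)
    cos = cosine_interaction(A, B)
    print(combine(att, cos, 0.5))
```

In the model described above, the weighted result would then be re-encoded by a second BiLSTM together with the initial features; that step, the multi-scale convolution, and the channel attention are omitted from this sketch.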

Last Update: 2024-08-10