[1] SHANG Fu-hua, JIANG Yi-wen*, CAO Mao-jun. An Enhanced Multi Granularity Feature Fusion Model for Semantic Matching[J]. Computer Technology and Development, 2022, 32(07): 28-33. [doi:10.3969/j.issn.1673-629X.2022.07.005]

An Enhanced Multi Granularity Feature Fusion Model for Semantic Matching

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 07
Pages:
28-33
Section:
Big Data Analysis and Mining
Publication date:
2022-07-10

Article Info

Title:
An Enhanced Multi Granularity Feature Fusion Model for Semantic Matching
Article ID:
1673-629X(2022)07-0028-06
Author(s):
SHANG Fu-hua, JIANG Yi-wen*, CAO Mao-jun
School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
Keywords:
semantic matching; language granularity; Siamese network; decomposable attention mechanism; BERT model
CLC number:
TP301
DOI:
10.3969/j.issn.1673-629X.2022.07.005
Abstract:
Semantic matching is an important part of natural language processing and directly constrains the efficiency of tasks such as question answering and information retrieval. Most existing semantic matching models perform attention interaction with the word as the only basic semantic unit. They give little consideration to the fuzzy word boundaries of Chinese and the insufficient use of character information, so the effect of language granularity on the overall model is neglected. To address this problem, an enhanced multi-granularity feature fusion semantic matching model, EMGFM, is proposed. First, the BERT model and word2vec are combined to obtain an enhanced character vector representation. Attention interaction is then carried out at three granularities (character, word, and sentence), and the interaction results are fused by weighting to highlight the contribution of each kind of interaction information to the overall model. To reduce the information lost during interaction, the interaction information is enhanced by constructing difference features. Finally, the final semantic representation of the text is obtained through max pooling and average pooling and is used to compute the matching degree. The model achieves 87% accuracy on the Chinese dataset of the CCKS question matching competition, an improvement over several classical semantic matching models, which indicates that the proposed method can improve the accuracy of question semantic matching.
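
The pipeline summarized in the abstract (attention interaction, difference-based enhancement, pooling, weighted fusion across granularities) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the toy two-dimensional embeddings, the dot-product alignment, the ESIM-style enhancement (concatenating the difference and the element-wise product), and the fixed fusion scores are all assumptions made for illustration; in the paper the character vectors come from BERT and word2vec, and the fusion weights are presumably learned.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def soft_align(a, b):
    # For each vector in a, build an attention-weighted sum of the vectors in b
    # (dot-product scores are an assumption, in the style of decomposable attention).
    aligned = []
    for a_i in a:
        weights = softmax([dot(a_i, b_j) for b_j in b])
        aligned.append([sum(w * b_j[k] for w, b_j in zip(weights, b))
                        for k in range(len(b[0]))])
    return aligned

def enhance(a, aligned):
    # Assumed enhancement: [x; y; x - y; x * y] for each position.
    return [x + y + [p - q for p, q in zip(x, y)] + [p * q for p, q in zip(x, y)]
            for x, y in zip(a, aligned)]

def pool(seq):
    # Concatenate max pooling and average pooling over the sequence axis.
    dims = range(len(seq[0]))
    max_part = [max(v[k] for v in seq) for k in dims]
    avg_part = [sum(v[k] for v in seq) / len(seq) for k in dims]
    return max_part + avg_part

def interact(a, b):
    # One granularity: align a against b, enhance, then pool to a fixed-size vector.
    return pool(enhance(a, soft_align(a, b)))

def fuse(vectors, scores):
    # Weighted sum of same-sized vectors; scores are normalized with softmax.
    weights = softmax(scores)
    return [sum(w * v[k] for w, v in zip(weights, vectors))
            for k in range(len(vectors[0]))]

# Toy embeddings (hypothetical values) for two questions at three granularities.
q1 = {"char": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
      "word": [[1.0, 0.5], [0.0, 1.0]],
      "sent": [[0.5, 0.5]]}
q2 = {"char": [[0.0, 1.0], [1.0, 1.0]],
      "word": [[0.5, 1.0]],
      "sent": [[0.4, 0.6]]}

granularities = ("char", "word", "sent")
fused = fuse([interact(q1[g], q2[g]) for g in granularities], [0.0, 0.0, 0.0])
```

With equal scores, `fuse` reduces to a plain average of the three granularity vectors; unequal scores let one granularity dominate, which is the effect the weighted fusion is meant to provide. A full model would compute the same representation for the second question and feed both into a classifier to obtain the matching degree.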


Last Update: 2022-07-10