An Enhanced Multi-Granularity Feature Fusion Model for Semantic Matching - Computer Technology and Development

Article Info

Title:: An Enhanced Multi-Granularity Feature Fusion Model for Semantic Matching

Author(s):: SHANG Fu-hua; JIANG Yi-wen*; CAO Mao-jun; School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China

Keywords:: semantic matching; language granularity; Siamese network; decomposable attention mechanism; BERT model

Abstract:: Semantic matching is an important part of natural language processing and directly constrains the efficiency of tasks such as question answering and information retrieval. Most existing semantic matching models perform attention interaction with the word as the only basic semantic unit. They rarely account for the fuzzy word boundaries of Chinese or for the insufficient use of character-level information, so the effect of language granularity on the overall model is overlooked. To address this, an enhanced multi-granularity feature fusion semantic matching model, EMGFM, is proposed. First, the BERT model and word2vec are combined to obtain an enhanced character vector representation. Attention interaction is then carried out at three granularities (character, word, and sentence), and the interaction results are fused by weighting to highlight the contribution of each kind of interaction information to the overall model. To reduce the information lost during interaction, the interaction information is enhanced by constructing difference features. Finally, max pooling and average pooling produce the final semantic representation of each text, from which the matching degree is computed. The model reaches 87% accuracy on the Chinese dataset of the CCKS question matching competition, higher than the classical semantic matching models it is compared with, which indicates that the method improves the accuracy of question semantic matching.
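
The pipeline described in the abstract (attention interaction per granularity, weighted fusion, difference-based enhancement, then max and average pooling) can be illustrated with a minimal pure-Python sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: dot-product soft alignment stands in for the decomposable attention step, the enhancement is the common concatenation of the original vector, the aligned vector, their difference, and their element-wise product, and the three granularity views of the first sentence are assumed to be already aligned to the same positions. The function names (`soft_align`, `enhance`, `fuse`, `pool`, `match_representation`) are hypothetical and do not come from the paper.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_align(a, b):
    """For each vector in a, return the attention-weighted sum of the vectors in b."""
    dim = len(b[0])
    aligned = []
    for u in a:
        weights = softmax([dot(u, v) for v in b])
        aligned.append([sum(w * v[k] for w, v in zip(weights, b)) for k in range(dim)])
    return aligned


def enhance(a, aligned):
    """Assumed enhancement: [a; aligned; a - aligned; a * aligned] per position."""
    return [
        u + v + [x - y for x, y in zip(u, v)] + [x * y for x, y in zip(u, v)]
        for u, v in zip(a, aligned)
    ]


def fuse(views, raw_weights):
    """Weighted sum of same-shaped sequences, one per granularity.

    raw_weights are normalised with softmax; in a trained model they
    would be learned parameters (assumption).
    """
    norm = softmax(raw_weights)
    length, dim = len(views[0]), len(views[0][0])
    return [
        [sum(w * view[i][k] for w, view in zip(norm, views)) for k in range(dim)]
        for i in range(length)
    ]


def pool(seq):
    """Concatenate max pooling and average pooling over the sequence axis."""
    dim = len(seq[0])
    max_part = [max(row[k] for row in seq) for k in range(dim)]
    avg_part = [sum(row[k] for row in seq) / len(seq) for k in range(dim)]
    return max_part + avg_part


def match_representation(a_views, b_views, raw_weights):
    """Build a fixed-size representation of sentence a with respect to sentence b.

    a_views and b_views each hold one sequence of vectors per granularity
    (character, word, sentence). All views of a are assumed to have the
    same length so that they can be fused position by position.
    """
    aligned = [soft_align(a, b) for a, b in zip(a_views, b_views)]
    fused = fuse(aligned, raw_weights)
    enhanced = enhance(a_views[0], fused)  # character view used as the base
    return pool(enhanced)
```

With input vectors of dimension d, `match_representation` returns a vector of length 8d (four concatenated parts, pooled two ways). A matching score would then come from a classifier over the representations of both sentences, which this sketch leaves out.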