[1] WU Chun-yan, LI Li, HUANG Peng-cheng, et al. Study on Machine Reading Comprehension Hybriding Dynamic Convolution Attention[J]. Computer Technology and Development, 2023, 33(07): 160-166. [doi:10.3969/j.issn.1673-629X.2023.07.024]

Study on Machine Reading Comprehension Hybriding Dynamic Convolution Attention

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume: 33
Issue: 2023(07)
Pages: 160-166
Column: Artificial Intelligence
Publication Date: 2023-07-10

Article Info

Title:
Study on Machine Reading Comprehension Hybriding Dynamic Convolution Attention
Article ID:
1673-629X(2023)07-0160-07
Author(s):
WU Chun-yan1, LI Li1, HUANG Peng-cheng1, LIU Zhi-gui1,2, ZHANG Xiao-qian2
1. School of Computer Science and Technology,Southwest University of Science and Technology,Mianyang 621000,China;
2. School of Information Engineering,Southwest University of Science and Technology,Mianyang 621000,China
Keywords:
machine reading comprehension; span-extracting; answer prediction; long short-term memory; dynamic convolution
CLC Number:
TP391
DOI:
10.3969/j.issn.1673-629X.2023.07.024
Abstract:
To address insufficient feature extraction and low prediction accuracy when long short-term memory (LSTM) networks and attention mechanisms process text sequences in machine reading comprehension, we propose a span-extraction machine reading comprehension model that fuses dynamic convolution attention. Since the current input and the previous state of an LSTM are independent of each other, which may cause loss of context information, the Mogrifier is adopted as the encoder: it lets the current input interact fully with the previous state several times, enhancing the salient structural features of the context and the question while weakening secondary ones. Second, because static convolution uses a single fixed kernel, it can only extract features over a fixed text length, which may hinder the machine from better understanding the text. By introducing dynamic convolution, one-dimensional convolutions with multiple different kernel sizes capture the local structure of the context and the question, compensating for the attention mechanism's purely global capture ability. Experimental results on the SQuAD dataset show that, compared with other models, the proposed method effectively improves the model's feature extraction and answer prediction.
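The two components named in the abstract can be illustrated in miniature. Below is a minimal numpy sketch of (a) the Mogrifier interaction, which alternately rescales the LSTM input and previous state with gates computed from each other before the cell runs (Melis et al., 2020), and (b) multi-kernel 1-D convolution in the spirit of dynamic convolution, extracting local features at several window widths. Weights are random placeholders, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def mogrifier(x, h, rounds=5):
    """Alternately gate input x and previous state h before the LSTM cell:
    odd rounds rescale x by a gate computed from h, even rounds rescale h
    by a gate computed from x."""
    d = x.shape[0]
    Q = rng.standard_normal((d, d)) * 0.1  # random demo weights
    R = rng.standard_normal((d, d)) * 0.1
    for i in range(1, rounds + 1):
        if i % 2:                        # odd round: update the input
            x = 2 * sigmoid(Q @ h) * x
        else:                            # even round: update the state
            h = 2 * sigmoid(R @ x) * h
    return x, h

def multi_kernel_conv1d(seq, kernel_sizes=(1, 3, 5)):
    """Run several 1-D convolutions with different (odd) kernel widths over
    the token dimension and concatenate the outputs along the feature axis,
    capturing local structure at multiple granularities."""
    T, d = seq.shape
    outs = []
    for k in kernel_sizes:
        W = rng.standard_normal((k, d)) * 0.1
        pad = np.pad(seq, ((k // 2, k // 2), (0, 0)))  # same-length padding
        out = np.array([(pad[t:t + k] * W).sum(axis=0) for t in range(T)])
        outs.append(out)
    return np.concatenate(outs, axis=1)  # shape (T, d * len(kernel_sizes))

x, h = mogrifier(rng.standard_normal(8), rng.standard_normal(8))
feats = multi_kernel_conv1d(rng.standard_normal((10, 8)))
print(x.shape, h.shape, feats.shape)  # (8,) (8,) (10, 24)
```

In the paper's setting, the gated (x, h) pair would feed the LSTM cell at each step, and the concatenated multi-kernel features would complement the attention layer's global view with local structure.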
Last Update: 2023-07-10