Core Entity Recognition in Web Articles Based on Enhanced BiLSTM - Computer Technology and Development (《计算机技术与发展》)

Article Info

Author(s): ZHOU Kang; QU Wei-dong; YANG Yi-chen (School of Information Engineering, Chang’an University, Xi’an 710064, China)

Keywords: Chinese named entity recognition; core entity recognition; deep learning; BiLSTM; AdaBoost

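Abstract (translated from the Chinese): A core entity of an article is the entity word that the article mainly describes or that plays the main role in it. With the development of the Internet and the rise of online media platforms, news spread through self-media strongly influences the public's value orientation, and core entity recognition for web articles is an important foundational step for natural language processing tasks such as sentiment analysis and public opinion monitoring. Compared with traditional named entities, which are relatively easy to recognize, core entities are harder to identify: extracting them requires complex feature information grounded in discourse-level understanding and must also cope with flexible, varied domain-specific expressions. To address these problems, a core entity recognition model for web articles based on an enhanced bidirectional long short-term memory network (BiLSTM) is proposed. The model uses BiLSTM to capture discourse-level features of the article. On top of the BiLSTM model, it uses the BERT model to improve the quality of contextual semantics and strengthen the semantic expressiveness of the original model, and it introduces the AdaBoost ensemble learning framework to combine multiple sub-models trained on different aspects of the corpus data, thereby improving overall recognition performance. The experiments in the paper verify the feasibility and effectiveness of the model.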

Abstract: The core entity of an article is the entity word that the article mainly describes or that plays the main role in it. With the development of the Internet and the rise of online media platforms, self-media online news dissemination has seriously affected the public’s value orientation. Core entity recognition for web articles is an important basic step for natural language processing tasks such as sentiment analysis and public opinion monitoring. Compared with traditional named entities, which are more easily recognized, the core entities of an article are more difficult to recognize. Their extraction not only requires complex feature information based on discourse-level understanding, but also involves flexible and diverse domain-specific expressions. To address these problems, we propose a core entity recognition model for web articles based on enhanced BiLSTM (bidirectional long short-term memory). The model uses BiLSTM to capture the discourse-level features of the article. On top of the BiLSTM model, the BERT model is used to improve the quality of contextual semantics and enhance the semantic expression ability of the original model. At the same time, the AdaBoost ensemble learning framework is used to combine multiple sub-models trained on different aspects of the corpus data to enhance the overall recognition performance. The experiments verify the feasibility and effectiveness of this model.
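The abstract does not say which AdaBoost variant is used or how the sub-models are combined at prediction time. As a rough illustration only, the sketch below shows the weighting and voting step of the multi-class SAMME variant of AdaBoost over token-level BIO tags. The function names `samme_alphas` and `weighted_vote`, the BIO tag set, and the toy predictions are all assumptions made for this example. Retraining each sub-model on the reweighted samples, which full AdaBoost requires, is omitted: the sub-model outputs are precomputed stand-ins for BiLSTM taggers.

```python
import math
from collections import defaultdict


def samme_alphas(predictions, labels, num_classes):
    """Compute SAMME (multi-class AdaBoost) weights for a sequence of sub-models.

    predictions: list of per-model tag lists (hypothetical sub-model outputs).
    labels: gold tags, one per token.
    Training each sub-model on the reweighted data is omitted in this sketch.
    """
    n = len(labels)
    weights = [1.0 / n] * n  # start with uniform sample weights
    alphas = []
    for preds in predictions:
        # Weighted error of this sub-model under the current sample weights.
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = math.log((1 - err) / err) + math.log(num_classes - 1)
        alphas.append(alpha)
        # Increase the weight of misclassified tokens, then renormalize.
        weights = [
            w * math.exp(alpha) if p != y else w
            for w, p, y in zip(weights, preds, labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return alphas


def weighted_vote(predictions, alphas):
    """Combine sub-model tags per token by alpha-weighted voting."""
    combined = []
    for i in range(len(predictions[0])):
        scores = defaultdict(float)
        for preds, alpha in zip(predictions, alphas):
            scores[preds[i]] += alpha
        combined.append(max(scores, key=scores.get))
    return combined


# Toy data (invented for illustration): gold BIO tags and three sub-model outputs.
gold = ["B", "I", "O", "O", "B", "O"]
sub_model_tags = [
    ["B", "I", "O", "O", "O", "O"],
    ["B", "O", "O", "O", "B", "O"],
    ["O", "I", "O", "O", "B", "O"],
]
alphas = samme_alphas(sub_model_tags, gold, num_classes=3)
ensemble_tags = weighted_vote(sub_model_tags, alphas)
```

In this toy case each sub-model makes a different single mistake, so the weighted vote recovers the gold tags. Whether the paper's ensemble behaves this way on real data depends on details the abstract does not give.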