[1] LIU Hua-ling, SUN Yi. Knowledge Graph Based on Entity Recognition and Information Fusion--A Case Study of COVID-19[J]. Computer Technology and Development, 2022, 32(09): 107-113. [doi:10.3969/j.issn.1673-629X.2022.09.017]

Research on Knowledge Graphs Based on Entity Recognition and Information Fusion
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 09
Pages:
107-113
Section:
Artificial Intelligence
Publication date:
2022-09-10

Article Info

Title:
Knowledge Graph Based on Entity Recognition and Information Fusion--A Case Study of COVID-19
Article ID:
1673-629X(2022)09-0107-07
Author(s):
LIU Hua-ling (刘华玲), SUN Yi (孙毅)
Department of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
Keywords:
named entity recognition; entity disambiguation; BERT; knowledge graph; COVID-19; visualization analysis
CLC number:
TP391.1
DOI:
10.3969/j.issn.1673-629X.2022.09.017
Abstract:
Public health emergencies usually cause severe damage, and the timeliness and comprehensibility of research are particularly important in responding to them. Methods are urgently needed for rapidly analyzing the state of research and extracting specific research information. Scientific literature is one of the main carriers and channels of knowledge dissemination, but specialized and ambiguous terminology can impede that dissemination. To address this problem, this paper applies natural language processing and knowledge graph techniques, taking the literature on COVID-19 as an example, and builds a knowledge graph by combining entity recognition with information fusion. First, entities in the titles and abstracts of the literature are annotated to construct a data set for training a BERT-BiLSTM-CRF model, which can automatically recognize and extract medical entities from text. Then, author name ambiguity is eliminated through multi-source cross-validation of author information together with domain and institution similarity, and an author set is constructed. Finally, after multi-source information is fused, a COVID-19 knowledge graph is built incrementally from entity-entity, author-author, and entity-author relations. The named entity recognition model achieves an average F1 score of 92.86% on 6 types of medical entities, and the knowledge graph contains 34,802 medical entities and 397,163 authors. The study shows that this process can effectively construct a knowledge graph and use it to quickly identify frontier research hotspots and core scholars in related fields, effectively promoting the acquisition of knowledge and the dissemination of concepts.
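The last two steps of the pipeline, author name disambiguation by domain and institution similarity followed by incremental graph construction from the three relation types, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`jaccard`, `resolve_author`, `add_paper`), the use of Jaccard similarity, the averaging rule, the 0.5 threshold, and the toy records are all assumptions made for illustration, and the entity lists stand in for the output of the BERT-BiLSTM-CRF recognizer.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def resolve_author(record, known_authors, threshold=0.5):
    """Return the id of a known author matching `record`, or register a new one.

    Assumed rule (not from the paper): merge only when the names are identical
    and the mean of institution and domain similarity reaches `threshold`.
    """
    for author_id, known in enumerate(known_authors):
        if known["name"] != record["name"]:
            continue
        score = (jaccard(known["institution"], record["institution"])
                 + jaccard(known["domains"], record["domains"])) / 2
        if score >= threshold:
            # Fuse the new evidence into the existing author profile.
            known["institution"] |= set(record["institution"])
            known["domains"] |= set(record["domains"])
            return author_id
    known_authors.append({
        "name": record["name"],
        "institution": set(record["institution"]),
        "domains": set(record["domains"]),
    })
    return len(known_authors) - 1


def add_paper(graph, known_authors, paper):
    """Incrementally add one paper's co-occurrence edges to `graph` (a Counter)."""
    authors = sorted({resolve_author(r, known_authors) for r in paper["authors"]})
    entities = sorted(set(paper["entities"]))
    for a, b in combinations(entities, 2):
        graph[("entity-entity", a, b)] += 1
    for a, b in combinations(authors, 2):
        graph[("author-author", a, b)] += 1
    for author in authors:
        for entity in entities:
            graph[("entity-author", entity, author)] += 1


# Toy input: invented author records and entity lists, for illustration only.
papers = [
    {"entities": ["SARS-CoV-2", "remdesivir"],
     "authors": [
         {"name": "Wei Zhang", "institution": ["wuhan", "university"],
          "domains": ["virology", "antiviral"]},
         {"name": "Li Chen", "institution": ["fudan", "university"],
          "domains": ["virology"]}]},
    {"entities": ["SARS-CoV-2", "ACE2"],
     "authors": [
         {"name": "Wei Zhang", "institution": ["wuhan", "university"],
          "domains": ["virology", "receptor"]}]},
    {"entities": ["lockdown", "anxiety"],
     "authors": [
         {"name": "Wei Zhang", "institution": ["peking", "union", "hospital"],
          "domains": ["psychiatry"]}]},
]

graph, known_authors = Counter(), []
for paper in papers:
    add_paper(graph, known_authors, paper)
```

In this toy run the first two "Wei Zhang" records are merged because their institutions match and their domains overlap, while the third is kept as a separate author. Edge weights count co-occurrences, so papers can be added one at a time without rebuilding the graph.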


Last Update: 2022-09-10