[1] LIU Shen-kai, ZHOU Ji-ting, ZHU Yong-hua, et al. Network New Word Recognition Based on Fusion of Knowledge Graph and ESA[J]. Computer Technology and Development, 2019, 29(03): 12-17. [doi:10.3969/j.issn.1673-629X.2019.03.003]

Network New Word Recognition Based on Fusion of Knowledge Graph and ESA

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 03
Pages:
12-17
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2019-03-10

Article Info

Title:
Network New Word Recognition Based on Fusion of Knowledge Graph and ESA
Article ID:
1673-629X(2019)03-0012-06
Author(s):
LIU Shen-kai (刘申凯) 1, ZHOU Ji-ting (周霁婷) 1, ZHU Yong-hua (朱永华) 2, GAO Hong-hao (高洪皓) 2,3
1. Shanghai University, Shanghai 200072, China; 2. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; 3. Computing Center, Shanghai University, Shanghai 200444, China
Keywords:
semantic recognition; semantic relevance; neologism recognition; knowledge graph; explicit semantic analysis
Classification number:
TP391.1
DOI:
10.3969/j.issn.1673-629X.2019.03.003
Abstract:
With the rapid development of the Internet, text from platforms such as Weibo and WeChat is growing steadily, and analyzing and understanding such text poses new challenges for natural language processing, especially in recognizing network neologisms and understanding their semantics. To overcome the inability of traditional methods to identify network neologisms and their semantics, we propose a network neologism recognition method that combines a knowledge graph with explicit semantic analysis (ESA). The method segments the original text at the coarse granularity of phrases to preserve the logical relationships between words. After matching the semantic expression of each phrase against the Baidu knowledge graph Schema, it uses ESA to progressively decompose the remaining text and distills core semantic vocabulary from the phrases' encyclopedia information, supplementing the parts the Schema cannot recognize. Experimental results show that, compared with existing neologism recognition algorithms, the proposed algorithm needs only a small corpus as underlying knowledge support, greatly reduces the cost of formulating manual rules, and improves both the accuracy of network neologism recognition and the accuracy of word comprehension.
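
The two-stage idea in the abstract (schema lookup first, ESA as a fallback) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONCEPTS` corpus, the `SCHEMA` dictionary standing in for the Baidu knowledge graph Schema, and all function names are hypothetical, and ESA is reduced to its textbook form, in which a text becomes a TF-IDF-weighted vector over encyclopedia concepts and relatedness is the cosine between such vectors.

```python
import math
from collections import Counter

# Hypothetical toy "encyclopedia": concept name -> article text (assumption).
CONCEPTS = {
    "social_media": "weibo wechat post share follower online",
    "finance": "stock market bank money invest",
    "sports": "match team score goal player",
}

# Hypothetical stand-in for a knowledge-graph schema lookup (assumption).
SCHEMA = {"weibo": "social_media"}


def build_index(concepts):
    """Build an inverted index: term -> {concept: TF-IDF weight}."""
    n = len(concepts)
    tf = {c: Counter(text.split()) for c, text in concepts.items()}
    # Document frequency: number of concepts whose article contains the term.
    df = Counter(t for counts in tf.values() for t in counts)
    index = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term]) + 1.0
            index.setdefault(term, {})[concept] = freq * idf
    return index


def esa_vector(text, index):
    """Map a text to a weighted vector over concepts (its ESA representation)."""
    vec = Counter()
    for term in text.split():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def interpret(phrase, description, index):
    """Try the schema first; otherwise pick the top ESA concept of the description."""
    if phrase in SCHEMA:
        return SCHEMA[phrase]
    vec = esa_vector(description, index)
    return max(vec, key=vec.get) if vec else None


index = build_index(CONCEPTS)
print(interpret("weibo", "", index))
print(interpret("newword", "player score goal", index))
```

In this sketch a phrase known to the schema is resolved directly, while an unknown phrase is interpreted through the concepts activated by its description text, which loosely mirrors how the abstract describes ESA supplementing what the Schema cannot recognize.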

Last Update: 2019-03-10