[1] LI Zhi-qiang, PAN Su-han, DAI Juan, et al. An Improved TextRank Keyword Extraction Algorithm[J]. Computer Technology and Development, 2020, 30(03): 77-81. [doi:10.3969/j.issn.1673-629X.2020.03.015]

An Improved TextRank Keyword Extraction Algorithm

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 03
Pages:
77-81
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2020-03-10

Article Information/Info

Title:
An Improved TextRank Keyword Extraction Algorithm
Article ID:
1673-629X(2020)03-0077-05
Author(s):
LI Zhi-qiang, PAN Su-han, DAI Juan, HU Jia-jia
School of Information Engineering, Yangzhou University, Yangzhou 225000, China
Keywords:
keyword extraction; TF-IDF algorithm; TextRank algorithm; average information entropy; natural language processing
CLC number:
TP301
DOI:
10.3969/j.issn.1673-629X.2020.03.015
Abstract:
Keyword extraction is widely used in natural language processing, and extracting keywords from text quickly and accurately has become a key problem in text processing. Many keyword extraction methods exist, but their accuracy and generality still need improvement. We therefore propose an improved TextRank keyword extraction method. It uses the TF-IDF method and the average information entropy method to calculate the importance of each word in the text, and then derives a comprehensive weight for each word from those results. The comprehensive weights are used to improve both the initial node values and the node probability transition matrix of the TextRank algorithm. Node weights are then computed iteratively until convergence, which yields the weight of each word, and the Top N words are output as keywords. Experimental results show that, compared with the traditional TF-IDF and TextRank methods, the proposed method is more general and extracts keywords with higher accuracy.
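The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: `word_weight` stands in for the comprehensive weight (however TF-IDF and average information entropy are combined), the graph is a plain co-occurrence graph, and the transition probability from a word to a neighbor is taken to be proportional to that neighbor's weight. The function names, `window`, `damping`, and `tol` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_textrank(tokens, word_weight, window=3, damping=0.85,
                      tol=1e-6, max_iter=200):
    """Score words with a TextRank variant biased by per-word weights.

    `word_weight` maps each word to a positive number; it is a placeholder
    for the comprehensive weight, whose exact formula is not given in the
    abstract.
    """
    # Undirected co-occurrence graph: link words within `window` positions.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    words = sorted(neighbors)
    if not words:
        return {}

    # Initial node values proportional to the word weights (assumption).
    total = sum(word_weight[w] for w in words)
    score = {w: word_weight[w] / total for w in words}

    for _ in range(max_iter):
        new_score = {}
        for w in words:
            incoming = 0.0
            for u in neighbors[w]:
                # Transition u -> w is proportional to the weight of w
                # among all neighbors of u (assumption).
                out = sum(word_weight[v] for v in neighbors[u])
                incoming += score[u] * word_weight[w] / out
            new_score[w] = (1 - damping) / len(words) + damping * incoming
        delta = max(abs(new_score[w] - score[w]) for w in words)
        score = new_score
        if delta < tol:  # stop once the scores have converged
            break
    return score


def top_n(score, n):
    """Return the n highest-scoring words, breaking ties alphabetically."""
    return sorted(score, key=lambda w: (-score[w], w))[:n]
```

Calling `top_n(weighted_textrank(tokens, word_weight), 5)` on a tokenized document returns five candidate keywords. With all weights equal, the sketch reduces to ordinary unweighted TextRank, which gives a simple baseline for comparison.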


Last Update: 2020-03-10