NIU Yong-jie, TIAN Cheng-long. Research on TFIDF Keyword Extraction Algorithm Based on Multiple Factors[J]. Computer Technology and Development, 2019, 29(07): 80-83. [doi:10.3969/j.issn.1673-629X.2019.07.016]

Research on TFIDF Keyword Extraction Algorithm Based on Multiple Factors

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 07
Pages:
80-83
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2019-07-10

Article Info

Title:
Research on TFIDF Keyword Extraction Algorithm Based on Multiple Factors
Article number:
1673-629X(2019)07-0080-04
Author(s):
NIU Yong-jie (牛永洁), TIAN Cheng-long (田成龙)
School of Mathematics & Computer, Yan'an University, Yan'an, Shaanxi 716000, China
Keywords:
TFIDF algorithm; word position; part of speech; word correlation; word length; word span
CLC number:
TP301.6
DOI:
10.3969/j.issn.1673-629X.2019.07.016
Abstract:
To extract keywords from text more accurately and quickly, the text is first cleaned to remove noisy data and then segmented into words. After stop words are removed, the algorithm jointly considers each word's position, part of speech, correlation with other words, length, and span, and combines these factors with the classical TFIDF keyword extraction algorithm, assigning different weights to the factors to obtain each word's final weight. The five words with the highest final weights are taken as the text's keywords. Using 8,045 news articles from Red China (《红色中华》) provided by the authors' university library as source data, the proposed algorithm, the classical TFIDF algorithm, and expert annotation were compared on three metrics: accuracy (precision), recall, and F1 score. The proposed algorithm outperforms the classical TFIDF algorithm on all three metrics and comes close to expert annotation.
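The abstract names the factors but not their formulas or weights, so the Python sketch below is only one plausible reading of the pipeline, not the authors' method. It assumes each document has already been cleaned, segmented, and stripped of stop words into (word, POS tag) pairs with jieba-style tags. The factor definitions, the tables `POS_WEIGHT` and `FACTOR_WEIGHTS`, the smoothed IDF variant, and all function names are hypothetical choices made for illustration. Each word's final weight is its TF-IDF multiplied by a weighted sum of five factor scores in [0, 1], and the top five words are returned; a small precision/recall/F1 helper mirrors the evaluation metrics.

```python
import math
from collections import Counter

# Hypothetical POS weights (jieba-style tags: "n" noun, "v" verb, "a" adjective).
POS_WEIGHT = {"n": 1.0, "nr": 1.0, "ns": 1.0, "v": 0.6, "a": 0.4}
DEFAULT_POS_WEIGHT = 0.2

# Hypothetical factor weights; they sum to 1, so the multiplier stays in [0, 1].
FACTOR_WEIGHTS = {"position": 0.3, "pos": 0.25, "correlation": 0.15,
                  "length": 0.15, "span": 0.15}


def idf_table(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, which is always positive."""
    n = len(docs)
    df = Counter(word for doc in docs for word in {w for w, _ in doc})
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def factor_scores(doc, word, window=5):
    """Return five hypothetical factor scores in [0, 1] for one word."""
    words = [w for w, _ in doc]
    idx = [i for i, w in enumerate(words) if w == word]
    n = len(words)
    first, last = idx[0], idx[-1]
    # Position: an earlier first occurrence scores higher.
    position = 1 - first / n
    # Part of speech: use the tag of the first occurrence.
    pos = POS_WEIGHT.get(doc[first][1], DEFAULT_POS_WEIGHT)
    # Correlation: share of other distinct words co-occurring within +/- window.
    neighbours = {words[j] for i in idx
                  for j in range(max(0, i - window), min(n, i + window + 1))
                  if words[j] != word}
    others = len(set(words)) - 1
    correlation = len(neighbours) / others if others else 0.0
    # Length: longer words score higher, capped at 4 characters.
    length = min(len(word), 4) / 4
    # Span: distance between first and last occurrence relative to doc length.
    span = (last - first) / n
    return {"position": position, "pos": pos, "correlation": correlation,
            "length": length, "span": span}


def extract_keywords(doc, idf, top_k=5):
    """Rank words by TF-IDF times the weighted factor score; return the top_k."""
    counts = Counter(w for w, _ in doc)
    total = len(doc)
    scores = {}
    for word, c in counts.items():
        tfidf = (c / total) * idf.get(word, 1.0)
        f = factor_scores(doc, word)
        multiplier = sum(FACTOR_WEIGHTS[k] * f[k] for k in FACTOR_WEIGHTS)
        scores[word] = tfidf * multiplier
    # Sort by descending weight; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 of predicted vs. reference keywords."""
    hits = len(set(predicted) & set(gold))
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Toy, pre-segmented corpus (hypothetical data, not from the paper).
DOCS = [
    [("red", "a"), ("army", "n"), ("victory", "n"), ("army", "n"), ("march", "v")],
    [("soviet", "n"), ("election", "n"), ("army", "n"), ("vote", "v")],
    [("harvest", "n"), ("grain", "n"), ("farmers", "n"), ("harvest", "n")],
]

if __name__ == "__main__":
    idf = idf_table(DOCS)
    print(extract_keywords(DOCS[0], idf))  # -> ['army', 'victory', 'red', 'march']
```

Multiplying TF-IDF by the factor score keeps classical TF-IDF as the base signal and lets the extra factors reorder words with similar TF-IDF values; a linear blend of TF-IDF and the factors would be an equally reasonable reading of "different weights". In practice the factor weights would be tuned against expert-annotated keywords using a metric such as `precision_recall_f1`.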
Last Update: 2019-07-10