[1] NIU Yong-jie, TIAN Cheng-long. Research on TFIDF Keyword Extraction Algorithm Based on Multiple Factors[J]. Computer Technology and Development, 2019, 29(07): 80-83. [doi:10.3969/j.issn.1673-629X.2019.07.016]
Research on TFIDF Keyword Extraction Algorithm Based on Multiple Factors
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]
- Volume:
-
29
- Issue:
-
2019, No. 07
- Pages:
-
80-83
- Section:
-
Intelligence, Algorithms, and Systems Engineering
- Publication date:
-
2019-07-10
Article Info
- Title:
-
Research on TFIDF Keyword Extraction Algorithm Based on Multiple Factors
- Article ID:
-
1673-629X(2019)07-0080-04
- Author(s):
-
NIU Yong-jie (牛永洁); TIAN Cheng-long (田成龙)
-
School of Mathematics & Computer, Yan'an University, Yan'an 716000, Shaanxi, China
- Keywords:
-
TFIDF algorithm; word position; part of speech; word correlation; word length; word span
- CLC number:
-
TP301.6
- DOI:
-
10.3969/j.issn.1673-629X.2019.07.016
- Abstract:
-
To extract keywords from text more accurately and quickly, the text is first cleaned to remove noisy data and then segmented into words. After stop words are removed, the position, part of speech, correlation, length, and span of each word are considered together and combined with the classic TFIDF keyword extraction algorithm, each factor carrying its own weight, to give a final weight for every word. The five words with the largest weights are taken as the keywords of the text. Using 8,045 news articles from "Red China" (《红色中华》) provided by the university library as source data, the proposed algorithm, the classic TFIDF algorithm, and expert annotation are compared on three metrics: accuracy, recall, and F1 score. The proposed algorithm outperforms the classic TFIDF algorithm on all three metrics and comes close to expert annotation.
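The pipeline the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the factor definitions, the values in `FACTOR_WEIGHTS` and `POS_SCORE`, and the function names are all hypothetical. The input is assumed to be already cleaned and segmented, the word-correlation factor is left out for brevity, the IDF uses one common smoothed variant, and the "accuracy" metric is assumed to mean precision.

```python
import math
from collections import Counter

# Hypothetical factor weights; the abstract does not state the real values.
FACTOR_WEIGHTS = {"tfidf": 0.5, "position": 0.2, "pos": 0.1,
                  "length": 0.1, "span": 0.1}
# Hypothetical part-of-speech scores (n = noun, v = verb, a = adjective).
POS_SCORE = {"n": 1.0, "v": 0.6, "a": 0.4}


def tfidf(words, corpus):
    """Classic TF-IDF per distinct word, using a smoothed IDF variant."""
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(words).items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(words)) * idf
    return scores


def extract_keywords(doc, title, pos_tags, corpus, stopwords=frozenset(), k=5):
    """Rank the words of a segmented document by a weighted sum of factors."""
    words = [w for w in doc if w not in stopwords]
    if not words:
        return []
    base = tfidf(words, corpus)
    max_base = max(base.values())
    max_len = max(len(w) for w in base)
    first, last = {}, {}
    for i, w in enumerate(words):
        first.setdefault(w, i)  # index of first occurrence
        last[w] = i             # index of last occurrence
    title_words = set(title)
    scores = {}
    for w in base:
        factors = {
            "tfidf": base[w] / max_base,                      # scaled to [0, 1]
            "position": 1.0 if w in title_words else 0.0,     # appears in title
            "pos": POS_SCORE.get(pos_tags.get(w), 0.2),       # part of speech
            "length": len(w) / max_len,                       # longer words score higher
            "span": (last[w] - first[w] + 1) / len(words),    # spread across the text
        }
        scores[w] = sum(FACTOR_WEIGHTS[name] * v for name, v in factors.items())
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def precision_recall_f1(predicted, reference):
    """Compare extracted keywords against expert-annotated keywords."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

A call such as `extract_keywords(tokens, title_tokens, tags, corpus, stopwords)` returns at most five words, and `precision_recall_f1(result, expert_keywords)` scores them against one document's annotation. Averaging those scores over a corpus would be one way to obtain per-algorithm figures of the kind the abstract compares.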
Last Update:
2019-07-10