[1] PAN Xiao-ying, HU Kai-kai, ZHU Jing. A Secondary Text Clustering Algorithm Based on TextRank[J]. Computer Technology and Development (计算机技术与发展), 2016, 26(08): 7-11.

 A Secondary Text Clustering Algorithm Based on TextRank

Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
26
Issue:
2016, No. 08
Pages:
7-11
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2016-08-10

Article Info

Title:
 A Secondary Text Clustering Algorithm Based on TextRank
Article ID:
1673-629X(2016)08-0007-05
Author(s):
 PAN Xiao-ying (潘晓英), HU Kai-kai (胡开开), ZHU Jing (朱静)
 School of Computer Science, Xi'an University of Posts and Telecommunications
Keywords:
 text clustering; TextRank; keyword extraction; VSM; LDA
CLC number:
TP391.9
Document code:
A
Abstract:
 Traditional text clustering techniques tend to suffer from mediocre clustering accuracy or excessive computational time complexity. This paper first introduces two commonly used text clustering techniques: the partition-based K-means and the topic-model-based LDA. After analyzing the shortcomings of each, it proposes a secondary text clustering algorithm based on TextRank. Borrowing the idea of topic models, the algorithm introduces word clustering into the traditional clustering process and incorporates the position and span features of words in the keyword extraction phase, which reduces the error caused by treating local keywords as global keywords. Experimental results show that the improved algorithm achieves better clustering results than traditional VSM clustering and the topic-model-based LDA algorithm.
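
The keyword extraction step described in the abstract can be illustrated with a minimal Python sketch. It runs plain TextRank over a word co-occurrence graph and then rescales each score by a position term and a span term. The function name `textrank_keywords`, the parameters `position_weight` and `span_weight`, and the multiplicative combination are assumptions made for illustration; the abstract does not give the paper's actual formula, and the word clustering and document clustering stages are not shown.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50,
                      position_weight=0.5, span_weight=0.5, top_k=5):
    """Return the top_k (word, score) pairs for one tokenized document."""
    if not tokens:
        return []

    # Undirected co-occurrence graph: two distinct words are linked when
    # they appear together inside `window` consecutive tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Standard unweighted TextRank update, run for a fixed number of rounds.
    vocab = sorted(set(tokens))
    score = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        score = {
            w: (1 - damping)
            + damping * sum(score[n] / len(neighbors[n]) for n in neighbors[w])
            for w in vocab
        }

    # Hypothetical position/span adjustment (NOT the paper's formula):
    # words that first appear early and that recur across a wide stretch
    # of the text are boosted, so purely local keywords are damped.
    first, last = {}, {}
    for i, word in enumerate(tokens):
        first.setdefault(word, i)
        last[word] = i
    n = len(tokens)
    adjusted = {}
    for w in vocab:
        position = 1 - first[w] / n          # earlier first use -> closer to 1
        span = (last[w] - first[w] + 1) / n  # fraction of the text covered
        adjusted[w] = score[w] * (
            1 + position_weight * position + span_weight * span
        )

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(adjusted.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    text = ("text clustering groups text by topic and "
            "text clustering needs keyword extraction")
    for word, value in textrank_keywords(text.split(), top_k=3):
        print(f"{word}\t{value:.3f}")
```

Setting `position_weight` and `span_weight` to 0 reduces the sketch to plain TextRank, which makes it easy to compare rankings with and without the two extra features.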


Last Update: 2016-09-29