YANG Dan, ZHU Shi-ling, BIAN Zheng-yu. Application of Improved K-means Algorithm in Text Mining[J]. Computer Technology and Development, 2019, 29(04): 68-71. [doi:10.3969/j.issn.1673-629X.2019.04.014]

Application of Improved K-means Algorithm in Text Mining

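Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 29
Issue: 2019, No. 04
Pages: 68-71
Section: Intelligence, Algorithms and Systems Engineering
Publication date: 2019-04-10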

Article Info

Title: Application of Improved K-means Algorithm in Text Mining
Article number: 1673-629X(2019)04-0068-04
Authors: YANG Dan, ZHU Shi-ling, BIAN Zheng-yu
Affiliation: School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords: K-means algorithm; clustering center; text clustering; text distance; sparseness
CLC number: TP391.1
DOI: 10.3969/j.issn.1673-629X.2019.04.014
Abstract:
The K-means algorithm is simple and easy to understand, and it is widely used for clustering. However, its initial cluster centers are chosen at random, which can easily make the clustering results unstable. To address the sensitivity of traditional K-means to the choice of initial cluster centers, and the tendency of the max-min distance method to select outliers (isolated points) as centers, we propose a new evaluation function for selecting cluster centers. The function value of each point is examined in turn, and the point with the largest current value is chosen as the next cluster center, until the predetermined number of cluster centers is reached. The evaluation function ensures both that the region around a new center is compact and that the new center lies far from the existing centers. Finally, the improved algorithm is applied to text clustering, and its clustering quality is measured by precision, recall, and F-measure. Experimental results show that, compared with the traditional algorithm and the max-min distance algorithm, the proposed algorithm achieves higher precision and better clustering quality, making it better suited to text clustering.
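The abstract does not give the exact form of the proposed evaluation function, so the following Python sketch only illustrates the general idea under an assumed scoring rule. Each candidate's score is its local density (the number of other points within a hypothetical `radius`) multiplied by its distance to the nearest already-chosen center, and the highest-scoring point becomes the next center; standard K-means then runs from those centers. The names `select_initial_centers`, `local_density`, `kmeans` and `radius`, the choice of the densest point as the first center, and the use of Euclidean distance on dense vectors (text documents would first need to be vectorized, e.g. with TF-IDF) are all assumptions, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_density(points: List[Point], radius: float) -> List[int]:
    """Hypothetical compactness term: how many other points lie within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def select_initial_centers(points: List[Point], k: int, radius: float) -> List[int]:
    """Greedy center selection with an ASSUMED score: density * distance to nearest chosen center."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    density = local_density(points, radius)
    # First center: the densest point (an assumption, not taken from the paper).
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i: int) -> Tuple[float, float]:
            d = min(euclidean(points[i], points[c]) for c in chosen)
            # Primary key rewards dense points far from existing centers, so an
            # isolated outlier (density 0) is not picked merely for being far away;
            # the secondary key breaks ties by distance alone.
            return (density[i] * d, d)

        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return chosen


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100) -> List[int]:
    """Standard Lloyd iterations from the given centers; returns a cluster label per point."""
    centers = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

For example, with two tight groups around (0, 0) and (5, 5) plus one point at (20, 20) and `radius=0.5`, this rule seeds the clusters with one point from each dense group. A plain max-min distance rule starting from the same first point would instead take (20, 20) as the second center, because it is the point farthest from (0, 0); this is the outlier problem the abstract describes.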

Last update: 2019-04-10