XIE Zhi-ming, WANG Peng, HUANG Yan. Research on Modification of K-means Spectral Clustering Algorithm of Multidimensional Data[J]. Computer Technology and Development, 2017, 27(10): 60-64.

Research on Modification of K-means Spectral Clustering Algorithm of Multidimensional Data

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 27
Issue: 2017, No. 10
Pages: 60-64
Section: Intelligence, Algorithms, and Systems Engineering
Publication date: 2017-10-10

Article Info

Title: Research on Modification of K-means Spectral Clustering Algorithm of Multidimensional Data
Article ID: 1673-629X(2017)10-0060-05
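Author(s): XIE Zhi-ming[1][2], WANG Peng[3], HUANG Yan[4]
 1. Department of Information Engineering, Shanwei Polytechnic; 2. Institute of Cloud Computing and Data Center Engineering Design, Shanwei Innovative Industrial Design Research Institute; 3. School of Computer Science and Technology, Southwest Minzu University; 4. School of Computer Science and Technology, Huaiyin Normal University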
Keywords: K-means algorithm; spectral clustering algorithm; clustering; FCM algorithm; degree of membership matrix
CLC number: TP301.6
Document code: A
Abstract:
 To address two problems, namely that the traditional K-means algorithm cannot determine the initial number of clusters k automatically and that the spectral clustering algorithm is sensitive to its parameters, a K-means algorithm based on spectral clustering, called PK-means, is proposed. It improves the way k is selected: the computed high-density data points are sorted according to a fixed rule, and the top 96% of the density points are clustered, so that the number of clusters k can be obtained with high accuracy. It also adopts a fuzzy similarity measure based on spectral clustering that is unaffected by parameters and more stable, using the FCM algorithm to compute a membership degree matrix that determines the similarity between data points. PK-means, K-means and the density-sensitive spectral clustering algorithm (DSSC) were applied to multidimensional nonlinear datasets. The experimental results show that, for both low-dimensional and high-dimensional datasets, K-means has the lowest processing efficiency, DSSC performs slightly better, and PK-means has a clear advantage: it achieves higher clustering accuracy and stronger robustness than the traditional clustering algorithms, and the higher the dimensionality, the more prominent its clustering performance.
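The abstract does not say how density is measured or how k is read off the retained points, so the following Python sketch is only one possible reading of the "sort by density, keep the top 96%, obtain k" step. Assumptions, none of which come from the paper: density is the number of neighbours within a hypothetical radius `eps`; the 96% rule keeps the densest ceil(0.96 * n) points; and k is the number of retained "density peaks", i.e. points with no higher-ranked retained point within a hypothetical `radius`, a rule borrowed from density-peak clustering. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def local_density(points: Sequence[Point], eps: float) -> List[int]:
    """Assumed density: number of other points within distance eps."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps)
        for i in range(n)
    ]


def keep_top_density(points: Sequence[Point], eps: float, ratio: float = 0.96):
    """Sort point indices by density (descending) and keep the top `ratio` share.

    Returns (kept_indices_in_density_order, densities)."""
    dens = local_density(points, eps)
    # sorted() is stable, so points with equal density stay in input order
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    keep = max(1, math.ceil(ratio * len(points)))
    return order[:keep], dens


def estimate_k(points: Sequence[Point], eps: float, radius: float, ratio: float = 0.96):
    """Hypothetical k rule: count retained points that have no higher-ranked
    (denser, or equally dense but earlier) retained point within `radius`."""
    kept, _ = keep_top_density(points, eps, ratio)
    peaks = []
    for pos, i in enumerate(kept):
        # kept[:pos] holds every point ranked above point i
        if all(math.dist(points[i], points[j]) > radius for j in kept[:pos]):
            peaks.append(i)
    return len(peaks), peaks
```

For two well-separated 5 x 3 grids of points plus one distant outlier (31 points), ceil(0.96 * 31) = 30 points are kept, the zero-density outlier is the one discarded, and `estimate_k(pts, eps=1.0, radius=1.5)` returns k = 2. Dropping the sparsest 4% first matters here: a retained outlier would otherwise count as a peak of its own.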
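The other ingredient named in the abstract, an FCM membership matrix used to define similarity between data points, can be sketched in the same way. The code below implements the standard fuzzy C-means updates with fuzzifier `m`. The caller-supplied initial centres and the similarity `S[i][j] = sum_c U[i][c] * U[j][c]` (the inner product of two points' membership vectors) are assumptions, since the abstract does not give the paper's formula. In a spectral pipeline, a matrix like `S` would serve as the affinity matrix passed to the embedding and final K-means steps, which are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def fcm(points: Sequence[Point], init_centers: Sequence[Point],
        m: float = 2.0, max_iter: int = 100, tol: float = 1e-6):
    """Standard fuzzy C-means. Returns (U, centers), where U[i][j] is the
    membership of point i in cluster j computed in the last iteration.
    `init_centers` is an assumed caller-supplied starting guess."""
    centers = [tuple(float(v) for v in c) for c in init_centers]
    n, c, dim = len(points), len(centers), len(points[0])
    expo = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = []
        for p in points:
            d = [math.dist(p, cj) for cj in centers]
            if min(d) == 0.0:
                # Point sits exactly on a centre: share membership among coinciding centres
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(hits)
                u.append([h / total for h in hits])
            else:
                u.append([1.0 / sum((dj / dk) ** expo for dk in d) for dj in d])
        # Centre step: mean of the points weighted by u_ij^m
        new_centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / wsum for k in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return u, centers


def fuzzy_similarity(u: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical similarity: S[i][j] = sum_c u[i][c] * u[j][c]."""
    n = len(u)
    return [[sum(a * b for a, b in zip(u[i], u[j])) for j in range(n)] for i in range(n)]
```

On two tight groups of three points around (0, 0) and (10, 10), started from centres (0, 0) and (10, 10), every membership row sums to 1, points in the same group get a similarity close to 1, and points in different groups get a similarity close to 0. Because the similarity is built from memberships rather than from a Gaussian kernel, no kernel width has to be tuned, which is one plausible reading of the abstract's "unaffected by parameters".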

Last Update: 2017-11-23