[1] BAO Li-ming, HUANG Gang. A Dynamic Clustering Algorithm of K-means Based on Multi-branches Tree for K-values[J]. Computer Technology and Development, 2017, 27(06): 41-45.

 A Dynamic Clustering Algorithm of K-means Based on Multi-branches Tree for K-values

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 27
Issue: 2017, No. 06
Pages: 41-45
Section: Intelligence, Algorithms, Systems Engineering
Publication date: 2017-06-10

Article Info

Title: A Dynamic Clustering Algorithm of K-means Based on Multi-branches Tree for K-values
Article ID: 1673-629X(2017)06-0041-05
Author(s): BAO Li-ming, HUANG Gang
Affiliation: School of Computer Science, Nanjing University of Posts and Telecommunications
Keywords: K-means; clustering; dividing and merging; multi-branches tree
CLC number: TP301.6
Document code: A
Abstract:
 K-means is one of the classical partition-based clustering algorithms and is widely used because it is concise and efficient. It is also easy to implement and has low time and space complexity. However, the number of clusters K usually cannot be determined in advance by any effective means, and the initial cluster centers are often chosen at random, so the algorithm tends to converge to a local optimum, which makes the clustering result inaccurate. The dynamic K-means clustering algorithm that determines K with a multi-branches tree improves on the traditional algorithm: it dynamically splits and merges clusters during iteration to find the most reasonable number of clusters, and it partly solves the problem of converging to a local optimum. The paper also explores a corresponding data model to support the improved algorithm, and compares it experimentally with the bisecting K-means algorithm along both horizontal and vertical dimensions. The results show that the improved algorithm does not depend on the global data set, which makes it better suited to distributed computing platforms, and that its relative efficiency grows with the size of the data set, with a clear advantage on massive data sets. The improved algorithm therefore improves clustering performance and yields more stable clustering results.
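
The abstract does not spell out the multi-branches tree structure or the split and merge criteria, so the following is only a minimal sketch of the general idea of letting K change during iteration. It is not the authors' algorithm. The thresholds `split_radius` and `merge_dist`, the function names, and the farthest-point seeding used when splitting are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def assign(points, centers):
    """Group points by nearest center; centers that attract no points are dropped."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        groups[j].append(p)
    return [g for g in groups if g]


def radius(group):
    """Root-mean-square distance of a cluster's points from its centroid."""
    c = mean(group)
    return math.sqrt(sum(dist(p, c) ** 2 for p in group) / len(group))


def split(group, iters=10):
    """Split one cluster in two with a short 2-means run (assumed seeding rule)."""
    c = mean(group)
    a = max(group, key=lambda p: dist(p, c))  # point farthest from the centroid
    b = max(group, key=lambda p: dist(p, a))  # point farthest from that point
    centers, parts = [a, b], [group]
    for _ in range(iters):
        parts = assign(group, centers)
        new_centers = [mean(g) for g in parts]
        if new_centers == centers:
            break
        centers = new_centers
    return parts


def dynamic_kmeans(points, split_radius, merge_dist, max_rounds=50):
    """Start from k = 1 and let splits and merges decide the final k.

    split_radius and merge_dist are hypothetical thresholds; max_rounds caps
    the loop in case a poor choice makes splits and merges undo each other.
    """
    groups = [list(points)]
    for _ in range(max_rounds):
        changed = False
        # Split step: break up clusters that are too spread out.
        after_split = []
        for g in groups:
            parts = split(g) if radius(g) > split_radius else [g]
            changed = changed or len(parts) > 1
            after_split.extend(parts)
        groups = after_split
        # Merge step: repeatedly fuse the closest pair of centroids.
        while len(groups) > 1:
            centers = [mean(g) for g in groups]
            d, i, j = min((dist(centers[i], centers[j]), i, j)
                          for i in range(len(groups))
                          for j in range(i + 1, len(groups)))
            if d >= merge_dist:
                break
            groups[i] = groups[i] + groups[j]
            del groups[j]
            changed = True
        # Reassignment step: one ordinary K-means pass over all points.
        groups = assign(points, [mean(g) for g in groups])
        if not changed:
            break
    return [mean(g) for g in groups], groups


# Usage: three small, well-separated blobs; k is not given in advance.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
blobs = [(cx + dx, cy + dy)
         for cx, cy in [(0, 0), (10, 0), (0, 10)]
         for dx, dy in offsets]
centers, groups = dynamic_kmeans(blobs, split_radius=2.0, merge_dist=3.0)
print(len(centers), "clusters found")
```

Starting from a single cluster and refining it by splitting gives a tree-like view of the process, but how the paper organizes that tree and distributes the work is not something this sketch tries to reproduce.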

Last Update: 2017-07-20