[1] ZHANG Xiong, ZHAO Li-feng. A Method for Determination of Optimal Value in K-means Clustering with Generalization[J]. Computer Technology and Development, 2017, 27(09): 31-34.

 A Method for Determination of Optimal Value in K-means Clustering with Generalization

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 09
Pages:
31-34
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2017-09-10

Article Info

Title:
 A Method for Determination of Optimal Value in K-means Clustering with Generalization
Article ID:
1673-629X(2017)09-0031-04
Author(s):
 ZHANG Xiong, ZHAO Li-feng
 College of Science, Nanjing University of Posts and Telecommunications
Keywords:
 K-means clustering; optimal number of clusters; generalization; unsupervised learning
CLC number:
TP301
Document code:
A
Abstract:
 K-means clustering requires the number of clusters to be fixed in advance, and choosing it by hand is highly subjective. To address this, a method for determining the optimal number of clusters based on generalization ability is proposed. The premise is that a good clustering result should generalize well to unseen samples. A generalization ability index (GA) is designed to evaluate how well the resulting clustering model classifies unseen samples; the larger the index, the better the model, and the K value of the model with the best generalization ability is taken as the optimal number of clusters. To test the stability and effectiveness of the method, experiments were carried out on the real-world Iris data set and on artificial data sets, and the optimal number of clusters was found correctly in every case. The results show that the method obtains the optimal number of clusters simply and efficiently and yields good clustering on these data sets.
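
The abstract does not give the formula for the GA index, so the following is only a minimal sketch of the general idea (score each candidate K by how well a clustering model carries over to samples it was not fitted on), not the authors' method. As an assumed stand-in for GA it uses the Rand index between two labelings of a held-out half of the data: one from the nearest centroids of a model fitted on the other half, and one from a model fitted on the held-out half itself. The function names, the synthetic three-blob data, the even/odd split, and the tie-break toward the larger K are all illustrative assumptions.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    # Index of the centroid closest to point p.
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=50):
    # Farthest-point initialisation (an assumption) keeps the sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the old centroid if a cluster empties
        if updated == centroids:
            break
        centroids = updated
    return centroids

def rand_index(a, b):
    # Fraction of point pairs on which the two labelings agree (same/different cluster).
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

def holdout_score(train, test, k):
    # Stand-in for GA (assumption): do centroids fitted on `train` group the
    # held-out points the same way a clustering of `test` itself does?
    train_centroids = kmeans(train, k)
    test_centroids = kmeans(test, k)
    from_train = [nearest(p, train_centroids) for p in test]
    from_test = [nearest(p, test_centroids) for p in test]
    return rand_index(from_train, from_test)

def select_k(points, candidates=range(2, 6)):
    train, test = points[::2], points[1::2]  # simple even/odd hold-out split
    scores = {k: holdout_score(train, test, k) for k in candidates}
    best = max(scores, key=lambda k: (scores[k], k))  # ties go to the larger K (assumption)
    return best, scores

def make_blobs(seed=0):
    # Hypothetical artificial data: three well-separated 2-D blobs of 30 points each.
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(30)]

if __name__ == "__main__":
    best_k, scores = select_k(make_blobs())
    print("scores per K:", scores)
    print("selected K:", best_k)
```

The plain Rand index is not corrected for chance and tends to rise with K, so a real evaluation would likely need a chance-adjusted or otherwise better-calibrated index; the sketch only shows the select-K-by-held-out-score loop.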


Last Update: 2017-10-19