Clustering Ensemble Method Based on Similarity Between Clusters - Computer Technology and Development (《计算机技术与发展》)

Article Info

Author(s):: ZHANG Dong-chao; CAI Jiang-hui; YANG Hai-feng; ZHENG Ai-yu (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)

Keywords:: clustering ensemble; co-association matrix; base clusters; evidence accumulation; complexity

Abstract:: Clustering ensemble is an important branch of clustering that fuses multiple base clusterings to generate a robust, high-quality final clustering partition. Many researchers currently study clustering ensemble methods that transform the original information into a co-association matrix and derive the final clustering partition from that matrix. However, most of them ignore two problems: the clustering results are easily affected by noise, and the time and space complexity of the co-association matrix becomes high when the amount of data is large. To solve these problems, we design a clustering ensemble method based on similarity between clusters (CSCE). The method first finds the similarity between the original objects based on the evidence accumulation model and divides the original objects into several small clusters. A new similarity measure is then used to calculate the similarity between clusters and form a cluster-to-cluster similarity matrix. Finally, this cluster similarity matrix is partitioned into the final clustering result by normalized cut (NCUT). The proposed method merges low-quality abnormal objects, according to their similarity, into the clusters they resemble, and experiments are conducted on eight datasets. The results show that the proposed method not only achieves good clustering performance but also solves the problem of high time and space complexity of the traditional co-association matrix.
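The pipeline described in the abstract can be sketched in Python roughly as follows. Only the co-association step follows the standard evidence-accumulation definition (the fraction of base clusterings that put two objects in the same cluster). The abstract does not specify how CSCE forms the small clusters or what its new between-cluster similarity is, so the threshold grouping in `micro_clusters`, the mean-co-association measure in `cluster_similarity`, and the exhaustive two-way search in `best_bipartition` are hypothetical stand-ins, as are all function names and the toy data. The sketch does make the cost argument visible: `co_association` builds an n × n table, while the final cut only operates on the much smaller k × k cluster matrix.

```python
from itertools import combinations


def co_association(base_clusterings):
    """Evidence accumulation: CA[i][j] is the fraction of base clusterings
    that place objects i and j in the same cluster. Needs O(n^2) memory."""
    m, n = len(base_clusterings), len(base_clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def micro_clusters(ca, threshold=1.0):
    """HYPOTHETICAL grouping rule: connected components of the graph that
    links objects i and j whenever CA[i][j] >= threshold (union-find)."""
    n = len(ca)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_similarity(ca, clusters):
    """HYPOTHETICAL stand-in, not CSCE's measure: mean co-association
    between the members of two micro-clusters (a k x k matrix)."""
    return [[sum(ca[a][b] for a in p for b in q) / (len(p) * len(q))
             for q in clusters] for p in clusters]


def ncut_value(w, side_a):
    """Shi-Malik criterion: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    k = len(w)
    side_b = [j for j in range(k) if j not in side_a]
    cut = sum(w[i][j] for i in side_a for j in side_b)
    assoc_a = sum(w[i][j] for i in side_a for j in range(k))
    assoc_b = sum(w[i][j] for i in side_b for j in range(k))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(w):
    """Exhaustive two-way NCUT search over micro-clusters; exponential in k,
    so only usable for tiny examples (NCUT itself uses a spectral relaxation)."""
    k = len(w)
    candidates = (set(c) for r in range(1, k)
                  for c in combinations(range(k), r) if 0 in c)
    return min(candidates, key=lambda side_a: ncut_value(w, side_a))


# Toy run: three base clusterings of six objects.
base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [1, 1, 1, 0, 0, 0]]
ca = co_association(base)
clusters = micro_clusters(ca)           # [[0, 1], [2], [3], [4, 5]]
w = cluster_similarity(ca, clusters)
side_a = best_bipartition(w)            # {0, 1}
labels = [0 if any(i in clusters[c] for c in side_a) else 1
          for i in range(len(ca))]
print(labels)                           # [0, 0, 0, 1, 1, 1]
```

With a threshold of 1.0, only object pairs that every base clustering agrees on are merged, which yields four micro-clusters; the cut then separates objects 0-2 from objects 3-5. A real NCUT implementation would solve the spectral relaxation (for example, using eigenvectors of the normalized graph Laplacian) rather than enumerate partitions, because enumeration grows exponentially with k.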