[1]王 诚,高兴东.基于最小生成树的密度聚类算法研究[J].计算机技术与发展,2022,32(02):45-50.[doi:10. 3969 / j. issn. 1673-629X. 2022. 02. 007]
 WANG Cheng,GAO Xing-dong.Research on Density Clustering Algorithm Based on MST[J].,2022,32(02):45-50.[doi:10. 3969 / j. issn. 1673-629X. 2022. 02. 007]
点击复制

基于最小生成树的密度聚类算法研究()
分享到:

《计算机技术与发展》[ISSN:1006-6977/CN:61-1281/TN]

卷:
32
期数:
2022年02期
页码:
45-50
栏目:
大数据分析与挖掘
出版日期:
2022-02-10

文章信息/Info

Title:
Research on Density Clustering Algorithm Based on MST
文章编号:
1673-629X(2022)02-0045-06
作者:
王 诚 高兴东
南京邮电大学 通信与信息工程学院,江苏 南京 210003
Author(s):
WANG ChengGAO Xing-dong
School of Telecommunications & Information Engineering,Nanjing University of Posts and Telecommunications,Nanjing 210003,China
关键词:
DBSCAN相互可达距离密度聚类最小生成树不平衡数据集
Keywords:
DBSCANmutual reachable distancedensity clusteringminimum spanning tree ( MST) unbalanced data set
分类号:
TP399
DOI:
10. 3969 / j. issn. 1673-629X. 2022. 02. 007
摘要:
传统 DBSCAN 算法对密度分布不均匀的不平衡数据集的聚类效果并不理想,同时传统算法的聚类结果对邻域半径(Eps) 以及核心点阈值( MinPts) 敏感。 针对以上问题,改进了传统算法,提出了一种基于最小生成树的密度聚类算法(MST-DBSCAN) 。 由于对象之间的距离对聚类结果影响较大,为了更好地表示对象之间的距离特性,首先使用相互可达距离(mutual reachability distance)代替传统算法中的欧氏距离,表示数据集中对象与对象之间的距离,解决因密度分布不均匀导致效果不佳的问题;为了建立对象与对象之间的联系,同时保留对象之间的距离特性,引用 Prim 算法对数据集中的所有对象构建最小生成树;其次根据指定的簇的数目及最小簇对象数数目参数对得到的最小生成树进行剪枝;根据剪枝的结果,将剪枝后的各个部分进行聚类。 在公开的 UCI 数据集上的实验结果表明,提出的 MST - DBSCAN 算法与现有DBSCAN、OPTICS、KANN-DBSCAN 算法相比,在密度分布不均匀的数据集上聚类效果有所提升并且较原有传统算法有较高的聚类准确性。
Abstract:
The clustering effect of the traditional DBSCAN algorithm on the data set with uneven density distribution is not ideal,and theclustering result of the traditional algorithm is sensitive to the neighborhood radius ( Eps) and the core point threshold ( MinPts) . Tosolve the above problems, the traditional algorithm is improved, and a density clustering algorithm based on minimum spanning tree( MST- DBSCAN) is proposed. Since the distance between objects has a greater impact on the clustering results, in order to betterrepresent the distance characteristics between objects, first use mutual reachability distance instead of the Euclidean distance in thetraditional algorithm to represent the objects in the data set for the problem of poor effect? caused by uneven density distribution. In orderto establish the connection between the object and the object,while retaining the distance characteristics between the objects,the Prim algorithm is used to construct a minimum generation of all objects in the data set tree. Secondly,pruning the obtained minimum spanningtree according to the specified number of clusters? and the minimum number of cluster objects. According to the pruning result,cluster thepruned parts. Experiments on the public UCI data set show that compared with the existing DBSCAN, OPTICS, KANN - DBSCANalgorithm,the proposed MST- DBSCAN algorithm has better clustering performance on data sets with uneven density distribution andhigher clustering accuracy than the traditional algorithm.

相似文献/References:

[1]王玉雷,李玲娟. 一种密度和划分结合的聚类算法[J].计算机技术与发展,2015,25(09):53.
 WANG Yu-le,LI Ling-juan. A Clustering Algorithm of Combination of Density and Division[J].,2015,25(02):53.
[2]朱子龙,李玲娟.基于Spark 的密度聚类算法并行化研究[J].计算机技术与发展,2018,28(06):80.[doi:10.3969/ j. issn.1673-629X.2018.06.018]
 ZHU Zi-long,LI Ling-juan.Research on Parallelization of Density Clustering Algorithm Based on Spark[J].,2018,28(02):80.[doi:10.3969/ j. issn.1673-629X.2018.06.018]
[3]王楠,曹菡. 基于Geo-tagged照片的旅游推荐研究[J].计算机技术与发展,2016,26(10):123.
 WANG Nan,CAO Han. Study on Travel Recommendation Based on Geo-tagged Photos[J].,2016,26(02):123.
[4]王安瑾.一种基于 MinHash 的改进新闻文本聚类算法[J].计算机技术与发展,2019,29(02):39.[doi:10.3969/j.issn.1673-629X.2019.02.008]
 WANG Anjin.An Improved News Text Clustering Algorithm Based on MinHash[J].,2019,29(02):39.[doi:10.3969/j.issn.1673-629X.2019.02.008]
[5]宋金玉,郭一平,王斌.DBSCAN聚类算法的参数配置方法研究[J].计算机技术与发展,2019,29(05):44.[doi:10. 3969 / j. issn. 1673-629X. 2019. 05. 009]
 SONG Jin-yu,GUO Yi-ping,WANG Bin.Research on Parameter Configuration Method of DBSCAN Clustering Algorithm[J].,2019,29(02):44.[doi:10. 3969 / j. issn. 1673-629X. 2019. 05. 009]
[6]刘重男,杨 洋,卢清心,等.低成本二维激光传感器室内移动建图方法研究[J].计算机技术与发展,2023,33(08):30.[doi:10. 3969 / j. issn. 1673-629X. 2023. 08. 005]
 LIU Zhong-nan,YANG Yang,LU Qing-xin,et al.Research on Indoor Mapping Based on Low-cost Two-dimensional Laser Sensor[J].,2023,33(02):30.[doi:10. 3969 / j. issn. 1673-629X. 2023. 08. 005]

更新日期/Last Update: 2022-02-10