
[1] WANG Yu-le, LI Ling-juan. A Clustering Algorithm of Combination of Density and Division[J]. Computer Technology and Development, 2015, 25(09): 53-56.

A Clustering Algorithm of Combination of Density and Division


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 25
Issue:: 2015, No. 09

Pages:: 53-56

Column:: Intelligence, Algorithms, Systems Engineering

Publication date:: 2015-09-10

Article Info

Title:: A Clustering Algorithm of Combination of Density and Division

Article ID:: 1673-629X(2015)09-0053-04

Author(s):: WANG Yu-le (王玉雷); LI Ling-juan (李玲娟)

Affiliation:: School of Computer Science, Nanjing University of Posts and Telecommunications

Keywords:: data mining; k-means; DBSCAN; clustering; density; division

CLC number:: TP311

Document code:: A

Abstract:: The density-based clustering algorithm DBSCAN and the division-based clustering algorithm k-means each have their advantages and disadvantages. Building on both, this paper proposes DDCA, a clustering algorithm that combines density and division, with the goals of reducing sensitivity to parameters and to the input order of data points, finding clusters of arbitrary shape, and improving clustering quality. The algorithm first calculates the density of each data point and forms basic clusters, each consisting of a center point whose density is not less than a given threshold together with the points within its density range. It then merges basic clusters according to the distance between their center points. Finally, each point not yet assigned to any cluster is assigned to its nearest cluster. Theoretical analysis and experimental results on the KDD CUP 99 dataset show that DDCA can find clusters of arbitrary shape and is insensitive to parameters and to the input order of data points. It achieves higher clustering accuracy at only a slight increase in time cost, and its overall performance is better than that of k-means.
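
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define density, the density range, or the merge criterion, so the sketch assumes that density is the number of points within a radius `eps`, that centers are picked greedily from densest to sparsest, that two basic clusters are merged when their centers lie within `merge_dist`, and that a leftover point joins the cluster of its nearest center. The function name `ddca_sketch` and all parameter names are hypothetical.

```python
import math


def ddca_sketch(points, eps, min_density, merge_dist):
    """Return one cluster label per point (-1 if no cluster could be formed).

    Illustrative sketch only; the density definition, greedy center
    selection, and merge rule are assumptions, not taken from the paper.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: density = number of points within radius eps (assumed
    # definition; the count includes the point itself).
    density = [sum(1 for d in row if d <= eps) for row in dist]

    # Step 2: build basic clusters. Visit points from densest to sparsest;
    # an unassigned point with density >= min_density becomes a center and
    # claims every still-unassigned point within eps of it.
    labels = [-1] * n
    centers = []
    for i in sorted(range(n), key=lambda k: (-density[k], k)):
        if labels[i] != -1 or density[i] < min_density:
            continue
        cluster_id = len(centers)
        centers.append(i)
        for j in range(n):
            if labels[j] == -1 and dist[i][j] <= eps:
                labels[j] = cluster_id

    # Step 3: merge basic clusters whose centers are within merge_dist,
    # tracked with a small union-find structure.
    parent = list(range(len(centers)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(len(centers)):
        for b in range(a + 1, len(centers)):
            if dist[centers[a]][centers[b]] <= merge_dist:
                parent[find(b)] = find(a)

    # Step 4: assign each leftover point to the cluster of its nearest center.
    if centers:
        for j in range(n):
            if labels[j] == -1:
                labels[j] = min(range(len(centers)),
                                key=lambda c: dist[centers[c]][j])

    # Renumber merged clusters as consecutive ids in order of appearance.
    ids = {}
    return [-1 if lab == -1 else ids.setdefault(find(lab), len(ids))
            for lab in labels]


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 4)]
labels = ddca_sketch(points, eps=1.5, min_density=3, merge_dist=2.0)
print(labels)
```

In this toy input the two four-point squares form separate basic clusters, and the isolated point `(5, 4)` is attached in the final step to whichever center is closest. The sketch precomputes all pairwise distances, so it uses quadratic time and memory and is suitable only for small illustrative inputs.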

