[1] XIONG Kai-ling, PENG Jun-jie, YANG Xiao-fei, et al. K-means Clustering Optimization Based on Kernel Density Estimation[J]. Computer Technology and Development, 2017, 27(02): 1-5.

 K-means Clustering Optimization Based on Kernel Density Estimation

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
27
Issue:
2017, No. 02
Pages:
1-5
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2017-02-10

Article Information

Title:
 K-means Clustering Optimization Based on Kernel Density Estimation
Article ID:
1673-629X(2017)02-0001-05
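Author(s):
 XIONG Kai-ling (熊开玲)[1], PENG Jun-jie (彭俊杰)[1], YANG Xiao-fei (杨晓飞)[2], HUANG Jun (黄俊)[2]
 1. School of Computer Engineering and Science, Shanghai University; 2. Public Security Center, Shanghai Advanced Research Institute, Chinese Academy of Sciences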
Keywords:
 K-means clustering; density biased sampling; kernel density estimation; data mining
CLC number:
TP305
Document code:
A
Abstract:
 K-means is a classical clustering algorithm that is widely used in many fields, but it performs poorly on high-dimensional and large data sets. Kernel density estimation is a nonparametric method for estimating the density function of an unknown distribution, and it can effectively capture how a data set is distributed. Sampling is a common technique for data mining on large data sets. Density biased sampling is an improvement on simple random sampling, which tends to lose important information when the data set is unevenly distributed. This paper proposes a method that uses the result of kernel density estimation in two ways: it selects sample points near the extreme points of the estimated density function as the initial centers for K-means, and it uses the estimated distribution to perform density biased sampling on the data set. K-means clustering is then run on the sampled set. Experimental results show that using kernel density estimation for initial parameter selection and density biased sampling can effectively accelerate the K-means clustering process.
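The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the kernel, the bandwidth, the peak-finding rule, or the sampling weights, so everything below is an assumption made for illustration. In particular, `kde_density`, `pick_initial_centers`, `density_biased_sample`, and `kmeans` are hypothetical helper names, the Gaussian kernel with a fixed `bandwidth` is assumed, "points near density peaks" is approximated by a greedy pick of the densest points at least `min_dist` apart, and the sampling probability is assumed proportional to `density ** exponent`.

```python
import numpy as np

def kde_density(X, bandwidth):
    """Gaussian kernel density estimate evaluated at every point of X (O(n^2) memory)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    norm = (2 * np.pi * bandwidth ** 2) ** (X.shape[1] / 2)
    return np.exp(-sq / (2 * bandwidth ** 2)).mean(axis=1) / norm

def pick_initial_centers(X, density, k, min_dist):
    """Assumed stand-in for peak finding: densest points that are >= min_dist apart."""
    centers = []
    for i in np.argsort(-density):  # visit points from densest to sparsest
        if all(np.linalg.norm(X[i] - c) >= min_dist for c in centers):
            centers.append(X[i])
        if len(centers) == k:
            return np.array(centers)
    raise ValueError("fewer than k well-separated dense points; lower min_dist")

def density_biased_sample(X, density, n_samples, exponent, rng):
    """Sample without replacement with probability proportional to density**exponent.

    A negative exponent under-samples dense regions; exponent = 0 is uniform sampling.
    """
    w = density ** exponent
    idx = rng.choice(len(X), size=n_samples, replace=False, p=w / w.sum())
    return X[idx]

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Synthetic data: three Gaussian blobs (illustrative only, not the paper's data).
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((200, 2)) for c in true_centers])

density = kde_density(X, bandwidth=0.5)
init = pick_initial_centers(X, density, k=3, min_dist=2.0)
sample = density_biased_sample(X, density, n_samples=150, exponent=-0.5, rng=rng)
centers, _ = kmeans(sample, init)  # cluster only the sample, not the full set
```

In this sketch the speed-up would come from two places: K-means starts from centers that already sit in dense regions, and each iteration touches only the sample rather than the full data set. The pairwise density computation is quadratic in the number of points, so a real large-scale setting would need a cheaper density estimate.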


Last Update: 2017-05-11