[1] TANG Hao, YANG Yu-wang, XIN Zhi-bin. A Single-pass K-means Clustering Algorithm with MapReduce[J]. Computer Technology and Development, 2017, 27(09): 26-30.

A Single-pass K-means Clustering Algorithm with MapReduce

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
27
Issue:
2017, No. 09
Pages:
26-30
Column:
Intelligence, Algorithms, Systems Engineering
Publication date:
2017-09-10

Article Info

Title:
 A Single-pass K-means Clustering Algorithm with MapReduce
Article ID:
1673-629X(2017)09-0026-05
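Author(s):
 TANG Hao[1], YANG Yu-wang[1], XIN Zhi-bin[2]
 1. School of Computer Science and Engineering, Nanjing University of Science and Technology (南京理工大学 计算机科学与工程学院); 2. Huaihai Group Industrial Co., Ltd. (淮海集团工业有限公司)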
Keywords:
 MapReduce framework; data clustering; K-means++; Mahout; single-pass
CLC number:
TP301.6
Document code:
A
Abstract:
 Fitting K-means into the MapReduce framework can greatly improve its ability to process large datasets. However, K-means needs multiple iterations to reach an acceptable clustering result, and each iteration is executed as an independent map job that must read and write the whole dataset on slow disks. This results in high I/O overhead and is inconsistent with the design concept of the MapReduce framework. Therefore, a single-pass K-means clustering algorithm based on MapReduce, called MRSK, is proposed. It reads the data with a single-pass streaming algorithm and uses the K-means++ seeding algorithm to obtain the initial cluster centers. On the basis of a theoretical analysis of the complexity of MRSK, a series of tests and related analyses is conducted. The experimental results show that, compared with the existing MapReduce-based and stream-based K-means variants, MRSK achieves both faster execution and higher clustering quality.
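
The K-means++ seeding step named in the abstract can be illustrated with a short sketch. This is a minimal in-memory version of the standard D² sampling procedure, written under the assumption that the points fit in a Python list. It is not the MRSK implementation from the paper, and the names `squared_distance` and `kmeanspp_seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2-weighted sampling (K-means++ seeding).

    Illustrative sketch only: in-memory, single machine, no MapReduce.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()

    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Draw the next center with probability proportional to d2.
            index = rng.choices(range(len(points)), weights=d2, k=1)[0]
            new_center = points[index]
        centers.append(new_center)
        # Refresh nearest-center distances against the new center only.
        d2 = [min(d, squared_distance(p, new_center)) for d, p in zip(d2, points)]

    return centers
```

Because each new center is drawn with probability proportional to its squared distance from the centers already chosen, points far from the existing centers are favored, which tends to spread the initial centers across the data.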


Last Update: 2017-10-19