[1] ZHANG Lei, ZHANG Gong-rang, ZHANG Jin-guang. MapReduce Parallelization Research of a Clustering Algorithm Based on Grid[J]. Computer Technology and Development, 2013, (02): 60-64.

MapReduce Parallelization Research of a Clustering Algorithm Based on Grid

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Issue:
2013, No. 02
Pages:
60-64
Column:
Intelligence, Algorithms, Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
MapReduce Parallelization Research of a Clustering Algorithm Based on Grid
Article ID:
1673-629X(2013)02-0060-05
Author(s):
ZHANG Lei [1,2], ZHANG Gong-rang [1,2], ZHANG Jin-guang [1,2]
[1] School of Management, Hefei University of Technology; [2] Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education
Keywords:
grid; clustering algorithm; data mining; MapReduce parallelization
Document code:
A
Abstract:
Faced with incrementally growing clustering data, and inspired by the parallel processing model of cloud computing, this paper studies the MapReduce parallelization of a grid-based clustering algorithm. The algorithm first preprocesses the data with a grid technique, replacing all the points stored in each grid cell with that cell's center of gravity (centroid). Then, under the MapReduce framework, it uses the cell centroids as the basic data units for clustering analysis. Experimental results show that the parallelized algorithm, deployed on a Hadoop cluster, produces the same clustering results as the original algorithm while saving analysis time and reducing computational complexity. It is therefore suitable for analyzing and mining massive, incremental, high-dimensional data.
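
The grid preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It simulates the map, shuffle, and reduce phases in memory with plain Python and does not use the Hadoop API. The uniform `cell_size`, the function names, and keeping a point count next to each centroid are all assumptions made for illustration. The abstract does not specify the grid parameters or the clustering method applied to the centroids afterwards, so that stage is left out.

```python
from collections import defaultdict
from math import floor


def map_point(point, cell_size):
    """Map step: emit (cell_id, (coordinates, 1)) for one input point.

    cell_size is a hypothetical uniform grid width, not taken from the paper.
    """
    cell_id = tuple(floor(x / cell_size) for x in point)
    return cell_id, (tuple(point), 1)


def combine(a, b):
    """Merge two partial (coordinate_sum, count) records of the same cell."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def reduce_cell(records):
    """Reduce step: collapse all records of one cell into (centroid, count)."""
    total, n = records[0]
    for rec in records[1:]:
        total, n = combine((total, n), rec)
    return tuple(x / n for x in total), n


def grid_centroids(points, cell_size=1.0):
    """Simulate map -> shuffle -> reduce in memory (no Hadoop involved)."""
    shuffled = defaultdict(list)
    for p in points:
        key, value = map_point(p, cell_size)
        shuffled[key].append(value)  # shuffle: group values by cell id
    return {key: reduce_cell(vals) for key, vals in shuffled.items()}


# Five 2-D points that fall into two grid cells.
points = [(0.25, 0.5), (0.75, 0.5), (2.5, 2.5), (2.25, 2.75), (2.75, 2.25)]
centroids = grid_centroids(points, cell_size=1.0)
for cell, (center, count) in sorted(centroids.items()):
    print(cell, center, count)
```

Each cell is reduced to a single centroid, so a later clustering stage would operate on one record per occupied cell instead of on every raw point. Because `combine` is associative, the same merge could in principle also run as a combiner on partial sums before the shuffle.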

Last Update: 1900-01-01