[1] WANG Xiao-jun, ZOU Liang-liang. Research on Optimizing Iterative Technology of Hadoop[J]. Computer Technology and Development, 2014, 24(09): 98-102.

Research on Optimizing Iterative Technology of Hadoop

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
24
Issue:
No. 09, 2014
Pages:
98-102
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-09-10

Article Info

Title:
 Research on Optimizing Iterative Technology of Hadoop
Article ID:
1673-629X(2014)09-0098-05
Author(s):
 WANG Xiao-jun, ZOU Liang-liang
 Institute of Information Network Technology, Nanjing University of Posts and Telecommunications
Keywords:
 Hadoop; iteration; map-side storage
Classification number (CLC):
TP31
Document code:
A
Abstract:
 Hadoop is a distributed computing framework for processing massive data and has been widely adopted. However, it has shortcomings when processing graph-structured data. Because graph data are strongly coupled, a result cannot be obtained with a single MapReduce computation; iterative computation is required, and a single iteration may itself need several MapReduce jobs. Restarting MapReduce jobs is costly, and static data may be transmitted unnecessarily during iteration. Building on Hadoop, this paper proposes a map-side storage strategy: static data are stored on the map side, and the computations that involve both static and dynamic data are completed there, which reduces the total running time of the iterative computation. A modified Hadoop platform was built and compared with the iteration scheme used before the improvement; the experimental results show that the map-side storage strategy reduces running time to some extent.
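The idea of map-side storage can be illustrated with a small in-memory sketch. This is not the authors' modified Hadoop code: it is a plain Python simulation, and the choice of PageRank as the iterative graph algorithm, the `CachingMapper` class, the `reduce_ranks` function, and the three-node graph are all assumptions made for illustration. The point it shows is that the static data (the adjacency lists) is loaded into the mapper once, while only the dynamic data (the rank values) flows through each iteration.

```python
from collections import defaultdict

# Static data (hypothetical example graph): adjacency lists that never
# change between iterations. Every node has at least one out-edge.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
DAMPING = 0.85  # conventional PageRank damping factor (assumed here)


class CachingMapper:
    """Hypothetical mapper that keeps the static graph on the map side."""

    def __init__(self, adjacency):
        # Loaded once and reused by every iteration, so the static
        # structure is never re-sent along with the dynamic data.
        self.adjacency = adjacency

    def map(self, ranks):
        # Only the dynamic data (current ranks) arrives each iteration.
        # The join of static and dynamic data happens here, map-side.
        for node, rank in ranks.items():
            targets = self.adjacency[node]
            for dst in targets:
                yield dst, rank / len(targets)


def reduce_ranks(pairs, nodes):
    """Sum the contributions per node and apply the damping factor."""
    sums = defaultdict(float)
    for dst, contribution in pairs:
        sums[dst] += contribution
    n = len(nodes)
    return {v: (1 - DAMPING) / n + DAMPING * sums[v] for v in nodes}


def pagerank(graph, iterations=20):
    mapper = CachingMapper(graph)  # static data stored once
    ranks = {v: 1.0 / len(graph) for v in graph}
    for _ in range(iterations):
        # Each pass stands in for one map/reduce round over dynamic data.
        ranks = reduce_ranks(mapper.map(ranks), list(graph))
    return ranks


if __name__ == "__main__":
    for node, rank in sorted(pagerank(GRAPH).items()):
        print(node, round(rank, 4))
```

In an unmodified iterative job, the adjacency lists would typically travel through the shuffle together with the ranks in every round; keeping them in the mapper is what removes that repeated transfer in this sketch.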


Last Update: 2015-04-01