[1] ZHOU Chang-jun (周长俊), ZONG Ping (宗平). Improvement of Backup Data Placement Policy of Hadoop[J]. Computer Technology and Development (计算机技术与发展), 2019, 29(01): 11-16. [doi:10.3969/j.issn.1673-629X.2019.01.003]

Improvement of Backup Data Placement Policy of Hadoop

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 01
Pages:
11-16
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2019-01-10

Article Info

Title:
Improvement of Backup Data Placement Policy of Hadoop
Article ID:
1673-629X(2019)01-0011-06
Author(s):
ZHOU Chang-jun (周长俊)^1, ZONG Ping (宗平)^2
1. School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2. School of Overseas Education, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Keywords:
Hadoop; backup data placement policy; internal bandwidth; load balance; hot data
CLC number:
TP31
DOI:
10.3969/j.issn.1673-629X.2019.01.003
Document code:
A
Abstract:
Under the default Hadoop backup data placement policy, once a local data replica fails, it must be recovered from the backup data stored on a remote rack. However, the default policy selects the storage node for backup data at random, so backup data may be distributed unevenly across nodes, and recovery may consume a large amount of internal bandwidth because of the distance between nodes. To address these problems, we propose an improved backup data placement policy. The policy takes into account the distance between nodes, the load of each node, and the number of times the backup data has been recovered, and from these calculates a matching degree for each node. The node with the highest matching degree is then selected as the optimal node for storing backup data on the remote racks. The policy not only balances the placement of backup data across nodes but also accounts for the internal bandwidth consumed during data recovery; by considering the number of replica failures, it also enables fast recovery of frequently failing replicas. The improved policy was implemented on the Hadoop platform, and the results met the expected requirements.
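
The abstract names the three inputs to the matching degree (distance between nodes, node load, and recovery count) but does not give the formula. The following Python sketch is therefore only an illustration of the general idea, not the authors' method: the `Candidate` type, the linear scoring, and the rule that shifts weight toward closeness as the recovery count grows are all assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A remote-rack node that could hold the backup copy (hypothetical type)."""
    name: str
    distance: float  # network distance to the node holding the local copy
    load: float      # fraction of storage already in use, from 0.0 to 1.0


def matching_degree(node, max_distance, recovery_count):
    """Score a candidate; higher is better. The formula is an assumption."""
    # Assumed weighting: the more often the data has had to be recovered,
    # the more weight closeness gets relative to free capacity.
    w_distance = (1 + recovery_count) / (2 + recovery_count)
    w_load = 1.0 - w_distance
    # Normalize distance so the farthest candidate has closeness 0.
    closeness = 1.0 - node.distance / max_distance if max_distance > 0 else 1.0
    free_capacity = 1.0 - node.load
    return w_distance * closeness + w_load * free_capacity


def select_backup_node(candidates, recovery_count):
    """Return the candidate with the highest matching degree."""
    if not candidates:
        raise ValueError("no candidate nodes")
    max_distance = max(c.distance for c in candidates)
    return max(
        candidates,
        key=lambda c: matching_degree(c, max_distance, recovery_count),
    )


# Illustrative values only; they are not taken from the paper.
candidates = [
    Candidate("near-but-busy", distance=2, load=0.9),
    Candidate("far-but-idle", distance=4, load=0.1),
]
rarely_recovered = select_backup_node(candidates, recovery_count=0)
often_recovered = select_backup_node(candidates, recovery_count=8)
print(rarely_recovered.name, often_recovered.name)
```

With these illustrative values, data that has never been recovered goes to the lightly loaded node, while data that has been recovered eight times goes to the closer node, which mirrors the trade-off between load balancing and fast recovery of frequently failing replicas described in the abstract.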


Last Update: 2019-01-10