[1] DU Hua, LIU Huachun. Research on Deduplication of Data in Cloud Data Center[J]. Computer Technology and Development, 2019, 29(02): 157-161. [doi:10.3969/j.issn.1673-629X.2019.02.033]

Research on Deduplication of Data in Cloud Data Center

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 02
Pages:
157-161
Column:
Application Development Research
Publication date:
2019-02-10

Article Info

Title:
Research on Deduplication of Data in Cloud Data Center
Article ID:
1673-629X(2019)02-0157-05
Author(s):
DU Hua (杜华) 1,2; LIU Hua-chun (刘华春) 2
1. Southwestern Institute of Physics, Chengdu 610000, Sichuan, China; 2. School of Engineering and Technology, Chengdu University of Technology, Leshan 614000, Sichuan, China
Keywords:
data deduplication; cloud data center; fingerprint; SSD; erasure code
CLC number:
TP31
DOI:
10.3969/j.issn.1673-629X.2019.02.033
Abstract:
The rapid growth of enterprise data volumes poses severe challenges for cloud data centers. Studies have found that up to 60% of the data in storage systems is redundant, so reducing duplicate data in cloud data centers is attracting growing attention. The storage performance metrics used under the earlier single-storage-structure model (average response time, disk I/O efficiency, and data redundancy) neither fully fit the distributed storage architecture of cloud data, which is built on inexpensive devices, nor adequately support the SLA commitments on high data availability and high reliability that cloud service providers make to users. Therefore, after analyzing and summarizing the new characteristics of data storage in the cloud data center environment and examining the shortcomings of deduplication under a single storage structure, we propose three paths to improve the efficiency and performance of deduplication systems in cloud data centers: query algorithm optimization, SSD-based improvement of replacement efficiency, and an improved erasure-code fault-tolerance mechanism. Finally, by analyzing how the IT resource demands of different users differ under cloud services, the appropriate deduplication timing is selected automatically for each case, which points out a direction for further research on improving the overall operating efficiency of deduplication systems in cloud data centers.
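The fingerprint lookup at the core of deduplication, which the query-optimization path in the abstract targets, can be illustrated with a minimal sketch. The abstract gives no details on chunking or hashing, so everything here is an assumption made for illustration: fixed-size 4 KiB chunks, SHA-256 as the fingerprint, an in-memory dictionary as the fingerprint index, and the hypothetical function names `deduplicate` and `restore`.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # assumed fixed chunk size; not taken from the paper


def deduplicate(data: bytes, store: Dict[str, bytes],
                chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns the "recipe": the ordered list of chunk fingerprints needed
    to rebuild the original data from the store.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Assumed fingerprint function: SHA-256 over the chunk contents.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # Index lookup: only chunks with an unseen fingerprint are stored.
        if fingerprint not in store:
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original data by concatenating chunks in recipe order."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    data = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE
    recipe = deduplicate(data, store)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    assert restore(recipe, store) == data
```

In a sketch like this, each incoming chunk costs one index lookup, which is why lookup speed and the placement of the index (for example on SSD rather than disk) matter once the index no longer fits in memory.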


Last Update: 2019-02-10