LIU Sai, NIE Qing-jie, LIU Jun, et al. Research on Deduplication Algorithm Based on K-medoids Clustering[J]. Computer Technology and Development, 2018, 28(02): 125-129. [doi:10.3969/j.issn.1673-629X.2018.02.027]

Research on Deduplication Algorithm Based on K-medoids Clustering

《计算机技术与发展》 (Computer Technology and Development) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 28
Issue: 2018, No. 02
Pages: 125-129
Publication Date: 2018-02-10

Article Info

Title:
Research on Deduplication Algorithm Based on K-medoids Clustering
Article ID:
1673-629X(2018)02-0125-05
Author(s):
LIU Sai 1, NIE Qing-jie 1, LIU Jun 1, WANG Chao 2, LI Jing 2
1. NARI Group Corporation, Nanjing 210003, China;
2. School of Computer, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Keywords:
DELTA compression; data compression; clustering; K-medoids
CLC Number:
TP393
DOI:
10.3969/j.issn.1673-629X.2018.02.027
Document Code:
A
Abstract:
Data damage and loss can cause irreparable harm, and a data backup system keeps such losses to a minimum. As the volume of collected data grows rapidly, the amount of data the backup system must back up and restore grows with it; yet the similarity between backup files exceeds 60%, so storing all of them on disk wastes storage space. We therefore propose a DELTA compression method based on K-medoids clustering to remove duplicate data from backup data. The method first splits files into chunks and performs pairwise DELTA compression on the chunks, using the size of each compressed output as the similarity between the two chunks. K-medoids clustering is then performed on these similarities as a preprocessing step before DELTA compression. According to the K-medoids clustering result, small similar chunks are merged before the final DELTA compression. Test results show that the method improves the compression ratio, reduces the number of fingerprint lookups in DELTA compression, and shortens the compression time.
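
As a rough illustration of the pipeline described in the abstract, the following Python sketch splits data into fixed-size chunks, uses the size of a zlib preset-dictionary encoding as a stand-in for the pairwise DELTA-compressed size, clusters the chunks with a plain K-medoids loop on that distance matrix, and then merges and delta-compresses each cluster against its medoid. The chunk size, the zlib-based delta stand-in, and the K-medoids implementation are illustrative assumptions, not the authors' actual method.

import random
import zlib

CHUNK_SIZE = 4096  # assumed fixed-size chunking


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the input into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def delta_size(base: bytes, target: bytes) -> int:
    """Approximate the DELTA-compressed size of `target` given `base`.

    zlib's preset dictionary plays the role of the delta reference: the more
    `target` resembles `base`, the smaller the output, so the size serves as a
    dissimilarity score between the two chunks.
    """
    comp = zlib.compressobj(level=6, zdict=base)
    return len(comp.compress(target) + comp.flush())


def distance_matrix(chunks: list[bytes]) -> list[list[int]]:
    """Pairwise delta-size distances between all chunks (symmetrised by min)."""
    n = len(chunks)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = min(delta_size(chunks[i], chunks[j]),
                    delta_size(chunks[j], chunks[i]))
            dist[i][j] = dist[j][i] = d
    return dist


def k_medoids(dist: list[list[int]], k: int, iters: int = 20) -> dict[int, list[int]]:
    """Plain K-medoids (PAM-style) on a precomputed distance matrix."""
    n = len(dist)
    medoids = random.sample(range(n), k)
    clusters: dict[int, list[int]] = {}
    for _ in range(iters):
        # Assignment step: attach every chunk to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Update step: the new medoid minimises total distance inside its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members,
                                   key=lambda c: sum(dist[c][o] for o in members)))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


def compress_clusters(chunks: list[bytes], clusters: dict[int, list[int]]) -> dict[int, bytes]:
    """Merge each cluster's chunks and delta-compress them against the medoid."""
    stores = {}
    for medoid, members in clusters.items():
        merged = b"".join(chunks[i] for i in members if i != medoid)
        comp = zlib.compressobj(level=6, zdict=chunks[medoid])
        stores[medoid] = comp.compress(merged) + comp.flush()
    return stores


if __name__ == "__main__":
    # Toy backup stream with heavy redundancy between two kinds of "files".
    data = (b"config=1;payload=" + b"A" * 8000) * 3 + (b"log:" + b"B" * 8000) * 2
    chunks = split_into_chunks(data)
    clusters = k_medoids(distance_matrix(chunks), k=2)
    stores = compress_clusters(chunks, clusters)
    print("chunks:", len(chunks),
          "compressed bytes:", sum(len(s) for s in stores.values()))

In this sketch the clustering only has to compare each chunk against k medoids when assigning it, which mirrors the abstract's point that pre-clustering reduces the number of fingerprint lookups needed during DELTA compression.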

Last Update: 2018-03-29