To address the poor effectiveness of existing similar-data deduplication techniques and their high metadata overhead in cloud storage environments, a variable-granularity, chunk-context-aware similar-data deduplication technique for cloud storage is proposed. The technique uses a feature extraction algorithm based on sub-block reorganization to extract initial features from the internal structure of each data block's content, and then uses a BP (Back Propagation) neural network context-aware model to embed the block's contextual feature information into those initial features, yielding variable-granularity data blocks with contextual semantic embedding. A better representation of similar data blocks is obtained by controlling the data block size: neighboring similar data blocks, or neighboring non-redundant data blocks, are dynamically merged to reduce metadata overhead, and the transition regions located between similar and non-redundant data blocks are segmented. Finally, to evaluate the technique, a prototype variable-granularity similar-data detection algorithm, rCARD, is implemented and extensively tested on real-world datasets. The experimental results show that, compared with Finesse, the latest similarity-detection deduplication technique, rCARD achieves a higher deduplication rate while significantly reducing metadata size, and speeds up similarity detection by up to 11.07 times.
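The merging and segmentation steps described above can be illustrated with a minimal sketch. This is not the rCARD implementation. It assumes each block already carries a label ("similar", "unique", or "transition"), which in the proposed technique would come from the context-aware model and is not reproduced here. The names `Block`, `merge_adjacent`, `split_block`, and the `max_size` and `piece_size` parameters are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    offset: int   # start position of the block in the data stream
    length: int   # block size in bytes
    label: str    # assumed labels: "similar", "unique", or "transition"


def merge_adjacent(blocks: List[Block], max_size: int) -> List[Block]:
    """Merge contiguous neighbors that share a label, up to max_size bytes.

    Fewer blocks means fewer metadata entries to store and index.
    """
    merged: List[Block] = []
    for b in blocks:
        if merged:
            last = merged[-1]
            contiguous = last.offset + last.length == b.offset
            fits = last.length + b.length <= max_size
            if contiguous and fits and last.label == b.label:
                merged[-1] = Block(last.offset, last.length + b.length, last.label)
                continue
        merged.append(Block(b.offset, b.length, b.label))
    return merged


def split_block(block: Block, piece_size: int) -> List[Block]:
    """Cut one block (for example a transition region) into smaller pieces."""
    pieces: List[Block] = []
    pos = block.offset
    end = block.offset + block.length
    while pos < end:
        size = min(piece_size, end - pos)
        pieces.append(Block(pos, size, block.label))
        pos += size
    return pieces


if __name__ == "__main__":
    stream = [
        Block(0, 4, "similar"),
        Block(4, 4, "similar"),
        Block(8, 4, "unique"),
        Block(12, 4, "unique"),
        Block(16, 4, "similar"),
    ]
    for blk in merge_adjacent(stream, max_size=16):
        print(blk)
```

In this sketch, five input blocks collapse into three after merging, so only three metadata entries would need to be kept. The size cap is an assumed stand-in for the block-size control mentioned in the abstract; how rCARD actually chooses merge and split boundaries is not specified here.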