Various Chunking and Deduplication Techniques in Big Data

Authors

  • Naresh Kumar CSE Department, UIET-Kurukshetra University, Kurukshetra, India
  • Ishu Devi CSE Department, UIET, Kurukshetra University, Kurukshetra, Haryana, India

Keywords:

Big Data, Chunking, Deduplication, FBC (Frequency Based Chunking) and CDC (Content Defined Chunking)

Abstract

Today, very large volumes of data are generated, and much of that data is duplicated. Data at this scale is called big data. Chunking and deduplication mechanisms are used to handle such data and to reduce the duplication in it. A deduplication mechanism removes duplicate data by means of chunking and hash functions. This paper discusses different chunking and deduplication techniques and presents a comparative analysis of their pros and cons.
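As a rough illustration of how chunking and hash functions can remove duplicate data, the following minimal sketch splits a byte string into fixed-size chunks and stores each distinct chunk once, keyed by its SHA-256 digest. It is not taken from the paper: the function names, the 4-byte chunk size, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def fixed_size_chunks(data: bytes, chunk_size: int = 4):
    """Split data into consecutive chunks of chunk_size bytes.

    The last chunk may be shorter. The chunk size is an illustrative
    assumption; real systems typically use chunks of several kilobytes.
    """
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def deduplicate(data: bytes, chunk_size: int = 4):
    """Store each distinct chunk once, keyed by its SHA-256 digest.

    Returns (store, recipe): store maps digest -> chunk bytes, and
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for chunk in fixed_size_chunks(data, chunk_size):
        digest = hashlib.sha256(chunk).hexdigest()
        # A chunk whose digest is already present is not stored again.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data, chunk_size=4)
print(len(recipe), len(store))  # 4 chunks referenced, 2 chunks stored
```

Here the chunk `ABCD` occurs three times but is stored only once. Fixed-size chunking is the simplest variant; a content-defined scheme (CDC) would instead derive chunk boundaries from the data itself, so that an insertion near the start of the data does not shift every later chunk.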

 

Published

2017-06-30

How to Cite

[1]
N. Kumar and I. Devi, “Various Chunking and Deduplication Techniques in Big Data”, Int. J. Sci. Res. Comp. Sci. Eng., vol. 5, no. 3, pp. 129–131, Jun. 2017.

Section

Review Article