Various Chunking and Deduplication Techniques in Big Data
Keywords:
Big Data, Chunking, Deduplication, FBC (Frequency Based Chunking), CDC (Content Defined Chunking)
Abstract
Very large volumes of data are generated today, and much of that data is duplicated. Data at this scale is called big data. Chunking and deduplication mechanisms are used to handle big data and reduce the duplication in it. In a deduplication mechanism, data is split into chunks and hash functions are used to identify and remove the duplicate chunks. This paper discusses different chunking and deduplication techniques and presents a comparative analysis of their pros and cons.
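As a rough illustration of how chunking and hash functions combine to remove duplicates, the sketch below uses the simplest scheme, fixed-size chunking with SHA-256 fingerprints. It is not the FBC or CDC method named in the keywords, and the function names (`chunk_fixed`, `deduplicate`, `rebuild`) and the 8-byte chunk size are assumptions chosen only for this example.

```python
import hashlib


def chunk_fixed(data: bytes, size: int = 8) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes, size: int = 8):
    """Return (store, recipe).

    store  maps a SHA-256 digest to the single stored copy of that chunk.
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for chunk in chunk_fixed(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of a chunk
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original data from the chunk store and the recipe."""
    return b"".join(store[d] for d in recipe)


# Hypothetical input: three identical 8-byte blocks followed by one distinct block.
data = b"ABCDEFGH" * 3 + b"12345678"
store, recipe = deduplicate(data)
print(len(recipe), len(store))  # 4 chunks referenced, 2 chunks actually stored
```

Fixed-size chunking is easy to follow, but inserting a single byte shifts every later chunk boundary. That weakness is the usual motivation for content-defined approaches such as CDC.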
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.