Affiliations: Department of Computer Science and Engineering, G. H. Raisoni College of Engineering, Affiliated to RTM University Nagpur, Nagpur, Maharashtra, India
Corresponding author: Jyoti J. Malhotra, Department of Computer Science and Engineering, G. H. Raisoni College of Engineering, Affiliated to RTM University Nagpur, Nagpur, Maharashtra, India. E-mail: firstname.lastname@example.org.
Abstract: This paper performs de-duplication to improve storage optimization. First, it proposes a hybrid fingerprint extraction method that combines the simhash (SH) and Huffman coding (HC) algorithms. Second, the data are clustered using grey wolf optimization (GWO), a recent metaheuristic, to extract the metadata. The extracted metadata is stored in a metadata server, which improves storage optimization and de-duplication. A Euclidean-distance-based GWO is adopted because it minimizes the Euclidean distance in the GWO-based clustering used for de-duplication. The proposed GWO-based clustering method is compared with existing methods, namely k-means, k-mode, Euclidean-distance-based Particle Swarm Optimization and a Euclidean-distance-based genetic algorithm, in terms of accuracy, True Positive Rate (TPR), True Negative Rate (TNR) and execution time, and the advantages of the GWO-based clustering method are discussed.
Keywords: De-duplication, simhash algorithm, Huffman coding, grey wolf optimization, accuracy
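The abstract does not specify how the simhash and Huffman coding outputs are combined into the hybrid fingerprint, nor how candidate solutions are encoded in GWO. The Python sketch below is therefore only an illustration under stated assumptions. It computes a standard 64-bit simhash over whitespace tokens (the Huffman coding step is omitted) and expands each fingerprint into a 0/1 feature vector. It then clusters those vectors with the canonical GWO update, where each wolf encodes `k` candidate centroids and fitness is the sum of Euclidean distances from each point to its nearest centroid. All function names (`simhash`, `to_bits`, `gwo_cluster`, etc.) and parameters such as `n_wolves` and `n_iter` are hypothetical and not taken from the paper.

```python
import hashlib

import numpy as np


def simhash(text: str, bits: int = 64) -> int:
    """Standard simhash: each token votes +1/-1 on every bit of its hash.

    bits must not exceed 64, because each token hash is 64 bits wide.
    """
    votes = [0] * bits
    for token in text.lower().split():
        # blake2b with digest_size=8 yields a 64-bit token hash
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when the positive votes win
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def to_bits(fp: int, bits: int = 64) -> np.ndarray:
    """Expand a fingerprint into a 0/1 feature vector for clustering."""
    return np.array([(fp >> i) & 1 for i in range(bits)], dtype=float)


def clustering_cost(centroids: np.ndarray, X: np.ndarray) -> float:
    """Sum of Euclidean distances from each point to its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float(d.min(axis=1).sum())


def gwo_cluster(X, k, n_wolves=20, n_iter=100, seed=0):
    """Grey wolf optimizer in which each wolf encodes k centroids (assumed encoding)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    wolves = rng.uniform(lo, hi, size=(n_wolves, k, X.shape[1]))
    fitness = np.array([clustering_cost(w, X) for w in wolves])
    best_pos, best_cost = wolves[fitness.argmin()].copy(), float(fitness.min())
    for t in range(n_iter):
        # alpha, beta, delta: the three fittest wolves lead the pack
        leaders = [wolves[j].copy() for j in np.argsort(fitness)[:3]]
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 to 0
        for i in range(n_wolves):
            pull = np.zeros_like(wolves[i])
            for leader in leaders:
                A = 2 * a * rng.random(wolves[i].shape) - a
                C = 2 * rng.random(wolves[i].shape)
                D = np.abs(C * leader - wolves[i])
                pull += leader - A * D
            # New position: average of the three leader-guided positions
            wolves[i] = np.clip(pull / 3, lo, hi)
            fitness[i] = clustering_cost(wolves[i], X)
            if fitness[i] < best_cost:  # keep the best-so-far solution
                best_pos, best_cost = wolves[i].copy(), float(fitness[i])
    # Assign every point to its nearest centroid in the best solution
    dists = np.linalg.norm(X[:, None, :] - best_pos[None, :, :], axis=2)
    return best_pos, dists.argmin(axis=1), best_cost


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumped over the lazy dog",
        "grey wolf optimization clusters metadata",
        "grey wolf optimisation clusters the metadata",
    ]
    X = np.stack([to_bits(simhash(d)) for d in docs])
    centroids, labels, cost = gwo_cluster(X, k=2)
    print(labels, cost)
```

Keeping the best-so-far solution (elitism) guarantees the returned centroids are never worse than the best seen during the search; the paper's own update rules and stopping criteria may differ from this sketch.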