Article type: Research Article
Authors: Mohan, Prakash [a],[*] | Sundaram, Manikandan [a] | Satpathy, Sambit [b] | Das, Sanchali [b]
Affiliations: [a] Data Science and Analytics Center, Karpagam College of Engineering, Coimbatore, India | [b] Noida Institute of Engineering and Technology, Greater Noida, Uttar Pradesh, India
Correspondence: [*] Corresponding author. Prakash Mohan, Data Science and Analytics Center, Karpagam College of Engineering, India. E-mail: [email protected].
Abstract: Data de-duplication is a data compression technique that eliminates duplicate copies of information and has been widely employed in cloud storage to reduce storage requirements and save bandwidth. This paper presents a secure, AES-encrypted de-duplication system that detects duplicates by meaning and stores the result in the cloud. To protect the privacy of sensitive information while supporting de-duplication, the AES encryption technique and the SHA-256 hashing algorithm are used to encrypt the information before outsourcing. Documents are pre-processed, then compared and verified with the use of WordNet. Cosine similarity is used to measure the similarity between two documents, computed over a far more efficient vector space model (VSM) data structure. The hierarchical WordNet corpus is used to examine syntax and semantics so that duplicates can be identified. NLTK, which provides a wide range of libraries and programs for symbolic and statistical natural language processing (NLP) in the Python programming language, is used here for the words left unidentified by cosine similarity. In previous approaches, cloud storage was consumed heavily because similar files were allowed to be stored. With the proposed system, storage space is reduced by up to 85%. Because AES and SHA-256 are employed, the system provides high security and efficiency.
Keywords: Vector space model, WordNet, deduplication, cosine similarity, NLTK
DOI: 10.3233/JIFS-210038
Journal: Journal of Intelligent & Fuzzy Systems, vol. 41, no. 2, pp. 2969-2980, 2021
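The abstract names two checks, a SHA-256 hash and cosine similarity over a vector space model. They could be combined roughly as in the sketch below. It is a minimal illustration using only the Python standard library, not the authors' implementation: the function names, the plain term-frequency weighting, and the 0.8 threshold are all assumptions, and the AES encryption and WordNet/NLTK semantic steps are left out.

```python
import hashlib
import math
import re
from collections import Counter


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the UTF-8 bytes; equal digests flag exact duplicates."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def term_vector(text: str) -> Counter:
    """Lowercase the text and count word occurrences (a bare-bones VSM vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Exact match by hash first, then near-duplicate by cosine similarity.

    The threshold is a hypothetical value chosen for illustration only.
    """
    if fingerprint(doc_a) == fingerprint(doc_b):
        return True
    return cosine_similarity(term_vector(doc_a), term_vector(doc_b)) >= threshold
```

The hash comparison is cheap and catches byte-identical uploads. The cosine check catches documents that differ only in case, punctuation, or word order, because these are lost when the term vectors are built. Detecting synonyms or paraphrases would need the WordNet-based step the abstract describes, which this sketch does not attempt.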