Improving Data Processing Speed on Large Datasets in a Hadoop Multi-node Cluster using Enhanced Apriori Algorithm

Sundarakumar, M.R.; Sharma, Ravi; Fathima, S.K.; Gokul Rajan, V.; Dhayanithi, J.; Marimuthu, M.; Mohanraj, G.; Sharma, Aditi; Johny Renoald, A.

doi:10.3233/JIFS-232048

Article type: Research Article

Authors: Sundarakumar, M.R.^{a; *} | Sharma, Ravi^a | Fathima, S.K.^b | Gokul Rajan, V.^a | Dhayanithi, J.^b | Marimuthu, M.^b | Mohanraj, G.^c | Sharma, Aditi^d | Johny Renoald, A.^e

Affiliations: [a] School of Computing Science and Engineering, Galgotias University, Greater Noida, Uttar Pradesh, India | [b] Department of CSE, Sona College of Technology, Salem, Tamilnadu, India | [c] Department of Smart Computing, Vellore Institute of Technology, Vellore, Tamilnadu, India | [d] School of Computer Science and Engineering, Parul Institute of Technology, Parul University, Gujarat, India | [e] Department of EEE, Erode Sengunthar Engineering College, Perundurai, Tamilnadu, India

Correspondence: [*] Corresponding author. M.R. Sundarakumar, School of Computing Science and Engineering, Galgotias University, Greater Noida, Uttar Pradesh, India. E-mail: [email protected].

Abstract: For large datasets, data mining methods are run on a Hadoop-based distributed infrastructure, using the MapReduce paradigm for rapid data processing. Although data mining approaches are established methodologies, the Apriori algorithm provides a specific strategy for increasing data processing performance in big data analytics when applied with MapReduce. The Apriori property is used to increase the efficiency of level-wise generation of frequent itemsets by reducing the search space: every subset of a frequent itemset must also be frequent, and, conversely, if an itemset is infrequent, then all of its supersets are infrequent as well. We refined the Apriori approach by varying the degree of order in locating frequent itemsets in large clusters using MapReduce programming. Fixed Pass Combined Counting (FPC) and Dynamic Pass Combined Counting (DPC) are classical algorithms used for processing huge datasets, but their accuracy is not up to the mark. In this article, updated Apriori algorithms, multiplied-fixed-pass combined counting (MFPC) and average time-based dynamic combined counting (ATDFC), are used to improve data processing speed. The proposed approaches are based on the core notions of traditional Apriori in data mining and are applied in the multi-pass phase of MapReduce by skipping pruning in some passes. Optimized-MFPC and optimized-ATDFC algorithms for the MapReduce framework are also presented. The experimental results show that MFPC and ATDFC are more efficient in terms of execution time than the earlier Fixed Pass Combined Counting (FPC) and Dynamic Pass Combined Counting (DPC) approaches. In a Hadoop multi-node cluster, this paradigm accelerates data processing on big datasets. Previous techniques were reported to reduce execution time by 60–80% through the use of several passes. Because the trimming operation is omitted in data pre-processing, the proposed approaches save up to 84–90% of that time.
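
The Apriori property described in the abstract can be illustrated with a minimal, single-machine sketch in Python. This is not the authors' MFPC or ATDFC implementation and involves no Hadoop or MapReduce code; the function names (generate_candidates, count_support, apriori) and the prune flag are hypothetical, introduced only to show how candidate k-itemsets are built from frequent (k-1)-itemsets and what skipping the pruning step changes.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k, prune=True):
    """Join frequent (k-1)-itemsets into candidate k-itemsets.

    With prune=True, a candidate is dropped if any of its (k-1)-subsets
    is not frequent (the Apriori property). With prune=False the check
    is skipped, so more candidates reach the counting step.
    """
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k:
                candidates.add(union)
    if not prune:
        return candidates
    return {
        c for c in candidates
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def apriori(transactions, min_support, prune=True):
    """Level-wise search for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}
    counts = count_support(transactions, items)
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = generate_candidates(frequent.keys(), k, prune)
        counts = count_support(transactions, candidates)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Toy transactions, made up for illustration only.
    tx = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in apriori(tx, min_support=3).items():
        print(sorted(itemset), support)
```

In this sketch, skipping pruning never changes the final set of frequent itemsets, because every candidate is still counted against the support threshold; it only trades the cost of the subset check for the cost of counting extra candidates.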

Keywords: Algorithms, pruning, data mining, Hadoop cluster, map reduce

Journal: Journal of Intelligent & Fuzzy Systems, vol. 45, no. 4, pp. 6161-6177, 2023

Published: 04 October 2023
