Article type: Research Article
Authors: Li, Yufeng [a],* | Xu, Keyi [a] | Ding, Yumei [b] | Sun, Zhiwei [a] | Ke, Ting [a]
Affiliations: [a] College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin, China | [b] College of Sciences, Tianjin University of Science & Technology, Tianjin, China
Correspondence: [*] Corresponding author. Yufeng Li, Department of Data Science and Big Data, College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin, China. E-mail: [email protected].
Abstract: Many traditional clustering algorithms cannot process mixed-type datasets in parallel, which limits their application to big data. In this paper, we propose MR-CF, a CF tree clustering algorithm based on MapReduce that handles mixed-type datasets. MR-CF has two primary phases, a mapper phase and a reducer phase. In the mapper phase, the original CF tree algorithm is modified to collect intermediate CF entries; in the reducer phase, k-prototypes is extended to cluster those CF entries. To avoid the high costs of I/O overhead and data serialization, MR-CF loads a dataset from HDFS only once. We first analyze the time, space, and I/O complexity of MR-CF. We then compare it with sklearn BIRCH, Apache Mahout k-means, k-prototypes, and mrk-prototypes on several real-world and synthetic datasets. Experiments on two mixed-type big datasets show that MR-CF reduces execution time by 45.4% and 61.3% compared with k-prototypes, and by 73.8% and 55.0% compared with mrk-prototypes.
Keywords: Clustering analysis, CF tree, mixed-type datasets, BIRCH, k-prototypes
DOI: 10.3233/JIFS-224234
Journal: Journal of Intelligent & Fuzzy Systems, vol. 44, no. 5, pp. 8309-8320, 2023
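
Illustrative sketch (not taken from the paper): the abstract does not specify how MR-CF represents a CF entry for mixed-type data. The Python below assumes a BIRCH-style clustering feature (count, linear sum, squared sum) for numeric attributes, extended with per-attribute category counts for categorical ones. The names `MixedCF`, `summarize`, and the toy partitions are hypothetical. It shows only why such summaries can be built per partition in a mapper and combined in a reducer after a single pass over the data; it does not implement the CF tree or the extended k-prototypes step.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class MixedCF:
    """Hypothetical clustering feature for mixed-type records (an assumption)."""
    n: int = 0                                   # number of records summarized
    ls: list = field(default_factory=list)       # per-attribute sum of numeric values
    ss: float = 0.0                              # sum of squared numeric values
    counts: list = field(default_factory=list)   # one Counter per categorical attribute

    @classmethod
    def from_record(cls, numeric, categorical):
        # A single record is a CF entry with n = 1.
        return cls(
            n=1,
            ls=list(numeric),
            ss=sum(x * x for x in numeric),
            counts=[Counter([c]) for c in categorical],
        )

    def merge(self, other):
        # CF entries are additive, so partial summaries can be combined in any order.
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        return MixedCF(
            n=self.n + other.n,
            ls=[a + b for a, b in zip(self.ls, other.ls)],
            ss=self.ss + other.ss,
            counts=[a + b for a, b in zip(self.counts, other.counts)],
        )

    def prototype(self):
        # k-prototypes-style representative: numeric mean plus categorical mode.
        centroid = [s / self.n for s in self.ls]
        modes = [c.most_common(1)[0][0] for c in self.counts]
        return centroid, modes


def summarize(partition):
    """Mapper-side step: fold one data partition into a single CF entry."""
    entries = (MixedCF.from_record(num, cat) for num, cat in partition)
    return reduce(MixedCF.merge, entries, MixedCF())


# Toy data: two partitions of (numeric attributes, categorical attributes).
partitions = [
    [([1.0, 2.0], ["red"]), ([3.0, 4.0], ["red"])],
    [([5.0, 6.0], ["blue"])],
]
partials = [summarize(p) for p in partitions]          # "map" side
total = reduce(MixedCF.merge, partials, MixedCF())     # "reduce" side
print(total.n, total.prototype())
```

In this sketch each partition is read once and reduced to a fixed-size summary, so only the summaries need to be passed on for clustering.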