Article type: Research Article
Authors: Chen, Danyang [a], [*] | Wang, Xiangyu [b] | Xu, Xiu [c] | Zhong, Cheng [a] | Xu, Jinhui [d]
Affiliations: [a] School of Computer, Electronics and Information, Guangxi University, Guangxi, China | [b] Cloud and Smart Industries Group, Tencent, Guangdong, China | [c] School of Computer Science and Technology, China University of Mining and Technology, Jiangsu, China | [d] Department of Computer Science and Engineering, University at Buffalo, NY, USA
Correspondence: [*] Corresponding author: Danyang Chen, School of Computer, Electronics and Information, Guangxi University, Guangxi, China. Tel.: +86 771 3232214; E-mail: [email protected].
Abstract: We consider the problem of clustering a set of uncertain data, where each uncertain data item consists of a point set indicating its possible locations. The objective is to identify a representative for each uncertain data item and group the items into k clusters so as to minimize the total clustering cost. Unlike other models, our model does not assume a probability distribution for each uncertain data item. Thus, all possible locations need to be considered to determine the representative. Existing methods for this problem are either impractical or have difficulty handling large-scale datasets because of their pairwise-distance-based global search strategy and expensive optimization computation. In this paper, we propose a novel sparse Non-negative Matrix Factorization (NMF) method that measures the similarity of uncertain data by their most commonly shared features. A divide-and-conquer approach is adopted to substantially improve efficiency. A novel diagonal l0-constraint and its l1 relaxation are proposed to overcome the challenge of determining the representatives. We give a detailed analysis to show the correctness of our method, and provide an effective initialization and peeling strategy to enhance its ability to process large-scale datasets. Experimental results on benchmark datasets confirm the effectiveness of our method.
Keywords: Uncertain data clustering, sparse non-negative matrix factorization, data analysis, machine learning
DOI: 10.3233/IDA-205622
Journal: Intelligent Data Analysis, vol. 26, no. 3, pp. 615-636, 2022