Sparse non-negative matrix factorization for uncertain data clustering

Chen, Danyang; Wang, Xiangyu; Xu, Xiu; Zhong, Cheng; Xu, Jinhui

doi:10.3233/IDA-205622

Article type: Research Article

Authors: Chen, Danyang^{a; *} | Wang, Xiangyu^b | Xu, Xiu^c | Zhong, Cheng^a | Xu, Jinhui^d

Affiliations: [a] School of Computer, Electronics and Information, Guangxi University, Guangxi, China | [b] Cloud and Smart Industries Group, Tencent, Guangdong, China | [c] School of Computer Science and Technology, China University of Mining and Technology, Jiangsu, China | [d] Department of Computer Science and Engineering, University at Buffalo, NY, USA

Correspondence: [*] Corresponding author: Danyang Chen, School of Computer, Electronics and Information, Guangxi University, Guangxi, China. Tel.: +86 771 3232214; E-mail: [email protected].

Abstract: We consider the problem of clustering a set of uncertain data, where each uncertain data item consists of a point set indicating its possible locations. The objective is to identify a representative for each uncertain data item and group the items into k clusters so as to minimize the total clustering cost. Unlike other models, our model does not assume a probability distribution for each uncertain data item. Thus, all possible locations need to be considered to determine the representative. Existing methods for this problem are either impractical or have difficulty handling large-scale datasets because of their pairwise-distance-based global search strategy and expensive optimization. In this paper, we propose a novel sparse Non-negative Matrix Factorization (NMF) method which measures the similarity of uncertain data by their most commonly shared features. A divide-and-conquer approach is adopted to substantially improve efficiency. A novel diagonal l0-constraint and its l1 relaxation are proposed to overcome the challenge of determining the representatives. We give a detailed analysis to show the correctness of our method, and provide an effective initialization and peeling strategy to improve its ability to process large-scale datasets. Experimental results on benchmark datasets confirm the effectiveness of our method.
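
To make the problem statement concrete, the following is a minimal sketch of the clustering objective only. It is not the sparse NMF method proposed in the paper. It assumes a k-means-style squared Euclidean cost (the abstract does not specify the cost function) and uses a simple Lloyd-style alternating heuristic as a stand-in. The function name `cluster_uncertain`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
from math import dist

def cluster_uncertain(items, centers, iters=20):
    """Lloyd-style baseline (an assumption, not the paper's method).

    items   : list of point sets; each point set lists the possible
              locations of one uncertain data item (tuples of floats).
    centers : initial cluster centers, one tuple per cluster.
    Returns the chosen representatives, cluster labels, final centers,
    and the total squared-distance cost.
    """
    centers = [tuple(c) for c in centers]
    reps, labels = [], []
    for _ in range(iters):
        # Step 1: for each item, pick the (candidate, center) pair
        # with the smallest squared distance.
        reps, labels = [], []
        for cands in items:
            p, j = min(
                ((p, j) for p in cands for j in range(len(centers))),
                key=lambda pj: dist(pj[0], centers[pj[1]]) ** 2,
            )
            reps.append(p)
            labels.append(j)
        # Step 2: move each center to the mean of its representatives;
        # a center with no members stays where it is.
        new_centers = []
        for j, c in enumerate(centers):
            members = [r for r, l in zip(reps, labels) if l == j]
            if members:
                n = len(members)
                new_centers.append(tuple(sum(x) / n for x in zip(*members)))
            else:
                new_centers.append(c)
        if new_centers == centers:
            break  # converged: neither representatives nor centers moved
        centers = new_centers
    cost = sum(dist(r, centers[l]) ** 2 for r, l in zip(reps, labels))
    return reps, labels, centers, cost

# Toy data (hypothetical): four uncertain items, two candidate locations each.
items = [
    [(0.0, 0.0), (10.0, 12.0)],
    [(0.0, 1.0), (5.0, 5.0)],
    [(9.0, 9.0), (20.0, 0.0)],
    [(10.0, 9.0), (0.0, 20.0)],
]
reps, labels, centers, cost = cluster_uncertain(
    items, centers=[(0.0, 0.0), (10.0, 10.0)]
)
```

Because no probability distribution is assumed, the sketch considers every possible location of every item when choosing its representative. This exhaustive scan per item is the kind of cost that the abstract's divide-and-conquer and sparsity constraints are meant to reduce.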

Keywords: Uncertain data clustering, sparse non-negative matrix factorization, data analysis, machine learning

Journal: Intelligent Data Analysis, vol. 26, no. 3, pp. 615-636, 2022

Published: 18 April 2022
