Article type: Research Article
Authors: Hu, Yeming [a], [*] | Milios, Evangelos E. [a] | Blustein, James [a], [b]
Affiliations: [a] Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada | [b] School of Management, Dalhousie University, Halifax, Nova Scotia, Canada
Correspondence: [*] Corresponding author: Yeming Hu, Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, Nova Scotia, Canada. Tel.: +1 902 494 7111; Fax: +1 902 492 1517; E-mail: [email protected].
Note: [1] Short version of this paper appears in [13].
Abstract: Unsupervised document clustering groups documents into clusters without any user effort. However, the resulting clusters often do not accord with the user's perception of the document collection. In this paper we describe a novel framework and explore whether clustering performance can be improved by including user supervision at the feature level. Unlike existing semi-supervised clustering methods, which ask the user to label documents, this framework interactively asks the user to label features. The proposed method ranks all features based on the most recent clusters using cluster-based feature selection and presents a list of highly ranked features to the user for labeling. The feature set for the next clustering iteration includes both the features accepted by the user and other highly ranked features. Experimental results on several real datasets demonstrate that the feature set obtained with the new interactive framework produces clusters that better match the user's expectations than those of the unsupervised versions of the methods. Moreover, we quantify and evaluate the effect of reweighting previously accepted features and of user effort. Different underlying clustering algorithms, such as K-Means and the Multinomial Naïve Bayes model, are shown to perform very well within the proposed framework.
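To make the loop described in the abstract concrete, the following is a minimal sketch of a single feature-labeling round, not the authors' implementation. It assumes documents are token lists and that cluster labels from the previous clustering iteration are already available. The scoring rule in `rank_features`, the boost value `weight=2.0`, and the function names `rank_features`, `next_feature_set`, and `interactive_round` are all illustrative assumptions; the abstract does not specify the cluster-based feature selection measure or the reweighting scheme.

```python
from collections import Counter


def rank_features(docs, labels):
    """Rank terms by how strongly they concentrate in one cluster.

    Assumed score (not from the paper): the largest gap between a term's
    document frequency inside a cluster and its frequency in the whole
    collection. Ties are broken alphabetically.
    """
    n = len(docs)
    overall = Counter(t for d in docs for t in set(d))
    clusters = {}
    for d, c in zip(docs, labels):
        clusters.setdefault(c, []).append(d)
    scores = {}
    for term, df in overall.items():
        best = 0.0
        for members in clusters.values():
            in_cluster = sum(1 for d in members if term in d) / len(members)
            best = max(best, in_cluster - df / n)
        scores[term] = best
    return sorted(scores, key=lambda t: (-scores[t], t))


def next_feature_set(ranked, accepted, size, weight=2.0):
    """Build the feature set for the next clustering iteration.

    User-accepted features are always kept and get a boosted weight
    (hypothetical value); remaining slots are filled with the other
    highly ranked features at weight 1.0.
    """
    weights = {t: weight for t in accepted}
    for t in ranked:
        if len(weights) >= size:
            break
        weights.setdefault(t, 1.0)
    return weights


def interactive_round(docs, labels, oracle, accepted, shown=3, size=4):
    """Show the top unlabeled features to the user and update the feature set.

    `oracle` stands in for the user: it returns True for accepted features.
    """
    ranked = rank_features(docs, labels)
    candidates = [t for t in ranked if t not in accepted][:shown]
    accepted = accepted | {t for t in candidates if oracle(t)}
    return accepted, next_feature_set(ranked, accepted, size)


if __name__ == "__main__":
    docs = [["goal", "team", "match"], ["team", "coach", "goal"],
            ["stock", "market", "price"], ["market", "trade", "price"]]
    labels = [0, 0, 1, 1]  # clusters from the previous iteration
    accepted, weights = interactive_round(
        docs, labels, lambda t: t in {"goal", "price"}, set())
    print(sorted(accepted), weights)
```

In a full loop, the returned weights would be used to re-represent the documents before the next run of the underlying clustering algorithm (for example K-Means), and the round would repeat until the user stops labeling.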
Keywords: Interactive clustering, interactive feature selection, user supervision, feature supervision, feature reweighting
DOI: 10.3233/IDA-140658
Journal: Intelligent Data Analysis, vol. 18, no. 4, pp. 561-581, 2014