Article type: Research Article
Authors: Goswami, Saptarsi [a],* | Chakrabarti, Amlan [b] | Chakraborty, Basabi [c]
Affiliations: [a] Computer Science and Engineering, Institute of Engineering & Management, Salt Lake, Kolkata, India | [b] A.K. Choudhury School of Information Technology, Calcutta University, Kolkata, India | [c] Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan
Correspondence: [*] Corresponding author. Saptarsi Goswami, Computer Science and Engineering, Institute of Engineering & Management, Salt Lake, Kolkata 700 091, India. Tel.: +91 9836065470; E-mail: [email protected].
Abstract: Features are eliminated either because they are irrelevant or because they are redundant. The major challenge in feature selection for clustering is that the relevance of a feature is not well defined. This paper attempts to address that gap. Feature relevance is first defined in terms of the Variability Score (VSi), a novel score that measures a feature's contribution to the overall variability of the dataset. Second, feature relevance is evaluated using entropy. VSi is a multivariate measure of feature relevance, whereas entropy is univariate. Both are used in a greedy forward search to select an optimal feature subset (FSELECT-VS, FSELECT-EN). Redundancy is handled using Pearson's correlation coefficient. Dataset characteristics also influence the results; it is therefore recommended to apply both methods and adopt whichever performs better on the particular dataset. An extensive empirical study over thirty publicly available datasets shows that the proposed method outperforms several state-of-the-art methods. The average feature reduction achieved is 44%, with no statistically significant reduction in performance (t = –0.35, p = 0.73) compared with using all features. Moreover, the proposed method is shown to be relatively inexpensive computationally.
Keywords: Feature selection, correlation, entropy, principal components analysis, greedy forward search
DOI: 10.3233/IFS-162156
Journal: Journal of Intelligent & Fuzzy Systems, vol. 32, no. 6, pp. 3847-3858, 2017
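The abstract describes a greedy forward search that ranks features by a relevance score (VSi or entropy) and uses Pearson's correlation coefficient to discard redundant candidates. The Python sketch below only illustrates that general scheme and is not the authors' FSELECT-EN implementation: the histogram-based entropy estimate, the 0.8 correlation threshold, and the function names are assumptions made for the example.

```python
# Illustrative sketch only (not the paper's exact FSELECT-EN): greedy forward
# selection of features ranked by an entropy-based relevance score, with
# Pearson correlation used to reject redundant candidates.
import numpy as np

def entropy_score(x, bins=10):
    """Univariate relevance proxy: Shannon entropy of a histogram of x (assumed binning)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def greedy_forward_select(X, corr_threshold=0.8, max_features=None):
    """Return indices of selected columns of X (n_samples x n_features)."""
    n_features = X.shape[1]
    # Rank features by decreasing entropy-based relevance.
    order = np.argsort([-entropy_score(X[:, j]) for j in range(n_features)])
    selected = []
    for j in order:
        # Redundancy check: skip features highly correlated with any already chosen one.
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > corr_threshold
            for k in selected
        )
        if not redundant:
            selected.append(j)
        if max_features and len(selected) >= max_features:
            break
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)  # near-duplicate of column 0
    print(greedy_forward_select(X))  # column 3 (or 0) is dropped as redundant
```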