Article type: Research Article
Authors: Goswami, Saptarsi [a],* | Chakrabarti, Amlan [b] | Chakraborty, Basabi [c]
Affiliations: [a] Computer Science and Engineering, Institute of Engineering & Management, Salt Lake, Kolkata, India | [b] A.K. Choudhury School of Information Technology, Calcutta University, Kolkata, India | [c] Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan
Correspondence: [*] Corresponding author. Saptarsi Goswami, Computer Science and Engineering, Institute of Engineering & Management, Salt Lake, Kolkata 700 091, India. Tel.: +91 9836065470; E-mail: [email protected].
Abstract: Features are eliminated either because they are irrelevant or because they are redundant. The major challenge in feature selection for clustering is that the relevance of a feature is not well defined. This paper attempts to address that gap. Feature relevance is first defined in terms of the Variability Score (VSi), a novel score that measures a feature's contribution to the overall variability of the dataset. Second, feature relevance is evaluated using entropy. VSi is a multivariate measure of feature relevance, whereas entropy is univariate. Both are used in a greedy forward search to select an optimal feature subset (FSELECT-VS, FSELECT-EN). Redundancy is handled using Pearson's correlation coefficient. Dataset characteristics also influence the results; it is therefore recommended to apply both methods and adopt whichever performs better on the particular dataset. An extensive empirical study over thirty publicly available datasets shows that the proposed method outperforms several state-of-the-art methods. The average feature reduction achieved is 44%, with no statistically significant reduction in performance (t = –0.35, p = 0.73) compared with using all features. Moreover, the proposed method is shown to be relatively inexpensive computationally.
Keywords: Feature selection, correlation, entropy, principal components analysis, greedy forward search
DOI: 10.3233/IFS-162156
Journal: Journal of Intelligent & Fuzzy Systems, vol. 32, no. 6, pp. 3847-3858, 2017
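The abstract describes a greedy forward search that ranks features by a relevance score (VSi or entropy) and uses Pearson's correlation coefficient to discard redundant candidates. The Python sketch below only illustrates that general scheme and is not the authors' FSELECT-EN implementation: the histogram-based entropy estimate, the 0.8 correlation threshold, and the function names are assumptions made for the example.

```python
# Illustrative sketch only (not the paper's exact FSELECT-EN): greedy forward
# selection of features ranked by an entropy-based relevance score, with
# Pearson correlation used to reject redundant candidates.
import numpy as np

def entropy_score(x, bins=10):
    """Univariate relevance proxy: Shannon entropy of a histogram of x (assumed binning)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def greedy_forward_select(X, corr_threshold=0.8, max_features=None):
    """Return indices of selected columns of X (n_samples x n_features)."""
    n_features = X.shape[1]
    # Rank features by decreasing entropy-based relevance.
    order = np.argsort([-entropy_score(X[:, j]) for j in range(n_features)])
    selected = []
    for j in order:
        # Redundancy check: skip features highly correlated with any already chosen one.
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > corr_threshold
            for k in selected
        )
        if not redundant:
            selected.append(j)
        if max_features and len(selected) >= max_features:
            break
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)  # near-duplicate of column 0
    print(greedy_forward_select(X))  # column 3 (or 0) is dropped as redundant
```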