The fitness-rough: A new attribute reduction method based on statistical and rough set theory

Choo, Yun-Huoy; Bakar, Azuraliza Abu; Hamdan, Abdul Razak

doi:10.3233/IDA-2008-12105

Article type: Research Article

Authors: Choo, Yun-Huoy | Bakar, Azuraliza Abu* | Hamdan, Abdul Razak

Affiliations: Department of Science and System Management, Faculty of Information Science and Technology, The National University of Malaysia, 43600 Bangi, Selangor, Malaysia

Correspondence: [*] Corresponding author. Tel.: +60 389216748; Fax: +60 38925 6732; E-mail: [email protected].

Abstract: Attribute reduction has become an important pre-processing step for reducing the complexity of data mining tasks. Rough reducts, statistical methods and correlation-based methods have each contributed, to some extent, to improving attribute reduction techniques. Statistical methods generally have lower computational complexity than rough reducts and correlation-based methods, but many studies have shown that rough reducts are effective at reducing data to its important attributes without causing excessive information loss. Correlation-based methods, on the other hand, evaluate features as subsets rather than as individual attributes. In this paper, we propose a combination of statistical and rough set methods that reduces data to its important attributes in a simpler way while keeping information loss from the raw data low. The fitness-rough (FsR) method identifies important attributes in the raw data, which is then further simplified into a more compact information table. We also examine the problem of information loss in this method. The proposed method was tested on ten UCI machine learning datasets and compared with the classical rough reducts (RR) method, the statistical entropy (ENT) method and the correlation-based feature selection (CFS) method. Experimental results show that our method performs comparatively well, achieving higher reduction strength and smaller rule sets than the benchmark methods, especially on medium-sized datasets. However, the FsR method is less efficient on mixed-mode and nominal datasets, because the non-quantitative attributes in these datasets are normally pre-categorised.
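
The abstract does not define the fitness degree used by FsR, so the sketch below does not implement FsR. It only illustrates the standard rough-set dependency degree, gamma_B(D) = |POS_B(D)| / |U|, that underlies rough reducts: an attribute subset B preserves the information of the full table when it yields the same dependency on the decision D. The greedy forward search (a QuickReduct-style heuristic) is a swapped-in illustration, not necessarily how the RR benchmark in the paper computes reducts. The function names, the toy table and its attributes `a`, `b`, `c`, `d` are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    if not rows:
        return 0.0
    pos = 0
    for block in partition(rows, attrs):
        # A block lies in the positive region if all its objects share one decision.
        if len({rows[i][decision] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)

def quick_reduct(rows, conditions, decision):
    """Greedy forward selection until gamma equals that of all condition attributes."""
    target = dependency(rows, conditions, decision)
    reduct = []
    current = dependency(rows, reduct, decision)
    while current < target:
        best, best_gamma = None, current
        for a in conditions:
            if a in reduct:
                continue
            g = dependency(rows, reduct + [a], decision)
            if g > best_gamma:  # keep the first attribute with the largest gain
                best, best_gamma = a, g
        if best is None:  # no single attribute improves gamma; stop early
            break
        reduct.append(best)
        current = best_gamma
    return reduct

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
rows = [
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
]
print(quick_reduct(rows, ["a", "b", "c"], "d"))  # ['a', 'b']
```

Being greedy, this search may return a subset that is not minimal, or stop short when only a combination of attributes raises the dependency degree; exhaustive or discernibility-based reduct computation avoids this at a higher cost.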

Keywords: Attribute reduction, fitness degree, information loss, rough reducts, heuristic rule

Journal: Intelligent Data Analysis, vol. 12, no. 1, pp. 73-87, 2008

Received 17 January 2007

Accepted 22 August 2007

Published: 18 February 2008
