Article type: Research Article
Authors: Bruha, Ivan[a] | Kralik, Pavel[b] | Berka, Petr[c]
Affiliations: [a] McMaster University, Department Computing & Software, Hamilton, Ont., Canada L8S 4L7. E-mail: [email protected]; URL: http://www.cas.mcmaster.ca/~bruha. | [b] Technical University of Brno, Department Automation and Information Technology, Technicka 2, Brno, CZ-61669, Czech Republic. E-mail: [email protected] | [c] Prague University of Economics, Laboratory of Intelligent Systems, Prague, CZ-13067, Czech Republic. E-mail: [email protected]
Abstract: Machine learning (ML) is a useful and productive component of data mining (DM). Given a large database, a learning algorithm induces a description of the concepts (classes) embedded in a given problem area. The induction itself usually consists in searching a huge space of possible concept descriptions, and several paradigms exist for controlling this search. One promising and efficient paradigm is genetic algorithms (GAs), and many research projects have incorporated genetic algorithms into the field of machine learning. This paper describes an efficient application of a GA in an attribute-based rule-inducing learning algorithm. Specifically, a domain-independent GA has been integrated into the covering learning algorithm CN4, a substantial extension of the well-known algorithm CN2: the induction procedure of CN4 (its beam-search methodology) has been removed and the GA implanted into this shell. Genetic algorithms can process symbolic attributes in a simple, natural manner; the processing of numerical (continuous) attributes is not so straightforward. One feasible strategy is to discretize numerical attributes before the genetic algorithm is called, and quite a few discretization preprocessors already exist in data mining and machine learning. This paper describes a new preprocessor for the discretization (categorization) of numerical attributes. Conventional discretization procedures generate sharp bounds (thresholds) between intervals; this may capture training objects from various classes (concepts) in a single interval that is not `pure', particularly near the interval borders. One feasible way to eliminate such impurity around the interval borders is to fuzzify them. The paper first introduces the methodology of our new learning algorithm, the genetic learner. Then the discretization/fuzzification preprocessor is presented.
Finally, the paper compares the entire system (preprocessor plus genetic learner) with well-known covering as well as TDIDT learning algorithms.
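The idea of fuzzified interval borders described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the paper's actual fuzzification procedure is not given here, and the function names, the trapezoidal membership shape, and the `overlap` parameter are assumptions for illustration.

```python
# Hypothetical sketch of fuzzified discretization borders; the paper's
# actual procedure and parameter choices are not specified in the abstract.

def ramp_up(x, cut, w):
    """Linear ramp: 0 below cut - w, 1 above cut + w."""
    if x <= cut - w:
        return 0.0
    if x >= cut + w:
        return 1.0
    return (x - (cut - w)) / (2.0 * w)

def fuzzy_memberships(x, cut_points, overlap):
    """Membership degree of value x in each interval defined by cut_points.

    Instead of a sharp threshold at each cut point, a band of half-width
    `overlap` around the cut is fuzzified: an object near a border belongs
    partially to both adjacent intervals (the memberships sum to 1).
    """
    bounds = [float('-inf')] + sorted(cut_points) + [float('inf')]
    degrees = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        left = 1.0 if lo == float('-inf') else ramp_up(x, lo, overlap)
        right = 1.0 if hi == float('inf') else 1.0 - ramp_up(x, hi, overlap)
        degrees.append(min(left, right))
    return degrees
```

For example, with a single cut point at 5.0 and an overlap half-width of 1.0, a value of 5.5 receives memberships [0.25, 0.75] in the two intervals rather than falling crisply into the second one, while a value far from any border (e.g. 3.0) keeps the crisp assignment [1.0, 0.0].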
Keywords: inductive learning, data mining, knowledge discovery in databases, genetic algorithms, numerical attributes, discretization, fuzzification
DOI: 10.3233/IDA-2000-4506
Journal: Intelligent Data Analysis, vol. 4, no. 5, pp. 445-460, 2000
IOS Press, Inc.