Article type: Research Article
Authors: Boulle, Marc
Affiliations: France Telecom R&D, 2, Avenue Pierre Marzin, 22300 Lannion, France.
Abstract: While real data often comes in mixed format, discrete and continuous, many supervised induction algorithms require discrete data. Although efficient supervised discretization methods are available, the unsupervised Equal Frequency discretization method is still widely used by statisticians both for data exploration and data preparation. In this paper, we propose an automatic method, based on a Bayesian approach, to optimize the number of bins for Equal Frequency discretizations in the context of supervised learning. We introduce a space of Equal Frequency discretization models and a prior distribution defined on this model space. This results in the definition of a Bayes optimal evaluation criterion for Equal Frequency discretizations. We then propose an optimal search algorithm whose run-time is super-linear in the sample size. Extensive comparative experiments demonstrate that the method works quite well in many cases.
Keywords: data mining, machine learning, discretization, bayesianism, data analysis
DOI: 10.3233/IDA-2005-9204
Journal: Intelligent Data Analysis, vol. 9, no. 2, pp. 175-188, 2005
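
For readers unfamiliar with the Equal Frequency discretization named in the abstract, the following is a minimal sketch of the unsupervised baseline only: it splits a numeric attribute into bins holding roughly the same number of samples. The abstract does not state the Bayesian evaluation criterion or the search algorithm, so the number of bins is taken here as a plain input rather than optimized. The function names `equal_frequency_cut_points` and `assign_bins` are hypothetical and do not come from the paper.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return cut points that split `values` into bins of roughly equal count.

    Illustrative helper (not from the paper). Fewer than n_bins - 1 cut
    points may be returned when the data contains many tied values.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        idx = (k * n) // n_bins  # index of the first value in bin k
        if idx == 0 or idx >= n:
            continue  # not enough data to place this boundary
        # Place the boundary halfway between the two neighbouring values.
        cut = (ordered[idx - 1] + ordered[idx]) / 2
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)  # skip duplicate boundaries caused by ties
    return cuts


def assign_bins(values, cuts):
    """Map each value to a bin index; a value equal to a cut goes to the upper bin."""
    return [bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    data = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10, 11, 12]
    cuts = equal_frequency_cut_points(data, n_bins=3)
    print(cuts)                     # [4.5, 8.5]
    print(assign_bins(data, cuts))  # bin index for each original value
```

In this sketch the choice of `n_bins` is left to the caller; the paper's contribution, per the abstract, is precisely an automatic way to make that choice in a supervised setting.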