IIvotes ensemble for imbalanced data

Błaszczyński, Jerzy; Deckert, Magdalena; Stefanowski, Jerzy; Wilk, Szymon

doi:10.3233/IDA-2012-0551

Issue title: Combined Learning Methods and Mining Complex Data

Article type: Research Article

Authors: Błaszczyński, Jerzy* | Deckert, Magdalena | Stefanowski, Jerzy | Wilk, Szymon

Affiliations: Institute of Computing Science, Poznań University of Technology, Poznań, Poland

Correspondence: [*] Corresponding author: Jerzy Błaszczyński, Institute of Computing Science, Poznań University of Technology, 60-965 Poznań, Poland. E-mail: [email protected].

Abstract: In this paper we present IIvotes, a new framework for constructing an ensemble of classifiers from imbalanced data. IIvotes incorporates the SPIDER method for selective data pre-processing into the adaptive Ivotes ensemble. This integration aims to improve the balance between sensitivity and specificity for the minority class (evaluated by the G-mean measure) compared with single classifiers that are also combined with SPIDER. Using SPIDER to pre-process specific learning samples inside the ensemble improves the sensitivity of the derived component classifiers. At the same time, the controlling mechanism of IIvotes ensures that overall accuracy (and thus specificity) is kept at a reasonable level. The proposed IIvotes ensemble was thoroughly evaluated in a series of experiments in which we tested it with symbolic (decision trees and rules) and non-symbolic (Naive Bayes) component classifiers. The results confirmed that combining SPIDER with an ensemble improved performance (in terms of the G-mean measure) compared with a single classifier with SPIDER for all tested types of classifiers and for both SPIDER pre-processing options (weak and strong amplification). These advantages were especially evident for decision trees and rules, where the differences between single and ensemble classifiers with SPIDER were more significant for both pre-processing options than for Naive Bayes. Moreover, the results demonstrated the advantages of using a special abstaining classification strategy inside IIvotes rule ensembles, in which component rule-based classifiers may refrain from predicting a class when in doubt. Abstaining rule ensembles performed much better with regard to G-mean than their non-abstaining variants.
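The G-mean measure mentioned in the abstract is commonly defined as the geometric mean of sensitivity (true positive rate on the minority class) and specificity (true negative rate on the majority class). The following is a minimal sketch under that standard definition; the function name `g_mean` and the `minority_label` parameter are illustrative assumptions and are not taken from the paper.

```python
import math

def g_mean(y_true, y_pred, minority_label=1):
    """Geometric mean of sensitivity and specificity (hypothetical helper).

    Assumes a two-class problem in which `minority_label` marks the
    minority (positive) class and every other label is the majority class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == minority_label:
            if pred == minority_label:
                tp += 1  # minority example recognised correctly
            else:
                fn += 1  # minority example missed
        else:
            if pred == minority_label:
                fp += 1  # majority example wrongly flagged as minority
            else:
                tn += 1  # majority example recognised correctly
    # Guard against an empty class to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# A classifier that always predicts the majority class has high accuracy
# on imbalanced data, but zero sensitivity and therefore zero G-mean.
y_true = [1, 1, 0, 0, 0, 0]
always_majority = [0, 0, 0, 0, 0, 0]
balanced_guess = [1, 0, 0, 0, 0, 1]
print(g_mean(y_true, always_majority))
print(g_mean(y_true, balanced_guess))
```

Because the two rates are multiplied, a gain in sensitivity only raises G-mean if specificity does not collapse, which is the trade-off the abstract describes.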

Keywords: Imbalanced data, ensemble classifiers, Ivotes adaptive ensemble, SPIDER method, informed re-sampling

Journal: Intelligent Data Analysis, vol. 16, no. 5, pp. 777-801, 2012

Published: 8 October 2012
