Improved noise-filtering algorithm for AdaBoost using the inter- and intra-class variability of imbalanced datasets

Lee, Dohyun; Kim, Kyoungok

doi:10.3233/JIFS-213244

Article type: Research Article

Authors: Lee, Dohyun^a | Kim, Kyoungok^{b; *}

Affiliations: [a] Department of Data Science, Seoul National University of Science & Technology (SeoulTech), Seoul, Republic of Korea | [b] Department of Industrial Engineering, Seoul National University of Science & Technology (SeoulTech), Seoul, Republic of Korea

Correspondence: [*] Corresponding author. Kyoungok Kim, Department of Industrial Engineering, Seoul National University of Science & Technology (SeoulTech), Seoul 01811, Republic of Korea.

Abstract: Boosting methods improve predictive performance by connecting multiple learners sequentially. In particular, adaptive boosting (AdaBoost) has been widely used because it improves predictions for hard-to-learn samples by adjusting misclassification costs. Each weak learner minimizes the expected risk by assigning high misclassification costs to suspect samples. The performance of AdaBoost depends on the distribution of noisy samples because the algorithm tends to overfit them. Various studies have addressed this noise sensitivity. Noise-filtering methods for AdaBoost remove samples identified as noise based on their degree of misclassification, which prevents overfitting to noisy samples. However, if classification difficulty differs considerably between classes, samples from hard-to-classify classes are easily and wrongly identified as noise. This situation is common with imbalanced datasets and can degrade performance. To solve this problem, this study proposes a new noise detection algorithm for AdaBoost that considers differences in the classification difficulty of classes and the characteristics of iteratively recalculated sample weight distributions. Experimental results on ten imbalanced datasets with various imbalance ratios demonstrate that the proposed method identifies noisy samples properly and improves the overall performance of AdaBoost.
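The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It runs discrete AdaBoost with decision stumps, tracks the iteratively recalculated sample weights, and contrasts a single global weight threshold with a threshold computed separately per class. The function names, the quantile-based threshold, and the parameter `q` are all assumptions made for illustration.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (feature, threshold, polarity, weighted_error) of the best stump."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                # Predict -s at or below the threshold, +s above it.
                pred = np.where(X[:, j] <= t, -s, s)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost_weights(X, y, n_rounds=10):
    """Run discrete AdaBoost (labels in {-1, +1}) and return final sample weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    for _ in range(n_rounds):
        j, t, s, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] <= t, -s, s)
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
    return w

def flag_noise_global(w, q=0.9):
    """Assumed baseline: flag samples whose weight exceeds one global quantile."""
    return w > np.quantile(w, q)

def flag_noise_per_class(w, y, q=0.9):
    """Assumed variant: compare each weight only with weights of its own class."""
    flags = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = y == c
        flags[idx] = w[idx] > np.quantile(w[idx], q)
    return flags

# Toy data: one feature, class boundary at x = 5, sample 2 carries a flipped label.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where(X[:, 0] >= 5, 1, -1)
y[2] = 1
w = adaboost_weights(X, y, n_rounds=1)
print(flag_noise_global(w), flag_noise_per_class(w, y))
```

With a global threshold, a class that is harder to classify accumulates larger weights overall and can dominate the flagged set. Thresholding within each class is one simple way to account for that difference, which is the concern the abstract raises for imbalanced data.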

Keywords: AdaBoost, noise-robust learning, noise-filtering, class imbalance, class separation

Journal: Journal of Intelligent & Fuzzy Systems, vol. 43, no. 4, pp. 5035-5051, 2022

Published: 10 August 2022
