Article type: Research Article
Authors: Van Hulse, Jason | Khoshgoftaar, Taghi M. [*] | Napolitano, Amri
Affiliations: Florida Atlantic University, Boca Raton, Florida, USA
Correspondence: [*] Corresponding author: Taghi M. Khoshgoftaar, Data Mining and Machine Learning Laboratory, Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA. Tel.: +1 561 297 3994; Fax: +1 561 297 2800; E-mail: [email protected].
Abstract: Much of the research literature in data mining and machine learning has focused on developing classification models for various application-specific learning tasks. In contrast, the characteristics of the underlying data, and their impacts on learning, have received much less attention. While it is generally understood that imbalanced, noisy and relatively small datasets make classification tasks more difficult, there has been, to our knowledge, no comprehensive examination of the impacts of these important and commonly encountered dataset characteristics on the learning process. In this work, we present a comprehensive empirical analysis of learning from imbalanced, limited and noisy data. We present the performance of 11 commonly used learning algorithms and the effects of dataset size, class distribution, noise level and noise distribution on each learner. Over one million classification models were built for this study, and we identify which learners are most robust to changes in each of these experimental factors using two different performance metrics. Our results show that each of these factors plays a critical role in learner performance, with some learners exhibiting much greater stability than others.
Keywords: Class imbalance, class noise, classification, binary classification
DOI: 10.3233/IDA-2010-0464
Journal: Intelligent Data Analysis, vol. 15, no. 2, pp. 215-236, 2011