Oversampling method based on GAN for tabular binary classification problems

Yang, Jie; Jiang, Zhenhao; Pan, Tingting; Chen, Yueqi; Pedrycz, Witold

doi:10.3233/IDA-220383

Article type: Research Article

Authors: Yang, Jie^a | Jiang, Zhenhao^{b; *} | Pan, Tingting^a | Chen, Yueqi^a | Pedrycz, Witold^{c; 1}

Affiliations: [a] School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China | [b] School of Data Science, Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong, China | [c] Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada

Correspondence: [*] Corresponding author: Zhenhao Jiang, School of Data Science, Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong 518000, China. E-mail: [email protected].

Note: [1] Second corresponding author.

Abstract: Imbalanced data problems arise in many applications. A large gap between the numbers of samples in different classes biases classifiers toward the majority class, which degrades learning performance and the quality of the results. Most data-level imbalanced learning approaches generate new samples using only information from the minority samples, either through linear generation or through data distribution fitting. Unlike these algorithms, we propose a novel oversampling method based on generative adversarial networks (GANs), named OS-GAN. In this method, the GAN learns the distribution characteristics of the minority class from selected majority samples rather than from random noise. As a result, the samples produced by the trained generator carry information from both the majority and minority classes. Furthermore, the central regularization keeps the distribution of the synthetic samples from being restricted to the domain of the minority class, which can improve the generalization of learning models and algorithms. Experimental results on 14 datasets and one high-dimensional dataset show that OS-GAN outperforms 14 commonly used resampling techniques in terms of G-mean, accuracy and F1-score.
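The abstract motivates oversampling by the bias of classifiers toward the majority class and reports results in G-mean, accuracy and F1-score. The Python sketch below uses the standard textbook definitions (minority class as the positive label, G-mean as the geometric mean of sensitivity and specificity) to show why accuracy alone can hide that bias. The helper names `confusion_counts` and `imbalance_metrics` and the 95/5 toy labels are hypothetical, and that the paper computes its metrics exactly this way is an assumption.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP; `positive` is assumed to be the minority label."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, G-mean and F1 using common (assumed) definitions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "g_mean": g_mean, "f1": f1}


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5

# A classifier fully skewed to the majority class.
always_majority = [0] * 100

# A classifier that finds 4 of 5 minority samples at the cost of 5 false positives.
detects_minority = [1] * 5 + [0] * 90 + [0] + [1] * 4

print(imbalance_metrics(y_true, always_majority))
# {'accuracy': 0.95, 'g_mean': 0.0, 'f1': 0.0}
print(imbalance_metrics(y_true, detects_minority))
```

The always-majority predictor reaches 0.95 accuracy but scores 0 on both G-mean and F1, while the second predictor gives up a little accuracy (0.94) for a G-mean of about 0.87. This is the kind of gap that resampling methods such as those compared in the paper aim to close.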

Keywords: Oversampling, GAN, imbalanced learning

Journal: Intelligent Data Analysis, vol. 27, no. 5, pp. 1287-1308, 2023

Published: 6 October 2023
