A method for balancing a multi-labeled biomedical dataset

Mukhin, A.V.; Kilbas, I.A.; Paringer, R.A.; Ilyasova, N. Yu.; Kupriyanov, A.V.

doi:10.3233/ICA-220676


Article type: Research Article

Authors: Mukhin, A.V.^a | Kilbas, I.A.^a | Paringer, R.A.^{a; b} | Ilyasova, N. Yu.^{a; b} | Kupriyanov, A.V.^{a; b; *}

Affiliations: [a] Samara National Research University, Moskovskoye Shosse, 34, Samara, Russia | [b] IPSI RAS – Branch of the FSRC “Crystallography and Photonics” RAS, Samara, Russia

Correspondence: [*] Corresponding author: A.V. Kupriyanov, Samara National Research University, Moskovskoye Shosse, 34, Samara 443086, Russia.

Abstract: In this paper, we propose a data balancing method for multi-label biomedical data. The method can be applied to semantic segmentation problems to balance the corresponding image data. It oversamples instances of minority classes so as to increase their frequencies of appearance in the data (the frequency of appearance of a class being the ratio of the number of samples containing that class to the total number of samples in the dataset), thereby reducing class imbalance. The effectiveness of the method is shown experimentally on two highly imbalanced biomedical image datasets. A convolutional neural network (CNN) was trained on three versions of those datasets: one balanced with the proposed method, one balanced by manual oversampling, and the original unbalanced version. The experimental results confirm the effectiveness of the proposed method, showing that it reduces the influence of class imbalance on the learning algorithm and improves on the classification results obtained without balancing for most classes. The method makes no assumptions about the underlying structure of the data to be balanced; therefore, it can be applied to any type of data (vectors, images, etc.) that can be described in a multi-label framework, and it can be used in conjunction with any learning algorithm suitable for multi-label data. To illustrate this wider applicability beyond biomedical image data, a series of experiments was conducted on seven common multi-label datasets. An experimental comparison with existing multi-label data balancing approaches is also provided. The results show that the proposed method is a competitive alternative to existing approaches.
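The abstract defines a class's frequency of appearance as the number of samples containing that class divided by the total number of samples. As a minimal sketch of that definition, the code computes it for label sets and pairs it with a deliberately naive duplication loop to show why multi-label oversampling is not trivial: copying a sample that contains a minority class also copies every other label in it. The function names, the greedy "fewest labels first" selection rule, and the `target` and `max_copies` parameters are hypothetical assumptions; this is not the balancing procedure proposed in the paper.

```python
from typing import List, Sequence, Set


def frequency_of_appearance(labels: Sequence[Set[int]], num_classes: int) -> List[float]:
    """Ratio of samples whose label set contains class c to the total number of samples."""
    counts = [0] * num_classes
    for label_set in labels:
        for c in label_set:
            counts[c] += 1
    return [cnt / len(labels) for cnt in counts]


def naive_oversample(labels: Sequence[Set[int]], num_classes: int,
                     target: float, max_copies: int = 1000) -> List[int]:
    """HYPOTHETICAL illustration, not the authors' algorithm.

    Repeatedly duplicates one sample containing the currently rarest class
    until every class that occurs in `labels` reaches `target` frequency,
    or until `max_copies` duplicates have been added.
    Returns indices into `labels` (all originals followed by duplicates).
    """
    indices = list(range(len(labels)))
    # Only classes that actually occur can be oversampled.
    present = [c for c in range(num_classes) if any(c in s for s in labels)]
    if not present:
        return indices
    for _ in range(max_copies):
        freqs = frequency_of_appearance([labels[i] for i in indices], num_classes)
        rarest = min(present, key=lambda c: freqs[c])
        if freqs[rarest] >= target:
            break
        # Among samples containing the rarest class, duplicate the one with the
        # fewest labels, so co-occurring (possibly majority) classes grow less.
        candidates = [i for i in range(len(labels)) if rarest in labels[i]]
        indices.append(min(candidates, key=lambda i: len(labels[i])))
    return indices
```

For example, with `labels = [{0}, {0}, {0}, {0, 1}]`, class 1 has a frequency of appearance of 0.25; calling `naive_oversample(labels, 2, target=0.4)` duplicates sample 3 once, returning `[0, 1, 2, 3, 3]`, which raises class 1 to 2/5 = 0.4 while class 0 stays at 1.0 because it co-occurs in the copied sample.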

Keywords: Multi-label data, multi-label balancing, imbalanced data, neural networks, convolutional network, fundus, biomedical data, semantic segmentation


Journal: Integrated Computer-Aided Engineering, vol. 29, no. 2, pp. 209-225, 2022

Published: 14 March 2022
