Construct a biased SVM classifier based on Chebyshev distance for PU learning

Ke, Ting; Li, Min; Zhang, Lidong; Lv, Hui; Ge, Xuechun

doi:10.3233/JIFS-192064

Article type: Research Article

Authors: Ke, Ting^{a; *} | Li, Min^a | Zhang, Lidong^a | Lv, Hui^a | Ge, Xuechun^b

Affiliations: [a] Department of Mathematics, College of Science, Tianjin University of Science & Technology, Tianjin, China | [b] China Academy of Railway Sciences Signal and Communication Research Institute (Beijing Huatie Information Technology Corporation), Beijing, China

Correspondence: [*] Corresponding author. Ting Ke, Department of Mathematics, College of Science, Tianjin University of Science & Technology, Tianjin, China. E-mail: [email protected].

Abstract: In some real applications, only a limited number of labeled positive examples and many unlabeled examples are available, with no negative examples at all. Such learning is termed positive and unlabeled (PU) learning. PU learning algorithms have been studied extensively in recent years. However, the classical ones based on Support Vector Machines (SVMs) assume that the labeled positive data are independent and identically distributed (i.i.d.) and that the sample size is large enough. This leads to two obvious shortcomings. On the one hand, performance is unsatisfactory, especially when the number of labeled positive examples is small. On the other hand, classification results are poor when the datasets are non-i.i.d. For this reason, this paper proposes a novel SVM classifier that uses the Chebyshev distance to measure the empirical risk and designs an efficient iterative algorithm, named L∞-BSVM for short. L∞-BSVM has the following merits: (1) it allows all sample points to participate in learning to improve classification performance, especially when the amount of labeled data is small; (2) it minimizes the distance of the sample points farthest from the hyperplane (the outliers in non-i.i.d. data), so that outliers are fully taken into consideration; (3) its iterative algorithm can solve large-scale optimization problems with low time complexity and ensures convergence to the optimal solution. Finally, extensive experiments on three types of datasets (artificial non-i.i.d. datasets, fault diagnosis of railway turnouts with few labeled data (abnormal turnouts), and six benchmark real-world datasets) confirm these claims and demonstrate that the classifier performs much better than state-of-the-art competitors such as B-SVM, LUHC, Pulce, B-LSSVM and NB.
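To illustrate the idea of measuring empirical risk with the Chebyshev (L∞) distance, the following minimal sketch contrasts the usual sum of hinge losses with their maximum for a fixed linear classifier. It is not the paper's L∞-BSVM formulation or its iterative algorithm, which the abstract does not spell out; the function names, the use of the hinge loss, and the toy data are all assumptions made for illustration only.

```python
# Illustrative sketch only: names, loss choice, and data are hypothetical,
# not taken from the paper.

def hinge_losses(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y_i * (w . x_i + b))."""
    losses = []
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        losses.append(max(0.0, 1.0 - y_i * score))
    return losses


def l1_empirical_risk(losses):
    """Conventional empirical risk: the sum (L1 norm) of the losses."""
    return sum(losses)


def chebyshev_empirical_risk(losses):
    """Chebyshev (L-infinity) empirical risk: the single largest loss."""
    return max(losses)


# Toy data: the last point lies far on the wrong side of the hyperplane.
X = [[2.0, 0.0], [0.5, 1.0], [-3.0, 0.0], [4.0, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

losses = hinge_losses(w, b, X, y)
print(losses)
print(l1_empirical_risk(losses), chebyshev_empirical_risk(losses))
```

Under the L∞ measure the risk is determined entirely by the worst-placed point, which is why a classifier trained against it focuses on the samples farthest from the hyperplane.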

Keywords: Optimization, SVMs, Chebyshev distance, structural risk, empirical risk

Journal: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 3, pp. 3749-3767, 2020

Published: 07 October 2020
