A selective LVQ algorithm for improving instance reduction techniques and its application for text classification

Hayel, Rafa; El Hindi, Khalil; Hosny, Manar; Alharbi, Rawan

doi:10.3233/JIFS-235290

Article type: Research Article

Authors: Hayel, Rafa | El Hindi, Khalil [*] | Hosny, Manar | Alharbi, Rawan

Affiliations: Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

Correspondence: [*] Corresponding author. Khalil El Hindi, College of Computer and Information Sciences, P.O. Box 51178, Riyadh 11543, Saudi Arabia. E-mail: [email protected].

Abstract: Instance-based learning methods, such as k-Nearest Neighbor (kNN), offer a straightforward and effective solution for text classification. However, because kNN is a lazy learner, its performance relies heavily on the quality and quantity of the training instances, which often leads to time and space inefficiencies. This challenge has spurred the development of instance-reduction techniques that retain essential instances and discard redundant ones. While such trimming reduces computational demands, it may adversely affect classification accuracy. This study introduces the Selective Learning Vector Quantization (SLVQ) algorithm, designed to improve the classification accuracy obtained with datasets reduced by such techniques. Unlike traditional LVQ algorithms, which start from random weight vectors (codebook vectors), SLVQ uses the instances selected by the reduction algorithm as the initial codebook vectors. Because these instances often contain nominal values, SLVQ adjusts the distances between nominal values rather than the attribute values of the codebook vectors, aiming to make the codebook vectors better represent the training set. This matters because nominal attributes are common in real-world datasets and require a suitable distance measure, such as the Value Difference Metric (VDM); SLVQ therefore adjusts the VDM distances between nominal values. The novelty of SLVQ thus lies in combining instance reduction, which selects the initial codebook vectors, with effective handling of nominal attributes. Our experiments on 17 text classification datasets with four different instance reduction algorithms confirm SLVQ’s effectiveness: it significantly improves kNN’s classification accuracy on the reduced datasets. With SLVQ, the average classification accuracies were 82.55%, 84.07%, 78.54%, and 83.18%, compared with 76.25%, 79.62%, 66.54%, and 78.19%, respectively, for the reduced datasets without fine-tuning.
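
The idea of adjusting distances between nominal values, rather than the values themselves, can be illustrated with a minimal Python sketch. The first function computes the standard VDM for one nominal attribute, vdm(a, b) = sum over classes c of |P(c | a) - P(c | b)|^q. The second function, `adjust_distance`, is a hypothetical LVQ-style step written only for illustration: its name, the multiplicative form, and the `rate` parameter are assumptions, not the update rule defined in the paper.

```python
from collections import Counter, defaultdict


def vdm_table(values, labels, q=1):
    """Pairwise VDM distances for one nominal attribute.

    vdm(a, b) = sum over classes c of |P(c | a) - P(c | b)| ** q
    """
    classes = sorted(set(labels))
    value_counts = Counter(values)
    joint = defaultdict(Counter)  # joint[value][class] = co-occurrence count
    for v, c in zip(values, labels):
        joint[v][c] += 1

    table = {}
    for a in value_counts:
        for b in value_counts:
            table[(a, b)] = sum(
                abs(joint[a][c] / value_counts[a]
                    - joint[b][c] / value_counts[b]) ** q
                for c in classes
            )
    return table


def adjust_distance(table, a, b, same_class, rate=0.1):
    """Hypothetical LVQ-style step (an assumption, not the paper's rule).

    Shrinks d(a, b) when the nearest codebook vector has the same class as
    the training instance, and grows it otherwise. A distance of exactly 0
    is left unchanged by this multiplicative form.
    """
    if a == b:
        return
    factor = (1 - rate) if same_class else (1 + rate)
    table[(a, b)] = table[(b, a)] = table[(a, b)] * factor


# Toy data: one nominal attribute and a class label per instance.
values = ["red", "red", "blue", "blue", "green"]
labels = ["pos", "pos", "neg", "neg", "pos"]
table = vdm_table(values, labels)
print(table[("red", "blue")], table[("red", "green")])

# A misclassification involving "red" vs "blue" pushes the two values apart.
adjust_distance(table, "red", "blue", same_class=False)
print(table[("red", "blue")])
```

In this sketch the codebook vectors keep their nominal values; only the shared distance table changes, which mirrors the design choice the abstract describes.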

Keywords: Machine learning, instance-based learning, learning vector quantization, k-nearest neighbor, value difference metric (VDM)

Journal: Journal of Intelligent & Fuzzy Systems, vol. 46, no. 5-6, pp. 11353-11366, 2024

Published: 24 October 2024
