A new text-based w-distance metric to find the perfect match between words

Ali, Munwar; Jung, Low Tang; Hosam, Osama; Wagan, Asif Ali; Shah, Rehan Ali; Khayyat, Mashael

doi:10.3233/JIFS-179552


Issue title: Mathematical Modelling in Computational and Life Sciences

Guest editors: Ahmed Farouk

Article type: Research Article

Authors: Ali, Munwar^{a; *} | Jung, Low Tang^b | Hosam, Osama^{c; d} | Wagan, Asif Ali^e | Shah, Rehan Ali^f | Khayyat, Mashael^g

Affiliations: [a] Department of IT, Shaheed Benazir Bhutto University, Shaheed Benazirabad, Sindh, Pakistan | [b] Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Malaysia | [c] The College of Computer Science and Engineering in Yanbu, Taibah University, Medina, Saudi Arabia | [d] Informatics Research Institute, The City for Scientific Research and Technology Applications, Alexandria, Egypt | [e] Department of Computer Science, SMIU, Karachi, Pakistan | [f] Department of Computer Systems Engineering, Faculty of Engineering, The Islamia University Bahawalpur, Pakistan | [g] Department of Information Systems and Technology, Faculty of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

Correspondence: [*] Corresponding author. Munwar Ali, Department of IT, Shaheed Benazir Bhutto University, Shaheed Benazirabad, Sindh, Pakistan. E-mail: [email protected]

Abstract: The k-NN algorithm is an instance-based learning algorithm that is widely used in data mining applications. The core engine of the k-NN algorithm is its distance/similarity function, and the performance of the algorithm varies with the choice of that function. The traditional distance/similarity functions used in k-NN do not handle mix-mode words well, that is, cases where one string contains multiple substrings or words. Examples are the two-word string “Employee Name”, the one-word string “Name”, and a string of more than two words such as “Name of Employee”. This ambiguity makes it difficult for existing distance/similarity functions to find the perfect match between words. To improve the perfect-match calculation in the traditional k-NN algorithm, a new similarity distance metric, named word-distance (w-distance), is developed. A perfect match helps to identify the exact required value. The proposed w-distance is a hybrid of distance and similarity because it handles the dissimilarity and similarity features of strings at the same time. The simulation results showed that w-distance yields better k-NN performance than the Euclidean distance and the cosine similarity.
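The ambiguity described in the abstract can be illustrated with the two baseline measures it names. The sketch below is not the w-distance, whose definition is not given in the abstract; it only shows how Euclidean distance and cosine similarity score the example strings. The function names and the lowercase, whitespace-split bag-of-words representation are assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Word counts after lowercasing and splitting on whitespace (an assumed tokenisation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    """Euclidean distance between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    return math.sqrt(sum((va[w] - vb[w]) ** 2 for w in va.keys() | vb.keys()))


query = "Name"
for candidate in ["Name", "Employee Name", "Name of Employee"]:
    print(
        candidate,
        round(cosine_similarity(query, candidate), 3),
        round(euclidean_distance(query, candidate), 3),
    )
```

Under these assumptions, only the identical string scores as an exact match (similarity 1, distance 0). “Employee Name” and “Name of Employee” both contain the query word but receive different partial scores, which is the kind of mix-mode case the abstract says these measures do not handle well.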

Keywords: k-NN algorithm, distance/similarity metric, text match, data mining, cosine similarity

DOI: 10.3233/JIFS-179552

Journal: Journal of Intelligent & Fuzzy Systems, vol. 38, no. 3, pp. 2661-2672, 2020

Published: 04 March 2020


