Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: Senthamizh Selvi, S.a; * | Anitha, R.b
Affiliations: [a] Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Tamil Nadu, India | [b] Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Tamil Nadu, India
Correspondence: [*] Corresponding author. S. Senthamizh Selvi, Research Scholar, Anna University, Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Tamil Nadu, India. E-mail: [email protected].
Abstract: In India, most of the Science and Technology resources available are in English. Developing an Automatic Language Translation Engine from English (source language) to Tamil (target language) is very essential for the people who need to get technical resources in their native language. The challenges in designing such engines using Natural Language Processing (NLP) tools include Lexical, Structural, and Syntax level ambiguity. To solve these challenges, the development of a Part-Of-Speech (POS) tagger is essential. The Verb-Framed languages like Tamil, Japanese, and many languages in Romance, Semitic, and Mayan languages families have high morphological richness but lack either a large volume of annotated corpora or manually constructed linguistic resources for building POS tagger. Moreover, the Tamil Language has a low resource, high word sense ambiguity, and word-free order form giving rise to challenges in designing Tamil POS taggers. In this paper, we postulate a Hybrid POS tagger algorithm for Tamil Language using Cross-Lingual Transformation Learning Techniques. It is a novel Mining-based algorithm (MT), which finds equivalent words of Tamil in English on less volume of English-Tamil bilingual unannotated parallel corpus. To enhance the performance of MT, we developed Tamil language-specific auxiliary algorithms such as Keyword-based tagging algorithm (KT) and Verb pattern-based tagging algorithm (VT). We also developed a Unique pair occurrence-tagging algorithm (UT) to find the one-time occurrence of Tamil-English pair words. Our experiments show that by improving Context-based Bilingual Corpus to Bilingual parallel corpus and after leaving one-time occurrence words, the proposed Hybrid POS tagger can predict 81.15% words, with 73.51% accuracy and 90.50% precision. Evaluations prove our algorithms can generate language resources, which can improve the performance of NLP tasks in Tamil.
Keywords: Natural language processing, part-of-speech tagger, sandhi, bilingual parallel corpus, cross-lingual transformation learning
DOI: 10.3233/JIFS-221278
Journal: Journal of Intelligent & Fuzzy Systems, vol. 43, no. 6, pp. 8329-8348, 2022
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
如果您在出版方面需要帮助或有任何建, 件至: [email protected]