ViReader: A Wikipedia-based Vietnamese reading comprehension system using transfer learning

Van Nguyen, Kiet; Duy Nguyen, Nhat; Do, Phong Nguyen-Thuan; Gia-Tuan Nguyen, Anh; Nguyen, Ngan Luu-Thuy

doi:10.3233/JIFS-210683

Article type: Research Article

Authors: Van Nguyen, Kiet^{a; b} | Duy Nguyen, Nhat^{a; b} | Do, Phong Nguyen-Thuan^{a; b} | Gia-Tuan Nguyen, Anh^{a; b} | Nguyen, Ngan Luu-Thuy^{a; b; *}

Affiliations: [a] University of Information Technology, Ho Chi Minh City, Vietnam | [b] Vietnam National University, Ho Chi Minh City, Vietnam

Correspondence: [*] Corresponding author. Ngan Luu-Thuy Nguyen, University of Information Technology, Vietnam National University, Ho Chi Minh City. E-mail: [email protected].

Abstract: Machine reading comprehension has attracted significant interest in research on natural language understanding, and large-scale datasets and neural network-based methods have been developed for this task. However, most resources and methods for machine reading comprehension have been developed for two resource-rich languages, English and Chinese. This article proposes ViReader, a system for open-domain machine reading comprehension in Vietnamese that uses Wikipedia as its textual knowledge source: the answer to a question is a text span taken directly from Vietnamese Wikipedia. The system combines a sentence retriever, which uses information retrieval techniques to extract relevant sentences, with a transfer learning-based answer extractor trained to predict answers from Wikipedia texts. Experiments on multiple machine reading comprehension datasets in Vietnamese and other languages demonstrate that (1) ViReader is highly competitive with prevalent machine learning-based systems, and (2) multi-task learning that combines the sentence retriever and the answer extractor yields an end-to-end reading comprehension system. The sentence retriever retrieves the sentences most likely to contain the answer to the given question. The answer extractor then reads the document from which those sentences were retrieved, predicts the answer, and returns it to the user. ViReader achieves new state-of-the-art performance, with 70.83% EM (exact match) and 89.54% F1, outperforming the BERT-based system by 11.55% and 9.54%, respectively. It also obtains state-of-the-art performance on UIT-ViNewsQA (another Vietnamese dataset, consisting of online health-domain news) and BiPaR (a bilingual dataset of English and Chinese novel texts). Compared with the BERT-based system, our system improves F1 by 7.65% for English and 6.13% for Chinese on BiPaR. Furthermore, we build a ViReader application programming interface that programmers can employ in artificial intelligence applications.
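
The EM and F1 figures quoted in the abstract are span-level metrics. As a minimal sketch, assuming the definitions commonly used in extractive reading comprehension evaluation (the abstract does not spell them out), they can be computed per question as follows. The function names and the normalization step (lowercasing and whitespace tokenization) are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a predicted span that contains the reference answer plus extra tokens scores 0 on EM but receives partial credit on F1. This is consistent with F1 being the higher of the two figures reported in the abstract.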

Keywords: Machine reading comprehension, question answering, transfer learning, sentence transformer

Journal: Journal of Intelligent & Fuzzy Systems, vol. 41, no. 1, pp. 1993-2011, 2021

Published: 11 August 2021
