Non-diacritized Arabic speech recognition based on CNN-LSTM and attention-based models

Alsayadi, Hamzah A.; Abdelhamid, Abdelaziz A.; Hegazy, Islam; Fayed, Zaki T.

doi:10.3233/JIFS-202841

Article type: Research Article

Authors: Alsayadi, Hamzah A.^{a; b; *} | Abdelhamid, Abdelaziz A.^{a; c} | Hegazy, Islam^a | Fayed, Zaki T.^a

Affiliations: [a] Computer Science Department, Faculty of Computer & Information Sciences, Ain Shams University, Egypt | [b] Computer Science Department, Faculty of Sciences, Ibb University, Yemen | [c] College of Computing and Information Technology, Shaqra University, Saudi Arabia

Correspondence: [*] Corresponding author. Hamzah A. Alsayadi. E-mail: [email protected].

Abstract: The Arabic language has a set of sound letters called diacritics, which play an essential role in the meaning of words and their articulation. A change in some diacritics leads to a change in the context of the sentence. However, the presence of these letters in the corpus transcription affects the accuracy of speech recognition. In this paper, we investigate the effect of diacritics on Arabic speech recognition based on end-to-end deep learning. The applied end-to-end approach combines CNN-LSTM with an attention-based technique, as provided in the state-of-the-art Espresso framework built on PyTorch. To the best of our knowledge, CNN-LSTM with attention has not previously been used for Arabic automatic speech recognition (ASR). To fill this gap, this paper proposes a new approach based on CNN-LSTM with an attention-based method for Arabic ASR. The language model in this approach is trained using RNN-LM and LSTM-LM on the non-diacritized transcription of the speech corpus. The Standard Arabic Single Speaker Corpus (SASSC), after omitting the diacritics, is used to train and test the deep learning model. Experimental results show that removing the diacritics decreased both the out-of-vocabulary words and the perplexity of the language model. In addition, the word error rate (WER) is significantly improved compared to diacritized data. The achieved average reduction in WER is 13.52%.
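
As a rough illustration of the two operations the abstract relies on, removing diacritics from a transcription and scoring recognition output by WER, here is a minimal Python sketch. It is not the authors' preprocessing or scoring code. The set of code points treated as diacritics (U+064B to U+0652 plus U+0670) and the function names `strip_diacritics` and `word_error_rate` are assumptions made for this example; the abstract does not say which marks were removed from SASSC.

```python
import re

# Assumption: "diacritics" here means the Arabic harakat U+064B..U+0652
# (tanwin, fatha, damma, kasra, shadda, sukun) plus superscript alef U+0670.
# The exact set removed in the paper is not stated in the abstract.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return text with the assumed diacritic code points removed."""
    return DIACRITICS.sub("", text)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Two diacritized forms that share one consonant skeleton (kataba, kutiba).
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"
vocabulary_before = {kataba, kutiba}
vocabulary_after = {strip_diacritics(word) for word in vocabulary_before}
```

In this toy case the two diacritized word forms collapse into a single vocabulary entry after stripping, which is the kind of vocabulary shrinkage that would be consistent with the lower out-of-vocabulary and perplexity figures the abstract reports. The same collapse also discards the distinction between the two words, which is the trade-off the abstract notes when it says diacritics affect meaning.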

Keywords: Arabic speech recognition, Arabic diacritics, End-to-End deep learning, CNN-LSTM

Journal: Journal of Intelligent & Fuzzy Systems, vol. 41, no. 6, pp. 6207-6219, 2021

Published: 16 December 2021
