Article type: Research Article
Authors: Alsayadi, Hamzah A. [a, b, *] | Abdelhamid, Abdelaziz A. [a, c] | Hegazy, Islam [a] | Fayed, Zaki T. [a]
Affiliations: [a] Computer Science Department, Faculty of Computer & Information Sciences, Ain Shams University, Egypt | [b] Computer Science Department, Faculty of Sciences, Ibb University, Yemen | [c] College of Computing and Information Technology, Shaqra University, Saudi Arabia
Correspondence: [*] Corresponding author. Hamzah A. Alsayadi. E-mail: [email protected].
Abstract: The Arabic language has a set of marks called diacritics, which play an essential role in the meaning and pronunciation of words. Changing some diacritics can change the meaning of a sentence. However, the presence of these marks in the corpus transcription affects the accuracy of speech recognition. In this paper, we investigate the effect of diacritics on Arabic speech recognition based on end-to-end deep learning. The applied end-to-end approach combines CNN-LSTM with an attention-based technique, implemented in the state-of-the-art Espresso framework built on PyTorch. To the best of our knowledge, CNN-LSTM with attention has not previously been used for Arabic automatic speech recognition (ASR). To fill this gap, this paper proposes a new approach based on CNN-LSTM with attention for Arabic ASR. The language model in this approach is trained using RNN-LM and LSTM-LM on the non-diacritized transcription of the speech corpus. The Standard Arabic Single Speaker Corpus (SASSC), with the diacritics omitted, is used to train and test the deep learning model. Experimental results show that removing diacritics decreased the out-of-vocabulary rate and the perplexity of the language model. In addition, the word error rate (WER) improved significantly compared with the diacritized data, with an average reduction in WER of 13.52%.
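The evaluation idea in the abstract (strip diacritics from transcripts, then compare out-of-vocabulary rate and WER) can be illustrated with a small sketch. This is not the authors' code: the set of removed characters is an assumption (the Arabic harakat range U+064B to U+0652, covering tanween, fatha, damma, kasra, shadda and sukun), and the helper names `strip_diacritics`, `word_error_rate` and `oov_rate` are hypothetical. WER is computed in the standard way, as a word-level Levenshtein distance divided by the reference length.

```python
import re

# ASSUMPTION: the diacritics removed are the Arabic harakat U+064B..U+0652
# (tanween forms, fatha, damma, kasra, shadda, sukun). The paper's exact
# character set is not stated in the abstract.
_DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Return text with the assumed Arabic diacritic marks removed."""
    return _DIACRITICS.sub("", text)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


def oov_rate(vocabulary: set[str], text: str) -> float:
    """Fraction of word tokens in text that are not in vocabulary."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


if __name__ == "__main__":
    # Reference "kataba al-waladu"; hypothesis has "kutiba", which differs
    # from "kataba" only in its short-vowel diacritics.
    ref = "\u0643\u064e\u062a\u064e\u0628\u064e \u0627\u0644\u0648\u064e\u0644\u064e\u062f\u064f"
    hyp = "\u0643\u064f\u062a\u0650\u0628\u064e \u0627\u0644\u0648\u064e\u0644\u064e\u062f\u064f"
    print(word_error_rate(ref, hyp))  # 0.5: one of two words counted wrong
    print(word_error_rate(strip_diacritics(ref), strip_diacritics(hyp)))  # 0.0
```

The example shows why diacritized transcripts can inflate WER: a recognizer that gets the consonant skeleton right but one short vowel wrong is charged a full word error, and each vowelled variant of a word is also a separate vocabulary entry, which tends to raise the out-of-vocabulary rate.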
Keywords: Arabic speech recognition, Arabic diacritics, End-to-End deep learning, CNN-LSTM
DOI: 10.3233/JIFS-202841
Journal: Journal of Intelligent & Fuzzy Systems, vol. 41, no. 6, pp. 6207-6219, 2021