Article type: Research Article
Authors: Nataraj, Sathees Kumar[a,*] | Paulraj, M. P.[b] | Bin Abdullah, Ahmad Nazri[c] | Bin Yaacob, Sazali[d]
Affiliations: [a] Department of Mechatronics Engineering, AMA International University, Bahrain | [b] Principal, Sri Ramakrishna Institute of Technology, India | [c] Centre for International Languages, Universiti Malaysia Perlis, Malaysia | [d] Electrical, Electronics & Automation Section, UniKL Malaysian Spanish Institute (UniKL MSI), Malaysia
Correspondence: [*] Corresponding author: Sathees Kumar Nataraj, Department of Mechatronics Engineering, AMA International University, Bahrain.
Abstract: In this paper, a speech-to-text translation model is developed for Malaysian speakers based on 41 phoneme classes. A simple data acquisition algorithm is used to build a MATLAB graphical user interface (GUI) for recording isolated-word speech signals from 35 non-native Malaysian speakers. The collected database consists of 86 words covering 41 phoneme classes drawn from affricates, diphthongs, fricatives, liquids, nasals, semivowels and glides, stops, and vowels. The speech samples are preprocessed to eliminate undesirable artifacts, and a fuzzy voice classifier is employed to separate each sample into voiced and unvoiced sequences. The voiced sequences are divided into frames, and linear predictive coefficient (LPC) features are computed for each frame. The feature sets are then formed from the LPC features of all extracted voiced sequences and used for classification. The isolated words, chosen on the basis of their phonemes, are associated with the extracted features to establish the input-output mapping of the classification system. The data are then normalized to a defined range and randomized. Multilayer neural network (MLNN) models are developed with four combinations of input and hidden activation functions. To obtain a robust model, the networks are trained on 60%, 70%, and 80% of the feature set over 25 trials, and each trained network is validated by simulating it on the remaining 40%, 30%, and 20% of the samples, respectively. The reliability of the trained network models is compared by measuring true positives, false negatives, and network classification accuracy. The LPC features show better discrimination, and the MLNN models trained on the LPC spectral band features give better recognition.
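The framing, LPC extraction, normalization, and randomized train/test split described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a voiced segment has already been isolated (the fuzzy voice classifier is not reproduced), and the Hamming window, frame length, hop size, LPC order, [0, 1] min-max range, and all helper names (`frame_signal`, `lpc_features`, `minmax_normalize`, `shuffled_split`) are hypothetical choices. The LPC coefficients are obtained with the standard autocorrelation method and Levinson-Durbin recursion.

```python
import math
import random

# Hypothetical sketch: window, frame/hop sizes, LPC order, and scaling range
# are illustrative assumptions, not values taken from the paper.


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def autocorr(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations so that x[n] ~ sum_k a[k] * x[n-k].

    Returns (a[1..order], final prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: stop the recursion
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], max(err, 0.0)


def lpc_features(segment, frame_len=256, hop=128, order=12):
    """One LPC coefficient vector per Hamming-windowed frame of a voiced segment."""
    win = hamming(frame_len)
    feats = []
    for fr in frame_signal(segment, frame_len, hop):
        windowed = [s * w for s, w in zip(fr, win)]
        coeffs, _ = levinson_durbin(autocorr(windowed, order), order)
        feats.append(coeffs)
    return feats


def minmax_normalize(rows, lo=0.0, hi=1.0):
    """Column-wise min-max scaling of feature vectors into [lo, hi]."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [[lo + (hi - lo) * (v - mn) / (mx - mn) if mx > mn else lo
             for v, mn, mx in zip(row, mins, maxs)] for row in rows]


def shuffled_split(samples, labels, train_frac, seed=0):
    """Randomize sample order, then split into train/test sets by train_frac."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * len(idx)))
    tr, te = idx[:cut], idx[cut:]
    return ([samples[i] for i in tr], [labels[i] for i in tr],
            [samples[i] for i in te], [labels[i] for i in te])


if __name__ == "__main__":
    # Synthetic AR(1) "voiced" signal; the first LPC coefficient should be near 0.9.
    rng = random.Random(1)
    x = [0.0]
    for _ in range(2047):
        x.append(0.9 * x[-1] + rng.gauss(0.0, 1.0))
    feats = lpc_features(x, frame_len=256, hop=128, order=2)
    print(len(feats), [round(c, 2) for c in feats[0]])
```

Calling `shuffled_split` with `train_frac` set to 0.6, 0.7, or 0.8 mirrors the three training/validation ratios mentioned in the abstract; varying `seed` is one hypothetical way to run repeated trials.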
Keywords: Fuzzy voice classifier, Malaysian English pronunciation, linear predictive coefficients (LPCC), neural network models (MLNN).
DOI: 10.3233/JIFS-200780
Journal: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 5, pp. 7411-7429, 2020