Article type: Research Article
Authors: Chandrabanshi, Vishnu* | Domnic, S.
Affiliations: National Institute of Technology, Tiruchirappalli, India
Correspondence: [*] Corresponding author: Vishnu Chandrabanshi, National Institute of Technology, Tiruchirappalli, 620015, India. E-mail: [email protected].
Abstract: Visual Speech Recognition (VSR) is a popular area in computer vision research, attracting interest for its ability to precisely analyze lip motion and convert it into text. VSR systems leverage visual features to augment automated speech understanding and predict text. VSR has various applications, including enhancing speech recognition when acoustic signals are degraded, aiding individuals with hearing impairments, bolstering security by reducing reliance on text-based passwords, facilitating biometric authentication through liveness detection, and enabling underwater communication. Despite the various techniques proposed to improve the resilience and precision of automatic speech recognition, VSR still faces challenges such as homophones, gradient issues with varying sequence lengths, and the need to account for both short- and long-range correlations between consecutive video frames. We propose a hybrid network (HNet) built on a multilayered dilated three-dimensional convolutional neural network (3D-CNN). The dilated 3D-CNN performs spatio-temporal feature extraction. HNet integrates two bidirectional recurrent neural networks (BiGRU and BiLSTM) that process the feature sequences in both directions to establish temporal relationships. Fusing BiGRU and BiLSTM capabilities allows the model to process feature sequences more comprehensively and effectively. The proposed work focuses on face-based biometric authentication with liveness detection, using the VSR model to strengthen security against face spoofing. Existing face-based biometric systems are widely used for individual authentication and verification but remain vulnerable to 3D masks and adversarial attacks. The VSR system can be added to an existing face-based verification system as a second-level authentication technique that identifies a person with liveness. The VSR system operates on a challenge-response principle: a person must silently pronounce a passcode displayed on the screen. The model's effectiveness is assessed using word error rate (WER), which measures how closely the pronounced passcode matches the one presented on the screen. Overall, the proposed work aims to improve the accuracy of VSR so that it can be combined with existing face-based authentication systems. The proposed system outperforms existing VSR systems, achieving a 1.3% WER. The significance of the proposed hybrid model is that it efficiently captures temporal dependencies, enhances context embedding, improves robustness to input variability, reduces information loss, and improves accuracy in modeling and analyzing passcode pronunciation patterns.
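The abstract does not give architectural details, so the following PyTorch sketch is only an illustration of the pipeline it describes: a dilated 3D-CNN front end feeding parallel BiGRU and BiLSTM branches whose outputs are fused for per-frame prediction, plus the standard Levenshtein-based WER used for the passcode match. All layer sizes, dilation rates, the concatenation-based fusion, and the vocabulary size are assumptions of this sketch, not the paper's configuration.

import torch
import torch.nn as nn

class HNetSketch(nn.Module):
    # Illustrative only: hyperparameters are assumed, not taken from the paper.
    def __init__(self, vocab_size=28, hidden=256):
        super().__init__()
        # Dilated 3D convolutions over (channels, time, height, width);
        # the increased temporal dilation in the second layer widens the
        # receptive field to cover short- and long-range frame correlations.
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=(2, 1, 1), dilation=(2, 1, 1)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))  # pool space, keep time
        # Parallel bidirectional recurrent branches over the frame sequence.
        self.bigru = nn.GRU(64, hidden, bidirectional=True, batch_first=True)
        self.bilstm = nn.LSTM(64, hidden, bidirectional=True, batch_first=True)
        # Fused BiGRU+BiLSTM features feed a per-frame classifier.
        self.classifier = nn.Linear(4 * hidden, vocab_size)

    def forward(self, x):                              # x: (batch, 3, frames, H, W)
        f = self.frontend(x)
        f = self.pool(f).squeeze(-1).squeeze(-1)       # (batch, 64, frames)
        f = f.transpose(1, 2)                          # (batch, frames, 64)
        g, _ = self.bigru(f)                           # (batch, frames, 2*hidden)
        l, _ = self.bilstm(f)                          # (batch, frames, 2*hidden)
        fused = torch.cat([g, l], dim=-1)              # concatenation as one plausible fusion
        return self.classifier(fused)                  # per-frame logits

def wer(ref, hyp):
    # Standard word error rate: Levenshtein distance over word tokens,
    # normalized by reference length.
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,             # deletion
                          d[i][j - 1] + 1,             # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
    return d[len(r)][len(h)] / len(r)

logits = HNetSketch()(torch.randn(2, 3, 75, 64, 128))
print(logits.shape)                         # torch.Size([2, 75, 28])
print(wer("open sesame now", "open sesame cow"))  # 0.333... (1 error in 3 words)

In a challenge-response deployment, the decoded passcode would be compared against the displayed one with wer(), and a WER at or near zero would count as a successful liveness check.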
Keywords: VSR, Deep Learning, Dilated 3D-CNN, BiLSTM, BiGRU
DOI: 10.3233/HIS-240014
Journal: International Journal of Hybrid Intelligent Systems, vol. 20, no. 4, pp. 385-401, 2024