Metaheuristic adapted convolutional neural network for Telugu speaker diarization

V, Sethuram; Prasad, Ande; Rajeswara Rao, R.

doi:10.3233/IDT-211005

Article type: Research Article

Authors: V, Sethuram^{a; *} | Prasad, Ande^a | Rajeswara Rao, R.^b

Affiliations: [a] Vikrama Simhapuri University, Kakuur Nellore, Andhra Pradesh, India | [b] JNTU, Vizayanagaram, Andhra Pradesh, India

Correspondence: [*] Corresponding author: Sethuram V, Vikrama Simhapuri University, Kakuur Nellore, Andhra Pradesh, India. E-mail: [email protected].

Abstract: Speaker diarization plays a pivotal role in speech technology. It is the task of partitioning an input audio stream into homogeneous segments according to speaker identity. Diarization improves the readability of automatic transcriptions because it structures the audio stream into speaker turns and often provides the true speaker identity. This research work introduces a novel speaker diarization approach with three major phases: Feature Extraction, Speech Activity Detection (SAD), and Speaker Segmentation and Clustering. First, Mel Frequency Cepstral Coefficient (MFCC) based features are extracted from the collected input audio stream (Telugu language). Next, in the SAD phase, music and silence signals are removed. The remaining speech signals are then segmented by individual speaker. Finally, the segmented signals are passed to the speaker clustering process, which uses an optimized Convolutional Neural Network (CNN). To improve the clustering, the weights and activation function of the CNN are fine-tuned by a new Self Adaptive Sea Lion Algorithm (SA-SLnO). A comparative analysis is carried out to demonstrate the superiority of the proposed speaker diarization work. The accuracy of the proposed method is 0.8073, which is 5.255, 2.45%, and 0.075, superior to the existing works.
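The abstract does not say how the SAD phase decides which frames to discard. As a rough illustration only, the sketch below uses a simple short-time energy threshold, which is a common baseline and not necessarily the authors' method. It handles silence only and does not attempt to separate music from speech. All function names, the frame sizes, and the threshold ratio are hypothetical choices made for this example.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Return the mean squared amplitude of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_speech(samples, frame_len=400, hop=160, ratio=0.1):
    """Flag frames whose energy exceeds `ratio` times the peak frame energy.

    The relative threshold is an assumption for this sketch, not a value
    taken from the paper.
    """
    energies = frame_energies(samples, frame_len, hop)
    if not energies:
        return []
    threshold = ratio * max(energies)
    return [e > threshold for e in energies]


def active_regions(flags, hop, sample_rate):
    """Merge runs of active frames into (start_sec, end_sec) pairs.

    Times are based on frame start positions, so they are approximate.
    """
    regions, start = [], None
    for i, active in enumerate(flags):
        if active and start is None:
            start = i
        elif not active and start is not None:
            regions.append((start * hop / sample_rate, i * hop / sample_rate))
            start = None
    if start is not None:
        regions.append((start * hop / sample_rate, len(flags) * hop / sample_rate))
    return regions


# Synthetic input: 0.5 s of silence, 0.5 s of a 200 Hz tone, 0.5 s of silence.
SAMPLE_RATE = 16000
silence = [0.0] * (SAMPLE_RATE // 2)
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 2)]
signal = silence + tone + silence

flags = detect_speech(signal)
regions = active_regions(flags, hop=160, sample_rate=SAMPLE_RATE)
print(regions)
```

On the synthetic signal, the sketch should report a single active region roughly covering the tone. In a full pipeline, only the frames inside such regions would be passed on to segmentation and clustering.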

Keywords: Speaker diarization, segmentation, clustering, Telugu language, MFCC, optimization, CNN

Journal: Intelligent Decision Technologies, vol. 15, no. 4, pp. 561-577, 2021

Published: 10 January 2022
