Article type: Research Article
Authors: Yao, Xiao [a] | Sheng, Zhengyan [a],[*] | Gu, Min [b],[c] | Wang, Haibin [a] | Xu, Ning [a] | Liu, Xiaofeng [a]
Affiliations: [a] The College of IoT Engineering, Hohai University, Jiangsu, China | [b] Department of Stomatology, Affiliated Third Hospital of Soochow University, Suzhou, Jiangsu, China | [c] The First People’s Hospital of Changzhou, Changzhou, Jiangsu, China
Correspondence: [*] Corresponding author: Zhengyan Sheng, The College of IoT Engineering, Hohai University, Jiangsu, China. E-mail: [email protected].
Abstract: In order to improve the robustness of speech recognition systems, this study attempts to classify stressed speech caused by psychological stress under multitasking workloads. Due to the transient nature and ambiguity of stressed speech, the stress characteristics are not represented in every segment of speech labeled as stressed. In this paper, we propose a multi-feature fusion model based on the attention mechanism to measure the importance of segments for stress classification. Through the attention mechanism, each speech frame is weighted to reflect its correlation with the actual stressed state, and features characterizing stressed speech are fused across multiple channels to classify speech under stress. The proposed model further adopts SpecAugment on the feature spectrum for data augmentation to address the small-sample-size problem in stressed speech. In the experiments, we compared the proposed model with traditional methods on the CASIA Chinese emotion corpus and the Fujitsu stressed speech corpus, and the results show that the proposed model performs better in speaker-independent stress classification. Transfer learning is also applied to speaker-dependent classification of stressed speech, and performance is further improved. The attention mechanism shows an advantage for continuous speech under stress in authentic contexts compared with traditional methods.
Keywords: Attention mechanism, speech under stress, multi-feature fusion, SpecAugment, transfer learning
DOI: 10.3233/IDA-205429
Journal: Intelligent Data Analysis, vol. 25, no. 6, pp. 1603-1627, 2021
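
The abstract above describes attention-weighted frame pooling over fused multi-channel features, plus SpecAugment-style masking on the feature spectrum. The sketch below is a minimal illustration of that general idea, not the authors' released code: the feature dimensions, layer sizes, and the use of torchaudio's frequency/time masking transforms are all illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's implementation):
# per-channel encoders, attention-weighted pooling over frames, and a
# stress/neutral classifier. All dimensions are placeholder choices.
import torch
import torch.nn as nn
import torchaudio.transforms as T


class AttentiveFusionClassifier(nn.Module):
    def __init__(self, feat_dims=(40, 128), hidden=64, n_classes=2):
        super().__init__()
        # one encoder per feature channel, projecting to a shared hidden size
        self.encoders = nn.ModuleList(nn.Linear(d, hidden) for d in feat_dims)
        # scalar attention score per frame, computed on the fused representation
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, feats):
        # feats: list of tensors, each (batch, frames, feat_dim_i);
        # channels are assumed to be frame-aligned (same number of frames)
        fused = torch.stack(
            [torch.tanh(enc(f)) for enc, f in zip(self.encoders, feats)], dim=0
        ).mean(dim=0)                                      # (batch, frames, hidden)
        weights = torch.softmax(self.attn(fused), dim=1)   # frame importance
        utterance = (weights * fused).sum(dim=1)           # attention-weighted pooling
        return self.classifier(utterance), weights


# SpecAugment-style masking on a (batch, freq, time) feature spectrum,
# using standard torchaudio transforms; mask sizes are assumptions.
augment = nn.Sequential(
    T.FrequencyMasking(freq_mask_param=8),
    T.TimeMasking(time_mask_param=20),
)
```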