Learning time-frequency mask for noisy speech enhancement using Gaussian-Bernoulli pre-trained deep neural networks

Saleem, Nasir; Khattak, Muhammad Irfan; Al-Hasan, Mu’ath; Jan, Atif

doi:10.3233/JIFS-201014

Article type: Research Article

Authors: Saleem, Nasir^{a; b; *} | Khattak, Muhammad Irfan^{a; b} | Al-Hasan, Mu’ath^c | Jan, Atif^a

Affiliations: [a] Department of Electrical Engineering, University of Engineering & Technology, Peshawar, Pakistan | [b] Department of Electrical Engineering, FET, Gomal University, Dera Ismail Khan, Pakistan | [c] College of Engineering, Al Ain University, United Arab Emirates (UAE)

Correspondence: [*] Corresponding author. Nasir Saleem, E-mail: [email protected].

Abstract: Speech enhancement is an important problem in many speech processing applications. Recently, supervised speech enhancement methods that use deep learning to estimate a time-frequency mask have achieved remarkable performance gains. In this paper, we propose a time-frequency masking-based supervised speech enhancement method for improving the intelligibility and quality of noisy speech. We believe that a large performance gain can be achieved if deep neural networks (DNNs) are layer-wise pre-trained by stacking Gaussian-Bernoulli Restricted Boltzmann Machines (GB-RBMs). The proposed DNN, called the Gaussian-Bernoulli Deep Belief Network (GB-DBN), is optimized by minimizing the error between the estimated and pre-defined masks. A non-linear Mel-scale weighted mean square error (LMW-MSE) loss function is used as the training criterion. We examine the performance of the proposed pre-training scheme using different DNNs built on three time-frequency masks: the ideal amplitude mask (IAM), the ideal ratio mask (IRM), and the phase-sensitive mask (PSM). Results in different noisy conditions demonstrate that DNNs pre-trained with the proposed scheme provide a consistent performance gain in terms of perceived speech intelligibility and quality. The proposed pre-training scheme is also effective and robust when the training data are noisy.
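The three training targets named in the abstract can be made concrete with a short sketch. It uses the common textbook definitions of the IAM, IRM and PSM computed from complex STFTs of clean speech S and additive noise N (noisy Y = S + N); the abstract does not give the paper's exact formulation, so details such as the IRM exponent `beta = 0.5`, the `eps` guard, and the absence of clipping are assumptions, and `ideal_masks` is a hypothetical helper name, not the authors' code.

```python
import numpy as np


def ideal_masks(clean_stft, noise_stft, beta=0.5, eps=1e-8):
    """Compute IAM, IRM and PSM from complex STFTs of clean speech and noise.

    Assumption: additive noise, so the noisy STFT is Y = S + N.
    Masks are returned unclipped; beta and eps are illustrative defaults.
    """
    S = np.asarray(clean_stft, dtype=complex)
    N = np.asarray(noise_stft, dtype=complex)
    Y = S + N
    mag_s, mag_n, mag_y = np.abs(S), np.abs(N), np.abs(Y)

    # Ideal amplitude mask: clean magnitude over noisy magnitude (can exceed 1).
    iam = mag_s / (mag_y + eps)
    # Ideal ratio mask: speech-to-total energy ratio raised to beta.
    irm = (mag_s**2 / (mag_s**2 + mag_n**2 + eps)) ** beta
    # Phase-sensitive mask: IAM scaled by the cosine of the clean/noisy phase difference.
    psm = iam * np.cos(np.angle(S) - np.angle(Y))
    return iam, irm, psm


# Two time-frequency bins: a 3 + 4j noisy bin and an in-phase bin.
iam, irm, psm = ideal_masks(np.array([[3 + 0j, 1 + 0j]]),
                            np.array([[4j, 1 + 0j]]))
print(np.round(iam, 4), np.round(irm, 4), np.round(psm, 4))
```

In the first bin the noise is 90 degrees out of phase with the speech, so the PSM (0.36) is smaller than the IAM (0.6); in the in-phase bin the two coincide at 0.5. In practice the IAM and PSM are often truncated to a bounded range (for example with `np.clip`) before being used as DNN targets, but whether the paper does so is not stated in the abstract.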

Keywords: Supervised speech enhancement, deep learning, deep belief networks, restricted Boltzmann machine, intelligibility, quality

Journal: Journal of Intelligent & Fuzzy Systems, vol. 40, no. 1, pp. 849-864, 2021

Published: 04 January 2021