TY - JOUR
T1 - Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization
AU - Li, Li
AU - Kameoka, Hirokazu
AU - Toda, Tomoki
AU - Makino, Shoji
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Numbers 26730100 and 17H01763, and SECOM Science and Technology Foundation.
Publisher Copyright:
Copyright © 2017 ISCA.
PY - 2017
Y1 - 2017
N2 - Spectral domain speech enhancement algorithms based on non-negative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy; however, they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to have an effect in enhancing the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-to-distortion ratio and the cepstral distance.
AB - Spectral domain speech enhancement algorithms based on non-negative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy; however, they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to have an effect in enhancing the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-to-distortion ratio and the cepstral distance.
KW - Mel-generalized cepstral representation
KW - Non-negative matrix factorization
KW - Single channel signal processing
KW - Speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85039169858&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039169858&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-1492
DO - 10.21437/Interspeech.2017-1492
M3 - Conference article
AN - SCOPUS:85039169858
SN - 2308-457X
VL - 2017-August
SP - 1998
EP - 2002
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Y2 - 20 August 2017 through 24 August 2017
ER -