TY - JOUR
T1 - Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information
AU - Tachioka, Yuuki
AU - Narita, Tomohiro
AU - Miura, Iori
AU - Uramoto, Takanobu
AU - Monta, Natsuki
AU - Uenohara, Shingo
AU - Furuya, Ken'ichi
AU - Watanabe, Shinji
AU - Le Roux, Jonathan
N1 - Publisher Copyright:
Copyright © 2017 ISCA.
PY - 2017
Y1 - 2017
N2 - Multi-channel non-negative matrix factorization (MNMF) is a multi-channel extension of NMF and often outperforms NMF because it can handle spatial and spectral information simultaneously. However, MNMF has more parameters than NMF, and its performance depends heavily on the initial values. MNMF factorizes an observation matrix into four matrices: spatial correlation, basis, cluster-indicator latent variable, and activation matrices. This paper proposes effective initialization methods for these matrices. First, the spatial correlation matrix, which shows the strongest dependence on initial values, is initialized with the cross-spectrum method applied to speech enhanced by binary masking. Second, when the target is speech, constructing bases from the phonemes present in an utterance can improve performance, so this paper proposes selecting speech bases using automatic speech recognition (ASR). Third, we propose an initialization method for the cluster-indicator latent variables that couples the spatial and spectral information, enabling simultaneous optimization of the two matrices above. Experiments on a noisy ASR task show that the proposed initialization significantly improves the performance of MNMF by reducing its dependence on initial values.
AB - Multi-channel non-negative matrix factorization (MNMF) is a multi-channel extension of NMF and often outperforms NMF because it can handle spatial and spectral information simultaneously. However, MNMF has more parameters than NMF, and its performance depends heavily on the initial values. MNMF factorizes an observation matrix into four matrices: spatial correlation, basis, cluster-indicator latent variable, and activation matrices. This paper proposes effective initialization methods for these matrices. First, the spatial correlation matrix, which shows the strongest dependence on initial values, is initialized with the cross-spectrum method applied to speech enhanced by binary masking. Second, when the target is speech, constructing bases from the phonemes present in an utterance can improve performance, so this paper proposes selecting speech bases using automatic speech recognition (ASR). Third, we propose an initialization method for the cluster-indicator latent variables that couples the spatial and spectral information, enabling simultaneous optimization of the two matrices above. Experiments on a noisy ASR task show that the proposed initialization significantly improves the performance of MNMF by reducing its dependence on initial values.
KW - Automatic speech recognition
KW - Noisy speech
KW - Non-negative matrix factorization
KW - Spatial correlation
KW - Speech basis
UR - http://www.scopus.com/inward/record.url?scp=85039171112&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039171112&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-61
DO - 10.21437/Interspeech.2017-61
M3 - Conference article
AN - SCOPUS:85039171112
SN - 2308-457X
VL - 2017-August
SP - 2461
EP - 2465
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Y2 - 20 August 2017 through 24 August 2017
ER -