TY - JOUR
T1 - Speech Enhancement Based on Bayesian Low-Rank and Sparse Decomposition of Multichannel Magnitude Spectrograms
AU - Bando, Yoshiaki
AU - Itoyama, Katsutoshi
AU - Konyo, Masashi
AU - Tadokoro, Satoshi
AU - Nakadai, Kazuhiro
AU - Yoshii, Kazuyoshi
AU - Kawahara, Tatsuya
AU - Okuno, Hiroshi G.
PY - 2018/2/1
Y1 - 2018/2/1
N2 - This paper presents a blind multichannel speech enhancement method that can deal with the time-varying layout of microphones and sound sources. Since nonnegative tensor factorization (NTF) separates a multichannel magnitude (or power) spectrogram into source spectrograms without phase information, it is robust against the time-varying mixing system. This method, however, requires prior information such as the spectral bases (templates) of each source spectrogram in advance. To solve this problem, we develop a Bayesian model called robust NTF (Bayesian RNTF) that decomposes a multichannel magnitude spectrogram into target speech and noise spectrograms based on their sparseness and low rankness. Bayesian RNTF is applied to the challenging task of speech enhancement for a microphone array distributed on a hose-shaped rescue robot. When the robot searches for victims under collapsed buildings, the layout of the microphones changes over time and some of them often fail to capture target speech. Our method robustly works under such situations, thanks to its characteristic of time-varying mixing system. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method outperforms conventional blind methods in enhancement performance by the signal-to-noise ratio of 1.03 dB.
AB - This paper presents a blind multichannel speech enhancement method that can deal with the time-varying layout of microphones and sound sources. Since nonnegative tensor factorization (NTF) separates a multichannel magnitude (or power) spectrogram into source spectrograms without phase information, it is robust against the time-varying mixing system. This method, however, requires prior information such as the spectral bases (templates) of each source spectrogram in advance. To solve this problem, we develop a Bayesian model called robust NTF (Bayesian RNTF) that decomposes a multichannel magnitude spectrogram into target speech and noise spectrograms based on their sparseness and low rankness. Bayesian RNTF is applied to the challenging task of speech enhancement for a microphone array distributed on a hose-shaped rescue robot. When the robot searches for victims under collapsed buildings, the layout of the microphones changes over time and some of them often fail to capture target speech. Our method robustly works under such situations, thanks to its characteristic of time-varying mixing system. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method outperforms conventional blind methods in enhancement performance by the signal-to-noise ratio of 1.03 dB.
KW - Bayesian signal processing
KW - low-rank and sparse decomposition
KW - Multichannel speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85034267108&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85034267108&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2017.2772340
DO - 10.1109/TASLP.2017.2772340
M3 - Article
AN - SCOPUS:85034267108
SN - 2329-9290
VL - 26
SP - 215
EP - 230
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 2
ER -