TY - GEN
T1 - Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming
AU - Matsui, Yutaro
AU - Nakatani, Tomohiro
AU - Delcroix, Marc
AU - Kinoshita, Keisuke
AU - Ito, Nobutaka
AU - Araki, Shoko
AU - Makino, Shoji
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/11/2
Y1 - 2018/11/2
N2 - This paper discusses the online estimation of time-frequency masks, which enables us to perform mask-based beamforming by online processing for robust automatic speech recognition (ASR). Two approaches to online mask estimation have been separately developed for this purpose. One is based on a deep neural network (DNN), which exploits the spectral features of the signal. The other is based on spatial clustering (SC), which exploits the spatial features of the signal. This paper proposes a new method that integrates the two online estimation approaches to further improve online mask estimation by exploiting the advantages of both approaches. Experiments using the real data of the CHiME-3 multichannel noisy speech corpus show that the proposed method greatly outperforms the conventional approaches in terms of improving the word error rate (WER).
AB - This paper discusses the online estimation of time-frequency masks, which enables us to perform mask-based beamforming by online processing for robust automatic speech recognition (ASR). Two approaches to online mask estimation have been separately developed for this purpose. One is based on a deep neural network (DNN), which exploits the spectral features of the signal. The other is based on spatial clustering (SC), which exploits the spatial features of the signal. This paper proposes a new method that integrates the two online estimation approaches to further improve online mask estimation by exploiting the advantages of both approaches. Experiments using the real data of the CHiME-3 multichannel noisy speech corpus show that the proposed method greatly outperforms the conventional approaches in terms of improving the word error rate (WER).
KW - Beamforming
KW - DNN
KW - Online processing
KW - Spatial clustering
KW - Time-frequency masking
UR - http://www.scopus.com/inward/record.url?scp=85057361959&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057361959&partnerID=8YFLogxK
U2 - 10.1109/IWAENC.2018.8521354
DO - 10.1109/IWAENC.2018.8521354
M3 - Conference contribution
AN - SCOPUS:85057361959
T3 - 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings
SP - 71
EP - 75
BT - 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
Y2 - 17 September 2018 through 20 September 2018
ER -