TY - GEN
T1 - Speaker Adaptation for Multichannel End-to-End Speech Recognition
AU - Ochiai, Tsubasa
AU - Watanabe, Shinji
AU - Katagiri, Shigeru
AU - Hori, Takaaki
AU - Hershey, John
N1 - Funding Information:
Tsubasa Ochiai and Shigeru Katagiri were supported in part by JSPS Grants-in-Aid for Scientific Research No. 26280063, MEXT-Supported Program Driver-in-the-Loop, and Grant-in-Aid for JSPS Fellows. Shinji Watanabe, Takaaki Hori, and John Hershey were supported by MERL. † Currently, he is at Johns Hopkins University.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - Recent work on multichannel end-to-end automatic speech recognition (ASR) has shown that multichannel speech enhancement and speech recognition functions can be integrated into a deep neural network (DNN)-based system, and promising experimental results have been shown using the CHiME-4 and AMI corpora. In other recent DNN-based hidden Markov model (DNN-HMM) hybrid architectures, the effectiveness of speaker adaptation has been well established. Motivated by these results, we propose a multi-path adaptation scheme for end-to-end multichannel ASR, which combines the unprocessed noisy speech features with a speech-enhanced pathway to improve upon previous end-to-end ASR approaches. Experimental results using CHiME-4 show that (1) our proposed multi-path adaptation scheme improves ASR performance and (2) adapting the encoder network is more effective than adapting the neural beamformer, attention mechanism, or decoder network.
KW - Attention-based encoder-decoder
KW - Multichannel end-to-end ASR
KW - Neural beamformer
KW - Speaker adaptation
UR - http://www.scopus.com/inward/record.url?scp=85054252839&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054252839&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8462161
DO - 10.1109/ICASSP.2018.8462161
M3 - Conference contribution
AN - SCOPUS:85054252839
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6707
EP - 6711
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -