TY - GEN
T1 - Multilingual End-to-End Speech Translation
AU - Inaguma, Hirofumi
AU - Duh, Kevin
AU - Kawahara, Tatsuya
AU - Watanabe, Shinji
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they are applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our code and the database are publicly available to encourage further research in this emergent multilingual ST topic (available at https://github.com/espnet/espnet).
KW - Speech translation
KW - attention-based sequence-to-sequence
KW - multilingual end-to-end speech translation
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85081588714&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081588714&partnerID=8YFLogxK
U2 - 10.1109/ASRU46091.2019.9003832
DO - 10.1109/ASRU46091.2019.9003832
M3 - Conference contribution
AN - SCOPUS:85081588714
T3 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
SP - 570
EP - 577
BT - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Y2 - 15 December 2019 through 18 December 2019
ER -