TY - GEN
T1 - CMU’s IWSLT 2022 Dialect Speech Translation System
AU - Yan, Brian
AU - Fernandes, Patrick
AU - Dalmia, Siddharth
AU - Shi, Jiatong
AU - Peng, Yifan
AU - Berrebbi, Dan
AU - Wang, Xinyi
AU - Neubig, Graham
AU - Watanabe, Shinji
N1 - Funding Information:
Brian Yan and Shinji Watanabe are supported by the Human Language Technology Center of Excellence. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) (Towns et al., 2014), which is supported by National Science Foundation grant number ACI-1548562; specifically, the Bridges system (Nystrom et al., 2015), as part of project cis210027p, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center. We’d also like to thank Soumi Maiti, Tomoki Hayashi, and Koshak for their contributions.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - This paper describes CMU’s submissions to the IWSLT 2022 dialect speech translation (ST) shared task for translating Tunisian-Arabic speech to English text. We use additional paired Modern Standard Arabic (MSA) data to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems. We also augment the paired ASR data with pseudo translations via sequence-level knowledge distillation from an MT model and use these artificial triplet ST data to improve our end-to-end (E2E) systems. Our E2E models are based on the Multi-Decoder architecture with searchable hidden intermediates. We extend the Multi-Decoder by orienting the speech encoder towards the target language by applying ST supervision as a hierarchical connectionist temporal classification (CTC) multi-task objective. During inference, we apply joint decoding of the ST CTC and ST autoregressive decoder branches of our modified Multi-Decoder. Finally, we apply ROVER voting, posterior combination, and minimum Bayes-risk decoding with combined N-best lists to ensemble our various cascaded and E2E systems. Our best systems reached 20.8 and 19.5 BLEU on test2 (blind) and test1 respectively. Without any additional MSA data, we reached 20.4 and 19.2 on the same test sets.
AB - This paper describes CMU’s submissions to the IWSLT 2022 dialect speech translation (ST) shared task for translating Tunisian-Arabic speech to English text. We use additional paired Modern Standard Arabic (MSA) data to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems. We also augment the paired ASR data with pseudo translations via sequence-level knowledge distillation from an MT model and use these artificial triplet ST data to improve our end-to-end (E2E) systems. Our E2E models are based on the Multi-Decoder architecture with searchable hidden intermediates. We extend the Multi-Decoder by orienting the speech encoder towards the target language by applying ST supervision as a hierarchical connectionist temporal classification (CTC) multi-task objective. During inference, we apply joint decoding of the ST CTC and ST autoregressive decoder branches of our modified Multi-Decoder. Finally, we apply ROVER voting, posterior combination, and minimum Bayes-risk decoding with combined N-best lists to ensemble our various cascaded and E2E systems. Our best systems reached 20.8 and 19.5 BLEU on test2 (blind) and test1 respectively. Without any additional MSA data, we reached 20.4 and 19.2 on the same test sets.
UR - http://www.scopus.com/inward/record.url?scp=85137494629&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137494629&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85137494629
T3 - IWSLT 2022 - 19th International Conference on Spoken Language Translation, Proceedings of the Conference
SP - 298
EP - 307
BT - IWSLT 2022 - 19th International Conference on Spoken Language Translation, Proceedings of the Conference
A2 - Salesky, Elizabeth
A2 - Federico, Marcello
A2 - Costa-jussà, Marta
PB - Association for Computational Linguistics (ACL)
T2 - 19th International Conference on Spoken Language Translation, IWSLT 2022
Y2 - 26 May 2022 through 27 May 2022
ER -