TY - GEN
T1 - Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
AU - Dalmia, Siddharth
AU - Yan, Brian
AU - Raunak, Vikas
AU - Metze, Florian
AU - Watanabe, Shinji
N1 - Funding Information:
This work started while Vikas Raunak was a student at CMU; he is now working as a Research Scientist at Microsoft. We thank Pengcheng Guo, Hirofumi Inaguma, Elizabeth Salesky, Maria Ryskina, Marta Méndez Simón and Vijay Viswanathan for their helpful discussions during the course of this project. We also thank the anonymous reviewers for their valuable feedback. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) (Towns et al., 2014), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system (Nystrom et al., 2015), which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). The work was supported in part by an AWS Machine Learning Research Award. This research was also supported in part by the DARPA KAIROS program from the Air Force Research Laboratory under agreement number FA8750-19-2-0200. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory or the U.S. Government.
Publisher Copyright:
© 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
N2 - End-to-end approaches for sequence tasks are becoming increasingly popular. Yet for complex sequence tasks, like speech translation, systems that cascade several models trained on sub-tasks have been shown to be superior, suggesting that the compositionality of cascaded systems simplifies learning and enables sophisticated search capabilities. In this work, we present an end-to-end framework that exploits compositionality to learn searchable hidden representations at intermediate stages of a sequence model using decomposed sub-tasks. These hidden intermediates can be improved using beam search to enhance the overall performance and can also incorporate external models at intermediate stages of the network to re-score or adapt towards out-of-domain data. One instance of the proposed framework is a Multi-Decoder model for speech translation that extracts the searchable hidden intermediates from a speech recognition sub-task. The model demonstrates the aforementioned benefits and outperforms the previous state-of-the-art by around +6 and +3 BLEU on the two test sets of Fisher-CallHome and by around +3 and +4 BLEU on the English-German and English-French test sets of MuST-C.
AB - End-to-end approaches for sequence tasks are becoming increasingly popular. Yet for complex sequence tasks, like speech translation, systems that cascade several models trained on sub-tasks have been shown to be superior, suggesting that the compositionality of cascaded systems simplifies learning and enables sophisticated search capabilities. In this work, we present an end-to-end framework that exploits compositionality to learn searchable hidden representations at intermediate stages of a sequence model using decomposed sub-tasks. These hidden intermediates can be improved using beam search to enhance the overall performance and can also incorporate external models at intermediate stages of the network to re-score or adapt towards out-of-domain data. One instance of the proposed framework is a Multi-Decoder model for speech translation that extracts the searchable hidden intermediates from a speech recognition sub-task. The model demonstrates the aforementioned benefits and outperforms the previous state-of-the-art by around +6 and +3 BLEU on the two test sets of Fisher-CallHome and by around +3 and +4 BLEU on the English-German and English-French test sets of MuST-C.
UR - http://www.scopus.com/inward/record.url?scp=85109956050&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109956050&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85109956050
T3 - NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
SP - 1882
EP - 1896
BT - NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021
Y2 - 6 June 2021 through 11 June 2021
ER -