TY - GEN
T1 - Promising Accurate Prefix Boosting for Sequence-to-sequence ASR
AU - Baskar, Murali Karthick
AU - Burget, Lukas
AU - Watanabe, Shinji
AU - Karafiat, Martin
AU - Hori, Takaaki
AU - Cernocky, Jan Honza
N1 - Funding Information:
The work reported here was carried out during the 2018 Jelinek Memorial Summer Workshop on Speech and Language Technologies, supported by Johns Hopkins University via gifts from Microsoft, Amazon, Google, Facebook, and MERL/Mitsubishi Electric. All the authors from Brno University of Technology were supported by the Czech Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project "IT4Innovations excellence in science - LQ1602" and by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) MATERIAL program, via Air Force Research Laboratory (AFRL) contract # FA8650-17-C-9118. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government. Part of the computing hardware was provided by Facebook within the FAIR GPU Partnership Program. We thank Ruizhi Li for finding the hyper-parameters to obtain the best baseline on WSJ. We also thank Hiroshi Seki for providing the batch-wise beam search decoding implementation in ESPnet.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - In this paper, we present promising accurate prefix boosting (PAPB), a discriminative training technique for attention-based sequence-to-sequence (seq2seq) ASR. PAPB is devised to effectively unify the training and testing schemes. The training procedure maximizes the score of each partially correct prefix obtained during beam search relative to the other hypotheses. The training objective also includes minimization of the token (character) error rate. PAPB shows its efficacy by achieving 10.8% and 3.8% WER with and without an external RNNLM, respectively, on the Wall Street Journal dataset.
AB - In this paper, we present promising accurate prefix boosting (PAPB), a discriminative training technique for attention-based sequence-to-sequence (seq2seq) ASR. PAPB is devised to effectively unify the training and testing schemes. The training procedure maximizes the score of each partially correct prefix obtained during beam search relative to the other hypotheses. The training objective also includes minimization of the token (character) error rate. PAPB shows its efficacy by achieving 10.8% and 3.8% WER with and without an external RNNLM, respectively, on the Wall Street Journal dataset.
KW - Attention models
KW - Beam search training
KW - discriminative training
KW - sequence learning
KW - softmax-margin
UR - http://www.scopus.com/inward/record.url?scp=85068959446&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068959446&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8682782
DO - 10.1109/ICASSP.2019.8682782
M3 - Conference contribution
AN - SCOPUS:85068959446
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5646
EP - 5650
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -