TY - GEN
T1 - SPGISpeech
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
AU - O'Neill, Patrick K.
AU - Lavrukhin, Vitaly
AU - Majumdar, Somshubra
AU - Noroozi, Vahid
AU - Zhang, Yuekai
AU - Kuchaiev, Oleksii
AU - Balam, Jagadeesh
AU - Dovzhenko, Yuliya
AU - Freyberg, Keenan
AU - Shulman, Michael D.
AU - Ginsburg, Boris
AU - Watanabe, Shinji
AU - Kucsko, Georg
N1 - Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
N2 - In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models. This adds complexity and limits performance, as many formatting tasks benefit from semantic information present in the acoustic signal but absent in transcription. Here we propose a new STT task: end-to-end neural transcription with fully formatted text for target labels. We present baseline Conformer-based models trained on a corpus of 5,000 hours of professionally transcribed earnings calls, achieving a CER of 1.7. As a contribution to the STT research community, we release the corpus free for noncommercial use.
AB - In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models. This adds complexity and limits performance, as many formatting tasks benefit from semantic information present in the acoustic signal but absent in transcription. Here we propose a new STT task: end-to-end neural transcription with fully formatted text for target labels. We present baseline Conformer-based models trained on a corpus of 5,000 hours of professionally transcribed earnings calls, achieving a CER of 1.7. As a contribution to the STT research community, we release the corpus free for noncommercial use.
UR - http://www.scopus.com/inward/record.url?scp=85118697203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118697203&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-1860
DO - 10.21437/Interspeech.2021-1860
M3 - Conference contribution
AN - SCOPUS:85118697203
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 1081
EP - 1085
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
Y2 - 30 August 2021 through 3 September 2021
ER -