An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu Wen Yang, Yu Tsao, Hung Yi Lee, Shinji Watanabe

Research output: Conference contribution

33 Citations (Scopus)

Abstract

Self-supervised pretraining on speech data has achieved a lot of progress. High-fidelity representation of the speech signal is learned from a lot of untranscribed data and shows promising performance. Recently, there are several works focusing on evaluating the quality of self-supervised pretrained representations on various tasks without domain restriction, e.g., SUPERB. However, such evaluations do not provide a comprehensive comparison among many ASR benchmark corpora. In this paper, we focus on the general applications of pretrained speech representations, on advanced end-to-end automatic speech recognition (E2E-ASR) models. We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR. Without any modification of the back-end model architectures or training strategy, some of the experiments with pretrained representations, e.g., WSJ, WSJ0-2mix with HuBERT, reach or outperform current state-of-the-art (SOTA) recognition performance. Moreover, we further explore more scenarios for whether the pretraining representations are effective, such as the cross-language or overlapped speech. The scripts, configurations and the trained models have been released in ESPnet to let the community reproduce our experiments and improve them.

Original language: English
Title of host publication: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 228-235
Number of pages: 8
ISBN (Electronic): 9781665437394
DOI
Publication status: Published - 2021
Externally published: Yes
Event: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Cartagena, Colombia
Duration: 13 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings

Conference

Conference: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
Country/Territory: Colombia
City: Cartagena
Period: 21/12/13 to 21/12/17

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Linguistics and Language
