Language independent end-to-end architecture for joint language identification and speech recognition

Shinji Watanabe, Takaaki Hori, John R. Hershey

Research output: Conference contribution

84 citations (Scopus)

Abstract

End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity, which we fully exploit in this paper, to build a monolithic multilingual ASR system with a language-independent neural network architecture. We present a model that can recognize speech in 10 different languages by directly performing grapheme-based (character or chunked-character) speech recognition. The model is based on our hybrid attention/connectionist temporal classification (CTC) architecture, which has previously been shown to achieve state-of-the-art performance on several ASR benchmarks. Here we augment its set of output symbols to include the union of the character sets appearing in all the target languages. These include the Roman and Cyrillic alphabets, Arabic numerals, simplified Chinese, and Japanese Kanji/Hiragana/Katakana characters (5,500 characters in all). This allows training of a single multilingual model whose parameters are shared across all the languages. The model can jointly identify the language and recognize the speech, automatically formatting the recognized text in the appropriate character set. The experiments, which used speech databases composed of Wall Street Journal (English), Corpus of Spontaneous Japanese, HKUST Mandarin CTS, and Voxforge (German, Spanish, French, Italian, Dutch, Portuguese, Russian), demonstrate comparable or superior performance relative to language-dependent end-to-end ASR systems.
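The idea of a single output layer shared by all languages can be sketched in a few lines. The following is a minimal, hypothetical Python illustration, not the authors' implementation: it builds one symbol inventory from the union of characters seen in each language's transcripts and, as an assumption made only for this sketch, prepends a language-tag symbol (such as `[EN]`) to each target sequence so that one decoder could emit the language identity and the transcript as a single sequence. The function names `build_union_vocab`, `encode`, and `decode`, the `<blank>`/`<eos>` special symbols, and the `[XX]` tag format are all invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def build_union_vocab(transcripts_by_lang: Dict[str, Iterable[str]]) -> List[str]:
    """Return one shared symbol list for all languages (hypothetical sketch).

    The list holds special symbols, one assumed language-tag symbol per
    language, and the union of every character seen in any transcript.
    """
    chars = set()
    for texts in transcripts_by_lang.values():
        for text in texts:
            chars.update(text)  # add each character (Latin, Cyrillic, kana, ...)
    # Assumed tag format "[LANG]"; tags are multi-character strings, so they
    # cannot collide with the single-character symbols collected above.
    lang_tags = [f"[{lang}]" for lang in sorted(transcripts_by_lang)]
    # Assumed specials: index 0 reserved for the CTC blank, then end-of-sentence.
    specials = ["<blank>", "<eos>"]
    return specials + lang_tags + sorted(chars)


def encode(text: str, lang: str, vocab: List[str]) -> List[int]:
    """Map a transcript to symbol ids, prefixed by its (assumed) language tag."""
    index = {sym: i for i, sym in enumerate(vocab)}
    return [index[f"[{lang}]"]] + [index[c] for c in text]


def decode(ids: List[int], vocab: List[str]) -> Tuple[str, str]:
    """Split an id sequence back into (language, text); assumes a leading tag."""
    syms = [vocab[i] for i in ids]
    tag, chars = syms[0], syms[1:]
    return tag.strip("[]"), "".join(chars)


if __name__ == "__main__":
    corpus = {"EN": ["hello world"], "JA": ["こんにちは"], "RU": ["привет"]}
    vocab = build_union_vocab(corpus)
    ids = encode("привет", "RU", vocab)
    print(len(vocab), ids, decode(ids, vocab))
```

Because every language draws from the same inventory, the output layer size is simply the size of the union (about 5,500 symbols in the setting the abstract describes), and the rest of the network needs no language-specific branches. How the paper itself represents language identity in the output sequence should be checked against the full text.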

Original language: English
Host publication title: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 265-271
Number of pages: 7
ISBN (electronic): 9781509047888
DOI
Publication status: Published - 24 January 2018
Externally published: Yes
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 December 2017 to 20 December 2017

Publication series

Name: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Volume: 2018-January

Other

Other: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Country/Territory: Japan
City: Okinawa
Period: 2017-12-16 to 2017-12-20

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
