CENSREC-1-AV: An audio-visual corpus for noisy bimodal speech recognition

Satoshi Tamura*, Chiyomi Miyajima, Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Tetsuya Takiguchi, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

*Corresponding author for this work

Research output: Paper, peer-reviewed

20 Citations (Scopus)

Abstract

In this paper, an audio-visual speech corpus, CENSREC-1-AV, for noisy speech recognition is introduced. CENSREC-1-AV consists of an audio-visual database and a baseline bimodal speech recognition system that uses both audio and visual information. The training set contains 3,234 utterances from 42 speakers, and the test set contains 1,963 utterances from 51 speakers. Each utterance consists of a speech signal together with color and infrared images of the region around the speaker's mouth. A baseline system is provided so that users can evaluate their own bimodal speech recognizers against it. In the baseline system, multi-stream HMMs are trained on the training data. A preliminary experiment was conducted to evaluate the baseline using acoustically noisy test data. The results show that roughly a 35% relative error reduction was achieved in low-SNR conditions compared with an audio-only ASR method.
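The baseline combines the audio and visual streams with a multi-stream HMM, where each state's observation log-likelihood is a weighted sum of per-stream log-likelihoods. The sketch below illustrates that combination rule only; the stream weights, feature dimensions, and single-Gaussian emission model are illustrative assumptions, not the corpus's actual baseline configuration.

```python
import numpy as np

def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def multistream_loglik(o_audio, o_visual, state, lam_audio=0.7, lam_visual=0.3):
    """Stream-weighted state observation log-likelihood:
       log b_j(o) = lam_a * log b_j^a(o_a) + lam_v * log b_j^v(o_v).
       The weights lam_a and lam_v are hypothetical values here; in practice
       they are tuned to the acoustic noise level."""
    ll_a = diag_gaussian_loglik(o_audio, state["mean_a"], state["var_a"])
    ll_v = diag_gaussian_loglik(o_visual, state["mean_v"], state["var_v"])
    return lam_audio * ll_a + lam_visual * ll_v

# Toy state with a 39-dim audio stream (e.g., MFCCs plus deltas) and a
# 30-dim visual stream (e.g., lip-region features); parameters are random.
rng = np.random.default_rng(0)
state = {
    "mean_a": rng.normal(size=39), "var_a": np.ones(39),
    "mean_v": rng.normal(size=30), "var_v": np.ones(30),
}
print(multistream_loglik(rng.normal(size=39), rng.normal(size=30), state))
```

Lowering the audio stream weight as SNR drops is the usual way such a system gains robustness over audio-only ASR, which is consistent with the improvement reported above in low-SNR conditions.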

Original language: English
Publication status: Published - 2010
Event: 2010 International Conference on Auditory-Visual Speech Processing, AVSP 2010 - Hakone, Japan
Duration: Sep 30, 2010 - Oct 3, 2010

Conference

Conference: 2010 International Conference on Auditory-Visual Speech Processing, AVSP 2010
Country/Territory: Japan
City: Hakone
Period: Sep 30, 2010 - Oct 3, 2010

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Otorhinolaryngology
