Development of audio-visual speech corpus toward speaker-independent Japanese LVCSR

Kazuto Ukai, Satoshi Tamura, Satoru Hayamizu

Research output: Conference contribution

1 Citation (Scopus)

Abstract

Building corpora is an important part of research on Large Vocabulary Continuous Speech Recognition (LVCSR). In addition, visual information such as lip images is effective in overcoming the performance degradation caused by noise. In this paper, therefore, we focus on collecting speech and lip-image data for audio-visual LVCSR. Audio-visual speech data were obtained from 12 speakers, each of whom uttered the ATR503 phonetically balanced sentences. The data were recorded in acoustically and visually clean environments. Using the data, we conducted recognition experiments: Mel Frequency Cepstral Coefficients (MFCCs) and eigenlip features were extracted, and multi-stream Hidden Markov Models (HMMs) were built. We compared performance in the clean condition with that in noisy environments. We found that visual information can compensate for the degraded performance. The results also indicate that visual speech recognition must be improved to achieve high-performance audio-visual LVCSR.

Original language: English
Title of host publication: 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 12-15
Number of pages: 4
ISBN (Electronic): 9781509035168
DOI
Publication status: Published - 3 May 2017
Externally published: Yes
Event: 19th Annual Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016 - Bali, Indonesia
Duration: 26 October 2016 to 28 October 2016

Publication series

Name: 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016

Other

Other: 19th Annual Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016
Country/Territory: Indonesia
City: Bali
Period: 16/10/26 to 16/10/28

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Information Systems and Management
  • Linguistics and Language
