An audio-visual in-car corpus "CENSREC-2-AV" for robust bimodal speech recognition

Takuya Kawasaki, Naoya Ukai, Takumi Seko, Satoshi Tamura, Satoru Hayamizu, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda

Research output: Paper (peer-reviewed)

Abstract

The purpose of this study is to build an evaluation framework for robust bimodal speech recognition in real environments, such as in-car conditions. Bimodal speech recognition using lip images has been studied as a way to prevent the deterioration of speech recognition performance in noisy environments, and lip-reading technologies based on lip images play an important role in it. Building a bimodal speech recognizer and evaluating its performance therefore requires a database containing both speech signals and lip images. An evaluation framework for noisy bimodal speech recognition (CENSREC-1-AV) was constructed by Tamura et al.; in it, a subject in front of a blue screen background spoke Japanese connected digits in a quiet office environment. Because CENSREC-1-AV was recorded in clean conditions, a database recorded in real environments is still required to evaluate a bimodal speech recognizer. We have therefore constructed a new audio-visual corpus, CENSREC-2-AV, recorded in in-car environments: a subject sitting in the driver's seat of a car uttered Japanese connected digits under various driving conditions, for example, in a tunnel with background music noise, or on an expressway with the window open. CENSREC-2-AV makes it possible to develop and evaluate robust bimodal speech recognition methods in real environments.

Original language: English
Publication status: Published - 2013
Externally published: Yes
Event: 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2013, DSP 2013 - Seoul, Korea, Republic of
Duration: 29 Sep 2013 to 2 Oct 2013

Conference

Conference: 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2013, DSP 2013
Country/Territory: Korea, Republic of
City: Seoul
Period: 13/9/29 to 13/10/2

ASJC Scopus subject areas

  • Automotive Engineering
  • Safety, Risk, Reliability and Quality
