Abstract
The purpose of this study is to build an evaluation framework for robust bimodal speech recognition in real environments, such as in-car conditions. Bimodal speech recognition, which uses lip images in addition to speech, has been studied as a way to prevent speech recognition performance from deteriorating in noisy environments, and lip-reading technology plays a major role in it. Building a bimodal speech recognizer and evaluating its performance therefore requires a database containing both speech signals and lip images. Tamura et al. constructed an evaluation framework for noisy bimodal speech recognition (CENSREC-1-AV), in which a subject in front of a blue-screen background spoke Japanese connected digits in a quiet office environment. Because CENSREC-1-AV was recorded under clean conditions, a database recorded in real environments is still required to evaluate a bimodal speech recognizer. We have therefore constructed a new audio-visual corpus, CENSREC-2-AV, recorded in in-car environments: a subject sitting in the driver's seat of a car uttered Japanese connected digits under various driving conditions, for example, driving through a tunnel with background music noise or driving on an expressway with the window open. CENSREC-2-AV makes it possible to develop and evaluate robust bimodal speech recognition methods in real environments.
Original language | English |
---|---|
Publication status | Published - 2013 |
Externally published | Yes |
Event | 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2013, DSP 2013 - Seoul, Korea, Republic of. Duration: 29 Sep 2013 → 2 Oct 2013 |
Conference
Conference | 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2013, DSP 2013 |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 29 Sep 2013 → 2 Oct 2013 |
ASJC Scopus subject areas
- Automotive Engineering
- Safety, Risk, Reliability and Quality