An audio-visual in-car corpus "CENSREC-2-AV" for robust bimodal speech recognition

Takuya Kawasaki, Naoya Ukai, Takumi Seko, Satoshi Tamura, Satoru Hayamizu, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda

Research output: Contribution to conference › Paper › peer-review

Abstract

The purpose of this study is to build an evaluation framework for robust bimodal speech recognition in real environments, such as in-car conditions. Bimodal speech recognition using lip images has been studied as a way to prevent the deterioration of speech recognition performance in noisy environments, and lip-reading technologies based on lip images play a major role in it. A database containing both speech signals and lip images is therefore necessary to build a bimodal speech recognizer and to evaluate its performance. An evaluation framework for noisy bimodal speech recognition (CENSREC-1-AV) was constructed by Tamura et al.; in it, a subject in front of a blue screen background spoke Japanese connected digits in a quiet office environment. Because CENSREC-1-AV was recorded in clean conditions, a database recorded in real environments is still required to evaluate a bimodal speech recognizer. We have therefore constructed a new audio-visual corpus, CENSREC-2-AV, recorded in in-car environments: a subject sitting in the driver's seat of a car uttered Japanese connected digits under various driving conditions, for example, in a tunnel with background music noise, or on an expressway with the window open. CENSREC-2-AV makes it possible to develop and evaluate robust bimodal speech recognition methods in real environments.

Original language: English
Publication status: Published - 2013
Externally published: Yes
Event: 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2013, DSP 2013 - Seoul, Korea, Republic of
Duration: 2013 Sept 29 to 2013 Oct 2

Conference

Conference: 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2013, DSP 2013
Country/Territory: Korea, Republic of
City: Seoul
Period: 13/9/29 to 13/10/2

Keywords

  • An evaluation framework
  • Bimodal speech recognition
  • CENSREC
  • Driving condition
  • Real environment

ASJC Scopus subject areas

  • Automotive Engineering
  • Safety, Risk, Reliability and Quality
