Audio-visual processing toward robust speech recognition in cars

Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper reports our recent efforts to develop robust speech recognition for cars. Speech recognition is expected to control many in-car devices. However, in-car environments contain many kinds of acoustic noise, e.g. engine noise and car stereo sound, which degrade recognition performance. To overcome this degradation, we develop a high-performance audio-visual speech recognition method. Lip images are extracted from captured face images using our face detection scheme. Basic visual features are computed and then converted into visual features for speech recognition using a deep neural network. Audio features are extracted as well, and the audio and visual features are then concatenated into audio-visual features. As the recognition model, we employ a multi-stream hidden Markov model, which can adjust the contributions of the audio and visual modalities. We evaluated the proposed method on the audio-visual corpus CENSREC-1-AV. To simulate driving conditions, we prepared driving and music noises. Experimental results show that our method can significantly improve recognition performance in in-car conditions.
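The fusion steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes frame-synchronous streams, one diagonal-covariance Gaussian per stream and state, and stream weights that sum to one (a common convention for multi-stream HMMs). The function names and the state dictionary layout are hypothetical.

```python
import math


def concat_features(audio_frames, visual_frames):
    """Concatenate audio and visual feature vectors frame by frame."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_loglik(audio_x, visual_x, state, audio_weight):
    """Stream-weighted state log-likelihood:
    audio_weight * log b_a(o_a) + (1 - audio_weight) * log b_v(o_v).

    Assumption: the two weights sum to one, so a single value sets the
    relative contribution of the audio and visual modalities.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    log_audio = diag_gauss_loglik(audio_x, state["audio_mean"], state["audio_var"])
    log_visual = diag_gauss_loglik(visual_x, state["visual_mean"], state["visual_var"])
    return audio_weight * log_audio + (1.0 - audio_weight) * log_visual


# Hypothetical toy state with 2-dim audio and 1-dim visual features.
state = {
    "audio_mean": [0.0, 0.0], "audio_var": [1.0, 1.0],
    "visual_mean": [1.0], "visual_var": [0.5],
}
av_frames = concat_features([[0.1, -0.2]], [[0.9]])
score = multistream_loglik([0.1, -0.2], [0.9], state, audio_weight=0.3)
```

In this sketch, lowering `audio_weight` as acoustic noise increases shifts the score toward the visual stream, which is the kind of adjustment the abstract attributes to the multi-stream model.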

Original language: English
Title of host publication: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015
Publisher: University of Texas at Dallas
Pages: 31-34
Number of pages: 4
ISBN (Electronic): 9781510827844
Publication status: Published - 2015
Externally published: Yes
Event: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015 - Berkeley, United States
Duration: 2015 Oct 14 → 2015 Oct 16

Publication series

Name: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015

Other

Other: 7th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems and Safety 2015
Country/Territory: United States
City: Berkeley
Period: 15/10/14 → 15/10/16

Keywords

  • Audio-visual speech recognition
  • Deep neural network
  • In-car speech technology
  • Multi-stream hidden Markov model

ASJC Scopus subject areas

  • Signal Processing
  • Automotive Engineering
  • Control and Systems Engineering
