Evaluation of Real-time Audio-Visual Speech Recognition

Peng Shen, Satoshi Tamura, Satoru Hayamizu

Research output: Paper, peer-reviewed

8 citations (Scopus)

Abstract

In this paper, we propose and develop a real-time audio-visual automatic continuous speech recognition system. The system uses live speech signals and facial images collected from a microphone and a camera. Optical-flow-based features are used as visual features, and voice activity detection (VAD) and lip tracking are employed to improve recognition accuracy. Several experiments are conducted using Japanese connected-digit speech contaminated with white noise, music, television news, and car engine noise. Experimental results show that when the user is listening to news, or is in a moving car with the windows open, the recognition accuracy of the proposed system is insufficient. The accuracy of the proposed system is high in a place with light music or in a moving car with the windows closed.
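The abstract names optical-flow-based visual features as the system's lip-motion representation. As a rough illustration only (not the authors' implementation, whose exact feature set and parameters are not given here), the sketch below estimates a single Lucas-Kanade motion vector between two grayscale mouth-region frames; a real front end would compute flow per frame pair over the tracked lip region and derive feature statistics from it.

```python
import numpy as np

def lucas_kanade_flow(prev, curr):
    """Estimate one global (u, v) motion vector between two grayscale
    frames via the Lucas-Kanade least-squares formulation."""
    # Spatial gradients of the previous frame, temporal difference between frames.
    Ix = np.gradient(prev, axis=1)
    Iy = np.gradient(prev, axis=0)
    It = curr - prev
    # Normal equations: [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] [u v]^T = -[sum IxIt, sum IyIt]^T
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    u, v = np.linalg.solve(A, b)
    return u, v

# Synthetic check: a smooth blob shifted one pixel to the right
# should yield a flow vector near (1, 0).
x = np.linspace(-3, 3, 64)
X, Y = np.meshgrid(x, x)
prev = np.exp(-(X**2 + Y**2))
curr = np.roll(prev, 1, axis=1)  # translate right by one pixel
u, v = lucas_kanade_flow(prev, curr)
```

In an audio-visual front end, per-frame statistics of such flow vectors (e.g. magnitude over the mouth region) would then be concatenated or fused with the acoustic features before decoding.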

Original language: English
Publication status: Published - 2010
Externally published: Yes
Event: 2010 International Conference on Auditory-Visual Speech Processing, AVSP 2010 - Hakone, Japan
Duration: 30 Sep 2010 - 3 Oct 2010

Conference

Conference: 2010 International Conference on Auditory-Visual Speech Processing, AVSP 2010
Country/Territory: Japan
City: Hakone
Period: 30/9/10 - 3/10/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Otorhinolaryngology
