Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

Peng Shen, Satoshi Tamura, Satoru Hayamizu

Research output: Contribution to conference › Paper › peer-review

1 Citation (Scopus)

Abstract

In this paper, we investigate audio-visual interaction in sparse representation to obtain robust features for audio-visual speech recognition. First, we introduce our system, which uses a sparse representation method for noise-robust audio-visual speech recognition. We then introduce the dictionary matrix used in this paper and consider the construction of the audio-visual dictionary. Finally, we reformulate the audio and visual signals as a group sparse representation problem in a combined feature-space domain, and improve the joint sparsity feature fusion method using the group sparse representation features together with audio sparse representation features. The proposed methods are evaluated on the CENSREC-1-AV database with both audio and visual noise. The experimental results show the effectiveness of the proposed methods compared with traditional methods.
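As a rough illustration of the kind of sparse coding the abstract describes, the sketch below builds a joint audio-visual dictionary whose atoms stack an audio part and a visual part, then recovers a single shared sparse code with orthogonal matching pursuit. All dimensions, the synthetic dictionary, and the choice of OMP as the solver are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# Hypothetical dimensions: audio features (e.g. MFCC-like) and visual
# features (e.g. lip-region features) stacked into one combined vector.
n_audio, n_visual, n_atoms = 39, 30, 100

# Joint audio-visual dictionary: each atom stacks an audio part and a
# visual part, so a single sparse code must explain both modalities.
D = rng.standard_normal((n_audio + n_visual, n_atoms))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms

# A noisy observation synthesised from three dictionary atoms.
true_code = np.zeros(n_atoms)
true_code[[3, 17, 42]] = [1.0, -0.8, 0.5]
y = D @ true_code + 0.01 * rng.standard_normal(n_audio + n_visual)

# Greedy sparse coding over the joint dictionary; the recovered code can
# then serve as a fused audio-visual feature for recognition.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3)
omp.fit(D, y)
code = omp.coef_
print("support:", np.flatnonzero(code))
```

Because the two modalities share one code vector, noise in one stream can be compensated by atoms that still match the cleaner stream, which is the intuition behind the joint-sparsity fusion described above.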

Original language: English
Pages: 43-48
Number of pages: 6
Publication status: Published - 2013
Externally published: Yes
Event: 2013 International Conference on Auditory-Visual Speech Processing, AVSP 2013 - Annecy, France
Duration: 29 Aug 2013 – 1 Sep 2013

Conference

Conference: 2013 International Conference on Auditory-Visual Speech Processing, AVSP 2013
Country/Territory: France
City: Annecy
Period: 29/8/13 – 1/9/13

Keywords

  • audio-visual speech recognition
  • feature fusion
  • joint sparsity model
  • noise reduction
  • sparse representation

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Otorhinolaryngology
