Abstract
In this paper, we investigate audio-visual interaction in sparse representations to obtain robust features for audio-visual speech recognition. First, we introduce our system, which uses a sparse representation method for noise-robust audio-visual speech recognition. Then, we describe the dictionary matrix used in this paper and consider the construction of the audio-visual dictionary. Finally, we reformulate the audio and visual signals as a group sparse representation problem in a combined feature-space domain, and improve the joint-sparsity feature fusion method by combining the group sparse representation features with audio sparse representation features. The proposed methods are evaluated on the CENSREC-1-AV database with both audio noise and visual noise. The experimental results show the effectiveness of the proposed methods compared with traditional methods.
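The core step the abstract describes — representing an observed feature vector as a sparse combination of atoms from a dictionary — can be sketched as follows. This is a minimal illustration only: the greedy Orthogonal Matching Pursuit solver and the random concatenated "audio-visual" dictionary below are assumptions for demonstration, not the paper's group-sparsity formulation or its actual CENSREC-1-AV dictionaries.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily approximate y as a k-sparse
    combination of dictionary atoms (the columns of D)."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares refit of the coefficients on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

rng = np.random.default_rng(0)
# hypothetical concatenated audio-visual dictionary:
# 20-dimensional stacked features, 40 unit-norm atoms
D = rng.standard_normal((20, 40))
D /= np.linalg.norm(D, axis=0)
# synthesize a 3-sparse observation and recover its coefficients
true_x = np.zeros(40)
true_x[[3, 10, 25]] = [1.0, -0.5, 0.8]
y = D @ true_x
x_hat = omp(D, y, k=3)
```

In a feature-fusion setting of this kind, each dictionary column would stack an audio feature vector with its corresponding visual feature vector, so the recovered sparse coefficients `x_hat` jointly explain both modalities.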
| Original language | English |
| --- | --- |
| Pages | 43-48 |
| Number of pages | 6 |
| Publication status | Published - 2013 |
| Externally published | Yes |
| Event | 2013 International Conference on Auditory-Visual Speech Processing, AVSP 2013 - Annecy, France. Duration: 2013 Aug 29 → 2013 Sept 1 |
Conference
| Conference | 2013 International Conference on Auditory-Visual Speech Processing, AVSP 2013 |
| --- | --- |
| Country/Territory | France |
| City | Annecy |
| Period | 2013 Aug 29 → 2013 Sept 1 |
Keywords
- audio-visual speech recognition
- feature fusion
- joint sparsity model
- noise reduction
- sparse representation
ASJC Scopus subject areas
- Language and Linguistics
- Speech and Hearing
- Otorhinolaryngology