In this paper, we investigate audio-visual interaction in sparse representations to obtain robust features for audio-visual speech recognition. First, we introduce our system, which uses a sparse representation method for noise-robust audio-visual speech recognition. We then introduce the dictionary matrix used in this paper and consider the construction of the audio-visual dictionary. Finally, we reformulate the audio and visual signals as a group sparse representation problem in a combined feature-space domain, and improve the joint-sparsity feature fusion method using both group sparse representation features and audio sparse representation features. The proposed methods are evaluated on the CENSREC-1-AV database under both audio noise and visual noise. The experimental results demonstrate the effectiveness of the proposed method compared with traditional methods.
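The group sparse representation described above can be sketched as a group-lasso problem: audio and visual feature vectors are coded over dictionaries with aligned atom indices, and coefficient j of the audio code is grouped with coefficient j of the visual code so that the two modalities share one sparsity pattern. The following is a minimal illustrative sketch (all dimensions, the ISTA solver, and the synthetic data are assumptions for illustration, not the paper's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only.
d_a, d_v, n_atoms = 20, 15, 40

# Audio and visual dictionaries with unit-norm columns; atom j of
# each dictionary belongs to the same group, enforcing a shared
# audio-visual sparsity pattern.
D_a = rng.standard_normal((d_a, n_atoms))
D_a /= np.linalg.norm(D_a, axis=0)
D_v = rng.standard_normal((d_v, n_atoms))
D_v /= np.linalg.norm(D_v, axis=0)

# Synthetic test vectors generated from 5 shared atoms.
support = rng.choice(n_atoms, size=5, replace=False)
coef = rng.standard_normal(5)
y_a = D_a[:, support] @ coef
y_v = D_v[:, support] @ coef

def group_sparse_code(y_a, y_v, D_a, D_v, lam=0.05, n_iter=500):
    """Proximal-gradient (ISTA) solver for an l2,1 group lasso in
    which group j pairs audio coefficient j with visual coefficient j."""
    x_a = np.zeros(D_a.shape[1])
    x_v = np.zeros(D_v.shape[1])
    # Step size from the larger Lipschitz constant of the two fits.
    L = max(np.linalg.norm(D_a, 2) ** 2, np.linalg.norm(D_v, 2) ** 2)
    for _ in range(n_iter):
        # Gradient step on each modality's least-squares term.
        x_a -= D_a.T @ (D_a @ x_a - y_a) / L
        x_v -= D_v.T @ (D_v @ x_v - y_v) / L
        # Group soft-threshold: shrink each (x_a[j], x_v[j]) pair.
        norms = np.sqrt(x_a ** 2 + x_v ** 2)
        shrink = np.maximum(1.0 - (lam / L) / np.maximum(norms, 1e-12), 0.0)
        x_a *= shrink
        x_v *= shrink
    return x_a, x_v

x_a, x_v = group_sparse_code(y_a, y_v, D_a, D_v)
# Groups that stay active are shared across the two modalities.
active = np.flatnonzero(np.sqrt(x_a ** 2 + x_v ** 2) > 1e-3)
print(sorted(active))
```

Because the penalty acts on the joint norm of each coefficient pair, an atom is either used by both modalities or suppressed in both, which is the sense in which the fused code is "group sparse."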
|Publication status||Published - 2013|
|Event||2013 International Conference on Auditory-Visual Speech Processing, AVSP 2013 - Annecy, France|
Duration: 29 Aug 2013 → 1 Sep 2013