TY - CONF
T1 - Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition
AU - Shen, Peng
AU - Tamura, Satoshi
AU - Hayamizu, Satoru
N1 - Funding Information:
Part of this work was supported by a JSPS KAKENHI Grant (Grant-in-Aid for Young Scientists (B)), No. 25730109.
Publisher Copyright:
© Auditory-Visual Speech Processing 2013, AVSP 2013. All rights reserved.
PY - 2013
Y1 - 2013
N2 - In this paper, we investigate audio-visual interaction in sparse representation to obtain robust features for audio-visual speech recognition. First, we introduce our system, which uses a sparse representation method for noise-robust audio-visual speech recognition. Then, we introduce the dictionary matrix used in this paper and consider the construction of an audio-visual dictionary. Finally, we reformulate audio and visual signals as a group sparse representation problem in a combined feature-space domain, and then improve the joint sparsity feature fusion method with the group sparse representation features and audio sparse representation features. The proposed methods are evaluated on the CENSREC-1-AV database with both audio noise and visual noise. The experimental results show the effectiveness of our proposed method compared with traditional methods.
KW - audio-visual speech recognition
KW - feature fusion
KW - joint sparsity model
KW - noise reduction
KW - sparse representation
UR - http://www.scopus.com/inward/record.url?scp=85133387626&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133387626&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85133387626
SP - 43
EP - 48
T2 - 2013 International Conference on Auditory-Visual Speech Processing, AVSP 2013
Y2 - 29 August 2013 through 1 September 2013
ER -