TY - GEN
T1 - Audio-visual speech recognition using deep bottleneck features and high-performance lipreading
AU - Tamura, Satoshi
AU - Ninomiya, Hiroshi
AU - Kitaoka, Norihide
AU - Osuga, Shin
AU - Iribe, Yurie
AU - Takeda, Kazuya
AU - Hayamizu, Satoru
N1 - Publisher Copyright: © 2015 Asia-Pacific Signal and Information Processing Association.
PY - 2016/2/19
Y1 - 2016/2/19
N2 - This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection (VAD) in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features by deep learning technology. Using the proposed features, we achieved 73.66% lipreading accuracy in a speaker-independent open condition, and about 90% AVSR accuracy on average in noisy environments. In addition, we extracted speech segments from visual features, resulting in 77.80% lipreading accuracy. We find that VAD is useful in both the audio and visual modalities, for better lipreading and AVSR.
AB - This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection (VAD) in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features by deep learning technology. Using the proposed features, we achieved 73.66% lipreading accuracy in a speaker-independent open condition, and about 90% AVSR accuracy on average in noisy environments. In addition, we extracted speech segments from visual features, resulting in 77.80% lipreading accuracy. We find that VAD is useful in both the audio and visual modalities, for better lipreading and AVSR.
UR - http://www.scopus.com/inward/record.url?scp=84986214282&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986214282&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2015.7415335
DO - 10.1109/APSIPA.2015.7415335
M3 - Conference contribution
AN - SCOPUS:84986214282
T3 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
SP - 575
EP - 582
BT - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Y2 - 16 December 2015 through 19 December 2015
ER -