TY - GEN
T1 - Multi-angle lipreading using angle classification and angle-specific feature integration
AU - Isobe, Shinnosuke
AU - Tamura, Satoshi
AU - Hayamizu, Satoru
AU - Gotoh, Yuuto
AU - Nose, Masaki
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2021/3/16
Y1 - 2021/3/16
N2 - Recently, visual speech recognition (VSR), also known as lipreading, has been widely researched owing to the development of Deep Learning (DL). Most lipreading research focuses only on frontal face images. However, in real scenes, a lipreading system should correctly recognize spoken content not only from frontal faces but also from side faces. In this paper, we propose a novel lipreading method applicable to faces captured at any angle, using Convolutional Neural Networks (CNNs), one of the key deep-learning techniques. Our method consists of three parts: the view classification part, the feature extraction part, and the integration part. We first apply angle classification to input faces. Based on the results, we then determine the best combination of pre-trained angle-specific feature extraction schemes. Finally, we integrate these features, followed by DL-based lipreading. We evaluated our method using the open OuluVS2 dataset, which includes multi-angle audiovisual data. We confirmed that our approach achieved the best performance among conventional and other DL-based lipreading schemes in the phrase classification task.
AB - Recently, visual speech recognition (VSR), also known as lipreading, has been widely researched owing to the development of Deep Learning (DL). Most lipreading research focuses only on frontal face images. However, in real scenes, a lipreading system should correctly recognize spoken content not only from frontal faces but also from side faces. In this paper, we propose a novel lipreading method applicable to faces captured at any angle, using Convolutional Neural Networks (CNNs), one of the key deep-learning techniques. Our method consists of three parts: the view classification part, the feature extraction part, and the integration part. We first apply angle classification to input faces. Based on the results, we then determine the best combination of pre-trained angle-specific feature extraction schemes. Finally, we integrate these features, followed by DL-based lipreading. We evaluated our method using the open OuluVS2 dataset, which includes multi-angle audiovisual data. We confirmed that our approach achieved the best performance among conventional and other DL-based lipreading schemes in the phrase classification task.
KW - Deep-learning
KW - Multi-angle lipreading
KW - View classification
KW - Visual speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85125016551&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125016551&partnerID=8YFLogxK
U2 - 10.1109/ICCSPA49915.2021.9385743
DO - 10.1109/ICCSPA49915.2021.9385743
M3 - Conference contribution
AN - SCOPUS:85125016551
T3 - ICCSPA 2020 - 4th International Conference on Communications, Signal Processing, and their Applications
BT - ICCSPA 2020 - 4th International Conference on Communications, Signal Processing, and their Applications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Communications, Signal Processing, and their Applications, ICCSPA 2020
Y2 - 16 March 2021 through 18 March 2021
ER -