Abstract
This paper investigates a lipreading scheme that employs optical and depth modalities together with deep bottleneck features. Optical and depth data are captured with a Microsoft Kinect v2, and an appearance-based feature set is computed for each modality. Each basic feature set is then converted into a deep bottleneck feature using a deep neural network with a bottleneck layer. Multi-stream hidden Markov models are used for recognition. We evaluated the method on our connected-digit corpus, comparing it with our previous method, and found that deep bottleneck features improve lipreading performance.
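For readers unfamiliar with deep bottleneck features, the sketch below illustrates the general idea only; the layer sizes, activation functions, and frame labels are assumptions for illustration, not the architecture reported in the paper. A feed-forward network with a narrow bottleneck layer is trained as a frame-level classifier, and after training the bottleneck activations are taken as the new feature for each modality's stream.

```python
# Minimal sketch of deep bottleneck feature (DBNF) extraction.
# All dimensions and the classification target are illustrative assumptions.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    def __init__(self, in_dim=40, hidden_dim=512, bottleneck_dim=30, n_classes=100):
        super().__init__()
        # Wide front layers map the basic appearance-based feature upward.
        self.front = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        # Narrow bottleneck layer whose activations become the DBNF.
        self.bottleneck = nn.Linear(hidden_dim, bottleneck_dim)
        # Classifier head used only during training (e.g., frame labels).
        self.back = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(bottleneck_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, n_classes),
        )

    def forward(self, x):
        # Full network output, used with cross-entropy loss during training.
        return self.back(self.bottleneck(self.front(x)))

    def extract(self, x):
        # Deep bottleneck feature: activations of the bottleneck layer.
        with torch.no_grad():
            return self.bottleneck(self.front(x))

# Usage sketch: after training, convert each frame's basic feature vector
# into a DBNF; the optical and depth DBNF streams are then modeled jointly
# by a multi-stream HMM.
model = BottleneckDNN()
frames = torch.randn(8, 40)      # 8 frames of a 40-dim basic feature (made-up data)
dbnf = model.extract(frames)     # shape (8, 30): deep bottleneck features
```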
Original language | English |
---|---|
Pages | 76-77 |
Number of pages | 2 |
Publication status | Published - 2017 |
Externally published | Yes |
Event | 14th International Conference on Auditory-Visual Speech Processing, AVSP 2017 - Stockholm, Sweden; Duration: 2017 Aug 25 → 2017 Aug 26 |
Conference
Conference | 14th International Conference on Auditory-Visual Speech Processing, AVSP 2017 |
---|---|
Country/Territory | Sweden |
City | Stockholm |
Period | 2017 Aug 25 → 2017 Aug 26 |
Keywords
- deep bottleneck feature
- depth information
- lipreading
- multi-stream HMM
ASJC Scopus subject areas
- Language and Linguistics
- Otorhinolaryngology
- Speech and Hearing