Abstract
This paper introduces forthcoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. Bimodal speech recognition, which uses both acoustic and visual information, has attracted particular attention over the past decade as a way to make speech recognition robust in noisy environments. Because many methods and techniques for bimodal speech recognition have been proposed, a common evaluation framework, including audio-visual speech data and a baseline system, is needed to assess and compare these techniques and recognition schemes. Two audio-visual evaluation frameworks, CENSREC-1-AV and CENSREC-2-AV, are being built by the CENSREC project in Japan: CENSREC-1-AV includes waveforms and image sequences with artificially added noise, whereas CENSREC-2-AV consists of audio-visual data recorded in in-car environments. A baseline method and its recognition results will also be provided with these corpora.
Original language | English |
---|---|
Pages | 51-54 |
Number of pages | 4 |
Publication status | Published - 2008 |
Externally published | Yes |
Event | 2008 International Conference on Auditory-Visual Speech Processing, AVSP 2008 - Moreton Island, Australia. Duration: 2008 Sept 26 → 2008 Sept 29 |
Conference
Conference | 2008 International Conference on Auditory-Visual Speech Processing, AVSP 2008 |
---|---|
Country/Territory | Australia |
City | Moreton Island |
Period | 08/9/26 → 08/9/29 |
Keywords
- Audio-visual speech corpus
- Bimodal speech recognition
- Evaluation framework
- Noisy environments
ASJC Scopus subject areas
- Language and Linguistics
- Speech and Hearing
- Otorhinolaryngology