Abstract
In this paper, we propose a multi-modal voice activity detection (VAD) system that uses both audio and visual information. In multi-modal speech signal processing, there are two common strategies for fusing audio and visual information: concatenating the audio and visual features (feature fusion), and employing separate audio-only and visual-only classifiers and then fusing their unimodal decisions (decision fusion). We investigate the effectiveness of decision fusion with weights given by AdaBoost. AdaBoost is a machine learning method that constructs an effective classifier by combining weak classifiers; it classifies input data into two classes based on the weighted outputs of the weak classifiers. In the proposed method, this fusion scheme is applied to the decision fusion of multi-modal VAD. Experimental results show the proposed method to be generally more effective.
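The AdaBoost-weighted decision fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the audio-only and visual-only VAD classifiers are treated as fixed weak classifiers emitting ±1 decisions (speech = +1, non-speech = −1), and that AdaBoost is used only to learn their fusion weights on labelled training data. All function and variable names are hypothetical.

```python
import math

def adaboost_weights(weak_outputs, labels):
    """Learn AdaBoost weights (alphas) for a fixed set of weak classifiers.

    weak_outputs[t][i] is the +/-1 decision of weak classifier t
    (e.g. the audio-only or visual-only VAD) on training sample i;
    labels[i] is the true +/-1 label (speech = +1, non-speech = -1).
    """
    n = len(labels)
    w = [1.0 / n] * n  # sample weights, initially uniform
    alphas = []
    for outputs in weak_outputs:
        # weighted error of this weak classifier on the current weights
        eps = sum(wi for wi, o, y in zip(w, outputs, labels) if o != y)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        alphas.append(alpha)
        # re-weight samples, emphasising those this classifier got wrong
        w = [wi * math.exp(-alpha * y * o)
             for wi, o, y in zip(w, outputs, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return alphas

def fuse(alphas, decisions):
    """Final two-class decision: sign of the alpha-weighted vote."""
    score = sum(a * d for a, d in zip(alphas, decisions))
    return 1 if score >= 0 else -1
```

At test time, each frame's audio-only and visual-only decisions are combined by `fuse`, so a more reliable modality (larger alpha) dominates the speech/non-speech decision.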
Original language | English |
---|---|
Publication status | Published - 2010 |
Externally published | Yes |
Event | 2010 International Conference on Auditory-Visual Speech Processing, AVSP 2010 - Hakone, Japan. Duration: 2010 Sept 30 → 2010 Oct 3 |
Conference
Conference | 2010 International Conference on Auditory-Visual Speech Processing, AVSP 2010 |
---|---|
Country/Territory | Japan |
City | Hakone |
Period | 2010 Sept 30 → 2010 Oct 3 |
Keywords
- VAD
- multi-modal
- voice activity detection
ASJC Scopus subject areas
- Language and Linguistics
- Speech and Hearing
- Otorhinolaryngology