Abstract
This paper presents a multisource sound localization method based on generalized cross-correlation (GCC) weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with the K-means clustering algorithm, for binaural robot audition. The standard K-means clustering algorithm is extended with two additional steps for multisource speech tracking. Experiments conducted on the SIG-2 humanoid robot in a real environment show that the method can track multiple speakers in real time with a tracking error below 4.35°.
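The GCC-PHAT step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gcc_phat` and `delay_to_azimuth`, the zero-padding length, the small regularization constant, and the free-field far-field conversion from delay to azimuth are all assumptions made for illustration. A real robot head diffracts sound, so the simple arcsine model is only a rough approximation.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags (at least one sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs

def delay_to_azimuth(tau, mic_distance, c=343.0):
    """Convert a delay to an azimuth in degrees.

    Assumes a free-field, far-field model with two microphones `mic_distance`
    metres apart and speed of sound `c` in m/s (an assumption, not the paper's model).
    """
    ratio = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

In a tracking pipeline of the kind the abstract describes, per-frame azimuth estimates like these would be kept only for frames that a voice activity detector marks as speech, and then grouped by a clustering step. The two additional K-means steps mentioned in the abstract are not specified here, so they are not sketched.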
Original language | English |
---|---|
Title of host publication | International Workshop on Image Analysis for Multimedia Interactive Services |
Publication status | Published - 2013 |
Externally published | Yes |
Event | 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2013, Paris; Duration: 2013 Jul 3 → 2013 Jul 5 |
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design
- Human-Computer Interaction
- Software