Improved binaural sound localization and tracking for unknown time-varying number of speakers

Ui Hyun Kim*, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

9 Citations (Scopus)


A method based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) has been developed for binaural sound source localization (SSL) and tracking of multiple sound sources. Accurate binaural audition is important for applying inexpensive and widely applicable auditory capabilities to robots and systems. Conventional SSL based on the GCC-PHAT method is degraded by low resolution of the time difference of arrival estimation, by the interference created when the sound waves arrive at a microphone from two directions around the robot head, and by impaired performance when there are multiple speakers. The low-resolution problem is solved by using a maximum-likelihood-based SSL method in the frequency domain. The multipath interference problem is avoided by incorporating a new time delay factor into the GCC-PHAT method with assuming a spherical robot head. The performance when there are multiple speakers was improved by using a multisource speech tracking method consisting of voice activity detection (VAD) and K-means clustering. The standard K-means clustering algorithm was extended to enable tracking of an unknown time-varying number of speakers by adding two additional steps that increase the number of clusters automatically and eliminate clusters containing incorrect direction estimations. Experiments conducted on the SIG-2 humanoid robot show that this method outperforms the conventional SSL method; it reduces localization errors by 18.1° on average and by over 37° in the side directions. It also tracks multiple speakers in real time with tracking errors below 4.35°.

Original languageEnglish
Pages (from-to)1161-1173
Number of pages13
JournalAdvanced Robotics
Issue number15
Publication statusPublished - 2013 Jul
Externally publishedYes


  • Binaural sound localization
  • Human-robot interaction
  • Multisource sound tracking

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Human-Computer Interaction
  • Computer Science Applications
  • Hardware and Architecture
  • Software


Dive into the research topics of 'Improved binaural sound localization and tracking for unknown time-varying number of speakers'. Together they form a unique fingerprint.

Cite this