Improved binaural sound localization and tracking for unknown time-varying number of speakers

Ui Hyun Kim*, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Article, peer-reviewed

9 Citations (Scopus)

Abstract

A method based on generalized cross-correlation (GCC) weighted by the phase transform (PHAT) has been developed for binaural sound source localization (SSL) and tracking of multiple sound sources. Accurate binaural audition is important for applying inexpensive and widely applicable auditory capabilities to robots and systems. Conventional SSL based on the GCC-PHAT method is degraded by the low resolution of the time difference of arrival (TDOA) estimation, by the interference created when sound waves arrive at a microphone from two directions around the robot head, and by impaired performance when there are multiple speakers. The low-resolution problem is solved by using a maximum-likelihood-based SSL method in the frequency domain. The multipath interference problem is avoided by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The performance with multiple speakers is improved by using a multisource speech tracking method consisting of voice activity detection (VAD) and K-means clustering. The standard K-means clustering algorithm is extended to track an unknown, time-varying number of speakers by adding two steps that automatically increase the number of clusters and eliminate clusters containing incorrect direction estimates. Experiments conducted on the SIG-2 humanoid robot show that this method outperforms the conventional SSL method; it reduces localization errors by 18.1° on average and by over 37° in the side directions. It also tracks multiple speakers in real time with tracking errors below 4.35°.
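For readers unfamiliar with the baseline, the conventional GCC-PHAT step that the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' method: it omits the maximum-likelihood refinement, the spherical-head time delay factor, VAD, and the extended K-means tracking. The function names `gcc_phat_delay` and `free_field_azimuth`, the microphone spacing of 0.18 m, and the free-field far-field conversion from delay to angle are all assumptions made here for illustration.

```python
import numpy as np


def gcc_phat_delay(x_left, x_right, fs, max_delay=None):
    """Estimate the delay of x_right relative to x_left, in seconds.

    A positive result means the right-channel signal lags the left one.
    Hypothetical helper for illustration only.
    """
    n = len(x_left) + len(x_right)          # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x_left, n=n)
    Y = np.fft.rfft(x_right, n=n)
    cross = Y * np.conj(X)                  # cross-power spectrum
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)           # generalized cross-correlation

    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(round(max_delay * fs)), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


def free_field_azimuth(delay, mic_distance=0.18, speed_of_sound=343.0):
    """Convert a delay to an azimuth in degrees.

    Assumes a free-field, far-field model (no head between the microphones),
    which is simpler than the spherical-head model described in the abstract.
    """
    s = np.clip(speed_of_sound * delay / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

Because the lag is found on the sample grid, the delay resolution of this sketch is limited to one sample period (1/fs), which is the kind of low-resolution limitation the abstract attributes to the conventional approach.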

Original language: English
Pages (from-to): 1161-1173
Number of pages: 13
Journal: Advanced Robotics
Volume: 27
Issue number: 15
DOI
Publication status: Published - Jul 2013
Externally published: Yes

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Human-Computer Interaction
  • Computer Science Applications
  • Hardware and Architecture
  • Software
