A DOA based speaker diarization system for real meetings

Shoko Araki*, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

26 Citations (Scopus)

Abstract

This paper presents a speaker diarization system that estimates who spoke when in a meeting. The proposed system combines a noise-robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. Our previous system used the generalized cross correlation method with the phase transform (GCC-PHAT) for DOA estimation. Because GCC-PHAT can estimate only one DOA per frame, it was difficult to handle speaker overlaps. This paper addresses this issue by estimating a DOA at each time-frequency slot (TFDOA), and reports how this improves diarization performance for real meetings and conversations recorded in a room with a reverberation time of 350 ms.
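To make the per-frame limitation concrete, the following is a minimal sketch of textbook GCC-PHAT delay estimation for one frame from a two-microphone pair, followed by a far-field conversion from delay to angle. It is not the authors' implementation: the function names `gcc_phat_delay` and `delay_to_doa`, the two-microphone geometry, the 0.1 m spacing, and the speed of sound of 343 m/s are all assumptions made for illustration. Because the estimate is taken from a single cross-correlation peak, each frame yields only one delay, and therefore one DOA.

```python
import numpy as np


def gcc_phat_delay(x1, x2, fs, max_tau=None):
    """Estimate the delay (in seconds) of x2 relative to x1 for one frame.

    Textbook GCC-PHAT sketch; names and defaults are illustrative assumptions.
    """
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum with PHAT weighting: keep the phase, drop the magnitude.
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift  # single peak: one delay per frame
    return lag / fs


def delay_to_doa(tau, mic_distance, c=343.0):
    """Convert a delay to a far-field arrival angle in degrees (assumed 2-mic model)."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(1024)
    delayed = np.concatenate((np.zeros(3), frame[:-3]))  # 3-sample delay

    tau = gcc_phat_delay(frame, delayed, fs, max_tau=0.001)
    print(tau * fs, delay_to_doa(tau, mic_distance=0.1))
```

When two speakers overlap, the cross-correlation may show two peaks, but taking the single largest one discards the second talker. That is the limitation the abstract attributes to the frame-level approach and the motivation for estimating a DOA per time-frequency slot instead.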

Original language: English
Title of host publication: 2008 Hands-free Speech Communication and Microphone Arrays, Proceedings, HSCMA 2008
Pages: 29-32
Number of pages: 4
Publication status: Published - 2008
Externally published: Yes
Event: 2008 Hands-free Speech Communication and Microphone Arrays, HSCMA 2008 - Trento, Italy
Duration: 2008 May 6 → 2008 May 8

Publication series

Name: 2008 Hands-free Speech Communication and Microphone Arrays, Proceedings, HSCMA 2008

Conference

Conference: 2008 Hands-free Speech Communication and Microphone Arrays, HSCMA 2008
Country/Territory: Italy
City: Trento
Period: 08/5/6 → 08/5/8

Keywords

  • Diarization
  • Direction of arrival
  • Voice activity detector

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Communication
