TY - GEN
T1 - Speaker indexing and speech enhancement in real meetings/conversations
AU - Araki, Shoko
AU - Fujimoto, Masakiyo
AU - Ishizuka, Kentaro
AU - Sawada, Hiroshi
AU - Makino, Shoji
PY - 2008
Y1 - 2008
AB - This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. The proposed speaker indexing is realized by using a noise-robust voice activity detector (VAD), a GCC-PHAT-based direction of arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. This paper applies our system to real meetings and conversations recorded in a room with a reverberation time of 350 ms, and evaluates its performance with a standard measure, the diarization error rate (DER). Even for real conversations, which have many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP 2008.
KW - Diarization
KW - Maximum SNR beamformer
KW - Speaker indexing
KW - Voice activity detector
UR - http://www.scopus.com/inward/record.url?scp=51449113843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51449113843&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2008.4517554
DO - 10.1109/ICASSP.2008.4517554
M3 - Conference contribution
AN - SCOPUS:51449113843
SN - 1424414849
SN - 9781424414840
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 93
EP - 96
BT - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
T2 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Y2 - 31 March 2008 through 4 April 2008
ER -