TY - GEN
T1 - Speaker Diarization with Region Proposal Network
AU - Huang, Zili
AU - Watanabe, Shinji
AU - Fujita, Yusuke
AU - Garcia, Paola
AU - Shao, Yiwen
AU - Povey, Daniel
AU - Khudanpur, Sanjeev
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem. Although standard diarization systems can achieve satisfactory results in various scenarios, they are composed of several independently optimized modules and cannot handle overlapped speech. In this paper, we propose a novel speaker diarization method: Region Proposal Network based Speaker Diarization (RPNSD). In this method, a neural network generates overlapped speech segment proposals and computes their speaker embeddings at the same time. Compared with standard diarization systems, RPNSD has a shorter pipeline and can handle overlapped speech. Experimental results on three diarization datasets show that RPNSD achieves remarkable improvements over the state-of-the-art x-vector baseline.
AB - Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem. Although standard diarization systems can achieve satisfactory results in various scenarios, they are composed of several independently optimized modules and cannot handle overlapped speech. In this paper, we propose a novel speaker diarization method: Region Proposal Network based Speaker Diarization (RPNSD). In this method, a neural network generates overlapped speech segment proposals and computes their speaker embeddings at the same time. Compared with standard diarization systems, RPNSD has a shorter pipeline and can handle overlapped speech. Experimental results on three diarization datasets show that RPNSD achieves remarkable improvements over the state-of-the-art x-vector baseline.
KW - Faster R-CNN
KW - end-to-end
KW - neural network
KW - region proposal network
KW - speaker diarization
UR - http://www.scopus.com/inward/record.url?scp=85089220828&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089220828&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053760
DO - 10.1109/ICASSP40776.2020.9053760
M3 - Conference contribution
AN - SCOPUS:85089220828
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6514
EP - 6518
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -