Speaker Diarization with Region Proposal Network

Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

研究成果: Conference contribution

39 被引用数 (Scopus)

抄録

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem. Although the standard diarization systems can achieve satisfactory results in various scenarios, they are composed of several independently-optimized modules and cannot deal with the overlapped speech. In this paper, we propose a novel speaker diarization method: Region Proposal Network based Speaker Diarization (RPNSD). In this method, a neural network generates overlapped speech segment proposals, and compute their speaker embeddings at the same time. Compared with standard diarization systems, RPNSD has a shorter pipeline and can handle the overlapped speech. Experimental results on three diarization datasets reveal that RPNSD achieves remarkable improvements over the state-of-the-art x-vector baseline.

本文言語English
ホスト出版物のタイトル2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ページ6514-6518
ページ数5
ISBN(電子版)9781509066315
DOI
出版ステータスPublished - 2020 5月
外部発表はい
イベント2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
継続期間: 2020 5月 42020 5月 8

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2020-May
ISSN(印刷版)1520-6149

Conference

Conference2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
国/地域Spain
CityBarcelona
Period20/5/420/5/8

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「Speaker Diarization with Region Proposal Network」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル