TY - GEN
T1 - End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
AU - Takashima, Yuki
AU - Fujita, Yusuke
AU - Watanabe, Shinji
AU - Horiguchi, Shota
AU - Garcia, Paola
AU - Nagamatsu, Kenji
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/1/19
Y1 - 2021/1/19
N2 - In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in the case of overlapping speech. In this paper, to further improve the performance of the EEND system, we propose a novel multitask learning framework that solves speaker diarization and a desired subtask while explicitly considering the task dependency. We optimize speaker diarization conditioned on speech activity and overlap detection that are subtasks of speaker diarization, based on the probabilistic chain rule. Experimental results show that our proposed method can leverage a subtask to effectively model speaker diarization, and outperforms conventional EEND systems in terms of diarization error rate.
KW - chain rule
KW - end-to-end
KW - multitask learning
KW - neural network
KW - speaker diarization
UR - http://www.scopus.com/inward/record.url?scp=85103974334&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103974334&partnerID=8YFLogxK
U2 - 10.1109/SLT48900.2021.9383555
DO - 10.1109/SLT48900.2021.9383555
M3 - Conference contribution
AN - SCOPUS:85103974334
T3 - 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
SP - 849
EP - 856
BT - 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Y2 - 19 January 2021 through 22 January 2021
ER -