TY - GEN
T1 - ConferencingSpeech Challenge
T2 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
AU - Rao, Wei
AU - Fu, Yihui
AU - Hu, Yanxin
AU - Xu, Xin
AU - Jv, Yvkai
AU - Han, Jiangyu
AU - Jiang, Zhongjie
AU - Xie, Lei
AU - Wang, Yannan
AU - Watanabe, Shinji
AU - Tan, Zheng-Hua
AU - Bu, Hui
AU - Yu, Tao
AU - Shang, Shidong
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with a single microphone array, focusing on practical applications with a real-time requirement, and 2) Task 2 is multi-channel speech enhancement with multiple distributed microphone arrays, a non-real-time track without constraints, so that participants can explore any algorithm to obtain high speech quality. Targeting real video conferencing room applications, the challenge database was recorded from real speakers, and all recording facilities were placed according to a real conference room setup. In this challenge, we open-sourced a list of open-source clean speech and noise datasets, simulation scripts, and a baseline system for participants to develop their own systems. The final ranking of the challenge is decided by subjective evaluation, performed using Absolute Category Ratings (ACR) to estimate the Mean Opinion Score (MOS), speech MOS (S-MOS), and noise MOS (N-MOS). This paper describes the challenge, tasks, datasets, subjective evaluation, and challenge results. The baseline system, a complex-ratio-mask based neural network, and its experimental results are also presented.
AB - The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with a single microphone array, focusing on practical applications with a real-time requirement, and 2) Task 2 is multi-channel speech enhancement with multiple distributed microphone arrays, a non-real-time track without constraints, so that participants can explore any algorithm to obtain high speech quality. Targeting real video conferencing room applications, the challenge database was recorded from real speakers, and all recording facilities were placed according to a real conference room setup. In this challenge, we open-sourced a list of open-source clean speech and noise datasets, simulation scripts, and a baseline system for participants to develop their own systems. The final ranking of the challenge is decided by subjective evaluation, performed using Absolute Category Ratings (ACR) to estimate the Mean Opinion Score (MOS), speech MOS (S-MOS), and noise MOS (N-MOS). This paper describes the challenge, tasks, datasets, subjective evaluation, and challenge results. The baseline system, a complex-ratio-mask based neural network, and its experimental results are also presented.
KW - causal system
KW - ConferencingSpeech challenge
KW - multi-channel speech enhancement
KW - multiple distributed microphone arrays
KW - subjective evaluation
UR - http://www.scopus.com/inward/record.url?scp=85126797792&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126797792&partnerID=8YFLogxK
U2 - 10.1109/ASRU51503.2021.9688126
DO - 10.1109/ASRU51503.2021.9688126
M3 - Conference contribution
AN - SCOPUS:85126797792
T3 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
SP - 679
EP - 686
BT - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 December 2021 through 17 December 2021
ER -