TY - GEN
T1 - Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity
AU - Wu, Peter
AU - Shi, Jiatong
AU - Zhong, Yifan
AU - Watanabe, Shinji
AU - Black, Alan W.
N1 - Funding Information:
We thank Sai Krishna Rallabandi and our listening test participants for helping us collect MOS values for our Indic TTS experiments. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [58], which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system [59], which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC).
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
AB - Speech processing systems currently do not support the vast majority of languages, in part due to the lack of data in low-resource languages. Cross-lingual transfer offers a compelling way to help bridge this digital divide by incorporating high-resource data into low-resource systems. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages. However, scaling up speech systems to support hundreds of low-resource languages remains unsolved. To help bridge this gap, we propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages. We demonstrate the effectiveness of our approach in language family classification, speech recognition, and speech synthesis tasks.
KW - ASR
KW - TTS
KW - cross-lingual
KW - zero-shot
UR - http://www.scopus.com/inward/record.url?scp=85126768077&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126768077&partnerID=8YFLogxK
DO - 10.1109/ASRU51503.2021.9688276
M3 - Conference contribution
AN - SCOPUS:85126768077
T3 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
SP - 1050
EP - 1057
BT - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
Y2 - 13 December 2021 through 17 December 2021
ER -