TY - GEN
T1 - Non-native speech recognition using audio style transfer
AU - Radzikowski, Kacper
AU - Forc, Mateusz
AU - Wang, Le
AU - Yoshie, Osamu
AU - Nowak, Robert M.
N1 - Funding Information:
This work was supported by the statutory funds of the Institute of Computer Science of the Warsaw University of Technology.
Publisher Copyright:
© 2019 SPIE.
PY - 2019
Y1 - 2019
N2 - Recently, automatic speech recognition (ASR) systems have achieved increasingly high accuracy rates. However, accuracy drops significantly when the ASR system is used with a non-native speaker of the language to be recognized, mainly because of specific pronunciation and accent features. The limited volume of labeled datasets containing samples of non-native speech makes it difficult to train new ASR systems targeted at non-native speakers. In our research, we attempted to tackle the problem of non-native accent and its influence on the accuracy of ASR systems using the style transfer methodology. We designed a pipeline for modifying the speech produced by a non-native speaker so that it more closely resembles native speech, i.e. a method for accent neutralization. Our methodology can be used as a wrapper for any existing ASR system, which reduces the need to train new speech recognizers adapted for non-native speech. The modification can thus be performed on the fly, before passing the data forward to the speech recognition system itself.
AB - Recently, automatic speech recognition (ASR) systems have achieved increasingly high accuracy rates. However, accuracy drops significantly when the ASR system is used with a non-native speaker of the language to be recognized, mainly because of specific pronunciation and accent features. The limited volume of labeled datasets containing samples of non-native speech makes it difficult to train new ASR systems targeted at non-native speakers. In our research, we attempted to tackle the problem of non-native accent and its influence on the accuracy of ASR systems using the style transfer methodology. We designed a pipeline for modifying the speech produced by a non-native speaker so that it more closely resembles native speech, i.e. a method for accent neutralization. Our methodology can be used as a wrapper for any existing ASR system, which reduces the need to train new speech recognizers adapted for non-native speech. The modification can thus be performed on the fly, before passing the data forward to the speech recognition system itself.
UR - http://www.scopus.com/inward/record.url?scp=85075777625&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075777625&partnerID=8YFLogxK
U2 - 10.1117/12.2536535
DO - 10.1117/12.2536535
M3 - Conference contribution
AN - SCOPUS:85075777625
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019
A2 - Romaniuk, Ryszard S.
A2 - Linczuk, Maciej
PB - SPIE
T2 - Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019
Y2 - 26 May 2019 through 2 June 2019
ER -