TY - GEN
T1 - Accent neutralization for speech recognition of non-native speakers
AU - Radzikowski, Kacper
AU - Forc, Mateusz
AU - Wang, Le
AU - Yoshie, Osamu
AU - Nowak, Robert
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/12/2
Y1 - 2019/12/2
AB - Automatic speech recognition (ASR) systems achieve increasingly high accuracy rates. However, accuracy drops significantly when an ASR system is used with a non-native speaker of the language to be recognized, mainly because of specific pronunciation and accent features. The limited volume of labeled non-native speech datasets makes it difficult to train new ASR systems for non-native speakers. In our research, we tried to tackle this problem and its influence on the accuracy of ASR systems using a style transfer methodology. We designed a pipeline for modifying the speech of a non-native speaker so that it more closely resembles native speech. Our methodology can be used as a wrapper for any existing ASR system, which reduces the need to train new algorithms for non-native speech. The modification can thus be performed before the data is passed to the speech recognition system itself.
KW - Deep learning
KW - Machine learning
KW - Neural network
KW - Non-native speaker
KW - Speech recognition
KW - Style transfer
UR - http://www.scopus.com/inward/record.url?scp=85123043043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123043043&partnerID=8YFLogxK
U2 - 10.1145/3366030.3366083
DO - 10.1145/3366030.3366083
M3 - Conference contribution
AN - SCOPUS:85123043043
T3 - ACM International Conference Proceeding Series
BT - 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Proceedings
A2 - Indrawan-Santiago, Maria
A2 - Pardede, Eric
A2 - Salvadori, Ivan Luiz
A2 - Steinbauer, Matthias
A2 - Khalil, Ismail
A2 - Anderst-Kotsis, Gabriele
PB - Association for Computing Machinery
T2 - 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019
Y2 - 2 December 2019 through 4 December 2019
ER -