TY - GEN
T1 - Audio style transfer in non-native speech recognition
AU - Radzikowski, Kacper Pawel
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Current automatic speech recognition (ASR) systems achieve over 90-95% accuracy, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of specific pronunciation features. At the same time, labeled datasets of non-native speech are extremely limited, both in size and in the number of languages covered, which makes it difficult to train sufficiently accurate ASR systems targeted at non-native speakers. A different method is therefore needed. In this paper, we propose an alternative approach to the problem that employs so-called style transfer. Style transfer, which until now has been used mainly in the graphics domain, could help solve the problem of non-native speech. Another advantage is that a style transfer algorithm could be compatible with existing ASR systems, so there would be no need to train new systems, which can be difficult and time-consuming.
AB - Current automatic speech recognition (ASR) systems achieve over 90-95% accuracy, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of specific pronunciation features. At the same time, labeled datasets of non-native speech are extremely limited, both in size and in the number of languages covered, which makes it difficult to train sufficiently accurate ASR systems targeted at non-native speakers. A different method is therefore needed. In this paper, we propose an alternative approach to the problem that employs so-called style transfer. Style transfer, which until now has been used mainly in the graphics domain, could help solve the problem of non-native speech. Another advantage is that a style transfer algorithm could be compatible with existing ASR systems, so there would be no need to train new systems, which can be difficult and time-consuming.
KW - Artificial intelligence
KW - Deep learning
KW - Machine learning
KW - Non-native speaker
KW - Speech recognition
KW - Style transfer
UR - http://www.scopus.com/inward/record.url?scp=85056259045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056259045&partnerID=8YFLogxK
U2 - 10.1117/12.2501495
DO - 10.1117/12.2501495
M3 - Conference contribution
AN - SCOPUS:85056259045
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018
A2 - Romaniuk, Ryszard S.
A2 - Linczuk, Maciej
PB - SPIE
T2 - Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018
Y2 - 3 June 2018 through 10 June 2018
ER -