Audio style transfer in non-native speech recognition

Kacper Pawel Radzikowski*

*Corresponding author for this work

Research output: Conference contribution

Abstract

Current automatic speech recognition (ASR) systems achieve accuracy of over 90-95%, depending on the methodology applied and the datasets used. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of that speaker's specific pronunciation features. At the same time, labeled datasets of non-native speech are extremely limited, both in size and in the number of languages they cover, which makes it difficult to train sufficiently accurate ASR systems for non-native speakers. A different method is therefore needed. In this paper, we propose an alternative approach to the problem that employs so-called style transfer. Style transfer, which until now has been used mainly in the image domain, could help address the problem of non-native speech. A further advantage is that a style transfer algorithm could be compatible with existing ASR systems, so it would not be necessary to train new systems, which can be difficult and time-consuming.

Original language: English
Host publication title: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018
Editors: Ryszard S. Romaniuk, Maciej Linczuk
Publisher: SPIE
ISBN (Electronic): 9781510622036
DOI
Publication status: Published - 2018 Jan 1
Externally published: Yes
Event: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018 - Wilga, Poland
Duration: 2018 Jun 3 - 2018 Jun 10

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 10808
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018
Country/Territory: Poland
City: Wilga
Period: 18/6/3 - 18/6/10

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
