Support software for Automatic Speech Recognition systems targeted for non-native speech

Kacper Radzikowski*, Osamu Yoshie, Robert Nowak

*Corresponding author for this work

Research output: Conference contribution

Abstract

Automatic speech recognition (ASR) systems now achieve increasingly high accuracy, depending on the methodology applied and the datasets used. Accuracy drops significantly, however, when the system is used by a non-native speaker of the language being recognized. The main reason is that features of the speaker's mother tongue carry over into their pronunciation and accent. At the same time, labeled non-native speech datasets are extremely limited in volume, which makes it difficult to train sufficiently accurate ASR systems for non-native speakers from the ground up. In this research we address this problem and its influence on the accuracy of ASR systems using style transfer methodology. We designed a pipeline that modifies the speech of a non-native speaker so that it more closely resembles native speech. This paper covers accent modification experiments with different setups and approaches, including neural style transfer and autoencoders. The experiments were conducted on English spoken by Japanese speakers (UME-ERJ dataset). The results show a significant relative improvement in speech recognition accuracy. Our methodology reduces the need to train new algorithms for non-native speech (thus overcoming the obstacle of data scarcity) and can be used as a wrapper for any existing ASR system. The modification can be performed in real time, before a sample is passed to the speech recognition system itself.
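
The wrapper arrangement described in the abstract can be sketched roughly as follows. This is an illustrative Python sketch only, not the authors' implementation: `wrap_asr`, `identity_modifier`, and `dummy_recognizer` are hypothetical names, audio is assumed to be a plain list of float samples, and the stand-in modifier merely copies its input where a trained style-transfer or autoencoder model would sit.

```python
from typing import Callable, List

# Assumed representation: mono audio as a list of float samples.
Waveform = List[float]
AccentModifier = Callable[[Waveform], Waveform]
Recognizer = Callable[[Waveform], str]


def wrap_asr(recognize: Recognizer, modify_accent: AccentModifier) -> Recognizer:
    """Return a recognizer that modifies each sample before recognition."""
    def wrapped(samples: Waveform) -> str:
        # The accent modification runs first; the existing ASR system is unchanged.
        return recognize(modify_accent(samples))
    return wrapped


# Hypothetical stand-ins so the sketch runs without a model or an ASR engine.
def identity_modifier(samples: Waveform) -> Waveform:
    return list(samples)  # placeholder for a trained accent-modification model


def dummy_recognizer(samples: Waveform) -> str:
    return f"<{len(samples)} samples>"  # placeholder for a real transcription


asr = wrap_asr(dummy_recognizer, identity_modifier)
print(asr([0.0, 0.1, -0.1]))
```

Because the recognizer is passed in as an ordinary callable, the same wrapper could in principle sit in front of any existing ASR system without retraining it, which is the property the abstract emphasizes.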

Original language: English
Title of host publication: 22nd International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2020 - Proceedings
Editors: Maria Indrawan-Santiago, Eric Pardede, Ivan Luiz Salvadori, Matthias Steinbauer, Ismail Khalil, Gabriele Kotsis
Publisher: Association for Computing Machinery
Pages: 55-61
Number of pages: 7
ISBN (electronic): 9781450389228
DOI
Publication status: Published - 30 Nov 2020
Event: 22nd International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2020 - Virtual, Online, Thailand
Duration: 30 Nov 2020 to 2 Dec 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 22nd International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2020
Country/Territory: Thailand
City: Virtual, Online
Period: 30 Nov 2020 to 2 Dec 2020

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
