Speech recognition using deep canonical correlation analysis in noisy environments

Shinnosuke Isobe, Satoshi Tamura, Satoru Hayamizu

Research output: Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we propose a method to improve the accuracy of speech recognition in noisy environments by utilizing Deep Canonical Correlation Analysis (DCCA). DCCA generates projections from two modalities into one common space such that the correlation between the projected vectors is maximized. Our idea is to apply DCCA to the audio and visual modalities to enhance the robustness of Automatic Speech Recognition (ASR): (A) noisy audio features can be recovered using clean visual features, and (B) an ASR model can be trained on both audio and visual features, as a form of data augmentation. We evaluated our method using the audio-visual corpus CENSREC-1-AV and the noise database DEMAND. Compared to conventional ASR and feature-fusion-based audio-visual speech recognition, our DCCA-based recognizers achieved better performance. In addition, the experimental results show that utilizing DCCA yields better results in various noisy environments, thanks to the visual modality. Furthermore, we found that DCCA can be used as a data augmentation scheme when only a small amount of training data is available, by incorporating visual DCCA features, in addition to audio DCCA features, to build an audio-only ASR model.
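As a rough illustration of the objective described in the abstract, the sketch below implements classical linear CCA, the method that DCCA extends. It is not the authors' method: DCCA replaces the linear projections with neural networks, and the abstract gives no implementation details. The function name `linear_cca`, the regularization constant `reg`, and the synthetic "audio" and "visual" feature matrices are all assumptions made for this example.

```python
import numpy as np

def linear_cca(X, Y, k, reg=1e-4):
    """Classical linear CCA (illustrative stand-in for DCCA).

    X: (n, dx) features of one modality, Y: (n, dy) features of the other.
    Returns projections A (dx, k), B (dy, k) and the top-k canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized covariance and cross-covariance estimates.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A = Sxx_is @ U[:, :k]
    B = Syy_is @ Vt.T[:, :k]
    return A, B, s[:k]

# Synthetic data (assumption): two "modalities" driven by a shared 2-D latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
audio = z @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(500, 6))   # noisier view
visual = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))  # cleaner view

A, B, corr = linear_cca(audio, visual, k=2)

# Both views projected into the common space.
audio_proj = (audio - audio.mean(axis=0)) @ A
visual_proj = (visual - visual.mean(axis=0)) @ B
```

In this toy setting, `audio_proj` and `visual_proj` play the role of the projected vectors in the common space. In a deep variant, the matrices `A` and `B` would be replaced by trained networks.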

Original language: English
Title of host publication: ICPRAM 2021 - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods
Editors: Maria De Marsico, Gabriella Sanniti di Baja, Ana Fred
Publisher: SciTePress
Pages: 63-70
Number of pages: 8
ISBN (electronic): 9789897584862
Publication status: Published - 2021
Externally published: Yes
Event: 10th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2021 - Virtual, Online
Duration: 4 Feb 2021 to 6 Feb 2021

Publication series

Name: ICPRAM 2021 - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods

Conference

Conference: 10th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2021
City: Virtual, Online
Period: 4 Feb 2021 to 6 Feb 2021

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
