Audio-visual voice conversion using noise-robust features

Kohei Sawada, Masanori Takehara, Satoshi Tamura, Satoru Hayamizu

Research output: Conference contribution

Abstract

Voice conversion (VC) is a technique for converting the speech of a source speaker into that of a target speaker. VC has been investigated, and statistical VC is used for various purposes. Conventional VC uses acoustic features; however, audio-only VC suffers from degraded performance in noisy or real environments. This paper proposes an audio-visual VC (AVVC) method that uses not only audio features but also visual information, i.e., lip images. The eigenlip feature is employed as the visual feature in our scheme. We also propose a feature selection approach for audio-visual features. Experiments were conducted on noisy data to compare our AVVC scheme with audio-only VC. The results show that AVVC can improve performance even in noisy environments when the audio and visual parameters are properly selected. Visual VC was also found to be successful. Furthermore, visual dynamic features were observed to be more effective than visual static information.
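The distinction between static and dynamic visual features can be illustrated with a minimal sketch. It assumes the commonly used regression-based delta formula with a window of two frames and edge padding by repetition; the abstract does not state how the paper computes its dynamic features, so the function names, the window size, and the toy input below are all hypothetical.

```python
def delta_features(frames, window=2):
    """Regression-based delta (dynamic) features for a list of static vectors.

    Assumed formula (not taken from the paper):
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond either end are replaced by the first or last frame.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]
                prev = frames[max(t - n, 0)][d]
                acc += n * (nxt - prev)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames, window=2):
    """Concatenate each static vector with its delta vector."""
    deltas = delta_features(frames, window)
    return [static + dyn for static, dyn in zip(frames, deltas)]


# Hypothetical 2-dimensional visual features (for example, two eigenlip
# coefficients) over five video frames. The first coefficient rises
# linearly; the second is constant.
visual = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
combined = append_deltas(visual)
```

In this toy input the delta of the rising coefficient is nonzero and the delta of the constant coefficient is zero, which is the sense in which dynamic features capture lip movement rather than lip shape.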

Original language: English
Title of host publication: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7899-7903
Number of pages: 5
ISBN (Print): 9781479928927
Publication status: Published - 2014
Externally published: Yes
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration: 4 May 2014 to 9 May 2014

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country/Territory: Italy
City: Florence
Period: 4 May 2014 to 9 May 2014

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
