Multi-modal translation system and its evaluation

Shigeo Morishima, S. Nakamura

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Speech-to-speech translation has been studied to realize natural human communication beyond language barriers. Toward further multi-modal natural communication, visual information such as face and lip movements will be necessary. We introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we substitute only the speech organ's image with a synthesized one generated by a three-dimensional wire-frame model that is adaptable to any speaker. Our approach enables image synthesis and translation with an extremely small database. We conduct subjective evaluation with a connected-digit discrimination test on data with and without audio-visual lip-synchronization. The results confirm the quality of the proposed audio-visual translation system and the importance of lip-synchronization.
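The lip-synchronization described above can be pictured as mapping the phoneme timing of the translated speech onto mouth-shape (viseme) parameters that drive the three-dimensional wire-frame model. The following Python sketch is illustrative only, not the authors' implementation: the phoneme set, the PHONEME_TO_VISEME table, and the timing values are hypothetical assumptions made for the example.

from dataclasses import dataclass

@dataclass
class PhonemeSegment:
    phoneme: str
    start_ms: int
    end_ms: int

# Hypothetical phoneme-to-viseme table; the actual mouth-shape parameters
# of the wire-frame model are not specified in this record.
PHONEME_TO_VISEME = {
    "a": "open_wide",
    "i": "spread",
    "u": "rounded",
    "e": "half_open",
    "o": "rounded_open",
    "m": "closed",
    "sil": "neutral",
}

def viseme_timeline(segments):
    """Map the translated speech's phoneme timing to a viseme timeline.

    Each entry tells the renderer which mouth shape to blend in over the
    given interval, so the synthesized speech-organ image stays in sync
    with the translated audio.
    """
    return [
        (seg.start_ms, seg.end_ms, PHONEME_TO_VISEME.get(seg.phoneme, "neutral"))
        for seg in segments
    ]

if __name__ == "__main__":
    # Toy phoneme alignment for a translated utterance; in practice this
    # timing would come from the speech synthesizer.
    segments = [
        PhonemeSegment("k", 0, 80),
        PhonemeSegment("o", 80, 200),
        PhonemeSegment("n", 200, 280),
        PhonemeSegment("i", 280, 400),
        PhonemeSegment("sil", 400, 500),
    ]
    for start, end, viseme in viseme_timeline(segments):
        print(f"{start:4d}-{end:4d} ms -> {viseme}")

In a real system the phoneme alignment would come from the speech synthesizer, and each viseme label would drive the wire-frame model's vertex deformations frame by frame while the rest of the face is left untouched to preserve the speaker's expression.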

Original language: English
Title of host publication: Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 241-246
Number of pages: 6
ISBN (Print): 0769518346, 9780769518343
Publication status: Published - 2002
Externally published: Yes
Event: 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 - Pittsburgh, United States
Duration: 14 Oct 2002 → 16 Oct 2002

Other

Other: 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002
Country/Territory: United States
City: Pittsburgh
Period: 02/10/14 → 02/10/16

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture

