Multimodal interaction system that integrates speech and visual information

Satoru Hayamizu, Osamu Hasegawa, Katunobu Itou, Takashi Yoshimura, Tomoyoshi Akiba, Hideki Asoh, Shotaro Akaho, Takio Kurita, Katsuhiko Sakaue

Research output: Contribution to journal › Article › peer-review


This paper presents our studies on multimodal interaction systems and describes a new research direction, 'Intermodal Learning'. The prototype system consists of four mode sub-systems (vision, graphical display, speech recognition, and speech synthesis) and an interaction manager. We demonstrated that the system can learn a user's face and name, as well as the appearance and names of objects. To learn new words, it uses a speech recognition technique that estimates phonetic transcriptions from multiple speech samples. This resembles the way a baby learns about the real world by communicating with its parents.
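The transcription-estimation step described above could be sketched, very roughly, as majority voting over per-sample phone-sequence hypotheses; the paper's actual technique is not detailed here, and the function name and phone data below are purely illustrative assumptions:

```python
from collections import Counter

def estimate_transcription(hypotheses):
    """Pick the most frequent phone sequence among per-sample hypotheses.

    hypotheses: list of tuples of phone symbols, one tuple per
    speech sample of the same unknown word.
    """
    counts = Counter(hypotheses)
    best, _ = counts.most_common(1)[0]
    return best

# Three utterances of an unknown word, each decoded into a phone sequence.
samples = [
    ("k", "a", "m", "e", "r", "a"),
    ("k", "a", "m", "e", "r", "a"),
    ("g", "a", "m", "e", "r", "a"),  # one recognition error
]
print(estimate_transcription(samples))  # ('k', 'a', 'm', 'e', 'r', 'a')
```

A real system would more likely align the hypotheses phone by phone and vote per position, so that a single misrecognized phone does not discard an entire sample.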

Original language: English
Pages (from-to): 37-44
Number of pages: 8
Journal: Denshi Gijutsu Sogo Kenkyusho Iho/Bulletin of the Electrotechnical Laboratory
Issue number: 4-5
Publication status: Published - 2000
Externally published: Yes

ASJC Scopus subject areas

  • Condensed Matter Physics
  • Electrical and Electronic Engineering

