This paper reviews studies of multimodal interaction systems and describes our new research direction, 'Intermodal Learning'. The prototype system comprises four sub-systems (vision, graphical display, speech recognition, and speech synthesis) and an interaction manager. We demonstrated that it can learn a user's face and name, as well as the appearance and names of objects. To learn new words, it uses a speech recognition technique that estimates phonetic transcriptions from multiple speech samples. This process resembles a baby learning about the real world by communicating with its parents.
Journal: Denshi Gijutsu Sogo Kenkyusho Iho/Bulletin of the Electrotechnical Laboratory
Published: 2000
ASJC Scopus subject areas
- Condensed Matter Physics
- Electrical and Electronic Engineering