Multimodal integration learning of robot behavior using deep neural networks

Kuniaki Noda*, Hiroaki Arie, Yuki Suga, Tetsuya Ogata

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

126 Citations (Scopus)


For humans to accurately understand the world around them, multimodal integration is essential because it enhances perceptual precision and reduces ambiguity. Computational models replicating this human ability may contribute to the practical use of robots in daily human living environments; however, primarily because of scalability problems that conventional machine learning algorithms suffer from, sensory-motor information processing in robotic applications has typically been achieved via modal-dependent processes. In this paper, we propose a novel computational framework enabling the integration of sensory-motor time-series data and the self-organization of multimodal fused representations based on a deep learning approach. To evaluate our proposed model, we conducted two behavior-learning experiments utilizing a humanoid robot; the experiments consisted of object manipulation and bell-ringing tasks. From our experimental results, we show that large amounts of sensory-motor information, including raw RGB images, sound spectra, and joint angles, are directly fused to generate higher-level multimodal representations. Further, we demonstrate that our proposed framework realizes the following three functions: (1) cross-modal memory retrieval utilizing the information complementation capability of the deep autoencoder; (2) noise-robust behavior recognition utilizing the generalization capability of multimodal features; and (3) multimodal causality acquisition and sensory-motor prediction based on the acquired causality.

Original language: English
Pages (from-to): 721-736
Number of pages: 16
Journal: Robotics and Autonomous Systems
Issue number: 6
Publication status: Published - 2014 Jun


Keywords

  • Cross-modal memory retrieval
  • Deep learning
  • Multimodal integration
  • Object manipulation

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • General Mathematics
  • Computer Science Applications

