Paired recurrent autoencoders for bidirectional translation between robot actions and linguistic descriptions

Tatsuro Yamada, Hiroyuki Matsunaga, Tetsuya Ogata*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

33 Citations (Scopus)

Abstract

We propose a novel deep learning framework for bidirectional translation between robot actions and their linguistic descriptions. Our model consists of two recurrent autoencoders (RAEs). One RAE learns to encode action sequences as fixed-dimensional vectors in a way that allows the sequences to be reproduced from the vectors by its decoder. The other RAE learns to encode descriptions in a similar way. In the learning process, in addition to the reproduction losses, we introduce another loss term that draws the representations of an action and its corresponding description toward each other in the latent vector space. Through the shared representation, the trained model can produce a linguistic description given a robot action. The model can also generate an appropriate action when it receives a linguistic instruction, conditioned on the current visual input. Visualization of the latent representations shows that, because the robot actions are learned jointly with their descriptions, they are embedded in the vector space in a semantically compositional way.
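
The training objective described in the abstract can be sketched as the sum of two reproduction losses and a term that penalizes the distance between paired latent vectors. The following is only an illustrative sketch, not the authors' implementation: the function names, the use of mean squared error for both reproduction losses, the squared Euclidean distance as the pairing term, and the `binding_weight` parameter are all assumptions that the abstract does not specify.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_squared_error(target: Sequence[Vector], output: Sequence[Vector]) -> float:
    """Mean squared error over all elements of two equal-length sequences."""
    if len(target) != len(output):
        raise ValueError("sequences must have the same number of steps")
    total = 0.0
    count = 0
    for t, o in zip(target, output):
        total += squared_distance(t, o)
        count += len(t)
    return total / count


def paired_loss(
    action_seq: Sequence[Vector],
    action_recon: Sequence[Vector],
    desc_seq: Sequence[Vector],
    desc_recon: Sequence[Vector],
    z_action: Vector,
    z_desc: Vector,
    binding_weight: float = 1.0,  # hypothetical weighting, not from the abstract
) -> float:
    """Reproduction losses of both autoencoders plus a latent pairing term."""
    # How well each autoencoder reproduces its own input (assumed MSE).
    recon_action = mean_squared_error(action_seq, action_recon)
    recon_desc = mean_squared_error(desc_seq, desc_recon)
    # Pulls the latent vector of an action toward that of its description
    # (assumed squared Euclidean distance).
    binding = squared_distance(z_action, z_desc)
    return recon_action + recon_desc + binding_weight * binding
```

In this sketch, minimizing the pairing term is what lets one decoder read the other encoder's latent vector: an action encoded by the action RAE lands near the vector that the description decoder was trained to decode, and vice versa.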

Original language: English
Article number: 8403309
Pages (from-to): 3441-3448
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 3
Issue number: 4
Publication status: Published - Oct 2018

Keywords

  • AI-based methods
  • Deep learning in robotics and automation
  • Neurorobotics

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Biomedical Engineering
  • Human-Computer Interaction
  • Mechanical Engineering
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Control and Optimization
  • Artificial Intelligence
