TY - GEN
T1 - A Bi-directional Multiple Timescales LSTM Model for Grounding of Actions and Verbs
AU - Antunes, Alexandre
AU - Laflaquiere, Alban
AU - Ogata, Tetsuya
AU - Cangelosi, Angelo
N1 - Funding Information:
This work is funded by the H2020 Marie Skłodowska-Curie Innovative Training Networks through the APRIL (H2020-MSCA-ITN-2015-674868) project.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
AB - In this paper, we present a neural architecture to learn a bi-directional mapping between actions and language. We implement a Multiple Timescale Long Short-Term Memory (MT-LSTM) network composed of seven layers with different timescale factors, to connect actions to language without explicitly learning an intermediate representation. Instead, the model self-organizes such representations at the level of a slow-varying latent layer, linking the action branch and the language branch at the center. We train the model in a bi-directional way, learning how to produce a sentence from a given action sequence and, simultaneously, how to generate an action sequence given a sentence as input. Furthermore, we show that this model preserves some of the generalization behaviour of Multiple Timescale Recurrent Neural Networks (MTRNN) in generating sentences and actions that were not explicitly trained. We compare this model with a number of different baseline models, confirming the importance of both the bi-directional training and the multiple-timescale architecture. Finally, the network was evaluated on motor actions performed by an iCub robot and their corresponding letter-based descriptions. The results of these experiments are presented at the end of the paper.
UR - http://www.scopus.com/inward/record.url?scp=85081155063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081155063&partnerID=8YFLogxK
U2 - 10.1109/IROS40897.2019.8967799
DO - 10.1109/IROS40897.2019.8967799
M3 - Conference contribution
AN - SCOPUS:85081155063
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 2614
EP - 2621
BT - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
Y2 - 3 November 2019 through 8 November 2019
ER -