TY - JOUR
T1 - Embodying Pre-Trained Word Embeddings through Robot Actions
AU - Toyoda, Minori
AU - Suzuki, Kanata
AU - Mori, Hiroki
AU - Hayashi, Yoshihiko
AU - Ogata, Tetsuya
N1 - Funding Information:
Manuscript received October 15, 2020; accepted February 17, 2021. Date of publication March 23, 2021; date of current version April 8, 2021. This letter was recommended for publication by Associate Editor E. Ugur and Editor T. Asfour upon evaluation of the reviewers’ comments. This work was supported by JST CREST under Grant JPMJCR15E3, Japan. (Corresponding author: Tetsuya Ogata.) Minori Toyoda and Yoshihiko Hayashi are with the Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan (e-mail: minori-toyoda@fuji.waseda.jp; yshk.hayashi@aoni.waseda.jp).
Publisher Copyright:
© 2016 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - We propose a promising neural network model with which to acquire a grounded representation of robot actions and the linguistic descriptions thereof. Properly responding to various linguistic expressions, including polysemous words, is an important ability for robots that interact with people via linguistic dialogue. Previous studies have shown that robots can use words that are not included in the action-description paired datasets by using pre-trained word embeddings. However, the word embeddings trained under the distributional hypothesis are not grounded, as they are derived purely from a text corpus. In this letter, we transform the pre-trained word embeddings to embodied ones by using the robot's sensory-motor experiences. We extend a bidirectional translation model for actions and descriptions by incorporating non-linear layers that retrofit the word embeddings. By training the retrofit layer and the bidirectional translation model alternately, our proposed model is able to transform the pre-trained word embeddings to adapt to a paired action-description dataset. Our results demonstrate that the embeddings of synonyms form a semantic cluster by reflecting the experiences (actions and environments) of a robot. These embeddings allow the robot to properly generate actions from unseen words that are not paired with actions in a dataset.
KW - Learning from experience
KW - embodied cognitive science
KW - multi-modal perception for HRI
UR - http://www.scopus.com/inward/record.url?scp=85103259583&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103259583&partnerID=8YFLogxK
U2 - 10.1109/LRA.2021.3067862
DO - 10.1109/LRA.2021.3067862
M3 - Article
AN - SCOPUS:85103259583
SN - 2377-3766
VL - 6
SP - 4225
EP - 4232
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 2
M1 - 9384172
ER -