TY - GEN
T1 - Integrated Learning of Robot Motion and Sentences
T2 - 39th IEEE International Conference on Robotics and Automation, ICRA 2022
AU - Ito, Hiroshi
AU - Ichiwara, Hideyuki
AU - Yamamoto, Kenjiro
AU - Mori, Hiroki
AU - Ogata, Tetsuya
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - We propose a motion generation model that can achieve robust behavior against environmental changes based on language instructions at a low cost. Conventional robots that communicate with humans use a restricted environment and language to build up a mapping between language and motion, and thus need to prepare a huge training set to achieve versatility. Our method trains on paired language, visual, and motor information of the robot, and generates motions in real time based on the 'attention' of the language instructions. Specifically, the robot generates motions while focusing on the objects indicated by the human when multiple objects are in the field of view. In addition, since recognition of the indicated object's position and motion generation are performed in real time, robust motion generation is possible in response to changes in the object position and lighting conditions. We clarified that features related to the object name and its location are self-organized in the latent (PB: Parametric Bias) space by end-to-end learning of robot motion and sentences. These observations may indicate the importance of integrated learning of robot motion and sentences, since such feature representations cannot be obtained by learning motions alone.
AB - We propose a motion generation model that can achieve robust behavior against environmental changes based on language instructions at a low cost. Conventional robots that communicate with humans use a restricted environment and language to build up a mapping between language and motion, and thus need to prepare a huge training set to achieve versatility. Our method trains on paired language, visual, and motor information of the robot, and generates motions in real time based on the 'attention' of the language instructions. Specifically, the robot generates motions while focusing on the objects indicated by the human when multiple objects are in the field of view. In addition, since recognition of the indicated object's position and motion generation are performed in real time, robust motion generation is possible in response to changes in the object position and lighting conditions. We clarified that features related to the object name and its location are self-organized in the latent (PB: Parametric Bias) space by end-to-end learning of robot motion and sentences. These observations may indicate the importance of integrated learning of robot motion and sentences, since such feature representations cannot be obtained by learning motions alone.
UR - http://www.scopus.com/inward/record.url?scp=85136322782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136322782&partnerID=8YFLogxK
U2 - 10.1109/ICRA46639.2022.9811815
DO - 10.1109/ICRA46639.2022.9811815
M3 - Conference contribution
AN - SCOPUS:85136322782
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 5404
EP - 5410
BT - 2022 IEEE International Conference on Robotics and Automation, ICRA 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 May 2022 through 27 May 2022
ER -