TY - GEN
T1 - Multi-Fingered Dragging of Unknown Objects and Orientations Using Distributed Tactile Information Through Vision-Transformer and LSTM
AU - Ueno, T.
AU - Funabashi, S.
AU - Ito, H.
AU - Schmitz, A.
AU - Kulkarni, S.
AU - Ogata, T.
AU - Sugano, S.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Multi-fingered hands are well suited to stable object manipulation. Furthermore, abundant tactile information can be acquired with multi-fingered hands, which is useful for recognizing an object's properties and thus for adapting the motion to the object. However, generating dexterous manipulation motions with multi-fingered hands equipped with high-density tactile sensors is challenging due to complex touch states. Hence, tasks that conventionally require a high level of active tactile sensing simultaneously with motion generation, such as pulling an object into the hand while recognizing its posture, are difficult to accomplish. In this letter, we propose a novel deep predictive learning approach using a Vision Transformer (ViT) and Long Short-Term Memory (LSTM). The ViT's attention mechanism can spatially focus on specific fingers represented by distributed 3-axis tactile sensors (uSkin). The LSTM can preserve long time-series information of the manipulation, which makes it possible to change the desired motion according to the initial touch position and orientation of the target object. Results showed that the ViT-LSTM is effective in performing adaptive finger movements according to the properties of the object, i.e., its hardness and relative posture.
AB - Multi-fingered hands are well suited to stable object manipulation. Furthermore, abundant tactile information can be acquired with multi-fingered hands, which is useful for recognizing an object's properties and thus for adapting the motion to the object. However, generating dexterous manipulation motions with multi-fingered hands equipped with high-density tactile sensors is challenging due to complex touch states. Hence, tasks that conventionally require a high level of active tactile sensing simultaneously with motion generation, such as pulling an object into the hand while recognizing its posture, are difficult to accomplish. In this letter, we propose a novel deep predictive learning approach using a Vision Transformer (ViT) and Long Short-Term Memory (LSTM). The ViT's attention mechanism can spatially focus on specific fingers represented by distributed 3-axis tactile sensors (uSkin). The LSTM can preserve long time-series information of the manipulation, which makes it possible to change the desired motion according to the initial touch position and orientation of the target object. Results showed that the ViT-LSTM is effective in performing adaptive finger movements according to the properties of the object, i.e., its hardness and relative posture.
UR - http://www.scopus.com/inward/record.url?scp=85216467844&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85216467844&partnerID=8YFLogxK
U2 - 10.1109/IROS58592.2024.10802283
DO - 10.1109/IROS58592.2024.10802283
M3 - Conference contribution
AN - SCOPUS:85216467844
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 7445
EP - 7452
BT - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
Y2 - 14 October 2024 through 18 October 2024
ER -