TY - GEN
T1 - Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Prediction for Robot Pose Prediction
AU - Hiruma, Hyogo
AU - Mori, Hiroki
AU - Ito, Hiroshi
AU - Ogata, Tetsuya
N1 - Funding Information:
This work was supported by JST Moonshot R&D Grant Number JPMJMS2031 and by JSPS KAKENHI Grant Number JP21H05138.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Deep robot vision models are widely used for recognizing objects from camera images, but they show poor performance when detecting objects at untrained positions. Although this problem can be alleviated by training with large datasets, the cost of dataset collection cannot be ignored. Existing visual attention models address the problem by employing a data-efficient structure that learns to extract task-relevant image areas. However, since these models cannot modify their attention targets after training, they are difficult to apply to dynamically changing tasks. This paper proposes a novel Key-Query-Value formulated visual attention model that can switch attention targets by externally modifying the Query representations, i.e., top-down attention. The proposed model was evaluated in a simulator and in a real-world environment. In the simulator experiments, the model was compared to existing end-to-end robot vision models and showed higher performance and data efficiency. In the real-world robot experiments, the model showed high precision along with scalability and extendibility.
AB - Deep robot vision models are widely used for recognizing objects from camera images, but they show poor performance when detecting objects at untrained positions. Although this problem can be alleviated by training with large datasets, the cost of dataset collection cannot be ignored. Existing visual attention models address the problem by employing a data-efficient structure that learns to extract task-relevant image areas. However, since these models cannot modify their attention targets after training, they are difficult to apply to dynamically changing tasks. This paper proposes a novel Key-Query-Value formulated visual attention model that can switch attention targets by externally modifying the Query representations, i.e., top-down attention. The proposed model was evaluated in a simulator and in a real-world environment. In the simulator experiments, the model was compared to existing end-to-end robot vision models and showed higher performance and data efficiency. In the real-world robot experiments, the model showed high precision along with scalability and extendibility.
KW - neural networks
KW - robotics
KW - visual attention
UR - http://www.scopus.com/inward/record.url?scp=85143899537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143899537&partnerID=8YFLogxK
U2 - 10.1109/IECON49645.2022.9969015
DO - 10.1109/IECON49645.2022.9969015
M3 - Conference contribution
AN - SCOPUS:85143899537
T3 - IECON Proceedings (Industrial Electronics Conference)
BT - IECON 2022 - 48th Annual Conference of the IEEE Industrial Electronics Society
PB - IEEE Computer Society
T2 - 48th Annual Conference of the IEEE Industrial Electronics Society, IECON 2022
Y2 - 17 October 2022 through 20 October 2022
ER -