TY - GEN
T1 - Exploring and exploiting the hierarchical structure of a scene for scene graph generation
AU - Kurosawa, Ikuto
AU - Kobayashi, Tetsunori
AU - Hayashi, Yoshihiko
N1 - Funding Information:
The present work was partially supported by JSPS KAKENHI Grant Number 17H01831.
Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - The scene graph of an image is an explicit, concise representation of the image; hence, it can be used in various applications such as visual question answering or robot vision. We propose a novel neural network model for generating scene graphs that maintain global consistency, which prevents the generation of unrealistic scene graphs; the performance in the scene graph generation task is expected to improve. Our proposed model is used to construct a hierarchical structure whose leaf nodes correspond to objects depicted in the image, and a message is passed along the estimated structure on the fly. To this end, we aggregate features of all objects into the root node of the hierarchical structure, and the global context is back-propagated to the root node to maintain all the object nodes. The experimental results on the Visual Genome dataset indicate that the proposed model outperformed the existing models in scene graph generation tasks. We further qualitatively confirmed that the hierarchical structures captured by the proposed model seemed to be valid.
UR - http://www.scopus.com/inward/record.url?scp=85110551226&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110551226&partnerID=8YFLogxK
U2 - 10.1109/ICPR48806.2021.9413251
DO - 10.1109/ICPR48806.2021.9413251
M3 - Conference contribution
AN - SCOPUS:85110551226
T3 - Proceedings - International Conference on Pattern Recognition
SP - 1422
EP - 1429
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
Y2 - 10 January 2021 through 15 January 2021
ER -