Incorporating visual features into word embeddings: A bimodal autoencoder-based approach

Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Research output: Contribution to conference › Paper › peer-review

10 Citations (Scopus)

Abstract

Multimodal semantic representation is an evolving area of research in natural language processing as well as computer vision. Combining or integrating perceptual information, such as visual features, with linguistic features has recently been actively studied. This paper presents a novel bimodal autoencoder model for multimodal representation learning: the autoencoder is trained to enhance linguistic feature vectors by incorporating the corresponding visual features. At run time, the trained neural network can produce visually enhanced multimodal representations even for words for which direct visual-linguistic correspondences were not observed during training. The empirical results obtained on standard semantic relatedness tasks demonstrate that our approach is generally promising. We further investigate the potential efficacy of the enhanced word embeddings in discriminating antonyms and synonyms from vaguely related words.
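The abstract only sketches the architecture at a high level. The following is a minimal, hypothetical sketch (in PyTorch) of a bimodal autoencoder in that spirit: it encodes a word embedding together with a visual feature vector into a shared code and reconstructs both modalities. All layer sizes, names, and the zero-vector treatment of image-less words are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    """Illustrative bimodal autoencoder: text + visual -> shared code -> reconstructions."""
    def __init__(self, text_dim=300, visual_dim=4096, hidden_dim=512):
        super().__init__()
        # Encoder maps the concatenated text+visual input to a shared multimodal code.
        self.encoder = nn.Sequential(
            nn.Linear(text_dim + visual_dim, hidden_dim),
            nn.Tanh(),
        )
        # Separate decoders reconstruct each modality from the shared code.
        self.text_decoder = nn.Linear(hidden_dim, text_dim)
        self.visual_decoder = nn.Linear(hidden_dim, visual_dim)

    def forward(self, text_vec, visual_vec):
        code = self.encoder(torch.cat([text_vec, visual_vec], dim=-1))
        return self.text_decoder(code), self.visual_decoder(code), code

# Training sketch: minimize reconstruction loss on word-image pairs.
model = BimodalAutoencoder()
text = torch.randn(8, 300)     # e.g., pretrained word embeddings (assumed 300-d)
visual = torch.randn(8, 4096)  # e.g., CNN image features (assumed 4096-d)
text_hat, visual_hat, multimodal = model(text, visual)
loss = nn.functional.mse_loss(text_hat, text) + nn.functional.mse_loss(visual_hat, visual)
loss.backward()

# At run time, a word without an associated image could be fed with a zero visual
# vector (an assumption of this sketch) so the trained network still yields a
# visually enhanced multimodal code.
_, _, code_text_only = model(text, torch.zeros(8, 4096))
```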

Original language: English
Publication status: Published - 2017
Event: 12th International Conference on Computational Semantics, IWCS 2017 - Montpellier, France
Duration: 19 Sept 2017 – 22 Sept 2017

Conference

Conference: 12th International Conference on Computational Semantics, IWCS 2017
Country/Territory: France
City: Montpellier
Period: 17/9/19 – 17/9/22

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
