Abstract
Multimodal semantic representation is an evolving area of research in natural language processing as well as computer vision. Combining or integrating perceptual information, such as visual features, with linguistic features has recently been actively studied. This paper presents a novel bimodal autoencoder model for multimodal representation learning: the autoencoder is trained to enhance linguistic feature vectors by incorporating the corresponding visual features. At run time, the trained network can produce visually enhanced multimodal representations even for words whose direct visual-linguistic correspondences were never observed during training. Empirical results on standard semantic relatedness tasks demonstrate that our approach is generally promising. We further investigate the potential efficacy of the enhanced word embeddings in discriminating antonyms and synonyms from vaguely related words.
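The abstract does not spell out the network's exact configuration; the sketch below illustrates one plausible reading, in which an encoder maps a linguistic vector to a hidden multimodal representation and two decoders reconstruct the linguistic and visual features. All names, layer sizes, and the loss formulation (`BimodalAutoencoder`, `ling_dim=300`, `vis_dim=4096`, equal loss weights) are illustrative assumptions, not the authors' published settings.

```python
# A minimal sketch of a bimodal autoencoder in the spirit of the abstract.
# Architecture, dimensionalities, and loss weighting are assumptions for
# illustration only, not the configuration reported in the paper.
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    """Encodes a linguistic vector into a hidden multimodal representation
    and decodes it back into both linguistic and visual feature vectors."""

    def __init__(self, ling_dim=300, vis_dim=4096, hidden_dim=300):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(ling_dim, hidden_dim), nn.Tanh())
        self.ling_decoder = nn.Linear(hidden_dim, ling_dim)
        self.vis_decoder = nn.Linear(hidden_dim, vis_dim)

    def forward(self, ling_vec):
        h = self.encoder(ling_vec)  # visually enhanced representation
        return h, self.ling_decoder(h), self.vis_decoder(h)

# Training step on words that have both linguistic and visual features:
# minimize reconstruction error for the two modalities jointly.
model = BimodalAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

ling_batch = torch.randn(32, 300)   # placeholder linguistic embeddings
vis_batch = torch.randn(32, 4096)   # placeholder visual features (e.g. CNN output)

optimizer.zero_grad()
h, ling_rec, vis_rec = model(ling_batch)
loss = mse(ling_rec, ling_batch) + mse(vis_rec, vis_batch)
loss.backward()
optimizer.step()
```

Under this reading, only the encoder is needed at run time: any word with a linguistic embedding can be passed through `model.encoder` to obtain an enhanced representation, which is how the approach would cover words lacking paired visual data.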
Original language | English |
---|---|
Publication status | Published - 2017 |
Event | 12th International Conference on Computational Semantics, IWCS 2017, Montpellier, France; Duration: 2017 Sept 19 → 2017 Sept 22 |
Conference
Conference | 12th International Conference on Computational Semantics, IWCS 2017 |
---|---|
Country/Territory | France |
City | Montpellier |
Period | 2017 Sept 19 → 2017 Sept 22 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Information Systems