Multi-modal Embedding for Main Product Detection in Fashion

Long Long Yu, Edgar Simo-Serra, Francesc Moreno-Noguer, Antonio Rubio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

We present an approach to detect the main product in fashion images by exploiting the textual metadata associated with each image. Our approach is based on a Convolutional Neural Network and learns a joint embedding of object proposals and textual metadata to predict the main product in the image. We additionally use several complementary classification and overlap losses in order to improve training stability and performance. Our tests on a large-scale dataset taken from eight e-commerce sites show that our approach outperforms strong baselines and is able to accurately detect the main product in a wide diversity of challenging fashion images.
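To make the idea in the abstract concrete, the following is a minimal sketch (not the authors' released code) of a joint image/text embedding that scores object proposals against an image's textual metadata, trained with a main-product selection loss plus optional auxiliary classification and overlap terms. All names, dimensions, loss forms, and weights below are assumptions for illustration only.

```python
# Illustrative sketch of a joint proposal/metadata embedding (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_feat_dim=2048, txt_feat_dim=300, embed_dim=128):
        super().__init__()
        # Project CNN features of each object proposal into a shared space.
        self.img_proj = nn.Sequential(nn.Linear(img_feat_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, embed_dim))
        # Project pooled textual-metadata features into the same space.
        self.txt_proj = nn.Sequential(nn.Linear(txt_feat_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, embed_dim))

    def forward(self, proposal_feats, text_feats):
        # proposal_feats: (num_proposals, img_feat_dim) CNN features per proposal
        # text_feats:     (txt_feat_dim,) pooled metadata embedding for the image
        img_emb = F.normalize(self.img_proj(proposal_feats), dim=-1)
        txt_emb = F.normalize(self.txt_proj(text_feats.unsqueeze(0)), dim=-1)
        # Cosine similarity of each proposal to the metadata embedding.
        return (img_emb * txt_emb).sum(dim=-1)  # (num_proposals,)

def training_loss(scores, main_idx, aux_class_logits=None, aux_class_target=None,
                  pred_overlap=None, gt_overlap=None, w_cls=0.1, w_ovl=0.1):
    # Main objective: select the proposal that covers the main product.
    loss = F.cross_entropy(scores.unsqueeze(0), torch.tensor([main_idx]))
    # Optional auxiliary terms, loosely mirroring the paper's complementary
    # classification and overlap losses (exact formulation assumed here).
    if aux_class_logits is not None:
        loss = loss + w_cls * F.cross_entropy(aux_class_logits, aux_class_target)
    if pred_overlap is not None:
        loss = loss + w_ovl * F.mse_loss(pred_overlap, gt_overlap)
    return loss
```

At inference time, under these assumptions, the proposal with the highest similarity score to the metadata embedding would be returned as the main product; the auxiliary terms matter only during training, where they act as the kind of stabilizing signals the abstract mentions.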

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2236-2242
Number of pages: 7
ISBN (Electronic): 9781538610343
DOIs
Publication status: Published - 2017 Jul 1
Externally published: Yes
Event: 16th IEEE International Conference on Computer Vision Workshops, ICCVW 2017 - Venice, Italy
Duration: 2017 Oct 22 – 2017 Oct 29

Publication series

Name: Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Volume: 2018-January

Other

Other: 16th IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Country/Territory: Italy
City: Venice
Period: 17/10/22 – 17/10/29

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
