TY - GEN
T1 - Fashion style in 128 floats
T2 - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
AU - Simo-Serra, Edgar
AU - Ishikawa, Hiroshi
N1 - Publisher Copyright:
© 2016 IEEE.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2016/12/9
Y1 - 2016/12/9
N2 - We propose a novel approach for learning features from weakly-supervised data by joint ranking and classification. In order to exploit data with weak labels, we jointly train a feature extraction network with a ranking loss and a classification network with a cross-entropy loss. We obtain high-quality compact discriminative features with few parameters, learned on relatively small datasets without additional annotations. This enables us to tackle tasks with specialized images not very similar to the more generic ones in existing fully-supervised datasets. We show that the resulting features in combination with a linear classifier surpass the state-of-the-art on the Hipster Wars dataset despite using features only 0.3% of the size. Our proposed features significantly outperform those obtained from networks trained on ImageNet, despite being 32 times smaller (128 single-precision floats), trained on noisy and weakly-labeled data, and using only 1.5% of the number of parameters.1.
AB - We propose a novel approach for learning features from weakly-supervised data by joint ranking and classification. In order to exploit data with weak labels, we jointly train a feature extraction network with a ranking loss and a classification network with a cross-entropy loss. We obtain high-quality compact discriminative features with few parameters, learned on relatively small datasets without additional annotations. This enables us to tackle tasks with specialized images not very similar to the more generic ones in existing fully-supervised datasets. We show that the resulting features in combination with a linear classifier surpass the state-of-the-art on the Hipster Wars dataset despite using features only 0.3% of the size. Our proposed features significantly outperform those obtained from networks trained on ImageNet, despite being 32 times smaller (128 single-precision floats), trained on noisy and weakly-labeled data, and using only 1.5% of the number of parameters.1.
UR - http://www.scopus.com/inward/record.url?scp=84986275909&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986275909&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2016.39
DO - 10.1109/CVPR.2016.39
M3 - Conference contribution
AN - SCOPUS:84986275909
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 298
EP - 307
BT - Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
PB - IEEE Computer Society
Y2 - 26 June 2016 through 1 July 2016
ER -