TY - GEN
T1 - Discriminative learning of deep convolutional feature point descriptors
AU - Simo-Serra, Edgar
AU - Trulls, Eduard
AU - Ferraz, Luis
AU - Kokkinos, Iasonas
AU - Fua, Pascal
AU - Moreno-Noguer, Francesc
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/2/17
Y1 - 2015/2/17
N2 - Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs by combining a stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. By using the L2 distance during both training and testing we develop 128-D descriptors whose Euclidean distances reflect patch similarity, and which can be used as a drop-in replacement for any task involving SIFT. We demonstrate consistent performance gains over the state of the art, and generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes. Our descriptors are efficient to compute and amenable to modern GPUs, and are publicly available.
AB - Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs by combining a stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. By using the L2 distance during both training and testing we develop 128-D descriptors whose Euclidean distances reflect patch similarity, and which can be used as a drop-in replacement for any task involving SIFT. We demonstrate consistent performance gains over the state of the art, and generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes. Our descriptors are efficient to compute and amenable to modern GPUs, and are publicly available.
UR - http://www.scopus.com/inward/record.url?scp=84973915418&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973915418&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2015.22
DO - 10.1109/ICCV.2015.22
M3 - Conference contribution
AN - SCOPUS:84973915418
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 118
EP - 126
BT - 2015 International Conference on Computer Vision, ICCV 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Computer Vision, ICCV 2015
Y2 - 11 December 2015 through 18 December 2015
ER -