Breakdown: Phototourism | Stereo | Sequences

Breakdown of results on the Phototourism dataset, multi-view stereo task, per sequence.



Stereo — All sequences — Sorted by mAP15o
Method BM FCS LMS LB MC MR PSM RS SF SPC USC AVG Date Type By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
0.0558 0.0133 0.0177 0.0575 0.0207 0.0112 0.0210 0.0329 0.0139 0.0102 0.0628 0.0288 19-04-24 F Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
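Most entries in this table use plain brute-force nearest-neighbour matching ("match:nn"). As a minimal sketch of that protocol (the function name and toy descriptor arrays are illustrative, not from any submission), each descriptor in image A is simply assigned its closest descriptor in image B:

```python
import numpy as np

def match_nn(desc_a, desc_b):
    """Brute-force nearest-neighbour matching: for every descriptor in
    image A, return the index of its closest descriptor in image B
    (Euclidean distance), as in the 'match:nn' entries."""
    # Pairwise squared Euclidean distances, shape (N_a, N_b).
    d2 = (
        (desc_a ** 2).sum(1, keepdims=True)
        - 2.0 * desc_a @ desc_b.T
        + (desc_b ** 2).sum(1)
    )
    nn = d2.argmin(axis=1)
    # One (index_a, index_b) pair per descriptor in A.
    return np.stack([np.arange(len(desc_a)), nn], axis=1)
```

Note that each descriptor in A is matched independently, so several keypoints in A may map to the same keypoint in B; the "1:1" (cross-consistent) variants listed further down remove such collisions.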
SphereDesc
kp:8000, match:nn
0.0755 0.0333 0.0449 0.0508 0.0472 0.0189 0.0217 0.0491 0.0427 0.0253 0.0338 0.0403 19-04-26 F Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
0.0681 0.0207 0.0123 0.0602 0.0289 0.0156 0.0210 0.0415 0.0299 0.0196 0.0514 0.0336 19-05-14 F Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, with at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
0.0779 0.0335 0.0867 0.0616 0.0472 0.0223 0.0317 0.0420 0.0432 0.0244 0.0366 0.0461 19-05-07 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
0.0798 0.0376 0.0921 0.0665 0.0482 0.0257 0.0307 0.0434 0.0398 0.0309 0.0439 0.0490 19-05-07 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
0.0794 0.0331 0.0832 0.0547 0.0455 0.0204 0.0305 0.0443 0.0409 0.0215 0.0408 0.0449 19-06-01 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
0.0798 0.0358 0.0892 0.0735 0.0486 0.0234 0.0263 0.0491 0.0421 0.0261 0.0449 0.0490 19-06-05 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
0.0732 0.0184 0.0411 0.0366 0.0295 0.0124 0.0288 0.0324 0.0151 0.0173 0.0222 0.0297 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
0.0765 0.0247 0.0367 0.0508 0.0364 0.0183 0.0236 0.0277 0.0199 0.0167 0.0252 0.0324 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
0.0626 0.0094 0.0215 0.0178 0.0238 0.0040 0.0236 0.0277 0.0056 0.0134 0.0298 0.0217 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
0.0685 0.0148 0.0300 0.0282 0.0236 0.0099 0.0283 0.0377 0.0116 0.0173 0.0229 0.0266 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
0.0554 0.0092 0.0156 0.0432 0.0185 0.0116 0.0227 0.0367 0.0091 0.0050 0.0262 0.0230 19-05-07 F Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
0.0632 0.0171 0.0257 0.0540 0.0213 0.0128 0.0227 0.0372 0.0095 0.0098 0.0280 0.0274 19-05-09 F Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
0.0569 0.0178 0.0297 0.0571 0.0236 0.0107 0.0263 0.0320 0.0207 0.0092 0.0310 0.0286 19-04-26 F Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
0.0708 0.0202 0.0264 0.0310 0.0293 0.0185 0.0239 0.0448 0.0263 0.0144 0.0386 0.0313 19-05-19 F Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
0.0753 0.0245 0.0255 0.0303 0.0323 0.0126 0.0197 0.0448 0.0295 0.0148 0.0411 0.0319 19-05-19 F Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
0.0794 0.0265 0.0537 0.0669 0.0429 0.0213 0.0217 0.0358 0.0346 0.0180 0.0509 0.0411 19-05-23 F Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors with the sGOr2f* matching strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with the greedy nearest neighbour, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
0.0767 0.0281 0.0608 0.0286 0.0366 0.0164 0.0217 0.0548 0.0407 0.0196 0.0338 0.0380 19-05-29 F Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection performed on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
0.0759 0.0279 0.0798 0.0404 0.0415 0.0198 0.0227 0.0515 0.0384 0.0219 0.0381 0.0416 19-05-30 F Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection performed on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
0.0501 0.0085 0.0072 0.0233 0.0110 0.0074 0.0158 0.0324 0.0098 0.0079 0.0343 0.0189 19-04-24 F Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
0.0804 0.0252 0.0577 0.0637 0.0500 0.0192 0.0229 0.0453 0.0413 0.0165 0.0393 0.0420 19-06-25 F Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
0.0812 0.0295 0.0577 0.0707 0.0502 0.0211 0.0273 0.0429 0.0371 0.0203 0.0378 0.0432 19-06-24 F Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
0.0828 0.0288 0.0586 0.0790 0.0463 0.0202 0.0246 0.0529 0.0394 0.0180 0.0416 0.0448 19-06-20 F Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
0.0640 0.0126 0.0098 0.0286 0.0201 0.0103 0.0146 0.0291 0.0158 0.0130 0.0305 0.0226 19-05-10 F Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
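The Hamming-distance matching described in this entry can be sketched with bit-packed NumPy arrays. This is an illustrative sketch only (the helper name and toy data are not from the submission; the 6272-bit size and 4000 threshold come from the entry's description):

```python
import numpy as np

def hamming_match(bits_a, bits_b, threshold=4000):
    """Match binary descriptors by Hamming distance, keeping a pair only
    when the nearest neighbour's distance is below the decision threshold.
    bits_a, bits_b: uint8 arrays of shape (N, n_bits // 8), bit-packed."""
    # XOR then popcount (via unpackbits) gives the Hamming distance.
    ham = np.array([
        [np.unpackbits(a ^ b).sum() for b in bits_b] for a in bits_a
    ])
    nn = ham.argmin(axis=1)
    keep = ham[np.arange(len(bits_a)), nn] < threshold
    return np.stack([np.nonzero(keep)[0], nn[keep]], axis=1)
```

For 6272-bit descriptors, each row would hold 784 bytes; the decision threshold of 4000 bits then acts as an absolute acceptance test rather than a ratio test.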
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
0.0638 0.0117 0.0056 0.0268 0.0222 0.0107 0.0134 0.0324 0.0129 0.0092 0.0313 0.0218 19-04-29 F/M Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
0.0824 0.0279 0.0503 0.0592 0.0577 0.0244 0.0266 0.0510 0.0463 0.0192 0.0383 0.0439 19-05-09 F Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
0.0830 0.0301 0.0537 0.0547 0.0492 0.0223 0.0288 0.0558 0.0427 0.0196 0.0358 0.0432 19-05-28 F Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are extracted densely from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
0.0902 0.0569 0.1138 0.1452 0.0860 0.0941 0.0478 0.1140 0.0654 0.0549 0.0373 0.0823 19-05-28 F/M Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on Yi et al. CVPR2018. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
0.0683 0.0234 0.0418 0.0411 0.0437 0.0143 0.0256 0.0448 0.0346 0.0157 0.0512 0.0368 19-05-08 F Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
0.0751 0.0263 0.0391 0.0463 0.0392 0.0177 0.0239 0.0429 0.0317 0.0159 0.0459 0.0367 19-04-24 F Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
0.0810 0.0295 0.0615 0.0477 0.0545 0.0198 0.0241 0.0486 0.0402 0.0201 0.0406 0.0425 19-04-24 F Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
0.0835 0.0265 0.0438 0.0439 0.0457 0.0221 0.0244 0.0491 0.0386 0.0196 0.0429 0.0400 19-04-24 F Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
0.0998 0.0495 0.1035 0.1459 0.0795 0.0762 0.0404 0.1021 0.0577 0.0426 0.0295 0.0752 19-05-29 F/M Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of Yi et al. CVPR2018. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
0.0548 0.0155 0.0168 0.0414 0.0226 0.0158 0.0202 0.0401 0.0156 0.0113 0.0504 0.0277 19-04-24 F Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
0.0736 0.0250 0.0329 0.0494 0.0419 0.0194 0.0224 0.0467 0.0266 0.0163 0.0381 0.0357 19-04-24 F Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
0.0558 0.0112 0.0127 0.0327 0.0169 0.0128 0.0161 0.0334 0.0135 0.0092 0.0310 0.0223 19-05-17 F Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
0.0726 0.0283 0.0644 0.0672 0.0429 0.0126 0.0285 0.0439 0.0409 0.0194 0.0353 0.0415 19-06-07 F Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
0.0865 0.0418 0.0825 0.0766 0.0551 0.0189 0.0339 0.0563 0.0506 0.0292 0.0398 0.0519 19-06-07 F Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
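The "1:1" ("match:nn1to1") protocol used by this entry keeps only mutually consistent matches. A minimal NumPy sketch under that assumption (function name and toy data are illustrative):

```python
import numpy as np

def match_nn_one_to_one(desc_a, desc_b):
    """Mutual nearest-neighbour ('1:1') matching: keep a pair only when
    each descriptor is the other's nearest neighbour, enforcing
    cross-match consistency and removing many-to-one collisions."""
    # Pairwise squared Euclidean distances, shape (N_a, N_b).
    d2 = (
        (desc_a ** 2).sum(1, keepdims=True)
        - 2.0 * desc_a @ desc_b.T
        + (desc_b ** 2).sum(1)
    )
    nn_ab = d2.argmin(axis=1)        # best B for each A
    nn_ba = d2.argmin(axis=0)        # best A for each B
    idx_a = np.arange(len(desc_a))
    mutual = nn_ba[nn_ab] == idx_a   # A -> B -> back to the same A
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)
```

Compared with plain nearest-neighbour search, this discards ambiguous correspondences where two keypoints in A compete for the same keypoint in B, which is why the 1:1 variants above often score higher than their "nn" counterparts.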
SuperPoint (default)
kp:2048, match:nn
0.0804 0.0297 0.0543 0.0776 0.0398 0.0135 0.0295 0.0477 0.0355 0.0171 0.0305 0.0414 19-04-24 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
0.0800 0.0218 0.0590 0.0804 0.0394 0.0124 0.0273 0.0424 0.0322 0.0173 0.0300 0.0402 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
0.0814 0.0288 0.0644 0.0696 0.0429 0.0139 0.0285 0.0420 0.0361 0.0182 0.0318 0.0416 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
0.0661 0.0133 0.0221 0.0710 0.0248 0.0105 0.0283 0.0391 0.0168 0.0104 0.0366 0.0308 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
0.0777 0.0178 0.0445 0.0829 0.0315 0.0143 0.0273 0.0415 0.0245 0.0171 0.0358 0.0377 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
0.0781 0.0322 0.0653 0.0484 0.0394 0.0103 0.0314 0.0410 0.0340 0.0180 0.0272 0.0387 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
0.0855 0.0236 0.0485 0.0463 0.0498 0.0192 0.0239 0.0491 0.0421 0.0192 0.0383 0.0405 19-07-29 F Patrick Ebel Baseline for our log-polar descriptors: here we use Cartesian patches instead of the log-polar transformation. Keypoints are DoG, with a scaling factor of lambda/12 over the chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
0.0771 0.0245 0.0519 0.0644 0.0439 0.0145 0.0270 0.0424 0.0309 0.0186 0.0330 0.0389 19-05-30 F Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
0.0925 0.0427 0.0499 0.1323 0.0571 0.0362 0.0363 0.0796 0.0442 0.0422 0.0361 0.0590 19-05-30 F/M Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
0.1082 0.0423 0.0474 0.1466 0.0596 0.0415 0.0412 0.0954 0.0488 0.0399 0.0333 0.0640 19-05-28 F/M Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
0.0438 0.0088 0.0136 0.0272 0.0213 0.0107 0.0197 0.0348 0.0085 0.0069 0.0361 0.0210 19-04-24 F Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
0.0794 0.0373 0.0736 0.0686 0.0636 0.0242 0.0312 0.0529 0.0554 0.0286 0.0492 0.0513 19-06-07 F Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA


Results for individual sequences:


Stereo — sequence 'british_museum'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7963.6 0.2278 0.0016 0.0176 0.0558 0.1072 0.1622 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 4451.2 0.2861 0.0014 0.0278 0.0755 0.1462 0.2270 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 4427.3 0.2965 0.0014 0.0205 0.0681 0.1293 0.1839 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, with at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 4846.2 0.2745 0.0014 0.0309 0.0779 0.1452 0.2203 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 6052.9 0.2859 0.0022 0.0305 0.0798 0.1487 0.2227 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 4888.7 0.2739 0.0033 0.0329 0.0794 0.1440 0.2184 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 6008.1 0.2850 0.0010 0.0301 0.0798 0.1483 0.2201 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.2996 0.0012 0.0288 0.0732 0.1258 0.1798 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2046.9 0.2911 0.0014 0.0282 0.0765 0.1293 0.1814 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.3004 0.0010 0.0241 0.0626 0.1152 0.1714 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.3021 0.0008 0.0215 0.0685 0.1252 0.1741 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 421.4 0.2522 0.0018 0.0198 0.0554 0.0994 0.1458 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 421.4 0.2844 0.0020 0.0221 0.0632 0.1127 0.1663 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 490.2 0.2320 0.0018 0.0237 0.0569 0.0998 0.1507 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 2024.5 0.2448 0.0014 0.0288 0.0708 0.1266 0.1847 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 2024.5 0.2511 0.0025 0.0274 0.0753 0.1293 0.1886 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 1696.2 0.2468 0.0016 0.0241 0.0794 0.1403 0.2064 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors with the sGOr2f* matching strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with the greedy nearest neighbour, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7670.5 0.2575 0.0022 0.0272 0.0767 0.1362 0.1925 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection performed on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7767.3 0.2786 0.0033 0.0297 0.0759 0.1409 0.2144 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection performed on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7494.1 0.2209 0.0012 0.0164 0.0501 0.0972 0.1557 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7968.4 0.2670 0.0025 0.0292 0.0804 0.1426 0.2031 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7968.4 0.2776 0.0027 0.0333 0.0812 0.1432 0.2049 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7968.4 0.2820 0.0027 0.0325 0.0828 0.1467 0.2025 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 2910.1 0.2466 0.0012 0.0229 0.0640 0.1133 0.1597 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
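The thresholded Hamming-distance matching this entry describes can be sketched as follows; the packing of the 6272-bit descriptors into 784 uint8 bytes is an assumption, and `hamming_match` is an illustrative name, not the authors' API:

```python
import numpy as np

def hamming_match(desc1, desc2, threshold=4000):
    """Sketch of nearest-neighbour matching for binary descriptors:
    Hamming distance via XOR + bit counting, accepting a match only
    when the distance falls below the decision threshold."""
    # desc1, desc2: (n, 784) uint8 arrays (6272 bits = 784 bytes each)
    xor = desc1[:, None, :] ^ desc2[None, :, :]
    # Unpack bytes to bits and sum -> pairwise Hamming distance table
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    nn = dist.argmin(axis=1)                 # nearest neighbour in image 2
    best = dist[np.arange(len(desc1)), nn]
    return [(int(i), int(j)) for i, j in enumerate(nn) if best[i] < threshold]
```

Pairs whose best Hamming distance is 4000 bits or more are rejected, which is how the decision threshold in the entry acts as an absolute (rather than ratio-based) match criterion.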
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 2910.1 0.2453 0.0018 0.0211 0.0638 0.1105 0.1585 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7421.8 0.3115 0.0025 0.0321 0.0824 0.1393 0.2101 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7421.6 0.2869 0.0029 0.0288 0.0830 0.1458 0.2021 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7421.8 0.4302 0.0022 0.0299 0.0902 0.2009 0.3291 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7421.8 0.2174 0.0016 0.0233 0.0683 0.1197 0.1690 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7968.5 0.2287 0.0022 0.0305 0.0751 0.1242 0.1804 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7968.5 0.2588 0.0025 0.0290 0.0810 0.1368 0.1964 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7968.5 0.2474 0.0029 0.0319 0.0835 0.1338 0.1912 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7421.8 0.4256 0.0014 0.0325 0.0998 0.2107 0.3453 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7968.3 0.2004 0.0016 0.0182 0.0548 0.0984 0.1426 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7968.5 0.2323 0.0018 0.0307 0.0736 0.1317 0.1876 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.2230 0.0025 0.0207 0.0558 0.0974 0.1456 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1883.6 0.2463 0.0014 0.0266 0.0726 0.1368 0.2041 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1883.6 0.2758 0.0018 0.0333 0.0865 0.1536 0.2387 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
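The difference between the plain 'nn' matcher and the 'nn1to1' matcher in the row above can be sketched as follows, under the assumption that 1:1 matching means mutual (cross-checked) nearest neighbours; `mutual_nn_matches` is an illustrative name, not challenge code:

```python
import numpy as np

def mutual_nn_matches(desc1, desc2):
    """Sketch of brute-force 1:1 matching: keep pair (i, j) only when
    j is i's nearest neighbour AND i is j's nearest neighbour, so each
    keypoint appears in at most one match (unlike plain 1:N matching)."""
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn12 = dist.argmin(axis=1)   # best match in image 2 for each i
    nn21 = dist.argmin(axis=0)   # best match in image 1 for each j
    return [(i, int(j)) for i, j in enumerate(nn12) if nn21[j] == i]
```

Dropping the cross-check (returning every `(i, nn12[i])` pair) gives the plain nearest-neighbour matcher, which is what makes the two SuperPoint rows above directly comparable.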
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1156.4 0.2668 0.0012 0.0309 0.0804 0.1540 0.2215 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
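The preprocessing quoted in the SuperPoint entries (downsample so that the largest dimension is at most 1024 pixels) amounts to the following minimal sketch; `target_size` is a hypothetical helper, not part of the released code:

```python
def target_size(width, height, max_dim=1024):
    """If the larger image side exceeds max_dim, scale both sides down
    by the same factor (preserving aspect ratio, rounding to integers);
    otherwise leave the image size untouched."""
    scale = max_dim / max(width, height)
    if scale >= 1.0:
        return width, height
    return round(width * scale), round(height * scale)
```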
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.2655 0.0018 0.0295 0.0800 0.1426 0.2154 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.2610 0.0018 0.0327 0.0814 0.1393 0.2078 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.2517 0.0025 0.0276 0.0661 0.1209 0.1884 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.2605 0.0014 0.0284 0.0777 0.1420 0.2158 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.2564 0.0018 0.0295 0.0781 0.1346 0.1933 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7968.4 0.2508 0.0035 0.0346 0.0855 0.1428 0.2025 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
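The log-polar sampling that these entries contrast with Cartesian patches can be sketched as follows; the grid resolution and the helper name are assumptions for illustration, not the authors' parameters:

```python
import numpy as np

def logpolar_grid(cx, cy, r_max, n_r=32, n_theta=32):
    """Sketch of a log-polar sampling grid around a keypoint at (cx, cy):
    radii grow exponentially up to r_max, so sampling is denser near the
    keypoint and a scale change becomes a shift along the radial axis."""
    r = np.exp(np.linspace(0.0, np.log(r_max), n_r))       # log-spaced radii
    t = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(r, t, indexing="ij")
    return cx + rr * np.cos(tt), cy + rr * np.sin(tt)      # sample coords
```

Because scaling the patch multiplies every radius by a constant, it only translates the grid in log-radius, which is the scale-invariance property the descriptor exploits; a Cartesian baseline samples a regular x-y grid instead.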
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1488.2 0.2483 0.0031 0.0297 0.0771 0.1415 0.2078 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1417.5 0.4226 0.0020 0.0309 0.0925 0.1990 0.3160 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1883.6 0.4184 0.0027 0.0327 0.1082 0.2182 0.3436 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7830.3 0.2009 0.0006 0.0157 0.0438 0.0871 0.1344 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7421.8 0.3273 0.0020 0.0295 0.0794 0.1548 0.2428 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA

Stereo — sequence 'florence_cathedral_side'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7859.0 0.1603 0.0002 0.0038 0.0133 0.0290 0.0558 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 9429.2 0.2110 0.0004 0.0081 0.0333 0.0821 0.1478 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 7066.7 0.1963 0.0009 0.0070 0.0207 0.0488 0.0915 Anonymous We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 5416.8 0.2177 0.0002 0.0076 0.0335 0.0834 0.1601 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 7491.8 0.2297 0.0004 0.0115 0.0376 0.1003 0.1759 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 5474.9 0.2170 0.0000 0.0081 0.0331 0.0839 0.1608 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 7198.9 0.2282 0.0007 0.0079 0.0358 0.0913 0.1721 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.2195 0.0000 0.0058 0.0184 0.0540 0.1039 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2034.4 0.2158 0.0004 0.0079 0.0247 0.0625 0.1219 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.2233 0.0000 0.0036 0.0094 0.0189 0.0405 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.2214 0.0002 0.0052 0.0148 0.0373 0.0715 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 456.1 0.1907 0.0000 0.0022 0.0092 0.0254 0.0547 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 455.8 0.2052 0.0002 0.0040 0.0171 0.0421 0.0774 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 495.2 0.1824 0.0004 0.0063 0.0178 0.0385 0.0771 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 2048.0 0.1821 0.0013 0.0052 0.0202 0.0452 0.0976 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 2048.0 0.1859 0.0004 0.0056 0.0245 0.0533 0.1091 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2616.2 0.1802 0.0009 0.0079 0.0265 0.0598 0.1156 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using the sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7678.8 0.1910 0.0007 0.0079 0.0281 0.0630 0.1230 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8k; detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7754.4 0.2086 0.0007 0.0081 0.0279 0.0733 0.1410 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector is assumed for orientation. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8k; detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7886.4 0.1637 0.0000 0.0031 0.0085 0.0198 0.0400 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7873.7 0.2032 0.0009 0.0072 0.0252 0.0628 0.1347 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7873.7 0.2053 0.0009 0.0081 0.0295 0.0762 0.1446 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7873.7 0.2063 0.0013 0.0081 0.0288 0.0738 0.1433 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 6225.9 0.1775 0.0007 0.0045 0.0126 0.0283 0.0576 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 6225.9 0.1766 0.0002 0.0029 0.0117 0.0232 0.0488 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7783.0 0.2254 0.0009 0.0085 0.0279 0.0711 0.1444 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7782.9 0.2182 0.0002 0.0074 0.0301 0.0742 0.1478 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7783.0 0.2894 0.0004 0.0139 0.0569 0.1278 0.2254 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7783.0 0.1843 0.0004 0.0049 0.0234 0.0587 0.1197 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7874.2 0.1878 0.0004 0.0074 0.0263 0.0652 0.1271 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7874.2 0.2022 0.0007 0.0074 0.0295 0.0731 0.1422 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7874.2 0.1987 0.0013 0.0076 0.0265 0.0646 0.1323 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7783.0 0.2891 0.0009 0.0137 0.0495 0.1134 0.2015 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7873.7 0.1747 0.0002 0.0045 0.0155 0.0371 0.0839 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7874.2 0.1880 0.0004 0.0085 0.0250 0.0580 0.1161 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.1678 0.0004 0.0036 0.0112 0.0263 0.0589 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 2010.2 0.1917 0.0004 0.0090 0.0283 0.0731 0.1352 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 2010.2 0.2063 0.0004 0.0128 0.0418 0.0902 0.1595 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1663.1 0.2016 0.0007 0.0076 0.0297 0.0650 0.1350 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.1979 0.0004 0.0054 0.0218 0.0632 0.1221 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.2006 0.0002 0.0072 0.0288 0.0717 0.1399 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.1768 0.0000 0.0025 0.0133 0.0387 0.0796 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.1865 0.0004 0.0047 0.0178 0.0544 0.1075 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.1981 0.0007 0.0074 0.0322 0.0751 0.1473 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7873.7 0.1997 0.0009 0.0065 0.0236 0.0704 0.1435 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1803.0 0.1889 0.0002 0.0049 0.0245 0.0628 0.1246 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1851.4 0.2766 0.0011 0.0119 0.0427 0.0963 0.1649 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 2010.2 0.2769 0.0011 0.0119 0.0423 0.0972 0.1748 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7830.2 0.1628 0.0004 0.0031 0.0088 0.0250 0.0542 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7783.0 0.2360 0.0002 0.0101 0.0373 0.0920 0.1750 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA

Stereo — sequence 'lincoln_memorial_statue'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7799.0 0.2834 0.0004 0.0054 0.0177 0.0510 0.1100 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 1292.5 0.3649 0.0007 0.0132 0.0449 0.1019 0.1688 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 1292.5 0.3617 0.0000 0.0038 0.0123 0.0398 0.0733 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 4352.2 0.3358 0.0007 0.0206 0.0867 0.1967 0.3125 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 5662.7 0.3539 0.0004 0.0203 0.0921 0.2110 0.3300 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 4536.2 0.3348 0.0004 0.0190 0.0832 0.1932 0.3023 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 5764.5 0.3546 0.0009 0.0177 0.0892 0.1981 0.3184 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.3337 0.0004 0.0087 0.0411 0.1167 0.2124 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2037.7 0.3195 0.0000 0.0069 0.0367 0.1102 0.2019 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.3501 0.0007 0.0040 0.0215 0.0646 0.1348 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.3436 0.0000 0.0060 0.0300 0.0928 0.1871 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 281.1 0.3411 0.0002 0.0036 0.0156 0.0358 0.0702 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 281.1 0.3563 0.0002 0.0069 0.0257 0.0590 0.1053 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 466.8 0.3046 0.0002 0.0074 0.0297 0.0675 0.1158 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 1255.5 0.3341 0.0004 0.0054 0.0264 0.0700 0.1297 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 1255.5 0.3334 0.0007 0.0067 0.0255 0.0680 0.1259 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 1363.3 0.3367 0.0013 0.0110 0.0537 0.1384 0.2444 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
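The 'root squared' trick mentioned here (as in RootSIFT) amounts to L1-normalising a non-negative descriptor and taking element-wise square roots, so that Euclidean distance between the transformed descriptors approximates the Hellinger kernel. A sketch under that reading — the exact RsGLOH2 recipe may differ:

```python
import numpy as np

def root_transform(desc, eps=1e-8):
    """RootSIFT-style transform: L1-normalise a non-negative descriptor,
    then take the element-wise square root."""
    desc = desc / (desc.sum() + eps)
    return np.sqrt(desc)

hist = np.array([4.0, 1.0, 0.0, 3.0])  # toy histogram-type descriptor
r = root_transform(hist)
print(np.linalg.norm(r))  # ~1.0: the output is automatically L2-normalised
```

A convenient side effect, visible above: because the L1 norm of the input becomes the squared L2 norm of the output, the transformed descriptor is unit-length without a separate normalisation step.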
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7751.6 0.3307 0.0007 0.0130 0.0608 0.1449 0.2408 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS matching framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8000; detection done on 2x-upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7776.5 0.3451 0.0002 0.0174 0.0798 0.1724 0.2853 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian, assuming the gravity-vector orientation. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS matching framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8000; detection done on 2x-upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 3920.7 0.3164 0.0002 0.0025 0.0072 0.0201 0.0382 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7770.9 0.3104 0.0009 0.0139 0.0577 0.1397 0.2383 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
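The log-polar entries sample the patch on rings whose radii grow geometrically, rather than on a regular Cartesian grid, which is what makes the descriptor robust to scale changes. An illustrative grid construction — the parameters and layout are ours, not the paper's exact sampling:

```python
import numpy as np

def log_polar_grid(n_rho=5, n_theta=8, r_max=32.0):
    """Sample locations: n_rho rings with geometrically increasing radii
    (i.e. uniform in log rho), n_theta uniformly spaced angles per ring."""
    rho = r_max ** (np.arange(1, n_rho + 1) / n_rho)        # geometric radii
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    xs = rho[:, None] * np.cos(theta[None, :])
    ys = rho[:, None] * np.sin(theta[None, :])
    return xs, ys

xs, ys = log_polar_grid()
print(xs.shape)                         # (5, 8): 40 sample locations
print(np.hypot(xs[-1, 0], ys[-1, 0]))   # 32.0: radius of the outermost ring
```

Under this layout, scaling the patch shifts the sampling pattern along the rho axis instead of stretching it, which is why a log-polar descriptor can tolerate keypoint-scale errors better than a Cartesian one.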
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7770.9 0.3154 0.0004 0.0130 0.0577 0.1388 0.2350 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7770.9 0.3175 0.0004 0.0163 0.0586 0.1426 0.2484 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 1108.1 0.3384 0.0002 0.0025 0.0098 0.0217 0.0400 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
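SIFT-AID compares its 6272-bit binary descriptors by Hamming distance against a fixed decision threshold of 4000. The distance test itself is just a bit count; the toy descriptors below are ours:

```python
import numpy as np

AID_BITS, AID_THRESHOLD = 6272, 4000  # values from the SIFT-AID entry

def hamming(d1, d2):
    """Hamming distance between two 0/1 bit vectors."""
    return int(np.count_nonzero(d1 != d2))

rng = np.random.default_rng(0)
a = rng.integers(0, 2, AID_BITS, dtype=np.uint8)
b = a.copy()
b[:100] ^= 1  # flip the first 100 bits

print(hamming(a, b))                  # -> 100
print(hamming(a, b) < AID_THRESHOLD)  # -> True: accepted as a match
```

In practice binary descriptors are packed 8 bits per byte and compared with XOR + popcount, but the accept/reject rule is the same thresholded distance.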
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 1108.1 0.3413 0.0000 0.0013 0.0056 0.0161 0.0358 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 5820.3 0.3555 0.0002 0.0118 0.0503 0.1326 0.2325 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 5820.2 0.3529 0.0007 0.0130 0.0537 0.1446 0.2466 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, while other settings remain the same as in the original ContextDesc. We find that Dense-ContextDesc performs better in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 5820.3 0.4037 0.0013 0.0268 0.1138 0.2535 0.3948 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 5820.3 0.2939 0.0004 0.0112 0.0418 0.1028 0.1961 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7771.0 0.2881 0.0009 0.0089 0.0391 0.0973 0.1847 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7771.0 0.3033 0.0011 0.0154 0.0615 0.1429 0.2482 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7771.0 0.3002 0.0007 0.0116 0.0438 0.1259 0.2240 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 5820.3 0.4036 0.0027 0.0221 0.1035 0.2397 0.3783 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7770.9 0.2680 0.0002 0.0036 0.0168 0.0463 0.1033 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7771.0 0.2908 0.0002 0.0078 0.0329 0.0883 0.1737 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.3013 0.0000 0.0027 0.0127 0.0317 0.0722 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1701.4 0.3054 0.0011 0.0179 0.0644 0.1373 0.2305 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1701.4 0.3225 0.0013 0.0215 0.0825 0.1791 0.2906 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 808.2 0.3362 0.0020 0.0179 0.0543 0.1290 0.2084 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.3310 0.0009 0.0148 0.0590 0.1375 0.2267 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.3164 0.0007 0.0154 0.0644 0.1426 0.2300 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.3392 0.0007 0.0076 0.0221 0.0503 0.0858 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.3414 0.0009 0.0134 0.0445 0.0993 0.1699 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.3028 0.0018 0.0143 0.0653 0.1417 0.2361 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7770.9 0.3016 0.0011 0.0121 0.0485 0.1288 0.2426 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1165.2 0.3127 0.0007 0.0116 0.0519 0.1209 0.2039 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 975.0 0.3958 0.0004 0.0134 0.0499 0.1107 0.1791 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1701.4 0.3948 0.0009 0.0145 0.0474 0.1100 0.1961 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7723.4 0.2617 0.0002 0.0029 0.0136 0.0405 0.0814 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 5820.3 0.3634 0.0009 0.0163 0.0736 0.1838 0.3023 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA

Stereo — sequence 'london_bridge'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7962.8 0.2545 0.0003 0.0066 0.0575 0.1473 0.2469 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 3773.8 0.2875 0.0003 0.0063 0.0508 0.1233 0.2044 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 3702.9 0.2989 0.0000 0.0077 0.0602 0.1515 0.2479 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 5004.7 0.2913 0.0003 0.0101 0.0616 0.1469 0.2378 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 6698.9 0.2998 0.0003 0.0136 0.0665 0.1539 0.2472 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 5133.2 0.2910 0.0003 0.0094 0.0547 0.1375 0.2295 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 6647.1 0.2996 0.0010 0.0146 0.0735 0.1633 0.2531 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.3143 0.0000 0.0028 0.0366 0.1010 0.1619 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2046.5 0.3102 0.0000 0.0063 0.0508 0.1118 0.1845 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.3120 0.0003 0.0024 0.0178 0.0529 0.0996 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.3150 0.0003 0.0035 0.0282 0.0769 0.1309 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 297.1 0.2823 0.0003 0.0035 0.0432 0.1208 0.1957 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 297.4 0.2929 0.0003 0.0059 0.0540 0.1327 0.2110 Anonymous ELF detector: Keypoints are local maxima of a saliency map, generated as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 488.8 0.2749 0.0000 0.0108 0.0571 0.1365 0.2162 Anonymous ELF detector: Keypoints are local maxima of a saliency map, generated as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
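The ELF entries above take keypoints as the local maxima of a saliency map. As a rough illustration (not the authors' code — the saliency map here is a placeholder array, not the actual CNN-gradient map), the local-maxima selection step can be sketched in NumPy:

```python
import numpy as np

def local_maxima(saliency, k=512):
    """Return (row, col) of up to k strict 3x3 local maxima, strongest first."""
    s = np.pad(saliency, 1, mode="constant", constant_values=-np.inf)
    centre = s[1:-1, 1:-1]
    is_max = np.ones_like(centre, dtype=bool)
    # a pixel is kept only if it beats all 8 of its neighbours
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            is_max &= centre > s[1 + dr:s.shape[0] - 1 + dr,
                                 1 + dc:s.shape[1] - 1 + dc]
    rows, cols = np.nonzero(is_max)
    order = np.argsort(centre[rows, cols])[::-1][:k]  # strongest responses first
    return rows[order], cols[order]
```

In the actual method the saliency map comes from backpropagating a feature-map norm through a pre-trained CNN; only the maxima-picking step is shown here.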
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 1983.5 0.2510 0.0000 0.0021 0.0310 0.0884 0.1765 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 1983.5 0.2553 0.0000 0.0017 0.0303 0.0815 0.1598 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 1917.2 0.2811 0.0000 0.0104 0.0669 0.1664 0.2618 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
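The greedy nearest-neighbour step this entry describes — repeatedly taking the best remaining pair from the distance table and removing its row and column so each keypoint is used once — can be sketched as follows (an illustrative reimplementation, not the authors' code):

```python
import numpy as np

def greedy_nn(dist):
    """Greedily pair rows and columns of a distance table, best matches first."""
    d = dist.astype(float).copy()
    matches = []
    while np.isfinite(d).any():
        i, j = np.unravel_index(np.argmin(d), d.shape)
        matches.append((i, j, dist[i, j]))
        d[i, :] = np.inf  # each keypoint is matched at most once
        d[:, j] = np.inf
    return matches
```

The full sGOr2f* strategy additionally filters the distance table before this step; only the greedy pairing is shown.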
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7668.0 0.2665 0.0000 0.0045 0.0286 0.0870 0.1640 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, Paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7748.9 0.2821 0.0000 0.0070 0.0404 0.1107 0.1960 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian, assuming a gravity-aligned orientation. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 6982.2 0.2415 0.0000 0.0021 0.0233 0.0769 0.1462 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7954.8 0.2972 0.0003 0.0122 0.0637 0.1435 0.2329 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7954.8 0.3025 0.0003 0.0164 0.0707 0.1602 0.2531 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7954.8 0.3049 0.0010 0.0160 0.0790 0.1744 0.2625 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
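The log-polar entries above sample each patch on a (rho, theta) grid with logarithmically spaced radii, over a support region enlarged by lambda/12 relative to the DoG scale. A minimal sketch of such a sampling grid, with illustrative grid sizes and inner-radius choice rather than the paper's exact parameters:

```python
import numpy as np

def log_polar_grid(cx, cy, sigma, lam=32.0, n_rho=8, n_theta=16):
    """Cartesian sample locations for a log-polar grid centred at (cx, cy).

    The support radius is the keypoint scale sigma enlarged by lambda/12,
    and radii are spaced logarithmically so inner rings sample densely.
    """
    r_max = sigma * lam / 12.0
    # log-spaced radii up to r_max; angles uniform on the circle
    rho = r_max * np.exp(np.linspace(np.log(0.1), 0.0, n_rho))
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    tt, rr = np.meshgrid(theta, rho)
    xs = cx + rr * np.cos(tt)
    ys = cy + rr * np.sin(tt)
    return xs, ys
```

Larger lambda widens the support region, which matches the trend in the three entries above: mAP improves as lambda grows from 32 to 96.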
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 2895.2 0.2848 0.0000 0.0045 0.0286 0.0783 0.1452 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
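SIFT-AID's matcher reduces to a Hamming-distance test between 6272-bit binary descriptors against a fixed threshold of 4000. A NumPy sketch of that test (the packing of bits into uint8 bytes and the function name are implementation assumptions, not the authors' code):

```python
import numpy as np

def hamming_match(desc_a, desc_b, threshold=4000):
    """Match packed binary descriptors (uint8 rows) by Hamming distance.

    Returns (i, j, dist) for each row of desc_a whose nearest row in
    desc_b lies within the decision threshold.
    """
    # XOR exposes differing bits; popcount via a 256-entry lookup table
    popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)
    matches = []
    for i, d in enumerate(desc_a):
        dists = popcount[np.bitwise_xor(desc_b, d)].sum(axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= threshold:
            matches.append((i, j, int(dists[j])))
    return matches
```

For 6272-bit descriptors each row would be 784 uint8 bytes; a threshold of 4000 accepts pairs that agree on roughly 36% or more of their bits.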
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 2895.2 0.2843 0.0000 0.0035 0.0268 0.0790 0.1504 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7678.9 0.3487 0.0003 0.0118 0.0592 0.1306 0.2190 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7678.9 0.3322 0.0003 0.0104 0.0547 0.1327 0.2124 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, particularly under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7678.9 0.3781 0.0007 0.0327 0.1452 0.2914 0.4185 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7678.9 0.2757 0.0000 0.0077 0.0411 0.1166 0.2037 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7955.7 0.2688 0.0000 0.0059 0.0463 0.1229 0.2159 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7955.7 0.2906 0.0003 0.0094 0.0477 0.1271 0.2030 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7955.7 0.2906 0.0000 0.0073 0.0439 0.1079 0.1915 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7678.9 0.3766 0.0017 0.0338 0.1459 0.2765 0.3830 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7954.4 0.2514 0.0000 0.0073 0.0414 0.1083 0.1981 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7955.7 0.2771 0.0000 0.0080 0.0494 0.1139 0.1905 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.2345 0.0000 0.0028 0.0327 0.1003 0.1744 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (NN matcher)
kp:2048, match:nn
19-06-07 F 1799.3 0.2803 0.0000 0.0108 0.0672 0.1692 0.2552 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1799.3 0.3046 0.0000 0.0129 0.0766 0.1748 0.2775 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
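The 1:1 matcher keeps a pair only when each descriptor is the other's nearest neighbour, unlike the plain one-way search in the 'nn' entries. A minimal NumPy sketch of that cross-check (illustrative, not the evaluation code):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Brute-force L2 matching with a cross-consistency (1:1) check."""
    # full pairwise squared-distance table between the two descriptor sets
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    ab = d2.argmin(axis=1)  # nearest b for each a
    ba = d2.argmin(axis=0)  # nearest a for each b
    # keep (i, j) only if i and j choose each other
    return [(i, int(j)) for i, j in enumerate(ab) if ba[j] == i]
```

The cross-check discards asymmetric matches, which is consistent with the 1:1 rows above scoring higher than their plain-NN counterparts.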
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1094.0 0.2886 0.0000 0.0139 0.0776 0.1842 0.2761 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.2873 0.0000 0.0139 0.0804 0.1995 0.2900 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.2741 0.0000 0.0125 0.0696 0.1570 0.2451 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.2832 0.0000 0.0070 0.0710 0.1661 0.2584 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.2901 0.0003 0.0097 0.0829 0.1964 0.2942 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.2590 0.0003 0.0101 0.0484 0.1194 0.1978 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
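Several SuperPoint entries note that images are downsampled so the largest dimension is at most 1024 pixels. A minimal sketch of that size computation (the function name is illustrative):

```python
def resize_to_max_dim(width, height, max_dim=1024):
    """New (width, height) with the largest side capped at max_dim.

    Preserves aspect ratio; images already small enough are left unchanged.
    """
    scale = min(1.0, max_dim / float(max(width, height)))
    return int(round(width * scale)), int(round(height * scale))
```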
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7954.8 0.2883 0.0003 0.0073 0.0463 0.1142 0.2079 Patrick Ebel A baseline for our log-polar submissions: descriptors are computed from Cartesian patches instead of the log-polar transformation. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1418.4 0.2823 0.0007 0.0097 0.0644 0.1591 0.2604 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1364.5 0.3725 0.0014 0.0320 0.1323 0.2625 0.3795 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1799.3 0.3718 0.0021 0.0320 0.1466 0.2834 0.3959 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7915.0 0.2465 0.0003 0.0045 0.0272 0.0766 0.1501 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7678.9 0.3590 0.0007 0.0125 0.0686 0.1647 0.2692 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
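The per-sequence tables report mAP at angular thresholds from 5 to 25 degrees, i.e. the fraction of image pairs whose estimated pose error falls within each threshold. A simplified sketch of the per-threshold computation (the official evaluation combines rotation and translation errors; only the thresholding is illustrated here):

```python
def map_at_thresholds(pose_errors_deg, thresholds=(5, 10, 15, 20, 25)):
    """Fraction of image pairs whose angular pose error is within each threshold."""
    n = len(pose_errors_deg)
    return {t: sum(e <= t for e in pose_errors_deg) / n for t in thresholds}
```

This is why the columns grow monotonically from mAP5 to mAP25 in every row: a pose within 5 degrees of ground truth is also within every larger threshold.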

Stereo — sequence 'milan_cathedral'
Method Date Type #kp MS mAP5° mAP10° mAP15° mAP20° mAP25° By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7818.1 0.1838 0.0000 0.0045 0.0207 0.0573 0.1258 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 5628.3 0.2179 0.0002 0.0112 0.0472 0.1130 0.2004 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 5111.7 0.2172 0.0000 0.0039 0.0289 0.0864 0.1583 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings; each image has at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 5932.2 0.2226 0.0004 0.0069 0.0472 0.1211 0.2173 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 7782.8 0.2308 0.0006 0.0098 0.0482 0.1195 0.2232 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 5965.0 0.2213 0.0006 0.0085 0.0455 0.1213 0.2205 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 7241.2 0.2293 0.0008 0.0087 0.0486 0.1246 0.2270 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.2291 0.0006 0.0065 0.0295 0.0815 0.1618 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2036.2 0.2261 0.0014 0.0081 0.0364 0.0923 0.1770 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.2290 0.0004 0.0047 0.0238 0.0589 0.1189 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.2290 0.0000 0.0049 0.0236 0.0652 0.1358 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 357.9 0.1940 0.0004 0.0045 0.0185 0.0514 0.0900 Anonymous ELF detector: Keypoints are local maxima of a saliency map, generated as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 358.0 0.2008 0.0002 0.0026 0.0213 0.0567 0.1010 Anonymous ELF detector: Keypoints are local maxima of a saliency map, generated as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 495.5 0.1892 0.0002 0.0049 0.0236 0.0665 0.1242 Anonymous ELF detector: Keypoints are local maxima of a saliency map, generated as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 2039.6 0.1845 0.0002 0.0047 0.0293 0.0760 0.1455 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 2039.6 0.1879 0.0000 0.0055 0.0323 0.0833 0.1591 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2681.2 0.2070 0.0008 0.0071 0.0429 0.1106 0.2063 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7698.8 0.1956 0.0004 0.0077 0.0366 0.1069 0.1998 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, Paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7761.0 0.2077 0.0006 0.0077 0.0415 0.1142 0.2093 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian, assuming a gravity-aligned orientation. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7653.5 0.1726 0.0000 0.0010 0.0110 0.0343 0.0703 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7823.4 0.2100 0.0006 0.0112 0.0500 0.1209 0.2217 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7823.4 0.2152 0.0008 0.0098 0.0502 0.1268 0.2232 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7823.4 0.2183 0.0004 0.0091 0.0463 0.1203 0.2252 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 4481.9 0.1919 0.0006 0.0039 0.0201 0.0537 0.1026 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 4481.9 0.1919 0.0004 0.0033 0.0222 0.0522 0.1012 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7725.5 0.2565 0.0002 0.0112 0.0577 0.1413 0.2433 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7725.4 0.2453 0.0004 0.0085 0.0492 0.1248 0.2297 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, particularly under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7725.5 0.2981 0.0022 0.0213 0.0860 0.1917 0.3067 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7725.5 0.2007 0.0004 0.0085 0.0437 0.1100 0.2033 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7823.6 0.1951 0.0002 0.0085 0.0392 0.1081 0.2012 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7823.6 0.2093 0.0008 0.0104 0.0545 0.1272 0.2248 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7823.6 0.2055 0.0016 0.0100 0.0457 0.1146 0.2110 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7725.5 0.2978 0.0016 0.0199 0.0795 0.1754 0.2967 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7823.8 0.1808 0.0004 0.0061 0.0226 0.0736 0.1465 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7823.6 0.1965 0.0008 0.0106 0.0419 0.1000 0.1876 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.1683 0.0004 0.0030 0.0169 0.0500 0.0957 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1997.1 0.2065 0.0006 0.0087 0.0429 0.1146 0.2006 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1997.1 0.2205 0.0004 0.0106 0.0551 0.1301 0.2307 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1483.6 0.2081 0.0010 0.0089 0.0398 0.1059 0.1939 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.2100 0.0006 0.0102 0.0394 0.1020 0.1882 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.2044 0.0012 0.0075 0.0429 0.1037 0.1935 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.2040 0.0002 0.0053 0.0248 0.0654 0.1228 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.2094 0.0002 0.0049 0.0315 0.0909 0.1600 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.1956 0.0002 0.0096 0.0394 0.0996 0.1866 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7823.4 0.2071 0.0008 0.0118 0.0498 0.1215 0.2209 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1726.8 0.2066 0.0008 0.0073 0.0439 0.1091 0.2014 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1691.7 0.2989 0.0006 0.0126 0.0571 0.1335 0.2299 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1997.1 0.2981 0.0008 0.0154 0.0596 0.1388 0.2419 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7630.6 0.1704 0.0006 0.0047 0.0213 0.0526 0.1065 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7725.5 0.2642 0.0008 0.0132 0.0636 0.1585 0.2744 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
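Most entries in this table are tagged 'match:nn', i.e. plain brute-force nearest-neighbour matching: every descriptor in one image is paired with its closest descriptor (by L2 distance) in the other. As a minimal illustrative sketch with toy 2-D descriptors (pure Python, not the code of any submission, which typically uses an optimized matcher such as OpenCV's BFMatcher):

```python
def l2_sq(a, b):
    # squared Euclidean distance between two descriptor vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_nn(desc_a, desc_b):
    """Brute-force nearest-neighbour matching: for each descriptor in
    image A, return the index of its closest descriptor in image B."""
    matches = []
    for i, da in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda k: l2_sq(da, desc_b[k]))
        matches.append((i, j))
    return matches

# Toy 2-D "descriptors" standing in for 128-D SIFT-style vectors
desc_a = [[0.0, 0.0], [1.0, 1.0]]
desc_b = [[1.1, 0.9], [0.1, -0.1], [5.0, 5.0]]
print(match_nn(desc_a, desc_b))  # [(0, 1), (1, 0)]
```

Note that this direction-A-to-B search allows many-to-one matches; the 'nn1to1' entries below additionally enforce cross-match consistency.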

Stereo — sequence 'mount_rushmore'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7809.6 0.3603 0.0000 0.0011 0.0112 0.0549 0.1480 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 7344.7 0.4323 0.0002 0.0046 0.0189 0.0798 0.1760 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 5588.5 0.4334 0.0000 0.0017 0.0156 0.0701 0.1541 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings; each image has at most 8000 keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 6006.5 0.4210 0.0004 0.0063 0.0223 0.0733 0.1676 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 8732.2 0.4307 0.0002 0.0067 0.0257 0.0842 0.1825 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 6145.1 0.4196 0.0011 0.0051 0.0204 0.0714 0.1615 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 7542.0 0.4288 0.0002 0.0051 0.0234 0.0735 0.1672 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.4118 0.0002 0.0019 0.0124 0.0451 0.1202 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2032.3 0.4054 0.0000 0.0029 0.0183 0.0535 0.1318 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.4175 0.0000 0.0002 0.0040 0.0288 0.0787 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.4160 0.0002 0.0036 0.0099 0.0354 0.1023 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 354.8 0.3949 0.0004 0.0015 0.0116 0.0413 0.0861 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 354.8 0.4041 0.0000 0.0006 0.0128 0.0474 0.1011 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 490.9 0.3744 0.0006 0.0019 0.0107 0.0505 0.1166 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 1988.9 0.4000 0.0006 0.0029 0.0185 0.0783 0.1573 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 1988.9 0.4039 0.0000 0.0017 0.0126 0.0758 0.1514 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2900.8 0.3947 0.0000 0.0029 0.0213 0.0817 0.1739 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors using the sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed using the greedy nearest neighbour, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7598.1 0.4131 0.0002 0.0025 0.0164 0.0844 0.1893 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation). Code and weights: https://github.com/ducha-aiki/affnet; paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets. Training code: https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS matching framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8000; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7668.5 0.4263 0.0000 0.0025 0.0198 0.0966 0.2061 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets. Training code: https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS matching framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8000; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7279.8 0.3824 0.0000 0.0002 0.0074 0.0453 0.1069 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7791.1 0.3969 0.0002 0.0029 0.0192 0.0842 0.1867 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7791.1 0.4029 0.0004 0.0038 0.0211 0.0859 0.1821 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7791.1 0.4058 0.0002 0.0040 0.0202 0.0811 0.1638 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 4379.1 0.4001 0.0000 0.0017 0.0103 0.0381 0.0985 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract keypoints with OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 4379.1 0.4021 0.0000 0.0013 0.0107 0.0381 0.0909 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract keypoints with OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7719.0 0.4673 0.0011 0.0061 0.0244 0.0627 0.1309 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7719.0 0.4417 0.0000 0.0042 0.0223 0.0661 0.1495 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better under illumination changes in particular. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7719.0 0.4917 0.0034 0.0328 0.0941 0.2051 0.3091 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7719.0 0.3774 0.0002 0.0023 0.0143 0.0777 0.1865 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7791.0 0.3707 0.0004 0.0038 0.0177 0.0855 0.1863 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7791.0 0.3932 0.0000 0.0036 0.0198 0.0825 0.1888 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7791.0 0.3893 0.0002 0.0051 0.0221 0.0781 0.1785 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7719.0 0.4938 0.0029 0.0234 0.0762 0.1878 0.2958 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7788.9 0.3556 0.0002 0.0017 0.0158 0.0699 0.1575 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7791.0 0.3788 0.0002 0.0042 0.0194 0.0783 0.1699 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.3787 0.0002 0.0013 0.0128 0.0585 0.1175 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1710.5 0.3756 0.0000 0.0027 0.0126 0.0451 0.1181 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1710.5 0.3999 0.0002 0.0040 0.0189 0.0667 0.1324 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 936.8 0.3766 0.0004 0.0032 0.0135 0.0455 0.1036 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.3750 0.0000 0.0017 0.0124 0.0436 0.1091 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.3702 0.0006 0.0044 0.0139 0.0421 0.0941 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.3734 0.0004 0.0017 0.0105 0.0440 0.1011 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.3770 0.0004 0.0029 0.0143 0.0495 0.1215 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.3662 0.0002 0.0032 0.0103 0.0387 0.0832 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7791.1 0.3879 0.0002 0.0038 0.0192 0.0859 0.1909 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1342.3 0.3785 0.0008 0.0032 0.0145 0.0484 0.1232 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1173.5 0.4783 0.0015 0.0112 0.0362 0.0926 0.1773 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1710.5 0.4875 0.0013 0.0114 0.0415 0.1078 0.1977 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7514.8 0.3499 0.0002 0.0011 0.0107 0.0512 0.1192 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7719.0 0.4747 0.0004 0.0046 0.0242 0.0678 0.1408 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
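Entries tagged 'match:nn1to1' enforce cross-match consistency (mutual nearest neighbours): a pair is kept only if each descriptor is the other's nearest neighbour, which discards many one-sided matches. A minimal sketch with toy 2-D descriptors (pure Python, illustrative only; the benchmark's actual matcher is not published here):

```python
def nearest(query, pool):
    # index of the pool descriptor closest to the query (squared L2)
    return min(range(len(pool)),
               key=lambda k: sum((q - p) ** 2 for q, p in zip(query, pool[k])))

def match_mutual_nn(desc_a, desc_b):
    """Keep (i, j) only when j is i's nearest neighbour in B *and*
    i is j's nearest neighbour in A (cross-match consistency)."""
    fwd = [nearest(da, desc_b) for da in desc_a]   # A -> B
    bwd = [nearest(db, desc_a) for db in desc_b]   # B -> A
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

# desc_a[2] also maps to desc_b[1], but loses the mutual check to desc_a[0]
desc_a = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]]
desc_b = [[1.0, 1.1], [0.0, 0.1]]
print(match_mutual_nn(desc_a, desc_b))  # [(0, 1), (1, 0)]
```

This is also what OpenCV's `BFMatcher` does when constructed with `crossCheck=True`.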

Stereo — sequence 'piazza_san_marco'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7783.7 0.1563 0.0000 0.0029 0.0210 0.0660 0.1326 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 6671.6 0.1798 0.0002 0.0032 0.0217 0.0768 0.1584 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 6007.0 0.1804 0.0005 0.0019 0.0210 0.0721 0.1423 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings; each image has at most 8000 keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 6269.4 0.1740 0.0000 0.0051 0.0317 0.0906 0.1747 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 7888.4 0.1817 0.0000 0.0049 0.0307 0.0953 0.1842 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 6368.5 0.1735 0.0005 0.0051 0.0305 0.0836 0.1793 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 7314.6 0.1821 0.0002 0.0046 0.0263 0.0838 0.1774 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.1768 0.0000 0.0049 0.0288 0.0865 0.1769 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2023.0 0.1780 0.0002 0.0029 0.0236 0.0850 0.1803 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.1767 0.0000 0.0049 0.0236 0.0736 0.1606 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.1764 0.0000 0.0051 0.0283 0.0831 0.1728 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 399.4 0.1725 0.0005 0.0046 0.0227 0.0690 0.1372 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 399.4 0.1739 0.0000 0.0056 0.0227 0.0648 0.1289 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 483.8 0.1587 0.0000 0.0039 0.0263 0.0807 0.1489 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 2041.5 0.1735 0.0000 0.0027 0.0239 0.0785 0.1535 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 2041.5 0.1755 0.0000 0.0027 0.0197 0.0743 0.1503 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2667.1 0.1670 0.0000 0.0034 0.0217 0.0811 0.1672 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors, matched with the sGOr2f* strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with greedy nearest-neighbour search, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7657.5 0.1713 0.0007 0.0029 0.0217 0.0743 0.1603 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, Paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks are plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7728.7 0.1780 0.0002 0.0037 0.0227 0.0741 0.1608 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity-vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks are plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7816.3 0.1673 0.0000 0.0015 0.0158 0.0602 0.1291 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7807.0 0.1728 0.0000 0.0044 0.0229 0.0797 0.1701 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7807.0 0.1750 0.0002 0.0044 0.0273 0.0858 0.1718 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7807.0 0.1759 0.0000 0.0046 0.0246 0.0821 0.1691 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 4511.4 0.1573 0.0005 0.0015 0.0146 0.0526 0.1067 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 4511.4 0.1575 0.0002 0.0015 0.0134 0.0463 0.1023 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7764.3 0.1902 0.0000 0.0054 0.0266 0.0841 0.1788 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7764.3 0.1820 0.0000 0.0054 0.0288 0.0838 0.1762 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7764.3 0.2300 0.0024 0.0105 0.0478 0.1187 0.2269 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7764.3 0.1621 0.0002 0.0034 0.0256 0.0789 0.1577 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7808.0 0.1635 0.0000 0.0049 0.0239 0.0807 0.1589 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7808.0 0.1738 0.0002 0.0032 0.0241 0.0841 0.1706 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7808.0 0.1702 0.0005 0.0049 0.0244 0.0809 0.1679 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7764.3 0.2280 0.0007 0.0083 0.0404 0.1067 0.2069 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7807.2 0.1537 0.0000 0.0032 0.0202 0.0653 0.1289 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7808.0 0.1644 0.0002 0.0039 0.0224 0.0770 0.1547 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.1738 0.0000 0.0017 0.0161 0.0619 0.1328 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 2003.7 0.1646 0.0000 0.0049 0.0285 0.0872 0.1728 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 2003.7 0.1680 0.0010 0.0063 0.0339 0.0945 0.1745 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1557.9 0.1680 0.0002 0.0058 0.0295 0.0863 0.1723 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.1676 0.0002 0.0058 0.0273 0.0831 0.1737 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.1666 0.0005 0.0051 0.0285 0.0885 0.1742 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.1583 0.0005 0.0054 0.0283 0.0797 0.1608 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.1627 0.0002 0.0039 0.0273 0.0850 0.1715 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.1618 0.0007 0.0066 0.0314 0.0848 0.1684 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7807.0 0.1723 0.0005 0.0027 0.0239 0.0821 0.1664 Patrick Ebel We compute descriptors following the same pipeline as our log-polar submissions, but on Cartesian patches (a baseline without the log-polar transformation). Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1807.8 0.1672 0.0000 0.0041 0.0270 0.0821 0.1713 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1793.0 0.2083 0.0005 0.0080 0.0363 0.0970 0.1815 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 2003.7 0.2083 0.0002 0.0088 0.0412 0.0992 0.1830 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7680.7 0.1457 0.0010 0.0051 0.0197 0.0595 0.1284 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7764.3 0.1950 0.0002 0.0056 0.0312 0.0911 0.1884 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
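The SIFT-AID entries above match their 6272-bit binary descriptors by Hamming distance, accepting a nearest neighbour only when the distance falls below a fixed decision threshold of 4000. A minimal sketch of that threshold-based Hamming matching (illustrative only: the function names are ours, and descriptors are packed as Python ints rather than the authors' actual format):

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors packed as ints."""
    return bin(d1 ^ d2).count("1")

def threshold_match(descs_a, descs_b, max_dist=4000):
    """Nearest-neighbour matching in Hamming space with a decision threshold.

    For each descriptor in descs_a, find its nearest neighbour in descs_b
    and keep the pair only if the distance is below max_dist (4000 out of
    6272 bits in the SIFT-AID entries). Returns (i, j, distance) triples.
    """
    matches = []
    for i, da in enumerate(descs_a):
        j, d = min(((j, hamming(da, db)) for j, db in enumerate(descs_b)),
                   key=lambda t: t[1])
        if d < max_dist:
            matches.append((i, j, d))
    return matches
```

Binary descriptors with XOR-and-popcount distances like this are much cheaper to compare than float vectors under L2, which is the usual motivation for such large bit strings.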

Stereo — sequence 'reichstag'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7961.9 0.2624 0.0010 0.0067 0.0329 0.0806 0.1588 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 3421.1 0.3793 0.0010 0.0134 0.0491 0.1140 0.1931 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 3391.7 0.3844 0.0005 0.0091 0.0415 0.1011 0.1841 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, yielding at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 4835.5 0.3876 0.0005 0.0134 0.0420 0.0939 0.1769 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 6059.3 0.4055 0.0010 0.0105 0.0434 0.1097 0.1927 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 4958.1 0.3867 0.0024 0.0162 0.0443 0.1011 0.1855 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 6035.7 0.4046 0.0005 0.0119 0.0491 0.0997 0.1850 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.3924 0.0014 0.0129 0.0324 0.0653 0.1049 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2048.0 0.3846 0.0014 0.0110 0.0277 0.0539 0.0901 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.3789 0.0005 0.0086 0.0277 0.0491 0.0858 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.3859 0.0019 0.0124 0.0377 0.0668 0.1040 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 269.7 0.3271 0.0005 0.0076 0.0367 0.0763 0.1392 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 269.6 0.3587 0.0019 0.0138 0.0372 0.0720 0.1249 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 478.1 0.3051 0.0019 0.0138 0.0320 0.0620 0.1116 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 2012.6 0.3221 0.0005 0.0105 0.0448 0.0968 0.1631 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 2012.6 0.3312 0.0010 0.0124 0.0448 0.0973 0.1621 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2245.6 0.3314 0.0014 0.0134 0.0358 0.0925 0.1645 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors, matched with the sGOr2f* strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with greedy nearest-neighbour search, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7741.2 0.3447 0.0014 0.0129 0.0548 0.1130 0.1979 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, Paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks are plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7793.9 0.3813 0.0014 0.0119 0.0515 0.1211 0.2198 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity-vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf. The networks are plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7094.3 0.2725 0.0000 0.0095 0.0324 0.0787 0.1474 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7977.9 0.3547 0.0029 0.0134 0.0453 0.1011 0.1893 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7977.9 0.3664 0.0010 0.0129 0.0429 0.1063 0.1884 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7977.9 0.3723 0.0010 0.0129 0.0529 0.1173 0.1941 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 3023.9 0.3308 0.0010 0.0091 0.0291 0.0677 0.1178 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 3023.9 0.3265 0.0019 0.0086 0.0324 0.0639 0.1164 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7837.6 0.4452 0.0000 0.0119 0.0510 0.1149 0.2084 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7837.5 0.4163 0.0005 0.0148 0.0558 0.1092 0.1927 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7837.6 0.5902 0.0029 0.0258 0.1140 0.2475 0.3972 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
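The inlier-classification entries in this table build on 'Learning to Find Good Correspondences' (Yi et al., CVPR 2018), whose key architectural ingredient is context normalization: each feature channel is normalized across the set of putative correspondences, so every per-correspondence feature carries set-level context while the network stays permutation-equivariant. A numpy sketch of that operation (an illustration of the published idea, not the submitters' code):

```python
import numpy as np

def context_normalize(features, eps=1e-8):
    """Normalize features across the correspondence set (Yi et al. 2018).

    features: (n_correspondences, n_channels). Each channel is shifted
    to zero mean and unit variance over the set, injecting global
    context into every per-correspondence feature.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)
```

In the full network this normalization is interleaved with per-correspondence MLP layers, and the final per-correspondence scores are used both to reject outliers and to weight the fundamental-matrix estimate.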
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7837.6 0.3057 0.0010 0.0124 0.0448 0.1040 0.1807 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7977.9 0.3042 0.0005 0.0129 0.0429 0.0963 0.1707 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7977.9 0.3519 0.0019 0.0157 0.0486 0.1078 0.1845 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7977.9 0.3389 0.0014 0.0153 0.0491 0.0920 0.1750 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7837.6 0.5897 0.0029 0.0277 0.1021 0.2365 0.3882 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7978.0 0.2607 0.0005 0.0091 0.0401 0.0863 0.1545 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
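Most 'match:nn' entries, including the SIFT baseline above, use plain brute-force nearest-neighbour search: every descriptor in the first image is assigned the closest descriptor in the second by L2 distance. A numpy sketch (function name ours):

```python
import numpy as np

def nn_match(desc_a, desc_b):
    """Brute-force nearest-neighbour matching by L2 distance.

    Returns (i, j) pairs: descriptor i of image A matched to its
    closest descriptor j of image B. One-way only -- several i may
    map to the same j, which 1:1 (cross-check) matching prevents.
    """
    # squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (desc_a ** 2).sum(1)[:, None]
        - 2.0 * desc_a @ desc_b.T
        + (desc_b ** 2).sum(1)[None, :]
    )
    nearest = d2.argmin(axis=1)
    return np.stack([np.arange(len(desc_a)), nearest], axis=1)
```

All resulting pairs are passed downstream to robust estimation, so this matcher trades precision for recall compared with ratio-test or 1:1 variants.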
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7977.9 0.3106 0.0019 0.0153 0.0467 0.0925 0.1674 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.2757 0.0005 0.0105 0.0334 0.0668 0.1144 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1803.4 0.3515 0.0014 0.0129 0.0439 0.0949 0.1645 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1803.4 0.3931 0.0010 0.0134 0.0563 0.1221 0.2232 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
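The '1:1 matcher' entry above — and the 'nn1to1' configurations elsewhere in this table — keep only cross-consistent matches: (i, j) survives only if each descriptor is the other's nearest neighbour. A numpy sketch of that mutual check (function name ours):

```python
import numpy as np

def mutual_nn_match(desc_a, desc_b):
    """1:1 (mutual nearest-neighbour) matching by L2 distance.

    Keeps pair (i, j) only when j is i's nearest neighbour in B
    *and* i is j's nearest neighbour in A (cross-match consistency).
    """
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    a_to_b = d2.argmin(axis=1)          # best j for each i
    b_to_a = d2.argmin(axis=0)          # best i for each j
    idx = np.arange(len(desc_a))
    keep = b_to_a[a_to_b] == idx        # mutual agreement
    return np.stack([idx[keep], a_to_b[keep]], axis=1)
```

The resulting match set is strictly a subset of the one-way nearest-neighbour matches, which is consistent with the precision gain the 1:1 rows show over their 'match:nn' counterparts.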
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1226.4 0.3795 0.0010 0.0186 0.0477 0.0968 0.1698 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
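The SuperPoint entries resize each image so that its largest dimension is at most 1024 pixels before extraction. The target-size computation that wording implies is simply (a sketch; the function name is ours):

```python
def downsample_size(width, height, max_dim=1024):
    """Target size so the largest dimension is at most `max_dim`.

    Images already small enough are left untouched, matching the
    'if necessary, we downsample' wording above.
    """
    largest = max(width, height)
    if largest <= max_dim:
        return width, height
    scale = max_dim / largest
    return round(width * scale), round(height * scale)
```

Keypoint coordinates detected on the resized image must be divided by the same scale to be compared against ground truth at the original resolution.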
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.3806 0.0019 0.0157 0.0424 0.0920 0.1593 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.3577 0.0014 0.0124 0.0420 0.0906 0.1669 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.3605 0.0014 0.0124 0.0391 0.0897 0.1459 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.3780 0.0014 0.0138 0.0415 0.0920 0.1722 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.3281 0.0014 0.0119 0.0410 0.0873 0.1474 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7977.9 0.3456 0.0024 0.0162 0.0491 0.1121 0.1845 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1475.9 0.3525 0.0014 0.0143 0.0424 0.0897 0.1574 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1463.2 0.5845 0.0014 0.0196 0.0796 0.1955 0.3257 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1803.4 0.5843 0.0000 0.0253 0.0954 0.2117 0.3743 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set; keypoint refinement; better descriptor sampling; adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7719.0 0.2391 0.0005 0.0091 0.0348 0.0672 0.1307 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7837.6 0.4705 0.0014 0.0114 0.0529 0.1330 0.2437 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA

Stereo — sequence 'sagrada_familia'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7862.3 0.1433 0.0006 0.0062 0.0139 0.0305 0.0537 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 7516.2 0.1775 0.0006 0.0112 0.0427 0.0898 0.1506 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 5945.2 0.1886 0.0008 0.0056 0.0299 0.0631 0.1073 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, with at most 8000 keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 7186.2 0.1899 0.0012 0.0133 0.0432 0.0927 0.1523 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 10329.8 0.1992 0.0004 0.0102 0.0398 0.0913 0.1488 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 6964.7 0.1898 0.0008 0.0118 0.0409 0.0844 0.1459 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 7698.3 0.1983 0.0002 0.0112 0.0421 0.0873 0.1525 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.1673 0.0000 0.0033 0.0151 0.0423 0.0755 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2041.3 0.1677 0.0006 0.0052 0.0199 0.0492 0.0894 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.1632 0.0000 0.0006 0.0056 0.0120 0.0234 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.1660 0.0000 0.0015 0.0116 0.0257 0.0506 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 313.1 0.1656 0.0002 0.0027 0.0091 0.0230 0.0380 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 313.3 0.1782 0.0000 0.0015 0.0095 0.0212 0.0390 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 500.0 0.1646 0.0006 0.0058 0.0207 0.0432 0.0726 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 2012.8 0.1532 0.0012 0.0087 0.0263 0.0571 0.1004 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 2012.8 0.1531 0.0002 0.0079 0.0295 0.0593 0.1002 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 3588.3 0.1627 0.0006 0.0098 0.0346 0.0732 0.1245 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and square-rooted (as in RootSIFT) sGLOH2 descriptors with the sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed using the greedy nearest neighbour, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7695.2 0.1613 0.0010 0.0124 0.0407 0.0863 0.1373 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8000; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7755.4 0.1765 0.0012 0.0139 0.0384 0.0830 0.1452 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8000; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7643.7 0.1420 0.0000 0.0023 0.0098 0.0197 0.0336 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7893.9 0.1677 0.0006 0.0114 0.0413 0.0828 0.1369 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7893.9 0.1719 0.0004 0.0110 0.0371 0.0828 0.1400 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7893.9 0.1741 0.0006 0.0100 0.0394 0.0840 0.1417 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 5943.8 0.1572 0.0000 0.0021 0.0158 0.0388 0.0585 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 5943.8 0.1558 0.0002 0.0037 0.0129 0.0320 0.0521 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7855.1 0.1970 0.0015 0.0137 0.0463 0.0979 0.1629 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7855.1 0.1876 0.0012 0.0129 0.0427 0.0880 0.1471 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7855.1 0.2372 0.0008 0.0201 0.0654 0.1307 0.2199 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7855.1 0.1557 0.0012 0.0083 0.0346 0.0726 0.1251 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7896.1 0.1521 0.0008 0.0108 0.0317 0.0718 0.1224 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7896.1 0.1635 0.0012 0.0129 0.0402 0.0838 0.1427 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7896.1 0.1595 0.0008 0.0098 0.0386 0.0788 0.1284 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7855.1 0.2369 0.0019 0.0151 0.0577 0.1299 0.2085 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7894.1 0.1382 0.0004 0.0044 0.0156 0.0340 0.0583 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7896.1 0.1515 0.0012 0.0089 0.0266 0.0571 0.0952 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.1373 0.0002 0.0037 0.0135 0.0259 0.0494 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1988.9 0.1634 0.0012 0.0137 0.0409 0.0803 0.1328 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1988.9 0.1785 0.0010 0.0149 0.0506 0.0952 0.1575 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1474.5 0.1708 0.0008 0.0110 0.0355 0.0739 0.1247 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.1693 0.0006 0.0102 0.0322 0.0672 0.1164 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.1709 0.0008 0.0106 0.0361 0.0759 0.1297 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.1594 0.0004 0.0050 0.0168 0.0349 0.0581 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.1654 0.0010 0.0058 0.0245 0.0577 0.0942 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.1706 0.0004 0.0095 0.0340 0.0757 0.1222 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to obtain the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7893.9 0.1605 0.0012 0.0120 0.0421 0.0853 0.1413 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1833.9 0.1629 0.0006 0.0102 0.0309 0.0656 0.1154 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1714.4 0.2270 0.0012 0.0129 0.0442 0.0932 0.1566 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1988.9 0.2269 0.0008 0.0162 0.0488 0.0985 0.1699 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7732.5 0.1390 0.0002 0.0027 0.0085 0.0243 0.0407 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7855.1 0.2059 0.0017 0.0160 0.0554 0.1149 0.1882 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
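The match:nn1to1 entries above enforce cross-match consistency: a correspondence is kept only when the two keypoints are mutually nearest neighbours across the pair. A minimal pure-Python sketch of that rule over toy float descriptors (illustrative only, not the challenge implementation):

```python
# Cross-match consistency ("nn1to1"): keep (i, j) only if j is the nearest
# neighbour of i in image B AND i is the nearest neighbour of j in image A.

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest(query, pool):
    return min(range(len(pool)), key=lambda k: l2(query, pool[k]))

def mutual_nn_matches(desc_a, desc_b):
    fwd = [nearest(d, desc_b) for d in desc_a]   # A -> B
    bwd = [nearest(d, desc_a) for d in desc_b]   # B -> A
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

desc_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
desc_b = [[0.1, 0.0], [5.1, 5.0]]
print(mutual_nn_matches(desc_a, desc_b))  # -> [(0, 0), (2, 1)]
```

Note how the second descriptor in A is dropped: its forward match is not reciprocated, which is exactly the ambiguity the 1:1 check filters out.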

Stereo — sequence 'st_pauls_cathedral'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7857.2 0.1513 0.0002 0.0023 0.0102 0.0288 0.0648 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 5685.9 0.1725 0.0006 0.0069 0.0253 0.0623 0.1205 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 5171.6 0.1727 0.0000 0.0056 0.0196 0.0562 0.1076 Anonymous We use OpenCV's implementation of the BRISK detector with default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 5454.0 0.1751 0.0006 0.0069 0.0244 0.0610 0.1161 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 7231.5 0.1829 0.0006 0.0100 0.0309 0.0773 0.1425 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 5540.6 0.1744 0.0008 0.0048 0.0215 0.0593 0.1155 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 7004.6 0.1819 0.0006 0.0079 0.0261 0.0721 0.1366 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.1833 0.0006 0.0042 0.0173 0.0506 0.0994 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2033.1 0.1823 0.0006 0.0033 0.0167 0.0476 0.0902 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.1809 0.0004 0.0031 0.0134 0.0439 0.0861 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.1835 0.0002 0.0038 0.0173 0.0453 0.0957 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 375.1 0.1583 0.0004 0.0015 0.0050 0.0167 0.0351 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 375.0 0.1705 0.0004 0.0029 0.0098 0.0288 0.0564 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 464.1 0.1517 0.0006 0.0019 0.0092 0.0234 0.0501 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 2034.0 0.1503 0.0006 0.0033 0.0144 0.0397 0.0794 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 2034.0 0.1527 0.0002 0.0040 0.0148 0.0395 0.0777 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2300.1 0.1557 0.0002 0.0058 0.0180 0.0539 0.1053 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors with the sGOr2f* matching strategy. Distance-table entries greater than their respective row-wise and column-wise averages are discarded; matching pairs are then computed with greedy nearest-neighbour assignment, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
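The filter-then-greedy strategy described in the HarrisZ/RsGLOH2 entry can be sketched in a few lines. This is a hedged, illustrative reading of the description, assuming the entries discarded are those worse (larger distance) than both their row and column averages; the real sGOr2f* matcher has additional machinery:

```python
# Prune a descriptor-distance table against row/column averages, then assign
# matches greedily, best (smallest) distance first, one match per keypoint.

def filtered_greedy_matches(dist):
    rows, cols = len(dist), len(dist[0])
    row_avg = [sum(r) / cols for r in dist]
    col_avg = [sum(dist[i][j] for i in range(rows)) / rows for j in range(cols)]
    # keep only entries strictly below both their row and column average
    cand = [(dist[i][j], i, j) for i in range(rows) for j in range(cols)
            if dist[i][j] < row_avg[i] and dist[i][j] < col_avg[j]]
    cand.sort()                                   # greedy: best distance first
    used_i, used_j, matches = set(), set(), []
    for _, i, j in cand:
        if i not in used_i and j not in used_j:
            matches.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return matches

dist = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.8, 0.9, 0.3]]
print(filtered_greedy_matches(dist))  # -> [(0, 0), (1, 1), (2, 2)]
```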
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7657.3 0.1595 0.0013 0.0056 0.0196 0.0499 0.0999 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights: https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8000; detection done on 2x-upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7732.6 0.1689 0.0004 0.0061 0.0219 0.0531 0.1028 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity-vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8000; detection done on 2x-upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7683.3 0.1406 0.0004 0.0021 0.0079 0.0198 0.0370 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7868.9 0.1704 0.0004 0.0052 0.0165 0.0501 0.0911 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7868.9 0.1742 0.0002 0.0046 0.0203 0.0522 0.1030 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7868.9 0.1760 0.0004 0.0054 0.0180 0.0516 0.1059 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
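The log-polar entries above sample each patch on a grid of log-spaced radii and uniform angles around a DoG keypoint, with the support radius set to lambda/12 times the keypoint's scale. A hedged sketch of such a sampling grid (grid sizes and the minimum radius are assumptions for illustration, not the paper's exact parameters):

```python
import math

# Build a log-polar sampling grid around a keypoint (x, y) with scale sigma.
# Support radius = (lambda / 12) * sigma, as described in the entries above.

def log_polar_grid(x, y, sigma, lam=96.0, n_rho=8, n_theta=16, r_min=1.0):
    r_max = (lam / 12.0) * sigma              # support radius
    grid = []
    for m in range(n_rho):                    # radii uniform in log-space
        rho = r_min * (r_max / r_min) ** (m / (n_rho - 1))
        for k in range(n_theta):              # angles uniform in [0, 2*pi)
            th = 2.0 * math.pi * k / n_theta
            grid.append((x + rho * math.cos(th), y + rho * math.sin(th)))
    return grid

pts = log_polar_grid(100.0, 50.0, sigma=2.0, lam=96.0)
print(len(pts))  # 8 rings * 16 angles = 128 sample locations
```

Because radii grow geometrically, a change of keypoint scale shifts the pattern along the radial axis instead of resampling it, which is what makes the descriptor robust to scale errors.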
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 3924.0 0.1537 0.0002 0.0021 0.0130 0.0326 0.0606 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 3924.0 0.1530 0.0000 0.0021 0.0092 0.0276 0.0560 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
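The SIFT-AID entries above accept a pair when the Hamming distance between its 6272-bit descriptors falls below the decision threshold (4000). A minimal sketch with descriptors held as Python ints (toy 8-bit values and a toy threshold, not the real 6272-bit descriptors):

```python
# Threshold-based matching of binary descriptors by Hamming distance.

def hamming(a, b):
    # number of differing bits between two integer-encoded descriptors
    return bin(a ^ b).count("1")

def aid_style_matches(descs_a, descs_b, threshold=4000):
    return [(i, j)
            for i, da in enumerate(descs_a)
            for j, db in enumerate(descs_b)
            if hamming(da, db) < threshold]

# Toy 8-bit example with a threshold of 3 instead of 4000:
print(aid_style_matches([0b10110010], [0b10110000, 0b01001101], threshold=3))
# -> [(0, 0)]
```

Unlike nearest-neighbour matching, a thresholded test can return several matches per keypoint, or none at all.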
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7698.4 0.1993 0.0006 0.0046 0.0192 0.0543 0.1057 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7698.4 0.1851 0.0004 0.0052 0.0196 0.0503 0.1011 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, particularly under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7698.4 0.2546 0.0006 0.0132 0.0549 0.1356 0.2486 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7698.4 0.1530 0.0004 0.0040 0.0157 0.0470 0.0905 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7869.2 0.1563 0.0004 0.0046 0.0159 0.0439 0.0963 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7869.2 0.1689 0.0006 0.0046 0.0201 0.0533 0.1057 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7869.2 0.1652 0.0002 0.0046 0.0196 0.0508 0.1003 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7698.4 0.2544 0.0013 0.0127 0.0426 0.1174 0.2185 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7868.9 0.1447 0.0002 0.0027 0.0113 0.0309 0.0677 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7869.2 0.1570 0.0006 0.0040 0.0163 0.0449 0.0873 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.1404 0.0002 0.0021 0.0092 0.0265 0.0522 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1979.4 0.1608 0.0008 0.0044 0.0194 0.0520 0.0982 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1979.4 0.1729 0.0006 0.0052 0.0292 0.0735 0.1393 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1371.5 0.1657 0.0006 0.0058 0.0171 0.0547 0.1019 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
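Several SuperPoint entries above downsample the input so that the largest image dimension is at most 1024 pixels. The resizing rule can be sketched as pure arithmetic (the actual submissions use image libraries for the resampling itself):

```python
# Compute the output size when clamping the largest image dimension,
# preserving aspect ratio; images already within the limit are untouched.

def clamp_largest_dim(width, height, max_dim=1024):
    largest = max(width, height)
    if largest <= max_dim:
        return width, height              # already small enough: keep as-is
    scale = max_dim / largest
    return round(width * scale), round(height * scale)

print(clamp_largest_dim(2048, 1536))  # -> (1024, 768)
print(clamp_largest_dim(800, 600))    # -> (800, 600)
```

Clamping resolution bounds the network's runtime and memory, at the cost of discarding the finest-scale keypoints on very large photos.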
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.1646 0.0004 0.0048 0.0173 0.0491 0.0961 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.1633 0.0004 0.0050 0.0182 0.0460 0.0973 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.1530 0.0004 0.0021 0.0104 0.0303 0.0610 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.1597 0.0004 0.0033 0.0171 0.0443 0.0831 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.1604 0.0006 0.0040 0.0180 0.0491 0.0892 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7868.9 0.1665 0.0004 0.0056 0.0192 0.0543 0.1019 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1688.6 0.1604 0.0008 0.0040 0.0186 0.0529 0.1001 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1636.0 0.2492 0.0010 0.0125 0.0422 0.1026 0.1964 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1979.4 0.2488 0.0023 0.0109 0.0399 0.1007 0.1899 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7714.1 0.1404 0.0004 0.0019 0.0069 0.0217 0.0531 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7698.4 0.2088 0.0008 0.0084 0.0286 0.0771 0.1437 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
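Most entries in these tables use the default match:nn protocol: plain brute-force nearest-neighbour search, where every descriptor in image A is assigned its closest descriptor in image B, with no cross-check or ratio test. A minimal pure-Python sketch (toy descriptors, illustrative only):

```python
# Brute-force nearest-neighbour matching: one forward match per descriptor.

def brute_force_nn(desc_a, desc_b):
    def l2sq(a, b):
        # squared L2 distance; the minimizer is the same as for plain L2
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [(i, min(range(len(desc_b)), key=lambda j: l2sq(da, desc_b[j])))
            for i, da in enumerate(desc_a)]

desc_a = [[0.0, 1.0], [3.0, 3.0]]
desc_b = [[3.1, 2.9], [0.0, 0.9]]
print(brute_force_nn(desc_a, desc_b))  # -> [(0, 1), (1, 0)]
```

Because every keypoint in A gets a match, the output is noisy by construction; the stereo evaluation then relies on robust geometric estimation to reject the bad correspondences.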

Stereo — sequence 'united_states_capitol'
Method Date Type #kp MS mAP5o mAP10o mAP15o mAP20o mAP25o By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 8000.0 0.2440 0.0025 0.0197 0.0628 0.1460 0.2595 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 3709.7 0.2778 0.0010 0.0108 0.0338 0.0862 0.1631 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 3466.6 0.2864 0.0023 0.0151 0.0514 0.1238 0.2239 Anonymous We use OpenCV's implementation of the BRISK detector with default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 4917.8 0.2625 0.0013 0.0101 0.0366 0.0870 0.1548 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 6729.8 0.2722 0.0003 0.0119 0.0439 0.1036 0.1904 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 4989.5 0.2615 0.0008 0.0124 0.0408 0.0880 0.1578 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 6501.6 0.2716 0.0008 0.0134 0.0449 0.1059 0.1947 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.3106 0.0003 0.0055 0.0222 0.0502 0.1004 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2048.0 0.2956 0.0003 0.0076 0.0252 0.0519 0.1001 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.3175 0.0010 0.0088 0.0298 0.0691 0.1359 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.3172 0.0010 0.0083 0.0229 0.0524 0.1064 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 266.9 0.2694 0.0013 0.0096 0.0262 0.0618 0.1215 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 266.8 0.2837 0.0010 0.0068 0.0280 0.0610 0.1193 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are obtained by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 470.5 0.2536 0.0008 0.0093 0.0310 0.0645 0.1223 Anonymous ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 1965.2 0.2493 0.0013 0.0096 0.0386 0.0988 0.1727 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 1965.2 0.2515 0.0015 0.0113 0.0411 0.0920 0.1636 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2032.7 0.2592 0.0010 0.0119 0.0509 0.1142 0.2057 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors with the sGOr2f* matching strategy. Distance-table entries greater than their respective row-wise and column-wise averages are discarded; matching pairs are then computed with greedy nearest-neighbour assignment, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7685.2 0.2562 0.0015 0.0111 0.0338 0.0792 0.1455 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights: https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis. Maximum number of keypoints: 8000; detection done on 2x-upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7757.0 0.2652 0.0010 0.0146 0.0381 0.0872 0.1551 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; a gravity-aligned orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis or matching. Maximum number of keypoints: 8k; detection is performed on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 6953.4 0.2425 0.0008 0.0083 0.0343 0.0797 0.1427 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 8000.0 0.2632 0.0008 0.0111 0.0393 0.0872 0.1621 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 8000.0 0.2669 0.0005 0.0121 0.0378 0.0905 0.1652 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 8000.0 0.2691 0.0008 0.0124 0.0416 0.0983 0.1783 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
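The log-polar parameterization referenced in the three entries above samples the patch on a grid that is geometric in radius and uniform in angle. A minimal sketch of such a sampling grid (purely illustrative; parameter names and the exact grid layout are assumptions, not the authors' code):

```python
import math

def log_polar_grid(center, rmin, rmax, n_rho, n_theta):
    """Sampling locations for a log-polar patch: radii spaced
    geometrically between rmin and rmax, angles spaced uniformly.
    Requires n_rho >= 2."""
    cx, cy = center
    grid = []
    for i in range(n_rho):
        # geometric progression of radii from rmin to rmax
        r = rmin * (rmax / rmin) ** (i / (n_rho - 1))
        for j in range(n_theta):
            t = 2 * math.pi * j / n_theta
            grid.append((cx + r * math.cos(t), cy + r * math.sin(t)))
    return grid

pts = log_polar_grid((0.0, 0.0), 1.0, 8.0, 4, 8)  # 32 sample points
```

Sampling the image at such a grid makes equal descriptor bins cover exponentially larger regions away from the keypoint, which is what gives the descriptor its tolerance to scale changes.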
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 2709.8 0.2629 0.0010 0.0129 0.0305 0.0721 0.1314 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
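One way to implement the thresholded Hamming matching that SIFT-AID describes is to accept the nearest neighbour only when its distance falls below the decision threshold. A toy sketch (not the SIFT-AID code; descriptors are packed into Python ints, and the threshold here is scaled down for the tiny example):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors packed as ints."""
    return bin(a ^ b).count("1")

def match(desc1, desc2, threshold=4000):
    """For each descriptor in desc1, keep its nearest desc2 entry
    only if the Hamming distance is below the decision threshold."""
    matches = []
    for i, d1 in enumerate(desc1):
        j, dist = min(((j, hamming(d1, d2)) for j, d2 in enumerate(desc2)),
                      key=lambda t: t[1])
        if dist < threshold:
            matches.append((i, j))
    return matches

print(match([0b1010], [0b1000, 0b0101], threshold=3))
```

With 6272-bit descriptors a production implementation would pack bits into machine words and use popcount, but the decision rule is the same.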
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 2709.8 0.2645 0.0005 0.0083 0.0313 0.0782 0.1437 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is computed as the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7363.6 0.3099 0.0010 0.0119 0.0383 0.0986 0.1793 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7363.5 0.2925 0.0010 0.0116 0.0358 0.0817 0.1520 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be made available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7363.6 0.3932 0.0008 0.0116 0.0373 0.0976 0.1828 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7363.6 0.2443 0.0013 0.0144 0.0512 0.1067 0.1858 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 8000.0 0.2469 0.0018 0.0154 0.0459 0.1016 0.1762 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 8000.0 0.2602 0.0010 0.0116 0.0406 0.0870 0.1652 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 8000.0 0.2576 0.0005 0.0131 0.0429 0.0946 0.1654 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7363.6 0.3903 0.0013 0.0101 0.0295 0.0751 0.1455 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 8000.0 0.2346 0.0005 0.0146 0.0504 0.1109 0.1924 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 8000.0 0.2485 0.0005 0.0119 0.0381 0.0875 0.1641 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.2357 0.0005 0.0088 0.0310 0.0814 0.1523 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1808.9 0.2642 0.0015 0.0119 0.0353 0.0817 0.1541 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1808.9 0.2738 0.0010 0.0116 0.0398 0.1046 0.1942 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1159.0 0.2718 0.0013 0.0093 0.0305 0.0714 0.1369 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
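The SuperPoint entries that follow all preprocess images so that the largest dimension is at most 1024 pixels. A small helper for that resizing rule (an assumed utility, not the organizers' code):

```python
def target_size(w: int, h: int, max_dim: int = 1024):
    """Downsample so the largest image dimension is at most max_dim,
    preserving aspect ratio; images already small enough are untouched."""
    s = min(1.0, max_dim / max(w, h))
    return round(w * s), round(h * s)

print(target_size(2048, 1536))  # (1024, 768)
print(target_size(800, 600))   # (800, 600) -- never upsampled
```

Only downsampling is applied (the scale factor is clamped to 1.0), so small images keep their native resolution.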
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.2726 0.0015 0.0098 0.0300 0.0688 0.1273 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.2598 0.0020 0.0106 0.0318 0.0661 0.1266 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.2737 0.0018 0.0113 0.0366 0.0840 0.1357 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.2779 0.0010 0.0088 0.0358 0.0754 0.1359 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.2448 0.0005 0.0103 0.0272 0.0603 0.1104 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 8000.0 0.2584 0.0013 0.0129 0.0383 0.0890 0.1692 Patrick Ebel We compute scale-invariant descriptors; this entry is a baseline that uses Cartesian patches instead of the log-polar transformation. Keypoints are DoG, with a scaling factor of lambda/12 over the chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1440.8 0.2671 0.0015 0.0134 0.0330 0.0804 0.1536 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1442.2 0.3892 0.0008 0.0081 0.0361 0.0890 0.1755 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1808.9 0.3861 0.0013 0.0086 0.0333 0.0888 0.1689 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7945.9 0.2294 0.0018 0.0113 0.0361 0.0900 0.1659 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7363.6 0.3179 0.0010 0.0149 0.0492 0.1109 0.1979 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
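The cross-match consistency (nn1to1) used in this entry keeps a pair only when the two descriptors are each other's nearest neighbour. A pure-Python sketch of that symmetric check (illustrative only, not the evaluation code):

```python
def mutual_nn_matches(d1, d2):
    """Cross-checked matching: keep (i, j) only if d1[i] and d2[j]
    are each other's nearest neighbour under squared L2 distance."""
    def nn(a, B):
        # index of the descriptor in B closest to a
        return min(range(len(B)),
                   key=lambda j: sum((x - y) ** 2 for x, y in zip(a, B[j])))
    fwd = [nn(a, d2) for a in d1]   # nearest d2 index for each d1
    bwd = [nn(b, d1) for b in d2]   # nearest d1 index for each d2
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

d1 = [(0.0, 0.0), (1.0, 1.0)]
d2 = [(1.0, 1.0), (0.0, 0.0)]
print(mutual_nn_matches(d1, d2))
```

Compared with plain one-way NN matching, the cross-check discards asymmetric pairs, trading some recall for fewer outliers before geometric verification.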