Breakdown on the Phototourism dataset, multi-view stereo task, by sequence.
MVS — All sequences — Sorted by mAP at 15° (see the note on the mAP metric after this table)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | BM | FCS | LMS | LB | MC | MR | PSM | RS | SF | SPC | USC | AVG | ||||||||
AKAZE (OpenCV) kp:8000, match:nn |
0.3576 | 0.4385 | 0.5551 | 0.3696 | 0.5935 | 0.3131 | 0.2543 | 0.4708 | 0.5709 | 0.6054 | 0.1847 | 0.4285 | 19-04-24 | F | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
0.3772 | 0.5865 | 0.6595 | 0.3789 | 0.6131 | 0.3458 | 0.4122 | 0.4310 | 0.7381 | 0.6481 | 0.1606 | 0.4865 | 19-04-26 | F | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
0.0844 | 0.3523 | 0.1150 | 0.3040 | 0.3429 | 0.1207 | 0.2678 | 0.3511 | 0.4466 | 0.4849 | 0.0936 | 0.2694 | 19-05-14 | F | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
0.2518 | 0.4914 | 0.6187 | 0.4915 | 0.4482 | 0.3186 | 0.3673 | 0.3340 | 0.5189 | 0.5070 | 0.1321 | 0.4072 | 19-05-07 | F | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
0.1904 | 0.4486 | 0.6308 | 0.4622 | 0.4263 | 0.3543 | 0.3746 | 0.3690 | 0.5083 | 0.5044 | 0.1323 | 0.4001 | 19-05-07 | F | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
0.2522 | 0.4901 | 0.6258 | 0.4932 | 0.4486 | 0.3217 | 0.3760 | 0.3284 | 0.5127 | 0.5209 | 0.1431 | 0.4102 | 19-06-01 | F | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
0.1962 | 0.4584 | 0.6617 | 0.4545 | 0.4435 | 0.3182 | 0.3976 | 0.3394 | 0.4576 | 0.5162 | 0.1208 | 0.3967 | 19-06-05 | F | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
0.0858 | 0.2017 | 0.2292 | 0.1254 | 0.1748 | 0.0330 | 0.0476 | 0.1275 | 0.0621 | 0.2125 | 0.0602 | 0.1236 | 19-05-05 | F | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
0.1014 | 0.2859 | 0.2730 | 0.1300 | 0.2152 | 0.0592 | 0.1378 | 0.1566 | 0.1365 | 0.2347 | 0.0615 | 0.1629 | 19-05-05 | F | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
0.0001 | 0.0019 | 0.0148 | 0.0195 | 0.0075 | 0.0010 | 0.0000 | 0.0223 | 0.0000 | 0.0005 | 0.0112 | 0.0072 | 19-05-05 | F | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
0.0409 | 0.0558 | 0.0850 | 0.0988 | 0.0773 | 0.0140 | 0.0010 | 0.0895 | 0.0018 | 0.1425 | 0.0375 | 0.0585 | 19-05-05 | F | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
0.0449 | 0.0499 | 0.1188 | 0.1584 | 0.0372 | 0.0190 | 0.0010 | 0.1209 | 0.0985 | 0.0742 | 0.0002 | 0.0657 | 19-05-07 | F | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
0.0499 | 0.0654 | 0.1666 | 0.2027 | 0.0645 | 0.0207 | 0.0024 | 0.1424 | 0.0889 | 0.1297 | 0.0009 | 0.0849 | 19-05-09 | F | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
0.0933 | 0.1873 | 0.2702 | 0.2906 | 0.2627 | 0.0799 | 0.0148 | 0.2254 | 0.2236 | 0.1925 | 0.0055 | 0.1678 | 19-04-26 | F | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
0.2347 | 0.5479 | 0.6881 | 0.2262 | 0.4348 | 0.2438 | 0.2070 | 0.3115 | 0.6402 | 0.5461 | 0.1021 | 0.3802 | 19-05-19 | F | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
0.2313 | 0.5550 | 0.6536 | 0.2170 | 0.4446 | 0.2620 | 0.1933 | 0.3006 | 0.6428 | 0.5339 | 0.1218 | 0.3778 | 19-05-19 | F | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
0.3285 | 0.6044 | 0.7392 | 0.5208 | 0.6511 | 0.4007 | 0.3091 | 0.4042 | 0.7566 | 0.6161 | 0.2137 | 0.5040 | 19-05-23 | F | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
0.4287 | 0.6316 | 0.7908 | 0.4149 | 0.6392 | 0.3792 | 0.4449 | 0.4937 | 0.7711 | 0.6565 | 0.1619 | 0.5284 | 19-05-29 | F | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis and matching. Maximum number of keypoints: 8k; detection on 2x-upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
0.4294 | 0.5902 | 0.7407 | 0.3967 | 0.6356 | 0.4294 | 0.4477 | 0.5004 | 0.5997 | 0.6336 | 0.1572 | 0.5055 | 19-05-30 | F | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis and matching. Maximum number of keypoints: 8k; detection on 2x-upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
0.1587 | 0.3061 | 0.3016 | 0.1642 | 0.2027 | 0.1402 | 0.1302 | 0.3006 | 0.3802 | 0.3913 | 0.0613 | 0.2307 | 19-04-24 | F | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
0.4094 | 0.6191 | 0.7261 | 0.5028 | 0.7047 | 0.5206 | 0.4309 | 0.4283 | 0.7671 | 0.6324 | 0.1863 | 0.5389 | 19-06-25 | F | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
0.4028 | 0.5943 | 0.7361 | 0.5324 | 0.7128 | 0.4911 | 0.4473 | 0.4141 | 0.7515 | 0.6505 | 0.2032 | 0.5396 | 19-06-24 | F | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
0.3993 | 0.6064 | 0.7317 | 0.5410 | 0.7159 | 0.5201 | 0.4580 | 0.4270 | 0.7335 | 0.6406 | 0.1957 | 0.5427 | 19-06-20 | F | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
0.1532 | 0.3674 | 0.2183 | 0.2667 | 0.4313 | 0.1428 | 0.1834 | 0.3325 | 0.4663 | 0.3444 | 0.0502 | 0.2688 | 19-05-10 | F | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
0.1351 | 0.3428 | 0.1588 | 0.2417 | 0.4441 | 0.1367 | 0.1903 | 0.3177 | 0.4562 | 0.3357 | 0.0491 | 0.2553 | 19-04-29 | F/M | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
0.3339 | 0.6322 | 0.7204 | 0.5712 | 0.7115 | 0.4788 | 0.4852 | 0.4540 | 0.7826 | 0.5961 | 0.1735 | 0.5399 | 19-05-09 | F | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
0.3768 | 0.5872 | 0.7283 | 0.5217 | 0.6756 | 0.4511 | 0.4682 | 0.4026 | 0.7268 | 0.5668 | 0.1371 | 0.5129 | 19-05-28 | F | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
0.6071 | 0.8317 | 0.8542 | 0.8174 | 0.8809 | 0.6305 | 0.6854 | 0.8043 | 0.9044 | 0.8776 | 0.2345 | 0.7389 | 19-05-28 | F/M | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
0.3375 | 0.6042 | 0.6834 | 0.4911 | 0.7246 | 0.4662 | 0.4772 | 0.4788 | 0.7983 | 0.6165 | 0.1497 | 0.5298 | 19-05-08 | F | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
0.4169 | 0.6473 | 0.6744 | 0.4696 | 0.7138 | 0.4907 | 0.4137 | 0.4632 | 0.7740 | 0.6234 | 0.1616 | 0.5317 | 19-04-24 | F | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
0.4090 | 0.6438 | 0.7726 | 0.4702 | 0.6945 | 0.5106 | 0.4612 | 0.4477 | 0.8030 | 0.6376 | 0.1794 | 0.5481 | 19-04-24 | F | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
0.3292 | 0.5935 | 0.7109 | 0.4380 | 0.7093 | 0.4667 | 0.4000 | 0.4192 | 0.7682 | 0.5832 | 0.1778 | 0.5087 | 19-04-24 | F | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
0.5711 | 0.8059 | 0.8677 | 0.7937 | 0.8609 | 0.6222 | 0.6156 | 0.7760 | 0.8970 | 0.8567 | 0.2190 | 0.7169 | 19-05-29 | F/M | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
0.2714 | 0.5530 | 0.4768 | 0.3859 | 0.6433 | 0.3181 | 0.2797 | 0.4216 | 0.6002 | 0.4979 | 0.1131 | 0.4146 | 19-04-24 | F | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
0.3304 | 0.5315 | 0.6238 | 0.4131 | 0.6819 | 0.4022 | 0.3557 | 0.3929 | 0.6857 | 0.5503 | 0.1402 | 0.4643 | 19-04-24 | F | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
0.1458 | 0.4113 | 0.4038 | 0.1747 | 0.2606 | 0.1601 | 0.0645 | 0.1915 | 0.4469 | 0.3590 | 0.0652 | 0.2439 | 19-05-17 | F | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
0.3344 | 0.5417 | 0.7106 | 0.5541 | 0.6105 | 0.3891 | 0.2737 | 0.3591 | 0.7074 | 0.6131 | 0.1618 | 0.4778 | 19-06-07 | F | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
0.4071 | 0.6746 | 0.7953 | 0.6602 | 0.6507 | 0.4309 | 0.3033 | 0.4084 | 0.7685 | 0.7030 | 0.1814 | 0.5440 | 19-06-07 | F | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
0.3763 | 0.3927 | 0.6357 | 0.5936 | 0.5046 | 0.3569 | 0.3163 | 0.3722 | 0.4853 | 0.5128 | 0.1470 | 0.4267 | 19-04-24 | F | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
0.3752 | 0.3704 | 0.6258 | 0.6135 | 0.4946 | 0.3670 | 0.2659 | 0.3370 | 0.4623 | 0.5174 | 0.1302 | 0.4145 | 19-04-26 | F | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
0.3557 | 0.3989 | 0.5727 | 0.5711 | 0.5072 | 0.3859 | 0.3269 | 0.3598 | 0.4956 | 0.5199 | 0.1501 | 0.4222 | 19-04-26 | F | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
0.1089 | 0.0256 | 0.2964 | 0.3699 | 0.0160 | 0.1125 | 0.0002 | 0.1210 | 0.2019 | 0.1638 | 0.0016 | 0.1289 | 19-04-26 | F | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
0.3242 | 0.1806 | 0.5745 | 0.5669 | 0.3108 | 0.2704 | 0.0990 | 0.2769 | 0.3634 | 0.4048 | 0.0585 | 0.3118 | 19-04-26 | F | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
0.3283 | 0.3907 | 0.5753 | 0.4620 | 0.4401 | 0.3203 | 0.3695 | 0.3651 | 0.4828 | 0.4707 | 0.1208 | 0.3932 | 19-04-26 | F | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
0.3459 | 0.6743 | 0.7612 | 0.3762 | 0.7104 | 0.4867 | 0.3722 | 0.3781 | 0.8166 | 0.6483 | 0.1591 | 0.5208 | 19-07-29 | F | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
0.4057 | 0.5237 | 0.6628 | 0.5740 | 0.5633 | 0.3562 | 0.3349 | 0.3850 | 0.6564 | 0.5875 | 0.1590 | 0.4735 | 19-05-30 | F | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
0.7052 | 0.6671 | 0.6918 | 0.8131 | 0.7382 | 0.5033 | 0.3152 | 0.6750 | 0.7949 | 0.7887 | 0.2596 | 0.6320 | 19-05-30 | F/M | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
0.6989 | 0.6873 | 0.6997 | 0.8126 | 0.7504 | 0.5183 | 0.3557 | 0.7210 | 0.7942 | 0.8112 | 0.2540 | 0.6458 | 19-05-28 | F/M | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
0.1458 | 0.3515 | 0.3862 | 0.2207 | 0.4503 | 0.2126 | 0.2072 | 0.3438 | 0.4977 | 0.4159 | 0.0760 | 0.3007 | 19-04-24 | F | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
0.3075 | 0.7850 | 0.8314 | 0.6763 | 0.7492 | 0.5393 | 0.5014 | 0.4667 | 0.8463 | 0.7185 | 0.1965 | 0.6017 | 19-06-07 | F | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
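The per-sequence tables below report mAP at pose-error thresholds from 5° to 25°; the summary table above is sorted by the 15° column. The exact aggregation protocol is not spelled out on this page, so the snippet below is only an assumed reading of those columns: it treats each one as the fraction of estimated poses whose angular error falls below the corresponding threshold.

```python
import numpy as np

def accuracy_at_thresholds(angular_errors_deg, thresholds_deg=(5, 10, 15, 20, 25)):
    """Fraction of poses whose angular error (in degrees) is below each threshold.

    NOTE: this is an assumed interpretation of the mAP columns, not the
    official challenge evaluation code.
    """
    errs = np.asarray(angular_errors_deg, dtype=float)
    return {t: float(np.mean(errs < t)) for t in thresholds_deg}

# Toy example: angular pose errors (in degrees) for six image pairs.
print(accuracy_at_thresholds([2.1, 7.5, 12.0, 3.3, 40.0, 9.8]))
# -> approximately {5: 0.33, 10: 0.67, 15: 0.83, 20: 0.83, 25: 0.83}
```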
Results for individual sequences:
MVS — sequence 'british_museum'
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP (5°) | mAP (10°) | mAP (15°) | mAP (20°) | mAP (25°) | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 99.1 | 4960.4 | 99.5 | 3.76 | 0.1442 | 0.2674 | 0.3576 | 0.4264 | 0.4881 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 99.4 | 3516.7 | 99.2 | 3.30 | 0.1580 | 0.2773 | 0.3772 | 0.4609 | 0.5263 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 98.9 | 2095.0 | 98.0 | 3.06 | 0.0102 | 0.0410 | 0.0844 | 0.1457 | 0.2142 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 95.8 | 3457.5 | 94.5 | 2.89 | 0.0700 | 0.1689 | 0.2518 | 0.3336 | 0.4026 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 96.3 | 3622.8 | 95.5 | 2.85 | 0.0386 | 0.1086 | 0.1904 | 0.2725 | 0.3439 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 95.8 | 3445.4 | 94.8 | 2.88 | 0.0663 | 0.1633 | 0.2522 | 0.3313 | 0.4039 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 95.9 | 3611.2 | 95.0 | 2.83 | 0.0336 | 0.1068 | 0.1962 | 0.2832 | 0.3553 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 91.7 | 804.0 | 89.5 | 2.33 | 0.0149 | 0.0464 | 0.0858 | 0.1385 | 0.1972 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 94.2 | 1517.5 | 92.2 | 2.29 | 0.0165 | 0.0519 | 0.1014 | 0.1625 | 0.2240 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 25.9 | 55.0 | 7.0 | 1.76 | 0.0000 | 0.0001 | 0.0001 | 0.0002 | 0.0004 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 79.2 | 351.5 | 70.0 | 2.30 | 0.0044 | 0.0191 | 0.0409 | 0.0704 | 0.1062 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 82.1 | 277.2 | 66.8 | 2.73 | 0.0055 | 0.0210 | 0.0449 | 0.0785 | 0.1183 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 81.4 | 267.9 | 73.8 | 2.82 | 0.0077 | 0.0258 | 0.0499 | 0.0899 | 0.1269 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 87.6 | 294.0 | 70.0 | 2.90 | 0.0169 | 0.0540 | 0.0933 | 0.1398 | 0.1870 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 99.0 | 1768.9 | 99.5 | 3.31 | 0.0849 | 0.1626 | 0.2347 | 0.3106 | 0.3816 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 99.1 | 2019.6 | 99.8 | 3.19 | 0.0778 | 0.1531 | 0.2313 | 0.3113 | 0.3792 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 99.4 | 1230.2 | 99.0 | 3.31 | 0.1306 | 0.2388 | 0.3285 | 0.4143 | 0.4816 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 99.5 | 5718.4 | 100.0 | 3.60 | 0.1898 | 0.3316 | 0.4287 | 0.5047 | 0.5667 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis and matching. Maximum number of keypoints: 8k; detection on 2x-upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 97.5 | 6175.5 | 96.0 | 3.63 | 0.2179 | 0.3460 | 0.4294 | 0.4954 | 0.5531 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework (https://github.com/ducha-aiki/mods-light-zmq), but without view synthesis and matching. Maximum number of keypoints: 8k; detection on 2x-upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 97.5 | 3255.1 | 94.0 | 3.39 | 0.0334 | 0.0898 | 0.1587 | 0.2268 | 0.2955 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 99.5 | 7420.1 | 99.2 | 3.35 | 0.2044 | 0.3255 | 0.4094 | 0.4839 | 0.5486 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 99.6 | 7694.9 | 99.8 | 3.32 | 0.1978 | 0.3176 | 0.4028 | 0.4793 | 0.5407 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 99.4 | 7787.7 | 99.2 | 3.31 | 0.1911 | 0.3166 | 0.3993 | 0.4709 | 0.5348 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 97.6 | 1898.7 | 92.0 | 3.02 | 0.0354 | 0.0877 | 0.1532 | 0.2208 | 0.2907 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 96.4 | 1816.7 | 92.8 | 2.95 | 0.0275 | 0.0754 | 0.1351 | 0.1993 | 0.2716 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 99.3 | 6043.3 | 99.8 | 2.92 | 0.1531 | 0.2511 | 0.3339 | 0.4077 | 0.4683 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 98.6 | 5381.4 | 97.2 | 2.97 | 0.1763 | 0.2915 | 0.3768 | 0.4434 | 0.5078 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 99.3 | 6167.1 | 98.0 | 3.12 | 0.3613 | 0.5157 | 0.6071 | 0.6787 | 0.7271 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 99.4 | 4407.3 | 99.2 | 3.14 | 0.1621 | 0.2620 | 0.3375 | 0.4073 | 0.4742 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 99.5 | 5722.5 | 100.0 | 3.45 | 0.2066 | 0.3316 | 0.4169 | 0.4933 | 0.5506 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 99.6 | 6837.1 | 100.0 | 3.38 | 0.2024 | 0.3305 | 0.4090 | 0.4957 | 0.5559 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 99.6 | 6314.7 | 100.0 | 3.22 | 0.1492 | 0.2527 | 0.3292 | 0.4030 | 0.4699 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 99.5 | 6170.7 | 99.8 | 3.08 | 0.3315 | 0.4759 | 0.5711 | 0.6451 | 0.7017 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 98.1 | 4336.2 | 97.2 | 3.25 | 0.1122 | 0.2007 | 0.2714 | 0.3424 | 0.4135 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 99.1 | 5718.0 | 99.8 | 3.20 | 0.1452 | 0.2462 | 0.3304 | 0.4033 | 0.4724 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search (see the matching sketch after this table). | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 97.9 | 1573.7 | 95.5 | 3.48 | 0.0411 | 0.0924 | 0.1458 | 0.2164 | 0.2877 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 99.2 | 1563.9 | 97.5 | 3.36 | 0.1320 | 0.2481 | 0.3344 | 0.4009 | 0.4741 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 99.4 | 1479.1 | 98.5 | 3.50 | 0.1942 | 0.3205 | 0.4071 | 0.4807 | 0.5482 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 95.6 | 1003.0 | 93.5 | 3.33 | 0.1745 | 0.2897 | 0.3763 | 0.4470 | 0.5024 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 95.9 | 915.3 | 93.5 | 3.28 | 0.1755 | 0.2912 | 0.3752 | 0.4456 | 0.4965 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 95.2 | 1634.1 | 94.8 | 3.26 | 0.1567 | 0.2703 | 0.3557 | 0.4228 | 0.4766 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 75.9 | 173.6 | 47.5 | 2.92 | 0.0402 | 0.0793 | 0.1089 | 0.1323 | 0.1505 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 93.2 | 457.5 | 87.5 | 3.14 | 0.1349 | 0.2415 | 0.3242 | 0.3830 | 0.4355 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 96.2 | 5766.0 | 94.5 | 3.25 | 0.1269 | 0.2422 | 0.3283 | 0.3987 | 0.4625 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 99.4 | 7869.6 | 99.8 | 3.33 | 0.1557 | 0.2578 | 0.3459 | 0.4145 | 0.4826 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 98.8 | 1100.9 | 97.2 | 3.50 | 0.1885 | 0.3207 | 0.4057 | 0.4768 | 0.5428 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 99.3 | 1092.9 | 98.5 | 3.81 | 0.4603 | 0.6270 | 0.7052 | 0.7533 | 0.7989 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 99.3 | 1258.2 | 99.0 | 3.89 | 0.4530 | 0.6140 | 0.6989 | 0.7556 | 0.8001 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 97.8 | 3199.3 | 94.2 | 2.89 | 0.0386 | 0.0920 | 0.1458 | 0.2112 | 0.2830 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 99.5 | 6693.3 | 100.0 | 2.97 | 0.1253 | 0.2196 | 0.3075 | 0.3843 | 0.4521 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
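Many of the organizer baselines above follow the same recipe: detect up to 8000 keypoints per image with an OpenCV detector and match descriptors by brute-force nearest-neighbour search, optionally enforcing cross-match (1:1) consistency as in the entries marked nn1to1. The sketch below illustrates that recipe with OpenCV; the detector choice, response-based truncation, and file names are assumptions for illustration, not the organizers' actual evaluation code.

```python
import cv2

def detect_and_describe(image_path, max_kp=8000):
    """Detect keypoints and compute descriptors with OpenCV AKAZE (assumed settings)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    feat = cv2.AKAZE_create()  # cv2.ORB_create(max_kp) or cv2.SIFT_create(nfeatures=max_kp) work similarly
    kps, desc = feat.detectAndCompute(img, None)
    # Keep at most max_kp keypoints, strongest responses first.
    order = sorted(range(len(kps)), key=lambda i: kps[i].response, reverse=True)[:max_kp]
    return [kps[i] for i in order], desc[order]

def match_bruteforce(desc1, desc2, cross_check=False):
    """Brute-force nearest-neighbour matching; cross_check=True approximates a 1:1 ('nn1to1') matcher."""
    # AKAZE/ORB descriptors are binary, hence Hamming distance; use cv2.NORM_L2 for SIFT/float descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=cross_check)
    return matcher.match(desc1, desc2)

kps1, d1 = detect_and_describe("img1.jpg")
kps2, d2 = detect_and_describe("img2.jpg")
matches = sorted(match_bruteforce(d1, d2, cross_check=True), key=lambda m: m.distance)
```

With crossCheck enabled, only mutual nearest neighbours are kept, which is one common way to implement the 'cross-match consistency' mentioned in the nn1to1 entries.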
MVS — sequence 'florence_cathedral_side'
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP (5°) | mAP (10°) | mAP (15°) | mAP (20°) | mAP (25°) | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 92.3 | 4821.8 | 81.7 | 3.06 | 0.3638 | 0.4103 | 0.4385 | 0.4615 | 0.4841 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 97.8 | 7748.9 | 95.2 | 3.16 | 0.4741 | 0.5381 | 0.5865 | 0.6261 | 0.6657 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 93.7 | 3959.1 | 87.2 | 2.60 | 0.2233 | 0.2980 | 0.3523 | 0.4039 | 0.4457 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 92.5 | 4421.4 | 91.5 | 3.06 | 0.3081 | 0.4217 | 0.4914 | 0.5396 | 0.5713 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 93.0 | 6092.1 | 93.2 | 2.89 | 0.2586 | 0.3721 | 0.4486 | 0.5162 | 0.5684 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 91.7 | 4463.2 | 92.8 | 3.06 | 0.3054 | 0.4182 | 0.4901 | 0.5333 | 0.5709 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 92.3 | 5775.1 | 92.2 | 2.88 | 0.2732 | 0.3889 | 0.4584 | 0.5096 | 0.5530 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 85.3 | 588.5 | 74.0 | 2.42 | 0.0332 | 0.1237 | 0.2017 | 0.2653 | 0.3064 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 89.3 | 1200.4 | 81.2 | 2.41 | 0.0868 | 0.2067 | 0.2859 | 0.3434 | 0.3916 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 55.7 | 76.4 | 22.8 | 2.26 | 0.0000 | 0.0005 | 0.0019 | 0.0044 | 0.0074 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 72.9 | 239.8 | 57.2 | 2.36 | 0.0047 | 0.0225 | 0.0558 | 0.0939 | 0.1246 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 37.4 | 179.8 | 32.8 | 1.87 | 0.0302 | 0.0436 | 0.0499 | 0.0542 | 0.0581 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 64.3 | 194.4 | 40.2 | 2.28 | 0.0324 | 0.0521 | 0.0654 | 0.0736 | 0.0813 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 75.3 | 309.9 | 47.2 | 2.82 | 0.1363 | 0.1706 | 0.1873 | 0.1945 | 0.2006 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 95.3 | 1940.1 | 87.2 | 3.14 | 0.4212 | 0.5014 | 0.5479 | 0.5849 | 0.6129 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 95.6 | 2123.3 | 88.5 | 3.12 | 0.4293 | 0.5089 | 0.5550 | 0.5946 | 0.6318 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 97.5 | 2091.2 | 91.5 | 3.20 | 0.4790 | 0.5510 | 0.6044 | 0.6469 | 0.6828 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 97.7 | 6265.0 | 94.8 | 3.44 | 0.5418 | 0.5993 | 0.6316 | 0.6680 | 0.7056 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The deep networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 94.6 | 6460.8 | 92.2 | 3.53 | 0.4991 | 0.5524 | 0.5902 | 0.6203 | 0.6526 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The deep networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 87.7 | 3858.0 | 71.0 | 2.64 | 0.2375 | 0.2792 | 0.3061 | 0.3266 | 0.3509 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 98.4 | 7558.2 | 95.5 | 3.37 | 0.4973 | 0.5718 | 0.6191 | 0.6604 | 0.6953 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 98.0 | 7589.5 | 94.8 | 3.34 | 0.4923 | 0.5526 | 0.5943 | 0.6297 | 0.6652 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 98.8 | 7655.2 | 94.8 | 3.33 | 0.5094 | 0.5654 | 0.6064 | 0.6459 | 0.6754 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 89.4 | 4132.4 | 78.0 | 2.76 | 0.2820 | 0.3401 | 0.3674 | 0.3938 | 0.4214 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 87.3 | 3781.1 | 74.8 | 2.66 | 0.2627 | 0.3147 | 0.3428 | 0.3638 | 0.3850 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 98.5 | 7259.7 | 96.0 | 3.26 | 0.5283 | 0.5921 | 0.6322 | 0.6655 | 0.6918 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 96.3 | 5673.6 | 93.0 | 3.27 | 0.4892 | 0.5437 | 0.5872 | 0.6208 | 0.6492 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 99.7 | 7276.0 | 96.8 | 3.56 | 0.7187 | 0.7936 | 0.8317 | 0.8501 | 0.8687 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 98.0 | 6651.5 | 95.2 | 3.32 | 0.5159 | 0.5732 | 0.6042 | 0.6374 | 0.6741 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 98.3 | 7094.6 | 95.8 | 3.50 | 0.5467 | 0.6084 | 0.6473 | 0.6791 | 0.7146 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 98.8 | 7774.4 | 96.8 | 3.46 | 0.5471 | 0.6084 | 0.6438 | 0.6724 | 0.7028 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 98.0 | 7250.3 | 96.5 | 3.33 | 0.5012 | 0.5587 | 0.5935 | 0.6245 | 0.6627 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 99.1 | 7224.6 | 97.2 | 3.53 | 0.7153 | 0.7766 | 0.8059 | 0.8325 | 0.8581 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 96.7 | 6040.3 | 92.5 | 3.31 | 0.4774 | 0.5243 | 0.5530 | 0.5861 | 0.6184 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 97.6 | 6488.0 | 95.0 | 3.26 | 0.4359 | 0.4882 | 0.5315 | 0.5723 | 0.6088 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 89.0 | 1666.1 | 74.5 | 2.97 | 0.3189 | 0.3770 | 0.4113 | 0.4356 | 0.4592 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 96.0 | 1862.0 | 87.0 | 3.53 | 0.4306 | 0.5019 | 0.5417 | 0.5752 | 0.6014 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 96.8 | 1880.9 | 90.8 | 3.70 | 0.5507 | 0.6319 | 0.6746 | 0.7049 | 0.7246 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 88.7 | 1320.5 | 80.5 | 3.28 | 0.3088 | 0.3606 | 0.3927 | 0.4212 | 0.4471 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 85.5 | 850.7 | 74.5 | 3.13 | 0.2989 | 0.3428 | 0.3704 | 0.3905 | 0.4057 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 89.5 | 1528.8 | 82.8 | 3.29 | 0.3072 | 0.3622 | 0.3989 | 0.4326 | 0.4630 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 17.9 | 81.3 | 20.5 | 1.31 | 0.0209 | 0.0239 | 0.0256 | 0.0267 | 0.0280 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 73.0 | 355.0 | 55.2 | 2.75 | 0.1514 | 0.1694 | 0.1806 | 0.1922 | 0.2005 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 91.0 | 5043.3 | 89.2 | 3.10 | 0.2632 | 0.3433 | 0.3907 | 0.4282 | 0.4653 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 98.9 | 8281.7 | 97.0 | 3.49 | 0.5546 | 0.6281 | 0.6743 | 0.7062 | 0.7327 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 94.7 | 1537.1 | 84.5 | 3.38 | 0.4163 | 0.4815 | 0.5237 | 0.5539 | 0.5838 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 97.5 | 1587.8 | 90.2 | 3.73 | 0.5526 | 0.6232 | 0.6670 | 0.6939 | 0.7185 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 97.4 | 1657.0 | 92.8 | 3.80 | 0.5767 | 0.6521 | 0.6873 | 0.7179 | 0.7447 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 88.6 | 3721.2 | 72.2 | 2.72 | 0.2854 | 0.3243 | 0.3515 | 0.3740 | 0.3960 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 98.8 | 7951.4 | 95.8 | 3.48 | 0.6800 | 0.7519 | 0.7850 | 0.8047 | 0.8217 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
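Many of the entries above pair a detector/descriptor with brute-force nearest-neighbour matching ('match:nn'), and a few additionally enforce cross-match consistency ('match:nn1to1'). The snippet below is a minimal sketch of that setup using OpenCV's Python bindings; it is not the organizers' evaluation code, and the function name, image paths, and keypoint budget are illustrative assumptions only.

```python
import cv2

def match_pair(path1, path2, num_kp=8000, one_to_one=False):
    """Sketch of 'match:nn' (one_to_one=False) vs 'match:nn1to1' (one_to_one=True)."""
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)
    # SIFT keypoints and descriptors, capped at num_kp per image (OpenCV >= 4.4).
    sift = cv2.SIFT_create(nfeatures=num_kp)
    kp1, desc1 = sift.detectAndCompute(img1, None)
    kp2, desc2 = sift.detectAndCompute(img2, None)
    # Brute-force L2 matcher; crossCheck=True keeps only mutual nearest neighbours.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=one_to_one)
    matches = matcher.match(desc1, desc2)
    return kp1, kp2, sorted(matches, key=lambda m: m.distance)
```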
MVS — sequence 'lincoln_memorial_statue' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 95.9 | 5196.5 | 87.5 | 3.21 | 0.4596 | 0.5232 | 0.5551 | 0.5818 | 0.6020 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 97.6 | 1076.8 | 93.2 | 4.25 | 0.5305 | 0.6167 | 0.6595 | 0.6920 | 0.7132 | — | Anonymous | We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 90.9 | 793.1 | 87.2 | 3.12 | 0.0365 | 0.0796 | 0.1150 | 0.1471 | 0.1801 | — | Anonymous | We use OpenCV's implementation of the BRISK detector with the default settings; each image has at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 94.3 | 4540.1 | 94.8 | 3.05 | 0.4675 | 0.5678 | 0.6187 | 0.6494 | 0.6767 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 95.0 | 6023.8 | 95.5 | 2.94 | 0.4699 | 0.5761 | 0.6308 | 0.6671 | 0.7006 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 94.2 | 4643.3 | 93.5 | 3.05 | 0.4602 | 0.5636 | 0.6258 | 0.6527 | 0.6798 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 95.2 | 6163.4 | 95.2 | 2.94 | 0.4443 | 0.5866 | 0.6617 | 0.6864 | 0.7140 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 80.2 | 534.1 | 66.0 | 2.38 | 0.0837 | 0.1753 | 0.2292 | 0.2669 | 0.2874 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 84.3 | 1020.8 | 71.5 | 2.35 | 0.1008 | 0.2074 | 0.2730 | 0.3139 | 0.3425 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 32.0 | 63.5 | 31.2 | 1.73 | 0.0016 | 0.0077 | 0.0148 | 0.0205 | 0.0239 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 69.3 | 201.4 | 53.5 | 2.34 | 0.0196 | 0.0570 | 0.0850 | 0.1064 | 0.1212 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 69.3 | 160.9 | 46.0 | 2.87 | 0.0744 | 0.1020 | 0.1188 | 0.1301 | 0.1364 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 73.1 | 195.6 | 53.2 | 2.93 | 0.1044 | 0.1440 | 0.1666 | 0.1778 | 0.1867 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 79.6 | 329.7 | 66.5 | 3.09 | 0.1974 | 0.2482 | 0.2702 | 0.2898 | 0.3026 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 97.8 | 1258.5 | 93.2 | 3.95 | 0.5709 | 0.6464 | 0.6881 | 0.7169 | 0.7356 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 97.1 | 1362.8 | 92.2 | 3.85 | 0.5274 | 0.6036 | 0.6536 | 0.6790 | 0.7021 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 98.9 | 1277.6 | 97.5 | 3.83 | 0.6315 | 0.7058 | 0.7392 | 0.7719 | 0.7970 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 99.0 | 6822.6 | 99.0 | 4.00 | 0.6910 | 0.7537 | 0.7908 | 0.8168 | 0.8375 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The deep networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 96.9 | 6726.0 | 95.8 | 4.04 | 0.6736 | 0.7241 | 0.7407 | 0.7595 | 0.7736 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The deep networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 95.6 | 2357.2 | 91.8 | 3.47 | 0.2133 | 0.2653 | 0.3016 | 0.3404 | 0.3774 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 98.1 | 6861.5 | 97.2 | 3.47 | 0.6357 | 0.6994 | 0.7261 | 0.7485 | 0.7656 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 98.1 | 7181.7 | 96.5 | 3.44 | 0.6370 | 0.7004 | 0.7361 | 0.7596 | 0.7767 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 98.4 | 7138.7 | 96.2 | 3.44 | 0.6257 | 0.6971 | 0.7317 | 0.7544 | 0.7747 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 88.9 | 820.2 | 83.2 | 3.42 | 0.1204 | 0.1771 | 0.2183 | 0.2457 | 0.2696 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 86.8 | 725.6 | 80.2 | 3.21 | 0.0821 | 0.1254 | 0.1588 | 0.1892 | 0.2132 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 98.8 | 5106.3 | 97.0 | 3.43 | 0.6233 | 0.6881 | 0.7204 | 0.7408 | 0.7649 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 97.7 | 4447.9 | 94.0 | 3.41 | 0.6325 | 0.6999 | 0.7283 | 0.7484 | 0.7625 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 99.6 | 5099.5 | 98.8 | 3.54 | 0.7674 | 0.8225 | 0.8542 | 0.8722 | 0.8874 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 98.5 | 4395.5 | 96.8 | 3.52 | 0.5948 | 0.6508 | 0.6834 | 0.7094 | 0.7323 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 98.1 | 5615.2 | 96.8 | 3.55 | 0.6031 | 0.6511 | 0.6744 | 0.6970 | 0.7121 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 98.6 | 6905.6 | 96.8 | 3.56 | 0.6800 | 0.7454 | 0.7726 | 0.7886 | 0.8055 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 98.5 | 6278.0 | 94.2 | 3.47 | 0.6097 | 0.6791 | 0.7109 | 0.7307 | 0.7504 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 99.5 | 5009.4 | 99.2 | 3.56 | 0.7708 | 0.8430 | 0.8677 | 0.8873 | 0.8984 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 94.9 | 3769.6 | 86.8 | 3.27 | 0.3983 | 0.4470 | 0.4768 | 0.5010 | 0.5233 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 98.0 | 5567.0 | 92.2 | 3.37 | 0.5373 | 0.5954 | 0.6238 | 0.6453 | 0.6680 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 92.4 | 1593.7 | 85.2 | 3.36 | 0.3010 | 0.3615 | 0.4038 | 0.4352 | 0.4573 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 96.4 | 1535.2 | 90.0 | 4.02 | 0.6261 | 0.6877 | 0.7106 | 0.7276 | 0.7416 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 98.1 | 1490.6 | 91.2 | 4.17 | 0.7057 | 0.7695 | 0.7953 | 0.8085 | 0.8189 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 93.6 | 791.2 | 87.2 | 4.40 | 0.5501 | 0.6104 | 0.6357 | 0.6530 | 0.6638 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 93.5 | 967.9 | 87.5 | 4.26 | 0.5339 | 0.5940 | 0.6258 | 0.6436 | 0.6547 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 93.1 | 1772.5 | 88.5 | 3.71 | 0.4758 | 0.5381 | 0.5727 | 0.5973 | 0.6115 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 84.0 | 238.9 | 79.5 | 4.24 | 0.2156 | 0.2626 | 0.2964 | 0.3136 | 0.3343 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 92.7 | 519.7 | 85.0 | 4.58 | 0.4984 | 0.5515 | 0.5745 | 0.5930 | 0.6051 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 94.4 | 6401.4 | 90.5 | 3.23 | 0.4430 | 0.5314 | 0.5753 | 0.6083 | 0.6304 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 99.1 | 6734.6 | 97.2 | 3.64 | 0.6515 | 0.7284 | 0.7612 | 0.7863 | 0.8062 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 97.1 | 1106.8 | 90.0 | 4.02 | 0.5655 | 0.6321 | 0.6628 | 0.6907 | 0.7063 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 97.5 | 798.1 | 93.2 | 4.59 | 0.5836 | 0.6523 | 0.6918 | 0.7197 | 0.7406 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 96.8 | 1237.2 | 91.8 | 4.20 | 0.6141 | 0.6727 | 0.6997 | 0.7208 | 0.7390 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 94.5 | 3245.4 | 85.0 | 2.98 | 0.3038 | 0.3571 | 0.3862 | 0.4119 | 0.4384 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 99.4 | 5063.3 | 99.8 | 3.62 | 0.7325 | 0.7950 | 0.8314 | 0.8568 | 0.8713 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
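The SIFT-AID entries in these tables describe a 6272-bit binary descriptor matched by Hamming distance with a decision threshold of 4000 bits. As a rough illustration of that matching rule only (not the authors' code; the packed-byte layout and the helper name hamming_match are assumptions), 6272 bits pack into 784 uint8 values per descriptor, and a nearest neighbour is kept only when its Hamming distance falls below the threshold:

```python
import numpy as np

def hamming_match(desc1, desc2, threshold=4000):
    """desc1: (N, 784) uint8, desc2: (M, 784) uint8 -- 6272 bits packed per row."""
    # Pairwise Hamming distances via XOR followed by bit counting.
    xor = np.bitwise_xor(desc1[:, None, :], desc2[None, :, :])  # (N, M, 784)
    dist = np.unpackbits(xor, axis=2).sum(axis=2)               # (N, M) differing bits
    nn = dist.argmin(axis=1)                                    # nearest neighbour in desc2
    keep = dist[np.arange(len(desc1)), nn] < threshold          # decision threshold in bits
    return [(i, int(nn[i])) for i in np.where(keep)[0]]
```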
MVS — sequence 'london_bridge' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 94.1 | 3666.5 | 89.2 | 3.27 | 0.2433 | 0.3206 | 0.3696 | 0.4181 | 0.4534 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 96.8 | 2633.4 | 94.2 | 3.19 | 0.2211 | 0.3162 | 0.3789 | 0.4291 | 0.4686 | — | Anonymous | We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 94.6 | 2154.4 | 89.8 | 2.83 | 0.1277 | 0.2320 | 0.3040 | 0.3615 | 0.4028 | — | Anonymous | We use OpenCV's implementation of the BRISK detector with the default settings; each image has at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 96.4 | 3939.5 | 97.0 | 3.31 | 0.2558 | 0.4064 | 0.4915 | 0.5478 | 0.5956 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 96.4 | 5022.2 | 97.0 | 3.05 | 0.2057 | 0.3689 | 0.4622 | 0.5308 | 0.5852 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 96.6 | 4016.4 | 96.5 | 3.30 | 0.2528 | 0.4043 | 0.4932 | 0.5578 | 0.6048 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 96.6 | 4915.3 | 97.0 | 3.05 | 0.1935 | 0.3499 | 0.4545 | 0.5304 | 0.5837 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 92.2 | 922.8 | 91.5 | 2.50 | 0.0184 | 0.0747 | 0.1254 | 0.1797 | 0.2241 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 94.0 | 1667.0 | 94.8 | 2.48 | 0.0207 | 0.0800 | 0.1300 | 0.1810 | 0.2237 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 60.8 | 109.8 | 39.0 | 2.29 | 0.0027 | 0.0104 | 0.0195 | 0.0277 | 0.0360 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 83.5 | 432.5 | 77.5 | 2.48 | 0.0154 | 0.0543 | 0.0988 | 0.1371 | 0.1746 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 73.9 | 159.8 | 46.8 | 2.97 | 0.0904 | 0.1372 | 0.1584 | 0.1737 | 0.1847 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 77.7 | 169.0 | 55.8 | 3.10 | 0.0979 | 0.1649 | 0.2027 | 0.2218 | 0.2363 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 84.8 | 328.1 | 65.5 | 3.12 | 0.1809 | 0.2550 | 0.2906 | 0.3187 | 0.3396 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 91.6 | 1260.9 | 87.5 | 3.05 | 0.1135 | 0.1838 | 0.2262 | 0.2627 | 0.2948 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 92.0 | 1448.8 | 87.2 | 3.03 | 0.1051 | 0.1759 | 0.2170 | 0.2485 | 0.2815 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 96.4 | 1242.2 | 93.8 | 3.35 | 0.3447 | 0.4644 | 0.5208 | 0.5691 | 0.6083 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 96.2 | 4976.8 | 94.5 | 3.36 | 0.2634 | 0.3549 | 0.4149 | 0.4597 | 0.4984 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The deep networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 95.7 | 5314.1 | 94.5 | 3.38 | 0.2466 | 0.3423 | 0.3967 | 0.4400 | 0.4687 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The deep networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 88.4 | 2595.0 | 78.5 | 2.82 | 0.0736 | 0.1280 | 0.1642 | 0.1981 | 0.2300 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 97.8 | 6327.6 | 98.2 | 3.32 | 0.3391 | 0.4378 | 0.5028 | 0.5517 | 0.5945 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 97.9 | 6574.8 | 98.2 | 3.32 | 0.3616 | 0.4671 | 0.5324 | 0.5772 | 0.6103 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 97.5 | 6681.3 | 98.2 | 3.31 | 0.3863 | 0.4826 | 0.5410 | 0.5843 | 0.6152 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 94.2 | 2102.2 | 89.5 | 3.02 | 0.1235 | 0.2106 | 0.2667 | 0.3153 | 0.3640 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 93.6 | 2021.3 | 88.8 | 2.93 | 0.1019 | 0.1855 | 0.2417 | 0.2926 | 0.3357 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 98.2 | 5949.2 | 98.0 | 3.28 | 0.4002 | 0.5159 | 0.5712 | 0.6138 | 0.6520 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 98.1 | 5188.2 | 97.8 | 3.33 | 0.3329 | 0.4564 | 0.5217 | 0.5654 | 0.6039 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while the other settings remain unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better under illumination changes in particular. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 98.7 | 5984.1 | 99.2 | 3.40 | 0.6285 | 0.7553 | 0.8174 | 0.8442 | 0.8620 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 98.1 | 4628.8 | 97.5 | 3.39 | 0.3420 | 0.4375 | 0.4911 | 0.5313 | 0.5713 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 97.3 | 4744.3 | 97.0 | 3.39 | 0.3242 | 0.4196 | 0.4696 | 0.5165 | 0.5491 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 98.3 | 6062.4 | 97.5 | 3.33 | 0.2956 | 0.4020 | 0.4702 | 0.5168 | 0.5550 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 97.5 | 5746.2 | 98.2 | 3.28 | 0.2718 | 0.3646 | 0.4380 | 0.4885 | 0.5317 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 98.4 | 6129.5 | 99.2 | 3.35 | 0.6083 | 0.7361 | 0.7937 | 0.8251 | 0.8564 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 94.7 | 3614.1 | 87.8 | 3.27 | 0.2479 | 0.3303 | 0.3859 | 0.4305 | 0.4718 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 97.0 | 5010.7 | 97.0 | 3.27 | 0.2514 | 0.3525 | 0.4131 | 0.4563 | 0.4968 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 82.3 | 888.5 | 71.5 | 2.87 | 0.0827 | 0.1396 | 0.1747 | 0.2032 | 0.2290 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 97.5 | 1398.6 | 95.2 | 3.91 | 0.3730 | 0.4976 | 0.5541 | 0.6002 | 0.6341 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 97.8 | 1336.9 | 95.2 | 4.03 | 0.4823 | 0.6033 | 0.6602 | 0.7031 | 0.7303 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 96.0 | 922.0 | 92.8 | 3.88 | 0.4148 | 0.5400 | 0.5936 | 0.6320 | 0.6642 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 95.9 | 874.8 | 94.5 | 3.93 | 0.4395 | 0.5600 | 0.6135 | 0.6477 | 0.6756 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 96.3 | 1484.5 | 94.8 | 3.76 | 0.4041 | 0.5096 | 0.5711 | 0.6172 | 0.6479 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 87.0 | 225.8 | 63.5 | 3.37 | 0.2607 | 0.3387 | 0.3699 | 0.3857 | 0.3984 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 93.8 | 481.0 | 88.2 | 3.78 | 0.4028 | 0.5143 | 0.5669 | 0.5948 | 0.6117 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 96.7 | 4626.6 | 96.2 | 3.46 | 0.2205 | 0.3698 | 0.4620 | 0.5172 | 0.5649 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 98.2 | 6228.3 | 97.5 | 3.34 | 0.2030 | 0.3112 | 0.3762 | 0.4243 | 0.4616 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 97.4 | 1092.8 | 95.2 | 3.88 | 0.4075 | 0.5158 | 0.5740 | 0.6199 | 0.6544 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 98.2 | 1071.6 | 98.0 | 4.10 | 0.6167 | 0.7569 | 0.8131 | 0.8474 | 0.8664 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 98.2 | 1316.8 | 97.0 | 4.10 | 0.6218 | 0.7594 | 0.8126 | 0.8456 | 0.8671 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 91.7 | 2428.3 | 84.5 | 2.89 | 0.0975 | 0.1677 | 0.2207 | 0.2657 | 0.3093 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 98.5 | 5803.5 | 99.2 | 3.45 | 0.4695 | 0.6102 | 0.6763 | 0.7138 | 0.7400 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
MVS — sequence 'milan_cathedral' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 98.0 | 4605.7 | 90.2 | 3.54 | 0.3742 | 0.5242 | 0.5935 | 0.6420 | 0.6702 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 98.7 | 4561.8 | 96.0 | 3.04 | 0.3799 | 0.5325 | 0.6131 | 0.6570 | 0.7016 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 95.7 | 2854.1 | 88.2 | 2.77 | 0.1503 | 0.2682 | 0.3429 | 0.3976 | 0.4477 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 96.7 | 4810.8 | 94.5 | 3.20 | 0.2055 | 0.3503 | 0.4482 | 0.5225 | 0.5721 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 97.2 | 6061.8 | 96.0 | 3.00 | 0.1761 | 0.3279 | 0.4263 | 0.5026 | 0.5647 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 96.7 | 4781.9 | 94.0 | 3.18 | 0.1998 | 0.3591 | 0.4486 | 0.5188 | 0.5728 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 97.5 | 5649.4 | 96.5 | 2.98 | 0.1823 | 0.3418 | 0.4435 | 0.5222 | 0.5855 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 91.1 | 744.9 | 83.0 | 2.49 | 0.0203 | 0.0951 | 0.1748 | 0.2460 | 0.3072 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 93.5 | 1402.3 | 87.5 | 2.48 | 0.0304 | 0.1251 | 0.2152 | 0.2912 | 0.3442 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 35.7 | 86.7 | 22.5 | 1.79 | 0.0003 | 0.0028 | 0.0075 | 0.0123 | 0.0165 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 81.4 | 320.6 | 62.5 | 2.45 | 0.0054 | 0.0336 | 0.0773 | 0.1190 | 0.1548 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 62.4 | 166.5 | 32.5 | 2.55 | 0.0169 | 0.0303 | 0.0372 | 0.0412 | 0.0455 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 66.8 | 191.2 | 41.0 | 2.69 | 0.0256 | 0.0489 | 0.0645 | 0.0749 | 0.0830 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 84.0 | 321.6 | 60.5 | 3.18 | 0.1424 | 0.2233 | 0.2627 | 0.2884 | 0.3109 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 94.0 | 1382.2 | 86.8 | 3.04 | 0.2484 | 0.3696 | 0.4348 | 0.4782 | 0.5125 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 94.1 | 1605.5 | 86.0 | 2.99 | 0.2640 | 0.3892 | 0.4446 | 0.4968 | 0.5305 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 99.3 | 1977.7 | 97.2 | 3.44 | 0.4228 | 0.5685 | 0.6511 | 0.7022 | 0.7335 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 98.8 | 5426.8 | 96.8 | 3.45 | 0.4157 | 0.5569 | 0.6392 | 0.6904 | 0.7250 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 97.8 | 5934.5 | 95.0 | 3.45 | 0.4255 | 0.5604 | 0.6356 | 0.6788 | 0.7094 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 83.7 | 3000.3 | 67.2 | 2.65 | 0.1012 | 0.1613 | 0.2027 | 0.2277 | 0.2493 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 99.9 | 7184.5 | 97.8 | 3.52 | 0.4876 | 0.6357 | 0.7047 | 0.7474 | 0.7866 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 99.8 | 7388.7 | 98.2 | 3.50 | 0.4924 | 0.6350 | 0.7128 | 0.7541 | 0.7900 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 99.7 | 7437.9 | 98.2 | 3.50 | 0.4985 | 0.6414 | 0.7159 | 0.7607 | 0.7938 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 92.2 | 3106.0 | 82.5 | 2.95 | 0.2665 | 0.3724 | 0.4313 | 0.4724 | 0.5107 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 91.5 | 2962.0 | 82.0 | 2.89 | 0.2635 | 0.3814 | 0.4441 | 0.4868 | 0.5202 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 99.9 | 7159.1 | 99.0 | 3.49 | 0.4967 | 0.6466 | 0.7115 | 0.7636 | 0.7925 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 99.5 | 5971.8 | 97.8 | 3.59 | 0.4545 | 0.6070 | 0.6756 | 0.7158 | 0.7507 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while the other settings remain unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better under illumination changes in particular. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 100.0 | 6725.5 | 100.0 | 3.67 | 0.6659 | 0.8305 | 0.8809 | 0.9102 | 0.9293 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 99.7 | 5863.3 | 98.5 | 3.66 | 0.5024 | 0.6494 | 0.7246 | 0.7711 | 0.8054 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 99.4 | 5803.2 | 98.0 | 3.65 | 0.4853 | 0.6450 | 0.7138 | 0.7569 | 0.7808 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 99.6 | 6992.4 | 98.5 | 3.53 | 0.4737 | 0.6270 | 0.6945 | 0.7446 | 0.7752 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 99.5 | 6584.7 | 97.8 | 3.51 | 0.4818 | 0.6342 | 0.7093 | 0.7595 | 0.7870 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 99.9 | 6656.3 | 99.8 | 3.67 | 0.6488 | 0.8004 | 0.8609 | 0.8883 | 0.9076 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 97.8 | 5042.7 | 92.0 | 3.47 | 0.4324 | 0.5683 | 0.6433 | 0.6835 | 0.7085 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 99.4 | 5916.9 | 96.5 | 3.47 | 0.4612 | 0.6155 | 0.6819 | 0.7305 | 0.7654 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 84.8 | 1020.4 | 71.5 | 2.93 | 0.1483 | 0.2179 | 0.2606 | 0.2910 | 0.3163 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 98.1 | 1738.9 | 91.5 | 3.66 | 0.3882 | 0.5408 | 0.6105 | 0.6517 | 0.6871 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 98.5 | 1621.8 | 93.8 | 3.73 | 0.4257 | 0.5707 | 0.6507 | 0.7016 | 0.7358 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 95.2 | 1172.9 | 85.2 | 3.39 | 0.3032 | 0.4336 | 0.5046 | 0.5534 | 0.5797 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 94.2 | 841.4 | 83.2 | 3.32 | 0.3051 | 0.4243 | 0.4946 | 0.5351 | 0.5616 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 95.7 | 1524.3 | 88.2 | 3.44 | 0.3035 | 0.4387 | 0.5072 | 0.5523 | 0.5886 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 33.8 | 96.8 | 12.8 | 1.96 | 0.0081 | 0.0133 | 0.0160 | 0.0176 | 0.0188 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 85.7 | 388.5 | 66.2 | 3.01 | 0.1857 | 0.2701 | 0.3108 | 0.3430 | 0.3634 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 96.5 | 4767.1 | 92.5 | 3.42 | 0.1902 | 0.3431 | 0.4401 | 0.5021 | 0.5568 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 99.8 | 7483.0 | 99.2 | 3.52 | 0.4681 | 0.6285 | 0.7104 | 0.7622 | 0.8020 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 97.0 | 1367.4 | 90.5 | 3.57 | 0.3466 | 0.4874 | 0.5633 | 0.6130 | 0.6465 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 98.9 | 1291.3 | 94.5 | 3.80 | 0.5055 | 0.6663 | 0.7382 | 0.7775 | 0.8023 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 99.1 | 1457.4 | 95.0 | 4.01 | 0.5125 | 0.6774 | 0.7504 | 0.7938 | 0.8180 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 93.2 | 3014.7 | 82.5 | 3.10 | 0.2704 | 0.3890 | 0.4503 | 0.4908 | 0.5190 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 99.7 | 7245.2 | 99.8 | 3.57 | 0.5255 | 0.6810 | 0.7492 | 0.7906 | 0.8229 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
MVS — sequence 'mount_rushmore' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 88.0 | 2560.1 | 84.5 | 3.50 | 0.1900 | 0.2628 | 0.3131 | 0.3593 | 0.3974 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 92.7 | 3241.5 | 89.0 | 3.66 | 0.1738 | 0.2667 | 0.3458 | 0.4051 | 0.4399 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 86.1 | 2007.6 | 77.8 | 2.77 | 0.0279 | 0.0683 | 0.1207 | 0.1664 | 0.2017 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 94.0 | 4259.6 | 90.2 | 3.55 | 0.1301 | 0.2326 | 0.3186 | 0.3834 | 0.4361 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 95.2 | 6731.8 | 94.0 | 3.36 | 0.1456 | 0.2653 | 0.3543 | 0.4342 | 0.4860 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 94.0 | 4065.3 | 93.0 | 3.56 | 0.1257 | 0.2435 | 0.3217 | 0.3834 | 0.4381 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 95.0 | 5694.8 | 93.2 | 3.29 | 0.1185 | 0.2274 | 0.3182 | 0.3912 | 0.4499 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 81.6 | 454.4 | 78.8 | 2.50 | 0.0010 | 0.0092 | 0.0330 | 0.0776 | 0.1192 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 86.3 | 853.8 | 87.8 | 2.52 | 0.0024 | 0.0192 | 0.0592 | 0.1117 | 0.1638 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 32.2 | 54.0 | 27.5 | 1.80 | 0.0000 | 0.0002 | 0.0010 | 0.0038 | 0.0062 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 70.7 | 191.1 | 60.2 | 2.42 | 0.0003 | 0.0040 | 0.0140 | 0.0316 | 0.0504 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 59.2 | 117.2 | 42.2 | 2.65 | 0.0043 | 0.0122 | 0.0190 | 0.0266 | 0.0313 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 60.0 | 120.6 | 45.0 | 2.73 | 0.0046 | 0.0122 | 0.0207 | 0.0285 | 0.0342 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 68.0 | 164.0 | 57.0 | 2.87 | 0.0293 | 0.0585 | 0.0799 | 0.0972 | 0.1116 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 83.7 | 814.9 | 81.0 | 3.23 | 0.1240 | 0.1972 | 0.2438 | 0.2877 | 0.3239 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 86.0 | 956.1 | 78.8 | 3.22 | 0.1350 | 0.2085 | 0.2620 | 0.3051 | 0.3352 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 93.0 | 1511.3 | 91.0 | 3.58 | 0.2318 | 0.3353 | 0.4007 | 0.4460 | 0.4868 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 90.0 | 3140.7 | 91.2 | 3.66 | 0.2129 | 0.3063 | 0.3792 | 0.4237 | 0.4648 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from https://github.com/ducha-aiki/affnet, Paper: https://arxiv.org/abs/1711.06704; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf; DeepNets were plugged into MODS framework, but without view synthesis and matching https://github.com/ducha-aiki/mods-light-zmq; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 92.5 | 3961.1 | 94.5 | 3.84 | 0.2539 | 0.3624 | 0.4294 | 0.4835 | 0.5198 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 78.1 | 1604.7 | 69.8 | 2.96 | 0.0640 | 0.1070 | 0.1402 | 0.1691 | 0.1905 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 94.5 | 4923.5 | 92.8 | 3.59 | 0.3455 | 0.4482 | 0.5206 | 0.5666 | 0.5975 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 93.5 | 5179.7 | 92.5 | 3.56 | 0.3277 | 0.4360 | 0.4911 | 0.5325 | 0.5606 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 94.4 | 5162.8 | 94.2 | 3.58 | 0.3358 | 0.4460 | 0.5201 | 0.5683 | 0.5981 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 78.3 | 1244.1 | 66.0 | 2.86 | 0.0693 | 0.1117 | 0.1428 | 0.1707 | 0.1927 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 75.7 | 1066.4 | 65.0 | 2.79 | 0.0656 | 0.1071 | 0.1367 | 0.1638 | 0.1835 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 94.9 | 4393.5 | 95.0 | 3.66 | 0.2958 | 0.4132 | 0.4788 | 0.5322 | 0.5700 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 93.0 | 3832.8 | 92.5 | 3.77 | 0.2789 | 0.3830 | 0.4511 | 0.5060 | 0.5436 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while the other settings remain unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better under illumination changes in particular. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 95.2 | 5279.1 | 95.5 | 3.63 | 0.4208 | 0.5494 | 0.6305 | 0.6803 | 0.7085 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 93.9 | 4283.5 | 91.5 | 3.64 | 0.3019 | 0.4028 | 0.4662 | 0.5188 | 0.5631 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 94.2 | 4258.4 | 91.2 | 3.66 | 0.3333 | 0.4335 | 0.4907 | 0.5437 | 0.5759 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 94.8 | 5031.3 | 91.8 | 3.61 | 0.3225 | 0.4429 | 0.5106 | 0.5660 | 0.6041 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 92.9 | 4193.4 | 89.5 | 3.57 | 0.2987 | 0.3976 | 0.4667 | 0.5171 | 0.5530 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 94.9 | 5005.6 | 95.8 | 3.68 | 0.4124 | 0.5522 | 0.6222 | 0.6646 | 0.6979 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 85.6 | 2470.7 | 83.0 | 3.29 | 0.1986 | 0.2710 | 0.3181 | 0.3563 | 0.3825 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 91.1 | 3276.1 | 86.8 | 3.50 | 0.2535 | 0.3399 | 0.4022 | 0.4485 | 0.4811 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 76.7 | 689.6 | 71.5 | 2.97 | 0.0798 | 0.1236 | 0.1601 | 0.1927 | 0.2159 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 92.0 | 1005.4 | 89.2 | 4.21 | 0.2201 | 0.3210 | 0.3891 | 0.4421 | 0.4823 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 91.9 | 990.3 | 87.2 | 4.23 | 0.2608 | 0.3599 | 0.4309 | 0.4813 | 0.5167 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 89.9 | 494.1 | 87.8 | 4.14 | 0.1925 | 0.2929 | 0.3569 | 0.4057 | 0.4463 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 91.2 | 580.9 | 87.0 | 4.24 | 0.2113 | 0.3033 | 0.3670 | 0.4165 | 0.4602 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 91.5 | 1048.4 | 89.8 | 4.08 | 0.2214 | 0.3216 | 0.3859 | 0.4300 | 0.4717 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 75.2 | 113.6 | 64.2 | 3.42 | 0.0501 | 0.0837 | 0.1125 | 0.1370 | 0.1558 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 83.0 | 246.6 | 83.2 | 3.84 | 0.1380 | 0.2119 | 0.2704 | 0.3132 | 0.3438 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 94.4 | 3764.8 | 92.0 | 3.85 | 0.1489 | 0.2427 | 0.3203 | 0.3796 | 0.4292 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 94.3 | 5187.3 | 92.2 | 3.56 | 0.3011 | 0.4134 | 0.4867 | 0.5406 | 0.5767 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 91.7 | 701.3 | 87.2 | 4.19 | 0.2019 | 0.2927 | 0.3562 | 0.4068 | 0.4445 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 94.9 | 686.9 | 90.2 | 4.37 | 0.3028 | 0.4263 | 0.5033 | 0.5579 | 0.5958 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 93.4 | 960.1 | 88.2 | 4.36 | 0.3139 | 0.4466 | 0.5183 | 0.5656 | 0.5954 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoint refinement, - better descriptor sampling, - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 83.3 | 1508.4 | 76.5 | 3.23 | 0.1146 | 0.1709 | 0.2126 | 0.2475 | 0.2775 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 95.5 | 5563.0 | 96.2 | 3.63 | 0.3288 | 0.4579 | 0.5393 | 0.5927 | 0.6348 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
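Several of the entries above are matched with plain nearest-neighbour search ('nn'), while the 'nn1to1' entries additionally enforce cross-match consistency, keeping only pairs that are mutual nearest neighbours in both directions. The snippet below is a minimal sketch of such a 1:1 matcher, using OpenCV's brute-force matcher with cross-checking enabled; it is illustrative only and is not the organizers' evaluation code.

```python
import numpy as np
import cv2

def mutual_nn_matches(desc1: np.ndarray, desc2: np.ndarray):
    """Brute-force L2 matching with cross-match consistency: a pair
    (i, j) is kept only if j is the nearest neighbour of i *and*
    i is the nearest neighbour of j."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc1.astype(np.float32),
                            desc2.astype(np.float32))
    # Each DMatch stores the index into desc1 (queryIdx) and desc2 (trainIdx).
    return [(m.queryIdx, m.trainIdx) for m in matches]

# Toy usage with random 128-D descriptors (SIFT-like).
d1 = np.random.rand(500, 128).astype(np.float32)
d2 = np.random.rand(480, 128).astype(np.float32)
print(len(mutual_nn_matches(d1, d2)), "cross-consistent matches")
```

Dropping `crossCheck` (or calling `knnMatch` and keeping every nearest neighbour) recovers the plain 'nn' behaviour used by most baseline entries.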
MVS — sequence 'piazza_san_marco' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 90.2 | 5212.9 | 84.8 | 2.37 | 0.1090 | 0.1960 | 0.2543 | 0.3059 | 0.3479 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 96.6 | 5929.9 | 96.0 | 2.47 | 0.1927 | 0.3246 | 0.4122 | 0.4794 | 0.5291 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 92.3 | 4004.4 | 89.5 | 2.29 | 0.0765 | 0.1823 | 0.2678 | 0.3353 | 0.3876 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 95.9 | 5493.2 | 93.5 | 2.41 | 0.1385 | 0.2715 | 0.3673 | 0.4293 | 0.4868 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 96.8 | 6287.8 | 96.5 | 2.42 | 0.1290 | 0.2761 | 0.3746 | 0.4409 | 0.4992 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 96.1 | 5554.5 | 94.2 | 2.41 | 0.1407 | 0.2800 | 0.3760 | 0.4440 | 0.5006 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 97.1 | 5970.1 | 95.5 | 2.42 | 0.1452 | 0.2975 | 0.3976 | 0.4676 | 0.5287 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 65.6 | 526.3 | 56.5 | 2.09 | 0.0058 | 0.0262 | 0.0476 | 0.0674 | 0.0848 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 81.8 | 1718.9 | 78.0 | 2.08 | 0.0178 | 0.0764 | 0.1378 | 0.1946 | 0.2381 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 0.0 | 0.0 | 0.0 | 0.00 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 51.8 | 112.2 | 22.2 | 2.13 | 0.0001 | 0.0006 | 0.0010 | 0.0026 | 0.0034 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 11.6 | 63.5 | 21.8 | 1.08 | 0.0001 | 0.0005 | 0.0010 | 0.0014 | 0.0017 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 27.6 | 95.0 | 29.8 | 1.63 | 0.0003 | 0.0011 | 0.0024 | 0.0040 | 0.0048 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 55.6 | 141.1 | 34.2 | 2.27 | 0.0036 | 0.0089 | 0.0148 | 0.0176 | 0.0215 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 88.4 | 2023.8 | 85.5 | 2.28 | 0.0373 | 0.1222 | 0.2070 | 0.2820 | 0.3337 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 87.8 | 2147.9 | 84.5 | 2.25 | 0.0328 | 0.1092 | 0.1933 | 0.2619 | 0.3165 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 93.9 | 2525.5 | 92.0 | 2.37 | 0.1127 | 0.2203 | 0.3091 | 0.3771 | 0.4310 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors, using the sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with greedy nearest-neighbour search, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 97.2 | 7092.2 | 95.8 | 2.63 | 0.2507 | 0.3783 | 0.4449 | 0.5045 | 0.5562 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation). Code and weights: https://github.com/ducha-aiki/affnet; paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets. Training code: https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 97.5 | 7443.0 | 97.2 | 2.71 | 0.2714 | 0.3850 | 0.4477 | 0.5046 | 0.5518 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; orientation is assumed to follow the gravity vector. Descriptor: HardNet trained on AMOS plus a mix of other datasets. Training code: https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 87.5 | 4718.8 | 86.8 | 2.18 | 0.0192 | 0.0666 | 0.1302 | 0.1867 | 0.2426 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 98.1 | 8339.5 | 98.0 | 2.52 | 0.2242 | 0.3458 | 0.4309 | 0.4866 | 0.5436 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 98.3 | 8545.4 | 98.5 | 2.53 | 0.2424 | 0.3622 | 0.4473 | 0.5034 | 0.5488 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 98.6 | 8696.8 | 98.5 | 2.56 | 0.2526 | 0.3750 | 0.4580 | 0.5077 | 0.5571 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 83.0 | 2717.4 | 77.2 | 2.24 | 0.0687 | 0.1341 | 0.1834 | 0.2265 | 0.2614 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 82.4 | 2633.9 | 74.2 | 2.24 | 0.0704 | 0.1389 | 0.1903 | 0.2296 | 0.2672 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 98.9 | 8508.5 | 98.0 | 2.65 | 0.2755 | 0.4066 | 0.4852 | 0.5414 | 0.5906 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 98.5 | 6919.9 | 97.0 | 2.70 | 0.2535 | 0.3866 | 0.4682 | 0.5198 | 0.5635 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are extracted densely from full images rather than from image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 99.6 | 8324.3 | 99.5 | 2.85 | 0.5046 | 0.6215 | 0.6854 | 0.7333 | 0.7671 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 98.1 | 7387.0 | 95.0 | 2.70 | 0.2740 | 0.4013 | 0.4772 | 0.5292 | 0.5787 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 97.5 | 7295.3 | 94.8 | 2.56 | 0.2476 | 0.3512 | 0.4137 | 0.4679 | 0.5185 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 98.4 | 8333.2 | 99.2 | 2.57 | 0.2565 | 0.3783 | 0.4612 | 0.5187 | 0.5603 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 97.3 | 7765.5 | 97.0 | 2.47 | 0.2093 | 0.3237 | 0.4000 | 0.4572 | 0.5014 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 99.5 | 8211.0 | 98.8 | 2.84 | 0.4407 | 0.5491 | 0.6156 | 0.6625 | 0.7047 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 91.2 | 5351.7 | 87.2 | 2.35 | 0.1394 | 0.2188 | 0.2797 | 0.3339 | 0.3677 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 95.8 | 6975.9 | 94.8 | 2.41 | 0.1664 | 0.2763 | 0.3557 | 0.4210 | 0.4733 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 79.3 | 1396.0 | 77.8 | 2.12 | 0.0083 | 0.0303 | 0.0645 | 0.0962 | 0.1245 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 89.6 | 2269.5 | 82.8 | 2.32 | 0.0670 | 0.1863 | 0.2737 | 0.3329 | 0.3877 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 90.5 | 2160.2 | 82.2 | 2.37 | 0.0919 | 0.2143 | 0.3033 | 0.3675 | 0.4159 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 87.3 | 1317.6 | 80.2 | 2.46 | 0.1420 | 0.2472 | 0.3163 | 0.3597 | 0.3947 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 83.0 | 857.1 | 71.5 | 2.45 | 0.1234 | 0.2109 | 0.2659 | 0.3044 | 0.3314 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 89.2 | 1695.3 | 82.5 | 2.47 | 0.1420 | 0.2534 | 0.3269 | 0.3727 | 0.4123 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 3.2 | 26.7 | 4.0 | 0.58 | 0.0000 | 0.0001 | 0.0002 | 0.0002 | 0.0003 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 41.1 | 246.5 | 45.5 | 1.79 | 0.0461 | 0.0804 | 0.0990 | 0.1101 | 0.1176 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 95.8 | 6173.1 | 93.8 | 2.45 | 0.1470 | 0.2804 | 0.3695 | 0.4303 | 0.4829 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 98.5 | 9920.9 | 99.0 | 2.37 | 0.1257 | 0.2710 | 0.3722 | 0.4436 | 0.5105 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 88.9 | 1603.9 | 79.5 | 2.49 | 0.1575 | 0.2699 | 0.3349 | 0.3854 | 0.4214 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 88.9 | 1295.3 | 79.5 | 2.60 | 0.1524 | 0.2508 | 0.3152 | 0.3562 | 0.3950 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 89.9 | 1560.5 | 80.5 | 2.63 | 0.2112 | 0.3038 | 0.3557 | 0.3998 | 0.4307 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoint refinement, - better descriptor sampling, - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 83.4 | 3696.9 | 74.5 | 2.25 | 0.0784 | 0.1534 | 0.2072 | 0.2466 | 0.2803 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 99.2 | 9477.8 | 98.5 | 2.62 | 0.2472 | 0.4044 | 0.5014 | 0.5657 | 0.6243 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
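Many of the SuperPoint entries above note that images are first downsampled so that the largest dimension is at most 1024 pixels. Below is a minimal sketch of that preprocessing step, assuming OpenCV for image I/O and resizing; the interpolation filter is an assumption, as the submissions do not specify one.

```python
import cv2

def downsample_to_max_dim(image, max_dim=1024):
    """Resize `image` so its largest side is at most `max_dim` pixels,
    preserving the aspect ratio; smaller images are returned unchanged."""
    h, w = image.shape[:2]
    scale = max_dim / float(max(h, w))
    if scale >= 1.0:
        return image
    new_size = (int(round(w * scale)), int(round(h * scale)))  # (width, height)
    return cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)

# Example: prepare a grayscale image for a SuperPoint-style extractor.
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
if img is not None:
    img = downsample_to_max_dim(img)
```

As described in those rows, the 'kp:N' SuperPoint variants then lower the default detection threshold to reach the stated keypoint budget, while the '(default)' entry keeps the default parameters.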
MVS — sequence 'reichstag' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 95.7 | 4128.4 | 90.5 | 3.48 | 0.3067 | 0.4024 | 0.4708 | 0.5249 | 0.5651 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 96.4 | 2565.4 | 97.0 | 2.93 | 0.2516 | 0.3680 | 0.4310 | 0.4832 | 0.5263 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 94.7 | 1743.9 | 93.0 | 2.79 | 0.1505 | 0.2669 | 0.3511 | 0.4131 | 0.4629 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 94.4 | 3590.9 | 97.5 | 3.20 | 0.1229 | 0.2517 | 0.3340 | 0.3847 | 0.4260 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 95.6 | 4299.7 | 97.0 | 3.04 | 0.1147 | 0.2860 | 0.3690 | 0.4264 | 0.4724 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 94.8 | 3675.1 | 96.2 | 3.21 | 0.1162 | 0.2502 | 0.3284 | 0.3836 | 0.4200 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 94.9 | 4309.6 | 97.2 | 3.05 | 0.1096 | 0.2550 | 0.3394 | 0.3964 | 0.4485 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 86.8 | 863.4 | 91.0 | 2.36 | 0.0117 | 0.0724 | 0.1275 | 0.1759 | 0.2156 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 90.8 | 1653.1 | 94.0 | 2.36 | 0.0143 | 0.0818 | 0.1566 | 0.2076 | 0.2548 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 62.6 | 121.4 | 45.8 | 2.30 | 0.0016 | 0.0113 | 0.0223 | 0.0319 | 0.0417 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 78.7 | 369.2 | 77.8 | 2.36 | 0.0089 | 0.0459 | 0.0895 | 0.1200 | 0.1482 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 73.2 | 162.5 | 53.0 | 2.91 | 0.0547 | 0.0973 | 0.1209 | 0.1389 | 0.1532 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 75.6 | 171.2 | 61.7 | 3.01 | 0.0498 | 0.1074 | 0.1424 | 0.1635 | 0.1826 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 82.3 | 341.6 | 79.0 | 3.14 | 0.0984 | 0.1749 | 0.2254 | 0.2604 | 0.2864 | — | Anonymous | ELF detector: keypoints are local maxima of a saliency map computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 95.3 | 1699.5 | 95.0 | 2.97 | 0.1524 | 0.2545 | 0.3115 | 0.3624 | 0.4017 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 95.1 | 1892.9 | 93.2 | 2.91 | 0.1425 | 0.2266 | 0.3006 | 0.3497 | 0.3919 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 96.6 | 1813.1 | 97.0 | 3.26 | 0.2109 | 0.3279 | 0.4042 | 0.4626 | 0.5019 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors, using the sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with greedy nearest-neighbour search, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 97.8 | 5933.7 | 96.5 | 3.33 | 0.3222 | 0.4285 | 0.4937 | 0.5483 | 0.5888 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation). Code and weights: https://github.com/ducha-aiki/affnet; paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets. Training code: https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 96.5 | 6522.3 | 97.5 | 3.41 | 0.3363 | 0.4431 | 0.5004 | 0.5460 | 0.5798 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; orientation is assumed to follow the gravity vector. Descriptor: HardNet trained on AMOS plus a mix of other datasets. Training code: https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8k; detection done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 88.6 | 2551.1 | 83.0 | 2.76 | 0.1663 | 0.2417 | 0.3006 | 0.3463 | 0.3810 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 97.2 | 6984.9 | 98.5 | 3.28 | 0.2495 | 0.3605 | 0.4283 | 0.4785 | 0.5177 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 97.9 | 7051.2 | 98.2 | 3.31 | 0.2399 | 0.3521 | 0.4141 | 0.4673 | 0.5101 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 97.1 | 7002.9 | 99.5 | 3.28 | 0.2307 | 0.3494 | 0.4270 | 0.4742 | 0.5106 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 93.3 | 2182.3 | 92.0 | 2.83 | 0.1669 | 0.2666 | 0.3325 | 0.3850 | 0.4194 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 92.4 | 2068.1 | 91.2 | 2.80 | 0.1584 | 0.2514 | 0.3177 | 0.3605 | 0.4051 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 98.0 | 6330.9 | 98.5 | 3.28 | 0.2695 | 0.3867 | 0.4540 | 0.4986 | 0.5341 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 97.0 | 5371.2 | 97.8 | 3.39 | 0.2258 | 0.3288 | 0.4026 | 0.4630 | 0.5033 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are extracted densely from full images rather than from image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 99.2 | 6703.2 | 90.2 | 3.64 | 0.6642 | 0.7632 | 0.8043 | 0.8240 | 0.8371 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 97.4 | 5624.5 | 95.0 | 3.38 | 0.3033 | 0.4097 | 0.4788 | 0.5296 | 0.5633 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 97.3 | 5817.2 | 97.0 | 3.41 | 0.3031 | 0.3966 | 0.4632 | 0.5150 | 0.5497 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 97.8 | 6797.9 | 98.2 | 3.33 | 0.2767 | 0.3813 | 0.4477 | 0.4950 | 0.5404 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 97.6 | 6504.8 | 98.8 | 3.25 | 0.2300 | 0.3503 | 0.4192 | 0.4664 | 0.5079 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 98.7 | 6598.4 | 92.5 | 3.61 | 0.6263 | 0.7309 | 0.7760 | 0.8065 | 0.8274 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 96.2 | 4660.9 | 92.2 | 3.28 | 0.2605 | 0.3551 | 0.4216 | 0.4699 | 0.5110 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 96.8 | 5811.1 | 97.0 | 3.19 | 0.2201 | 0.3241 | 0.3929 | 0.4449 | 0.4812 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 87.1 | 1175.4 | 84.7 | 2.80 | 0.0755 | 0.1394 | 0.1915 | 0.2308 | 0.2651 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 93.8 | 1601.8 | 93.5 | 3.47 | 0.2030 | 0.3008 | 0.3591 | 0.4018 | 0.4331 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 94.3 | 1567.3 | 93.2 | 3.64 | 0.2446 | 0.3533 | 0.4084 | 0.4488 | 0.4822 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 92.9 | 1088.8 | 93.0 | 3.54 | 0.2075 | 0.3153 | 0.3722 | 0.4203 | 0.4525 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 91.7 | 914.4 | 92.0 | 3.62 | 0.1855 | 0.2772 | 0.3370 | 0.3825 | 0.4207 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 93.1 | 1585.5 | 94.5 | 3.50 | 0.1955 | 0.2930 | 0.3598 | 0.4067 | 0.4375 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 76.8 | 189.6 | 66.5 | 3.05 | 0.0376 | 0.0915 | 0.1210 | 0.1440 | 0.1620 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 87.6 | 463.8 | 88.5 | 3.39 | 0.1346 | 0.2248 | 0.2769 | 0.3134 | 0.3449 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 95.1 | 4993.1 | 95.8 | 3.43 | 0.1759 | 0.2915 | 0.3651 | 0.4184 | 0.4571 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 97.3 | 7456.5 | 97.8 | 3.22 | 0.1949 | 0.3117 | 0.3781 | 0.4246 | 0.4590 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 93.7 | 1202.0 | 93.0 | 3.52 | 0.2204 | 0.3234 | 0.3850 | 0.4317 | 0.4649 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 97.4 | 1249.3 | 92.2 | 4.05 | 0.5040 | 0.6062 | 0.6750 | 0.7181 | 0.7449 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 97.4 | 1476.1 | 89.5 | 4.10 | 0.5705 | 0.6767 | 0.7210 | 0.7541 | 0.7708 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoint refinement, - better descriptor sampling, - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 91.3 | 2316.1 | 87.2 | 3.08 | 0.1986 | 0.2845 | 0.3438 | 0.3844 | 0.4171 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 97.4 | 6743.8 | 98.5 | 3.34 | 0.2794 | 0.3979 | 0.4667 | 0.5176 | 0.5543 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
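The SIFT-AID entries in these tables describe 6272-bit binary descriptors matched by Hamming distance with a decision threshold of 4000. The sketch below illustrates that kind of thresholded Hamming matching on bit-packed descriptors using NumPy; the packing layout and function name are assumptions for illustration only, not the authors' code (see https://github.com/rdguez-mariano/sift-aid for the actual implementation).

```python
import numpy as np

def hamming_matches(desc1_bits, desc2_bits, threshold=4000):
    """Match bit-packed binary descriptors (uint8 rows) by Hamming distance,
    keeping each query's nearest neighbour when its distance is below
    `threshold` (4000 in the SIFT-AID submission description)."""
    # Popcount lookup table: number of set bits for every byte value.
    popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None],
                             axis=1).sum(axis=1)
    # XOR every descriptor pair, then count differing bits.
    xor = desc1_bits[:, None, :] ^ desc2_bits[None, :, :]   # (N1, N2, n_bytes)
    dist = popcount[xor].sum(axis=2)                         # Hamming distances
    nn = dist.argmin(axis=1)
    keep = dist[np.arange(len(desc1_bits)), nn] < threshold
    return [(int(i), int(nn[i])) for i in np.flatnonzero(keep)]

# Toy usage: 6272-bit descriptors stored as 784 bytes each.
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=(50, 784), dtype=np.uint8)
d2 = rng.integers(0, 256, size=(60, 784), dtype=np.uint8)
print(len(hamming_matches(d1, d2)), "putative matches")
```

Bit-packing keeps each 6272-bit descriptor at 784 bytes, so the matching cost reduces to one XOR-and-popcount pass per descriptor pair.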
MVS — sequence 'sagrada_familia' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 94.8 | 5940.4 | 86.0 | 3.34 | 0.4935 | 0.5487 | 0.5709 | 0.5878 | 0.6045 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 98.3 | 6872.1 | 94.0 | 3.48 | 0.6073 | 0.7024 | 0.7381 | 0.7566 | 0.7772 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 96.7 | 4768.0 | 91.2 | 2.86 | 0.3090 | 0.3994 | 0.4466 | 0.4838 | 0.5194 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 89.5 | 7264.4 | 87.5 | 3.15 | 0.3794 | 0.4680 | 0.5189 | 0.5427 | 0.5621 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 89.9 | 10021.2 | 90.5 | 2.98 | 0.3677 | 0.4655 | 0.5083 | 0.5418 | 0.5670 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 89.6 | 7110.8 | 87.8 | 3.11 | 0.3851 | 0.4757 | 0.5127 | 0.5392 | 0.5611 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 89.3 | 7456.1 | 90.0 | 2.85 | 0.2961 | 0.4099 | 0.4576 | 0.4874 | 0.5115 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 73.3 | 380.0 | 50.5 | 2.30 | 0.0129 | 0.0379 | 0.0621 | 0.0849 | 0.1032 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 79.6 | 938.9 | 66.2 | 2.32 | 0.0322 | 0.0897 | 0.1365 | 0.1714 | 0.2008 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 3.0 | 17.2 | 0.3 | 0.60 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 29.8 | 103.7 | 18.2 | 1.65 | 0.0001 | 0.0009 | 0.0018 | 0.0031 | 0.0044 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 67.9 | 205.1 | 42.8 | 2.82 | 0.0704 | 0.0888 | 0.0985 | 0.1048 | 0.1102 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 67.4 | 194.6 | 44.8 | 2.73 | 0.0593 | 0.0786 | 0.0889 | 0.0958 | 0.1012 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 76.8 | 407.4 | 62.7 | 2.95 | 0.1847 | 0.2118 | 0.2236 | 0.2345 | 0.2443 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 96.2 | 1808.8 | 88.0 | 3.56 | 0.5144 | 0.6012 | 0.6402 | 0.6617 | 0.6751 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 96.3 | 1982.9 | 89.8 | 3.52 | 0.5206 | 0.6096 | 0.6428 | 0.6636 | 0.6857 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 98.5 | 3099.9 | 93.8 | 3.67 | 0.6417 | 0.7238 | 0.7566 | 0.7788 | 0.7897 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors, using the sGOr2f* matching strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with the greedy nearest neighbour, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 99.0 | 6513.1 | 97.0 | 3.83 | 0.6514 | 0.7379 | 0.7711 | 0.7984 | 0.8137 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 92.6 | 6240.6 | 90.8 | 3.74 | 0.5166 | 0.5766 | 0.5997 | 0.6147 | 0.6272 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian, assuming a gravity-aligned orientation. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 90.3 | 4267.3 | 77.8 | 2.90 | 0.2972 | 0.3519 | 0.3802 | 0.3998 | 0.4188 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 98.9 | 7755.0 | 94.2 | 3.60 | 0.6742 | 0.7390 | 0.7671 | 0.7859 | 0.7980 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 98.4 | 7764.4 | 94.2 | 3.55 | 0.6458 | 0.7195 | 0.7515 | 0.7726 | 0.7866 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 98.1 | 7788.2 | 95.2 | 3.54 | 0.6345 | 0.7037 | 0.7335 | 0.7563 | 0.7683 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 91.3 | 5160.2 | 77.5 | 2.96 | 0.3819 | 0.4405 | 0.4663 | 0.4851 | 0.5012 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 90.3 | 4920.5 | 74.8 | 2.90 | 0.3649 | 0.4322 | 0.4562 | 0.4720 | 0.4852 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 99.2 | 7398.7 | 97.8 | 3.82 | 0.6873 | 0.7552 | 0.7826 | 0.7969 | 0.8147 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 97.7 | 6186.7 | 93.0 | 3.79 | 0.6268 | 0.6967 | 0.7268 | 0.7460 | 0.7563 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc where descriptors are extracted densely from the full image rather than from image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 99.9 | 7109.3 | 99.2 | 3.99 | 0.7882 | 0.8712 | 0.9044 | 0.9231 | 0.9325 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 98.7 | 6953.4 | 96.0 | 3.80 | 0.6939 | 0.7666 | 0.7983 | 0.8096 | 0.8279 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 98.7 | 7011.0 | 95.5 | 3.81 | 0.6903 | 0.7544 | 0.7740 | 0.7939 | 0.8091 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 99.4 | 7983.0 | 98.5 | 3.76 | 0.7048 | 0.7742 | 0.8030 | 0.8179 | 0.8344 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 98.8 | 7596.6 | 96.2 | 3.67 | 0.6693 | 0.7417 | 0.7682 | 0.7852 | 0.8004 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 99.6 | 7004.5 | 98.5 | 3.98 | 0.7868 | 0.8671 | 0.8970 | 0.9121 | 0.9276 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 95.5 | 6397.0 | 85.0 | 3.40 | 0.5238 | 0.5770 | 0.6002 | 0.6150 | 0.6292 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 97.5 | 7119.6 | 92.2 | 3.55 | 0.5976 | 0.6585 | 0.6857 | 0.7047 | 0.7228 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 88.8 | 1581.5 | 77.5 | 3.22 | 0.3584 | 0.4222 | 0.4469 | 0.4630 | 0.4728 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 97.1 | 2094.6 | 88.2 | 4.08 | 0.6137 | 0.6817 | 0.7074 | 0.7189 | 0.7322 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 97.3 | 2014.2 | 91.0 | 4.14 | 0.6567 | 0.7397 | 0.7685 | 0.7831 | 0.7912 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 88.0 | 1373.6 | 81.0 | 3.54 | 0.4117 | 0.4641 | 0.4853 | 0.4977 | 0.5074 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 87.2 | 981.2 | 79.2 | 3.54 | 0.3896 | 0.4424 | 0.4623 | 0.4737 | 0.4858 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 88.5 | 1838.3 | 82.2 | 3.56 | 0.4220 | 0.4732 | 0.4956 | 0.5074 | 0.5159 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 76.2 | 231.9 | 55.8 | 3.10 | 0.1653 | 0.1907 | 0.2019 | 0.2115 | 0.2179 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 83.8 | 472.3 | 71.8 | 3.38 | 0.3075 | 0.3490 | 0.3634 | 0.3772 | 0.3851 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 89.5 | 6354.4 | 87.5 | 3.34 | 0.3630 | 0.4459 | 0.4828 | 0.5083 | 0.5269 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 99.5 | 7966.5 | 98.5 | 3.81 | 0.6945 | 0.7823 | 0.8166 | 0.8363 | 0.8447 | — | Patrick Ebel | Baseline for our scale-invariant descriptors that uses Cartesian patches instead of the log-polar transformation. Keypoints are DoG, with a scaling factor of lambda/12 over the chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 95.8 | 1868.7 | 85.8 | 3.83 | 0.5707 | 0.6319 | 0.6564 | 0.6721 | 0.6820 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 98.0 | 1599.9 | 91.2 | 4.09 | 0.6830 | 0.7700 | 0.7949 | 0.8121 | 0.8225 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 98.1 | 1818.0 | 92.8 | 4.22 | 0.6883 | 0.7644 | 0.7942 | 0.8133 | 0.8285 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 91.6 | 4425.8 | 79.0 | 3.08 | 0.4146 | 0.4687 | 0.4977 | 0.5097 | 0.5208 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 99.5 | 7543.2 | 99.2 | 3.93 | 0.7400 | 0.8114 | 0.8463 | 0.8602 | 0.8750 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
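Several entries in the table above follow the same recipe: DoG/SIFT keypoints from OpenCV, fixed-size patches cropped with a scale multiplying factor of 16/12, and a learned patch descriptor (HardNet, L2-Net, TFeat, GeoDesc) in place of the SIFT descriptor. The sketch below shows one plausible way to cut such patches; the 32-pixel patch size, the example filename, and the `patch_cnn` call are placeholders, not the organizers' actual extraction code.

```python
import cv2
import numpy as np

# Sketch of the "SIFT + learned descriptor" recipe used by several entries
# above: OpenCV SIFT (DoG) keypoints, patches cropped with a scale multiplying
# factor of 16/12, then described by a patch CNN (HardNet, L2-Net, TFeat, ...).
# The 32-pixel patch size and the patch_cnn call are assumptions/placeholders.
SCALE_FACTOR = 16.0 / 12.0  # patches slightly larger than the SIFT default

def extract_patches(gray, keypoints, patch_size=32):
    patches = []
    for kp in keypoints:
        radius = 0.5 * kp.size * SCALE_FACTOR  # half-side of the support region
        # Rotate to the keypoint orientation and rescale the support region
        # to patch_size x patch_size pixels, centred on the keypoint.
        M = cv2.getRotationMatrix2D(kp.pt, kp.angle, patch_size / (2.0 * radius))
        M[0, 2] += patch_size / 2.0 - kp.pt[0]
        M[1, 2] += patch_size / 2.0 - kp.pt[1]
        patches.append(cv2.warpAffine(gray, M, (patch_size, patch_size)))
    return np.stack(patches)

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # any test image (placeholder path)
sift = cv2.SIFT_create(nfeatures=8000)               # up to 8000 keypoints per image
kps = sift.detect(img, None)
patches = extract_patches(img, kps)
# descriptors = patch_cnn(patches)                   # hypothetical HardNet/L2-Net/TFeat call
```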
MVS — sequence 'st_pauls_cathedral' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 98.1 | 5029.6 | 93.0 | 3.34 | 0.4401 | 0.5381 | 0.6054 | 0.6487 | 0.6816 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 99.4 | 3838.2 | 98.2 | 3.34 | 0.4444 | 0.5737 | 0.6481 | 0.6951 | 0.7375 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 97.9 | 2951.3 | 94.0 | 2.82 | 0.2455 | 0.3941 | 0.4849 | 0.5456 | 0.6024 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 94.5 | 3953.4 | 92.5 | 3.12 | 0.2786 | 0.4211 | 0.5070 | 0.5661 | 0.6065 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 95.3 | 5052.5 | 94.0 | 2.92 | 0.2677 | 0.4203 | 0.5044 | 0.5542 | 0.6053 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 94.7 | 4027.0 | 92.5 | 3.13 | 0.2925 | 0.4395 | 0.5209 | 0.5768 | 0.6201 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 94.9 | 4921.9 | 93.5 | 2.91 | 0.2615 | 0.4158 | 0.5162 | 0.5794 | 0.6231 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 86.6 | 764.3 | 72.5 | 2.36 | 0.0304 | 0.1220 | 0.2125 | 0.2769 | 0.3233 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 90.5 | 1439.9 | 84.2 | 2.34 | 0.0295 | 0.1255 | 0.2347 | 0.3221 | 0.3912 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 11.1 | 32.4 | 10.8 | 1.30 | 0.0000 | 0.0002 | 0.0005 | 0.0007 | 0.0009 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 75.8 | 338.4 | 53.0 | 2.39 | 0.0205 | 0.0828 | 0.1425 | 0.1791 | 0.2028 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 38.7 | 175.0 | 32.2 | 1.97 | 0.0421 | 0.0638 | 0.0742 | 0.0815 | 0.0863 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 44.5 | 209.0 | 43.5 | 2.06 | 0.0651 | 0.1088 | 0.1297 | 0.1422 | 0.1516 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 75.3 | 285.9 | 44.8 | 2.81 | 0.1219 | 0.1727 | 0.1925 | 0.2039 | 0.2102 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 97.4 | 1541.3 | 89.2 | 3.21 | 0.3546 | 0.4782 | 0.5461 | 0.6014 | 0.6410 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 96.5 | 1732.2 | 88.5 | 3.18 | 0.3593 | 0.4710 | 0.5339 | 0.5833 | 0.6193 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 99.2 | 1582.2 | 95.8 | 3.41 | 0.4207 | 0.5421 | 0.6161 | 0.6674 | 0.7073 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors, using the sGOr2f* matching strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with the greedy nearest neighbour, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 99.3 | 5409.4 | 97.2 | 3.60 | 0.4946 | 0.5948 | 0.6565 | 0.7065 | 0.7519 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 97.1 | 5559.4 | 96.0 | 3.69 | 0.4895 | 0.5809 | 0.6336 | 0.6708 | 0.6942 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian, assuming a gravity-aligned orientation. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 92.1 | 3463.2 | 76.8 | 2.80 | 0.2696 | 0.3522 | 0.3913 | 0.4239 | 0.4531 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 99.4 | 6959.9 | 97.0 | 3.48 | 0.4576 | 0.5670 | 0.6324 | 0.6850 | 0.7258 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 99.2 | 7072.3 | 97.2 | 3.50 | 0.4674 | 0.5807 | 0.6505 | 0.6959 | 0.7393 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 98.9 | 7100.3 | 98.0 | 3.52 | 0.4724 | 0.5723 | 0.6406 | 0.6897 | 0.7307 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 89.8 | 2453.7 | 79.8 | 2.72 | 0.2058 | 0.2932 | 0.3444 | 0.3862 | 0.4224 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 89.4 | 2358.0 | 78.0 | 2.67 | 0.1967 | 0.2852 | 0.3357 | 0.3700 | 0.4064 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 99.3 | 6204.4 | 99.0 | 3.26 | 0.4145 | 0.5301 | 0.5961 | 0.6511 | 0.6984 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 98.2 | 5378.9 | 96.5 | 3.30 | 0.3933 | 0.5027 | 0.5668 | 0.6189 | 0.6594 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc where descriptors are extracted densely from the full image rather than from image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted with the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 99.9 | 6463.6 | 99.2 | 3.51 | 0.7123 | 0.8234 | 0.8776 | 0.9129 | 0.9317 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 99.2 | 5465.5 | 96.2 | 3.31 | 0.4321 | 0.5493 | 0.6165 | 0.6710 | 0.7125 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 99.2 | 5972.0 | 97.2 | 3.50 | 0.4401 | 0.5554 | 0.6234 | 0.6820 | 0.7212 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 99.7 | 6868.1 | 98.2 | 3.52 | 0.4578 | 0.5680 | 0.6376 | 0.6851 | 0.7191 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 99.3 | 6480.5 | 98.2 | 3.33 | 0.3764 | 0.4962 | 0.5832 | 0.6330 | 0.6822 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 100.0 | 6347.3 | 98.8 | 3.51 | 0.6772 | 0.8028 | 0.8567 | 0.8880 | 0.9082 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 96.4 | 4750.2 | 89.8 | 3.11 | 0.3349 | 0.4361 | 0.4979 | 0.5367 | 0.5821 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 98.8 | 5944.1 | 96.5 | 3.23 | 0.3548 | 0.4763 | 0.5503 | 0.6128 | 0.6506 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 87.5 | 1255.7 | 73.0 | 2.97 | 0.2381 | 0.3189 | 0.3590 | 0.3897 | 0.4106 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
Superpoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 98.0 | 1626.5 | 92.5 | 3.69 | 0.4208 | 0.5450 | 0.6131 | 0.6646 | 0.7078 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
Superpoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 98.8 | 1587.8 | 95.0 | 3.79 | 0.5113 | 0.6443 | 0.7030 | 0.7471 | 0.7763 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 93.5 | 1027.1 | 88.5 | 3.59 | 0.3638 | 0.4593 | 0.5128 | 0.5560 | 0.5902 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:1024, match:nn |
19-04-26 | F | 92.9 | 830.4 | 85.2 | 3.58 | 0.3786 | 0.4686 | 0.5174 | 0.5500 | 0.5801 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 93.6 | 1468.0 | 90.5 | 3.51 | 0.3513 | 0.4497 | 0.5199 | 0.5611 | 0.5891 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 49.7 | 184.0 | 35.2 | 2.32 | 0.1264 | 0.1528 | 0.1638 | 0.1699 | 0.1742 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 89.4 | 434.7 | 69.5 | 3.34 | 0.3006 | 0.3722 | 0.4048 | 0.4272 | 0.4503 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 94.8 | 4882.2 | 94.5 | 3.30 | 0.2568 | 0.3925 | 0.4707 | 0.5367 | 0.5743 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 99.8 | 7262.0 | 98.0 | 3.54 | 0.4289 | 0.5625 | 0.6483 | 0.7029 | 0.7403 | — | Patrick Ebel | Baseline for our scale-invariant descriptors that uses Cartesian patches instead of the log-polar transformation. Keypoints are DoG, with a scaling factor of lambda/12 over the chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 97.3 | 1298.0 | 89.0 | 3.59 | 0.4018 | 0.5150 | 0.5875 | 0.6367 | 0.6728 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 99.5 | 1230.0 | 95.0 | 3.96 | 0.6269 | 0.7390 | 0.7887 | 0.8344 | 0.8547 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 99.5 | 1464.3 | 96.0 | 4.00 | 0.6377 | 0.7520 | 0.8112 | 0.8422 | 0.8664 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 92.4 | 2982.1 | 80.2 | 2.85 | 0.2626 | 0.3615 | 0.4159 | 0.4536 | 0.4859 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 99.8 | 6758.6 | 98.2 | 3.39 | 0.5165 | 0.6437 | 0.7185 | 0.7677 | 0.8067 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |
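The 'Custom Matcher' SuperPoint entries and the 'Inlier Classification' ContextDesc entries above filter putative matches with a learned correspondence classifier (Yi et al., CVPR 2018) before estimating geometry. As a rough, purely classical stand-in for that step, the sketch below rejects outliers with RANSAC on the fundamental matrix via OpenCV; it is an assumption for illustration, not the submitted methods' filtering network.

```python
import cv2
import numpy as np

# Classical stand-in (assumption, not the submitted methods' code) for the
# learned inlier-classification step: keep only matches consistent with a
# fundamental matrix estimated by RANSAC.
def filter_matches(kps1, kps2, matches, ransac_thresh=3.0):
    if len(matches) < 8:  # eight-point minimum for fundamental-matrix estimation
        return []
    pts1 = np.float32([kps1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kps2[m.trainIdx].pt for m in matches])
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                            ransac_thresh, 0.999)
    if F is None or inlier_mask is None:
        return []
    keep = inlier_mask.ravel().astype(bool)
    return [m for m, ok in zip(matches, keep) if ok]
```

The learned classifiers used by these submissions replace the hand-tuned RANSAC threshold with a network that scores each correspondence, which is why those entries are marked F/M (features and matches) rather than F.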
MVS — sequence 'united_states_capitol' | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | Date | Type | Ims (%) | #Pts | SR | TL | mAP5o | mAP10o | mAP15o | mAP20o | mAP25o | ATE | ||||||
AKAZE (OpenCV) kp:8000, match:nn |
19-04-24 | F | 91.0 | 1813.4 | 86.5 | 2.91 | 0.0720 | 0.1280 | 0.1847 | 0.2412 | 0.2880 | — | Challenge organizers | AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SphereDesc kp:8000, match:nn |
19-04-26 | F | 89.0 | 928.2 | 82.5 | 2.76 | 0.0510 | 0.1023 | 0.1606 | 0.2114 | 0.2634 | — | Anonymous | We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. | N/A | Anonymous | N/A | 256 float32 |
Brisk + SSS kp:8000, match:nn |
19-05-14 | F | 80.4 | 701.1 | 71.0 | 2.48 | 0.0194 | 0.0518 | 0.0936 | 0.1299 | 0.1688 | — | Anonymous | We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. | TBA | Anonymous | N/A | 128 float32 |
D2-Net (single scale) kp:8000, match:nn |
19-05-07 | F | 88.8 | 1326.9 | 86.8 | 2.69 | 0.0220 | 0.0745 | 0.1321 | 0.1910 | 0.2469 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multiscale) kp:8000, match:nn |
19-05-07 | F | 89.8 | 1647.0 | 89.5 | 2.76 | 0.0207 | 0.0729 | 0.1323 | 0.1987 | 0.2639 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (single scale, no PT dataset) kp:8000, match:nn |
19-06-01 | F | 89.2 | 1400.0 | 86.0 | 2.71 | 0.0274 | 0.0861 | 0.1431 | 0.1964 | 0.2513 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
D2-Net (multi-scale, no PT dataset) kp:8000, match:nn |
19-06-05 | F | 90.4 | 1616.3 | 89.0 | 2.82 | 0.0203 | 0.0651 | 0.1208 | 0.1831 | 0.2547 | — | Challenge organizers | D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf | https://github.com/mihaidusmanu/d2-net | imagematching@uvic.ca | N/A | 512 float32 |
DELF kp:1024, match:nn |
19-05-05 | F | 81.0 | 423.7 | 81.0 | 2.35 | 0.0025 | 0.0256 | 0.0602 | 0.0995 | 0.1363 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:2048, match:nn |
19-05-05 | F | 83.0 | 653.1 | 83.8 | 2.30 | 0.0044 | 0.0264 | 0.0615 | 0.1043 | 0.1467 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:256, match:nn |
19-05-05 | F | 61.2 | 88.2 | 37.3 | 2.31 | 0.0007 | 0.0043 | 0.0112 | 0.0208 | 0.0286 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
DELF kp:512, match:nn |
19-05-05 | F | 72.0 | 207.2 | 68.2 | 2.35 | 0.0014 | 0.0171 | 0.0375 | 0.0634 | 0.0849 | — | Challenge organizers | DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 | https://github.com/tensorflow/models/tree/master/research/delf | imagematching@uvic.ca | N/A | 40 float32 |
ELF-256D kp:512, match:nn |
19-05-07 | F | 3.7 | 19.0 | 5.2 | 0.60 | 0.0001 | 0.0002 | 0.0002 | 0.0002 | 0.0003 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-512D kp:512, match:nn |
19-05-09 | F | 25.9 | 62.5 | 12.5 | 1.74 | 0.0000 | 0.0004 | 0.0009 | 0.0013 | 0.0017 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. | TBA | Anonymous | N/A | 256 float32 |
ELF-SIFT kp:512, match:nn |
19-04-26 | F | 28.2 | 79.4 | 34.2 | 1.77 | 0.0017 | 0.0038 | 0.0055 | 0.0067 | 0.0080 | — | Anonymous | ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). | N/A | Anonymous | N/A | 128 uint8 |
SIFT + GeoDesc kp:2048, match:nn |
19-05-19 | F | 82.5 | 676.1 | 81.5 | 2.69 | 0.0247 | 0.0590 | 0.1021 | 0.1516 | 0.1885 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:2048, match:nn |
19-05-19 | F | 82.3 | 715.0 | 80.8 | 2.74 | 0.0359 | 0.0787 | 0.1218 | 0.1618 | 0.1988 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
HarrisZ/RsGLOH2 kp:8000, match:sGOr2f |
19-05-23 | F | 89.8 | 653.5 | 85.8 | 2.94 | 0.0999 | 0.1562 | 0.2137 | 0.2636 | 0.3083 | — | Fabio Bellavia and Carlo Colombo | HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors, using the sGOr2f* matching strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with the greedy nearest neighbour, as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). | http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor | bellavia.fabio@gmail.com | N/A | 256 float32 |
HesAffNet - HardNet2 kp:8000, match:nn |
19-05-29 | F | 90.6 | 2309.4 | 90.0 | 2.84 | 0.0540 | 0.1056 | 0.1619 | 0.2106 | 0.2607 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets; training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
Hessian - HardNet2 kp:8000, match:nn |
19-05-30 | F | 86.7 | 2442.2 | 86.0 | 2.97 | 0.0555 | 0.1024 | 0.1572 | 0.2004 | 0.2431 | — | Milan Pultar, Dmytro Mishkin, Jiří Matas | Detector: Hessian; the gravity vector is assumed for orientation. Descriptor: HardNet trained on AMOS plus a mix of other datasets, similar to the training code at https://github.com/pultarmi/HardNet_MultiDataset (paper: https://arxiv.org/pdf/1901.09780.pdf). The networks are plugged into the MODS framework, without view synthesis or matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints: 8000; detection is performed on 2x-upsampled images. | TBA | ducha.aiki@gmail.com | N/A | 128 uint8 |
ORB (OpenCV) kp:8000, match:nn |
19-04-24 | F | 74.6 | 1198.0 | 60.0 | 2.39 | 0.0159 | 0.0394 | 0.0613 | 0.0846 | 0.1068 | — | Challenge organizers | ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
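For the OpenCV baselines such as the ORB entry above (the AKAZE, SIFT, and SURF entries are analogous), detection and brute-force nearest-neighbour matching reduce to a few OpenCV calls. A minimal sketch assuming two grayscale input images, not the organizers' exact pipeline:

```python
import cv2

def orb_nn_matches(img1, img2, n_features=8000):
    """Detect ORB keypoints/descriptors and match them with brute-force
    nearest-neighbour search (Hamming distance for ORB's binary descriptors)."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kps1, des1 = orb.detectAndCompute(img1, None)
    kps2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
    return kps1, kps2, matcher.match(des1, des2)
```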
Scale-invariant desc. (Log-Polar, lambda=32) kp:8000, match:nn |
19-06-25 | F | 92.6 | 2591.7 | 92.2 | 2.88 | 0.0767 | 0.1290 | 0.1863 | 0.2457 | 0.2978 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over their chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=64) kp:8000, match:nn |
19-06-24 | F | 92.4 | 2676.2 | 91.8 | 2.90 | 0.0788 | 0.1378 | 0.2032 | 0.2611 | 0.3144 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over their chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
Scale-invariant desc. (Log-Polar, lambda=96) kp:8000, match:nn |
19-06-20 | F | 92.7 | 2692.5 | 93.5 | 2.88 | 0.0727 | 0.1337 | 0.1957 | 0.2622 | 0.3221 | — | Patrick Ebel | We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over their chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
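The log-polar patch sampling used by the three Log-Polar entries above can be illustrated with OpenCV's warpPolar; the descriptor network itself is not reproduced here, and the output size and interpolation below are assumptions rather than the authors' settings:

```python
import cv2

def logpolar_patch(img, kp, out_size=32, lam=96.0):
    """Sample a log-polar patch around a DoG keypoint. The support radius
    follows the entry's description (a factor of lambda/12 over the keypoint's
    chosen scale); out_size and interpolation are assumptions."""
    max_radius = (lam / 12.0) * (kp.size / 2.0)
    return cv2.warpPolar(img, (out_size, out_size), kp.pt, max_radius,
                         flags=cv2.INTER_LINEAR + cv2.WARP_POLAR_LOG)
```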
SIFT-AID (NN matcher) kp:8000, match:nn |
19-05-10 | F | 78.1 | 550.6 | 70.2 | 2.51 | 0.0088 | 0.0263 | 0.0502 | 0.0817 | 0.1114 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
SIFT-AID (custom matcher) kp:8000, match:sift-aid |
19-04-29 | F/M | 77.4 | 521.8 | 68.5 | 2.49 | 0.0060 | 0.0208 | 0.0491 | 0.0757 | 0.1008 | — | Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon | We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid | https://hal.archives-ouvertes.fr/hal-02016010 | facciolo@cmla.ens-cachan.fr | N/A | 6272 bits |
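The thresholded Hamming matching described in the two SIFT-AID entries can be sketched with OpenCV's brute-force matcher, assuming the 6272-bit descriptors are stored as packed uint8 rows (784 bytes each); this is not the authors' released matcher:

```python
import cv2

def hamming_threshold_matches(desc1, desc2, threshold=4000):
    """Nearest-neighbour matching of packed binary descriptors by Hamming
    distance; matches above the decision threshold are rejected.
    desc1/desc2: (N, 784) uint8 arrays, i.e. 6272 bits per descriptor."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
    matches = matcher.match(desc1, desc2)
    return [m for m in matches if m.distance < threshold]
```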
SIFT + ContextDesc kp:8000, match:nn |
19-05-09 | F | 91.9 | 1870.7 | 90.8 | 2.77 | 0.0663 | 0.1178 | 0.1735 | 0.2268 | 0.2712 | — | Zixin Luo | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 uint8 |
SIFT-Dense-ContextDesc kp:8000, match:nn |
19-05-28 | F | 88.5 | 1579.1 | 86.0 | 2.69 | 0.0515 | 0.0934 | 0.1371 | 0.1923 | 0.2406 | — | Zixin Luo, Jiahui Zhang | Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, with all other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. | TBA | zluoag@cse.ust.hk | N/A | TBA |
SIFT + ContextDesc + Inlier Classification V2 kp:8000, match:custom |
19-05-28 | F/M | 93.6 | 2254.2 | 95.8 | 2.89 | 0.0986 | 0.1665 | 0.2345 | 0.2961 | 0.3552 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT-GeoDesc-GitHub kp:8000, match:nn |
19-05-08 | F | 90.8 | 1685.8 | 89.0 | 2.80 | 0.0634 | 0.1011 | 0.1497 | 0.2003 | 0.2523 | — | Zixin Luo | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. | https://arxiv.org/abs/1807.06294 | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT + GeoDesc kp:8000, match:nn |
19-04-24 | F | 91.3 | 2088.3 | 90.2 | 2.86 | 0.0640 | 0.1096 | 0.1616 | 0.2239 | 0.2710 | — | Challenge organizers | GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://github.com/lzx551402/geodesc | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + HardNet kp:8000, match:nn |
19-04-24 | F | 92.5 | 2496.1 | 93.5 | 2.87 | 0.0659 | 0.1175 | 0.1794 | 0.2339 | 0.2888 | — | Challenge organizers | HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/DagnyT/hardnet | imagematching@uvic.ca | N/A | 128 float32 |
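The "slightly larger patches" note in the HardNet entry above (and in the L2-Net and TFeat entries below), a scale multiplying factor of 16/12 over the SIFT support region, can be illustrated as follows; the patch size, interpolation, and orientation handling are assumptions, not the organizers' extraction code:

```python
import cv2

def extract_patch(img, kp, patch_size=32, scale_factor=16.0 / 12.0):
    """Cut an oriented patch around a SIFT keypoint, enlarging its support
    region by 16/12 before resampling to a fixed-size patch."""
    radius = 0.5 * kp.size * scale_factor      # enlarged support radius in pixels
    scale = patch_size / (2.0 * radius)        # maps the support region onto the patch
    M = cv2.getRotationMatrix2D(kp.pt, kp.angle, scale)
    M[0, 2] += 0.5 * patch_size - kp.pt[0]     # move the keypoint to the patch centre
    M[1, 2] += 0.5 * patch_size - kp.pt[1]
    return cv2.warpAffine(img, M, (patch_size, patch_size), flags=cv2.INTER_LINEAR)
```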
SIFT + L2-Net kp:8000, match:nn |
19-04-24 | F | 91.1 | 2193.0 | 93.0 | 2.87 | 0.0671 | 0.1217 | 0.1778 | 0.2377 | 0.2895 | — | Challenge organizers | L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/yuruntian/L2-Net | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + ContextDesc + Inlier Classification V1 kp:8000, match:custom |
19-05-29 | F/M | 93.8 | 2146.5 | 96.8 | 2.88 | 0.0897 | 0.1619 | 0.2190 | 0.2876 | 0.3456 | — | Dawei Sun, Zixin Luo, Jiahui Zhang | We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). | https://github.com/lzx551402/contextdesc | zluoag@cse.ust.hk | N/A | 128 float32 |
SIFT (OpenCV) kp:8000, match:nn |
19-04-24 | F | 82.8 | 1322.7 | 79.8 | 2.65 | 0.0441 | 0.0756 | 0.1131 | 0.1496 | 0.1827 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SIFT + TFeat kp:8000, match:nn |
19-04-24 | F | 90.0 | 1949.4 | 87.8 | 2.74 | 0.0448 | 0.0903 | 0.1402 | 0.1946 | 0.2422 | — | Challenge organizers | T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. | https://github.com/vbalnt/tfeat | imagematching@uvic.ca | N/A | 128 float32 |
SIFT (OpenCV) kp:2048, match:nn |
19-05-17 | F | 72.9 | 516.2 | 63.2 | 2.54 | 0.0216 | 0.0431 | 0.0652 | 0.0850 | 0.1032 | — | Challenge organizers | SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | 128 float32 |
SuperPoint (nn matcher) kp:2048, match:nn |
19-06-07 | F | 86.9 | 669.9 | 86.2 | 2.85 | 0.0493 | 0.1073 | 0.1618 | 0.2121 | 0.2538 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint (1:1 matcher) kp:2048, match:nn1to1 |
19-06-07 | F | 88.3 | 656.0 | 87.2 | 2.98 | 0.0628 | 0.1220 | 0.1814 | 0.2381 | 0.2909 | — | Challenge organizers | SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. | TBA | imagematching@uvic.ca | N/A | 256 float32 |
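The 1:1 matcher above keeps a pair only when the two descriptors are mutual nearest neighbours, the same idea as the "cross-match consistency" used by the ContextDesc nn1to1 entry further down; in OpenCV this is the crossCheck option of the brute-force matcher. A minimal sketch, not the evaluation code:

```python
import cv2

def mutual_nn_matches(des1, des2):
    """One-to-one (cross-checked) matching: a match is kept only if each
    descriptor is the other's nearest neighbour under L2 distance."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    return matcher.match(des1, des2)
```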
SuperPoint (default) kp:2048, match:nn |
19-04-24 | F | 82.5 | 376.4 | 75.0 | 2.92 | 0.0608 | 0.1068 | 0.1470 | 0.1855 | 0.2230 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
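The resizing rule shared by the SuperPoint entries (downsample so the largest image dimension is at most 1024 pixels) is straightforward; a sketch, with the interpolation choice as an assumption:

```python
import cv2

def cap_largest_dimension(img, max_dim=1024):
    """Downsample so the largest image dimension is at most max_dim pixels;
    images already within the limit are returned unchanged."""
    h, w = img.shape[:2]
    scale = max_dim / float(max(h, w))
    if scale >= 1.0:
        return img
    return cv2.resize(img, (int(round(w * scale)), int(round(h * scale))),
                      interpolation=cv2.INTER_AREA)
```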
SuperPoint kp:1024, match:nn |
19-04-26 | F | 79.4 | 335.0 | 74.2 | 2.87 | 0.0524 | 0.0956 | 0.1302 | 0.1667 | 0.1983 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:2048, match:nn |
19-04-26 | F | 84.4 | 537.5 | 77.5 | 2.87 | 0.0563 | 0.1054 | 0.1501 | 0.1934 | 0.2285 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:256, match:nn |
19-04-26 | F | 26.1 | 75.3 | 18.0 | 1.75 | 0.0007 | 0.0011 | 0.0016 | 0.0017 | 0.0023 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:512, match:nn |
19-04-26 | F | 66.6 | 152.6 | 57.0 | 2.59 | 0.0231 | 0.0409 | 0.0585 | 0.0746 | 0.0871 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
SuperPoint kp:8000, match:nn |
19-04-26 | F | 86.7 | 1606.2 | 84.8 | 2.63 | 0.0276 | 0.0716 | 0.1208 | 0.1717 | 0.2196 | — | Challenge organizers | SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | imagematching@uvic.ca | N/A | 256 float32 |
Scale-invariant desc. (Cartesian, lambda=16) kp:8000, match:nn |
19-07-29 | F | 91.8 | 2760.2 | 94.5 | 2.75 | 0.0476 | 0.1005 | 0.1591 | 0.2132 | 0.2671 | — | Patrick Ebel | We compute scale-invariant descriptors; this baseline uses Cartesian patches instead of the log-polar transformation. Keypoints are DoG, with a scaling factor of lambda/12 over their chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. | TBA | patrick.ebel@epfl.ch | N/A | 128 float32 |
SuperPoint (trained on coco + phototourism training set) kp:2048, match:nn |
19-05-30 | F | 85.0 | 464.9 | 82.5 | 2.94 | 0.0617 | 0.1122 | 0.1590 | 0.2069 | 0.2538 | — | Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich | SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. | https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork | ddetone@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v1) kp:2048, match:custom |
19-05-30 | F/M | 91.1 | 525.3 | 88.8 | 3.23 | 0.1240 | 0.1938 | 0.2596 | 0.3222 | 0.3717 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SuperPoint + Custom Matcher (v2) kp:2048, match:custom |
19-05-28 | F/M | 91.8 | 640.3 | 90.5 | 3.26 | 0.1085 | 0.1803 | 0.2540 | 0.3161 | 0.3740 | — | Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich | Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). | TBA | pesarlin@magicleap.com | N/A | 256 float32 |
SURF (OpenCV) kp:8000, match:nn |
19-04-24 | F | 82.8 | 967.7 | 73.0 | 2.52 | 0.0190 | 0.0419 | 0.0760 | 0.1135 | 0.1451 | — | Challenge organizers | SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. | https://opencv.org | imagematching@uvic.ca | N/A | TBA |
SIFT + ContextDesc kp:8000, match:nn1to1 |
19-06-07 | F | 91.9 | 2349.6 | 93.2 | 2.78 | 0.0712 | 0.1350 | 0.1965 | 0.2561 | 0.3058 | — | Challenge organizers | ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. | https://github.com/lzx551402/contextdesc | imagematching@uvic.ca | N/A | TBA |