Breakdown: Phototourism | MVS | Sequences

Breakdown on the Phototourism dataset, multi-view stereo task, by sequence.

MVS — All sequences — Sorted by mAP15o
Method BM FCS LMS LB MC MR PSM RS SF SPC USC AVG Date Type By Details Link Contact Updated Descriptor size
kp:8000, match:nn
0.3576 0.4385 0.5551 0.3696 0.5935 0.3131 0.2543 0.4708 0.5709 0.6054 0.1847 0.4285 19-04-24 F Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
0.3772 0.5865 0.6595 0.3789 0.6131 0.3458 0.4122 0.4310 0.7381 0.6481 0.1606 0.4865 19-04-26 F Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
0.0844 0.3523 0.1150 0.3040 0.3429 0.1207 0.2678 0.3511 0.4466 0.4849 0.0936 0.2694 19-05-14 F Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
0.2518 0.4914 0.6187 0.4915 0.4482 0.3186 0.3673 0.3340 0.5189 0.5070 0.1321 0.4072 19-05-07 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
0.1904 0.4486 0.6308 0.4622 0.4263 0.3543 0.3746 0.3690 0.5083 0.5044 0.1323 0.4001 19-05-07 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
0.2522 0.4901 0.6258 0.4932 0.4486 0.3217 0.3760 0.3284 0.5127 0.5209 0.1431 0.4102 19-06-01 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
0.1962 0.4584 0.6617 0.4545 0.4435 0.3182 0.3976 0.3394 0.4576 0.5162 0.1208 0.3967 19-06-05 F Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
0.0858 0.2017 0.2292 0.1254 0.1748 0.0330 0.0476 0.1275 0.0621 0.2125 0.0602 0.1236 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
0.1014 0.2859 0.2730 0.1300 0.2152 0.0592 0.1378 0.1566 0.1365 0.2347 0.0615 0.1629 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
0.0001 0.0019 0.0148 0.0195 0.0075 0.0010 0.0000 0.0223 0.0000 0.0005 0.0112 0.0072 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
0.0409 0.0558 0.0850 0.0988 0.0773 0.0140 0.0010 0.0895 0.0018 0.1425 0.0375 0.0585 19-05-05 F Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
0.0449 0.0499 0.1188 0.1584 0.0372 0.0190 0.0010 0.1209 0.0985 0.0742 0.0002 0.0657 19-05-07 F Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
0.0499 0.0654 0.1666 0.2027 0.0645 0.0207 0.0024 0.1424 0.0889 0.1297 0.0009 0.0849 19-05-09 F Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
0.0933 0.1873 0.2702 0.2906 0.2627 0.0799 0.0148 0.2254 0.2236 0.1925 0.0055 0.1678 19-04-26 F Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
0.2347 0.5479 0.6881 0.2262 0.4348 0.2438 0.2070 0.3115 0.6402 0.5461 0.1021 0.3802 19-05-19 F Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
0.2313 0.5550 0.6536 0.2170 0.4446 0.2620 0.1933 0.3006 0.6428 0.5339 0.1218 0.3778 19-05-19 F Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
0.3285 0.6044 0.7392 0.5208 0.6511 0.4007 0.3091 0.4042 0.7566 0.6161 0.2137 0.5040 19-05-23 F Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
0.4287 0.6316 0.7908 0.4149 0.6392 0.3792 0.4449 0.4937 0.7711 0.6565 0.1619 0.5284 19-05-29 F Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
0.4294 0.5902 0.7407 0.3967 0.6356 0.4294 0.4477 0.5004 0.5997 0.6336 0.1572 0.5055 19-05-30 F Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
0.1587 0.3061 0.3016 0.1642 0.2027 0.1402 0.1302 0.3006 0.3802 0.3913 0.0613 0.2307 19-04-24 F Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
0.4094 0.6191 0.7261 0.5028 0.7047 0.5206 0.4309 0.4283 0.7671 0.6324 0.1863 0.5389 19-06-25 F Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
0.4028 0.5943 0.7361 0.5324 0.7128 0.4911 0.4473 0.4141 0.7515 0.6505 0.2032 0.5396 19-06-24 F Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
0.3993 0.6064 0.7317 0.5410 0.7159 0.5201 0.4580 0.4270 0.7335 0.6406 0.1957 0.5427 19-06-20 F Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
0.1532 0.3674 0.2183 0.2667 0.4313 0.1428 0.1834 0.3325 0.4663 0.3444 0.0502 0.2688 19-05-10 F Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
0.1351 0.3428 0.1588 0.2417 0.4441 0.1367 0.1903 0.3177 0.4562 0.3357 0.0491 0.2553 19-04-29 F/M Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
0.3339 0.6322 0.7204 0.5712 0.7115 0.4788 0.4852 0.4540 0.7826 0.5961 0.1735 0.5399 19-05-09 F Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
0.3768 0.5872 0.7283 0.5217 0.6756 0.4511 0.4682 0.4026 0.7268 0.5668 0.1371 0.5129 19-05-28 F Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches; all other settings are unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, particularly under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
0.6071 0.8317 0.8542 0.8174 0.8809 0.6305 0.6854 0.8043 0.9044 0.8776 0.2345 0.7389 19-05-28 F/M Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
0.3375 0.6042 0.6834 0.4911 0.7246 0.4662 0.4772 0.4788 0.7983 0.6165 0.1497 0.5298 19-05-08 F Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
0.4169 0.6473 0.6744 0.4696 0.7138 0.4907 0.4137 0.4632 0.7740 0.6234 0.1616 0.5317 19-04-24 F Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
0.4090 0.6438 0.7726 0.4702 0.6945 0.5106 0.4612 0.4477 0.8030 0.6376 0.1794 0.5481 19-04-24 F Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
0.3292 0.5935 0.7109 0.4380 0.7093 0.4667 0.4000 0.4192 0.7682 0.5832 0.1778 0.5087 19-04-24 F Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
0.5711 0.8059 0.8677 0.7937 0.8609 0.6222 0.6156 0.7760 0.8970 0.8567 0.2190 0.7169 19-05-29 F/M Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
0.2714 0.5530 0.4768 0.3859 0.6433 0.3181 0.2797 0.4216 0.6002 0.4979 0.1131 0.4146 19-04-24 F Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
0.3304 0.5315 0.6238 0.4131 0.6819 0.4022 0.3557 0.3929 0.6857 0.5503 0.1402 0.4643 19-04-24 F Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
0.1458 0.4113 0.4038 0.1747 0.2606 0.1601 0.0645 0.1915 0.4469 0.3590 0.0652 0.2439 19-05-17 F Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
0.3344 0.5417 0.7106 0.5541 0.6105 0.3891 0.2737 0.3591 0.7074 0.6131 0.1618 0.4778 19-06-07 F Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
0.4071 0.6746 0.7953 0.6602 0.6507 0.4309 0.3033 0.4084 0.7685 0.7030 0.1814 0.5440 19-06-07 F Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
0.3763 0.3927 0.6357 0.5936 0.5046 0.3569 0.3163 0.3722 0.4853 0.5128 0.1470 0.4267 19-04-24 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
0.3752 0.3704 0.6258 0.6135 0.4946 0.3670 0.2659 0.3370 0.4623 0.5174 0.1302 0.4145 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
0.3557 0.3989 0.5727 0.5711 0.5072 0.3859 0.3269 0.3598 0.4956 0.5199 0.1501 0.4222 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
0.1089 0.0256 0.2964 0.3699 0.0160 0.1125 0.0002 0.1210 0.2019 0.1638 0.0016 0.1289 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
0.3242 0.1806 0.5745 0.5669 0.3108 0.2704 0.0990 0.2769 0.3634 0.4048 0.0585 0.3118 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
0.3283 0.3907 0.5753 0.4620 0.4401 0.3203 0.3695 0.3651 0.4828 0.4707 0.1208 0.3932 19-04-26 F Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
0.3459 0.6743 0.7612 0.3762 0.7104 0.4867 0.3722 0.3781 0.8166 0.6483 0.1591 0.5208 19-07-29 F Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
0.4057 0.5237 0.6628 0.5740 0.5633 0.3562 0.3349 0.3850 0.6564 0.5875 0.1590 0.4735 19-05-30 F Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
0.7052 0.6671 0.6918 0.8131 0.7382 0.5033 0.3152 0.6750 0.7949 0.7887 0.2596 0.6320 19-05-30 F/M Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
0.6989 0.6873 0.6997 0.8126 0.7504 0.5183 0.3557 0.7210 0.7942 0.8112 0.2540 0.6458 19-05-28 F/M Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
0.1458 0.3515 0.3862 0.2207 0.4503 0.2126 0.2072 0.3438 0.4977 0.4159 0.0760 0.3007 19-04-24 F Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
0.3075 0.7850 0.8314 0.6763 0.7492 0.5393 0.5014 0.4667 0.8463 0.7185 0.1965 0.6017 19-06-07 F Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
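
The AVG column can be sanity-checked against the per-sequence scores. The sketch below assumes that AVG is the unweighted mean of the eleven per-sequence mAP15o values; the page does not state this explicitly, and the helper name `parse_scores` is hypothetical. It parses the leading numeric fields of one data row, copied from the first entry in the table, and recomputes the average.

```python
from statistics import mean

# Column order of the per-sequence scores, copied from the table header.
SEQUENCES = ["BM", "FCS", "LMS", "LB", "MC", "MR", "PSM", "RS", "SF", "SPC", "USC"]


def parse_scores(row):
    """Return ({sequence: score}, reported AVG) from the leading fields of a data row.

    Hypothetical helper: it assumes the row starts with one score per
    sequence, followed by the AVG value, as in the table above.
    """
    fields = row.split()
    values = [float(x) for x in fields[: len(SEQUENCES) + 1]]
    return dict(zip(SEQUENCES, values[:-1])), values[-1]


# Data row of the first entry in the table (kp:8000, match:nn, dated 19-04-24).
row = (
    "0.3576 0.4385 0.5551 0.3696 0.5935 0.3131 0.2543 0.4708 "
    "0.5709 0.6054 0.1847 0.4285 19-04-24 F Challenge organizers"
)

scores, reported_avg = parse_scores(row)

# Assumption: AVG is the plain mean over sequences, rounded to four decimals.
recomputed_avg = round(mean(scores.values()), 4)
print(recomputed_avg, reported_avg)
```

For this row the recomputed mean agrees with the reported AVG to four decimals, which is consistent with the assumption; other rows can be checked the same way by pasting their data line into `row`.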

Results for individual sequences:

MVS — sequence 'british_museum'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 99.1 4960.4 99.5 3.76 0.1442 0.2674 0.3576 0.4264 0.4881 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 99.4 3516.7 99.2 3.30 0.1580 0.2773 0.3772 0.4609 0.5263 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 98.9 2095.0 98.0 3.06 0.0102 0.0410 0.0844 0.1457 0.2142 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 95.8 3457.5 94.5 2.89 0.0700 0.1689 0.2518 0.3336 0.4026 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 96.3 3622.8 95.5 2.85 0.0386 0.1086 0.1904 0.2725 0.3439 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 95.8 3445.4 94.8 2.88 0.0663 0.1633 0.2522 0.3313 0.4039 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 95.9 3611.2 95.0 2.83 0.0336 0.1068 0.1962 0.2832 0.3553 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 91.7 804.0 89.5 2.33 0.0149 0.0464 0.0858 0.1385 0.1972 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 94.2 1517.5 92.2 2.29 0.0165 0.0519 0.1014 0.1625 0.2240 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 25.9 55.0 7.0 1.76 0.0000 0.0001 0.0001 0.0002 0.0004 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 79.2 351.5 70.0 2.30 0.0044 0.0191 0.0409 0.0704 0.1062 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 82.1 277.2 66.8 2.73 0.0055 0.0210 0.0449 0.0785 0.1183 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 81.4 267.9 73.8 2.82 0.0077 0.0258 0.0499 0.0899 0.1269 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 87.6 294.0 70.0 2.90 0.0169 0.0540 0.0933 0.1398 0.1870 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 99.0 1768.9 99.5 3.31 0.0849 0.1626 0.2347 0.3106 0.3816 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 99.1 2019.6 99.8 3.19 0.0778 0.1531 0.2313 0.3113 0.3792 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 99.4 1230.2 99.0 3.31 0.1306 0.2388 0.3285 0.4143 0.4816 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 99.5 5718.4 100.0 3.60 0.1898 0.3316 0.4287 0.5047 0.5667 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 97.5 6175.5 96.0 3.63 0.2179 0.3460 0.4294 0.4954 0.5531 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 97.5 3255.1 94.0 3.39 0.0334 0.0898 0.1587 0.2268 0.2955 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 99.5 7420.1 99.2 3.35 0.2044 0.3255 0.4094 0.4839 0.5486 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 99.6 7694.9 99.8 3.32 0.1978 0.3176 0.4028 0.4793 0.5407 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 99.4 7787.7 99.2 3.31 0.1911 0.3166 0.3993 0.4709 0.5348 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 97.6 1898.7 92.0 3.02 0.0354 0.0877 0.1532 0.2208 0.2907 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 96.4 1816.7 92.8 2.95 0.0275 0.0754 0.1351 0.1993 0.2716 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 99.3 6043.3 99.8 2.92 0.1531 0.2511 0.3339 0.4077 0.4683 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 98.6 5381.4 97.2 2.97 0.1763 0.2915 0.3768 0.4434 0.5078 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches; all other settings are unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, particularly under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 99.3 6167.1 98.0 3.12 0.3613 0.5157 0.6071 0.6787 0.7271 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 99.4 4407.3 99.2 3.14 0.1621 0.2620 0.3375 0.4073 0.4742 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 99.5 5722.5 100.0 3.45 0.2066 0.3316 0.4169 0.4933 0.5506 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 99.6 6837.1 100.0 3.38 0.2024 0.3305 0.4090 0.4957 0.5559 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 99.6 6314.7 100.0 3.22 0.1492 0.2527 0.3292 0.4030 0.4699 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 99.5 6170.7 99.8 3.08 0.3315 0.4759 0.5711 0.6451 0.7017 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 98.1 4336.2 97.2 3.25 0.1122 0.2007 0.2714 0.3424 0.4135 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 99.1 5718.0 99.8 3.20 0.1452 0.2462 0.3304 0.4033 0.4724 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 97.9 1573.7 95.5 3.48 0.0411 0.0924 0.1458 0.2164 0.2877 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 99.2 1563.9 97.5 3.36 0.1320 0.2481 0.3344 0.4009 0.4741 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 99.4 1479.1 98.5 3.50 0.1942 0.3205 0.4071 0.4807 0.5482 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 95.6 1003.0 93.5 3.33 0.1745 0.2897 0.3763 0.4470 0.5024 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 95.9 915.3 93.5 3.28 0.1755 0.2912 0.3752 0.4456 0.4965 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 95.2 1634.1 94.8 3.26 0.1567 0.2703 0.3557 0.4228 0.4766 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 75.9 173.6 47.5 2.92 0.0402 0.0793 0.1089 0.1323 0.1505 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 93.2 457.5 87.5 3.14 0.1349 0.2415 0.3242 0.3830 0.4355 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 96.2 5766.0 94.5 3.25 0.1269 0.2422 0.3283 0.3987 0.4625 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 99.4 7869.6 99.8 3.33 0.1557 0.2578 0.3459 0.4145 0.4826 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 98.8 1100.9 97.2 3.50 0.1885 0.3207 0.4057 0.4768 0.5428 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 99.3 1092.9 98.5 3.81 0.4603 0.6270 0.7052 0.7533 0.7989 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 99.3 1258.2 99.0 3.89 0.4530 0.6140 0.6989 0.7556 0.8001 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 97.8 3199.3 94.2 2.89 0.0386 0.0920 0.1458 0.2112 0.2830 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 99.5 6693.3 100.0 2.97 0.1253 0.2196 0.3075 0.3843 0.4521 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
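
The entries above differ in their matcher label: `match:nn` denotes brute-force nearest-neighbour search, while `match:nn1to1` additionally enforces cross-match consistency. The following is a minimal sketch of the difference, not the benchmark's actual code. It assumes Euclidean distance on float descriptors and non-empty descriptor lists; the function names `nearest`, `match_nn`, and `match_nn1to1` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(query, candidates):
    """Index of the candidate closest to `query` (assumes a non-empty list)."""
    return min(range(len(candidates)), key=lambda j: dist(query, candidates[j]))


def match_nn(desc_a, desc_b):
    """One-way brute-force matching: each descriptor in A picks its nearest in B."""
    return [(i, nearest(d, desc_b)) for i, d in enumerate(desc_a)]


def match_nn1to1(desc_a, desc_b):
    """Keep only pairs that are nearest neighbours in both directions."""
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in match_nn(desc_a, desc_b) if b_to_a[j] == i]


# Toy 2-D "descriptors" for illustration only.
desc_a = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (5.0, 4.0)]

print(match_nn(desc_a, desc_b))      # [(0, 0), (1, 0), (2, 1)]
print(match_nn1to1(desc_a, desc_b))  # [(0, 0), (2, 1)]
```

In the toy data, two descriptors in A select the same descriptor in B under one-way matching; the cross-check keeps only the pair that agrees in both directions.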

MVS — sequence 'florence_cathedral_side'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 92.3 4821.8 81.7 3.06 0.3638 0.4103 0.4385 0.4615 0.4841 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 97.8 7748.9 95.2 3.16 0.4741 0.5381 0.5865 0.6261 0.6657 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 93.7 3959.1 87.2 2.60 0.2233 0.2980 0.3523 0.4039 0.4457 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 92.5 4421.4 91.5 3.06 0.3081 0.4217 0.4914 0.5396 0.5713 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 93.0 6092.1 93.2 2.89 0.2586 0.3721 0.4486 0.5162 0.5684 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 91.7 4463.2 92.8 3.06 0.3054 0.4182 0.4901 0.5333 0.5709 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 92.3 5775.1 92.2 2.88 0.2732 0.3889 0.4584 0.5096 0.5530 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 85.3 588.5 74.0 2.42 0.0332 0.1237 0.2017 0.2653 0.3064 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 89.3 1200.4 81.2 2.41 0.0868 0.2067 0.2859 0.3434 0.3916 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 55.7 76.4 22.8 2.26 0.0000 0.0005 0.0019 0.0044 0.0074 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 72.9 239.8 57.2 2.36 0.0047 0.0225 0.0558 0.0939 0.1246 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 37.4 179.8 32.8 1.87 0.0302 0.0436 0.0499 0.0542 0.0581 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 64.3 194.4 40.2 2.28 0.0324 0.0521 0.0654 0.0736 0.0813 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 75.3 309.9 47.2 2.82 0.1363 0.1706 0.1873 0.1945 0.2006 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 95.3 1940.1 87.2 3.14 0.4212 0.5014 0.5479 0.5849 0.6129 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 95.6 2123.3 88.5 3.12 0.4293 0.5089 0.5550 0.5946 0.6318 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 97.5 2091.2 91.5 3.20 0.4790 0.5510 0.6044 0.6469 0.6828 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 97.7 6265.0 94.8 3.44 0.5418 0.5993 0.6316 0.6680 0.7056 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 94.6 6460.8 92.2 3.53 0.4991 0.5524 0.5902 0.6203 0.6526 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 87.7 3858.0 71.0 2.64 0.2375 0.2792 0.3061 0.3266 0.3509 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 98.4 7558.2 95.5 3.37 0.4973 0.5718 0.6191 0.6604 0.6953 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 98.0 7589.5 94.8 3.34 0.4923 0.5526 0.5943 0.6297 0.6652 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 98.8 7655.2 94.8 3.33 0.5094 0.5654 0.6064 0.6459 0.6754 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 89.4 4132.4 78.0 2.76 0.2820 0.3401 0.3674 0.3938 0.4214 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 87.3 3781.1 74.8 2.66 0.2627 0.3147 0.3428 0.3638 0.3850 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 98.5 7259.7 96.0 3.26 0.5283 0.5921 0.6322 0.6655 0.6918 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 96.3 5673.6 93.0 3.27 0.4892 0.5437 0.5872 0.6208 0.6492 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches; all other settings are unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 99.7 7276.0 96.8 3.56 0.7187 0.7936 0.8317 0.8501 0.8687 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 98.0 6651.5 95.2 3.32 0.5159 0.5732 0.6042 0.6374 0.6741 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 98.3 7094.6 95.8 3.50 0.5467 0.6084 0.6473 0.6791 0.7146 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 98.8 7774.4 96.8 3.46 0.5471 0.6084 0.6438 0.6724 0.7028 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 98.0 7250.3 96.5 3.33 0.5012 0.5587 0.5935 0.6245 0.6627 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 99.1 7224.6 97.2 3.53 0.7153 0.7766 0.8059 0.8325 0.8581 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 96.7 6040.3 92.5 3.31 0.4774 0.5243 0.5530 0.5861 0.6184 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 97.6 6488.0 95.0 3.26 0.4359 0.4882 0.5315 0.5723 0.6088 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 89.0 1666.1 74.5 2.97 0.3189 0.3770 0.4113 0.4356 0.4592 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 96.0 1862.0 87.0 3.53 0.4306 0.5019 0.5417 0.5752 0.6014 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 96.8 1880.9 90.8 3.70 0.5507 0.6319 0.6746 0.7049 0.7246 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 88.7 1320.5 80.5 3.28 0.3088 0.3606 0.3927 0.4212 0.4471 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 85.5 850.7 74.5 3.13 0.2989 0.3428 0.3704 0.3905 0.4057 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 89.5 1528.8 82.8 3.29 0.3072 0.3622 0.3989 0.4326 0.4630 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 17.9 81.3 20.5 1.31 0.0209 0.0239 0.0256 0.0267 0.0280 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 73.0 355.0 55.2 2.75 0.1514 0.1694 0.1806 0.1922 0.2005 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 91.0 5043.3 89.2 3.10 0.2632 0.3433 0.3907 0.4282 0.4653 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 98.9 8281.7 97.0 3.49 0.5546 0.6281 0.6743 0.7062 0.7327 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 94.7 1537.1 84.5 3.38 0.4163 0.4815 0.5237 0.5539 0.5838 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 97.5 1587.8 90.2 3.73 0.5526 0.6232 0.6670 0.6939 0.7185 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 97.4 1657.0 92.8 3.80 0.5767 0.6521 0.6873 0.7179 0.7447 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 88.6 3721.2 72.2 2.72 0.2854 0.3243 0.3515 0.3740 0.3960 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 98.8 7951.4 95.8 3.48 0.6800 0.7519 0.7850 0.8047 0.8217 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
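
The SuperPoint entries in the table above state that images are downsampled, if necessary, so that the largest dimension is at most 1024 pixels. A minimal sketch of that size computation follows. The helper name `downsampled_size` is hypothetical, and the aspect-preserving scaling with rounding to the nearest pixel is an assumption; the entries do not specify the resampling method or rounding rule.

```python
def downsampled_size(width, height, max_dim=1024):
    """Return (width, height) scaled so that max(width, height) <= max_dim.

    Images that already fit are returned unchanged. The aspect ratio is
    preserved; rounding to the nearest pixel is an assumption.
    """
    largest = max(width, height)
    if largest <= max_dim:
        return width, height
    scale = max_dim / largest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(downsampled_size(2048, 1536))  # (1024, 768)
print(downsampled_size(800, 600))    # (800, 600): already small enough
```

Keypoints detected on the downsampled image would then need to be rescaled by the inverse factor to refer to original pixel coordinates; whether the submissions did so is not stated in the table.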

MVS — sequence 'lincoln_memorial_statue'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 95.9 5196.5 87.5 3.21 0.4596 0.5232 0.5551 0.5818 0.6020 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 97.6 1076.8 93.2 4.25 0.5305 0.6167 0.6595 0.6920 0.7132 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 90.9 793.1 87.2 3.12 0.0365 0.0796 0.1150 0.1471 0.1801 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 94.3 4540.1 94.8 3.05 0.4675 0.5678 0.6187 0.6494 0.6767 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 95.0 6023.8 95.5 2.94 0.4699 0.5761 0.6308 0.6671 0.7006 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 94.2 4643.3 93.5 3.05 0.4602 0.5636 0.6258 0.6527 0.6798 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 95.2 6163.4 95.2 2.94 0.4443 0.5866 0.6617 0.6864 0.7140 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 80.2 534.1 66.0 2.38 0.0837 0.1753 0.2292 0.2669 0.2874 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 84.3 1020.8 71.5 2.35 0.1008 0.2074 0.2730 0.3139 0.3425 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 32.0 63.5 31.2 1.73 0.0016 0.0077 0.0148 0.0205 0.0239 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 69.3 201.4 53.5 2.34 0.0196 0.0570 0.0850 0.1064 0.1212 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 69.3 160.9 46.0 2.87 0.0744 0.1020 0.1188 0.1301 0.1364 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 73.1 195.6 53.2 2.93 0.1044 0.1440 0.1666 0.1778 0.1867 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 79.6 329.7 66.5 3.09 0.1974 0.2482 0.2702 0.2898 0.3026 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 97.8 1258.5 93.2 3.95 0.5709 0.6464 0.6881 0.7169 0.7356 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 97.1 1362.8 92.2 3.85 0.5274 0.6036 0.6536 0.6790 0.7021 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 98.9 1277.6 97.5 3.83 0.6315 0.7058 0.7392 0.7719 0.7970 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 99.0 6822.6 99.0 4.00 0.6910 0.7537 0.7908 0.8168 0.8375 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 96.9 6726.0 95.8 4.04 0.6736 0.7241 0.7407 0.7595 0.7736 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 95.6 2357.2 91.8 3.47 0.2133 0.2653 0.3016 0.3404 0.3774 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 98.1 6861.5 97.2 3.47 0.6357 0.6994 0.7261 0.7485 0.7656 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 98.1 7181.7 96.5 3.44 0.6370 0.7004 0.7361 0.7596 0.7767 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 98.4 7138.7 96.2 3.44 0.6257 0.6971 0.7317 0.7544 0.7747 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 88.9 820.2 83.2 3.42 0.1204 0.1771 0.2183 0.2457 0.2696 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 86.8 725.6 80.2 3.21 0.0821 0.1254 0.1588 0.1892 0.2132 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 98.8 5106.3 97.0 3.43 0.6233 0.6881 0.7204 0.7408 0.7649 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 97.7 4447.9 94.0 3.41 0.6325 0.6999 0.7283 0.7484 0.7625 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches; all other settings are the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 99.6 5099.5 98.8 3.54 0.7674 0.8225 0.8542 0.8722 0.8874 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 98.5 4395.5 96.8 3.52 0.5948 0.6508 0.6834 0.7094 0.7323 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 98.1 5615.2 96.8 3.55 0.6031 0.6511 0.6744 0.6970 0.7121 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 98.6 6905.6 96.8 3.56 0.6800 0.7454 0.7726 0.7886 0.8055 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 98.5 6278.0 94.2 3.47 0.6097 0.6791 0.7109 0.7307 0.7504 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 99.5 5009.4 99.2 3.56 0.7708 0.8430 0.8677 0.8873 0.8984 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 94.9 3769.6 86.8 3.27 0.3983 0.4470 0.4768 0.5010 0.5233 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 98.0 5567.0 92.2 3.37 0.5373 0.5954 0.6238 0.6453 0.6680 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 92.4 1593.7 85.2 3.36 0.3010 0.3615 0.4038 0.4352 0.4573 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 96.4 1535.2 90.0 4.02 0.6261 0.6877 0.7106 0.7276 0.7416 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 98.1 1490.6 91.2 4.17 0.7057 0.7695 0.7953 0.8085 0.8189 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 93.6 791.2 87.2 4.40 0.5501 0.6104 0.6357 0.6530 0.6638 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 93.5 967.9 87.5 4.26 0.5339 0.5940 0.6258 0.6436 0.6547 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 93.1 1772.5 88.5 3.71 0.4758 0.5381 0.5727 0.5973 0.6115 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 84.0 238.9 79.5 4.24 0.2156 0.2626 0.2964 0.3136 0.3343 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 92.7 519.7 85.0 4.58 0.4984 0.5515 0.5745 0.5930 0.6051 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 94.4 6401.4 90.5 3.23 0.4430 0.5314 0.5753 0.6083 0.6304 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 99.1 6734.6 97.2 3.64 0.6515 0.7284 0.7612 0.7863 0.8062 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 97.1 1106.8 90.0 4.02 0.5655 0.6321 0.6628 0.6907 0.7063 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 97.5 798.1 93.2 4.59 0.5836 0.6523 0.6918 0.7197 0.7406 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 96.8 1237.2 91.8 4.20 0.6141 0.6727 0.6997 0.7208 0.7390 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 94.5 3245.4 85.0 2.98 0.3038 0.3571 0.3862 0.4119 0.4384 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 99.4 5063.3 99.8 3.62 0.7325 0.7950 0.8314 0.8568 0.8713 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
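
Several entries above differ only in the matcher: "match:nn" (brute-force nearest-neighbour search) versus "match:nn1to1" (nearest-neighbour search enforcing cross-match consistency). The following is a minimal sketch of the difference, not the benchmark's own implementation. The function names are hypothetical, and the use of squared Euclidean distance on float descriptors is an assumption.

```python
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def squared_l2(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_indices(src: List[Descriptor], dst: List[Descriptor]) -> List[int]:
    """For each descriptor in src, index of its closest descriptor in dst."""
    return [
        min(range(len(dst)), key=lambda j: squared_l2(d, dst[j]))
        for d in src
    ]


def match_nn(desc_a: List[Descriptor], desc_b: List[Descriptor]) -> List[Tuple[int, int]]:
    """One-directional brute-force matching: every descriptor in A gets a match in B."""
    if not desc_a or not desc_b:
        return []
    return list(enumerate(nearest_indices(desc_a, desc_b)))


def match_nn_1to1(desc_a: List[Descriptor], desc_b: List[Descriptor]) -> List[Tuple[int, int]]:
    """Keep a pair (i, j) only if j is i's nearest neighbour AND i is j's."""
    if not desc_a or not desc_b:
        return []
    a_to_b = nearest_indices(desc_a, desc_b)
    b_to_a = nearest_indices(desc_b, desc_a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy descriptors: a[0] and a[1] both land on b[0], but only a[0] is mutual.
desc_a = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
desc_b = [[0.1, 0.0], [5.0, 5.1]]
one_way = match_nn(desc_a, desc_b)
mutual = match_nn_1to1(desc_a, desc_b)
```

In this toy case the one-directional matcher returns three pairs, two of which share the same target, while the mutual check keeps only the two consistent pairs. Discarding such many-to-one matches is one plausible reason a 1:1 matcher can score differently from plain nearest-neighbour search on the same features.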

MVS — sequence 'london_bridge'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 94.1 3666.5 89.2 3.27 0.2433 0.3206 0.3696 0.4181 0.4534 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 96.8 2633.4 94.2 3.19 0.2211 0.3162 0.3789 0.4291 0.4686 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 94.6 2154.4 89.8 2.83 0.1277 0.2320 0.3040 0.3615 0.4028 Anonymous We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 96.4 3939.5 97.0 3.31 0.2558 0.4064 0.4915 0.5478 0.5956 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 96.4 5022.2 97.0 3.05 0.2057 0.3689 0.4622 0.5308 0.5852 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 96.6 4016.4 96.5 3.30 0.2528 0.4043 0.4932 0.5578 0.6048 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 96.6 4915.3 97.0 3.05 0.1935 0.3499 0.4545 0.5304 0.5837 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 92.2 922.8 91.5 2.50 0.0184 0.0747 0.1254 0.1797 0.2241 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 94.0 1667.0 94.8 2.48 0.0207 0.0800 0.1300 0.1810 0.2237 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 60.8 109.8 39.0 2.29 0.0027 0.0104 0.0195 0.0277 0.0360 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 83.5 432.5 77.5 2.48 0.0154 0.0543 0.0988 0.1371 0.1746 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 73.9 159.8 46.8 2.97 0.0904 0.1372 0.1584 0.1737 0.1847 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 77.7 169.0 55.8 3.10 0.0979 0.1649 0.2027 0.2218 0.2363 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 84.8 328.1 65.5 3.12 0.1809 0.2550 0.2906 0.3187 0.3396 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 91.6 1260.9 87.5 3.05 0.1135 0.1838 0.2262 0.2627 0.2948 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 92.0 1448.8 87.2 3.03 0.1051 0.1759 0.2170 0.2485 0.2815 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 96.4 1242.2 93.8 3.35 0.3447 0.4644 0.5208 0.5691 0.6083 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 96.2 4976.8 94.5 3.36 0.2634 0.3549 0.4149 0.4597 0.4984 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 95.7 5314.1 94.5 3.38 0.2466 0.3423 0.3967 0.4400 0.4687 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 88.4 2595.0 78.5 2.82 0.0736 0.1280 0.1642 0.1981 0.2300 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 97.8 6327.6 98.2 3.32 0.3391 0.4378 0.5028 0.5517 0.5945 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 97.9 6574.8 98.2 3.32 0.3616 0.4671 0.5324 0.5772 0.6103 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 97.5 6681.3 98.2 3.31 0.3863 0.4826 0.5410 0.5843 0.6152 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 94.2 2102.2 89.5 3.02 0.1235 0.2106 0.2667 0.3153 0.3640 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 93.6 2021.3 88.8 2.93 0.1019 0.1855 0.2417 0.2926 0.3357 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching uses the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 98.2 5949.2 98.0 3.28 0.4002 0.5159 0.5712 0.6138 0.6520 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 98.1 5188.2 97.8 3.33 0.3329 0.4564 0.5217 0.5654 0.6039 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches; all other settings are the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 98.7 5984.1 99.2 3.40 0.6285 0.7553 0.8174 0.8442 0.8620 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 98.1 4628.8 97.5 3.39 0.3420 0.4375 0.4911 0.5313 0.5713 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 97.3 4744.3 97.0 3.39 0.3242 0.4196 0.4696 0.5165 0.5491 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 98.3 6062.4 97.5 3.33 0.2956 0.4020 0.4702 0.5168 0.5550 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 97.5 5746.2 98.2 3.28 0.2718 0.3646 0.4380 0.4885 0.5317 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 98.4 6129.5 99.2 3.35 0.6083 0.7361 0.7937 0.8251 0.8564 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 94.7 3614.1 87.8 3.27 0.2479 0.3303 0.3859 0.4305 0.4718 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 97.0 5010.7 97.0 3.27 0.2514 0.3525 0.4131 0.4563 0.4968 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 82.3 888.5 71.5 2.87 0.0827 0.1396 0.1747 0.2032 0.2290 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 97.5 1398.6 95.2 3.91 0.3730 0.4976 0.5541 0.6002 0.6341 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 97.8 1336.9 95.2 4.03 0.4823 0.6033 0.6602 0.7031 0.7303 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 96.0 922.0 92.8 3.88 0.4148 0.5400 0.5936 0.6320 0.6642 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 95.9 874.8 94.5 3.93 0.4395 0.5600 0.6135 0.6477 0.6756 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 96.3 1484.5 94.8 3.76 0.4041 0.5096 0.5711 0.6172 0.6479 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 87.0 225.8 63.5 3.37 0.2607 0.3387 0.3699 0.3857 0.3984 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 93.8 481.0 88.2 3.78 0.4028 0.5143 0.5669 0.5948 0.6117 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 96.7 4626.6 96.2 3.46 0.2205 0.3698 0.4620 0.5172 0.5649 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 98.2 6228.3 97.5 3.34 0.2030 0.3112 0.3762 0.4243 0.4616 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 97.4 1092.8 95.2 3.88 0.4075 0.5158 0.5740 0.6199 0.6544 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 98.2 1071.6 98.0 4.10 0.6167 0.7569 0.8131 0.8474 0.8664 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 98.2 1316.8 97.0 4.10 0.6218 0.7594 0.8126 0.8456 0.8671 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 91.7 2428.3 84.5 2.89 0.0975 0.1677 0.2207 0.2657 0.3093 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 98.5 5803.5 99.2 3.45 0.4695 0.6102 0.6763 0.7138 0.7400 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
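
Note on the SuperPoint entries above: their details state that images are downsampled, if necessary, so that the largest dimension is at most 1024 pixels. A minimal sketch of that size computation follows. The function name `downsampled_size`, the preservation of aspect ratio, and the rounding rule are assumptions for illustration; the submissions do not specify them.

```python
def downsampled_size(width, height, max_dim=1024):
    """Return (width, height) scaled so that the largest side is at most max_dim.

    Assumption: the aspect ratio is preserved and sizes are rounded to the
    nearest integer. The leaderboard entries do not state either detail.
    """
    largest = max(width, height)
    if largest <= max_dim:
        # Already small enough: the entries only downsample "if necessary".
        return width, height
    scale = max_dim / largest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(downsampled_size(2048, 1536))
    print(downsampled_size(800, 600))
```

Images that already fit within the limit are returned unchanged, so the function never upsamples.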

MVS — sequence 'milan_cathedral'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 98.0 4605.7 90.2 3.54 0.3742 0.5242 0.5935 0.6420 0.6702 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 98.7 4561.8 96.0 3.04 0.3799 0.5325 0.6131 0.6570 0.7016 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 95.7 2854.1 88.2 2.77 0.1503 0.2682 0.3429 0.3976 0.4477 Anonymous We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 96.7 4810.8 94.5 3.20 0.2055 0.3503 0.4482 0.5225 0.5721 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 97.2 6061.8 96.0 3.00 0.1761 0.3279 0.4263 0.5026 0.5647 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 96.7 4781.9 94.0 3.18 0.1998 0.3591 0.4486 0.5188 0.5728 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 97.5 5649.4 96.5 2.98 0.1823 0.3418 0.4435 0.5222 0.5855 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 91.1 744.9 83.0 2.49 0.0203 0.0951 0.1748 0.2460 0.3072 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 93.5 1402.3 87.5 2.48 0.0304 0.1251 0.2152 0.2912 0.3442 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 35.7 86.7 22.5 1.79 0.0003 0.0028 0.0075 0.0123 0.0165 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 81.4 320.6 62.5 2.45 0.0054 0.0336 0.0773 0.1190 0.1548 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 62.4 166.5 32.5 2.55 0.0169 0.0303 0.0372 0.0412 0.0455 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 66.8 191.2 41.0 2.69 0.0256 0.0489 0.0645 0.0749 0.0830 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 84.0 321.6 60.5 3.18 0.1424 0.2233 0.2627 0.2884 0.3109 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 94.0 1382.2 86.8 3.04 0.2484 0.3696 0.4348 0.4782 0.5125 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 94.1 1605.5 86.0 2.99 0.2640 0.3892 0.4446 0.4968 0.5305 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 99.3 1977.7 97.2 3.44 0.4228 0.5685 0.6511 0.7022 0.7335 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 98.8 5426.8 96.8 3.45 0.4157 0.5569 0.6392 0.6904 0.7250 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 97.8 5934.5 95.0 3.45 0.4255 0.5604 0.6356 0.6788 0.7094 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 83.7 3000.3 67.2 2.65 0.1012 0.1613 0.2027 0.2277 0.2493 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 99.9 7184.5 97.8 3.52 0.4876 0.6357 0.7047 0.7474 0.7866 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 99.8 7388.7 98.2 3.50 0.4924 0.6350 0.7128 0.7541 0.7900 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 99.7 7437.9 98.2 3.50 0.4985 0.6414 0.7159 0.7607 0.7938 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 92.2 3106.0 82.5 2.95 0.2665 0.3724 0.4313 0.4724 0.5107 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 91.5 2962.0 82.0 2.89 0.2635 0.3814 0.4441 0.4868 0.5202 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 99.9 7159.1 99.0 3.49 0.4967 0.6466 0.7115 0.7636 0.7925 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 99.5 5971.8 97.8 3.59 0.4545 0.6070 0.6756 0.7158 0.7507 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are extracted densely from full images instead of image patches; all other settings are unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, particularly under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 100.0 6725.5 100.0 3.67 0.6659 0.8305 0.8809 0.9102 0.9293 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 99.7 5863.3 98.5 3.66 0.5024 0.6494 0.7246 0.7711 0.8054 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 99.4 5803.2 98.0 3.65 0.4853 0.6450 0.7138 0.7569 0.7808 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 99.6 6992.4 98.5 3.53 0.4737 0.6270 0.6945 0.7446 0.7752 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 99.5 6584.7 97.8 3.51 0.4818 0.6342 0.7093 0.7595 0.7870 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 99.9 6656.3 99.8 3.67 0.6488 0.8004 0.8609 0.8883 0.9076 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 97.8 5042.7 92.0 3.47 0.4324 0.5683 0.6433 0.6835 0.7085 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 99.4 5916.9 96.5 3.47 0.4612 0.6155 0.6819 0.7305 0.7654 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 84.8 1020.4 71.5 2.93 0.1483 0.2179 0.2606 0.2910 0.3163 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 98.1 1738.9 91.5 3.66 0.3882 0.5408 0.6105 0.6517 0.6871 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 98.5 1621.8 93.8 3.73 0.4257 0.5707 0.6507 0.7016 0.7358 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 95.2 1172.9 85.2 3.39 0.3032 0.4336 0.5046 0.5534 0.5797 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 94.2 841.4 83.2 3.32 0.3051 0.4243 0.4946 0.5351 0.5616 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 95.7 1524.3 88.2 3.44 0.3035 0.4387 0.5072 0.5523 0.5886 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 33.8 96.8 12.8 1.96 0.0081 0.0133 0.0160 0.0176 0.0188 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 85.7 388.5 66.2 3.01 0.1857 0.2701 0.3108 0.3430 0.3634 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 96.5 4767.1 92.5 3.42 0.1902 0.3431 0.4401 0.5021 0.5568 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 99.8 7483.0 99.2 3.52 0.4681 0.6285 0.7104 0.7622 0.8020 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 97.0 1367.4 90.5 3.57 0.3466 0.4874 0.5633 0.6130 0.6465 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 98.9 1291.3 94.5 3.80 0.5055 0.6663 0.7382 0.7775 0.8023 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 99.1 1457.4 95.0 4.01 0.5125 0.6774 0.7504 0.7938 0.8180 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 93.2 3014.7 82.5 3.10 0.2704 0.3890 0.4503 0.4908 0.5190 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 99.7 7245.2 99.8 3.57 0.5255 0.6810 0.7492 0.7906 0.8229 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
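
Note on the matching labels above: most entries use brute-force nearest-neighbour search (match:nn), and the match:nn1to1 entries additionally enforce cross-match consistency. The sketch below illustrates both with NumPy. It assumes float descriptors compared by Euclidean distance, which the entries do not state; the function names `pairwise_sq_dist`, `match_nn` and `match_nn_mutual` are hypothetical, and this is not the organizers' code.

```python
import numpy as np


def pairwise_sq_dist(desc_a, desc_b):
    """Squared Euclidean distances between all rows of desc_a and desc_b."""
    sq_a = np.sum(desc_a**2, axis=1)[:, None]
    sq_b = np.sum(desc_b**2, axis=1)[None, :]
    # |a - b|^2 = |a|^2 + |b|^2 - 2 a.b ; avoids building an (Na, Nb, D) array.
    return sq_a + sq_b - 2.0 * desc_a @ desc_b.T


def match_nn(desc_a, desc_b):
    """One match per descriptor in A: its nearest neighbour in B."""
    dist = pairwise_sq_dist(desc_a, desc_b)
    nn_ab = np.argmin(dist, axis=1)
    return np.stack([np.arange(len(desc_a)), nn_ab], axis=1)


def match_nn_mutual(desc_a, desc_b):
    """Keep only matches where A->B and B->A nearest neighbours agree."""
    dist = pairwise_sq_dist(desc_a, desc_b)
    nn_ab = np.argmin(dist, axis=1)  # best B index for each A
    nn_ba = np.argmin(dist, axis=0)  # best A index for each B
    idx_a = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx_a     # cross-match consistency check
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)


if __name__ == "__main__":
    a = np.array([[0.0, 0.0], [10.0, 10.0], [0.4, 0.4]])
    b = np.array([[0.0, 0.1], [10.0, 9.9]])
    print(match_nn(a, b).tolist())
    print(match_nn_mutual(a, b).tolist())
```

In the toy example, the third descriptor in `a` is matched by plain nearest-neighbour search but dropped by the mutual check, because its nearest neighbour in `b` prefers a different descriptor in `a`.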

MVS — sequence 'mount_rushmore'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 88.0 2560.1 84.5 3.50 0.1900 0.2628 0.3131 0.3593 0.3974 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 92.7 3241.5 89.0 3.66 0.1738 0.2667 0.3458 0.4051 0.4399 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 86.1 2007.6 77.8 2.77 0.0279 0.0683 0.1207 0.1664 0.2017 Anonymous We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 94.0 4259.6 90.2 3.55 0.1301 0.2326 0.3186 0.3834 0.4361 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 95.2 6731.8 94.0 3.36 0.1456 0.2653 0.3543 0.4342 0.4860 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 94.0 4065.3 93.0 3.56 0.1257 0.2435 0.3217 0.3834 0.4381 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 95.0 5694.8 93.2 3.29 0.1185 0.2274 0.3182 0.3912 0.4499 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 81.6 454.4 78.8 2.50 0.0010 0.0092 0.0330 0.0776 0.1192 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 86.3 853.8 87.8 2.52 0.0024 0.0192 0.0592 0.1117 0.1638 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 32.2 54.0 27.5 1.80 0.0000 0.0002 0.0010 0.0038 0.0062 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 70.7 191.1 60.2 2.42 0.0003 0.0040 0.0140 0.0316 0.0504 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 59.2 117.2 42.2 2.65 0.0043 0.0122 0.0190 0.0266 0.0313 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 60.0 120.6 45.0 2.73 0.0046 0.0122 0.0207 0.0285 0.0342 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 68.0 164.0 57.0 2.87 0.0293 0.0585 0.0799 0.0972 0.1116 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 83.7 814.9 81.0 3.23 0.1240 0.1972 0.2438 0.2877 0.3239 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 86.0 956.1 78.8 3.22 0.1350 0.2085 0.2620 0.3051 0.3352 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 93.0 1511.3 91.0 3.58 0.2318 0.3353 0.4007 0.4460 0.4868 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 90.0 3140.7 91.2 3.66 0.2129 0.3063 0.3792 0.4237 0.4648 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 92.5 3961.1 94.5 3.84 0.2539 0.3624 0.4294 0.4835 0.5198 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 78.1 1604.7 69.8 2.96 0.0640 0.1070 0.1402 0.1691 0.1905 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 94.5 4923.5 92.8 3.59 0.3455 0.4482 0.5206 0.5666 0.5975 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 93.5 5179.7 92.5 3.56 0.3277 0.4360 0.4911 0.5325 0.5606 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 94.4 5162.8 94.2 3.58 0.3358 0.4460 0.5201 0.5683 0.5981 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 78.3 1244.1 66.0 2.86 0.0693 0.1117 0.1428 0.1707 0.1927 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 75.7 1066.4 65.0 2.79 0.0656 0.1071 0.1367 0.1638 0.1835 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 94.9 4393.5 95.0 3.66 0.2958 0.4132 0.4788 0.5322 0.5700 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 93.0 3832.8 92.5 3.77 0.2789 0.3830 0.4511 0.5060 0.5436 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are extracted densely from full images instead of image patches; all other settings are unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, particularly under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 95.2 5279.1 95.5 3.63 0.4208 0.5494 0.6305 0.6803 0.7085 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 93.9 4283.5 91.5 3.64 0.3019 0.4028 0.4662 0.5188 0.5631 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 94.2 4258.4 91.2 3.66 0.3333 0.4335 0.4907 0.5437 0.5759 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 94.8 5031.3 91.8 3.61 0.3225 0.4429 0.5106 0.5660 0.6041 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 92.9 4193.4 89.5 3.57 0.2987 0.3976 0.4667 0.5171 0.5530 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 94.9 5005.6 95.8 3.68 0.4124 0.5522 0.6222 0.6646 0.6979 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 85.6 2470.7 83.0 3.29 0.1986 0.2710 0.3181 0.3563 0.3825 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 91.1 3276.1 86.8 3.50 0.2535 0.3399 0.4022 0.4485 0.4811 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 76.7 689.6 71.5 2.97 0.0798 0.1236 0.1601 0.1927 0.2159 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 92.0 1005.4 89.2 4.21 0.2201 0.3210 0.3891 0.4421 0.4823 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 91.9 990.3 87.2 4.23 0.2608 0.3599 0.4309 0.4813 0.5167 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 89.9 494.1 87.8 4.14 0.1925 0.2929 0.3569 0.4057 0.4463 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 91.2 580.9 87.0 4.24 0.2113 0.3033 0.3670 0.4165 0.4602 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 91.5 1048.4 89.8 4.08 0.2214 0.3216 0.3859 0.4300 0.4717 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 75.2 113.6 64.2 3.42 0.0501 0.0837 0.1125 0.1370 0.1558 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 83.0 246.6 83.2 3.84 0.1380 0.2119 0.2704 0.3132 0.3438 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 94.4 3764.8 92.0 3.85 0.1489 0.2427 0.3203 0.3796 0.4292 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 94.3 5187.3 92.2 3.56 0.3011 0.4134 0.4867 0.5406 0.5767 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 91.7 701.3 87.2 4.19 0.2019 0.2927 0.3562 0.4068 0.4445 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 94.9 686.9 90.2 4.37 0.3028 0.4263 0.5033 0.5579 0.5958 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 93.4 960.1 88.2 4.36 0.3139 0.4466 0.5183 0.5656 0.5954 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 83.3 1508.4 76.5 3.23 0.1146 0.1709 0.2126 0.2475 0.2775 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 95.5 5563.0 96.2 3.63 0.3288 0.4579 0.5393 0.5927 0.6348 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
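
The 'match:nn' and 'match:nn1to1' labels above refer to brute-force nearest-neighbour matching, without and with cross-match consistency. The following is a minimal NumPy sketch of both variants, not the benchmark's actual code: the function name `match_nn` and the `mutual` flag are hypothetical, and it assumes float descriptors compared with the L2 distance.

```python
import numpy as np

def match_nn(desc_a, desc_b, mutual=False):
    """Brute-force nearest-neighbour matching (illustrative sketch).

    desc_a: (N, D) descriptors of image A; desc_b: (M, D) descriptors of image B.
    Returns a (K, 2) integer array of index pairs (row in A, row in B).
    """
    a = np.asarray(desc_a, dtype=np.float64)
    b = np.asarray(desc_b, dtype=np.float64)
    # Squared L2 distance between every descriptor in A and every one in B.
    d2 = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    nn_ab = d2.argmin(axis=1)          # best match in B for each row of A
    idx_a = np.arange(a.shape[0])
    if mutual:
        nn_ba = d2.argmin(axis=0)      # best match in A for each row of B
        # Cross-match consistency: keep (i, j) only if j's best match is i.
        keep = nn_ba[nn_ab] == idx_a
        idx_a, nn_ab = idx_a[keep], nn_ab[keep]
    return np.stack([idx_a, nn_ab], axis=1)
```

With `mutual=False` every keypoint in A receives a match, so several keypoints may map to the same keypoint in B; `mutual=True` discards those ambiguous pairs, which is one plausible reading of the '1:1' entries.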

MVS — sequence 'piazza_san_marco'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 90.2 5212.9 84.8 2.37 0.1090 0.1960 0.2543 0.3059 0.3479 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 96.6 5929.9 96.0 2.47 0.1927 0.3246 0.4122 0.4794 0.5291 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 92.3 4004.4 89.5 2.29 0.0765 0.1823 0.2678 0.3353 0.3876 Anonymous We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 95.9 5493.2 93.5 2.41 0.1385 0.2715 0.3673 0.4293 0.4868 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 96.8 6287.8 96.5 2.42 0.1290 0.2761 0.3746 0.4409 0.4992 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 96.1 5554.5 94.2 2.41 0.1407 0.2800 0.3760 0.4440 0.5006 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 97.1 5970.1 95.5 2.42 0.1452 0.2975 0.3976 0.4676 0.5287 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 65.6 526.3 56.5 2.09 0.0058 0.0262 0.0476 0.0674 0.0848 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 81.8 1718.9 78.0 2.08 0.0178 0.0764 0.1378 0.1946 0.2381 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 0.0 0.0 0.0 0.00 0.0000 0.0000 0.0000 0.0000 0.0000 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 51.8 112.2 22.2 2.13 0.0001 0.0006 0.0010 0.0026 0.0034 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 11.6 63.5 21.8 1.08 0.0001 0.0005 0.0010 0.0014 0.0017 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 27.6 95.0 29.8 1.63 0.0003 0.0011 0.0024 0.0040 0.0048 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 55.6 141.1 34.2 2.27 0.0036 0.0089 0.0148 0.0176 0.0215 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 88.4 2023.8 85.5 2.28 0.0373 0.1222 0.2070 0.2820 0.3337 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 87.8 2147.9 84.5 2.25 0.0328 0.1092 0.1933 0.2619 0.3165 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 93.9 2525.5 92.0 2.37 0.1127 0.2203 0.3091 0.3771 0.4310 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 97.2 7092.2 95.8 2.63 0.2507 0.3783 0.4449 0.5045 0.5562 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 97.5 7443.0 97.2 2.71 0.2714 0.3850 0.4477 0.5046 0.5518 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 87.5 4718.8 86.8 2.18 0.0192 0.0666 0.1302 0.1867 0.2426 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 98.1 8339.5 98.0 2.52 0.2242 0.3458 0.4309 0.4866 0.5436 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 98.3 8545.4 98.5 2.53 0.2424 0.3622 0.4473 0.5034 0.5488 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 98.6 8696.8 98.5 2.56 0.2526 0.3750 0.4580 0.5077 0.5571 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 83.0 2717.4 77.2 2.24 0.0687 0.1341 0.1834 0.2265 0.2614 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 82.4 2633.9 74.2 2.24 0.0704 0.1389 0.1903 0.2296 0.2672 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 98.9 8508.5 98.0 2.65 0.2755 0.4066 0.4852 0.5414 0.5906 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 98.5 6919.9 97.0 2.70 0.2535 0.3866 0.4682 0.5198 0.5635 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc, where descriptors are densely extracted from full images, instead of image patches, while other settings stay unchanged as original ContextDesc. We find Dense-ContextDesc performs better regarding in particular illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 99.6 8324.3 99.5 2.85 0.5046 0.6215 0.6854 0.7333 0.7671 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 98.1 7387.0 95.0 2.70 0.2740 0.4013 0.4772 0.5292 0.5787 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 97.5 7295.3 94.8 2.56 0.2476 0.3512 0.4137 0.4679 0.5185 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 98.4 8333.2 99.2 2.57 0.2565 0.3783 0.4612 0.5187 0.5603 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 97.3 7765.5 97.0 2.47 0.2093 0.3237 0.4000 0.4572 0.5014 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 99.5 8211.0 98.8 2.84 0.4407 0.5491 0.6156 0.6625 0.7047 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 91.2 5351.7 87.2 2.35 0.1394 0.2188 0.2797 0.3339 0.3677 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 95.8 6975.9 94.8 2.41 0.1664 0.2763 0.3557 0.4210 0.4733 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 79.3 1396.0 77.8 2.12 0.0083 0.0303 0.0645 0.0962 0.1245 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 89.6 2269.5 82.8 2.32 0.0670 0.1863 0.2737 0.3329 0.3877 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 90.5 2160.2 82.2 2.37 0.0919 0.2143 0.3033 0.3675 0.4159 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 87.3 1317.6 80.2 2.46 0.1420 0.2472 0.3163 0.3597 0.3947 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 83.0 857.1 71.5 2.45 0.1234 0.2109 0.2659 0.3044 0.3314 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 89.2 1695.3 82.5 2.47 0.1420 0.2534 0.3269 0.3727 0.4123 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 3.2 26.7 4.0 0.58 0.0000 0.0001 0.0002 0.0002 0.0003 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 41.1 246.5 45.5 1.79 0.0461 0.0804 0.0990 0.1101 0.1176 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 95.8 6173.1 93.8 2.45 0.1470 0.2804 0.3695 0.4303 0.4829 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 98.5 9920.9 99.0 2.37 0.1257 0.2710 0.3722 0.4436 0.5105 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 88.9 1603.9 79.5 2.49 0.1575 0.2699 0.3349 0.3854 0.4214 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 88.9 1295.3 79.5 2.60 0.1524 0.2508 0.3152 0.3562 0.3950 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 89.9 1560.5 80.5 2.63 0.2112 0.3038 0.3557 0.3998 0.4307 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 83.4 3696.9 74.5 2.25 0.0784 0.1534 0.2072 0.2466 0.2803 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 99.2 9477.8 98.5 2.62 0.2472 0.4044 0.5014 0.5657 0.6243 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
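
The SIFT-AID entries above describe matching binary descriptors by Hamming distance with a decision threshold. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the names `hamming_matrix` and `match_hamming` are hypothetical, descriptors are assumed to be given as unpacked 0/1 arrays, and the sketch assumes a match is accepted when the distance to the nearest neighbour is at or below the threshold (the entry does not state which side of the threshold is accepted).

```python
import numpy as np

def hamming_matrix(bits_a, bits_b):
    """Pairwise Hamming distances between rows of two 0/1 arrays."""
    a = np.asarray(bits_a, dtype=bool)
    b = np.asarray(bits_b, dtype=bool)
    # XOR marks the bit positions that differ; summing counts them.
    return (a[:, None, :] ^ b[None, :, :]).sum(axis=2)

def match_hamming(bits_a, bits_b, threshold):
    """Nearest neighbour in Hamming distance, kept only if within threshold.

    ASSUMPTION: 'within' means distance <= threshold.
    Returns a (K, 2) integer array of index pairs (row in A, row in B).
    """
    d = hamming_matrix(bits_a, bits_b)
    nn = d.argmin(axis=1)                    # closest row of B for each row of A
    best = d[np.arange(d.shape[0]), nn]      # distance to that closest row
    keep = best <= threshold
    return np.stack([np.nonzero(keep)[0], nn[keep]], axis=1)
```

The dense XOR above allocates an N x M x D boolean array, so it only suits small inputs; at 8000 keypoints and 6272 bits per descriptor, a practical version would pack the bits and process the descriptors in chunks.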

MVS — sequence 'reichstag'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 95.7 4128.4 90.5 3.48 0.3067 0.4024 0.4708 0.5249 0.5651 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 96.4 2565.4 97.0 2.93 0.2516 0.3680 0.4310 0.4832 0.5263 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 94.7 1743.9 93.0 2.79 0.1505 0.2669 0.3511 0.4131 0.4629 Anonymous We use OpenCV's implementation of brisk detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 94.4 3590.9 97.5 3.20 0.1229 0.2517 0.3340 0.3847 0.4260 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 95.6 4299.7 97.0 3.04 0.1147 0.2860 0.3690 0.4264 0.4724 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 94.8 3675.1 96.2 3.21 0.1162 0.2502 0.3284 0.3836 0.4200 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 94.9 4309.6 97.2 3.05 0.1096 0.2550 0.3394 0.3964 0.4485 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 86.8 863.4 91.0 2.36 0.0117 0.0724 0.1275 0.1759 0.2156 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 90.8 1653.1 94.0 2.36 0.0143 0.0818 0.1566 0.2076 0.2548 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 62.6 121.4 45.8 2.30 0.0016 0.0113 0.0223 0.0319 0.0417 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 78.7 369.2 77.8 2.36 0.0089 0.0459 0.0895 0.1200 0.1482 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 73.2 162.5 53.0 2.91 0.0547 0.0973 0.1209 0.1389 0.1532 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 75.6 171.2 61.7 3.01 0.0498 0.1074 0.1424 0.1635 0.1826 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 82.3 341.6 79.0 3.14 0.0984 0.1749 0.2254 0.2604 0.2864 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 95.3 1699.5 95.0 2.97 0.1524 0.2545 0.3115 0.3624 0.4017 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 95.1 1892.9 93.2 2.91 0.1425 0.2266 0.3006 0.3497 0.3919 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 96.6 1813.1 97.0 3.26 0.2109 0.3279 0.4042 0.4626 0.5019 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 97.8 5933.7 96.5 3.33 0.3222 0.4285 0.4937 0.5483 0.5888 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 96.5 6522.3 97.5 3.41 0.3363 0.4431 0.5004 0.5460 0.5798 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 88.6 2551.1 83.0 2.76 0.1663 0.2417 0.3006 0.3463 0.3810 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 97.2 6984.9 98.5 3.28 0.2495 0.3605 0.4283 0.4785 0.5177 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 97.9 7051.2 98.2 3.31 0.2399 0.3521 0.4141 0.4673 0.5101 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 97.1 7002.9 99.5 3.28 0.2307 0.3494 0.4270 0.4742 0.5106 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 93.3 2182.3 92.0 2.83 0.1669 0.2666 0.3325 0.3850 0.4194 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 92.4 2068.1 91.2 2.80 0.1584 0.2514 0.3177 0.3605 0.4051 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 98.0 6330.9 98.5 3.28 0.2695 0.3867 0.4540 0.4986 0.5341 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 97.0 5371.2 97.8 3.39 0.2258 0.3288 0.4026 0.4630 0.5033 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc, where descriptors are densely extracted from full images, instead of image patches, while other settings stay unchanged as original ContextDesc. We find Dense-ContextDesc performs better regarding in particular illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 99.2 6703.2 90.2 3.64 0.6642 0.7632 0.8043 0.8240 0.8371 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018]. N/A 128 float32
kp:8000, match:nn
19-05-08 F 97.4 5624.5 95.0 3.38 0.3033 0.4097 0.4788 0.5296 0.5633 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 97.3 5817.2 97.0 3.41 0.3031 0.3966 0.4632 0.5150 0.5497 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 97.8 6797.9 98.2 3.33 0.2767 0.3813 0.4477 0.4950 0.5404 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 97.6 6504.8 98.8 3.25 0.2300 0.3503 0.4192 0.4664 0.5079 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 98.7 6598.4 92.5 3.61 0.6263 0.7309 0.7760 0.8065 0.8274 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018]. N/A 128 float32
kp:8000, match:nn
19-04-24 F 96.2 4660.9 92.2 3.28 0.2605 0.3551 0.4216 0.4699 0.5110 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 96.8 5811.1 97.0 3.19 0.2201 0.3241 0.3929 0.4449 0.4812 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 87.1 1175.4 84.7 2.80 0.0755 0.1394 0.1915 0.2308 0.2651 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 93.8 1601.8 93.5 3.47 0.2030 0.3008 0.3591 0.4018 0.4331 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 94.3 1567.3 93.2 3.64 0.2446 0.3533 0.4084 0.4488 0.4822 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 92.9 1088.8 93.0 3.54 0.2075 0.3153 0.3722 0.4203 0.4525 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 91.7 914.4 92.0 3.62 0.1855 0.2772 0.3370 0.3825 0.4207 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 93.1 1585.5 94.5 3.50 0.1955 0.2930 0.3598 0.4067 0.4375 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 76.8 189.6 66.5 3.05 0.0376 0.0915 0.1210 0.1440 0.1620 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 87.6 463.8 88.5 3.39 0.1346 0.2248 0.2769 0.3134 0.3449 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 95.1 4993.1 95.8 3.43 0.1759 0.2915 0.3651 0.4184 0.4571 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 97.3 7456.5 97.8 3.22 0.1949 0.3117 0.3781 0.4246 0.4590 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 93.7 1202.0 93.0 3.52 0.2204 0.3234 0.3850 0.4317 0.4649 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 97.4 1249.3 92.2 4.05 0.5040 0.6062 0.6750 0.7181 0.7449 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 97.4 1476.1 89.5 4.10 0.5705 0.6767 0.7210 0.7541 0.7708 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 91.3 2316.1 87.2 3.08 0.1986 0.2845 0.3438 0.3844 0.4171 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 97.4 6743.8 98.5 3.34 0.2794 0.3979 0.4667 0.5176 0.5543 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
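
The 'match:nn' and 'match:nn1to1' labels in the table above refer to brute-force nearest-neighbour matching, without and with cross-match consistency. The following is a minimal pure-Python sketch of the difference, not the organizers' implementation: the function names, the Euclidean distance and the toy 2-D descriptors are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(query, candidates):
    """Index of the candidate descriptor closest to `query`."""
    return min(range(len(candidates)), key=lambda j: dist(query, candidates[j]))


def match_nn(desc_a, desc_b):
    """One-way brute-force matching: every descriptor in A takes its nearest neighbour in B."""
    return [(i, nearest(d, desc_b)) for i, d in enumerate(desc_a)]


def match_nn_1to1(desc_a, desc_b):
    """Cross-checked matching: keep (i, j) only if i is also the nearest neighbour of j."""
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in match_nn(desc_a, desc_b) if b_to_a[j] == i]


# Toy descriptors (hypothetical values, 2-D for readability).
desc_a = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (5.0, 5.1)]

one_way = match_nn(desc_a, desc_b)       # two descriptors in A claim the same one in B
mutual = match_nn_1to1(desc_a, desc_b)   # the weaker of the two claims is dropped
```

With one-way matching several descriptors in one image may map to the same descriptor in the other; the cross-check removes those ambiguous pairs at the cost of returning fewer matches.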

MVS — sequence 'sagrada_familia'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 94.8 5940.4 86.0 3.34 0.4935 0.5487 0.5709 0.5878 0.6045 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 98.3 6872.1 94.0 3.48 0.6073 0.7024 0.7381 0.7566 0.7772 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 96.7 4768.0 91.2 2.86 0.3090 0.3994 0.4466 0.4838 0.5194 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 89.5 7264.4 87.5 3.15 0.3794 0.4680 0.5189 0.5427 0.5621 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 89.9 10021.2 90.5 2.98 0.3677 0.4655 0.5083 0.5418 0.5670 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 89.6 7110.8 87.8 3.11 0.3851 0.4757 0.5127 0.5392 0.5611 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 89.3 7456.1 90.0 2.85 0.2961 0.4099 0.4576 0.4874 0.5115 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 73.3 380.0 50.5 2.30 0.0129 0.0379 0.0621 0.0849 0.1032 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 79.6 938.9 66.2 2.32 0.0322 0.0897 0.1365 0.1714 0.2008 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 3.0 17.2 0.3 0.60 0.0000 0.0000 0.0000 0.0000 0.0000 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 29.8 103.7 18.2 1.65 0.0001 0.0009 0.0018 0.0031 0.0044 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 67.9 205.1 42.8 2.82 0.0704 0.0888 0.0985 0.1048 0.1102 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 67.4 194.6 44.8 2.73 0.0593 0.0786 0.0889 0.0958 0.1012 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 76.8 407.4 62.7 2.95 0.1847 0.2118 0.2236 0.2345 0.2443 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 96.2 1808.8 88.0 3.56 0.5144 0.6012 0.6402 0.6617 0.6751 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 96.3 1982.9 89.8 3.52 0.5206 0.6096 0.6428 0.6636 0.6857 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 98.5 3099.9 93.8 3.67 0.6417 0.7238 0.7566 0.7788 0.7897 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 99.0 6513.1 97.0 3.83 0.6514 0.7379 0.7711 0.7984 0.8137 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 92.6 6240.6 90.8 3.74 0.5166 0.5766 0.5997 0.6147 0.6272 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 90.3 4267.3 77.8 2.90 0.2972 0.3519 0.3802 0.3998 0.4188 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 98.9 7755.0 94.2 3.60 0.6742 0.7390 0.7671 0.7859 0.7980 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 98.4 7764.4 94.2 3.55 0.6458 0.7195 0.7515 0.7726 0.7866 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 98.1 7788.2 95.2 3.54 0.6345 0.7037 0.7335 0.7563 0.7683 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 91.3 5160.2 77.5 2.96 0.3819 0.4405 0.4663 0.4851 0.5012 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 90.3 4920.5 74.8 2.90 0.3649 0.4322 0.4562 0.4720 0.4852 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 99.2 7398.7 97.8 3.82 0.6873 0.7552 0.7826 0.7969 0.8147 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 97.7 6186.7 93.0 3.79 0.6268 0.6967 0.7268 0.7460 0.7563 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc, where descriptors are densely extracted from full images, instead of image patches, while other settings stay unchanged as original ContextDesc. We find Dense-ContextDesc performs better regarding in particular illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 99.9 7109.3 99.2 3.99 0.7882 0.8712 0.9044 0.9231 0.9325 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018]. N/A 128 float32
kp:8000, match:nn
19-05-08 F 98.7 6953.4 96.0 3.80 0.6939 0.7666 0.7983 0.8096 0.8279 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 98.7 7011.0 95.5 3.81 0.6903 0.7544 0.7740 0.7939 0.8091 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 99.4 7983.0 98.5 3.76 0.7048 0.7742 0.8030 0.8179 0.8344 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 98.8 7596.6 96.2 3.67 0.6693 0.7417 0.7682 0.7852 0.8004 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 99.6 7004.5 98.5 3.98 0.7868 0.8671 0.8970 0.9121 0.9276 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018]. N/A 128 float32
kp:8000, match:nn
19-04-24 F 95.5 6397.0 85.0 3.40 0.5238 0.5770 0.6002 0.6150 0.6292 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 97.5 7119.6 92.2 3.55 0.5976 0.6585 0.6857 0.7047 0.7228 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 88.8 1581.5 77.5 3.22 0.3584 0.4222 0.4469 0.4630 0.4728 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 97.1 2094.6 88.2 4.08 0.6137 0.6817 0.7074 0.7189 0.7322 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 97.3 2014.2 91.0 4.14 0.6567 0.7397 0.7685 0.7831 0.7912 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 88.0 1373.6 81.0 3.54 0.4117 0.4641 0.4853 0.4977 0.5074 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 87.2 981.2 79.2 3.54 0.3896 0.4424 0.4623 0.4737 0.4858 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 88.5 1838.3 82.2 3.56 0.4220 0.4732 0.4956 0.5074 0.5159 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 76.2 231.9 55.8 3.10 0.1653 0.1907 0.2019 0.2115 0.2179 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 83.8 472.3 71.8 3.38 0.3075 0.3490 0.3634 0.3772 0.3851 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 89.5 6354.4 87.5 3.34 0.3630 0.4459 0.4828 0.5083 0.5269 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 99.5 7966.5 98.5 3.81 0.6945 0.7823 0.8166 0.8363 0.8447 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 95.8 1868.7 85.8 3.83 0.5707 0.6319 0.6564 0.6721 0.6820 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 98.0 1599.9 91.2 4.09 0.6830 0.7700 0.7949 0.8121 0.8225 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 98.1 1818.0 92.8 4.22 0.6883 0.7644 0.7942 0.8133 0.8285 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 91.6 4425.8 79.0 3.08 0.4146 0.4687 0.4977 0.5097 0.5208 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 99.5 7543.2 99.2 3.93 0.7400 0.8114 0.8463 0.8602 0.8750 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
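
Note on the 'match:nn' and 'match:nn1to1' labels: according to the entry descriptions, the former is brute-force nearest-neighbour search and the latter additionally enforces cross-match consistency. The sketch below illustrates the difference under stated assumptions: descriptors are float vectors compared by L2 distance, and the function names match_nn and match_nn1to1 are hypothetical. It is not the challenge's own matching code.

```python
import numpy as np

def pairwise_sq_dists(desc_a, desc_b):
    """Squared L2 distances between every row of desc_a and every row of desc_b."""
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    return (diff ** 2).sum(axis=2)

def match_nn(desc_a, desc_b):
    """One-way matching: each descriptor in A is paired with its nearest neighbour in B."""
    d = pairwise_sq_dists(desc_a, desc_b)
    nn_ab = d.argmin(axis=1)
    return np.stack([np.arange(len(desc_a)), nn_ab], axis=1)

def match_nn1to1(desc_a, desc_b):
    """Mutual matching: keep (i, j) only if j is nearest to i and i is nearest to j."""
    d = pairwise_sq_dists(desc_a, desc_b)
    nn_ab = d.argmin(axis=1)  # best B index for each A row
    nn_ba = d.argmin(axis=0)  # best A index for each B row
    idx_a = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx_a  # cross-check
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)

# Toy data (made up for illustration): two A descriptors compete for the same B descriptor.
desc_a = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
desc_b = np.array([[0.1, 0.0], [10.0, 10.2]])
one_way = match_nn(desc_a, desc_b)
mutual = match_nn1to1(desc_a, desc_b)
```

In the toy data the one-way matcher pairs both of the first two A descriptors with the same B descriptor, while the mutual check keeps only the closer of the two.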

MVS — sequence 'st_pauls_cathedral'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 98.1 5029.6 93.0 3.34 0.4401 0.5381 0.6054 0.6487 0.6816 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 99.4 3838.2 98.2 3.34 0.4444 0.5737 0.6481 0.6951 0.7375 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 97.9 2951.3 94.0 2.82 0.2455 0.3941 0.4849 0.5456 0.6024 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 94.5 3953.4 92.5 3.12 0.2786 0.4211 0.5070 0.5661 0.6065 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 95.3 5052.5 94.0 2.92 0.2677 0.4203 0.5044 0.5542 0.6053 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 94.7 4027.0 92.5 3.13 0.2925 0.4395 0.5209 0.5768 0.6201 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 94.9 4921.9 93.5 2.91 0.2615 0.4158 0.5162 0.5794 0.6231 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 86.6 764.3 72.5 2.36 0.0304 0.1220 0.2125 0.2769 0.3233 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 90.5 1439.9 84.2 2.34 0.0295 0.1255 0.2347 0.3221 0.3912 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 11.1 32.4 10.8 1.30 0.0000 0.0002 0.0005 0.0007 0.0009 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 75.8 338.4 53.0 2.39 0.0205 0.0828 0.1425 0.1791 0.2028 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 38.7 175.0 32.2 1.97 0.0421 0.0638 0.0742 0.0815 0.0863 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 44.5 209.0 43.5 2.06 0.0651 0.1088 0.1297 0.1422 0.1516 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 75.3 285.9 44.8 2.81 0.1219 0.1727 0.1925 0.2039 0.2102 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 97.4 1541.3 89.2 3.21 0.3546 0.4782 0.5461 0.6014 0.6410 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 96.5 1732.2 88.5 3.18 0.3593 0.4710 0.5339 0.5833 0.6193 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 99.2 1582.2 95.8 3.41 0.4207 0.5421 0.6161 0.6674 0.7073 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 99.3 5409.4 97.2 3.60 0.4946 0.5948 0.6565 0.7065 0.7519 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 97.1 5559.4 96.0 3.69 0.4895 0.5809 0.6336 0.6708 0.6942 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 92.1 3463.2 76.8 2.80 0.2696 0.3522 0.3913 0.4239 0.4531 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 99.4 6959.9 97.0 3.48 0.4576 0.5670 0.6324 0.6850 0.7258 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 99.2 7072.3 97.2 3.50 0.4674 0.5807 0.6505 0.6959 0.7393 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 98.9 7100.3 98.0 3.52 0.4724 0.5723 0.6406 0.6897 0.7307 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 89.8 2453.7 79.8 2.72 0.2058 0.2932 0.3444 0.3862 0.4224 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 89.4 2358.0 78.0 2.67 0.1967 0.2852 0.3357 0.3700 0.4064 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 99.3 6204.4 99.0 3.26 0.4145 0.5301 0.5961 0.6511 0.6984 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 98.2 5378.9 96.5 3.30 0.3933 0.5027 0.5668 0.6189 0.6594 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings are the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 99.9 6463.6 99.2 3.51 0.7123 0.8234 0.8776 0.9129 0.9317 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-05-08 F 99.2 5465.5 96.2 3.31 0.4321 0.5493 0.6165 0.6710 0.7125 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 99.2 5972.0 97.2 3.50 0.4401 0.5554 0.6234 0.6820 0.7212 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 99.7 6868.1 98.2 3.52 0.4578 0.5680 0.6376 0.6851 0.7191 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 99.3 6480.5 98.2 3.33 0.3764 0.4962 0.5832 0.6330 0.6822 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 100.0 6347.3 98.8 3.51 0.6772 0.8028 0.8567 0.8880 0.9082 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] ( N/A 128 float32
kp:8000, match:nn
19-04-24 F 96.4 4750.2 89.8 3.11 0.3349 0.4361 0.4979 0.5367 0.5821 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 98.8 5944.1 96.5 3.23 0.3548 0.4763 0.5503 0.6128 0.6506 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 87.5 1255.7 73.0 2.97 0.2381 0.3189 0.3590 0.3897 0.4106 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 98.0 1626.5 92.5 3.69 0.4208 0.5450 0.6131 0.6646 0.7078 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 98.8 1587.8 95.0 3.79 0.5113 0.6443 0.7030 0.7471 0.7763 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 93.5 1027.1 88.5 3.59 0.3638 0.4593 0.5128 0.5560 0.5902 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 92.9 830.4 85.2 3.58 0.3786 0.4686 0.5174 0.5500 0.5801 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 93.6 1468.0 90.5 3.51 0.3513 0.4497 0.5199 0.5611 0.5891 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 49.7 184.0 35.2 2.32 0.1264 0.1528 0.1638 0.1699 0.1742 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 89.4 434.7 69.5 3.34 0.3006 0.3722 0.4048 0.4272 0.4503 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 94.8 4882.2 94.5 3.30 0.2568 0.3925 0.4707 0.5367 0.5743 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 99.8 7262.0 98.0 3.54 0.4289 0.5625 0.6483 0.7029 0.7403 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 97.3 1298.0 89.0 3.59 0.4018 0.5150 0.5875 0.6367 0.6728 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 99.5 1230.0 95.0 3.96 0.6269 0.7390 0.7887 0.8344 0.8547 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 99.5 1464.3 96.0 4.00 0.6377 0.7520 0.8112 0.8422 0.8664 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 92.4 2982.1 80.2 2.85 0.2626 0.3615 0.4159 0.4536 0.4859 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 99.8 6758.6 98.2 3.39 0.5165 0.6437 0.7185 0.7677 0.8067 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA
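
Note on the SuperPoint entries above: their descriptions state that images are downsampled, if necessary, so that the largest dimension is at most 1024 pixels. A minimal sketch of that size computation follows. The helper name downsampled_size, the preservation of aspect ratio, and the rounding rule are assumptions for illustration; the submissions do not say how the resize was implemented.

```python
def downsampled_size(width, height, max_dim=1024):
    """Return (width, height) scaled so that the largest side is at most max_dim.

    Images that already fit are returned unchanged (no upsampling).
    Aspect ratio is preserved; rounding to the nearest pixel is an assumption.
    """
    largest = max(width, height)
    if largest <= max_dim:
        return width, height
    scale = max_dim / largest
    return max(1, round(width * scale)), max(1, round(height * scale))

# Example sizes (made up for illustration)
landscape = downsampled_size(2048, 1536)
small = downsampled_size(800, 600)
portrait = downsampled_size(1024, 4096)
```

Keypoint coordinates detected on the downsampled image would need to be divided by the same scale factor before being compared with results computed at full resolution.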

MVS — sequence 'united_states_capitol'
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
kp:8000, match:nn
19-04-24 F 91.0 1813.4 86.5 2.91 0.0720 0.1280 0.1847 0.2412 0.2880 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
kp:8000, match:nn
19-04-26 F 89.0 928.2 82.5 2.76 0.0510 0.1023 0.1606 0.2114 0.2634 Anonymous We use OpenCV's implementation of AKAZE detector, and for each keypoint we extract a descriptor via CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 80.4 701.1 71.0 2.48 0.0194 0.0518 0.0936 0.1299 0.1688 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, and for each image there are at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 88.8 1326.9 86.8 2.69 0.0220 0.0745 0.1321 0.1910 0.2469 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 89.8 1647.0 89.5 2.76 0.0207 0.0729 0.1323 0.1987 0.2639 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 89.2 1400.0 86.0 2.71 0.0274 0.0861 0.1431 0.1964 0.2513 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 90.4 1616.3 89.0 2.82 0.0203 0.0651 0.1208 0.1831 0.2547 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: N/A 512 float32
kp:1024, match:nn
19-05-05 F 81.0 423.7 81.0 2.35 0.0025 0.0256 0.0602 0.0995 0.1363 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:2048, match:nn
19-05-05 F 83.0 653.1 83.8 2.30 0.0044 0.0264 0.0615 0.1043 0.1467 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:256, match:nn
19-05-05 F 61.2 88.2 37.3 2.31 0.0007 0.0043 0.0112 0.0208 0.0286 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-05 F 72.0 207.2 68.2 2.35 0.0014 0.0171 0.0375 0.0634 0.0849 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: N/A 40 float32
kp:512, match:nn
19-05-07 F 3.7 19.0 5.2 0.60 0.0001 0.0002 0.0002 0.0002 0.0003 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-05-09 F 25.9 62.5 12.5 1.74 0.0000 0.0004 0.0009 0.0013 0.0017 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
kp:512, match:nn
19-04-26 F 28.2 79.4 34.2 1.77 0.0017 0.0038 0.0055 0.0067 0.0080 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 82.5 676.1 81.5 2.69 0.0247 0.0590 0.1021 0.1516 0.1885 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 82.3 715.0 80.8 2.74 0.0359 0.0787 0.1218 0.1618 0.1988 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:8000, match:sGOr2f
19-05-23 F 89.8 653.5 85.8 2.94 0.0999 0.1562 0.2137 0.2636 0.3083 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 90.6 2309.4 90.0 2.84 0.0540 0.1056 0.1619 0.2106 0.2607 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from, Paper:; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train:, Paper:; DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 86.7 2442.2 86.0 2.97 0.0555 0.1024 0.1572 0.2004 0.2431 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: Paper:, DeepNets were plugged into MODS framework, but without view synthesis and matching; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 74.6 1198.0 60.0 2.39 0.0159 0.0394 0.0613 0.0846 0.1068 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 92.6 2591.7 92.2 2.88 0.0767 0.1290 0.1863 0.2457 0.2978 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 92.4 2676.2 91.8 2.90 0.0788 0.1378 0.2032 0.2611 0.3144 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 92.7 2692.5 93.5 2.88 0.0727 0.1337 0.1957 0.2622 0.3221 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 78.1 550.6 70.2 2.51 0.0088 0.0263 0.0502 0.0817 0.1114 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 77.4 521.8 68.5 2.49 0.0060 0.0208 0.0491 0.0757 0.1008 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: Code: N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 91.9 1870.7 90.8 2.77 0.0663 0.1178 0.1735 0.2268 0.2712 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. N/A 128 uint8
kp:8000, match:nn
19-05-28 F 88.5 1579.1 86.0 2.69 0.0515 0.0934 0.1371 0.1923 0.2406 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 93.6 2254.2 95.8 2.89 0.0986 0.1665 0.2345 0.2961 0.3552 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018]. N/A 128 float32
kp:8000, match:nn
19-05-08 F 90.8 1685.8 89.0 2.80 0.0634 0.1011 0.1497 0.2003 0.2523 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 91.3 2088.3 90.2 2.86 0.0640 0.1096 0.1616 0.2239 0.2710 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 92.5 2496.1 93.5 2.87 0.0659 0.1175 0.1794 0.2339 0.2888 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 91.1 2193.0 93.0 2.87 0.0671 0.1217 0.1778 0.2377 0.2895 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 93.8 2146.5 96.8 2.88 0.0897 0.1619 0.2190 0.2876 0.3456 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018]. N/A 128 float32
kp:8000, match:nn
19-04-24 F 82.8 1322.7 79.8 2.65 0.0441 0.0756 0.1131 0.1496 0.1827 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 90.0 1949.4 87.8 2.74 0.0448 0.0903 0.1402 0.1946 0.2422 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. N/A 128 float32
kp:2048, match:nn
19-05-17 F 72.9 516.2 63.2 2.54 0.0216 0.0431 0.0652 0.0850 0.1032 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 86.9 669.9 86.2 2.85 0.0493 0.1073 0.1618 0.2121 0.2538 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 88.3 656.0 87.2 2.98 0.0628 0.1220 0.1814 0.2381 0.2909 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 82.5 376.4 75.0 2.92 0.0608 0.1068 0.1470 0.1855 0.2230 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:1024, match:nn
19-04-26 F 79.4 335.0 74.2 2.87 0.0524 0.0956 0.1302 0.1667 0.1983 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:2048, match:nn
19-04-26 F 84.4 537.5 77.5 2.87 0.0563 0.1054 0.1501 0.1934 0.2285 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:256, match:nn
19-04-26 F 26.1 75.3 18.0 1.75 0.0007 0.0011 0.0016 0.0017 0.0023 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:512, match:nn
19-04-26 F 66.6 152.6 57.0 2.59 0.0231 0.0409 0.0585 0.0746 0.0871 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
kp:8000, match:nn
19-04-26 F 86.7 1606.2 84.8 2.63 0.0276 0.0716 0.1208 0.1717 0.2196 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 91.8 2760.2 94.5 2.75 0.0476 0.1005 0.1591 0.2132 0.2671 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 85.0 464.9 82.5 2.94 0.0617 0.1122 0.1590 0.2069 0.2538 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 91.1 525.3 88.8 3.23 0.1240 0.1938 0.2596 0.3222 0.3717 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 91.8 640.3 90.5 3.26 0.1085 0.1803 0.2540 0.3161 0.3740 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoint refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA N/A 256 float32
kp:8000, match:nn
19-04-24 F 82.8 967.7 73.0 2.52 0.0190 0.0419 0.0760 0.1135 0.1451 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 91.9 2349.6 93.2 2.78 0.0712 0.1350 0.1965 0.2561 0.3058 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. N/A TBA