Breakdown: Phototourism | MVS | Subsets

Breakdown of results on the Phototourism dataset, multi-view stereo (MVS) task, by subset ('bag') size.


MVS, bag size 3
Method Date Type Ims (%) #Pts SR TL mAP5° mAP10° mAP15° mAP20° mAP25° ATE By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 100.0 692.9 57.5 2.28 0.0418 0.0885 0.1321 0.1794 0.2145 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search (a minimal matching sketch, covering both plain and 1:1 cross-checked nearest-neighbour matching, follows this table). https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 100.0 727.0 77.6 2.28 0.0785 0.1542 0.2267 0.2839 0.3364 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 100.0 424.4 58.2 2.16 0.0115 0.0409 0.0803 0.1176 0.1564 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, extracting at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 100.0 752.7 72.0 2.23 0.0536 0.1255 0.1985 0.2521 0.2973 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 100.0 896.8 78.2 2.20 0.0545 0.1324 0.1961 0.2667 0.3279 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 100.0 763.3 72.0 2.22 0.0488 0.1221 0.1870 0.2424 0.2961 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 100.0 813.4 76.7 2.21 0.0418 0.1185 0.1939 0.2570 0.3194 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 100.0 143.7 32.4 2.12 0.0048 0.0233 0.0388 0.0603 0.0779 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 100.0 226.5 47.5 2.10 0.0048 0.0324 0.0648 0.0997 0.1306 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 36.4 23.6 0.5 0.79 0.0000 0.0003 0.0003 0.0006 0.0012 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 90.9 75.0 10.3 2.00 0.0012 0.0058 0.0121 0.0164 0.0221 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 63.6 54.8 2.5 1.48 0.0021 0.0036 0.0061 0.0079 0.0103 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 72.7 57.9 4.9 1.65 0.0027 0.0058 0.0097 0.0124 0.0158 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 90.9 96.5 9.9 2.12 0.0073 0.0227 0.0303 0.0388 0.0476 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 100.0 286.7 59.0 2.30 0.0439 0.1018 0.1470 0.1894 0.2252 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 100.0 291.2 57.2 2.29 0.0418 0.0927 0.1464 0.1885 0.2261 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 100.0 353.3 76.5 2.32 0.0821 0.1603 0.2273 0.2903 0.3376 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 100.0 1084.3 83.0 2.31 0.0973 0.1842 0.2561 0.3167 0.3776 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 100.0 1259.3 77.3 2.34 0.1239 0.2027 0.2667 0.3270 0.3745 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 100.0 690.1 35.1 2.23 0.0112 0.0315 0.0552 0.0770 0.1000 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 100.0 1084.1 86.0 2.27 0.0958 0.1836 0.2558 0.3167 0.3752 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 100.0 1149.0 85.6 2.27 0.0979 0.1842 0.2588 0.3121 0.3636 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 100.0 1139.8 88.0 2.26 0.0967 0.1824 0.2573 0.3200 0.3727 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 100.0 432.1 38.7 2.19 0.0100 0.0339 0.0597 0.0842 0.1130 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 100.0 389.7 35.5 2.19 0.0082 0.0300 0.0558 0.0794 0.1048 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 100.0 992.1 88.9 2.26 0.1033 0.1945 0.2542 0.3109 0.3618 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 100.0 923.7 79.6 2.26 0.0924 0.1636 0.2297 0.2855 0.3291 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, while the other settings stay unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 100.0 1021.4 90.6 2.32 0.2618 0.4164 0.5167 0.5803 0.6297 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 100.0 804.8 82.5 2.28 0.0924 0.1691 0.2330 0.2900 0.3494 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 100.0 880.0 83.5 2.30 0.0906 0.1715 0.2261 0.2915 0.3382 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 100.0 1073.9 88.7 2.29 0.1124 0.2042 0.2739 0.3312 0.3842 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 100.0 948.2 85.5 2.26 0.0797 0.1606 0.2309 0.2836 0.3336 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 100.0 992.7 91.6 2.30 0.2548 0.3994 0.4861 0.5518 0.6097 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 100.0 633.2 57.5 2.25 0.0470 0.0882 0.1273 0.1648 0.2018 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 100.0 801.8 77.3 2.24 0.0621 0.1233 0.1788 0.2342 0.2797 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 100.0 263.5 32.0 2.27 0.0145 0.0376 0.0588 0.0833 0.1024 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 100.0 323.9 63.4 2.37 0.0700 0.1452 0.1967 0.2373 0.2806 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 100.0 308.6 67.5 2.40 0.1218 0.2124 0.2724 0.3230 0.3624 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 100.0 228.2 50.6 2.35 0.0527 0.1191 0.1694 0.2130 0.2464 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 100.0 199.7 46.5 2.35 0.0530 0.1067 0.1488 0.1833 0.2145 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 100.0 318.0 56.1 2.32 0.0488 0.1039 0.1576 0.1970 0.2321 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 54.5 43.5 6.5 1.34 0.0082 0.0164 0.0224 0.0239 0.0270 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 90.9 106.1 27.9 2.17 0.0412 0.0779 0.1003 0.1188 0.1333 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 100.0 897.1 69.0 2.26 0.0403 0.1033 0.1630 0.2130 0.2612 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 100.0 1106.2 89.5 2.30 0.0964 0.1955 0.2724 0.3342 0.3836 Patrick Ebel We compute scale-invariant descriptors; keypoints are DoG, with a scaling factor of lambda/12 over their chosen scale. This is a baseline that uses Cartesian patches instead of the log-polar transformation. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 100.0 249.3 57.0 2.35 0.0533 0.1112 0.1600 0.2058 0.2442 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 100.0 235.7 69.8 2.42 0.1621 0.2791 0.3542 0.4139 0.4582 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 100.0 284.3 69.8 2.39 0.1788 0.2912 0.3670 0.4252 0.4703 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 100.0 455.0 37.4 2.20 0.0142 0.0370 0.0633 0.0855 0.1136 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 100.0 999.6 92.3 2.31 0.1597 0.2676 0.3542 0.4170 0.4694 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
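Most of the entries above match descriptors with a plain brute-force nearest-neighbour search; the 'nn1to1' entries additionally enforce cross-match consistency, keeping only pairs that are mutual nearest neighbours. The sketch below illustrates both variants with OpenCV; the SIFT detector, image filenames, and keypoint budget are placeholders for illustration, not any particular submission's pipeline.

```python
import cv2

def match_nn(desc1, desc2, cross_check=False):
    """Brute-force nearest-neighbour matching between two descriptor sets.

    With cross_check=True this is the 'nn1to1' setting: a pair (i, j) is kept
    only when j is the nearest neighbour of i and i is the nearest neighbour of j.
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=cross_check)
    matches = matcher.match(desc1, desc2)
    # Index pairs sorted by descriptor distance, best matches first.
    return sorted(((m.queryIdx, m.trainIdx, m.distance) for m in matches),
                  key=lambda t: t[2])

# Usage sketch with SIFT capped at 8000 keypoints per image, as in most entries.
img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical filenames
img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create(nfeatures=8000)
kp1, desc1 = sift.detectAndCompute(img1, None)
kp2, desc2 = sift.detectAndCompute(img2, None)
print(len(match_nn(desc1, desc2)), "NN matches;",
      len(match_nn(desc1, desc2, cross_check=True)), "mutual (1:1) matches")
```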

MVS, bag size 5
Method Date Type Ims (%) #Pts SR TL mAP5° mAP10° mAP15° mAP20° mAP25° ATE By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 88.0 1106.8 96.6 2.58 0.1263 0.2093 0.2677 0.3189 0.3642 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 92.7 1336.3 98.7 2.59 0.2007 0.3067 0.3751 0.4305 0.4774 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 85.8 647.3 93.6 2.31 0.0319 0.0887 0.1437 0.1962 0.2451 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, extracting at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 90.1 1398.3 99.0 2.46 0.1201 0.2271 0.3084 0.3732 0.4297 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 91.7 1757.1 99.5 2.42 0.1115 0.2278 0.3110 0.3786 0.4410 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 89.9 1426.3 99.4 2.46 0.1185 0.2385 0.3244 0.3879 0.4425 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 91.2 1608.4 99.5 2.41 0.1038 0.2254 0.3156 0.3833 0.4438 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 75.4 192.4 74.4 2.17 0.0073 0.0340 0.0661 0.1044 0.1389 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 79.9 332.0 87.8 2.16 0.0125 0.0466 0.0920 0.1366 0.1807 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 45.2 45.8 4.3 1.63 0.0000 0.0007 0.0021 0.0028 0.0036 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 67.8 116.5 41.8 2.17 0.0029 0.0109 0.0221 0.0376 0.0528 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 54.8 87.3 14.5 1.97 0.0026 0.0091 0.0133 0.0193 0.0254 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 65.7 104.9 22.0 2.35 0.0051 0.0128 0.0213 0.0283 0.0359 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 70.5 122.5 38.7 2.45 0.0218 0.0463 0.0665 0.0827 0.0979 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 86.4 449.6 95.4 2.58 0.1208 0.2030 0.2613 0.3179 0.3589 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 86.0 477.3 95.4 2.55 0.1182 0.2001 0.2559 0.3066 0.3520 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 92.9 616.1 99.5 2.70 0.2201 0.3324 0.4084 0.4679 0.5146 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 93.7 1956.7 99.8 2.74 0.2687 0.3719 0.4400 0.4977 0.5402 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 91.0 2189.0 99.3 2.74 0.2598 0.3574 0.4140 0.4573 0.4961 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 79.4 885.4 77.7 2.41 0.0379 0.0777 0.1145 0.1500 0.1873 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 95.6 2135.6 99.8 2.66 0.2715 0.3778 0.4451 0.4985 0.5438 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 94.6 2171.1 99.8 2.63 0.2515 0.3575 0.4277 0.4835 0.5282 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 94.9 2218.2 99.8 2.64 0.2564 0.3650 0.4391 0.4922 0.5377 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 79.8 580.4 84.3 2.35 0.0425 0.0841 0.1220 0.1612 0.1946 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000 (a sketch of this Hamming-threshold matching follows this table). Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 78.0 522.7 81.2 2.31 0.0374 0.0808 0.1160 0.1485 0.1837 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 95.9 1978.9 99.8 2.64 0.2660 0.3681 0.4392 0.4932 0.5405 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 93.0 1665.6 99.5 2.60 0.2326 0.3430 0.4068 0.4595 0.5065 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, while the other settings stay unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 97.5 2200.2 99.4 2.79 0.5305 0.6532 0.7134 0.7551 0.7806 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 94.4 1534.7 99.5 2.67 0.2532 0.3562 0.4243 0.4765 0.5245 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 94.5 1691.3 99.8 2.70 0.2635 0.3651 0.4305 0.4862 0.5288 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 96.1 2172.1 100.0 2.68 0.2806 0.3840 0.4511 0.5050 0.5492 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 94.1 1850.6 99.8 2.63 0.2415 0.3403 0.4057 0.4598 0.5081 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 97.3 2145.2 99.7 2.79 0.4968 0.6233 0.6828 0.7237 0.7565 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 87.0 1011.7 96.6 2.54 0.1358 0.2091 0.2674 0.3167 0.3615 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 92.3 1473.5 99.5 2.58 0.1767 0.2686 0.3347 0.3901 0.4380 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 77.2 355.9 76.5 2.43 0.0436 0.0817 0.1171 0.1492 0.1800 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 89.3 551.8 98.2 2.77 0.2004 0.3035 0.3649 0.4165 0.4609 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 91.0 546.0 98.2 2.86 0.2793 0.3926 0.4616 0.5140 0.5549 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 85.1 350.3 93.0 2.69 0.1697 0.2595 0.3138 0.3585 0.3955 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 82.9 302.3 89.0 2.67 0.1578 0.2437 0.2992 0.3403 0.3747 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 85.5 503.0 95.2 2.64 0.1462 0.2404 0.3063 0.3578 0.3966 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 56.7 76.5 28.3 2.10 0.0216 0.0397 0.0505 0.0611 0.0702 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 78.0 163.2 67.5 2.61 0.1058 0.1604 0.1977 0.2287 0.2527 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 89.2 1597.1 98.7 2.54 0.1085 0.2109 0.2856 0.3457 0.3946 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 95.8 2303.6 99.9 2.70 0.2463 0.3597 0.4327 0.4821 0.5279 Patrick Ebel We compute scale-invariant descriptors; keypoints are DoG, with a scaling factor of lambda/12 over their chosen scale. This is a baseline that uses Cartesian patches instead of the log-polar transformation. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 87.8 411.3 97.4 2.70 0.1846 0.2857 0.3524 0.4057 0.4491 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 92.9 426.6 98.0 2.93 0.3789 0.5105 0.5791 0.6239 0.6568 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 92.5 514.8 98.5 2.95 0.4132 0.5445 0.6047 0.6475 0.6779 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 80.4 599.7 86.1 2.39 0.0556 0.1045 0.1451 0.1811 0.2148 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 96.6 2126.4 100.0 2.76 0.3405 0.4583 0.5313 0.5815 0.6252 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
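The SIFT-AID entries above describe their matcher as a Hamming-distance comparison of 6272-bit binary descriptors with a decision threshold of 4000 bits. The NumPy sketch below illustrates that criterion under some assumptions: descriptors are taken as byte-packed bit arrays, and candidate pairs are formed per keypoint by nearest neighbour before applying the threshold; neither detail comes from the authors' code.

```python
import numpy as np

# Lookup table: number of set bits for every possible byte value.
_POPCOUNT = np.array([bin(b).count("1") for b in range(256)], dtype=np.uint16)

def hamming_match(desc1, desc2, threshold=4000):
    """Nearest-neighbour matching of packed binary descriptors by Hamming distance.

    desc1: (N, 784) uint8 and desc2: (M, 784) uint8 -- 6272 bits packed into bytes.
    A pair is accepted only if its Hamming distance is below `threshold`
    (4000 bits, the decision threshold quoted in the SIFT-AID entries).
    """
    matches = []
    for i, d in enumerate(desc1):
        # Hamming distance from descriptor i to every descriptor in desc2.
        dist = _POPCOUNT[d[None, :] ^ desc2].sum(axis=1)
        j = int(dist.argmin())
        if dist[j] < threshold:
            matches.append((i, j, int(dist[j])))
    return matches

# Toy usage with random bits; real descriptors would come from the AID network.
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=(100, 784), dtype=np.uint8)
d2 = rng.integers(0, 256, size=(120, 784), dtype=np.uint8)
print(len(hamming_match(d1, d2)), "pairs under the 4000-bit threshold")
```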

MVS, bag size 10
Method Date Type Ims (%) #Pts SR TL mAP5° mAP10° mAP15° mAP20° mAP25° ATE By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 92.3 3427.4 99.9 3.29 0.3557 0.4639 0.5297 0.5762 0.6130 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 95.9 3506.0 99.9 3.32 0.3802 0.4957 0.5641 0.6136 0.6536 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 90.3 1900.5 99.8 2.76 0.1190 0.2231 0.2984 0.3588 0.4107 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, extracting at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 92.1 3712.5 100.0 3.08 0.2573 0.3867 0.4683 0.5274 0.5741 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 93.2 4908.6 100.0 2.93 0.2364 0.3763 0.4650 0.5272 0.5776 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 92.4 3726.3 100.0 3.08 0.2649 0.4004 0.4782 0.5334 0.5771 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 93.1 4446.3 100.0 2.92 0.2228 0.3619 0.4511 0.5163 0.5672 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 72.1 401.3 96.6 2.34 0.0155 0.0656 0.1214 0.1759 0.2229 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 81.5 807.7 99.6 2.33 0.0258 0.0929 0.1647 0.2300 0.2861 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 30.5 65.3 20.5 1.91 0.0003 0.0019 0.0044 0.0075 0.0103 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 53.6 170.0 79.0 2.29 0.0046 0.0250 0.0534 0.0817 0.1069 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 42.6 130.8 46.8 2.33 0.0192 0.0354 0.0476 0.0586 0.0686 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints (a rough sketch of this detector follows this table). TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 46.6 143.7 61.3 2.55 0.0197 0.0446 0.0634 0.0781 0.0900 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 59.1 210.8 77.5 2.78 0.0837 0.1366 0.1658 0.1890 0.2079 Anonymous ELF detector: keypoints are local maxima of a saliency map obtained as the gradient of a pre-trained CNN's feature map with respect to the input image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 89.8 1221.2 100.0 3.18 0.2894 0.3881 0.4501 0.4996 0.5374 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 89.8 1325.4 99.9 3.13 0.2859 0.3818 0.4413 0.4906 0.5285 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 95.0 1492.4 100.0 3.41 0.4054 0.5135 0.5792 0.6274 0.6662 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 96.0 4845.6 100.0 3.55 0.4468 0.5447 0.6021 0.6439 0.6794 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 93.8 5326.1 100.0 3.62 0.4358 0.5302 0.5831 0.6210 0.6492 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 79.9 2076.1 98.7 2.86 0.1223 0.1873 0.2356 0.2782 0.3162 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 96.9 5835.6 99.9 3.40 0.4561 0.5610 0.6235 0.6695 0.7032 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 97.3 6055.3 100.0 3.39 0.4646 0.5711 0.6309 0.6772 0.7124 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 97.5 6132.3 99.7 3.40 0.4750 0.5784 0.6380 0.6810 0.7133 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 82.2 1693.3 99.9 2.82 0.1595 0.2445 0.2993 0.3459 0.3874 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 79.6 1492.5 99.8 2.73 0.1454 0.2243 0.2778 0.3195 0.3578 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
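The SIFT-AID entries above match 6272-bit binary descriptors by Hamming distance with a fixed decision threshold of 4000. The following is a minimal sketch of that kind of nearest-neighbour matcher, assuming the bits are packed into uint8 arrays (784 bytes per descriptor); array names and packing are illustrative and not taken from the SIFT-AID code.

```python
import numpy as np

def hamming_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray, threshold: int = 4000):
    """desc_a: (N, 784) uint8, desc_b: (M, 784) uint8 -- 6272 bits packed into bytes.
    For each row of desc_a, find its nearest neighbour in desc_b by Hamming distance
    and keep the pair only if the distance is below the decision threshold."""
    matches = []
    for i, d in enumerate(desc_a):
        # XOR against every descriptor in the other image, then count the set bits.
        dist = np.unpackbits(d ^ desc_b, axis=1).sum(axis=1)
        j = int(dist.argmin())
        if dist[j] < threshold:
            matches.append((i, j, int(dist[j])))
    return matches
```

This sketch covers only the plain nearest-neighbour case ('NN matcher'); the 'custom matcher' entry applies its own decision logic on top of the same Hamming distances.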
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 97.6 5440.2 99.9 3.35 0.4707 0.5780 0.6369 0.6823 0.7154 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 95.9 4494.7 100.0 3.36 0.4241 0.5325 0.5955 0.6414 0.6780 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, while other settings stay unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 98.2 5753.4 99.9 3.57 0.6926 0.7735 0.8112 0.8365 0.8552 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 97.0 4538.6 99.7 3.43 0.4603 0.5593 0.6158 0.6602 0.6965 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 96.7 4902.4 99.9 3.49 0.4729 0.5741 0.6312 0.6751 0.7081 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 97.7 5954.2 100.0 3.46 0.4768 0.5833 0.6412 0.6865 0.7188 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 97.1 5226.7 100.0 3.36 0.4169 0.5218 0.5862 0.6348 0.6738 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 97.8 5654.6 100.0 3.56 0.6630 0.7521 0.7933 0.8216 0.8422 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 91.0 3231.3 100.0 3.20 0.3372 0.4350 0.4961 0.5408 0.5769 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 96.2 4513.6 99.9 3.27 0.3855 0.4961 0.5620 0.6125 0.6535 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 75.0 862.7 99.1 2.91 0.1481 0.2135 0.2591 0.2959 0.3275 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 93.1 1391.0 99.8 3.64 0.3915 0.5034 0.5674 0.6124 0.6491 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 93.9 1368.9 100.0 3.79 0.4608 0.5709 0.6303 0.6724 0.7042 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 87.4 851.9 99.9 3.53 0.3440 0.4490 0.5066 0.5486 0.5818 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 85.0 710.3 99.9 3.50 0.3409 0.4379 0.4946 0.5349 0.5643 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 88.6 1242.8 100.0 3.46 0.3292 0.4353 0.4977 0.5417 0.5752 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 50.5 135.1 54.0 2.64 0.0793 0.1144 0.1352 0.1486 0.1596 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 71.9 327.1 94.6 3.24 0.2366 0.3165 0.3608 0.3903 0.4135 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 92.0 4162.5 100.0 3.26 0.2501 0.3775 0.4557 0.5122 0.5573 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 98.1 6295.4 99.9 3.43 0.4245 0.5321 0.5971 0.6453 0.6827 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on COCO + phototourism training set)
kp:2048, match:nn
19-05-30 F 92.1 1061.1 100.0 3.60 0.3959 0.5095 0.5732 0.6210 0.6569 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 94.8 1043.4 100.0 3.99 0.5651 0.6701 0.7229 0.7581 0.7830 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 95.1 1240.8 100.0 4.01 0.5774 0.6846 0.7348 0.7685 0.7925 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 85.0 1790.2 99.9 2.91 0.1962 0.2835 0.3396 0.3866 0.4253 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 97.5 5894.8 99.9 3.46 0.5135 0.6222 0.6796 0.7206 0.7522 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency (see the matching sketch below). Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
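Most entries in these tables use one of two generic matchers: plain brute-force nearest-neighbour search ('match:nn') or the same search with cross-match consistency enforced ('match:nn1to1'). Below is a minimal sketch with OpenCV; the descriptor arrays and the choice of L2 distance are assumptions for float descriptors, and binary descriptors would use cv2.NORM_HAMMING instead.

```python
import cv2
import numpy as np

def brute_force_matches(desc_a: np.ndarray, desc_b: np.ndarray, one_to_one: bool = False):
    """Brute-force nearest-neighbour matching between two float descriptor sets.
    With one_to_one=True, crossCheck keeps only pairs that are each other's
    nearest neighbour, i.e. the cross-match-consistent ('nn1to1') variant."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=one_to_one)
    return matcher.match(desc_a.astype(np.float32), desc_b.astype(np.float32))
```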

MVS, bag size 25
Method Date Type Ims (%) #Pts SR TL mAP5o mAP10o mAP15o mAP20o mAP25o ATE By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 96.9 12204.1 99.9 4.86 0.6385 0.7371 0.7845 0.8156 0.8382 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 97.8 10035.3 100.0 4.75 0.6077 0.7228 0.7800 0.8159 0.8414 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 95.5 7221.3 100.0 3.83 0.3383 0.4770 0.5552 0.6111 0.6545 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings; each image has at most 8K keypoints. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 93.4 11248.3 100.0 4.46 0.4339 0.5825 0.6538 0.6982 0.7290 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 93.6 14569.4 100.0 4.16 0.3954 0.5507 0.6284 0.6768 0.7105 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 93.5 11241.5 100.0 4.46 0.4305 0.5784 0.6514 0.6971 0.7292 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 93.6 13525.8 100.0 4.10 0.3872 0.5469 0.6263 0.6749 0.7089 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 85.3 1810.3 100.0 2.85 0.0577 0.1710 0.2682 0.3425 0.3984 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 90.3 3748.5 100.0 2.84 0.0863 0.2245 0.3300 0.4077 0.4649 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 26.2 121.5 63.5 2.27 0.0022 0.0107 0.0219 0.0336 0.0437 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 66.0 681.4 94.5 2.71 0.0206 0.0812 0.1465 0.2013 0.2456 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 49.6 340.4 89.6 3.32 0.1175 0.1690 0.1960 0.2164 0.2325 Anonymous ELF detector: Keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 56.5 373.9 94.1 3.54 0.1351 0.2074 0.2455 0.2714 0.2912 Anonymous ELF detector: Keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 69.5 662.0 99.9 3.89 0.2922 0.3696 0.4087 0.4356 0.4549 Anonymous ELF detector: Keypoints are local maxima of a saliency map, computed as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 95.2 3924.3 100.0 4.46 0.5081 0.6074 0.6626 0.7023 0.7336 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 95.8 4446.8 100.0 4.39 0.5103 0.6106 0.6677 0.7073 0.7380 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 98.4 4448.9 100.0 4.79 0.6474 0.7517 0.8013 0.8325 0.8555 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 97.7 13789.1 100.0 5.13 0.6736 0.7711 0.8154 0.8442 0.8644 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 95.5 14054.5 100.0 5.26 0.6298 0.7190 0.7583 0.7817 0.7978 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; gravity vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets (training code: https://github.com/pultarmi/HardNet_MultiDataset, paper: https://arxiv.org/pdf/1901.09780.pdf). The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 91.3 8300.6 99.9 3.75 0.3708 0.4608 0.5174 0.5603 0.5951 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 98.2 17456.1 100.0 4.91 0.7009 0.7902 0.8312 0.8573 0.8764 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 98.3 17795.1 100.0 4.89 0.7073 0.8003 0.8411 0.8663 0.8841 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 98.1 17834.9 100.0 4.88 0.7028 0.7954 0.8363 0.8612 0.8783 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 92.9 6882.4 100.0 4.02 0.4169 0.5321 0.5940 0.6389 0.6739 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 92.7 6640.7 100.0 3.88 0.3907 0.5078 0.5717 0.6174 0.6533 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 98.1 15670.4 100.0 4.78 0.6910 0.7879 0.8295 0.8549 0.8727 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 97.6 13254.7 100.0 4.94 0.6745 0.7753 0.8197 0.8463 0.8648 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, while other settings stay unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 98.7 15528.9 100.0 5.06 0.8172 0.8889 0.9144 0.9281 0.9369 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 98.3 13975.1 100.0 4.95 0.7162 0.8077 0.8460 0.8697 0.8861 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 98.2 14861.6 99.8 5.09 0.7163 0.8006 0.8391 0.8634 0.8804 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 97.9 17011.3 100.0 5.00 0.6875 0.7833 0.8263 0.8535 0.8720 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 98.0 16304.6 99.9 4.84 0.6672 0.7666 0.8120 0.8409 0.8613 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 98.7 15390.7 100.0 5.05 0.8063 0.8783 0.9053 0.9208 0.9311 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 96.5 12489.7 99.8 4.61 0.6326 0.7237 0.7679 0.7976 0.8201 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 97.3 14948.2 99.9 4.70 0.6368 0.7349 0.7818 0.8120 0.8333 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 89.3 3374.9 100.0 4.11 0.4023 0.4912 0.5408 0.5766 0.6053 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SuperPoint (nn matcher)
kp:2048, match:nn
19-06-07 F 97.4 4048.4 100.0 5.43 0.6195 0.7274 0.7821 0.8167 0.8405 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 97.6 3880.2 100.0 5.60 0.6605 0.7619 0.8114 0.8414 0.8625 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 92.3 2528.6 100.0 5.42 0.5717 0.6706 0.7169 0.7458 0.7659 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 92.3 2041.9 100.0 5.36 0.5732 0.6699 0.7154 0.7438 0.7631 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 93.2 3797.1 100.0 5.19 0.5798 0.6805 0.7271 0.7558 0.7752 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 58.5 340.4 81.3 4.11 0.2275 0.2796 0.3074 0.3264 0.3405 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 82.8 937.5 100.0 4.93 0.4612 0.5456 0.5885 0.6155 0.6349 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 93.8 13117.2 100.0 4.83 0.4604 0.6008 0.6686 0.7107 0.7396 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 97.6 18349.5 100.0 4.87 0.6240 0.7293 0.7811 0.8128 0.8361 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use Cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on COCO + phototourism training set)
kp:2048, match:nn
19-05-30 F 97.3 3130.6 100.0 5.50 0.6528 0.7599 0.8084 0.8380 0.8581 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 98.2 2813.7 100.0 6.06 0.7527 0.8355 0.8718 0.8923 0.9062 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 98.2 3358.5 100.0 6.13 0.7609 0.8430 0.8765 0.8952 0.9083 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 94.8 8611.8 99.9 3.98 0.4916 0.5971 0.6547 0.6956 0.7271 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 98.3 16867.5 100.0 4.85 0.7012 0.8004 0.8415 0.8664 0.8837 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
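Several SuperPoint entries above mention downsampling images so that the largest dimension is at most 1024 pixels before extracting features. Below is a minimal sketch of that preprocessing step; the interpolation mode is an assumption, not something stated in the submissions.

```python
import cv2

def cap_largest_dimension(image, max_dim: int = 1024):
    """Downsample the image only if its largest dimension exceeds max_dim."""
    h, w = image.shape[:2]
    scale = max_dim / float(max(h, w))
    if scale >= 1.0:
        return image  # already within the limit, keep the original resolution
    new_size = (int(round(w * scale)), int(round(h * scale)))  # cv2.resize takes (width, height)
    return cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)
```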