IMW CVPR 2019: Leaderboard

The challenge contains the following datasets and tasks (see this for details):

Some notes:

  • You can filter using the search box and labels, which are listed under the name of each method. Sparse methods are broken down into categories by the number of keypoints used: up to 256, 512, 1024, 2048, and 8000 (the maximum allowed) keypoints per image. Sparse feature matching can be done by brute-force nearest-neighbour search (“nn”), one-to-one correspondences (“1to1”), or user-provided matches.
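
As a rough illustration of the two built-in matching modes named above, the following minimal sketch contrasts brute-force nearest-neighbour matching with a one-to-one variant. It assumes that "one-to-one" means keeping only mutual nearest neighbours (cross-match consistency) and that descriptors are compared with the Euclidean distance; the function names and the toy descriptors are hypothetical and are not taken from the challenge code.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def nn_matches(desc_a, desc_b):
    """Brute-force nearest neighbour: pair each descriptor in A with its closest one in B."""
    matches = []
    for i, da in enumerate(desc_a):
        # Exhaustively scan B for the descriptor closest to da.
        j = min(range(len(desc_b)), key=lambda k: dist(da, desc_b[k]))
        matches.append((i, j))
    return matches


def one_to_one_matches(desc_a, desc_b):
    """Assumed '1to1' behaviour: keep a pair only if each side is the other's nearest neighbour."""
    a_to_b = nn_matches(desc_a, desc_b)
    b_to_a = dict(nn_matches(desc_b, desc_a))  # maps index in B -> its nearest index in A
    return [(i, j) for i, j in a_to_b if b_to_a[j] == i]


# Toy 2-D descriptors (hypothetical values, for illustration only).
desc_a = [(0.0, 0.0), (1.0, 0.0), (0.9, 0.1)]
desc_b = [(1.0, 0.05), (0.0, 0.1)]

nn = nn_matches(desc_a, desc_b)              # every descriptor in A gets a match
mutual = one_to_one_matches(desc_a, desc_b)  # ambiguous matches are dropped
print(nn)
print(mutual)
```

In this toy case two descriptors in the first image pick the same descriptor in the second one; the one-to-one filter keeps only the pair that is also the best match in the reverse direction.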

[P1] Phototourism dataset — Stereo task

Performance in stereo matching, averaged over all the test sequences.


Stereo — averaged over all sequences
Method Date Type #kp MS mAP5° mAP10° mAP15° mAP20° mAP25° By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7879.7 0.2207 0.0006 0.0070 0.0288 0.0726 0.1380 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 5356.7 0.2715 0.0006 0.0106 0.0403 0.0978 0.1737 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 4652.0 0.2742 0.0006 0.0074 0.0336 0.0856 0.1522 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, with at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 5474.7 0.2683 0.0006 0.0119 0.0461 0.1084 0.1900 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 7332.7 0.2793 0.0006 0.0127 0.0490 0.1177 0.2036 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 5542.2 0.2676 0.0010 0.0121 0.0449 0.1062 0.1888 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 6814.2 0.2785 0.0006 0.0121 0.0490 0.1134 0.2004 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.2762 0.0004 0.0078 0.0297 0.0744 0.1361 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2038.9 0.2706 0.0006 0.0082 0.0324 0.0770 0.1408 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.2772 0.0004 0.0059 0.0217 0.0534 0.1032 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.2778 0.0004 0.0069 0.0266 0.0642 0.1210 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 344.8 0.2498 0.0006 0.0056 0.0230 0.0564 0.1012 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 344.8 0.2644 0.0006 0.0066 0.0274 0.0635 0.1119 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 484.0 0.2356 0.0007 0.0082 0.0286 0.0666 0.1188 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 1946.0 0.2404 0.0007 0.0076 0.0313 0.0778 0.1419 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 1946.0 0.2438 0.0006 0.0079 0.0319 0.0776 0.1407 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2364.4 0.2475 0.0007 0.0098 0.0411 0.1011 0.1796 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7682.0 0.2498 0.0009 0.0098 0.0380 0.0932 0.1682 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from https://github.com/ducha-aiki/affnet, Paper: https://arxiv.org/abs/1711.06704; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf; DeepNets were plugged into MODS framework, but without view synthesis and matching https://github.com/ducha-aiki/mods-light-zmq; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7749.5 0.2653 0.0008 0.0111 0.0416 0.1024 0.1851 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: https://github.com/pultarmi/HardNet_MultiDataset Paper: https://arxiv.org/pdf/1901.09780.pdf, DeepNets were plugged into MODS framework, but without view synthesis and matching https://github.com/ducha-aiki/mods-light-zmq; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7128.0 0.2239 0.0002 0.0045 0.0189 0.0502 0.0952 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7884.6 0.2558 0.0009 0.0111 0.0420 0.0995 0.1788 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7884.6 0.2612 0.0007 0.0118 0.0432 0.1044 0.1828 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7884.6 0.2638 0.0008 0.0119 0.0448 0.1066 0.1850 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 3828.5 0.2456 0.0005 0.0061 0.0226 0.0543 0.0981 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 3828.5 0.2454 0.0005 0.0052 0.0218 0.0515 0.0960 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7515.2 0.3006 0.0008 0.0117 0.0439 0.1025 0.1832 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7515.2 0.2855 0.0007 0.0111 0.0432 0.1001 0.1779 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings remain the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7515.2 0.3633 0.0016 0.0217 0.0823 0.1818 0.2963 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7515.2 0.2337 0.0007 0.0091 0.0368 0.0904 0.1653 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7885.0 0.2329 0.0007 0.0103 0.0367 0.0907 0.1655 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7885.0 0.2523 0.0009 0.0112 0.0425 0.1005 0.1793 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7885.0 0.2476 0.0009 0.0110 0.0400 0.0929 0.1696 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7515.2 0.3623 0.0018 0.0199 0.0752 0.1699 0.2789 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7884.4 0.2148 0.0004 0.0068 0.0277 0.0692 0.1303 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7885.0 0.2359 0.0007 0.0103 0.0357 0.0845 0.1540 Challenge organizers TFeat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.2215 0.0004 0.0055 0.0223 0.0570 0.1060 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1880.6 0.2464 0.0008 0.0112 0.0415 0.0975 0.1696 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1880.6 0.2651 0.0008 0.0133 0.0519 0.1168 0.2016 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1266.5 0.2576 0.0008 0.0121 0.0414 0.0970 0.1676 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.2565 0.0008 0.0111 0.0402 0.0953 0.1658 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.2495 0.0009 0.0112 0.0416 0.0930 0.1641 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.2485 0.0008 0.0080 0.0308 0.0731 0.1270 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.2553 0.0007 0.0091 0.0377 0.0897 0.1569 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.2404 0.0008 0.0106 0.0387 0.0879 0.1529 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7884.6 0.2490 0.0012 0.0114 0.0405 0.0988 0.1792 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1562.8 0.2479 0.0010 0.0102 0.0389 0.0920 0.1654 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1502.0 0.3548 0.0011 0.0157 0.0590 0.1338 0.2257 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set; - keypoints refinement; - better descriptor sampling; - adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1880.6 0.3547 0.0012 0.0171 0.0640 0.1413 0.2396 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: - retrained on the phototourism training set, - keypoints refinement, - better descriptor sampling, - adjusted thresholds; A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7748.8 0.2078 0.0006 0.0057 0.0210 0.0542 0.1059 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7515.2 0.3112 0.0009 0.0130 0.0513 0.1226 0.2151 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA


[P2] Phototourism dataset — Multi-view task

Performance in SfM reconstruction, averaged over all the test sequences.


MVS — averaged over all sequences
Method Date Type Ims (%) #Pts SR TL mAP5° mAP10° mAP15° mAP20° mAP25° ATE By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 94.3 4357.8 88.5 3.25 0.2906 0.3747 0.4285 0.4725 0.5075 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 96.6 3901.2 94.1 3.24 0.3168 0.4199 0.4865 0.5360 0.5772 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 92.9 2548.4 87.9 2.76 0.1252 0.2074 0.2694 0.3209 0.3667 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, with at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 93.9 4278.0 92.8 3.06 0.2162 0.3304 0.4072 0.4627 0.5075 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 94.6 5533.0 94.4 2.93 0.1995 0.3218 0.4001 0.4623 0.5142 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 93.9 4289.3 92.8 3.06 0.2157 0.3349 0.4102 0.4652 0.5112 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 94.5 5098.5 94.0 2.91 0.1889 0.3132 0.3967 0.4579 0.5098 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 83.2 637.0 75.8 2.37 0.0213 0.0735 0.1236 0.1708 0.2095 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 87.9 1278.7 83.8 2.36 0.0323 0.0991 0.1629 0.2185 0.2656 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 34.6 64.1 22.2 1.65 0.0006 0.0034 0.0072 0.0111 0.0147 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 69.6 260.7 56.4 2.29 0.0073 0.0307 0.0585 0.0842 0.1069 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 52.7 153.3 38.4 2.27 0.0354 0.0543 0.0657 0.0755 0.0842 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 60.4 170.1 45.6 2.52 0.0407 0.0677 0.0849 0.0976 0.1082 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are computed with the interpolation of the VGG pool3 feature map on the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 72.5 273.0 56.5 2.81 0.1012 0.1438 0.1678 0.1865 0.2021 Anonymous ELF detector: Keypoints are local maxima of a saliency map generated by the gradient of a feature map with respect to the image of a pre-trained CNN. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 92.8 1470.4 88.6 3.13 0.2406 0.3251 0.3802 0.4273 0.4638 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 92.9 1635.2 88.1 3.09 0.2391 0.3213 0.3778 0.4232 0.4611 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 96.6 1727.7 94.0 3.31 0.3388 0.4395 0.5040 0.5545 0.5935 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and Root squared (like RootSIFT) sGLOH2 descriptors using sGOr2f* matching strategy. Distance table entries lower than their respective row-wise and column-wise averages are discarded and then matching pairs are computed using the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered according to their ranks (best ones first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 96.8 5418.9 95.7 3.43 0.3716 0.4680 0.5284 0.5756 0.6154 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet(affine shape) + OriNet (orientation), Code and weight from https://github.com/ducha-aiki/affnet, Paper: https://arxiv.org/abs/1711.06704; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: https://github.com/pultarmi/HardNet_MultiDataset, Paper: https://arxiv.org/pdf/1901.09780.pdf; DeepNets were plugged into MODS framework, but without view synthesis and matching https://github.com/ducha-aiki/mods-light-zmq; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 95.1 5707.2 94.1 3.49 0.3624 0.4523 0.5055 0.5467 0.5794 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian. Gravity vector orientation is assumed; Descriptor: HardNet trained on AMOS + mix of other datasets, similar to Code for train: https://github.com/pultarmi/HardNet_MultiDataset Paper: https://arxiv.org/pdf/1901.09780.pdf, DeepNets were plugged into MODS framework, but without view synthesis and matching https://github.com/ducha-aiki/mods-light-zmq; Number of max.keypoints to detect: 8k, detection done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 87.6 2988.1 77.9 2.81 0.1356 0.1893 0.2307 0.2664 0.2996 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 97.7 6627.9 96.4 3.31 0.3811 0.4782 0.5389 0.5855 0.6246 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 97.6 6792.6 96.4 3.30 0.3803 0.4783 0.5396 0.5848 0.6221 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 97.6 6831.3 96.9 3.29 0.3827 0.4803 0.5427 0.5886 0.6255 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 88.7 2397.1 80.7 2.85 0.1572 0.2237 0.2688 0.3076 0.3422 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 87.6 2261.4 79.1 2.78 0.1454 0.2107 0.2553 0.2912 0.3249 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location, the result is a binary descriptor of 6272 bits. The matching is computed as the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 97.9 6020.4 97.2 3.26 0.3828 0.4821 0.5399 0.5853 0.6226 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 96.6 5084.7 94.8 3.29 0.3559 0.4536 0.5129 0.5582 0.5946 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc in which descriptors are densely extracted from full images instead of image patches, while all other settings are the same as in the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 98.6 6126.0 97.5 3.44 0.5755 0.6830 0.7389 0.7750 0.8006 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 97.4 5213.3 95.5 3.33 0.3805 0.4731 0.5298 0.5741 0.6141 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 97.3 5583.8 95.8 3.39 0.3858 0.4778 0.5317 0.5790 0.6139 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 97.9 6552.9 97.2 3.36 0.3894 0.4887 0.5481 0.5940 0.6310 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 97.3 6082.5 96.3 3.27 0.3513 0.4473 0.5087 0.5548 0.5942 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 98.4 6045.8 97.8 3.43 0.5553 0.6633 0.7169 0.7545 0.7849 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 93.6 4341.5 88.5 3.15 0.2881 0.3640 0.4146 0.4550 0.4901 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 96.5 5434.3 94.1 3.20 0.3153 0.4057 0.4643 0.5122 0.5511 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 85.3 1214.3 76.9 2.93 0.1521 0.2060 0.2439 0.2762 0.3038 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 95.0 1578.8 90.3 3.55 0.3203 0.4198 0.4778 0.5207 0.5578 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 95.6 1525.9 91.4 3.66 0.3806 0.4845 0.5440 0.5877 0.6210 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 91.2 989.7 85.9 3.50 0.2845 0.3745 0.4267 0.4665 0.4974 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 90.1 813.6 83.9 3.47 0.2812 0.3646 0.4145 0.4506 0.4792 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 91.8 1465.2 87.8 3.40 0.2760 0.3650 0.4222 0.4631 0.4948 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 55.1 148.9 42.5 2.55 0.0841 0.1125 0.1289 0.1400 0.1493 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 80.9 383.5 72.5 3.24 0.2112 0.2751 0.3118 0.3383 0.3586 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 93.7 4943.5 91.9 3.22 0.2148 0.3231 0.3932 0.4454 0.4882 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 97.9 7013.7 97.3 3.32 0.3478 0.4541 0.5208 0.5686 0.6076 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 94.3 1213.1 88.6 3.54 0.3217 0.4166 0.4735 0.5176 0.5521 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 96.5 1129.8 92.0 3.85 0.4647 0.5738 0.6320 0.6721 0.7010 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 96.5 1349.6 92.1 3.87 0.4826 0.5908 0.6458 0.6841 0.7122 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 90.1 2864.2 80.8 2.87 0.1894 0.2555 0.3007 0.3372 0.3702 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 98.1 6472.1 98.0 3.34 0.4287 0.5371 0.6017 0.6464 0.6826 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
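
The "match:nn" and "match:nn1to1" labels in the table above distinguish plain brute-force nearest-neighbour matching from matching that also enforces cross-match consistency. The difference can be sketched roughly as follows. This is an illustrative toy in plain Python, not the organizers' evaluation code: the function names (`match_nn`, `match_nn1to1`), the Euclidean distance, and the 2-D descriptors are all assumptions made for the example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def nearest_neighbour(query, candidates):
    """Index of the candidate descriptor closest to `query` (brute force)."""
    return min(range(len(candidates)), key=lambda j: dist(query, candidates[j]))


def match_nn(desc_a, desc_b):
    """'nn' style: pair every descriptor in A with its nearest neighbour in B."""
    return [(i, nearest_neighbour(d, desc_b)) for i, d in enumerate(desc_a)]


def match_nn1to1(desc_a, desc_b):
    """'nn1to1' style: keep a pair only if it is a nearest neighbour in both directions."""
    a_to_b = match_nn(desc_a, desc_b)
    b_to_a = dict(match_nn(desc_b, desc_a))  # maps index in B -> best index in A
    return [(i, j) for i, j in a_to_b if b_to_a[j] == i]


# Hypothetical toy descriptors; real ones are e.g. 128-D or 256-D vectors.
desc_a = [(0.0, 0.0), (1.0, 0.0), (0.8, 0.1)]
desc_b = [(0.0, 0.1), (1.0, 0.1)]

nn = match_nn(desc_a, desc_b)          # every keypoint in A gets a match
mutual = match_nn1to1(desc_a, desc_b)  # the one-sided match (2, 1) is dropped
print(nn, mutual)
```

In this toy case the third descriptor in A picks the second descriptor in B, but that descriptor prefers the second one in A, so the pair survives under plain nearest-neighbour matching and is discarded once consistency in both directions is required.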



[S1] SILDa dataset — Image matching task

Coming Soon.