IMW CVPR 2019: Leaderboard

The challenge comprises the datasets and tasks listed below (see the challenge website for details).

Some notes:

  • Place the mouse cursor over the row headers for details about the metrics.
  • You can filter using the search box and the labels listed under each method's name. Sparse methods are broken down into categories by the number of keypoints used: up to 256, 512, 1024, 2048, and 8000 (the maximum allowed) keypoints per image. Sparse feature matching can be done by brute-force nearest-neighbour search (“nn”), one-to-one correspondences (“1to1”), or user-provided matches; the sketch below illustrates the difference between the “nn” and “1to1” settings.
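
As an illustration of the “nn” and “1to1” settings, here is a minimal sketch using OpenCV's brute-force matcher. It is not the evaluation code: the choice of AKAZE features, the Hamming norm, and the 8000-keypoint cap are illustrative assumptions.

```python
import cv2

def extract(path, max_kp=8000):
    """Detect AKAZE keypoints/descriptors and keep the strongest max_kp."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    kps, descs = cv2.AKAZE_create().detectAndCompute(img, None)
    order = sorted(range(len(kps)), key=lambda i: kps[i].response, reverse=True)[:max_kp]
    return [kps[i] for i in order], descs[order]

kp1, d1 = extract("image1.jpg")
kp2, d2 = extract("image2.jpg")

# "nn": plain brute-force nearest-neighbour matching, one direction only.
matches_nn = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False).match(d1, d2)

# "1to1": one-to-one correspondences via cross-checking, i.e. a match is kept
# only if the two descriptors are mutually each other's nearest neighbour.
matches_1to1 = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
```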

## [P1] Phototourism dataset — Stereo task

Performance in stereo matching, averaged over all the test sequences.


Stereo — averaged over all sequences
Method Date Type #kp MS mAP5° mAP10° mAP15° mAP20° mAP25° By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 7879.7 0.2207 0.0006 0.0070 0.0288 0.0726 0.1380 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 5356.7 0.2715 0.0006 0.0106 0.0403 0.0978 0.1737 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 4652.0 0.2742 0.0006 0.0074 0.0336 0.0856 0.1522 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 5474.7 0.2683 0.0006 0.0119 0.0461 0.1084 0.1900 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 7332.7 0.2793 0.0006 0.0127 0.0490 0.1177 0.2036 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 5542.2 0.2676 0.0010 0.0121 0.0449 0.1062 0.1888 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 6814.2 0.2785 0.0006 0.0121 0.0490 0.1134 0.2004 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 1024.0 0.2762 0.0004 0.0078 0.0297 0.0744 0.1361 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 2038.9 0.2706 0.0006 0.0082 0.0324 0.0770 0.1408 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 256.0 0.2772 0.0004 0.0059 0.0217 0.0534 0.1032 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 512.0 0.2778 0.0004 0.0069 0.0266 0.0642 0.1210 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 344.8 0.2498 0.0006 0.0056 0.0230 0.0564 0.1012 Anonymous ELF detector: Keypoints are local maxima of a saliency map, obtained as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 344.8 0.2644 0.0006 0.0066 0.0274 0.0635 0.1119 Anonymous ELF detector: Keypoints are local maxima of a saliency map, obtained as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 484.0 0.2356 0.0007 0.0082 0.0286 0.0666 0.1188 Anonymous ELF detector: Keypoints are local maxima of a saliency map, obtained as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 1946.0 0.2404 0.0007 0.0076 0.0313 0.0778 0.1419 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 1946.0 0.2438 0.0006 0.0079 0.0319 0.0776 0.1407 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 2364.4 0.2475 0.0007 0.0098 0.0411 0.1011 0.1796 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors with the sGOr2f* matching strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 7682.0 0.2498 0.0009 0.0098 0.0380 0.0932 0.1682 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets, similar to the training code at https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 7749.5 0.2653 0.0008 0.0111 0.0416 0.1024 0.1851 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity-vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets, similar to the training code at https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 7128.0 0.2239 0.0002 0.0045 0.0189 0.0502 0.0952 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 7884.6 0.2558 0.0009 0.0111 0.0420 0.0995 0.1788 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 7884.6 0.2612 0.0007 0.0118 0.0432 0.1044 0.1828 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 7884.6 0.2638 0.0008 0.0119 0.0448 0.1066 0.1850 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 3828.5 0.2456 0.0005 0.0061 0.0226 0.0543 0.0981 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is based on the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 3828.5 0.2454 0.0005 0.0052 0.0218 0.0515 0.0960 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is based on the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 7515.2 0.3006 0.0008 0.0117 0.0439 0.1025 0.1832 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 7515.2 0.2855 0.0007 0.0111 0.0432 0.1001 0.1779 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with the other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 7515.2 0.3633 0.0016 0.0217 0.0823 0.1818 0.2963 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 7515.2 0.2337 0.0007 0.0091 0.0368 0.0904 0.1653 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 7885.0 0.2329 0.0007 0.0103 0.0367 0.0907 0.1655 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 7885.0 0.2523 0.0009 0.0112 0.0425 0.1005 0.1793 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 7885.0 0.2476 0.0009 0.0110 0.0400 0.0929 0.1696 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 7515.2 0.3623 0.0018 0.0199 0.0752 0.1699 0.2789 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 7884.4 0.2148 0.0004 0.0068 0.0277 0.0692 0.1303 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 7885.0 0.2359 0.0007 0.0103 0.0357 0.0845 0.1540 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 2048.0 0.2215 0.0004 0.0055 0.0223 0.0570 0.1060 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 1880.6 0.2464 0.0008 0.0112 0.0415 0.0975 0.1696 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 1880.6 0.2651 0.0008 0.0133 0.0519 0.1168 0.2016 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 1266.5 0.2576 0.0008 0.0121 0.0414 0.0970 0.1676 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 1024.0 0.2565 0.0008 0.0111 0.0402 0.0953 0.1658 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 2048.0 0.2495 0.0009 0.0112 0.0416 0.0930 0.1641 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 256.0 0.2485 0.0008 0.0080 0.0308 0.0731 0.1270 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 512.0 0.2553 0.0007 0.0091 0.0377 0.0897 0.1569 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 8000.0 0.2404 0.0008 0.0106 0.0387 0.0879 0.1529 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 7884.6 0.2490 0.0012 0.0114 0.0405 0.0988 0.1792 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 1562.8 0.2479 0.0010 0.0102 0.0389 0.0920 0.1654 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 1502.0 0.3548 0.0011 0.0157 0.0590 0.1338 0.2257 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 1880.6 0.3547 0.0012 0.0171 0.0640 0.1413 0.2396 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 7748.8 0.2078 0.0006 0.0057 0.0210 0.0542 0.1059 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 7515.2 0.3112 0.0009 0.0130 0.0513 0.1226 0.2151 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
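
In benchmarks of this kind, the mAP columns (5° to 25°) measure how often the relative pose recovered from the submitted matches falls within a given angular error of the ground truth. The sketch below shows one way such angular errors and per-threshold accuracies can be computed; the exact definitions and aggregation used by the organizers may differ, and combining rotation and translation errors with a maximum is an assumption.

```python
import numpy as np

# Hedged sketch: per-pair angular pose errors and a per-threshold accuracy.
def rotation_error_deg(R_est, R_gt):
    # Angle of the residual rotation between estimate and ground truth.
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_deg(t_est, t_gt):
    # Stereo translation is known only up to scale, so compare directions.
    cos = np.dot(t_est, t_gt) / (np.linalg.norm(t_est) * np.linalg.norm(t_gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def accuracy_at(pose_errors_deg, threshold_deg):
    # Fraction of image pairs with pose error below the threshold,
    # i.e. one mAP column (5°..25°) for a single sequence.
    return float(np.mean(np.asarray(pose_errors_deg) < threshold_deg))

# Example per-pair error, taken as the worse of the two angular errors (assumption):
# err = max(rotation_error_deg(R_est, R_gt), translation_error_deg(t_est, t_gt))
```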


## [P2] Phototourism dataset — Multi-view task

Performance in SfM reconstruction, averaged over all the test sequences.


MVS — averaged over all sequences
Method Date Type Ims (%) #Pts SR TL mAP5° mAP10° mAP15° mAP20° mAP25° ATE By Details Link Contact Updated Descriptor size
AKAZE (OpenCV)
kp:8000, match:nn
19-04-24 F 94.3 4357.8 88.5 3.25 0.2906 0.3747 0.4285 0.4725 0.5075 Challenge organizers AKAZE, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SphereDesc
kp:8000, match:nn
19-04-26 F 96.6 3901.2 94.1 3.24 0.3168 0.4199 0.4865 0.5360 0.5772 Anonymous We use OpenCV's implementation of the AKAZE detector, and for each keypoint we extract a descriptor via a CNN. N/A Anonymous N/A 256 float32
Brisk + SSS
kp:8000, match:nn
19-05-14 F 92.9 2548.4 87.9 2.76 0.1252 0.2074 0.2694 0.3209 0.3667 Anonymous We use OpenCV's implementation of the BRISK detector with the default settings, keeping at most 8K keypoints per image. For each keypoint, we extract a descriptor via a CNN model. TBA Anonymous N/A 128 float32
D2-Net (single scale)
kp:8000, match:nn
19-05-07 F 93.9 4278.0 92.8 3.06 0.2162 0.3304 0.4072 0.4627 0.5075 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multiscale)
kp:8000, match:nn
19-05-07 F 94.6 5533.0 94.4 2.93 0.1995 0.3218 0.4001 0.4623 0.5142 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Trained on sequences overlapping with our test set: see 'no PT' for eligible results (this entry is provided only for reference). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (single scale, no PT dataset)
kp:8000, match:nn
19-06-01 F 93.9 4289.3 92.8 3.06 0.2157 0.3349 0.4102 0.4652 0.5112 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Single-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
D2-Net (multi-scale, no PT dataset)
kp:8000, match:nn
19-06-05 F 94.5 5098.5 94.0 2.91 0.1889 0.3132 0.3967 0.4579 0.5098 Challenge organizers D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Multi-scale features, brute-force nearest neighbour matching. Re-trained removing conflicting sequences (models: d2_tf_no_phototourism.pth). Paper: https://dsmn.ml/files/d2-net/d2-net.pdf https://github.com/mihaidusmanu/d2-net imagematching@uvic.ca N/A 512 float32
DELF
kp:1024, match:nn
19-05-05 F 83.2 637.0 75.8 2.37 0.0213 0.0735 0.1236 0.1708 0.2095 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:2048, match:nn
19-05-05 F 87.9 1278.7 83.8 2.36 0.0323 0.0991 0.1629 0.2185 0.2656 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:256, match:nn
19-05-05 F 34.6 64.1 22.2 1.65 0.0006 0.0034 0.0072 0.0111 0.0147 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
DELF
kp:512, match:nn
19-05-05 F 69.6 260.7 56.4 2.29 0.0073 0.0307 0.0585 0.0842 0.1069 Challenge organizers DELF features for object retrieval, trained on the Google Landmarks dataset. Paper: https://arxiv.org/abs/1612.06321 https://github.com/tensorflow/models/tree/master/research/delf imagematching@uvic.ca N/A 40 float32
ELF-256D
kp:512, match:nn
19-05-07 F 52.7 153.3 38.4 2.27 0.0354 0.0543 0.0657 0.0755 0.0842 Anonymous ELF detector: Keypoints are local maxima of a saliency map, obtained as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-512D
kp:512, match:nn
19-05-09 F 60.4 170.1 45.6 2.52 0.0407 0.0677 0.0849 0.0976 0.1082 Anonymous ELF detector: Keypoints are local maxima of a saliency map, obtained as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are computed by interpolating the VGG pool3 feature map at the detected keypoints. TBA Anonymous N/A 256 float32
ELF-SIFT
kp:512, match:nn
19-04-26 F 72.5 273.0 56.5 2.81 0.1012 0.1438 0.1678 0.1865 0.2021 Anonymous ELF detector: Keypoints are local maxima of a saliency map, obtained as the gradient of a pre-trained CNN's feature map with respect to the image. Descriptors are HOG (as in SIFT). N/A Anonymous N/A 128 uint8
SIFT + GeoDesc
kp:2048, match:nn
19-05-19 F 92.8 1470.4 88.6 3.13 0.2406 0.3251 0.3802 0.4273 0.4638 Challenge organizers GeoDesc extracted on SIFT keypoints. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:2048, match:nn
19-05-19 F 92.9 1635.2 88.1 3.09 0.2391 0.3213 0.3778 0.4232 0.4611 Challenge organizers HardNet extracted on SIFT keypoints. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
HarrisZ/RsGLOH2
kp:8000, match:sGOr2f
19-05-23 F 96.6 1727.7 94.0 3.31 0.3388 0.4395 0.5040 0.5545 0.5935 Fabio Bellavia and Carlo Colombo HarrisZ keypoints and root-squared (as in RootSIFT) sGLOH2 descriptors with the sGOr2f* matching strategy. Distance-table entries lower than their respective row-wise and column-wise averages are discarded, and matching pairs are then computed with the greedy nearest neighbour as in the WISW@CAIP2019 contest. Keypoints and matches are ordered by rank (best first). http://cvg.dsi.unifi.it/cvg/index.php?id=research#descriptor bellavia.fabio@gmail.com N/A 256 float32
HesAffNet - HardNet2
kp:8000, match:nn
19-05-29 F 96.8 5418.9 95.7 3.43 0.3716 0.4680 0.5284 0.5756 0.6154 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian + AffNet (affine shape) + OriNet (orientation); code and weights from https://github.com/ducha-aiki/affnet, paper: https://arxiv.org/abs/1711.06704. Descriptor: HardNet trained on AMOS plus a mix of other datasets, similar to the training code at https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
Hessian - HardNet2
kp:8000, match:nn
19-05-30 F 95.1 5707.2 94.1 3.49 0.3624 0.4523 0.5055 0.5467 0.5794 Milan Pultar, Dmytro Mishkin, Jiří Matas Detector: Hessian; the gravity-vector orientation is assumed. Descriptor: HardNet trained on AMOS plus a mix of other datasets, similar to the training code at https://github.com/pultarmi/HardNet_MultiDataset; paper: https://arxiv.org/pdf/1901.09780.pdf. The networks were plugged into the MODS framework, but without view synthesis and matching: https://github.com/ducha-aiki/mods-light-zmq. Maximum number of keypoints to detect: 8k; detection is done on 2x upsampled images. TBA ducha.aiki@gmail.com N/A 128 uint8
ORB (OpenCV)
kp:8000, match:nn
19-04-24 F 87.6 2988.1 77.9 2.81 0.1356 0.1893 0.2307 0.2664 0.2996 Challenge organizers ORB, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
Scale-invariant desc. (Log-Polar, lambda=32)
kp:8000, match:nn
19-06-25 F 97.7 6627.9 96.4 3.31 0.3811 0.4782 0.5389 0.5855 0.6246 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=64)
kp:8000, match:nn
19-06-24 F 97.6 6792.6 96.4 3.30 0.3803 0.4783 0.5396 0.5848 0.6221 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
Scale-invariant desc. (Log-Polar, lambda=96)
kp:8000, match:nn
19-06-20 F 97.6 6831.3 96.9 3.29 0.3827 0.4803 0.5427 0.5886 0.6255 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SIFT-AID (NN matcher)
kp:8000, match:nn
19-05-10 F 88.7 2397.1 80.7 2.85 0.1572 0.2237 0.2688 0.3076 0.3422 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is based on the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT-AID (custom matcher)
kp:8000, match:sift-aid
19-04-29 F/M 87.6 2261.4 79.1 2.78 0.1454 0.2107 0.2553 0.2912 0.3249 Mariano Rodríguez, Gabriele Facciolo, Rafael Grompone Von Gioi, Pablo Musé, Jean-Michel Morel, Julie Delon We extract the keypoints using OpenCV's implementation of SIFT. The AID descriptors are computed with a CNN from patches extracted at each keypoint location; the result is a binary descriptor of 6272 bits. Matching is based on the Hamming distance between the descriptors, with the decision threshold set at 4000. Preprint: https://hal.archives-ouvertes.fr/hal-02016010. Code: https://github.com/rdguez-mariano/sift-aid https://hal.archives-ouvertes.fr/hal-02016010 facciolo@cmla.ens-cachan.fr N/A 6272 bits
SIFT + ContextDesc
kp:8000, match:nn
19-05-09 F 97.9 6020.4 97.2 3.26 0.3828 0.4821 0.5399 0.5853 0.6226 Zixin Luo ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 uint8
SIFT-Dense-ContextDesc
kp:8000, match:nn
19-05-28 F 96.6 5084.7 94.8 3.29 0.3559 0.4536 0.5129 0.5582 0.5946 Zixin Luo, Jiahui Zhang Dense-ContextDesc is a variant of ContextDesc where descriptors are densely extracted from full images instead of image patches, with the other settings unchanged from the original ContextDesc. We find that Dense-ContextDesc performs better, in particular under illumination changes. Dense-ContextDesc is extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features are quantized to uint8 and extracted from the code provided by the authors. The model will be available on the authors' GitHub page. TBA zluoag@cse.ust.hk N/A TBA
SIFT + ContextDesc + Inlier Classification V2
kp:8000, match:custom
19-05-28 F/M 98.6 6126.0 97.5 3.44 0.5755 0.6830 0.7389 0.7750 0.8006 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an improved inlier classification and fundamental matrix estimation network based on [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT-GeoDesc-GitHub
kp:8000, match:nn
19-05-08 F 97.4 5213.3 95.5 3.33 0.3805 0.4731 0.5298 0.5741 0.6141 Zixin Luo GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. Features extracted from the code provided by the authors. https://arxiv.org/abs/1807.06294 zluoag@cse.ust.hk N/A 128 float32
SIFT + GeoDesc
kp:8000, match:nn
19-04-24 F 97.3 5583.8 95.8 3.39 0.3858 0.4778 0.5317 0.5790 0.6139 Challenge organizers GeoDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://github.com/lzx551402/geodesc imagematching@uvic.ca N/A 128 float32
SIFT + HardNet
kp:8000, match:nn
19-04-24 F 97.9 6552.9 97.2 3.36 0.3894 0.4887 0.5481 0.5940 0.6310 Challenge organizers HardNet extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/DagnyT/hardnet imagematching@uvic.ca N/A 128 float32
SIFT + L2-Net
kp:8000, match:nn
19-04-24 F 97.3 6082.5 96.3 3.27 0.3513 0.4473 0.5087 0.5548 0.5942 Challenge organizers L2-Net extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/yuruntian/L2-Net imagematching@uvic.ca N/A 128 float32
SIFT + ContextDesc + Inlier Classification V1
kp:8000, match:custom
19-05-29 F/M 98.4 6045.8 97.8 3.43 0.5553 0.6633 0.7169 0.7545 0.7849 Dawei Sun, Zixin Luo, Jiahui Zhang We use the SIFT detector and ContextDesc descriptor, and then we train an inlier classification and fundamental matrix estimation network using the architecture of [Yi et al. CVPR2018] (https://arxiv.org/pdf/1711.05971.pdf). https://github.com/lzx551402/contextdesc zluoag@cse.ust.hk N/A 128 float32
SIFT (OpenCV)
kp:8000, match:nn
19-04-24 F 93.6 4341.5 88.5 3.15 0.2881 0.3640 0.4146 0.4550 0.4901 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
SIFT + TFeat
kp:8000, match:nn
19-04-24 F 96.5 5434.3 94.1 3.20 0.3153 0.4057 0.4643 0.5122 0.5511 Challenge organizers T-Feat extracted on SIFT keypoints. Number of keypoints: 8000 per image. Models trained on the Liberty sequence of the Brown dataset. We use slightly larger patches than specified for SIFT (scale multiplying factor 16/12). Feature matching with brute-force nearest-neighbour search. https://github.com/vbalnt/tfeat imagematching@uvic.ca N/A 128 float32
SIFT (OpenCV)
kp:2048, match:nn
19-05-17 F 85.3 1214.3 76.9 2.93 0.1521 0.2060 0.2439 0.2762 0.3038 Challenge organizers SIFT, as implemented in OpenCV. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A 128 float32
Superpoint (nn matcher)
kp:2048, match:nn
19-06-07 F 95.0 1578.8 90.3 3.55 0.3203 0.4198 0.4778 0.5207 0.5578 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force nearest-neighbour matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
Superpoint (1:1 matcher)
kp:2048, match:nn1to1
19-06-07 F 95.6 1525.9 91.4 3.66 0.3806 0.4845 0.5440 0.5877 0.6210 Challenge organizers SuperPoint features from the submission 'SuperPoint + Custom Matcher (v2)' with a brute-force 1:1 matcher instead of the custom matcher. For reference. TBA imagematching@uvic.ca N/A 256 float32
SuperPoint (default)
kp:2048, match:nn
19-04-24 F 91.2 989.7 85.9 3.50 0.2845 0.3745 0.4267 0.4665 0.4974 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned (about 1200 on average). Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:1024, match:nn
19-04-26 F 90.1 813.6 83.9 3.47 0.2812 0.3646 0.4145 0.4506 0.4792 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:2048, match:nn
19-04-26 F 91.8 1465.2 87.8 3.40 0.2760 0.3650 0.4222 0.4631 0.4948 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:256, match:nn
19-04-26 F 55.1 148.9 42.5 2.55 0.0841 0.1125 0.1289 0.1400 0.1493 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:512, match:nn
19-04-26 F 80.9 383.5 72.5 3.24 0.2112 0.2751 0.3118 0.3383 0.3586 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
SuperPoint
kp:8000, match:nn
19-04-26 F 93.7 4943.5 91.9 3.22 0.2148 0.3231 0.3932 0.4454 0.4882 Challenge organizers SuperPoint features. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We lower the default detection threshold to take the number of features indicated in the label. Feature matching done with brute-force nearest-neighbour search. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork imagematching@uvic.ca N/A 256 float32
Scale-invariant desc. (Cartesian, lambda=16)
kp:8000, match:nn
19-07-29 F 97.9 7013.7 97.3 3.32 0.3478 0.4541 0.5208 0.5686 0.6076 Patrick Ebel We compute scale-invariant descriptors with a log-polar transformation of the patch. Keypoints are DoG, with a scaling factor of lambda/12 over its chosen scale. (This is a baseline where we use cartesian patches instead.) Reference: 'Beyond Cartesian Representations for Local Descriptors', Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls, ICCV 2019. TBA patrick.ebel@epfl.ch N/A 128 float32
SuperPoint (trained on coco + phototourism training set)
kp:2048, match:nn
19-05-30 F 94.3 1213.1 88.6 3.54 0.3217 0.4166 0.4735 0.5176 0.5521 Daniel DeTone, Paul Sarlin, Tomasz Malisiewicz, Andrew Rabinovich SuperPoint V1 model trained on COCO homographic warps at VGA resolution, plus pairs from the phototourism training set using the GT poses and depths for correspondence. If necessary, we downsample the images so that the largest dimension is at most 1024 pixels. We extract features with the default parameters and use however many are returned. https://github.com/MagicLeapResearch/SuperPointPretrainedNetwork ddetone@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v1)
kp:2048, match:custom
19-05-30 F/M 96.5 1129.8 92.0 3.85 0.4647 0.5738 0.6320 0.6721 0.7010 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SuperPoint + Custom Matcher (v2)
kp:2048, match:custom
19-05-28 F/M 96.5 1349.6 92.1 3.87 0.4826 0.5908 0.6458 0.6841 0.7122 Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich Features are extracted by a modified SuperPoint: retrained on the phototourism training set, with keypoint refinement, better descriptor sampling, and adjusted thresholds. A custom matcher estimates the image rotation and rejects outlier matches using the correspondence classifier introduced in 'Learning To Find Good Correspondences' (Yi et al., 2018). TBA pesarlin@magicleap.com N/A 256 float32
SURF (OpenCV)
kp:8000, match:nn
19-04-24 F 90.1 2864.2 80.8 2.87 0.1894 0.2555 0.3007 0.3372 0.3702 Challenge organizers SURF, as implemented in OpenCV. Number of keypoints: 8000 per image. Feature matching with brute-force nearest-neighbour search. https://opencv.org imagematching@uvic.ca N/A TBA
SIFT + ContextDesc
kp:8000, match:nn1to1
19-06-07 F 98.1 6472.1 98.0 3.34 0.4287 0.5371 0.6017 0.6464 0.6826 Challenge organizers ContextDesc extracted on SIFT keypoints. Number of keypoints: 8000 per image. Feature matching with nearest-neighbour search, enforcing cross-match consistency. Features are quantized to uint8 and extracted from the code provided by the authors. https://github.com/lzx551402/contextdesc imagematching@uvic.ca N/A TBA
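
The reconstruction columns in this table (Ims (%), #Pts, TL) summarize the recovered SfM model: the percentage of registered images, the number of reconstructed 3D points, and the mean track length. Below is a minimal sketch of how such statistics can be computed, assuming the reconstruction is available as a list of per-point observation sets; the data layout is an illustrative assumption, not the benchmark's code.

```python
# Hedged sketch of per-sequence reconstruction statistics like those above.
# The data layout (a list of sets of observing image ids) is an assumption.
def reconstruction_stats(num_input_images, registered_image_ids, points3d):
    """points3d: list of sets; each set holds the ids of images observing that 3D point."""
    ims_pct = 100.0 * len(registered_image_ids) / num_input_images  # "Ims (%)"
    num_pts = len(points3d)                                         # "#Pts"
    mean_track_length = (                                           # "TL"
        sum(len(obs) for obs in points3d) / num_pts if num_pts else 0.0
    )
    return ims_pct, num_pts, mean_track_length

# Example with toy values (not real data):
stats = reconstruction_stats(
    num_input_images=100,
    registered_image_ids=set(range(94)),
    points3d=[{0, 1, 2}, {3, 4}, {0, 5, 6, 7}],
)
```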



## [S1] SILDa dataset — Image matching task

Coming Soon.