about GlobalDescriptor #1105
The image GlobalDescriptor was not implemented.
Is there a specific question? The GlobalDescriptor is currently a placeholder: there is no built-in mechanism to create one, or even to use it to find loop closures. It was introduced to help integration with https://github.com/MarvinStuede/cmr_lidarloop, so that a GlobalDescriptor can be saved in the database and re-extracted afterwards for external loop closure detection.
I recently re-read previous papers to consider how to utilize the global descriptor, especially VLAD. It is mainly useful in two aspects: evaluating the similarity between nodes, and fast node retrieval. Currently, the similarity is evaluated by rtabmap/corelib/src/Signature.cpp Line 234 in a0476af

Likelihood is calculated via similarity or TF-IDF. With the global descriptors of two nodes, similarity can be obtained by directly computing their scalar product; when the VLAD vectors have been normalized, the result is actually the cosine similarity. Because VLAD encodes images into very compact vectors, the global descriptors of a large number of nodes can be kept in memory for fast retrieval. For the 4096-D VLAD output by NetVLAD or HF-Net, each global descriptor occupies 16 KB (4096 floats × 4 bytes), so 10,000 nodes would take about 164 MB. KNN search can then be used to find candidate loop closure nodes from LTM, or to filter the nodes used to calculate likelihood from WM, similar to LoopGPS.

Another question is whether the Bayesian filter is still necessary when using a global descriptor. I currently believe that evaluating likelihood is more reasonable than using similarity directly. After all, it is a bit difficult to determine thresholds for different types of global descriptors; likelihood can tell whether the similarity is truly significant and eliminate unnecessary comparisons.

For now we can keep all global descriptors in memory. However, combining RTAB-Map's memory management with VLAD may make it possible to tackle extremely large-scale scenarios (such as city-level) in the future. By then we may also consider memory management of global descriptors.
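The in-memory retrieval idea above can be sketched as follows. This is a minimal illustration with random stand-in descriptors, not rtabmap's API: it keeps 10,000 normalized 4096-D float32 descriptors in a matrix, checks the memory footprint, and does a brute-force KNN query with a single matrix-vector product.

```python
import numpy as np

# Hypothetical in-memory store of 10,000 L2-normalized 4096-D float32
# VLAD descriptors (data is random stand-in, not real NetVLAD output).
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((10_000, 4096), dtype=np.float32)
descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True)

# Memory footprint: 4096 floats x 4 bytes = 16 KB per node,
# so 10,000 nodes take about 164 MB.
print(round(descriptors.nbytes / 1e6, 2), "MB")  # 163.84 MB

# Because the rows are normalized, one matrix-vector product gives the
# cosine similarity of a query descriptor against every node at once.
query = descriptors[42]
similarities = descriptors @ query          # values in [-1, 1]

# Brute-force k-NN retrieval: indices of the k most similar nodes.
k = 5
knn = np.argsort(similarities)[::-1][:k]    # knn[0] == 42 (query itself)
```

For city-scale maps, the brute-force `argsort` would be replaced by an approximate nearest-neighbor index, but the principle is the same.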
The Bayes filter is still useful to filter spurious high likelihoods (false positives), i.e., we need some consecutive high likelihoods in an area to trigger a loop closure/re-localization.

I saw your pull request #1255; I integrated the changes into #1163 (which I just updated with latest master). You can give it a try. It seems to work, so I will merge it like this for now once CI is happy. The current issue is that the resulting loop closure hypotheses are low, so they do not trigger loop closures. However, the highest hypothesis seems to be the right one. Here is an example (likelihood computed with NetVLAD using your similarity approach):

Here is the full result using the sample dataset (on the left is NetVLAD and on the right is the TF-IDF BOW approach):

We can see that the highest hypotheses are pretty much the same between the two approaches, though the actual hypothesis value is lower with NetVLAD. I would need to spend more time to see why; maybe adding a scaling factor on similarity could make the best hypothesis stand out more from the others. This could be related to how we compute the likelihood for the "no loop closure" hypothesis (see this). Command used:
Python script to extract NetVLAD: https://github.com/introlab/rtabmap/blob/pydescriptor/corelib/src/python/rtabmap_netvlad.py
Glad to see the result. I marked #1255 as draft because I haven't tested it yet. The VLAD output of HF-Net on the OAK camera is still incorrect, so I originally wanted to use it to verify the model's output first. Since NetVLAD is also available, I will test it on Jetson later.

The current issue may be due to the fact that we are using cosine distance to evaluate similarity (All about VLAD, Section 3), while the loss during NetVLAD training uses Euclidean distance. The Euclidean distance ranges from 0 to 2 when the descriptors are normalized. In addition, the range of cosine similarity is actually -1 to 1, while Signature::compareTo() expects 0 to 1: 0 means no correlation between locations, while -1 means negative correlation. I can't imagine what kind of locations would be negatively correlated :> But for normalized vectors, cosine similarity is linearly related to the squared Euclidean distance, which seems easier to understand. Obviously the similarity evaluation here needs to be adjusted; you can try to see which formulation is more reasonable. Perhaps we should also check the distribution of the calculated similarities. If cosine similarity works, it can then be rewritten as a matrix-vector multiplication, which allows efficient one-to-many similarity evaluation.
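The relationship mentioned above can be checked with a small sketch (random stand-in descriptors, not real NetVLAD output): for L2-normalized vectors, the squared Euclidean distance and the cosine similarity carry the same information, and the [0, 1] remapping shown at the end is just one possible convention, not what Signature::compareTo() currently does.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two stand-in L2-normalized global descriptors.
a = rng.standard_normal(4096); a /= np.linalg.norm(a)
b = rng.standard_normal(4096); b /= np.linalg.norm(b)

cos_sim = float(a @ b)                  # in [-1, 1]
sq_dist = float(np.sum((a - b) ** 2))   # in [0, 4] (distance in [0, 2])

# For unit vectors: ||a - b||^2 = 2 - 2*cos(a, b),
# so ranking by either measure gives the same order.
assert abs(sq_dist - (2.0 - 2.0 * cos_sim)) < 1e-6

# One possible remapping (an assumption, not the current implementation)
# of cosine similarity into the [0, 1] range Signature::compareTo() expects:
similarity_01 = (cos_sim + 1.0) / 2.0
assert 0.0 <= similarity_01 <= 1.0
```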
I'll give it a try using the Euclidean distance as a similarity measure and see if there is a big difference. From the original paper:

Reading the paper you linked, I see why you used the scalar product:

But yeah, we may indeed not be able to use the scalar product directly; we would need to rescale it between 0 and 1.
d²(a, b) = Σ((Xna − Xnb)²) = Σ(Xna²) + Σ(Xnb²) − 2·Σ(Xna·Xnb) = 2 − 2·Σ(Xna·Xnb)   (for L2-normalized descriptors)
Thanks for the equations. I compared the L2 distance (rescaled between 0 and 1) versus the dot product (rescaled between 0 and 1) and they are indeed proportional: rtabmap/corelib/src/Rtabmap.cpp Line 5291 in 11adbdc
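The proportionality can be verified numerically. With the rescaling conventions assumed below (which may differ in detail from the rtabmap code), the two rescaled measures are not just proportional but identical for normalized descriptors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two stand-in L2-normalized descriptors.
a = rng.standard_normal(128); a /= np.linalg.norm(a)
b = rng.standard_normal(128); b /= np.linalg.norm(b)

dot = float(a @ b)                 # in [-1, 1]
d2 = float(np.sum((a - b) ** 2))   # in [0, 4]

# Assumed rescaling conventions into [0, 1]:
sim_from_dot = (dot + 1.0) / 2.0
sim_from_l2 = 1.0 - d2 / 4.0

# Since d2 = 2 - 2*dot, we get 1 - d2/4 = (1 + dot)/2:
# the two rescaled similarities coincide exactly.
assert abs(sim_from_dot - sim_from_l2) < 1e-9
```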
From this paper:
Here is the comparison of the raw likelihood (and its effect on the adjusted likelihood and the Bayes filter posterior) between the NetVLAD dot-product similarity, the direct local-features similarity (pairs/totalWords) and the TF-IDF approaches on the NewCollege dataset for a specific image.

NetVLAD dot product similarity:

Direct local features similarity (pairs/totalWords):

In conclusion, with global descriptors we would need to make the similarity lower when images are not taken at the same place. This would decrease the
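One simple way to make non-matching places score lower relative to the true match is to sharpen the similarity distribution with a power. This is only a sketch of the idea (the exponent and values below are made up, and this is not something rtabmap implements):

```python
import numpy as np

# Made-up raw similarities in [0, 1] against five nodes; global
# descriptors tend to produce uniformly high values like these.
raw = np.array([0.90, 0.91, 0.89, 0.99, 0.90])

gamma = 10.0                # assumed sharpening exponent, would need tuning
sharpened = raw ** gamma

# The true match (index 3) now stands out much more from the background.
ratio_before = raw[3] / raw.mean()
ratio_after = sharpened[3] / sharpened.mean()
assert ratio_after > ratio_before
```

The ranking is unchanged (a power is monotonic on [0, 1]), but the gap between the best hypothesis and the rest widens, which is what the likelihood adjustment needs.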
This is also the expected behavior resulting from the learning goal. According to the NetVLAD paper, Section 4, the goal is to make the distance between the query image and the positive smaller than the distance to the negatives. |
I was checking this other paper, and we can observe the same result I saw earlier (the descriptor is too similar to all the other descriptors); they also found that this was a problem. They did a PCA on some images similar to the dataset to find which dimensions are the most discriminative in the descriptor, then computed similarity only with those. On the left is the similarity matrix between all images of the dataset, and on the right a sample of one image compared to all the others. There is a bigger difference after applying their PCA approach. From what I understand, the NetVLAD descriptor is already the PCA result (normalized), so the first value is the most discriminative dimension. That is why we can take only the first 128 values of 4096 and get similar performance.

In that paper, they used a particle filter to smooth the detections, but they kind of hard-coded the minimum distance between descriptors to be considered as possible loop closures. In most loop closure detection papers I've seen using global descriptors, they generally check if the distance is over a fixed threshold, then geometrically verify that the best match works, and then a loop closure is detected. For example, in Kimera-VIO-NetVLAD, they seem to do the same even with their BOW approach. Another paper seems to have decent loop closure results, though the hypothesis selection is quite different from what we do in BOW; the results seem to vary more between datasets, and they suggest having a robust back-end to ignore false positives.

I'll merge the PR. While not yet super useful in rtabmap like this for loop closure detection, one could still enable NetVLAD for localization: if we assume that the robot is always in the map, just always test the best hypothesis even if it is low.
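The truncation trick described above can be sketched like this (random stand-in data; real NetVLAD output is already PCA-projected, which is what justifies keeping the leading dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in 4096-D descriptors, L2-normalized.
full = rng.standard_normal((100, 4096), dtype=np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)

# Keep only the first 128 dimensions (the most discriminative ones after
# PCA) and re-normalize; comparisons still use a plain dot product,
# with 32x less data per descriptor.
truncated = full[:, :128].copy()
truncated /= np.linalg.norm(truncated, axis=1, keepdims=True)

query = truncated[0]
similarities = truncated @ query
best = int(np.argmax(similarities))   # 0: the query matches itself
```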