-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
should we eliminate num MinHashes? (no) #1354
Comments
To me, they are useful to have in the same API for comparison as a validation that |
Thank you, this is valuable feedback!
It already kind of is, but we should be more explicit about it :) My two concerns are -
Clearer communication FTW. |
I don't think we need to remove them, but I think we should split them. Many of the misuse issues happen because we didn't check if it was a num or scaled minhash... (and splitting them also allows many perf optimizations underneath, merging would be greatly simplified...) That said, it is easier to split on the Rust side, but what would be the API on the Python side? Do we keep one MinHash object, or create |
yes. well, and because we only slowly came to accept how much better scaled MinHashes were for most things :)
agree.
I'm tentatively in favor of having two different subclasses, but would want to explore the consequences. I wonder how much of this could be explored in Python-land before making any Rust changes? |
I think it is OK to prototype by checking explicitly (everywhere, all the time) for |
Seems reasonable. There are lots of small things to explore here - see esp these big related issues, among the myriad of smaller ones:
|
while they are nice to have around and are not much of a maintenance burden, they do complicate things --
see #1345, #870 for examples.
they are also not usable for a lot of purposes.
Discuss!
The text was updated successfully, but these errors were encountered: