hash.mapping of interaction might be incorrect #67
Comments
What do you mean by overlap by collision? If there are collisions, I suppose some redundancy in the mapping is expected. Or do you mean that something else is increasing the number of collisions in the mapping?
It is an internal error: the resulting mapping might miss some interaction features if the main effects (non-interacted features) collide in the 2^32 space (before taking the modulo). The probability should be low, but I need to figure out a way to eliminate it.
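To make the failure mode concrete, here is a minimal sketch of how a hash-keyed inverse mapping can drop an interaction when two main effects share a hash. Everything in it is hypothetical: `toy_hash32` is a deliberately weak stand-in chosen so that a collision is easy to produce, and `build_inverse_mapping` and `interaction_names` are illustrative helpers, not the package's actual functions.

```python
# Toy illustration only: `toy_hash32` is a deliberately weak stand-in,
# NOT the hash function the package actually uses.
def toy_hash32(name: str) -> int:
    """Order-insensitive byte sum, so anagrams collide on purpose."""
    return sum(name.encode("utf-8")) % 2**32


def build_inverse_mapping(names):
    """Map 32-bit hash -> feature name; a later name overwrites an earlier one."""
    inverse = {}
    for name in names:
        inverse[toy_hash32(name)] = name
    return inverse


def interaction_names(left, right, inverse):
    """Rebuild interaction labels by looking hashes up in the inverse mapping."""
    labels = set()
    for a in left:
        for b in right:
            labels.add(f"{inverse[toy_hash32(a)]}:{inverse[toy_hash32(b)]}")
    return labels


main_a = ["ab", "ba"]  # these two collide under toy_hash32
main_b = ["x"]

inverse = build_inverse_mapping(main_a + main_b)
recovered = interaction_names(main_a, main_b, inverse)
expected = {f"{a}:{b}" for a in main_a for b in main_b}
missing = expected - recovered

print("recovered:", sorted(recovered))
print("missing:  ", sorted(missing))
```

Because both colliding names resolve to the same entry in the inverse mapping, only one of the two expected interaction labels can be rebuilt from it.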
This should be the last change before submitting v0.9.
Well, after reviewing the code, I think the probability of this bug occurring is very small. The bug only occurs if a feature of any kind interacts with an array-type feature and the hashed values of the array-type feature collide within a single record. For example, suppose there are two columns

To resolve this issue, I would need to sacrifice a lot of efficiency under the current structure, so I prefer to do nothing. This package was originally designed for predictive analysis and the

I'll leave this issue open to see whether anyone can give me a sufficient reason to fix it. Fixing it requires either sacrificing efficiency or rewriting the core functions.
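As a rough check on "very small", the chance of a collision among the values of one array-type feature in one record can be estimated with the birthday bound. The sketch below assumes an idealized hash that spreads distinct values uniformly and independently over the 2^32 space; `collision_probability` and `k` (the number of distinct values in the record) are names introduced here for illustration only.

```python
import math

HASH_SPACE = 2**32  # size of the pre-modulo hash space mentioned in the thread


def collision_probability(k: int, space: int = HASH_SPACE) -> float:
    """Exact P(at least one collision) for k uniform, independent hashes.

    Assumes an idealized hash function; a real one may deviate from this.
    """
    if k > space:
        return 1.0
    # P(no collision) = prod_{i=0}^{k-1} (1 - i/space), computed in log space
    # so that tiny probabilities are not lost to rounding.
    log_no_collision = sum(math.log1p(-i / space) for i in range(k))
    return -math.expm1(log_no_collision)


for k in (2, 10, 100, 1000):
    print(f"k={k:5d}  P(collision) = {collision_probability(k):.3e}")
```

For small `k` the result is close to k(k-1)/2^33, so under this assumption the probability grows roughly quadratically with the length of the array but stays tiny for short arrays.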
I fully agree with not fixing it. There will be collisions anyway. However, it would be worth adding a note about this to the detailed documentation of the hashing function. What do you think? Kind regards,
You're right! Thanks for reminding me.
I still have not found any real occurrence of this bug; it remains theoretical. I'll remove this from milestone 0.9.1 because it blocks me.
The construction of the interaction term in hash.mapping uses the inverse_mapping, which might be overlapped by collision.