-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HashTable grows even though capacity is not exceeded #602
Comments
This is because I proposed a cleanup method in #255, but you would still need a proper hasher to use that. |
There are two issues here.
There should be a way to initialize HashTable that ensures the table will not grow unless you exceed the initially given size. This is an important feature of collections. Currently with_capacity is buggy/misleading because it implies that guarantee but it does not actually uphold it. All std collections' with_capacity method guarantee this. If HashTable needs more space than the given capacity, then it is free to internally allocate more in with_capacity. It is not free to allocate this later. I'm unsure whether there should be a guarantee for not moving elements around when there is capacity. I assumed HashMap worked this way but I see now it is not stated anywhere. You probably don't want to guarantee this to get more freedom with the implementation. |
Are deletions leaving unusable entries, that can only be reused by rehashing the entire table, the only reason that capacity can go down? |
Double insert for the same key leads to the same issue, but seemingly only when capacity is equal to length: use std::collections::HashMap;
const PROBE_SIZE: usize = 7;
fn main() {
let mut map = HashMap::with_capacity(PROBE_SIZE);
for i in 0..PROBE_SIZE {
map.insert(i, 0);
map.insert(i, 0);
eprintln!(
"i = {i}, len = {}, capacity = {}",
map.len(),
map.capacity()
);
}
}
|
I think the issue is that the hashtable does not check whether the key is already present before deciding to grow. The following only does the duoble insert for the last iteration and has the same output as your own: use std::collections::HashMap;
const PROBE_SIZE: usize = 7;
fn main() {
let mut map = HashMap::with_capacity(PROBE_SIZE);
for i in 0..PROBE_SIZE {
map.insert(i, 0);
if i == PROBE_SIZE - 1{
map.insert(i, 0);
}
eprintln!(
"i = {i}, len = {}, capacity = {}",
map.len(),
map.capacity()
);
}
} |
Right, it happens here: I'm not really sure how to make it work out of the box without incurring performance penalties, but it seems like a good behavior to uphold. |
HashTable::with_capacity
docs say:The following test shows that this statement is incorrect:
Expectation:
As the assert shows, every time the loop begins there is at least one capacity left. The HashTable does not need to grow and it will not call our hasher in
HashTable::entry
.Reality:
The HashTable attempts to grow and calls the hasher, triggering the unreachable. The debug prints tell us that this happens at i=112.
This is a problem because it prevents you from writing code that relies on HashTable not growing when enough capacity has been allocated. You should be able to write such code.
The text was updated successfully, but these errors were encountered: