[hashset feature] Scan shortcut fix #1217
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff            @@
##           hashset    #1217    +/-   ##
===========================================
- Coverage    70.77%   70.36%   -0.42%
===========================================
  Files          115      115
  Lines        64714    63812     -902
===========================================
- Hits         45804    44899     -905
- Misses       18910    18913       +3
Finally you convinced me that the current logic has a problem!
This PR solves the issue indeed. I'm inclined to merge it. I just posted some suggestions.
Letting rehash follow probing chains can add a lot of latency to individual commands though. I'm not sure that's worse than adding very little latency to each scan call.
Some context: when debugging a test case of AOF rewrite, I observed probing chains of length 6000. This happened while reading an AOF file. Actually, I believe all non-empty buckets of the table were in the same probing chain.
Reading an AOF file means reading commands and executing them. The AOF file created by AOF rewrite has the commands stored per key, and the keys are in iterator order = probing order. Do you see the problem?
All commands that hash to the same bucket in a large table come together at the beginning of the AOF file, creating huge probing chains in small tables before growing kicks in. A SET command takes 300 times longer than normal. I tried to let rehashing follow probing chains, but that caused even worse spikes, with long probe chains plus rehashing of very long chains.
Here's the test case: https://github.com/valkey-io/valkey/pull/1178/files#r1814492725
If we could let the iterator walk in some other order (such as index++), we could avoid this, but that doesn't completely solve the problem either. I guess there's simply a chance of getting very large probe chains, and this causes severe degradation.
So what can we do? Maybe keep track of not only the number of 'everfulls' but also the maximum probe chain length, and grow the table if it gets too long? This might help (a rough sketch follows below).
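To make that idea concrete, here is a minimal sketch of growth driven by probe chain length, assuming open addressing; the names (toy_table, toy_grow, toy_note_probe_chain) and the threshold are hypothetical, not taken from hashset.c:

#include <stddef.h>

/* Hypothetical sketch, not valkey's actual code: in addition to counting
 * 'everfull' buckets, remember the longest probe chain followed on insert
 * and grow the table once it exceeds a threshold. */
#define MAX_PROBE_CHAIN 64 /* illustrative threshold, not from the source */

typedef struct toy_table {
    size_t num_buckets;
    size_t max_chain_seen;
} toy_table;

static void toy_grow(toy_table *t) {
    t->num_buckets *= 2;   /* placeholder for the real grow/rehash logic */
    t->max_chain_seen = 0; /* long chains are broken up by rehashing */
}

/* Call after each insert with the length of the probe chain it followed. */
static void toy_note_probe_chain(toy_table *t, size_t chain_len) {
    if (chain_len > t->max_chain_seen) t->max_chain_seen = chain_len;
    if (t->max_chain_seen > MAX_PROBE_CHAIN) toy_grow(t);
}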
But I think the best solution is Madelyn's idea about chaining instead of probing.
I think we need to implement it. It has other benefits too, like
- we can have higher fill factor and we can avoid rehashing indefinitely when there's a fork running;
- scan doesn't need to follow chains wrapping around zero, so it won't return duplicates.
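For reference, a rough sketch of what per-bucket chaining could look like; the struct layout and names here are hypothetical, not a proposal for the actual hashset.c layout:

#include <stddef.h>

#define ELEMENTS_PER_BUCKET 7

/* Hypothetical chained bucket: overflow goes into a linked child bucket
 * instead of spilling into neighbouring buckets. */
typedef struct chained_bucket {
    unsigned char presence;               /* bitmap of used slots */
    void *elements[ELEMENTS_PER_BUCKET];
    struct chained_bucket *next;          /* overflow chain, NULL if none */
} chained_bucket;

/* Scanning one bucket only walks its own chain; it never crosses into other
 * buckets, so a scan cursor never needs to wrap around zero. */
static void scan_bucket(chained_bucket *b, void (*fn)(void *element)) {
    for (; b != NULL; b = b->next) {
        for (int i = 0; i < ELEMENTS_PER_BUCKET; i++) {
            if (b->presence & (1 << i)) fn(b->elements[i]);
        }
    }
}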
src/hashset.c
Outdated
-    /* Mask the start cursor to the bigger of the tables, so we can detect if we
+    /* Mask the start cursor to the smaller of the tables, so we can detect if we
      * come back to the start cursor and break the loop. It can happen if enough
      * tombstones (in both tables while rehashing) make us continue scanning. */
-    cursor = cursor & (expToMask(s->bucket_exp[0]) | expToMask(s->bucket_exp[1]));
+    cursor &= expToMask(s->bucket_exp[0]);
+    if (hashsetIsRehashing(s)) {
+        cursor &= expToMask(s->bucket_exp[1]);
+    }
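Assuming expToMask(exp) returns (1 << exp) - 1 (an assumption here; the helper's definition lives elsewhere in hashset.c), ANDing the two masks is equivalent to masking to the smaller table, while the old OR kept the bigger table's mask:

#include <stdint.h>
#include <assert.h>

static uint64_t expToMask(int exp) { return ((uint64_t)1 << exp) - 1; }

int main(void) {
    uint64_t big = expToMask(4);   /* 16-bucket table, mask 0x0F */
    uint64_t small = expToMask(1); /*  2-bucket table, mask 0x01 */
    assert((big | small) == 0x0F); /* old code: OR keeps the bigger mask   */
    assert((big & small) == 0x01); /* new code: AND keeps the smaller mask */
    return 0;
}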
With this change, I guess there's a small increased risk of returning duplicates.
Time 1:
- Table size: 16 buckets.
- Scan order: 0 -> 8 -> 4 -> 12 -> 2 -> 10 -> 6 -> 14 -> 1 -> ... -> 15.
- SCAN 0 returned new cursor 8.
Time 2:
- Table sizes 16 and 2. Shrinking started. Scan order: 0 -> 1.
- User continues the previous scan with SCAN 8.
- Old implementation: SCAN 8 returns elements from small table bucket 0 (masked with small table mask) and large table bucket 8 (masked with large table mask). Large table bucket 0 can safely be skipped.
- New implementation: SCAN 8 returns elements from small table bucket 0 (masked with small table mask) and large table buckets 0 and 8 (expansion of cursor masked with small table mask).
- Both implementations: return new cursor 1.
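As a side note, the scan order quoted above (0 -> 8 -> 4 -> 12 -> ...) comes from incrementing the cursor's bits in reverse order; a simplified standalone version of that increment, not the hashset.c code itself, might look like:

#include <stdint.h>
#include <stdio.h>

/* Reverse the bits of a 64-bit word (simple loop version). */
static uint64_t rev(uint64_t v) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

int main(void) {
    uint64_t mask = 15; /* 16-bucket table */
    uint64_t cursor = 0;
    do {
        printf("%llu ", (unsigned long long)cursor);
        /* Increment the cursor's most significant bit first. */
        cursor |= ~mask;
        cursor = rev(cursor);
        cursor++;
        cursor = rev(cursor);
    } while (cursor != 0);
    printf("\n"); /* prints: 0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15 */
    return 0;
}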
Hmm, I see! Yes, that makes sense. I think it'll be fixed if I don't modify cursor here and instead apply the masks to start_cursor.
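A rough sketch of that alternative, using stand-in types and helpers (the real definitions in hashset.c may differ): keep the working cursor unmodified and only mask the saved start_cursor that is used to detect that the scan has wrapped.

#include <stdint.h>
#include <stdbool.h>

/* Stand-ins so the sketch compiles; not the real hashset.c definitions. */
typedef struct hashset_stub { int bucket_exp[2]; int rehash_idx; } hashset_stub;
static uint64_t expToMask(int exp) { return ((uint64_t)1 << exp) - 1; }
static bool isRehashing(hashset_stub *s) { return s->rehash_idx != -1; }

/* Compute the masked start cursor once; the cursor itself stays untouched,
 * so the expansion into the larger table is unaffected by this masking. */
static uint64_t maskedStartCursor(hashset_stub *s, uint64_t cursor) {
    uint64_t start_cursor = cursor & expToMask(s->bucket_exp[0]);
    if (isRehashing(s)) start_cursor &= expToMask(s->bucket_exp[1]);
    return start_cursor;
}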
Yes, maybe. But it doesn't matter much, especially if we want to change probing to chaining anyway... Let's not spend too much effort on this.
Force-pushed d33eec8 to 0849972: …duplicates when a shrink-rehash started since the last scan step (Signed-off-by: Rain Valentine <[email protected]>)
Force-pushed 0849972 to 8388f85
With chaining instead of probing, I guess we don't need this fix anymore. Thanks anyway!
This is targeting the hashset branch and will update the hashset PR when merged. This fixes the issue I saw where scan can miss elements when we're rehashing: #1186 (comment)
The unit test fails without my fix and passes with it 😁