
New hash table #1186

Merged
zuiderkwast merged 3 commits into unstable from hashset on Dec 10, 2024

Conversation

@zuiderkwast (Contributor) commented Oct 17, 2024

This PR implements a new hash table and uses it for keys, command lookup and more. There are multiple commits. Do not squash-merge.

  1. Hash table implementation and unit tests
  2. Use new hash table for command lookup
  3. Use new hash table for keys and expire

Hash table design

The hash table is cache-line optimized and implemented as outlined in #169, but changed to chaining instead of probing after an idea by Madelyn. The key-value entry is user-defined, which allows the user to embed key and value within a single allocation. The hash table supports incremental rehashing, scan, random key, etc. just like the dict, but faster and using less memory. Each bucket contains a few bits of metadata per entry. For details, see the comments in src/hashtable.{c,h}.

If a bucket is full, the last entry pointer in a bucket can be replaced by a child-bucket pointer and we get a bucket chain.

      Bucket          Bucket          Bucket               
-----+---------------+---------------+---------------+-----
 ... | x x x x x x p | x x x x x x x | x x x x x x x | ... 
-----+-------------|-+---------------+---------------+-----
                   |                                       
                   v  Child bucket                         
                 +---------------+                         
                 | x x x x x x p |                         
                 +-------------|-+                         
                               |                           
                               v  Child bucket             
                             +---------------+             
                             | x x x x x x x |             
                             +---------------+             
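In C terms, a bucket can be sketched roughly like this (a sketch only, assuming 64-byte cache lines and 8-byte pointers; field names and widths are illustrative, see src/hashtable.c for the real definition):

    /* Illustrative cache-line-sized bucket: 7 entry slots plus one byte
     * of metadata per slot. */
    typedef struct bucket {
        uint8_t chained : 1;  /* last slot holds a child-bucket pointer */
        uint8_t presence : 7; /* one bit per slot: occupied or empty */
        uint8_t hashes[7];    /* secondary hash byte per slot, to skip
                               * most key comparisons on collisions */
        void *entries[7];     /* entry pointers; the last one may point
                               * to a child bucket when chained is set */
    } bucket;                 /* 1 + 7 + 7 * 8 = 64 bytes */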

Command lookup

The 2nd commit replaces dict with hashtable for command lookup. This was implemented by @SoftlyRaining.

Keys and expire

The 3rd commit replaces dict with the new hash table in kvstore.c and all code that uses it, such as db.c.

The hashtable entry in this case is the robj struct. The key and optionally an expire timestamp are embedded in the robj struct, i.e. the key is embedded in the value. Therefore, we can call this a valkey object, val + key. This design saves roughly 20 bytes per key for short string keys.

 keys                                      expire
hashtable             robj                hashtable
 +---+         +-------------------+        +---+
 | 1 --------->| type, enc, lru    |<-------- 1 |
 | 2 |         | refcount, flags   |        | 2 |
 | 3 |         | ptr               |        | 3 |
 | . |         | [expire]          |        | . |
 | . |         | embedded key      |        | . |
 | . |         | [embedded value]  |        | . |
 +---+         +-------------------+        +---+

[.] = optional
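In C terms, the layout sketched above is roughly the following (illustrative only; the real struct and its accessors live in server.h and object.c):

    /* Sketch of a valkey object with embedded key and optional expire.
     * Field names follow the diagram above, not the actual code. */
    struct valkeyObjectSketch {
        unsigned type : 4;
        unsigned encoding : 4;
        unsigned lru : 24;  /* LRU time or LFU data */
        int refcount;       /* plus some flag bits */
        void *ptr;          /* the value (may point at the embedded value) */
        /* Followed in the same allocation by:
         *   [int64_t expire]    optional expire timestamp
         *   embedded key        length-prefixed string
         *   [embedded value]    optional, for short string values */
    };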

Some db.c functions like dbAdd, setKey and setExpire now reallocate the value object to embed the key and optional expire in it. setKey does not increment the reference counter, since it would require duplicating the object.
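A hedged sketch of what this means for callers, based on the dbAdd(c->db, key, &obj) call visible in a diff later in this thread (exact signatures may differ):

    /* Because embedding the key may reallocate the object, dbAdd takes a
     * pointer to the caller's robj pointer and updates it in place. */
    robj *obj = createStringObject("value", 5);
    dbAdd(c->db, key, &obj); /* obj may now point to a new allocation */
    /* The old pointer value must not be reused; per the refcount note
     * above, the reference is now owned by the database. */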

Fixes #991
Fixes #992

@zuiderkwast zuiderkwast changed the title Add hashset (a new hash table implementation) and use in command lookup Add hashset (new hash table) and use in command lookup Oct 17, 2024
codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 82.72543% with 341 lines in your changes missing coverage. Please review.

Project coverage is 70.90%. Comparing base (f951a1c) to head (3559ba8).
Report is 6 commits behind head on unstable.

Files with missing lines Patch % Lines
src/hashtable.c 74.21% 231 Missing ⚠️
src/module.c 0.00% 31 Missing ⚠️
src/memory_prefetch.c 0.00% 18 Missing ⚠️
src/kvstore.c 95.50% 12 Missing ⚠️
src/defrag.c 81.48% 10 Missing ⚠️
src/server.c 93.46% 10 Missing ⚠️
src/acl.c 74.28% 9 Missing ⚠️
src/object.c 94.69% 7 Missing ⚠️
src/debug.c 63.63% 4 Missing ⚠️
src/io_threads.c 0.00% 4 Missing ⚠️
... and 3 more
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1186      +/-   ##
============================================
+ Coverage     70.85%   70.90%   +0.04%     
============================================
  Files           118      119       +1     
  Lines         63612    64652    +1040     
============================================
+ Hits          45075    45839     +764     
- Misses        18537    18813     +276     
Files with missing lines Coverage Δ
src/aof.c 80.20% <100.00%> (+0.07%) ⬆️
src/bitops.c 93.77% <100.00%> (ø)
src/cluster.c 88.51% <100.00%> (ø)
src/cluster_legacy.c 86.86% <100.00%> (+0.22%) ⬆️
src/config.c 77.65% <100.00%> (+0.02%) ⬆️
src/evict.c 97.71% <100.00%> (-0.04%) ⬇️
src/expire.c 96.59% <100.00%> (+0.02%) ⬆️
src/geo.c 93.58% <100.00%> (-0.02%) ⬇️
src/hyperloglog.c 92.23% <100.00%> (ø)
src/latency.c 80.92% <100.00%> (+0.04%) ⬆️
... and 25 more

... and 6 files with indirect coverage changes

@zuiderkwast marked this pull request as ready for review October 17, 2024 14:10
@zuiderkwast requested a review from madolson October 17, 2024 14:11
@xingbowang (Contributor) commented Oct 17, 2024

For command lookup, since the command table is rarely updated, why not build an efficient trie? Or would it be even slower?
Or try a perfect hash function: https://en.wikipedia.org/wiki/Perfect_hash_function

@madolson (Member) left a comment

Just a partial review (everything but hashset.c, which I guess is the interesting stuff) and tests. I've blocked off time tomorrow to read through it :) Exciting stuff!

@zuiderkwast (Contributor, Author)

> For command lookup, since the command table is rarely updated, why not build an efficient trie? Or would it be even slower? Or try a perfect hash function: https://en.wikipedia.org/wiki/Perfect_hash_function

@xingbowang For the command table, yes, a perfect hash function would be even better. The main point of this implementation is to use it for the database keys though, which will come in another PR. (A trie is possibly slower because there might be more memory accesses.)

@soloestoy (Member) left a comment

Excellent work!

To be honest, I didn't like open-addressing hash tables before, because the tombstones generated by deletions have a very negative impact on both performance and space. I felt they were not suitable for online services with a large number of deletion operations.

However, this design is quite clever: deleting an element does not create a tombstone. Only a bucket that has been full at some point is considered to contain tombstones, which greatly reduces the impact of deletion operations.

A bucket's length is 7, so in terms of chained hashing, a tombstone effect would only occur when there are 7 collisions, which intuitively seems to be a relatively rare situation.

In theory, this should save memory. Compared to the previous chained hashing, it saves two pointers (for the previous and next dictEntry) and adds one hash value, saving about 15 bytes per element. However, the frequency of resizing might be higher, as resizing is triggered not only by the fill factor but also by the proportion of tombstones. Do we have reliable tests for space and performance?

@zuiderkwast (Contributor, Author)

> In theory, this should save memory. Compared to the previous chained hashing, it saves two pointers (for the previous and next dictEntry) and adds one hash value, saving about 15 bytes per element. However, the frequency of resizing might be higher, as resizing is triggered not only by the fill factor but also by the proportion of tombstones. Do we have reliable tests for space and performance?

@soloestoy I'm glad you like the design.

In my WIP branch that uses this for keys, I get 10-15% better performance for GET compared to unstable.

I currently set the fill factor to 77%, and with 7/8 of each bucket used for entry pointers (the remaining 1/8 is metadata bits), that means about 67% of the allocation is pointers (0.875 × 0.77 ≈ 0.67).

Rehashing triggered by the proportion of tombstones is a corner case. I didn't implement it at first, but I noticed in the defrag tests that the test code tries to create a lot of fragmentation: it adds many keys and then eviction deletes many of them, so the table became full of tombstones. This does not happen very often, and I don't think it will affect performance.

There is a potential problem with CoW during fork though. The dict can avoid rehashing until it reaches a 500% fill factor. This is not possible with open addressing; we have to rehash at some point. I currently set the hard limit to 90% and hope it will not be a major problem. In the "resize avoid" mode, resizing is allowed but incremental rehashing is paused, so only new keys are added to the new table. This also avoids destroying the CoW.

@madolson (Member) left a comment

Like 75% review of hashset.c, and my brain is done with reviewing. Overall looks really solid though.

The one thing that occurred to me while I was reviewing is that instead of using the everfull bit to start probing, we could use it to indicate the last position is a pointer to another bucket and do linked list resolution. This does a couple of things:

  1. Prevents significant chain lengths, since collisions won't cause issues.
  2. Keeps the overall scanning implementation much more similar to the current implementation.
  3. We get most of the performance benefit from the cache lines and the nextCursorCall() is likely almost as random as a random DRAM lookup. (Which we'll probably prefetch anyways?)

@SoftlyRaining (Contributor)

> Like 75% review of hashset.c, and my brain is done with reviewing. Overall looks really solid though.
>
> The one thing that occurred to me while I was reviewing is that instead of using the everfull bit to start probing, we could use it to indicate the last position is a pointer to another bucket and do linked list resolution. This does a couple of things:
>
> 1. Prevents significant chain lengths, since collisions won't cause issues.
> 2. Keeps the overall scanning implementation much more similar to the current implementation.
> 3. We get _most_ of the performance benefit from the cache lines and the nextCursorCall() is likely almost as random as a random DRAM lookup.

For what it's worth, I can make nextCursor() use sequential, in-memory order for rehashing, scanning, and iterator if we use the most significant bits instead of the least significant bits of the hash for bucket index. The math for iterating the two different-sized tables in parallel for rehash and scan would be a little different but not bad - basically bit shifting instead of masking.

At any rate, I plan to add prefetch hints - not sure how much that would eliminate the benefit of sequential memory access.

@zuiderkwast (Contributor, Author)

> The one thing that occurred to me while I was reviewing is that instead of using the everfull bit to start probing, we could use it to indicate the last position is a pointer to another bucket and do linked list resolution.

@madolson This is a great idea. It reminds me of "bulk chaining" in the SegCache paper. I think it can be done without changing the API, so it can be done at any point.

I believe it has many benefits, like we can postpone rehashing almost indefinitely like the dict does when there's a fork. We can also prevent scan from returning duplicates (when it wraps around zero following a probe chain).

> For what it's worth, I can make nextCursor() use sequential, in-memory order for rehashing, scanning, and iterator if we use the most significant bits instead of the least significant bits of the hash for bucket index. The math for iterating the two different-sized tables in parallel for rehash and scan would be a little different but not bad - basically bit shifting instead of masking.

@SoftlyRaining This sounds interesting. Memory is fetched per cache line, so I'm wondering what the benefits of sequential access are compared to the current reverse-bit increment. Systems with larger cache lines, sure, but that's only the MacBook so far? Does predictive memory fetching assume that cache lines are accessed sequentially?

@SoftlyRaining (Contributor)

> This sounds interesting. Memory is fetched per cache line, so I'm wondering what the benefits of sequential access are compared to the current reverse-bit increment. Systems with larger cache lines, sure, but that's only the MacBook so far? Does predictive memory fetching assume that cache lines are accessed sequentially?

@zuiderkwast AFAIK you're right about cache lines. The benefit would be making the access pattern predictable, so the hardware prefetcher can understand it and prefetch buckets efficiently. My guess is that it'd perform almost as well as hand-tuned prefetch hints.
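For context, the reverse-binary cursor increment discussed above looks roughly like this (a sketch of the well-known dict.c scan cursor algorithm, not code from this PR):

    /* Reverse all bits of v, as in dict.c's rev(). */
    static size_t rev(size_t v) {
        size_t s = sizeof(v) * 8;
        size_t mask = ~(size_t)0;
        while ((s >>= 1) > 0) {
            mask ^= (mask << s);
            v = ((v >> s) & mask) | ((v << s) & ~mask);
        }
        return v;
    }

    /* Advance a scan cursor by incrementing the table-index bits in
     * reversed order, so that doubling or halving the table between
     * calls never skips a bucket. mask covers the index bits. */
    static size_t nextCursor(size_t cursor, size_t mask) {
        cursor |= ~mask;      /* set the bits above the index bits */
        cursor = rev(cursor); /* reverse: the index MSB becomes the LSB */
        cursor++;             /* increment the reversed cursor */
        return rev(cursor);   /* reverse back */
    }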

@zuiderkwast (Contributor, Author) commented Nov 7, 2024

Hello. I've changed the design based on Madelyn's idea of bucket chaining. It's no longer open addressing.

> The one thing that occurred to me while I was reviewing is that instead of using the everfull bit to start probing, we could use it to indicate the last position is a pointer to another bucket and do linked list resolution. This does a couple of things:
>
> 1. Prevents significant chain lengths, since collisions won't cause issues.
> 2. Keeps the overall scanning implementation much more similar to the current implementation.
> 3. We get most of the performance benefit from the cache lines and the nextCursorCall() is likely almost as random as a random DRAM lookup. (Which we'll probably prefetch anyways?)

Furthermore,

  1. there is no hard limit on fill factor, so we can postpone rehashing indefinitely when there is a fork (as we do for dict)
  2. iteration and incremental rehashing are not coupled with the scan cursor increment
  3. scan no longer returns duplicates or an extremely large number of elements in a single iteration
  4. it can work with a higher fill factor and I believe it will use less memory in practice

@zuiderkwast (Contributor, Author)

@touitou-dan Hello! Can you help with a prefetching implementation for this hash table?

@touitou-dan

Hey @zuiderkwast, I'd be happy to help! However, I'd first like the opportunity to hear about the new implementation. Would you be available for a chat?

@rjd15372 (Member) left a comment

I've only reviewed the hashset.[hc] code. I'll continue reviewing the remaining files later today.

@rjd15372 (Member) commented Nov 13, 2024

@zuiderkwast I finished the review, and I don't have more comments apart from the ones I sent this morning. Very nice design and implementation!

@JimB123 (Contributor) left a comment

I really like that we plan to combine key/value/metadata into a single reference counted "element" and that the main dict just becomes a set of reference counted elements. Using an independent reference counted element makes it much easier to perform internal asynchronous operations.

I mostly reviewed the API definition. My main comments are:

  1. I don't think we should be differentiating between a "key" and an "element". An element IS A key.
  2. I don't think we should ever be exposing pointers into the structure. Instead, an opaque hashsetPosition can have appropriate accessor methods.
  3. Keep the opaque stuff opaque. Don't let stack allocation be a reason to expose internals.

@zuiderkwast force-pushed the hashset branch 2 times, most recently from ed3126a to 9df0601, November 28, 2024 14:40
@hwware (Member) commented Nov 28, 2024

> > In theory, this should save memory. Compared to the previous chained hashing, it saves two pointers (for the previous and next dictEntry) and adds one hash value, saving about 15 bytes per element. However, the frequency of resizing might be higher, as resizing is triggered not only by the fill factor but also by the proportion of tombstones. Do we have reliable tests for space and performance?
>
> @soloestoy I'm glad you like the design.
>
> In my WIP branch that uses this for keys, I get 10-15% better performance for GET compared to unstable.
>
> I currently set the fill factor to 77%, and with 7/8 of each bucket used for entry pointers (the remaining 1/8 is metadata bits), that means about 67% of the allocation is pointers (0.875 × 0.77 ≈ 0.67).
>
> Rehashing triggered by the proportion of tombstones is a corner case. I didn't implement it at first, but I noticed in the defrag tests that the test code tries to create a lot of fragmentation: it adds many keys and then eviction deletes many of them, so the table became full of tombstones. This does not happen very often, and I don't think it will affect performance.
>
> There is a potential problem with CoW during fork though. The dict can avoid rehashing until it reaches a 500% fill factor. This is not possible with open addressing; we have to rehash at some point. I currently set the hard limit to 90% and hope it will not be a major problem. In the "resize avoid" mode, resizing is allowed but incremental rehashing is paused, so only new keys are added to the new table. This also avoids destroying the CoW.

I have not reviewed the changes carefully, but you mentioned 10-15% better performance for GET. I think the main point is the memory used. If so, could you also share the QPS performance? I am not sure how the changes affect QPS. Thanks!

@zuiderkwast (Contributor, Author)

> I have not reviewed the changes carefully, but you mentioned 10-15% better performance for GET. I think the main point is the memory used. If so, could you also share the QPS performance? I am not sure how the changes affect QPS. Thanks!

@hwware Thanks for looking. I did a new benchmark just now.

My simple benchmark setup: valkey-server --save ''. Run valkey-benchmark on the same computer.

Unstable:

$ ./valkey-benchmark -P 10 --threads 4 -n 10000000 -r 1000000 -q -t set,get
SET: 1050972.12 requests per second, p50=0.415 msec                     
GET: 1080847.38 requests per second, p50=0.399 msec                     

$ ./valkey-cli info memory | grep used_memory_human
used_memory_human:70.05M

This branch:

$ ./valkey-benchmark -P 10 --threads 4 -n 10000000 -r 1000000 -q -t set,get
SET: 1110864.25 requests per second, p50=0.383 msec                     
GET: 1248283.62 requests per second, p50=0.351 msec                     

$ ./valkey-cli info memory | grep used_memory_human
used_memory_human:63.40M

My laptop runs Linux on a MacBook Pro M2. It is an unusual setup, yes, but others got similar results. @SoftlyRaining @uriyage, you have done more benchmarks on more typical server hardware, with and without IO threads, and with larger value sizes. Please share.

The improved speed is explained by fewer memory accesses, especially to memory that is not already in the L1 cache. Such an access is very slow, so counting memory accesses is a good way to optimize.

  • In this benchmark, the key and value are small and both are embedded in one robj, in one allocation of less than 64 bytes, so it is just one memory access.
  • The hashtable stores a pointer directly to this robj. To avoid comparing the key on every hash collision, some extra hash bits are stored per entry, reducing the probability of a false match to 1/256. This means that normally only one memory access is required to find this pointer.

So the total is 2 memory accesses in most cases. For dict with the key embedded in the dictEntry (current unstable), we have a minimum of 3 memory accesses, more if there are hash collisions.

dict -> dictEntry -> ... -> dictEntry -> robj
          (if there are collisions)
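As a sketch of the new lookup fast path described above, reusing the illustrative bucket layout from the PR description (the helper entryKeyMatches() is hypothetical):

    /* Probe one bucket: compare a byte of secondary hash before
     * dereferencing the entry pointer, so the robj is only touched on a
     * real match or a 1/256 false positive. */
    static bool entryKeyMatches(void *entry, const char *key); /* assumed */

    static void *lookupInBucket(bucket *b, uint8_t hash_byte, const char *key) {
        int slots = b->chained ? 6 : 7; /* last slot may be a child pointer */
        for (int i = 0; i < slots; i++) {
            if ((b->presence & (1 << i)) && b->hashes[i] == hash_byte) {
                void *entry = b->entries[i];     /* the robj; key embedded */
                if (entryKeyMatches(entry, key)) return entry;
            }
        }
        return NULL; /* not here; follow the child bucket if b->chained */
    }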

@madolson (Member) left a comment

I'm through everything now, and nothing stands out as a major concern to me right now. I think it should be good to merge ASAP.

        return 0;
    }

    if (ht->type->resizeAllowed) {
Member:

Why does the resizeAllowed callback override the hard-limit check? I would expect this to be inside the conditional.

@zuiderkwast (Contributor, Author):

The idea here is that we want to force a resize if the fill factor is above the hard limit, i.e. the table is too full and we need to expand, so we ignore resizeAllowed in this case. Additionally, this way the fill factor (a double) is only computed when needed.

For context, these are the lines below this one:

    if (ht->type->resizeAllowed) {
        double fill_factor = (double)min_capacity / ((double)numBuckets(old_exp) * ENTRIES_PER_BUCKET);
        if (fill_factor * 100 < MAX_FILL_PERCENT_HARD && !ht->type->resizeAllowed(alloc_size, fill_factor)) {
            /* Resize callback says no. */
            return 0;
        }
    }

Maybe you are thinking that we also need to forbid the user from explicitly shrinking the table to a very small size, which would take us above the max fill factor? resize is an internal function, and the only way a user can try to shrink the table is via hashtableShrinkIfNeeded, so a resize can't take you above the hard max fill factor.

src/hashtable.c Outdated
*
* It's allowed to insert and replace entries. Deleting entries is only allowed
* for the entry that was just returned by hashtableNext. Deleting other entries
* is not supported. (It can cause internal fragmentation.)
Member:

It seems like we are trying to handle the internal fragmentation, so is this actually a constraint?

@zuiderkwast (Contributor, Author):

Not sure what you mean. The code that handles internal fragmentation in iterator and scan only works for the bucket chain that was just iterated/scanned. If the user deletes from another bucket chain while iterating or from the scan callback, it will leave holes that we don't handle.

 * bucket chain, before we move on to the next index. */
if (iter->hashtable->pause_rehash == 1 &&
    iter->hashtable->used[iter->table] < iter->last_seen_size) {
    compactBucketChain(iter->hashtable, iter->index, iter->table);
Member:

More of a follow-up: maybe we should make the safe iterator have the same guarantees as SCAN, in that you have to process everything in the bucket. Defrag should be fine, since it can queue everything up for later defrag, and active expire should be able to cover everything. I don't think there is any other iteration where items can change between invocations. If we do that, we never need any of this bucket compaction code.

@zuiderkwast (Contributor, Author):

I'm not sure how we can do that. You mean that we should allow the fillBucketHole function to run on delete even while iterating? It reorders the entries in the bucket chain. How would we make sure all entries in the chain are returned exactly once?

@zuiderkwast force-pushed the hashset branch 2 times, most recently from d6da54f to dfbea9d, December 4, 2024 18:13
@enjoy-binbin added the release-notes and run-extra-tests labels Dec 6, 2024
@enjoy-binbin (Member) left a comment

I didn't review everything, but overall LGTM. Great job!

@@ -276,9 +276,9 @@ void restoreCommand(client *c) {
     }

     /* Create the key and set the TTL if any */
-    dbAdd(c->db, key, obj);
+    dbAdd(c->db, key, &obj);
Contributor:

Shouldn't we use setKey here with TTL as a flag to avoid reallocation?

@zuiderkwast (Contributor, Author) commented Dec 9, 2024

setKey doesn't take a TTL either. I want to improve this in a follow-up to provide something like dbAddWithExpire and setKeyWithExpire to avoid reallocation. It needs some refactoring and I don't want this PR to become larger than what it already is.

@madolson (Member) left a comment

I think all my concerns are addressed, so let's get it merged :)

@SoftlyRaining (Contributor) left a comment

Finally finished reviewing all of it! I think it's looking pretty good - my comments are just a few small things I don't think are critical to address. :)

zuiderkwast and others added 3 commits December 10, 2024 13:38
A cache-line aware hash table with a user-defined key-value entry type,
supporting incremental rehashing, scan, iterator, random sampling,
incremental lookup and more...

Signed-off-by: Viktor Söderqvist <[email protected]>
This changes the type of command tables from dict to hashtable. Command
table lookup takes ~3% of overall CPU time in benchmarks, so it is a
good candidate for optimization.

My initial SET benchmark comparison suggests that hashtable is about 4.5
times faster than dict and this replacement reduced overall CPU time by
2.79% 🥳

---------

Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Rain Valentine <[email protected]>
Instead of a dictEntry with pointers to key and value, the hashtable
has a pointer directly to the value (robj) which can hold an embedded
key and acts as a key-value in the hashtable. This minimizes the number
of pointers to follow and thus the number of memory accesses to lookup
a key-value pair.

        Keys         robj
      hashtable
      +-------+   +-----------------------+
      | 0     |   | type, encoding, LRU   |
      | 1 ------->| refcount, expire      |
      | 2     |   | ptr                   |
      | ...   |   | optional embedded key |
      +-------+   | optional embedded val |
                  +-----------------------+

The expire timestamp (TTL) is also stored in the robj, if any. The expire
hash table points to the same robj.

Overview of changes:

* Replace dict with hashtable in kvstore (kvstore.c)
* Add functions for embedding key and expire in robj (object.c)
  * When there's unused space, reserve an expire field to avoid reallocating
    it later if expire is added.
  * Always reserve space for expire for large key names to avoid realloc
    if it's set later.
* Update db functions (db.c)
  * dbAdd, setKey and setExpire reallocate the object when embedding a key
  * setKey does not increment the reference counter, since it would require
    duplicating the object. This responsibility is moved to the caller.
* Remove logic for shared integer objects as values in the database. The keys
  are now embedded in the objects, so all objects in the database need to be
  unique. Thus, we can't use shared objects as values. Also delete test cases
  for shared integers.
* Adjust various commands to the changes mentioned above.
* Adjust defrag code
  * Improvement: Don't access the expires table before defrag has actually
    reallocated the object.
* Adjust test cases that were using hard-coded sizes for dict when realloc
  would happen, and some other adjustments in test cases.
* Adjust memory prefetch for new hash table implementation in IO-threading,
  using new `hashtableIncrementalFind` API
* Adjust offloading of free() to IO threads: Object free to be done in main
  thread while keeping obj->ptr offloading in IO-thread since the DB object is
  now allocated by the main-thread and not by the IO-thread as it used to be.
* Let expireIfNeeded take an optional value, to avoid looking up the expires
  table when possible.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: uriyage <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Uri Yagelnik <[email protected]>
@zuiderkwast merged commit 3eb8314 into unstable Dec 10, 2024
72 of 73 checks passed
@zuiderkwast (Contributor, Author)

Merged. If anyone was going to post comments, please post them anyway. I can handle them as follow-ups.

@enjoy-binbin deleted the hashset branch December 11, 2024 04:25
zuiderkwast added a commit that referenced this pull request Dec 14, 2024
The new `hashtable` provides faster lookups and uses less memory than
`dict`.

A TCL test case "SRANDMEMBER with a dict containing long chain" is
deleted because it's covered by a hashtable unit test
"test_random_entry_with_long_chain", which is already present.

This change also moves some logic from dismissMemory (object.c) to
zmadvise_dontneed (zmalloc.c), so the hashtable implementation which
needs the dismiss functionality doesn't need to depend on object.c and
server.h.

This PR follows #1186.

---------

Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
kronwerk pushed a commit to kronwerk/valkey that referenced this pull request Jan 27, 2025
(Same commit message as the one referenced above, with the PR reference rendered as valkey-io#1186.)
zuiderkwast added a commit that referenced this pull request Feb 9, 2025
With verbose logging, this message is logged every 5 seconds.

Before #1186, there were 128 slots in a database with 100 keys. Slots
here means the number of entries that can be stored on the top-level of
a hash table, which in the dict was the same as the number of buckets.
With the new hash table, there are multiple entries per bucket, so the
number of buckets is not a useful metric to show.

This PR multiplies the number of buckets with the number of entry
positions per bucket. With this change, the log message is instead

36236:M 09 Feb 2025 15:47:18.923 - DB 9: 100 keys (0 volatile) in 112
slots HT.

Signed-off-by: Viktor Söderqvist <[email protected]>
Labels
release-notes (This issue should get a line item in the release notes), run-extra-tests (Run extra tests on this PR: runs all tests from daily except valgrind and RESP)

Projects
Status: Done

Development
Successfully merging this pull request may close these issues: "Embed key and TTL in robj", "Implement new hash table".