Embed key and TTL in robj #992

zuiderkwast · 2024-09-04T17:59:02Z

This is to prepare for storing robj in a new hash table implementation.

This includes replace dict with hashset in kvstore, db, expire, ...

The new hash table implementation doesn't have a "dict entry", nor does it have keys and values. It just has "elements". An element can effectively be anything that contains the key. The user provides a "get key" callback function in the hashsetType (same idea as the callbacks in dictType).

 Hash table
+-----------+        +----------------+
| 1         |        | Object with    |
| 2       ---------> | embedded key,  |
| 3         |        | value, LRU and |
+-----------+        | more...        |
                     +----------------+

To use an robj as a container for both key and value, we can embed the key into the robj. The robj (typedef'ed to struct serverObject) is accessed in very many places, so we can't easily make it opaque and change all the accesses to it. That would be very much work. Instead, we can add some extra fields (for optional embedded key and embedded TTL) and keep the existing fields. Wherever the robj is passed around in legacy code, an embedded key will just be ignored.

struct serverObject {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : LRU_BITS; /* LRU_BITS = 24 */
    int refcount;
};

           robj
+-----------------------+
| type | encoding | LRU |
|-----------------------|
| refcount              |
|-----------------------|      +--------+
| ptr             ------------>| Value  |
+-----------------------+      +--------+

There's is already something called embedded string (obj->encoding = OBJ_ENCODING_EMBSTR, created by createEmbeddedStringObject) which allocates extra space after the struct, copies the string contents (an sds string) into the same allocation and points ptr to it. We can call this "embedded value" and it's not the same as an embedded key, which doesn't exist yet.

     robj with embedded value (legacy)
    +-----------------------+
    | type | encoding | LRU |
    |-----------------------|
    | refcount              |
    |-----------------------|
  ,---ptr                   |
 |  |-----------------------+
 |  | sds header            |
  `-> sds contents          |
    |                       |
    +-----------------------+

The sds strings are somewhat weird since they are pointers to the string contents, and they also have a header which is before the content (so internally, the sds library does s[-1] to look at the bytes before the content). That's why ptr points to the sds content, rather than to the sds header.

The ptr pointer takes up some space (8 bytes) in the struct and it would be possible to avoid it (by computing it instead), but it's accessed in thousands of places just like obj->ptr so let's leave it like that for now. We can still embed the key. We can avoid adding another pointer to the embedded key though, if we use the same method that was used in #541 to embed a key in dictEntry.

To embed also a key (another sds string) into the same robj, let's put the key before the value, because the key never changes. It can just be deleted, and then the whole robj is deleted. The value can change though (shrink or grows, possibly be moved out to its own allocation).

Let's use a bit from refcount to flag that there's an embedded key. We can also embed TTL and add a "has ttl" flag.

 struct serverObject {
     unsigned type : 4;
     unsigned encoding : 4;
     unsigned lru : LRU_BITS; /* LRU_BITS = 24 */
-    int refcount;
+    unsigned hasexpire : 1;
+    unsigned hasembkey : 1;
+    int refcount : 29;
     void *ptr;
+    /* Embedded stuff */
 };

The data[] field isn't strictly needed, since it's zero size by default and it's possible to allocate more space and use it anyway, but with a data field, we can access the end of the struct like obj->data and &obj->data[offset].

     Future layout of robj
    +-----------------------+
    | type | encoding | LRU |
    |-----------------------|
    | embkey | hasttl       |
    |-----------------------|
    | refcount              |
    |-----------------------|
  ,---ptr                   |
 |  |-----------------------|
 |  | expire                | Embedded expire (TTL)
 |  |-----------------------| (optional)
 |  | size of sds header    |
 |  | sds header            | Embedded key
 |  | sds contents          | (optional)
 |  |-----------------------|
 |  | sds header            |
  `-> sds contents          | Embedded value
    |                       | (optional)
    +-----------------------+

Note the added "size of sds header" in the drawing above. There are different variants of the sds header, with different header sizes. (See src/sds.h.) Currently, createEmbeddedStringObject always embeds the header struct sdshdr8 (which is a header of 3 bytes and the max length of strings of this sds variant 256 bytes, since it uses one byte to stores the length of the string). Embedded strings are (currently) only used for short strings anyway.

When we embed keys, though, we want to embed them regardless of their size. Thus we want to able to use different sds headers. We add one byte to store the size of the sds header, so we can compute where the sds content is, and from that we can call sdslen and other sds functions to figure out the length of the sds string and where it ends.

Now, what we need is a function to combine two robj into one, with an embedded key and a (embedded or not) value.

If we are smart, we can convert an object with an embedded value into an object with an embedded key and no value (ptr = NULL), without reallocating or moving the actual value. That's why I'm suggesting (not sure if it's a good idea) that we add a "sds header size" byte also for embedded value. When we convert an robj with embedded string into an robj with embedded key and no value, we just set the obj->embkey = 1; obj->ptr = NULL;.

Update: The strings from a client are parsed and stored as robj objects in c->argv and this argument vector needs to be valid even after the command has been executed, for replication and other purposes, so when we're handling a command like SET k v, k and v are both robj with a valid string value "k" and "v" respectively. We need to embed "k" as embedded key in the v object, without destroying k or turning the k into something else. Currently (as of valkey 8.0-rc1) the keys are embedded in a dictEntry and they're copied there (which seems fine, performance wise). Similarly we need to copy the string content of k into the 'embedded-key' part of v and then add v to the hash table.

The text was updated successfully, but these errors were encountered:

madolson · 2024-09-04T23:26:27Z

We would ideally align TTL, since most arithmetic operations require 8 byte alignment on x86 and ARM to be most performant, so it would be before the key when possible.

SoftlyRaining · 2024-09-04T23:57:43Z

I'm looking into it

zuiderkwast · 2024-09-05T04:05:02Z

@madolson so shall we place TTL before key? We'd have to memmove key in some cases if TTL is added or removed, if we want to save memory.

JimB123 · 2024-09-05T15:52:22Z

We probably want a reasonably small size on the embedded key (and embedded value). Since jemalloc size buckets get progressively larger, the wasted space (internal fragmentation) will get (on average) increasingly larger as the key/value size increases.

Re: TTL and alignment, I'm with Maddy that I like to ensure alignment if reasonably possible. However from everything I've read, there's not much alignment penalty on modern processors. (But it really bugs me to have things unaligned by design.) Regarding TTL being added/removed, I suggest that after TTL is added, it is never removed. Any key which had a TTL at one time, is likely to have a TTL again. Also, removing the TTL most likely won't result in memory savings anyway as the jemalloc bucket size is less likely to change once we start embedding keys and values (using larger buckets).

zuiderkwast · 2024-09-05T16:51:09Z

Hello Jim! I agree with most of this. Regarding key, I would like to always
embed it though. The key needs to be allocated somewhere and there can be
internal fragmentation there in any case. We can think of the robj fields as a
header to the key, if it makes you happier. :) Of course we can analyze the most
popular key sizes and the available allocation sizes to compare too.

Regarding TTL, we can put it before embedded key (but then we need to memmove the key
the first time TTL is added, if SET + EXPIRE is used rather than SET with EX),
or we can put TTL after the key but aligned and waste a few bytes, or make it unaligned. Not sure which is best, but an unaligned access is most likely cheap compared to another cache miss.

I think the value should be embedded if it fits with the key and ttl within 64 bytes (one cache line).

Otherwise, I think value should only be embedded if it fits within the usable
size of the allocation bin we would already get when embedding the key and a TTL
field. In this way, there's no penalty if the value later grows and we need to
move it to its own allocation.

We may want to avoid reallocating small objects to move them between the 48B and
64B bins, but we would need to update the references to them if we do that. The
same object gets added in the key space and in the expire hash table. It's safe
to do it if refounc = 1 though.

component	sizes 1	sizes 2	sizes 3
type, encoding, refcount, LRU	8	8	8
ptr	8	8	8
TTL field (optional)	8	0	0
embedded key overhead	2	2	2
embedded key content	20	15	15
embedded value (optional)	0	0	30
Sum	44	33	63
Jemalloc bin	48	48	64
Unused space	4	15	1

Anyway, we should provide an abstraction for this and provide functions to get expire, get embedded key, etc. Later, we'll be able to change the logic for when to embed what.

madolson assigned SoftlyRaining Sep 4, 2024

zuiderkwast mentioned this issue Sep 30, 2024

Re-thinking the main hash table #169

Open

5 tasks

zuiderkwast assigned zuiderkwast and unassigned SoftlyRaining Oct 14, 2024

zuiderkwast mentioned this issue Oct 22, 2024

[hashset feature] Replace dict with hashset for keys and expire #1178

Closed

zuiderkwast linked a pull request Oct 22, 2024 that will close this issue

[hashset feature] Replace dict with hashset for keys and expire #1178

Closed

zuiderkwast linked a pull request Nov 18, 2024 that will close this issue

New hash table #1186

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Embed key and TTL in robj #992

Embed key and TTL in robj #992

zuiderkwast commented Sep 4, 2024 •

edited

Loading

madolson commented Sep 4, 2024

SoftlyRaining commented Sep 4, 2024

zuiderkwast commented Sep 5, 2024

JimB123 commented Sep 5, 2024

zuiderkwast commented Sep 5, 2024 •

edited

Loading

Embed key and TTL in robj #992

Embed key and TTL in robj #992

Comments

zuiderkwast commented Sep 4, 2024 • edited Loading

madolson commented Sep 4, 2024

SoftlyRaining commented Sep 4, 2024

zuiderkwast commented Sep 5, 2024

JimB123 commented Sep 5, 2024

zuiderkwast commented Sep 5, 2024 • edited Loading

zuiderkwast commented Sep 4, 2024 •

edited

Loading

zuiderkwast commented Sep 5, 2024 •

edited

Loading