
Add maxmemory-reserved parameter for evicting key earlier to avoid OOM #831

Open · wants to merge 24 commits into base: unstable
Conversation

hwware
Member

@hwware hwware commented Jul 26, 2024

All cases assume a pure caching setup, with no secondary storage on disk.
When the user sets maxmemory in the config file, an OOM (out of memory) error can be returned to the user once Valkey reaches the maxmemory threshold.

The PR Incremental eviction processing redis/redis#7653 solves some of the OOM cases, but in some scenarios OOM still happens. With this PR, we can reduce the probability of OOM or delay it, especially in heavy-write cases.

Example:

We have one VM with 100GB of memory, and the user sets maxmemory to 10GB.

If the Valkey server is under a heavy-write workload with big key-value pairs (200MB each), and we increase maxmemory-eviction-tenacity to adapt to this situation, then before this PR the following happens:
step 0: used memory is 10GB
step 1: a new write command is processed successfully; used memory reaches 10.2GB (no eviction happens)
step 2: a new write command arrives; a 150MB key is evicted, used memory is 10.05GB, and OOM happens
step 3: the user retries the previous write command; another 150MB key is evicted, used memory drops just below 10GB,
and the write command is processed successfully

With this PR, we can set maxmemory-reserved so that the eviction threshold becomes 9GB. Thus, once Valkey reaches 9GB of used memory, the key eviction process begins.

step 0: used memory is 9GB
step 1: a new write command is processed successfully; used memory reaches 9.2GB (no eviction happens)
step 2: a second write command arrives; a 150MB key is evicted, and used memory is 9.05GB;
the write command executes successfully, and used memory rises to 9.25GB
step 3: a third write command arrives; a 150MB key is evicted, and used memory is 9.1GB;
the write command executes successfully, and used memory rises to 9.3GB
...
After roughly 20 such steps, OOM happens.

From the above example, we can see that even though used memory keeps increasing under a heavy-write workload, we can significantly delay the OOM.
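The arithmetic in the example can be sketched as a toy simulation (a minimal sketch; the function name, the sizes, and the one-eviction-per-write model are illustrative assumptions, not Valkey's actual eviction loop):

```c
#include <stddef.h>

/* Toy model of the example above, all sizes in MB. Each step: if used
 * memory is above the eviction threshold, evict one 150MB key, then
 * apply a 200MB write. Returns how many writes succeed before used
 * memory first exceeds maxmemory. Purely illustrative. */
long steps_until_over_hard_max(long used, long threshold, long maxmemory,
                               long write_sz, long evict_sz) {
    long steps = 0;
    while (used <= maxmemory) {
        if (used > threshold) used -= evict_sz; /* early "soft" eviction */
        used += write_sz;                       /* the write itself */
        steps++;
    }
    return steps;
}
```

With the threshold equal to maxmemory (no reserve), the very first oversized write pushes usage past the limit; with a 1GB reserve (threshold 9216MB, maxmemory 10240MB) this simplified model takes 18 steps, the same ballpark as the "roughly 20 steps" estimate above.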

@hwware hwware force-pushed the maxmemory-reserved-parameter branch from e31dd8b to b8a98df Compare July 26, 2024 15:39
@hwware hwware requested review from PingXie, enjoy-binbin and madolson and removed request for PingXie and enjoy-binbin July 26, 2024 15:52

codecov bot commented Jul 26, 2024

Codecov Report

Attention: Patch coverage is 83.01887% with 9 lines in your changes missing coverage. Please review.

Project coverage is 71.03%. Comparing base (aced268) to head (4be3b3c).
Report is 9 commits behind head on unstable.

Files with missing lines Patch % Lines
src/config.c 69.23% 4 Missing ⚠️
src/module.c 0.00% 3 Missing ⚠️
src/evict.c 87.50% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #831      +/-   ##
============================================
+ Coverage     70.87%   71.03%   +0.16%     
============================================
  Files           121      121              
  Lines         65203    65282      +79     
============================================
+ Hits          46212    46375     +163     
+ Misses        18991    18907      -84     
Files with missing lines Coverage Δ
src/server.c 87.72% <100.00%> (+0.09%) ⬆️
src/server.h 100.00% <ø> (ø)
src/evict.c 98.10% <87.50%> (-0.76%) ⬇️
src/module.c 9.59% <0.00%> (ø)
src/config.c 78.24% <69.23%> (-0.16%) ⬇️

... and 23 files with indirect coverage changes

@hwware hwware force-pushed the maxmemory-reserved-parameter branch from b8a98df to ba13a0c Compare August 29, 2024 14:50
@hwware hwware force-pushed the maxmemory-reserved-parameter branch from ba13a0c to e27e9de Compare September 9, 2024 16:16
@enjoy-binbin
Member

This seems like an interesting feature. I did not review it; just dropping a comment to approve the concept.

@madolson
Member

I also like the idea. I just haven't really spent enough time thinking about it. Memory management is a big area in Valkey we need to improve.

@hwware hwware force-pushed the maxmemory-reserved-parameter branch from e27e9de to 0a68013 Compare September 16, 2024 17:10
@hwware hwware force-pushed the maxmemory-reserved-parameter branch from 0a68013 to 9596c78 Compare September 24, 2024 01:14
Member

@PingXie PingXie left a comment


Took a quick look.

@PingXie
Member

PingXie commented Sep 25, 2024

+1 on introducing "back pressure" earlier. I feel that this could be used with `maxmemory-eviction-tenacity` to give an even smoother eviction experience.

@hwware hwware force-pushed the maxmemory-reserved-parameter branch 2 times, most recently from 35437eb to db966fe Compare September 26, 2024 02:46
Member

@PingXie PingXie left a comment


Thanks @hwware! LGTM overall. Can you add some tests too?

@hwware
Member Author

hwware commented Oct 2, 2024

Thanks @hwware! LGTM overall. Can you add some tests too?

Sure, will do. Thanks!

@hwware hwware force-pushed the maxmemory-reserved-parameter branch 4 times, most recently from f7cc8c8 to 7a5584b Compare October 7, 2024 09:31
src/evict.c Outdated
@@ -398,11 +398,12 @@ int getMaxmemoryState(size_t *total, size_t *logical, size_t *tofree, float *lev
if (total) *total = mem_reported;

/* We may return ASAP if there is no need to compute the level. */
if (!server.maxmemory) {
if (!server.maxmemory_available) {
Member


Thinking about this more, I feel that it is more preferable to model this new setting as a "soft" maxmemory, which can trigger key eviction earlier but won't cause OOM to be returned before hitting the actual maxmemory. Otherwise, we effectively create an alias of maxmemory. More specifically, I think performEviction should only return EVICT_FAIL when the memory usage goes beyond the real maxmemory.

Additionally, since getMaxmemoryState is also used outside of the OOM prevention path such as in VM_GetUsedMemoryRatio, we should consider parameterizing maxmemory and having it passed in by the caller instead, so that we can maintain the same semantics in these externally facing scenarios.

Thoughts?

Member Author

@hwware hwware Oct 18, 2024


As you suggested, we can treat "maxmemory_available" as a soft maxmemory. Then you might think we should not return OOM; in that case, how can we return a message to the client? Any ideas?

Member


right - how about just processing the command normally?

  1. if used_memory is below soft max, no change to the existing logic
  2. if used_memory is above soft max but below hard max, trigger key eviction and continue with the normal processing
  3. if used_memory is above hard max, no change to the existing logic, i.e., trigger key eviction and fail the command if used_memory is still above hard max after the eviction
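The three cases above can be sketched as a small decision helper (hypothetical names and signature; this is not the actual performEvictions logic, just the proposed policy under the stated assumptions):

```c
#include <stddef.h>

/* Hypothetical sketch of the proposed three-tier policy. soft_max is
 * maxmemory minus the reserved margin; hard_max is the real maxmemory.
 * used_after_evict is what eviction managed to bring usage down to. */
typedef enum { ALLOW, ALLOW_AFTER_EVICT, REJECT_OOM } write_decision;

write_decision decide_write(size_t used, size_t soft_max, size_t hard_max,
                            size_t used_after_evict) {
    if (used <= soft_max) return ALLOW;              /* case 1: no change */
    if (used <= hard_max) return ALLOW_AFTER_EVICT;  /* case 2: evict, then proceed */
    /* case 3: above hard max - evict, and fail only if still above it */
    return used_after_evict > hard_max ? REJECT_OOM : ALLOW_AFTER_EVICT;
}
```

Under this sketch, an OOM error is only possible in case 3, which keeps the new config from becoming an alias of maxmemory.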

@hwware hwware force-pushed the maxmemory-reserved-parameter branch 3 times, most recently from 200b203 to 4da32a2 Compare October 18, 2024 05:54
@hwware hwware force-pushed the maxmemory-reserved-parameter branch from 4da32a2 to 510265f Compare October 28, 2024 13:42
hwware added 16 commits February 6, 2025 19:14
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
@hwware
Member Author

hwware commented Feb 6, 2025

The INFO field mem_not_counted_for_evict should include the reserved memory, right? In our docs, we say that eviction happens when

used_memory - mem_not_counted_for_evict > maxmemory

so we should make sure this formula is still correct. Docs here: https://valkey.io/topics/lru-cache/

No, I do not think so. The mem_not_counted_for_evict value (computed by freeMemoryGetNotCountedMemory()) is the size of the replica output buffers and the AOF buffer, which is excluded from the used-memory count.

However, the reserved memory is the amount set aside from maxmemory. It should not include the AOF and replica output buffer parts.

@hwware hwware force-pushed the maxmemory-reserved-parameter branch from 5638c62 to 4be3b3c Compare February 6, 2025 19:16
@zuiderkwast
Contributor

The INFO field mem_not_counted_for_evict should include the reserved memory, right? In our docs, we say that eviction happens when

used_memory - mem_not_counted_for_evict > maxmemory

so we should make sure this formula is still correct. Docs here: https://valkey.io/topics/lru-cache/

No, I do not think so. The mem_not_counted_for_evict value (computed by freeMemoryGetNotCountedMemory()) is the size of the replica output buffers and the AOF buffer, which is excluded from the used-memory count.

However, the reserved memory is the amount set aside from maxmemory. It should not include the AOF and replica output buffer parts.

So... instead we should update the docs to explain the formula like this?

used_memory - mem_not_counted_for_evict > maxmemory - maxmemory_reserved
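Under this reading, the trigger condition is a one-line predicate (a sketch; eviction_needed is a hypothetical helper, with the field names taken from INFO and from this PR's proposed config):

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch of the updated trigger: eviction starts once dataset memory
 * (used memory minus the buffers not counted for eviction) exceeds
 * maxmemory minus the reserved margin. */
bool eviction_needed(size_t used_memory, size_t mem_not_counted_for_evict,
                     size_t maxmemory, size_t maxmemory_reserved) {
    return used_memory - mem_not_counted_for_evict >
           maxmemory - maxmemory_reserved;
}
```

With maxmemory_reserved set to 0 this reduces to the formula currently in the docs.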

@soloestoy
Member

Since what we actually do is early key/data eviction, how about changing the name to maxmemory-key-eviction or maxmemory-dataset? That would also align with maxmemory-clients.

@zuiderkwast
Contributor

@soloestoy maxmemory-key-eviction, maxmemory-dataset, and your earlier suggestion maxmemory-eviction-threshold are all good names, but don't they all mean the total memory to use, rather than a margin below maxmemory?

@hwware
Member Author

hwware commented Feb 7, 2025

Azure uses the maxmemory-reserved name, which aligns with our goal; see https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-memory-management.

I posted a screenshot here:
[screenshot of the Azure memory-management documentation]

So let's discuss the naming issue in the weekly meeting. Thanks!

Comment on lines 5789 to +5793
"maxmemory:%lld\r\n", server.maxmemory,
"maxmemory_human:%s\r\n", maxmemory_hmem,
"maxmemory_policy:%s\r\n", evict_policy,
"maxmemory_reserved:%lld\r\n", server.maxmemory_reserved,
"maxmemory_reserved_human:%s\r\n", maxmemory_reserved_hmem,
Member

@madolson madolson Feb 7, 2025


Suggested change
"maxmemory:%lld\r\n", server.maxmemory,
"maxmemory_human:%s\r\n", maxmemory_hmem,
"maxmemory_policy:%s\r\n", evict_policy,
"maxmemory_reserved:%lld\r\n", server.maxmemory_reserved,
"maxmemory_reserved_human:%s\r\n", maxmemory_reserved_hmem,
"maxmemory:%lld\r\n", server.maxmemory - server.maxmemory_reserved,
"maxmemory_human:%s\r\n", maxmemory_reserved_hmem,
"maxmemory_policy:%s\r\n", evict_policy,
"maxmemory_for_key_evictions:%lld\r\n", server.maxmemory - server.maxmemory_reserved,
"maxmemory_for_key_evictions_human:%s\r\n", <this needs to be computed I guess>,

I would like to consider this. From a user perspective, they want to see the actual limit; they don't want to do the math themselves.

Member Author


I don't like doing the math either; agreed. I will update it after we finalize the naming for this PR. Thanks.

Contributor


Users still need to do some math to know when eviction happens. mem_not_counted_for_evict is part of the formula.

The math will be easier for users if we add a field like free_memory_below_eviction_threshold defined (if my math is right) like this:

maxmemory - maxmemory_reserved - used_memory + mem_not_counted_for_evict

With a field like this, users can just check this to know how near eviction is.

I remember this doc PR and linked issue: valkey-io/valkey-doc#194. It shows how hard it is for users to understand this math.
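The proposed metric is just that arithmetic (a sketch; the function name mirrors the suggested INFO field and is hypothetical). A signed type matters here, since the value goes negative once usage is already past the threshold:

```c
/* Sketch of the suggested free_memory_below_eviction_threshold field:
 * positive means this much headroom remains before eviction starts,
 * negative means the eviction threshold has already been crossed. */
long long free_below_eviction_threshold(long long maxmemory,
                                        long long maxmemory_reserved,
                                        long long used_memory,
                                        long long mem_not_counted_for_evict) {
    return maxmemory - maxmemory_reserved - used_memory +
           mem_not_counted_for_evict;
}
```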

Member

@PingXie PingXie left a comment


The core logic looks good overall, but I still have a few open comments that need to be addressed.

Looks like we will close on the naming and metrics in the next meeting so I skipped them.

@@ -1279,6 +1279,13 @@ acllog-max-len 128
#
# maxmemory-policy noeviction

# `maxmemory-reserved` specifies a fixed amount of memory set aside from `maxmemory`.
Member


I am good with changing <bytes> to 0 but then we should be explicit in the comments that this config expects "bytes"

Suggested change
# `maxmemory-reserved` specifies a fixed amount of memory set aside from `maxmemory`.
# `maxmemory-reserved` specifies a fixed amount of memory in bytes set aside from `maxmemory`.

serverLog(LL_WARNING, "WARNING: the difference between memory usage and maxmemory is less than reserved memory. "
"This will result in key eviction depending on the maxmemory-policy. But server can still accept new write commands.");
}
startEvictionTimeProc();
Member


I think it is confusing to trigger the key eviction cron job when updating maxmemory-reserved, since we already said this new config is all about "opportunistic" key eviction on the data access path.

@@ -3329,6 +3344,7 @@ standardConfig static_configs[] = {

/* Unsigned Long Long configs */
createULongLongConfig("maxmemory", NULL, MODIFIABLE_CONFIG, 0, ULLONG_MAX, server.maxmemory, 0, MEMORY_CONFIG, NULL, updateMaxmemory),
createULongLongConfig("maxmemory-reserved", NULL, MODIFIABLE_CONFIG, 0, ULLONG_MAX, server.maxmemory_reserved, 0, MEMORY_CONFIG, NULL, updateMaxmemoryReserved),
Member


Have we used negative values for percentages already? I vaguely remember we did, but I'm not 100% sure. Also, have we used the negative-value hack for anything other than percentages?

I am good either way (adding percentage support or not).

Comment on lines +545 to 548
} else {
result = EVICT_OK; /* used_memory greater than key_eviction_memory, but not reach OOM */
}
goto update_metrics;
Member


I think we still want to perform eviction when used_memory is greater than key_eviction_memory.

Suggested change
} else {
result = EVICT_OK; /* used_memory greater than key_eviction_memory, but not reach OOM */
}
goto update_metrics;
goto update_metrics;
}

@@ -697,7 +706,7 @@ int performEvictions(void) {
* across the dbAsyncDelete() call, while the thread can
* release the memory all the time. */
if (server.lazyfree_lazy_eviction) {
if (getMaxmemoryState(NULL, NULL, NULL, NULL) == C_OK) {
if (getMaxmemoryState(NULL, NULL, NULL, NULL, server.key_eviction_memory) == C_OK) {
Member


I see - updating mem_used here essentially moves the goalposts. I think the variable name (mem_used) is the source of my confusion. Agreed with you that we don't need to update mem_used.

@madolson
Member

Decision: We're going to wait until after RC1 to decide what to do about this. It sounds like @hwware had some internal use case where this was important related to storing data on disk. Wen, can you follow up with the specific use case you had in mind for this?

@hwware
Member Author

hwware commented Feb 10, 2025

Decision: We're going to wait until after RC1 to decide what to do about this. It sounds like @hwware had some internal use case where this was important related to storing data on disk. Wen, can you follow up with the specific use case you had in mind for this?

I just updated the top comment and added one specific example.
Also, I misremembered the scenario: we use this parameter internally with a caching system, not in storage cases.

@soloestoy
Member

@hwware I looked at the issue you described, and I think it's because we currently only attempt to perform eviction when executing commands (including read commands). I think we should also check maxmemory and execute startEvictionTimeProc in serverCron, which would also resolve the issue you mentioned. Of course, maxmemory-key-eviction is still necessary.

@hwware
Member Author

hwware commented Feb 11, 2025

startEvictionTimeProc

I do not want to add startEvictionTimeProc() to serverCron, because unnecessary eviction could be triggered.
One example:

Assume we have maxmemory 10GB and maxmemory-key-eviction 9GB.
At some timestamp, Valkey reaches 8.9GB, and a new write command with a 300MB key executes successfully.
Thus the server's used memory is 9.2GB, which is over the maxmemory-key-eviction setting.

If the server then receives no write commands for a long time, theoretically no key eviction happens and used memory stays at 9.2GB.

However, if we add startEvictionTimeProc() to serverCron, the key eviction process could run and some keys would be evicted from the server.

@madolson
Member

And the server has not received the write command for a long time, then theoretically, there is no key eviction and used memory keeps on 9.2GB.

Is staying at 200MB over maxmemory an issue? Basically Redis was designed to have a soft maxmemory limit, in that you can exceed it. You should set maxmemory so that you have some amount of additional buffer on the host. I agree with Zhao that if we think that temporarily breaching that limit is a problem, we should probably consider doing eviction after the write command was executed.

If Valkey server is working on a heavy-write process with key-value pair is big (200MB), and we increase maxmemory-eviction-tenacity to adapt to this situation. Before we apply this PR, the following actions will happen:
step 0: used memory is 10GB
step 1: a new write command processes successfully, used memory reaches 10.2GB (no eviction happens)
step 2: a new write command comes, 150MB key is evicted, and used memory is 10.05GB, OOM happens
step 3: User retries the previous write command, another 150MB key is evicted. And used memory is just below 10GB,
the write command can be processed successfully

I don't think this is how Valkey behaves. In step 2, it will keep evicting keys until it's below 10GB. If we start the background job, we will still accept the command. We only mark the command as OOM if the performEvictions function returns EVICT_FAIL.

@hwware
Member Author

hwware commented Feb 13, 2025

Is staying at 200MB over maxmemory an issue? Basically Redis was designed to have a soft maxmemory limit, in that you can exceed it. You should set maxmemory so that you have some amount of additional buffer on the host. I agree with Zhao that if we think that temporarily breaching that limit is a problem, we should probably consider doing eviction after the write command was executed.

I think we should consider, from both a system perspective and a user perspective, whether being 200MB over maxmemory is acceptable.
Assume one use case: after a write command is executed, we are 200MB over maxmemory. If we then run key eviction, 204,800 keys (at 1KB per key) could be evicted. That means if no more write commands arrive, the user will find 204,800 keys missing from the server.
Do we think this case is reasonable?

If Valkey server is working on a heavy-write process with key-value pair is big (200MB), and we increase maxmemory-eviction-tenacity to adapt to this situation. Before we apply this PR, the following actions will happen:
step 0: used memory is 10GB
step 1: a new write command processes successfully, used memory reaches 10.2GB (no eviction happens)
step 2: a new write command comes, 150MB key is evicted, and used memory is 10.05GB, OOM happens
step 3: User retries the previous write command, another 150MB key is evicted. And used memory is just below 10GB,
the write command can be processed successfully

I don't think this is how Valkey behaves. In step 2, it will keep evicting keys until it's below 10GB. If we start the background job, we will still accept the command. We only mark the command as OOM if the performEvictions function returns EVICT_FAIL.

Yes, I agree; this is currently how Valkey works. And my PR could improve it as well.

For example:
In the first CPU cycle, the server is 200MB over maxmemory. A new write command arrives and key eviction is needed. But unfortunately, EVICT_FAIL is returned during the eviction process, and the user receives the OOM error.
However, if we have maxmemory_key_eviction, then once the server reaches it, even if no suitable key is found during eviction (if (bestkey) is false), we still allow the write command to execute, and no OOM is thrown.

Secondly, my PR is much friendlier to big keys. Assume a 200MB big key arrives, and under the current design only 20MB of keys can be evicted per CPU cycle. The big key would then need to wait for 10 CPU cycles. With my PR, after one CPU cycle the 200MB big key can be written to the server and does not need to wait another 9 CPU cycles.
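The big-key arithmetic above is a simple ceiling division (illustrative numbers; cycles_needed is a hypothetical helper, not code from this PR):

```c
/* How many eviction cycles a write must wait for if bytes_to_free must
 * be reclaimed and at most evict_per_cycle bytes can be freed per cycle. */
int cycles_needed(int bytes_to_free, int evict_per_cycle) {
    /* round up: a partial cycle still counts as a full cycle of waiting */
    return (bytes_to_free + evict_per_cycle - 1) / evict_per_cycle;
}
```

With a 200MB key and 20MB evicted per cycle this gives 10 cycles; under the soft limit the write can proceed after the first cycle instead.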

I am not sure whether the above use cases will convince you. Thanks!

Labels
None yet
Projects
Status: After RC1
7 participants