-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add maxmemory-reserved parameter for evicting key earlier to avoid OOM #831
base: unstable
Are you sure you want to change the base?
Conversation
e31dd8b
to
b8a98df
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #831 +/- ##
============================================
+ Coverage 70.87% 71.03% +0.16%
============================================
Files 121 121
Lines 65203 65282 +79
============================================
+ Hits 46212 46375 +163
+ Misses 18991 18907 -84
|
b8a98df
to
ba13a0c
Compare
ba13a0c
to
e27e9de
Compare
this seem like a interesting feature, did not review, just drop a comment that approve the concept. |
I also like the idea. I just haven't really spent enough time thinking about it. Memory management is a big area in Valkey we need to improve. |
e27e9de
to
0a68013
Compare
0a68013
to
9596c78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a quick look.
+1 on introducing "back pressure" earlier. I feel that this could be used with `maxmemory_eviction_tenacity" to give an even smoother eviction experience. |
35437eb
to
db966fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hwware! LGTM overall. Can you add some tests too?
Sure, ready to work, Thanks |
f7cc8c8
to
7a5584b
Compare
src/evict.c
Outdated
@@ -398,11 +398,12 @@ int getMaxmemoryState(size_t *total, size_t *logical, size_t *tofree, float *lev | |||
if (total) *total = mem_reported; | |||
|
|||
/* We may return ASAP if there is no need to compute the level. */ | |||
if (!server.maxmemory) { | |||
if (!server.maxmemory_available) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I feel that it is more preferable to model this new setting as a "soft" maxmemory
, which can trigger key eviction earlier but won't cause OOM to be returned before hitting the actual maxmemory
. Otherwise, we effectively create an alias of maxmemory
. More specifically, I think performEviction
should only return EVICT_FAIL
when the memory usage goes beyond the real maxmemory
.
Additionally, since getMaxmemoryState
is also used outside of the OOM prevention path such as in VM_GetUsedMemoryRatio
, we should consider parameterizing maxmemory
and having it passed in by the caller instead, so that we can maintain the same semantics in these externally facing scenarios.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like what you suggested, if we can think when "maxmemory_available" is available, it is a soft maxmemory. Then you maybe think we should not return OOM, in the case, how we can return message to client, any idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right - how about just processing the command normally?
- if
used_memory
is below soft max, no change to the existing logic - if
used_memory
is above soft max but below hard max, trigger key eviction and continue with the normal processing - if
used_memory
is above hard max, no change to the existing logic, i.e., trigger key eviction and fail the command if theused_memory
is still above hard max after the eviction)
200b203
to
4da32a2
Compare
4da32a2
to
510265f
Compare
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
No, i do not think so. The mem_not_counted_for_evict (aka called by freeMemoryGetNotCountedMemory()) However, the reserved memory is the size memory stepped away from the maxmemory. It should not include the AOF and replica output buffer part. |
5638c62
to
4be3b3c
Compare
So.... instead we should update the docs to explain the formula like this?
|
since what we actually do is early key/data eviction, how about change the name to |
@soloestoy maxmemory-key-eviction or maxmemory-dataset, you also suggested maxmemory-eviction-threshold. These are good names, but these names all mean the total memory to use, not a margin below maxmemory? |
Azure uses maxmemory-reserved name is aligned with our goal, you can check the link https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-memory-management. And I post the screenshot here So let's discuss the naming issue in weekly meeting, Thanks |
"maxmemory:%lld\r\n", server.maxmemory, | ||
"maxmemory_human:%s\r\n", maxmemory_hmem, | ||
"maxmemory_policy:%s\r\n", evict_policy, | ||
"maxmemory_reserved:%lld\r\n", server.maxmemory_reserved, | ||
"maxmemory_reserved_human:%s\r\n", maxmemory_reserved_hmem, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"maxmemory:%lld\r\n", server.maxmemory, | |
"maxmemory_human:%s\r\n", maxmemory_hmem, | |
"maxmemory_policy:%s\r\n", evict_policy, | |
"maxmemory_reserved:%lld\r\n", server.maxmemory_reserved, | |
"maxmemory_reserved_human:%s\r\n", maxmemory_reserved_hmem, | |
"maxmemory:%lld\r\n", server.maxmemory - server.maxmemory_reserved, | |
"maxmemory_human:%s\r\n", maxmemory_reserved_hmem, | |
"maxmemory_policy:%s\r\n", evict_policy, | |
"maxmemory_for_key_evictions:%lld\r\n", server.maxmemory - server.maxmemory_reserved, | |
"maxmemory_for_key_evictions_human:%s\r\n", <this needs to be computed I guess>, |
I would like to consider this. from a user perspective, they want to see the actual limit, they don't want to do the math themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like to do the mathematics as well. Agree with you. I will update it after we finalize the naming for this pr. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users still need to do some math to know when eviction happens. mem_not_counted_for_evict
is part of the formula.
The math will be easier for users if we add a field like free_memory_below_eviction_threshold
defined (if my math is right) like this:
maxmemory - maxmemory_reserved - used_memory + mem_not_counted_for_evict
With a field like this, users can just check this to know how near eviction is.
I remember this doc PR and linked issue: valkey-io/valkey-doc#194. It's the problem for users to understand this math.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core logic looks good overall, but I still have a few open comments that need to be addressed.
Looks like we will close on the naming and metrics in the next meeting so I skipped them.
@@ -1279,6 +1279,13 @@ acllog-max-len 128 | |||
# | |||
# maxmemory-policy noeviction | |||
|
|||
# `maxmemory-reserved` specifies a fixed amount of memory set aside from `maxmemory`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am good with changing <bytes>
to 0
but then we should be explicit in the comments that this config expects "bytes"
# `maxmemory-reserved` specifies a fixed amount of memory set aside from `maxmemory`. | |
# `maxmemory-reserved` specifies a fixed amount of memory in bytes set aside from `maxmemory`. |
serverLog(LL_WARNING, "WARNING: the difference between memory usage and maxmemory is less than reserved memory. " | ||
"This will result in key eviction depending on the maxmemory-policy. But server can still accept new write commands."); | ||
} | ||
startEvictionTimeProc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is confusing to trigger the key eviction cron job when updating maxmemory-reserveed
, since we already said this new config is all about "oppurtunistic" key eviction on the data access path.
@@ -3329,6 +3344,7 @@ standardConfig static_configs[] = { | |||
|
|||
/* Unsigned Long Long configs */ | |||
createULongLongConfig("maxmemory", NULL, MODIFIABLE_CONFIG, 0, ULLONG_MAX, server.maxmemory, 0, MEMORY_CONFIG, NULL, updateMaxmemory), | |||
createULongLongConfig("maxmemory-reserved", NULL, MODIFIABLE_CONFIG, 0, ULLONG_MAX, server.maxmemory_reserved, 0, MEMORY_CONFIG, NULL, updateMaxmemoryReserved), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we used negative values for percentage already? I vaguely remember we did but not 100% sure. Also have we used the negative value hack for things other percentage?
I am good either way (adding percentage support or not).
} else { | ||
result = EVICT_OK; /* used_memory greater than key_eviction_memory, but not reach OOM */ | ||
} | ||
goto update_metrics; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still want to perform eviction when used_memory is greater than key_eviction_memory I think
} else { | |
result = EVICT_OK; /* used_memory greater than key_eviction_memory, but not reach OOM */ | |
} | |
goto update_metrics; | |
goto update_metrics; | |
} |
@@ -697,7 +706,7 @@ int performEvictions(void) { | |||
* across the dbAsyncDelete() call, while the thread can | |||
* release the memory all the time. */ | |||
if (server.lazyfree_lazy_eviction) { | |||
if (getMaxmemoryState(NULL, NULL, NULL, NULL) == C_OK) { | |||
if (getMaxmemoryState(NULL, NULL, NULL, NULL, server.key_eviction_memory) == C_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - updating mem_used
here essentially moves the goal post. I think the variable name (mem_used
) is the source of my confusion. agreed with you that we don't need to update mem_used
Decision: We're going to wait until after RC1 to decide what to do about this. It sounds like @hwware had some internal use case where this was important related to storing data on disk. Wen, can you follow up with the specific use case you had in mind for this? |
I just update the top comment and give one specific example. |
@hwware I looked at the issue you described, and I think it's because we currently only attempt to perform eviction when executing commands (including reading commands). I think we should also check |
I do not want to add startEvictionTimeProc() in the serverCron because some unnecessary eviction process could be triggered. Assume we have maxmemory 10GB, maxmemory-key-eviction 9GB. And the server has not received the write command for a long time, then theoretically, there is no key eviction and used memory keeps on 9.2GB. However, if we just add startEvictionTimeProc() in the serverCron, key eviction process could happen, some keys will be evicted from the server. |
Is staying at 200MB over maxmemory an issue? Basically Redis was designed to have a soft maxmemory limit, in that you can exceed it. You should set maxmemory so that you have some amount of additional buffer on the host. I agree with Zhao that if we think that temporarily breaching that limit is a problem, we should probably consider doing eviction after the write command was executed.
I don't think this is how Valkey behaves. In step 2, it will keep evicting keys until it's below 10GB. If we start the backround job, we will still accept the command. We only mark the command as OOM if performEvictions function returns EVICT_FAIL. |
I think we should consider this problem from system perspective or user perspective if the 200MB over maxmemory is feasible.
Yes, I agree with you, this is currently how Valkey works. And my pr could improve it as well. For example: Secondly, my pr is much more friendly to the big key. Assume there is a big key 200MB coming, and for our current design, if only 20MB keys could be evicted per CPU cycle. Thus, the big key need wait for 10 CPU cycle. And in my PR, after one CPU cycle, the 200MB big key can be written to the server and do not need wait for another 9 CPU cycle. I am not sure if above use case could convince you. Thanks |
All cases are under caching, no 2nd storage disk.
When user set maxmemory in config file, OOM (out of memory) message could be returned to user once Valkey reaches the maxmemory threshold.
The PR Incremental eviction processing redis/redis#7653 solves some cases of OOM issues, but in some scenarios, OOM still happens. Through this PR, we could reduce the probability of OOM occurrence or delay the OOM, especially in heavy-write case.
Example:
We have one VM 100GB, and user set maxmemory as 10GB.
If Valkey server is working on a heavy-write process with key-value pair is big (200MB), and we increase maxmemory-eviction-tenacity to adapt to this situation. Before we apply this PR, the following actions will happen:
step 0: used memory is 10GB
step 1: a new write command processes successfully, used memory reaches 10.2GB (no eviction happens)
step 2: a new write command comes, 150MB key is evicted, and used memory is 10.05GB, OOM happens
step 3: User retries the previous write command, another 150MB key is evicted. And used memory is just below 10GB,
the write command can be processed successfully
With this PR, we could set maxmemory-reserved as 9GB. Thus, once the Valkey comes to the 9GB memory, key eviction process will begin.
step 0: used memory is 9GB
step 1: a new write command processes successfully, used memory reaches 9.2GB (no eviction happens)
step 2: the second write command comes, 150MB key is evicted, and used memory is 9.05GB
the write command executes successfully, used memory update to 9.25 GB
step 3: the third write command comes, 150MB key is evicted, and used memory is 9.1GB
the write command executes successfully, used memory update to 9.3 GB
...........
After more than 20 steps, OOM happens.
From the above example, we can see that although with a heavy-write system, the used memory is increasing, but we can significantly delay the OOM.