client struct: lazy init components and optimize struct layout #1405
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@             Coverage Diff              @@
##           unstable    #1405      +/-  ##
============================================
- Coverage     70.81%   70.79%    -0.02%
============================================
  Files           118      119        +1
  Lines         63561    64728     +1167
============================================
+ Hits          45010    45826      +816
- Misses        18551    18902      +351
```
Amazing!
Very good! It's a large diff, but most of the changed lines are trivial changes. I can't verify everything, so I have to trust you and the tests for correctness.

I noticed that many helper functions use these client sub-structs, for example `c->mstate->something`, without checking that `c->mstate` exists. It should always exist in these cases, so it's not an error, but would it make sense to add `serverAssert(c->mstate != NULL)` in all these places?

If we have code coverage by tests, we will find these errors anyway, so IMHO it doesn't matter too much whether it's a null-pointer crash or an assert. What is your reasoning about this?
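To illustrate the suggestion above, here is a minimal sketch of the assert-on-access pattern, using hypothetical, simplified stand-ins for the client and MULTI-state structs (these are not the actual Valkey definitions, and plain `assert` stands in for `serverAssert`):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structs discussed above. */
typedef struct MultiState {
    int count; /* number of queued commands */
} MultiState;

typedef struct client {
    MultiState *mstate; /* lazily allocated on first MULTI */
} client;

/* A helper that dereferences the lazily allocated sub-struct.
 * Asserting up front turns a null-pointer crash into an explicit,
 * self-documenting assertion failure at the point of misuse. */
static int queuedCommandCount(client *c) {
    assert(c->mstate != NULL); /* serverAssert() in the real codebase */
    return c->mstate->count;
}
```

The trade-off discussed in the thread is that with good test coverage both variants surface the bug; the assert only makes the failure site and the violated invariant more obvious.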
Just a few nits now.
Force-pushed from b84c52d to b4755cb
I think the overall change looks OK, but some cases are not 100% trivial IMO, and the change impacts almost all logical flows. I would suggest we run the daily tests to verify this change.
Signed-off-by: Uri Yagelnik <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: uriyage <[email protected]>
Force-pushed from b4755cb to d4ebb91
Signed-off-by: Uri Yagelnik <[email protected]>
# Refactor client structure to use modular data components (valkey-io#1405)

## Current State

The client structure allocates memory for replication / pubsub / multi-keys / module / blocked data for every client, despite these features being used by only a small subset of clients. In addition, the current field layout in the client struct is suboptimal, with poor alignment and unnecessary padding between fields, leading to a larger-than-necessary memory footprint of 896 bytes per client. Furthermore, fields that are frequently accessed together during operations are scattered throughout the struct, resulting in poor cache locality.

## This PR's Change

1. Lazy initialization — components are only allocated when first used:
   - PubSubData: created on first SUBSCRIBE/PUBLISH operation
   - ReplicationData: initialized only for replica connections
   - ModuleData: allocated when module interaction begins
   - BlockingState: created when the first blocking command is issued
   - MultiState: initialized on MULTI command
2. Memory layout optimization:
   - Grouped related fields for better locality
   - Moved rarely accessed fields (e.g., client->name) to the end of the struct
   - Optimized field alignment to eliminate padding
3. Additional changes:
   - Moved watched_keys to be statically allocated in the `mstate` struct
   - Relocated replication init logic to replication.c

### Key Benefits

- **Efficient memory usage:**
  - 45% smaller base client structure; basic clients now use 528 bytes (down from 896)
  - Better memory locality for related operations
- Performance improvement in high-throughput scenarios; no performance regressions in other cases

### Performance Impact

Tested with 650 clients and 512-byte values.

#### Single Thread Performance

| Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|---------|---------------|---------------|----------|
| SET       | 1 key   | 261,799       | 258,261       | +1.37%   |
| SET       | 3M keys | 209,134       | ~209,000      | ~0%      |
| GET       | 1 key   | 281,564       | 277,965       | +1.29%   |
| GET       | 3M keys | 231,158       | 228,410       | +1.20%   |

#### 8 IO Threads Performance

| Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|---------|---------------|---------------|----------|
| SET       | 1 key   | 1,331,578     | 1,331,626     | -0.00%   |
| SET       | 3M keys | 1,254,441     | 1,152,645     | +8.83%   |
| GET       | 1 key   | 1,293,149     | 1,289,503     | +0.28%   |
| GET       | 3M keys | 1,152,898     | 1,101,791     | +4.64%   |

#### Pipeline Performance (3M keys)

| Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|---------------|---------------|---------------|----------|
| SET       | 10            | 548,964       | 538,498       | +1.94%   |
| SET       | 20            | 606,148       | 594,872       | +1.89%   |
| SET       | 30            | 631,122       | 616,606       | +2.35%   |
| GET       | 10            | 628,482       | 624,166       | +0.69%   |
| GET       | 20            | 687,371       | 681,659       | +0.84%   |
| GET       | 30            | 725,855       | 721,102       | +0.66%   |

### Observations

1. Single-threaded operations show consistent improvements (1-1.4%).
2. Multi-threaded performance shows significant gains for large datasets:
   - SET with 3M keys: +8.83% improvement
   - GET with 3M keys: +4.64% improvement
3. Pipeline operations show consistent improvements:
   - SET operations: +1.89% to +2.35%
   - GET operations: +0.66% to +0.84%
4. No performance regressions observed in any test scenario.

Related issue: valkey-io#761

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: uriyage <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
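The lazy-initialization scheme described above can be sketched as follows. This is a minimal illustration with hypothetical struct and function names (`PubSubData`, `getPubSubData`), not the actual Valkey code:

```c
#include <stdlib.h>

/* Hypothetical component struct; in the PR, similar sub-structs exist
 * for replication, module, blocking, and MULTI state. */
typedef struct PubSubData {
    int channel_count;
} PubSubData;

typedef struct client {
    PubSubData *pubsub_data; /* NULL until first SUBSCRIBE/PUBLISH */
} client;

/* Allocate the component only on first use, so the many clients that
 * never touch pub/sub pay no memory cost for it. Subsequent calls
 * return the already-allocated component. */
static PubSubData *getPubSubData(client *c) {
    if (c->pubsub_data == NULL)
        c->pubsub_data = calloc(1, sizeof(PubSubData));
    return c->pubsub_data;
}
```

The base struct then holds one pointer (8 bytes) per feature instead of the feature's full inline state, which is where the 896-to-528-byte reduction comes from.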
This PR will fix valkey-io#1683. After valkey-io#1405, this coredump can occur.

Signed-off-by: bodong.ybd <[email protected]>
After #1405, `client trackinginfo` will crash when tracking is off:

```
/lib64/libpthread.so.0(+0xf630)[0x7fab74931630]
./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db]
./src/valkey-server *:6379(call+0x5ba)[0x5a791a]
./src/valkey-server *:6379(processCommand+0x968)[0x5a8938]
./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d]
./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9]
./src/valkey-server *:6379[0x46fa4d]
./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9]
./src/valkey-server *:6379(main+0x4e1)[0x455821]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555]
./src/valkey-server *:6379[0x4564f2]
```

The reason is that we do not init pubsub_data by default; we only init it when tracking is turned on. Fixes #1683.

Co-authored-by: Binbin <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Co-authored-by: ranshid <[email protected]>
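The class of bug described above — a read path dereferencing a lazily allocated component without a NULL check — and its fix can be sketched like this. Names (`trackingRedirect`, `tracking_redirect`) are illustrative, not the actual Valkey code:

```c
#include <stddef.h>

/* Hypothetical lazily allocated component holding tracking state. */
typedef struct PubSubData {
    long long tracking_redirect;
} PubSubData;

typedef struct client {
    PubSubData *pubsub_data; /* NULL until tracking/pubsub is first used */
} client;

/* Guarded accessor: when the component was never allocated (tracking
 * is off), report a sensible default instead of crashing on a NULL
 * dereference, as the unguarded command handler did. */
static long long trackingRedirect(client *c) {
    if (c->pubsub_data == NULL) return 0; /* tracking is off */
    return c->pubsub_data->tracking_redirect;
}
```

With lazy initialization, every read-only consumer of a component must either tolerate a NULL pointer like this or guarantee the component was allocated earlier on its code path.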