-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cpu-usec metric support under CLUSTER SLOT-STATS command (#20). #712
Conversation
…io#20). The metric tracks cpu time in micro-seconds, sharing the same value as INFO COMMANDSTATS, aggregated under per-slot context. Signed-off-by: Kyle Kim <[email protected]>
First revision only holds implementation changes for initial feedback purposes, with pending perf testing. |
I totally get the value in per-slot information! But wondering about the how part. Have you considered alternatives? For example:
The reason I raise this is we have many slot array in many places in the code and this seems to be arbitrary addition to that and on the other hand we have several duration tracking mechanisms and this adds another one. I am wondering if we can merge/reuse these. In addition I can see we would want additional information like command count and maybe histogram.. this will not extend natively and we should strive to reuse what we have or enhance it, IMO. |
Thanks for your comments. To answer your concerns;
For However, when reviewing the per-slot metrics holistically, with the additions of per-slot 1) Plus, we obtain cache locality benefits from keeping all metrics memory contiguous, as the Given considerations of Valkey 8.0 rc1 code-cutoff timeline, refactoring efforts, remaining per-slot metrics storage cohesiveness, and cache locality benefits, my preference is to keep the typedef struct slotStat {
uint64_t cpu_usec;
uint64_t network_bytes_in;
uint64_t network_bytes_out;
uint64_t memory_bytes;
} slotStat;
struct clusterState {
...
slotStat slot_stats[CLUSTER_SLOTS];
}; |
- Added more integration tests. - Updated canAddCpuDuration() condition to avoid duplicate accounting upon exec, eval and fcall commands. Signed-off-by: Kyle Kim <[email protected]>
Command count might make, I'm not sure we really want a histogram on a per-slot basis seems to be missing the point of statistics though. The point of these metrics is for identify hot shards for migration for auto-balancing, it's not for trying to introspect too much into a specific slot. |
9659a40
to
d3d71c1
Compare
- Added a modifiable config, "cluster-slot-stats-enabled", with default value false. - Added comments for bypassing EXEC, EVAL and FCALL calculations. - Added integration test cases for functions. - Fixed existing integration test cases. - Moved slot_stats array under clusterState. Signed-off-by: Kyle Kim <[email protected]>
- Updated the aggregation to bypass nested commands, and instead account for wrapper commands (EXEC, EVAL, FCALL). - Added slot invalidation for cross-slot lua scripting. - Added resetClusterStats(). - Added documentation to valkey.conf Signed-off-by: Kyle Kim <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, just some minor comments, please merge and we can address the comments and merge.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #712 +/- ##
=========================================
Coverage 70.36% 70.36%
=========================================
Files 112 112
Lines 61275 61308 +33
=========================================
+ Hits 43115 43141 +26
- Misses 18160 18167 +7
|
1a1ce40
to
930fe95
Compare
- Updated integration test to assert against expected slots existence. - Updated valkey.conf Signed-off-by: Kyle Kim <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
- Updated ORDERBY handling to raise error appropriately. - Updated integration tests to include the above test case. Signed-off-by: Kyle Kim <[email protected]>
} else { | ||
addReplyError(c, "Unrecognized sort metric for ORDER BY. The supported metrics are: key-count."); | ||
addReplyError(c, "Unrecognized sort metric for ORDERBY."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this error, it would be nice to be more useful. We can follow up on this though.
Signed-off-by: Madelyn Olson <[email protected]>
Reply schema looked good: https://github.com/valkey-io/valkey/actions/runs/10050729040 |
…io#20). (valkey-io#712) The metric tracks cpu time in micro-seconds, sharing the same value as `INFO COMMANDSTATS`, aggregated under per-slot context. --------- Signed-off-by: Kyle Kim <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
Docs for CLUSTER SLOT-STATS, with key-count, cpu-usec, network-bytes-in, and network-bytes-out metrics. - valkey-io/valkey#351 - valkey-io/valkey#712 - valkey-io/valkey#720 --------- Signed-off-by: Kyle Kim <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Viktor Söderqvist <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>
The metric tracks cpu time in micro-seconds, sharing the same value as
INFO COMMANDSTATS
, aggregated under per-slot context.