
Introduce shared query buffer for client reads #258

Merged
merged 11 commits into unstable from shared_qb on May 28, 2024

Conversation

@uriyage (Contributor) commented Apr 8, 2024

This PR optimizes client query buffer handling in Valkey by introducing a shared query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer.

The primary changes are:

  • Adding a shared query buffer shared_qb that clients use by default
  • Modifying client querybuf initialization and reset logic
  • Copying any partial query from shared to private buffer before command execution
  • Freeing idle client query buffers when empty to allow reuse of shared buffer
  • Master client query buffers are kept private, as their contents need to be preserved for the replication stream

In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients.

The memory reduction may also help reduce the need to evict clients when reaching max memory limit, as the query buffer is the main memory consumer per client.
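As a rough sketch of the flow described above (illustrative only, not the actual patch: `shared_qb` is the buffer named in this PR, while the helper name and exact field handling are assumptions):

```c
#include "server.h" /* assumes valkey's internal headers (sds, client) */

/* Illustrative global: the single buffer shared by client reads. */
static sds shared_qb = NULL;

static void borrowOrReleaseSharedQueryBuffer(client *c) {
    /* Non-master clients without a private buffer borrow the shared one. */
    if (c->querybuf == NULL && !(c->flags & CLIENT_MASTER)) {
        if (shared_qb == NULL) shared_qb = sdsempty();
        c->querybuf = shared_qb;
        return;
    }
    /* After the read and parse, hand the shared buffer back. */
    if (c->querybuf == shared_qb) {
        if (sdslen(c->querybuf) > 0) {
            /* A partial command remains: copy it to a private buffer
             * before command execution. */
            c->querybuf = sdsdup(shared_qb);
        } else {
            c->querybuf = NULL; /* fully consumed; nothing to keep */
        }
        sdsclear(shared_qb); /* shared buffer is ready for the next client */
    }
}
```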

@uriyage force-pushed the shared_qb branch 2 times, most recently from 6ce2ea4 to fc9ee81, on April 8, 2024 11:48
@madolson (Member) left a comment

Overall it looks good to me. Can you document the procedure you were using when you saw the 3% performance boost? Did we also test it for pipelined clients?

I would also ideally like some protection in freeClient() to make sure that if a client gets freed, it doesn't somehow also free the shared query buffer.
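A minimal sketch of the kind of guard being asked for here, reusing the `shared_qb` name from the description (the helper itself is illustrative):

```c
/* On client teardown: detach from the shared buffer, never free it. */
static void freeClientQueryBuffer(client *c) {
    if (c->querybuf == shared_qb) {
        c->querybuf = NULL;   /* detach only; shared_qb outlives any client */
        sdsclear(shared_qb);
    } else if (c->querybuf != NULL) {
        sdsfree(c->querybuf); /* private buffers are freed as before */
        c->querybuf = NULL;
    }
}
```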

@uriyage (Contributor, Author) commented May 5, 2024

Performance Benchmark:

Server Setup:

  • 3 million keys, each with 512 bytes

Benchmark Configuration:

  • 1,000 clients running SET commands
  • Server and benchmark running on separate ARM instances with 64 cores

Server Command:

./valkey-server --save

Benchmark Command:

./valkey-benchmark -t set -d 512 -r 3000000 -c 1000 --threads 50 -h <server_host_address> -n 50000000

Results (Average of 3 runs):

  1. Without shared query buffer:

    • Throughput: 208,361 operations per second
    • Average latency: 4.806 milliseconds
  2. With shared query buffer:

    • Throughput: 214,062 operations per second
    • Average latency: 4.692 milliseconds

@PingXie (Member) left a comment

This change LGTM overall. Thanks @uriyage

@zuiderkwast (Contributor) left a comment

Generally LGTM

uriyage and others added 5 commits on May 9, 2024
@madolson (Member) left a comment

I'm good with it. @uriyage Can you do the merge to get it up to date?

@madolson added the release-notes (This issue should get a line item in the release notes) and to-be-merged (Almost ready to merge) labels on May 9, 2024
codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 96.22642%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.25%. Comparing base (045d475) to head (05224a5).
Report is 3 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #258      +/-   ##
============================================
+ Coverage     69.99%   70.25%   +0.25%     
============================================
  Files           109      109              
  Lines         59913    59957      +44     
============================================
+ Hits          41938    42122     +184     
+ Misses        17975    17835     -140     
Files Coverage Δ
src/replication.c 87.21% <100.00%> (+<0.01%) ⬆️
src/server.c 88.93% <100.00%> (+0.02%) ⬆️
src/networking.c 85.40% <95.23%> (+0.27%) ⬆️

... and 16 files with indirect coverage changes

@soloestoy (Member) commented May 13, 2024

@uriyage can you give more test results for the cases where the whole command or an argv exceeds PROTO_IOBUF_LEN?

@uriyage (Contributor, Author) commented May 13, 2024

> @uriyage can you give more test results for the cases where the whole command or an argv exceeds PROTO_IOBUF_LEN?

I checked with 24K values (since with jemalloc we actually allocate more than 16K).

Commands:

./valkey-server --protected-mode no --save

./valkey-benchmark -t set -d 24000 -r 1000 -c 500 --threads 4 -h 192.31.233.204 -n 10000000

I get 83,544 ops/sec without a shared query buffer and 83,238 ops/sec with one.

So, essentially the same, which is expected: with larger values, we revert to the current logic where each client uses a private buffer.

@uriyage (Contributor, Author) commented May 14, 2024

> I'm good with it. @uriyage Can you do the merge to get it up to date?

@madolson Done

@madolson (Member) commented

@soloestoy Do you have any further followup?

@soloestoy (Member) commented

I'm wondering whether we can eliminate the size limit of the shared query buffer (and resize it in cron if needed); maybe we can get more benefit.

@uriyage (Contributor, Author) commented May 27, 2024

> I'm wondering whether we can eliminate the size limit of the shared query buffer (and resize it in cron if needed); maybe we can get more benefit.

@soloestoy once we exceed the initial read size for the shared query buffer, we need to transfer the data to the client's private buffer. Leaving the data in the shared buffer isn't an option; therefore, subsequent reads will be directed to the client's private query buffer anyway.

@madolson (Member) left a comment

Style comments from the clang-format change.

@madolson merged commit fd58b73 into valkey-io:unstable on May 28, 2024
18 checks passed
@madolson removed the to-be-merged (Almost ready to merge) label on May 28, 2024
madolson pushed a commit that referenced this pull request Jun 3, 2024
#593)


The `query buffer resized correctly test` starts
[failing](https://github.com/valkey-io/valkey/actions/runs/9278013807)
with non-jemalloc allocators after PR #258 .

With jemalloc, we allocate ~20KB for the query buffer. In the test, we
read 1 byte initially and then ensure there is at least 16KB of free
space in the buffer for the second read, which is satisfied by
jemalloc's 20KB allocation. However, with non-jemalloc allocators, the
first read allocates exactly 16KB. When we check again, we don't have
16KB free due to the 1 byte already read. This triggers a greedy
reallocation (doubling the requested size of 16KB+1), causing the query
buffer size to exceed the 32KB limit, thus failing the test condition.

This PR adjusts the test's query buffer upper limit to 32KB + 2.

Signed-off-by: Uri Yagelnik <[email protected]>
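The doubling arithmetic above can be checked standalone; a minimal sketch, assuming the greedy growth rule behaves as described (doubling requests that fall below sds's preallocation cap):

```c
#include <assert.h>
#include <stddef.h>

int main(void) {
    size_t needed = 1 + 16 * 1024;   /* 1 byte already read + 16KB free = 16385 */
    size_t greedy = needed * 2;      /* doubled greedily: 32770 bytes */
    assert(greedy == 32 * 1024 + 2); /* 32KB + 2, breaking a strict "< 32KB" check */
    return 0;
}
```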
naglera pushed a commit to naglera/placeholderkv that referenced this pull request Jun 10, 2024
sundb pushed a commit to sundb/redis that referenced this pull request Aug 20, 2024
sundb added a commit to redis/redis that referenced this pull request Sep 4, 2024
This PR is based on the commits from PR
valkey-io/valkey#258,
valkey-io/valkey#593,
valkey-io/valkey#639

This PR optimizes client query buffer handling in Redis by introducing
a reusable query buffer that is used by default for client reads. This
reduces memory usage by ~20KB per client by avoiding allocations for
most clients using short (<16KB) complete commands. For larger or
partial commands, the client still gets its own private buffer.

The primary changes are:

* Adding a reusable query buffer `thread_shared_qb` that clients use by
default.
* Modifying client querybuf initialization and reset logic.
* Freeing idle client query buffers when empty to allow reuse of the
reusable query buffer.
* Master client query buffers are kept private as their contents need to
be preserved for replication stream.
* When nested commands are executed, only the first one uses the reusable
buffer; subsequent ones fall back to the private buffer.

In addition to the memory savings, this change shows a 3% improvement in
latency and throughput when running with 1000 active clients.

The memory reduction may also help reduce the need to evict clients when
reaching max memory limit, as the query buffer is the main memory
consumer per client.

This PR differs from valkey-io/valkey#258 in two ways:
1. Between a client acquiring the reusable buffer and returning it, we do
not update the reusable query buffer mid-use, regardless of whether it has
changed (expanded); we either return it (expanded or with data remaining)
or reset it at the end.
2. A new thread variable `thread_shared_qb_used` prevents multiple clients
from acquiring the reusable query buffer at the same time.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Co-authored-by: Uri Yagelnik <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: oranagra <[email protected]>
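A minimal sketch of the acquire/release pattern that point 2 above describes, using the `thread_shared_qb` and `thread_shared_qb_used` names from the commit message (the helper functions are assumptions, not the actual Redis code):

```c
#include "server.h" /* assumes redis's internal headers (sds) */

static __thread sds thread_shared_qb = NULL;
static __thread int thread_shared_qb_used = 0;

/* Returns the thread's reusable buffer, or NULL when it is already taken
 * (e.g. by an outer command during nested execution), in which case the
 * caller falls back to a private buffer. */
static sds acquireThreadSharedQueryBuffer(void) {
    if (thread_shared_qb_used) return NULL;
    if (thread_shared_qb == NULL) thread_shared_qb = sdsempty();
    thread_shared_qb_used = 1;
    return thread_shared_qb;
}

/* Per point 1, the buffer is reset only here, at the end of its use, never
 * mid-read, even if it was expanded in between. */
static void releaseThreadSharedQueryBuffer(void) {
    sdsclear(thread_shared_qb);
    thread_shared_qb_used = 0;
}
```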
Labels
release-notes This issue should get a line item in the release notes

6 participants