Add scoped RDB loading context and immediate abort flag #1173

Open · naglera wants to merge 16 commits into unstable from load-callback-crash

Conversation

@naglera (Contributor) commented Oct 15, 2024

This PR introduces a new mechanism for temporarily changing the
server's loading_rio context during RDB loading operations. The new
RDB_SCOPED_LOADING_RIO macro allows for a scoped change of the
server.loading_rio value, ensuring that it's automatically restored
to its original value when the scope ends.

Introduces a dedicated rio flag to signal immediate abort, preventing
potential use-after-free scenarios during replication disconnection in
dual-channel load. This ensures proper termination of rdbLoadRioWithLoadingCtx
when replication is cancelled due to connection loss on the main connection.

Fixes #1152
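
For readers skimming the thread, here is a minimal self-contained sketch of the two mechanisms described above, pieced together from the diff hunks quoted later in this conversation. The cleanup-helper body, the macro's final line, and the flag value are inferred, since the hunks below are truncated at the review anchors:

/* Sketch only: stand-ins for the real src/rio.h and src/server.h types. */
#define RIO_FLAG_CLOSE_ASAP (1 << 2)      /* immediate-abort flag (value assumed) */
typedef struct rio { int flags; } rio;

static rio *loading_rio = NULL;           /* stand-in for server.loading_rio */

/* Cleanup helper invoked automatically when the guarded scope exits. */
static void _restore_loading_rio(rio **old) { loading_rio = *old; }

/* Temporarily point loading_rio at new_rio; the old value is restored
 * when the enclosing scope ends, on every return path. */
#define RDB_SCOPED_LOADING_RIO(new_rio)                           \
    __attribute__((cleanup(_restore_loading_rio))) rio *_old_rio  \
        __attribute__((unused)) = loading_rio;                    \
    loading_rio = (new_rio)

int rdbLoadSketch(rio *rdb) {
    RDB_SCOPED_LOADING_RIO(rdb);
    /* ... while loading, a progress callback can now abort promptly: */
    if (loading_rio->flags & RIO_FLAG_CLOSE_ASAP) return -1;
    return 0;
} /* _old_rio's cleanup runs here, restoring loading_rio */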

codecov bot commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.70%. Comparing base (8ee7a58) to head (90cde6b).
Report is 37 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1173      +/-   ##
============================================
+ Coverage     70.66%   70.70%   +0.04%     
============================================
  Files           114      115       +1     
  Lines         63150    63164      +14     
============================================
+ Hits          44626    44662      +36     
+ Misses        18524    18502      -22     
Files with missing lines   Coverage Δ
src/rdb.c                  76.32% <100.00%> (+0.38%) ⬆️
src/replication.c          87.37% <100.00%> (+0.06%) ⬆️
src/rio.h                  100.00% <100.00%> (ø)
src/server.h               100.00% <ø> (ø)

... and 26 files with indirect coverage changes

@ranshid (Member) commented Oct 15, 2024

General comment:
Although I agree this fix will work, and at first glance I see no issue with it, I would like to suggest tackling the problem from a more holistic POV:

Basically we would like to have a way to tell the current load process to stop ASAP. This can also be achieved by adding an RIO flag (e.g. #define RIO_FLAG_STOP_ASAP (1 << 2)) and having rio check for this flag when it performs its different IO operations. The only issue is that the rdb RIO is local to rdbLoadRioWithCtx. We can, however, keep a pointer in the server to the current active loading rio, so that at any point during load we can set the RIO_FLAG_STOP_ASAP flag on the current loading rio. IMO this would be cleaner.
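
A rough sketch of this suggestion, assuming a simplified rio (the flag name is from the comment above; the check placement and the server-held pointer are illustrative):

#include <stddef.h>

#define RIO_FLAG_READ_ERROR (1 << 0)
#define RIO_FLAG_STOP_ASAP  (1 << 2)

typedef struct rio {
    int flags;
    size_t (*read)(struct rio *r, void *buf, size_t len);
} rio;

/* rioRead-style wrapper: refuse further IO once a stop was requested,
 * so the load loop sees an error and unwinds promptly. */
static inline size_t rioReadSketch(rio *r, void *buf, size_t len) {
    if (r->flags & (RIO_FLAG_READ_ERROR | RIO_FLAG_STOP_ASAP)) return 0;
    return r->read(r, buf, len);
}

/* Server-held pointer to the active loading rio (illustrative name),
 * settable from anywhere, e.g. the replication event handler: */
static rio *active_loading_rio = NULL;

static void stopCurrentLoad(void) {
    if (active_loading_rio) active_loading_rio->flags |= RIO_FLAG_STOP_ASAP;
}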

src/server.h Outdated
@@ -2038,6 +2038,7 @@ struct valkeyServer {
long long reploff;
long long read_reploff;
int dbid;
uint64_t close_asap : 1;
Member:

Can we piggyback on the existing state variables to detect when the sync has been aborted / primary connection dropped? Since cancelReplicationHandshake is called when the connection is dropped and updates:

server.repl_rdb_channel_state = REPL_DUAL_CHANNEL_STATE_NONE;

Can't we simply use server.repl_rdb_channel_state in rdbLoadProgressCallback?

Member:

This probably won't work and will create an issue when RDB dual channel isn't used.

Contributor Author:

server.repl_rdb_channel_state will also be equal to REPL_DUAL_CHANNEL_STATE_NONE when dual channel is disabled

src/rdb.c Outdated
Comment on lines 2936 to 2940
if (server.repl_provisional_primary.close_asap == 1) {
serverLog(LL_WARNING, "Primary main connection dropped during RDB load callback");
return -1;
}
return 0;
Member:

I guess I'm not following why we can't null out the connections here and use that instead of a new close_asap flag.

Member:

It would still require adding logic to rioConnRead/Write, right? We could flag the rdb with RIO_FLAG_READ_ERROR.

@naglera (Contributor Author) commented Oct 16, 2024:

Nulling out the fields would look the same as running sync with dual channel disabled.

"It would still require adding logic to rioConnRead/Write, right?"

Right.

Contributor Author:

"We could flag the rdb with RIO_FLAG_READ_ERROR"

While using RIO flags could offer a cleaner solution, it presents its own challenges. As you mentioned, we would need to maintain a pointer in the server to the current active loading RIO, which means adding and maintaining a currently_loading_rdb field in the server struct.
This approach, while potentially more flexible for future use cases, introduces additional complexity and state management: it would require changes in multiple parts of the RDB load process to properly set, use, and clear this pointer.
The currently proposed solution, while more specific to this use case, has the advantage of being more localized and of not requiring global state management.

Member:

I suppose a third option is to remove the event handler for the replication connection before calling process events and then reinstalling it after the fact?

Member:

@madolson I am not sure I understand this proposal. We need to process the events while we are loading in order to keep feeding the local replication buffer, AFAIK. We could (as a third option) do nothing when we identify that the replication link was broken and complete the load (or let it disconnect as well); however, I do feel that having the ability to bail out of a load is something we might find handy in the future.

Member:

This idea made sense to me when I posted it, but reading back it doesn't make sense; I might have just been missing something. More generally, I want to move away from doing recursive calls for handling processing events, and in that world we can just skip it, but that is likely a much larger change than what we want to do here.

src/rio.h Outdated (resolved)
@naglera force-pushed the load-callback-crash branch 2 times, most recently from a7aac51 to 6f9d737 on October 21, 2024 16:59
naglera and others added 5 commits October 29, 2024 11:48
…onnection handling

Introduces a dedicated flag in provisional primary struct to signal immediate
abort, preventing potential use-after-free scenarios during replication
disconnection in dual-channel load. This ensures proper termination of
rdbLoadRioWithLoadingCtx when replication is cancelled due to connection loss
on main connection.

Fixes valkey-io#1152

Signed-off-by: naglera <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
- Add test to consistently reproduce rdb load callback crash
- Avoid checking close_asap when no data was processed

Signed-off-by: naglera <[email protected]>
…ion disconnection handling"

This reverts commit b873d41.

Signed-off-by: naglera <[email protected]>
This commit introduces a new mechanism for temporarily changing the
server's loading_rio context during RDB loading operations. The new
RDB_SCOPED_LOADING_RIO macro allows for a scoped change of the
server.loading_rio value, ensuring that it's automatically restored
to its original value when the scope ends.

Signed-off-by: naglera <[email protected]>
@naglera naglera changed the title Add ASAP abort flag to provisional primary for safer replication disconnection handling Add scoped RDB loading context and immediate abort flag Oct 29, 2024
@ranshid (Member) left a comment:

Thank you @naglera, this looks promising! I like scoped actions; I only want to make sure compiler support is not compromised.
BTW, if it is not an issue, we can consider having a generic ScopeGuard macro in Valkey.

@@ -2833,6 +2833,7 @@ int readIntoReplDataBlock(connection *conn, replDataBufBlock *data_block, size_t
}
if (nread == 0) {
serverLog(LL_VERBOSE, "Provisional primary closed connection");
if (server.loading_rio) server.loading_rio->flags |= RIO_FLAG_CLOSE_ASAP;
Member:

nit: maybe add a short comment explaining why we are marking it here?

Member:

Also see my comment later about encapsulating the flagging in a dedicated function.

@@ -168,8 +169,13 @@ static inline int rioGetWriteError(rio *r) {
return (r->flags & RIO_FLAG_WRITE_ERROR) != 0;
}

/* Like rioGetReadError() but for async close errors. */
static inline int rioGetAsyncCloseError(rio *r) {
Member:

This is not used anywhere; I suggest dropping it for now.

Contributor Author:

rioGetWriteError is not used anywhere either. This would maintain symmetry in our error handling capabilities for read, write, and async close operations, even if they're not currently utilized.

@@ -2833,6 +2833,7 @@ int readIntoReplDataBlock(connection *conn, replDataBufBlock *data_block, size_t
}
if (nread == 0) {
serverLog(LL_VERBOSE, "Provisional primary closed connection");
if (server.loading_rio) server.loading_rio->flags |= RIO_FLAG_CLOSE_ASAP;
Member:

Maybe for the sake of encapsulation we can have a function that wraps the flagging logic, like rioCloseASAP(rio *rdb)?
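
Something like the following sketch, assuming the CLOSE_ASAP flag from this PR (the helper name suggested here does appear in the updated diff later in the thread):

/* Mark the rio so the ongoing load bails out at the next opportunity. */
static inline void rioCloseASAP(rio *r) {
    r->flags |= RIO_FLAG_CLOSE_ASAP;
}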

src/rdb.h Outdated

/* Macro to temporarily set server.loading_rio within a scope. */
#define RDB_SCOPED_LOADING_RIO(new_rio) \
__attribute__((cleanup(_restore_loading_rio))) rio *_old_rio __attribute__((unused)) = server.loading_rio; \
@ranshid (Member) commented Oct 29, 2024:

Very nice IMO. The only thing is that I do not recall our compiler support scope in Valkey (@zuiderkwast, do you know?)... for example, is MSVC supported? If we do support it, I guess we can just place the cleanup logic at the only 2 places where we return from the function?

Contributor Author:

You raise a valid point about compiler support. Regarding the return points: currently rdbLoadWithLoadingCtx has about 5 return points, and this number may increase as the function evolves. If we don't enforce a single return path and don't use a scope-based variable, we risk introducing bugs in the future where the cleanup logic is missed on new return paths.

Member:

The other way to handle this is just to split the function, so we move the rest of the logic into another function that we call and can make sure it's correctly restored on return.

Member:

My ask would be that we split the discussion: let's do it manually for now and then open a separate issue about supporting this. I think we should generally have a generic macro for this as well (sketched below), not just one specific to server.loading_rio.
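
For reference, one shape such a generic guard could take, assuming GCC/Clang's cleanup attribute (all names here are illustrative, not from the PR):

/* Generic pointer-valued scope guard: saves the old value and restores it
 * automatically when the enclosing scope exits. One guard per scope. */
typedef struct PtrGuard {
    void **target; /* variable being temporarily overridden */
    void *saved;   /* original value to restore */
} PtrGuard;

static inline void ptrGuardRestore(PtrGuard *g) { *g->target = g->saved; }

#define SCOPED_PTR_SET(var, newval)                               \
    __attribute__((cleanup(ptrGuardRestore)))                     \
        PtrGuard _ptr_guard = {(void **)&(var), (void *)(var)};   \
    (var) = (newval)

/* Usage: SCOPED_PTR_SET(server.loading_rio, rdb); */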

src/server.h Outdated (resolved)
wait_for_condition 500 1000 {
[string match "*slave*,state=wait_bgsave*,type=rdb-channel*" [$primary info replication]] &&
[string match "*slave*,state=bg_transfer*,type=main-channel*" [$primary info replication]] &&
[s -1 rdb_bgsave_in_progress] eq 1
Member:

maybe we should wait to see that the sync was successful?

Contributor Author:

We use rdb-key-save-delay in this test, which intentionally slows down the RDB saving process. Due to potential context switches, the sync time can be unpredictable and might take longer than expected. This unpredictability could lead to test flakiness.

Member:

We could reduce it to 0 and then wait.

naglera and others added 2 commits October 29, 2024 16:11
Co-authored-by: ranshid <[email protected]>
Signed-off-by: Amit Nagler <[email protected]>
Signed-off-by: naglera <[email protected]>
@@ -2833,6 +2833,8 @@ int readIntoReplDataBlock(connection *conn, replDataBufBlock *data_block, size_t
}
if (nread == 0) {
serverLog(LL_VERBOSE, "Provisional primary closed connection");
/* Signal ongoing RDB load to terminate gracefully */
if (server.loading_rio) rioCloseASAP(server.loading_rio);
Member:

Shouldn't this be invoked on line 2832 as well?

Contributor Author:

Right, in case of connection state changes.

@ranshid (Member) left a comment:

I approve in order to indicate that this generally looks fine to me. We do need to decide on the cleanup-attribute use, which I think is mostly supported, with some exceptions (at worst we would probably get a compilation error, IMO). See the gating sketch below.
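
To make the support question concrete, one hedged way to gate this at compile time (illustrative, not from the PR):

/* The cleanup attribute is a GCC/Clang extension; MSVC has no direct
 * equivalent, so a build there would need a manual-restore fallback. */
#if defined(__GNUC__) || defined(__clang__)
#define HAVE_SCOPED_CLEANUP 1
#else
#define HAVE_SCOPED_CLEANUP 0
#endif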

@madolson madolson added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Nov 15, 2024
@madolson (Member) left a comment:

Mostly looks good to me.


$primary config set repl-diskless-sync yes
$primary config set dual-channel-replication-enabled yes
$primary config set loglevel debug
Member:

Does this need to be set to debug?

Contributor Author:

It helped while writing the test; we don't need it anymore.

set primary [srv 0 client]
set primary_host [srv 0 host]
set primary_port [srv 0 port]
set loglines [count_log_lines 0]
Member:

You set this value later, so it has no impact here.

@naglera (Contributor Author) commented Nov 17, 2024:

I use it between the two sets for

wait_for_log_messages 0 {"*Loading RDB produced by Valkey version*"} $loglines 1000 10

@madolson (Member) commented Nov 18, 2024:

Yes, but you set it again on line 1228. You actually set it three times.

…plica while syncing (only expect it to be eventually connected)

Signed-off-by: naglera <[email protected]>
Labels
run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
Status: To be backported
Development

Successfully merging this pull request may close these issues.

[Test Failure] Engine crash during TLS test with dual channel replication
4 participants