Integrate replication manager with networking stack #387
Conversation
…plication messages
We finally made it 🤣 !!! Thanks for these final changes, it's really clear what's going on now and I'm observing very consistent behaviour when testing locally. Much better experience observing node behaviour now that the reaction to disconnects is immediate.
This final approach to peers/connections should make it easy to transition to scheduling dialing peers from a behaviour, rather than in the sync manager.
I believe I'm still experiencing one issue which occurs when new schemas are being materialized. It looks like a materialization task crashes and so the schema provider is never updated; if I restart the node, the provider is updated and replication can continue. Logs are here: https://laub.liebechaos.org/WQ3SzzmtSPyfESfLfNEgFA I don't know if this is a concern of the current PR though, I can open an issue once I understand the problem better.
I renamed the `Naive` strategy to `LogHeight`. It's not perfect, but it seems mean to call the current setup naive 😄
Would be good to figure out the cause of the issue (above) but apart from that I'm very happy with this PR now.
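(For context on the name: a log-height style strategy lets each peer announce, per author and log, the highest sequence number it holds, and the other side sends back any entries above that height. The sketch below is only a conceptual illustration with made-up names, not the actual implementation in this PR.)

```rust
use std::collections::HashMap;

type Author = String;
type LogId = u64;
type SeqNum = u64;

/// Given our local log heights and the heights a remote peer announced,
/// return, per (author, log), the sequence number from which we need to
/// start sending entries.
fn entries_to_send(
    local: &HashMap<(Author, LogId), SeqNum>,
    remote: &HashMap<(Author, LogId), SeqNum>,
) -> Vec<((Author, LogId), SeqNum)> {
    local
        .iter()
        .filter_map(|(key, local_height)| {
            // Anything the remote has not seen yet (height 0 if the log is unknown to it).
            let remote_height = remote.get(key).copied().unwrap_or(0);
            (*local_height > remote_height).then(|| (key.clone(), remote_height + 1))
        })
        .collect()
}

fn main() {
    let local = HashMap::from([(("panda".to_string(), 1), 8)]);
    let remote = HashMap::from([(("panda".to_string(), 1), 5)]);
    // Remote is at seq 5, so we would send entries 6..=8 of panda's log 1.
    println!("{:?}", entries_to_send(&local, &remote));
}
```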
    }
}

fn connection_keep_alive(&self) -> KeepAlive {
-    self.keep_alive
+    if self.critical_error {
+        return KeepAlive::No;
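(For readability, the complete method after this change presumably looks roughly like the sketch below. Only the `critical_error` and `keep_alive` fields come from the diff above; the surrounding struct and the `libp2p` import are assumptions.)

```rust
use libp2p::swarm::KeepAlive;

// Illustrative handler with just the two fields visible in the diff; the real
// handler in this PR carries more state and implements the full `ConnectionHandler` trait.
struct Handler {
    critical_error: bool,
    keep_alive: KeepAlive,
}

impl Handler {
    fn connection_keep_alive(&self) -> KeepAlive {
        // A critical protocol error withdraws this handler's keep-alive vote, so the
        // connection can be closed once no other behaviour keeps it open.
        if self.critical_error {
            return KeepAlive::No;
        }

        self.keep_alive
    }
}
```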
Just to check my understanding, right now we still expect this to trigger the connection to be gracefully closed as no other behaviours will keep it open. Right?
When all behaviours flip KeepAlive to No, the connection gets removed (times out).
Yes, I observed the same! It's caused by entries arriving in unexpected order (or at least some race condition). I also think it's unrelated to the networking stack.
Wahoo!! MERGED!
Juhuuu!!!
* development: (23 commits)
  * Implement `dialer` behaviour (#444)
  * Sort expected results in strategy tests
  * Update CHANGELOG
  * Replicate operations in topo order (#442)
  * Maintain sorted operation indexes (#438)
  * Use fork of `asynchronous-codec` (#440)
  * Ingest check for duplicate entries (#439)
  * Reverse lookup for pinned relations in dependency task (#434)
  * Remove unnecessary exact version pinning in Cargo.toml
  * Make `TaskInput` an enum and other minor clean ups in materialiser (#429)
  * Use `libp2p` `v0.52.0` (#425)
  * Fix race condition when check for existing view ids was too early (#420)
  * Reduce logging verbosity
  * CI: Temporary workaround for Rust compiler bug (#417)
  * Fix early document view insertion (#413)
  * Handle duplicate document view insertions (#410)
  * Decouple p2panda's authentication data types from libp2p's (#408)
  * Remove dead_code attribute in lib
  * Integrate replication manager with networking stack (#387)
  * Implement naive replication protocol (#380)
  * ...
This PR introduces a new replication service which represents the replication protocol layer on top of the networking stack. It uses the service bus to communicate with the networking service: it is informed about established and closed connections with peers as well as about new incoming replication messages.
The replication service keeps track of all currently available peers, can initiate new replication sessions using simple scheduler logic, and maintains the sync manager, the state machine that deals with the actual replication protocol logic.
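As a rough illustration of that wiring, the event handling could be sketched as follows. All names here (`ServiceMessage`, `SyncManager`, `ReplicationService`, ...) are stand-ins for illustration, not the actual types in this codebase.

```rust
use std::collections::HashSet;

// Illustrative stand-in types, not the real aquadoggo API.
type PeerId = String;

enum ServiceMessage {
    PeerConnected(PeerId),
    PeerDisconnected(PeerId),
    ReplicationMessage(PeerId, Vec<u8>),
}

#[derive(Default)]
struct SyncManager;

impl SyncManager {
    fn initiate_session(&mut self, _peer: &PeerId) {
        // Start a new replication session with this peer (simple scheduler logic).
    }

    fn drop_sessions(&mut self, _peer: &PeerId) {
        // Remove any sessions for a peer that disconnected.
    }

    fn handle_message(&mut self, _peer: &PeerId, _payload: &[u8]) {
        // Drive the replication protocol state machine with the incoming message.
    }
}

#[derive(Default)]
struct ReplicationService {
    peers: HashSet<PeerId>,
    sync_manager: SyncManager,
}

impl ReplicationService {
    // Called for every event the service receives from the service bus.
    fn handle(&mut self, message: ServiceMessage) {
        match message {
            ServiceMessage::PeerConnected(peer) => {
                self.sync_manager.initiate_session(&peer);
                self.peers.insert(peer);
            }
            ServiceMessage::PeerDisconnected(peer) => {
                self.peers.remove(&peer);
                self.sync_manager.drop_sessions(&peer);
            }
            ServiceMessage::ReplicationMessage(peer, payload) => {
                self.sync_manager.handle_message(&peer, &payload);
            }
        }
    }
}

fn main() {
    let mut service = ReplicationService::default();
    service.handle(ServiceMessage::PeerConnected("peer-a".into()));
    service.handle(ServiceMessage::ReplicationMessage("peer-a".into(), vec![]));
    service.handle(ServiceMessage::PeerDisconnected("peer-a".into()));
}
```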
Closes: #373 and #396
📋 Checklist
CHANGELOG.md