Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Nwaku Sync #2403

Merged
merged 7 commits into from
Aug 13, 2024
Merged

feat: Nwaku Sync #2403

merged 7 commits into from
Aug 13, 2024

Conversation

SionoiS
Copy link
Contributor

@SionoiS SionoiS commented Feb 6, 2024

Description

This is the new Waku Sync protocol. It works by wrapping the Negentropy protocol which implements RBSR.
See this research post for more info.

At configurable intervals (A sync can also be requested ad-hoc), a sync request is sent to a peer chosen by the peer manager. After multiple requests and responses between the 2 peers, a callback containing the list of missing hashes is triggered. With the list of missing hashes, store is used to request the messages from the other node.

WIP spec -> https://github.com/waku-org/specs/blob/feat--waku-sync/standards/core/sync.md

Changes

  • new Waku Sync protocol.
  • new config params.
  • waku_node.nim mount the new protocol.
  • Negentropy Nim Bindings.
  • tests.

Tracking

@SionoiS SionoiS self-assigned this Feb 6, 2024
@SionoiS SionoiS added E:2.3: Basic distributed Store services See https://github.com/waku-org/pm/issues/64 for details and removed E:2.3: Basic distributed Store services See https://github.com/waku-org/pm/issues/64 for details labels Feb 6, 2024
Copy link

github-actions bot commented Feb 6, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2403-rln-v2-false

Built from 21f4fd5

waku/node/waku_node.nim Outdated Show resolved Hide resolved
Copy link
Contributor

@jm-clius jm-clius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially imagined writing our own wrapper protocol around the underlying bindings (rather than just passing the payload directly onto the wire). For now, this approach will lead us to faster POC and tests, so seems reasonable to me. Do you have any intuition for what our longer term approach will be?

waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/node/waku_node.nim Outdated Show resolved Hide resolved
@SionoiS
Copy link
Contributor Author

SionoiS commented Feb 8, 2024

I initially imagined writing our own wrapper protocol around the underlying bindings (rather than just passing the payload directly onto the wire). For now, this approach will lead us to faster POC and tests, so seems reasonable to me. Do you have any intuition for what our longer term approach will be?

Negentropy protocol has only one option maxFrameSize that's it. I'm not sure what other information we might want to exchange justifying wrapping the protocol. I'll think about it...

@SionoiS
Copy link
Contributor Author

SionoiS commented Mar 7, 2024

Sorry I squashed everything and loss your work attribution @chaitanyaprem

@chaitanyaprem
Copy link
Contributor

Sorry I squashed everything and loss your work attribution @chaitanyaprem

Ah..hope it won't be considered as no contributions from me for the past few weeks :)

Maybe you can mention me as Co-author in comments, like this https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

@SionoiS
Copy link
Contributor Author

SionoiS commented Mar 21, 2024

I notice that requests are received twice and that connections are not closed properly. I will investigate further.

Copy link

github-actions bot commented Apr 30, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2403-rln-v1

Built from 760d6b2

Copy link

github-actions bot commented Apr 30, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2403-rln-v2

Built from 760d6b2

@SionoiS
Copy link
Contributor Author

SionoiS commented May 3, 2024

I have stashed an update with a new pruning mechanism for after the state machine PR is merged.

@SionoiS SionoiS force-pushed the feat--nwaku-sync branch 2 times, most recently from 58d78d0 to 0ae1858 Compare May 14, 2024 13:03
@SionoiS SionoiS marked this pull request as ready for review May 14, 2024 15:45
Makefile Show resolved Hide resolved
vendor/negentropy Outdated Show resolved Hide resolved
waku/factory/external_config.nim Show resolved Hide resolved
waku/factory/external_config.nim Show resolved Hide resolved
waku/node/waku_node.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
Copy link
Contributor

@jm-clius jm-clius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Really great effort and good work (also @chaitanyaprem). I do have a few comments below which I would appreciate your looking into. Approving, though, as I do not want to stand in the way of deployed experiments starting.

import std/[strutils], chronicles, std/options, stew/[results, byteutils], confutils
import ../waku_core/message

const negentropyPath =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no way to compile a static library into our binary here?

If not, note that this will be a resource that also needs to be included in our Docker images (and, as Prem noted elsewhere, into nwaku-compose)

Copy link
Contributor

@chaitanyaprem chaitanyaprem May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible, but as per recommendations noted in nim language bindings guide in status..I chose share lib.

waku/factory/external_config.nim Outdated Show resolved Hide resolved
waku/factory/external_config.nim Show resolved Hide resolved
waku/waku_sync/raw_bindings.nim Outdated Show resolved Hide resolved
waku/waku_sync/storage_manager.nim Show resolved Hide resolved
@@ -193,6 +195,125 @@ proc connectToNodes*(
# NOTE Connects to the node without a give protocol, which automatically creates streams for relay
await peer_manager.connectToNodes(node.peerManager, nodes, source = source)

## Waku Sync

proc mountWakuSync*(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a bit hard for me to see why this would not be better managed in a composite protocol WakuStoreSync :). I understand the objection to arbitrarily creating dependencies, but in this case we are defining a composite protocol that depends unidirectionally on wakuSync, a local archive and a storeClient. This is now simply implicit in the mount code, but imho can be better managed if the relationship is made explicit in a separate module. We should also expect this to become more complex in future, with e.g. sanity checks for local archive and negentropy configurations, penalising non-responsive/misbehaving store sync peers, etc. In fact, the interrelationship between all these elements to form Store Sync should likely be a specification on its own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree, but maybe for now we can leave it as is and refactor this once we start next phase of work which involves handling more scenarios etc?

waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/node/waku_node.nim Outdated Show resolved Hide resolved
waku/node/waku_node.nim Outdated Show resolved Hide resolved
@SionoiS
Copy link
Contributor Author

SionoiS commented Jul 30, 2024

After discussing with @jm-clius I will have to add a SyncRange and a SyncInterval so that we can sync the last hour every 5m for example.

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great one! Congrats for it! 🙌
I've added a bunch of comments, some of them are pedantic and others are potentially more interesting :)
Very insightful PR

Makefile Outdated Show resolved Hide resolved
waku/waku_sync.nim Outdated Show resolved Hide resolved
waku/waku_sync/codec.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
break

for (hash, timestamp) in elements:
self.storage.erase(timestamp, hash).isOkOr:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right @SionoiS
We need a mechanism to measure the memory occupied by negentropy.

By using C++ we are at high risk of having memory leaks there, for example, this sounds like a candidate that we need to enhance: https://github.com/waku-org/negentropy/blob/311a21a22bdb6d80e5c4ba5e3d2f550e0062b2cb/cpp/negentropy_wrapper.cpp#L188


I agree @NagyZoltanPeter , we need to review the library as well :)

waku/node/waku_node.nim Show resolved Hide resolved
Copy link
Contributor

@gabrielmer gabrielmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woow, thanks so much! Amazing work!

Started reviewing and will be adding batches of comments/questions :)

waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_core/time.nim Outdated Show resolved Hide resolved
waku/waku_sync/storage_manager.nim Outdated Show resolved Hide resolved
@SionoiS
Copy link
Contributor Author

SionoiS commented Aug 1, 2024

Let me know when I can rebase I don't want to mess your reviews.

Copy link
Contributor

@NagyZoltanPeter NagyZoltanPeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highest level, pretty, neat and awesome feature. Thank you!!!
I will love to see it working.
Left some observations, mostly questions.

waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
waku/waku_sync/session.nim Show resolved Hide resolved
waku/waku_sync/session.nim Outdated Show resolved Hide resolved
waku/waku_sync/common.nim Outdated Show resolved Hide resolved
waku/waku_sync/protocol.nim Outdated Show resolved Hide resolved
tries = 3
while true:
(await callback(hashes, peer.peerId)).isOkOr:
error "transfer callback failed", error = $error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we continously retry, why is this an error? If its an error why we think it will be ok at next try?
I'm asking because of operator point of view, when one can see repetitive errors in log, what he/she supposed to do against?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True maybe log error when done trying?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Is there threshold when we can consider giving up probing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No other protocol has retry mechanism it's up to us the define good params. 3 tries was chosen arbitrarily.

peer: RemotePeerInfo
tries = 3

while true:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I lost in the soooo many break statements....

Copy link
Contributor Author

@SionoiS SionoiS Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called break dance. 🤣

Does Nim have labels for loop control break: myLoop or continue: myOtherLoop ? I don't think so?

Copy link
Contributor

@NagyZoltanPeter NagyZoltanPeter Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does not have, while using while true: and breaks seems to me controversial.
Once upon a time there was an article, Goto statement considered harmful, which is resonates with this, especially with two embeded while true:... reading it is not easy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK how loops can be avoided. I'm open to suggestion.

@@ -33,3 +33,16 @@ template nanosecondTime*(collector: Gauge, body: untyped) =
metrics.set(collector, nowInUnixFloat() - start)
else:
body

# Unused yet
#[ proc timestampInSeconds*(time: Timestamp): Timestamp =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think (suppose) the value will be an actual time value (not a diff), we don't need to convert it to string. Isn't that cheaper to just compare to some numeric value? Am I miss something? Btw why we expect the value may vary between milli/micro and nano secs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. @chaitanyaprem wrote this part maybe he can answer.

if hashes.len > 0 and self.transferCallBack.isSome():
let callback = self.transferCallBack.get()

(await callback(hashes, conn.peerId)).isOkOr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it, but still it can be an option to move out these refs out of this protocol into a separate callback setup place. That would eliminate the need for refs of WakuArchive and WakuStoreClient.
Now they need to be provided with new*.
In other words by separating setting up the callback from the protocol will move away the decision how that is implemented. Also enables mocking it for testing. WDYT?

waku/node/waku_node.nim Outdated Show resolved Hide resolved
@NagyZoltanPeter
Copy link
Contributor

NagyZoltanPeter commented Aug 1, 2024

Let me know when I can rebase I don't want to mess your reviews.

Left some comments, so from my side you can go with it.

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for it 💯 ! I think we can have some more simplification :P

waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved
waku/waku_sync/protocol.nim Show resolved Hide resolved

peer

let connOpt = await self.peerManager.dialPeer(peer, WakuSyncCodec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also make sure that we properly close the connections, would test and take a look that open connections don't grow indefinitely

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the patience! 💯

@SionoiS SionoiS force-pushed the feat--nwaku-sync branch 3 times, most recently from f987bb9 to 499c5e6 Compare August 13, 2024 11:23
@SionoiS SionoiS merged commit 2cc86c5 into master Aug 13, 2024
8 of 9 checks passed
@SionoiS SionoiS deleted the feat--nwaku-sync branch August 13, 2024 11:27
jakubgs added a commit that referenced this pull request Aug 21, 2024
We shouldn't assume it exists on the host.

Introduced by:
#2403

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs added a commit that referenced this pull request Aug 21, 2024
We shouldn't assume it exists on the host.

Introduced by:
#2403

Signed-off-by: Jakub Sokołowski <[email protected]>
jakubgs added a commit that referenced this pull request Aug 21, 2024
We shouldn't assume it exists on the host.

Introduced by:
#2403

Signed-off-by: Jakub Sokołowski <[email protected]>
gabrielmer pushed a commit that referenced this pull request Aug 22, 2024
We shouldn't assume it exists on the host.

Introduced by:
#2403

Signed-off-by: Jakub Sokołowski <[email protected]>
gabrielmer pushed a commit that referenced this pull request Aug 22, 2024
We shouldn't assume it exists on the host.

Introduced by:
#2403

Signed-off-by: Jakub Sokołowski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants