-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Nwaku Sync #2403
feat: Nwaku Sync #2403
Conversation
You can find the image built from this PR at
Built from 21f4fd5 |
3635cfc
to
6bc4e3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially imagined writing our own wrapper protocol around the underlying bindings (rather than just passing the payload directly onto the wire). For now, this approach will lead us to faster POC and tests, so seems reasonable to me. Do you have any intuition for what our longer term approach will be?
Negentropy protocol has only one option |
2f42740
to
0b5a025
Compare
203c0fc
to
a31d630
Compare
Sorry I squashed everything and loss your work attribution @chaitanyaprem |
Ah..hope it won't be considered as no contributions from me for the past few weeks :) Maybe you can mention me as |
a31d630
to
9fca248
Compare
bb10593
to
a6ea792
Compare
I notice that requests are received twice and that connections are not closed properly. I will investigate further. |
You can find the image built from this PR at
Built from 760d6b2 |
You can find the image built from this PR at
Built from 760d6b2 |
I have stashed an update with a new pruning mechanism for after the state machine PR is merged. |
58d78d0
to
0ae1858
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. Really great effort and good work (also @chaitanyaprem). I do have a few comments below which I would appreciate your looking into. Approving, though, as I do not want to stand in the way of deployed experiments starting.
import std/[strutils], chronicles, std/options, stew/[results, byteutils], confutils | ||
import ../waku_core/message | ||
|
||
const negentropyPath = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no way to compile a static library into our binary here?
If not, note that this will be a resource that also needs to be included in our Docker images (and, as Prem noted elsewhere, into nwaku-compose)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be possible, but as per recommendations noted in nim language bindings guide in status..I chose share lib.
@@ -193,6 +195,125 @@ proc connectToNodes*( | |||
# NOTE Connects to the node without a give protocol, which automatically creates streams for relay | |||
await peer_manager.connectToNodes(node.peerManager, nodes, source = source) | |||
|
|||
## Waku Sync | |||
|
|||
proc mountWakuSync*( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still a bit hard for me to see why this would not be better managed in a composite protocol WakuStoreSync
:). I understand the objection to arbitrarily creating dependencies, but in this case we are defining a composite protocol that depends unidirectionally on wakuSync
, a local archive
and a storeClient
. This is now simply implicit in the mount code, but imho can be better managed if the relationship is made explicit in a separate module. We should also expect this to become more complex in future, with e.g. sanity checks for local archive and negentropy configurations, penalising non-responsive/misbehaving store sync peers, etc. In fact, the interrelationship between all these elements to form Store Sync should likely be a specification on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of agree, but maybe for now we can leave it as is and refactor this once we start next phase of work which involves handling more scenarios etc?
After discussing with @jm-clius I will have to add a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great one! Congrats for it! 🙌
I've added a bunch of comments, some of them are pedantic and others are potentially more interesting :)
Very insightful PR
break | ||
|
||
for (hash, timestamp) in elements: | ||
self.storage.erase(timestamp, hash).isOkOr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right @SionoiS
We need a mechanism to measure the memory occupied by negentropy.
By using C++ we are at high risk of having memory leaks there, for example, this sounds like a candidate that we need to enhance: https://github.com/waku-org/negentropy/blob/311a21a22bdb6d80e5c4ba5e3d2f550e0062b2cb/cpp/negentropy_wrapper.cpp#L188
I agree @NagyZoltanPeter , we need to review the library as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woow, thanks so much! Amazing work!
Started reviewing and will be adding batches of comments/questions :)
Let me know when I can rebase I don't want to mess your reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highest level, pretty, neat and awesome feature. Thank you!!!
I will love to see it working.
Left some observations, mostly questions.
waku/waku_sync/protocol.nim
Outdated
tries = 3 | ||
while true: | ||
(await callback(hashes, peer.peerId)).isOkOr: | ||
error "transfer callback failed", error = $error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we continously retry, why is this an error? If its an error why we think it will be ok at next try?
I'm asking because of operator point of view, when one can see repetitive errors in log, what he/she supposed to do against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True maybe log error when done trying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Is there threshold when we can consider giving up probing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other protocol has retry mechanism it's up to us the define good params. 3 tries was chosen arbitrarily.
peer: RemotePeerInfo | ||
tries = 3 | ||
|
||
while true: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I lost in the soooo many break statements....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's called break dance. 🤣
Does Nim have labels for loop control break: myLoop
or continue: myOtherLoop
? I don't think so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does not have, while using while true:
and breaks seems to me controversial.
Once upon a time there was an article, Goto statement considered harmful, which is resonates with this, especially with two embeded while true:
... reading it is not easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK how loops can be avoided. I'm open to suggestion.
@@ -33,3 +33,16 @@ template nanosecondTime*(collector: Gauge, body: untyped) = | |||
metrics.set(collector, nowInUnixFloat() - start) | |||
else: | |||
body | |||
|
|||
# Unused yet | |||
#[ proc timestampInSeconds*(time: Timestamp): Timestamp = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we think (suppose) the value will be an actual time value (not a diff), we don't need to convert it to string. Isn't that cheaper to just compare to some numeric value? Am I miss something? Btw why we expect the value may vary between milli/micro and nano secs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. @chaitanyaprem wrote this part maybe he can answer.
if hashes.len > 0 and self.transferCallBack.isSome(): | ||
let callback = self.transferCallBack.get() | ||
|
||
(await callback(hashes, conn.peerId)).isOkOr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it, but still it can be an option to move out these refs out of this protocol into a separate callback setup place. That would eliminate the need for refs of WakuArchive and WakuStoreClient.
Now they need to be provided with new*.
In other words by separating setting up the callback from the protocol will move away the decision how that is implemented. Also enables mocking it for testing. WDYT?
Left some comments, so from my side you can go with it. |
5c14b43
to
4c310dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for it 💯 ! I think we can have some more simplification :P
|
||
peer | ||
|
||
let connOpt = await self.peerManager.dialPeer(peer, WakuSyncCodec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also make sure that we properly close the connections, would test and take a look that open connections don't grow indefinitely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the patience! 💯
f987bb9
to
499c5e6
Compare
Co-authored-by: Ivan FB <[email protected]> Co-authored-by: Prem Chaitanya Prathi <[email protected]>
499c5e6
to
97c050f
Compare
We shouldn't assume it exists on the host. Introduced by: #2403 Signed-off-by: Jakub Sokołowski <[email protected]>
We shouldn't assume it exists on the host. Introduced by: #2403 Signed-off-by: Jakub Sokołowski <[email protected]>
We shouldn't assume it exists on the host. Introduced by: #2403 Signed-off-by: Jakub Sokołowski <[email protected]>
We shouldn't assume it exists on the host. Introduced by: #2403 Signed-off-by: Jakub Sokołowski <[email protected]>
We shouldn't assume it exists on the host. Introduced by: #2403 Signed-off-by: Jakub Sokołowski <[email protected]>
Description
This is the new Waku Sync protocol. It works by wrapping the Negentropy protocol which implements RBSR.
See this research post for more info.
At configurable intervals (A sync can also be requested ad-hoc), a sync request is sent to a peer chosen by the peer manager. After multiple requests and responses between the 2 peers, a callback containing the list of missing hashes is triggered. With the list of missing hashes, store is used to request the messages from the other node.
WIP spec -> https://github.com/waku-org/specs/blob/feat--waku-sync/standards/core/sync.md
Changes
waku_node.nim
mount the new protocol.Tracking