Discv5 factorize #480
base: master
Messages and message encoding have nothing to do with the underlying authenticated communications framework. Separate these two.
Protocol was actually made of two sub-protocols:
* a lower half handling authentication, encryption, key exchange, and request/response. This is now called Transport.
* an upper half handling DHT messages. This is still called Protocol.
Separation of these two reduces dependencies and simplifies modifications to the protocol.
Signed-off-by: Csaba Kiraly <[email protected]>
Note that the test does not compile, but it was also not compiling before.
Signed-off-by: Csaba Kiraly <[email protected]>
- interface between Transport and Protocol is at the encoded Message level
Node already has the address, so it does not make sense to pass it as a separate parameter.
This is the request part of a Request/Response, and it also generates a request ID. So call it what it is.
This completes the Request/Response semantics.
It is better to rely on protocol.nim to do all the encoding, thus cleaning up dependencies.
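The layering described in the commit messages above can be sketched roughly as follows. This is only an illustration in Python, not the Nim code from the PR: the class and method names (`Transport.send_request`, `Transport.receive`, `Protocol.ping`) are hypothetical, JSON stands in for the real message encoding, and the in-memory "peer" link replaces the authenticated, encrypted UDP exchange. The point it tries to show is that the lower half only sees opaque bytes and matches responses to request IDs, while the upper half owns all message encoding.

```python
import itertools
import json
from typing import Callable, Optional, Tuple


class Transport:
    """Lower half (sketch): request/response over opaque byte payloads.

    A real transport would also handle authentication, encryption and
    key exchange; all of that is omitted here.
    """

    def __init__(self, on_request: Callable[[bytes], bytes]) -> None:
        self._on_request = on_request          # upper-half callback
        self._ids = itertools.count(1)         # request ID generator
        self.peer: Optional["Transport"] = None  # in-memory stand-in for UDP

    def send_request(self, payload: bytes) -> bytes:
        if self.peer is None:
            raise RuntimeError("no peer connected")
        req_id = next(self._ids)
        resp_id, response = self.peer.receive(req_id, payload)
        if resp_id != req_id:
            raise RuntimeError("response does not match request")
        return response

    def receive(self, req_id: int, payload: bytes) -> Tuple[int, bytes]:
        # Hand the still-encoded message to the upper half and echo the ID.
        return req_id, self._on_request(payload)


class Protocol:
    """Upper half (sketch): encodes and decodes DHT messages."""

    def __init__(self) -> None:
        self.transport = Transport(self._handle)

    def _handle(self, raw: bytes) -> bytes:
        msg = json.loads(raw)
        kind = "pong" if msg["type"] == "ping" else "error"
        return json.dumps({"type": kind}).encode()

    def ping(self) -> bool:
        raw = self.transport.send_request(json.dumps({"type": "ping"}).encode())
        return json.loads(raw)["type"] == "pong"
```

Under these assumptions, swapping the message encoding only touches `Protocol`, and swapping the wire-level security only touches `Transport`.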
I'm fine with doing something like this. I'd like to hear a bit more about the context and your use case. What are you trying to achieve?
I've thought in the past about making parts of discv5 generic. But I was then specifically thinking about doing it at a layer which skips the whole wire protocol of the discv5 spec. So, generic node queries / lookups. (And on the other side of the spectrum, perhaps the possibility of switching also the actual UDP transport). This was in the context of reusing most of the code, except the wire protocol, for discv4. However, I never got to that, considering I was unsure if we would still need discv4 and thus if it was worth the effort. This resurfaced however in the context of our Fluffy project.
With your suggestion in this PR, we split off the messages and their encoding (rlp) from the packet encoding. That sounds good, but it would only allow you to swap in/out either the packet encoding or the protocol messages. So I'm assuming you are planning to reuse only one of these 2 parts in the dagger project?
I think most of the other suggestions are fine too, but I'll do a more thorough review later on.
I would probably just put the message_encoding code in the messages file. There is already rlp encoding going on there anyhow, so there is no need for a new file. Unless you want to move all rlp encoding out there too, and only keep the message definitions, in order to abstract that part also.
Indeed, I see I’ve left some of the RLP part. I’m adding a patch to move that out as well. I think it would make sense.
Now, the broader context: the main thing we need is a fast DHT, implemented in Nim, that implements the primitives of the libp2p-dht spec and integrates well with the libp2p world. Ideally, it should be useful both for libp2p-dht (which relies on the underlying stream transport) and for a DHT that skips the overhead coming with those streams and goes UDP.
As you can see, your code ticks many of these boxes: it is fast, UDP based, has good datagram crypto, is in Nim, and has already gone through several iterations. There are, however, some things that would need changing, and there are some conflicting requirements. Now one by one:
It certainly does, thanks!
Moving the messages encoding is definitely fine. Moving what you call the It would be good to know how far this gets you already, some additional remarks/info on your points:
Yes, additional messages could be done over talk. And you could choose your encoding, but the outer layer needs to be rlp encoded byte string. And as you mentioned, there are some constraints such as the multiple response problem.
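As a rough illustration of the "outer layer needs to be an rlp encoded byte string" constraint: whatever encoding an application picks for its own payload, the payload is carried as an opaque byte string, and the standard RLP rules for a byte string apply to it. The helper name `rlp_encode_bytes` below is made up for this sketch (it is not an API of any library mentioned in this thread), and the sketch is in Python rather than Nim.

```python
def rlp_encode_bytes(data: bytes) -> bytes:
    """Encode a single byte string using the RLP string rules."""
    # A single byte below 0x80 is its own encoding.
    if len(data) == 1 and data[0] < 0x80:
        return data
    # Short strings (0..55 bytes): one prefix byte 0x80 + length.
    if len(data) <= 55:
        return bytes([0x80 + len(data)]) + data
    # Long strings: prefix 0xb7 + length-of-length, then big-endian length.
    length = len(data).to_bytes((len(data).bit_length() + 7) // 8, "big")
    return bytes([0xB7 + len(length)]) + length + data


# Hypothetical application payload in some non-RLP encoding of your choice.
custom_payload = b'{"op":"get","key":"abc"}'
wrapped = rlp_encode_bytes(custom_payload)
```

The inner bytes stay untouched; only a length prefix is added, so the cost of the RLP outer layer for custom messages is small.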
Well, they could I guess, but it would be ugly, see above.
The packet encoding doesn't use rlp, only the messages and the ENR. For the messages it seems solved with this PR if you make a custom messages definition / encoding?
I'd have to read up on what "libp2p signed PeerRecord" are, but I'm assuming they are similar to ENRs in the sense that they give the same security guarantees?
Well, not sure what can be done here. It depends on whether the goal is to have a DHT that can be used efficiently for lookups, or to purely follow the libp2p-dht specs. Discv5 has indeed altered the FindNodes message slightly to be distance based, compared to just passing the target like in discv4 or original Kademlia. IIRC this was adjusted so that a node would know when another node is not really returning its closest neighbours (whether due to a bug or malice).
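To make the distance-based check concrete, here is a small sketch in Python (not the project's Nim code). It assumes the log distance between two node IDs is taken as the bit length of their XOR, and that a response is checked against the distances that were requested, measured from the responder's own ID. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def log_distance(a: int, b: int) -> int:
    """Bit length of a XOR b; 0 means the IDs are identical (assumed convention)."""
    return (a ^ b).bit_length()


def filter_response(responder_id: int,
                    requested: Set[int],
                    returned_ids: Iterable[int]) -> List[int]:
    """Keep only returned IDs that sit at one of the requested distances
    from the responder; anything else hints at a bug or a misbehaving node."""
    return [n for n in returned_ids
            if log_distance(responder_id, n) in requested]


# Example with tiny 4-bit IDs for readability.
responder = 0b1000
accepted = filter_response(responder, {3}, [0b1100, 0b1001, 0b1111])
```

With a plain target-based query, the requester has no comparable way to tell whether the returned nodes are really the responder's closest known neighbours to that target.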
Sorry for coming back to this only now.
Naming was just a first shot. It is essentially a "SecureDatagramTransport" or "AuthenticatedEncryptedDatagramTransport", I guess.
Right, I think I was confused by still leaving some RLP code in decodeHandshakePacket. For the Node, it really goes hand-in-hand with the ENR I think. I see it mostly as an in-memory representation of the ENR.
They are similar in their base functionality of having the underlay address(es) info and the public key, all signed, although without ENR's generic extensible key/value pairs.
Just curious, do you have a link to an evaluation of the cost of this in lookup speed?
@cskiraly what is the state of this PR? In general, we have a lot of copies of the discovery code, and kademlia in particular, spread out over the projects. It might be a good time to review these and see if they can be unified.
I might pick up some of this PR while I look into reusing more discv5 code in discv4 |
@kdeme, I'm now back at this codebase, so I will check how we can rebase this onto the current master.
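This set of patches tries to break up protocol.nim and encoding.nim into smaller chunks of code,
reducing dependencies and simplifying modifications to the discv5 code.
The main part is to separate protocol.nim into two layers: a lower half, now called Transport, handling
authentication, encryption, key exchange, and request/response, and an upper half, still called Protocol,
handling DHT messages.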
Another patch separates encoding related to the lower-half from encoding related to the upper-half.
Other patches try to improve semantics of various send messages, and remove some duplicated code.
The last patch removes some dependencies, but also an export from enr.nim. I can imagine this could create
issues, so it can be removed from the set.