Faster Tunnel crypto by re-implementing IPv8 crypto in "Rust" programming language #4567
@synctext could I be assigned too? Thanks! |
Also, could the main repo link be changed to this? https://github.com/ip-v8/rust-ipv8 |
The other assignees were correct, though :) |
I edited the link in the first post. |
Also, not entirely related to this ticket, but could you guys let me know when the message serialization method is working? I have an experiment where I test the market community under a high system load and it seems that one of the bottlenecks is IPv8 message serialization. If there is something faster, I can integrate it into this experiment and 1) check the speed improvement and 2) check your implementation for correctness. 👍 |
Yes, of course. We use the Serde library for it, which provides zero-copy serialization and is extensible to custom objects. In many cases we can simply use the default serialization strategy, since Python does the same with the struct library. We provide some "atoms" that correspond to Python types and that we guarantee to serialize in a specific way. An example is the Varlen16 struct, which is guaranteed to serialize to a 2-byte length followed by at most 65535 bytes of data. There are also Varlen8, Varlen32 and Varlen64 (which aren't strictly necessary, since pyipv8 doesn't use them, but adding them was easy and they might prove useful). Similarly, we have a Bits struct that serializes to a u8, and so on. For these we implemented a custom Serde serialization process. As long as you use these types in your structs instead of the built-in Rust types, you are guaranteed that Python can interpret them. Serde does all the hard work. Now, about coupling this to your Python project: since we haven't even started on the Python FFI, you would have to do this yourself. That won't be easy, and it is also what we are starting on next week. So using it in the market community is something you would mostly have to figure out on your own (or you could wait a few more weeks). However, verifying our serializer would be highly appreciated, so go ahead! We already have a lot of test cases, but more never hurts. More detailed explanations of the serializer can be found in the comments around it, as we document our code quite well in my opinion. We also generate a documentation page, which can be found here. |
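The length-prefix idea behind `Varlen16` and the bit packing behind `Bits` can be illustrated without Serde. This is a minimal, std-only sketch, not rust-ipv8's implementation: the type layouts, the method names `to_bytes`/`from_bytes`/`to_byte`/`from_byte`, the big-endian length prefix and the most-significant-bit-first order are all assumptions made for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a `Varlen16` atom: a byte string prefixed with
/// its length as a 2-byte unsigned integer.
/// Assumption: the length prefix is big-endian (network byte order).
#[derive(Debug, PartialEq)]
pub struct Varlen16(pub Vec<u8>);

impl Varlen16 {
    /// Encode as `len (u16, big-endian) || data`.
    /// Fails if the payload is longer than u16::MAX (65535) bytes.
    pub fn to_bytes(&self) -> Result<Vec<u8>, String> {
        let len = u16::try_from(self.0.len()).map_err(|_| {
            format!("payload of {} bytes does not fit a u16 length", self.0.len())
        })?;
        let mut out = Vec::with_capacity(2 + self.0.len());
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(&self.0);
        Ok(out)
    }

    /// Decode one value from the front of `input`; returns it together with
    /// the unconsumed remainder, or None if the input is too short.
    pub fn from_bytes(input: &[u8]) -> Option<(Varlen16, &[u8])> {
        if input.len() < 2 {
            return None;
        }
        let len = u16::from_be_bytes([input[0], input[1]]) as usize;
        let rest = &input[2..];
        if rest.len() < len {
            return None;
        }
        Some((Varlen16(rest[..len].to_vec()), &rest[len..]))
    }
}

/// Hypothetical `Bits` atom: eight booleans packed into one u8.
/// Assumption: element 0 maps to the most significant bit.
#[derive(Debug, PartialEq)]
pub struct Bits(pub [bool; 8]);

impl Bits {
    /// Shift each flag in from the right, so the first flag ends up in bit 7.
    pub fn to_byte(&self) -> u8 {
        self.0.iter().fold(0u8, |acc, &b| (acc << 1) | b as u8)
    }

    /// Inverse of `to_byte`: read bit 7 down to bit 0.
    pub fn from_byte(byte: u8) -> Bits {
        let mut bits = [false; 8];
        for (i, bit) in bits.iter_mut().enumerate() {
            *bit = (byte >> (7 - i)) & 1 == 1;
        }
        Bits(bits)
    }
}
```

Because `from_bytes` returns the unconsumed remainder, several atoms can be decoded back to back from one packet buffer, which is roughly what a struct-level deserializer does field by field.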
Update about the project itself: we now run our tests on Windows and macOS too, which might be useful to know. Previously, tests only ran on Linux.
@jonay2000 thanks for the update! It's not an urgent need but it would allow me to squeeze out more throughput under high load 👍 |
Yes, that would be great. Honestly, that's why we are making rust-ipv8: to remove some bottlenecks in certain communities. We hope to have a working FFI to Python around the end of August. |
We have done some benchmarks of the system and they show very promising results. Deserializing a packet and verifying its signature takes 57 µs (microseconds) per packet. This time is an average over about 500,000 runs on a single thread of one core of a desktop CPU clocked at 3.5 GHz, which means we can process around 3.5 megabytes per second per CPU core (almost twice that with hyperthreading). This all gives a theoretical throughput of 2.8 terabytes per day on just one core of one CPU. There will be some overhead from the communities themselves, we know that, but even if it halves this speed, this will be a great improvement over the system currently in place. However, we don't expect the impact of the communities to come anywhere close to that, since signature verification is very clearly the bottleneck of the system.

Another idea we have, which could be implemented in the future, is batch verification. At the moment, verifying one Ed25519 signature takes 273,364 cycles on a modern x86 CPU. If we wait with verification until multiple packets have come in, this number could theoretically be halved (note: in a batch of 64 Ed25519 signatures, verification takes 134,000 cycles per signature). Although we haven't looked into how we could do this, nor planned to do it any time soon, it could greatly improve speeds.

We have even noticed that we are (marginally) faster than some quite optimized alternatives to our verification process. We expect this to be due to the link-time optimization we do, which drastically increases compile time (by a factor of two to three) but yields a speedup of around 1.5% for crypto and 50% for serialization/deserialization (though ser/de is still way faster than the crypto). More optimization will be done and more benchmarks will certainly be taken.

P.S. Note that we do true multithreading, so all stated speeds will roughly scale with core count. |
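The per-core numbers above follow from simple arithmetic, sketched below. It assumes a fixed cost per packet and a uniform packet size; the benchmark does not state the packet size, so `packet_bytes` is a hypothetical parameter, and the function names are illustrative only. With 57 µs per packet, one core handles 1 / 57 µs ≈ 17,544 packets per second, so 3.5 MB/s implies an average packet of roughly 200 bytes, and 3.5 MB/s sustained over the 86,400 seconds in a day comes to about 0.3 TB.

```rust
/// Single-core throughput under a fixed per-packet cost.
/// Assumption: every packet costs the same time and has the same size.
#[derive(Debug)]
pub struct Throughput {
    pub packets_per_sec: f64,
    pub bytes_per_sec: f64,
    pub bytes_per_day: f64,
}

/// `per_packet_secs`: measured cost per packet (57 µs in the benchmark).
/// `packet_bytes`: hypothetical average packet size (not stated in the benchmark).
pub fn throughput(per_packet_secs: f64, packet_bytes: f64) -> Throughput {
    const SECS_PER_DAY: f64 = 86_400.0;
    let packets_per_sec = 1.0 / per_packet_secs;
    let bytes_per_sec = packets_per_sec * packet_bytes;
    Throughput {
        packets_per_sec,
        bytes_per_sec,
        bytes_per_day: bytes_per_sec * SECS_PER_DAY,
    }
}

/// Inverse question: which average packet size does a given byte rate imply?
pub fn implied_packet_bytes(per_packet_secs: f64, bytes_per_sec: f64) -> f64 {
    bytes_per_sec * per_packet_secs
}
```

For example, `throughput(57e-6, 200.0)` yields about 17,544 packets/s and 3.5 MB/s; multiplying the per-core result by the number of cores gives the rough scaling mentioned in the P.S.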
Great progress. Please consider integrating with Python and our complete exe build process as early as possible; that has been identified as the cardinal pain point. For instance, Rust could merely own the UDP socket and act as a "Rust proxy", with the rest staying in Python. This can be a parallel development track. |
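A minimal, std-only sketch of the "Rust proxy" direction: Rust owns the public UDP socket and hands every incoming datagram to a Python process on the same machine right away, without queueing or batching. Everything here is an assumption for illustration, not how rust-ipv8 or py-ipv8 actually communicate: the function names `relay_inbound`/`encode_peer`, the split into a `public` and a `local` socket, and the hypothetical 6-byte header (IPv4 address plus port of the original sender) that lets Python know who sent the packet. Only the inbound direction is shown, and IPv6 peers are simply dropped.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical header: 4-byte IPv4 address + 2-byte big-endian port.
const HEADER_LEN: usize = 6;

/// Encode the original sender so the Python side can still see it.
fn encode_peer(addr: SocketAddr) -> Option<[u8; HEADER_LEN]> {
    match addr {
        SocketAddr::V4(v4) => {
            let ip = v4.ip().octets();
            let port = v4.port().to_be_bytes();
            Some([ip[0], ip[1], ip[2], ip[3], port[0], port[1]])
        }
        SocketAddr::V6(_) => None, // IPv6 is left out of this sketch
    }
}

/// Relay up to `max_packets` datagrams from `public` to `python_addr` via `local`.
/// A real proxy would loop forever; the bound makes the sketch testable.
pub fn relay_inbound(
    public: &UdpSocket,
    local: &UdpSocket,
    python_addr: SocketAddr,
    max_packets: usize,
) -> io::Result<usize> {
    // Room for the header plus the largest possible IPv4 UDP payload.
    let mut buf = vec![0u8; HEADER_LEN + 65_507];
    let mut relayed = 0;
    for _ in 0..max_packets {
        // Receive directly after the header slot, so the payload is not copied again.
        let (len, peer) = public.recv_from(&mut buf[HEADER_LEN..])?;
        if let Some(header) = encode_peer(peer) {
            buf[..HEADER_LEN].copy_from_slice(&header);
            // Forward immediately: nothing is held back or batched.
            local.send_to(&buf[..HEADER_LEN + len], python_addr)?;
            relayed += 1;
        }
    }
    Ok(relayed)
}
```

On the Python side, a plain UDP listener would strip the 6-byte header to recover the sender and pass the rest to the existing IPv8 code; a matching outbound path from Python through Rust to the peer would still be needed.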
I would advise against any buffer in a networking library. You should treat packets like hot potatoes: never hold on to them. In this case, it's better to be a bit less efficient. Two examples: (1) overzealous use of buffers and batching led to packets having a propagation time of up to 10 seconds in Dispersy and (2) a bit more generally, the problem of bufferbloat. |
Also, the batching mechanism in Dispersy made it a nightmare for developers to debug tests, since it was very hard to see whether a message is/was in the buffer, is being processed or has been processed. It was also one of the reasons why these individual tests could take up to 10 seconds to complete. I agree with @qstokkink here, we learned from buffering messages in Dispersy and it is not a mechanism I would like to see back, even if it (marginally) improves performance. |
Alright, that's very clear: no batch processing. Thanks for the feedback. We wouldn't have done it for months anyway, but now we won't even research the idea. |
@jonay2000, could you guys please measure the performance of processing Tunnel Community AES-GCM encrypted packets? That is where our real bottleneck is. |
will do, probably done this week |
@ichorid Does tribler/ipv8 use AES-128-GCM or AES-256-GCM? |
@jonay2000, I guess it's 128. You'd better ask @egbertbouman about this to be sure. |
@ichorid You're right, it's |
Thank you both! |
To do: schedule an update meeting. |
Due to inactivity on this issue, I will move it to backlog. |
160 Mbit/sec download speed. Amazing work on Experimental release 😲 🚀 🎉 |
This is a long-term issue for re-implementing our crypto tunnel core for raw speed, with a zero-copy protocol stack and a work-stealing thread pool, all in Rust. This means moving away from our Python stack, which is ideally suited to rapid prototyping; the result will be harder to modify, and its wire format and behavior will be frozen.
This issue partly addresses issue #1, "Tribler anonymous downloads are fast and secure".
This is an honors student project at TU Delft, complementary to ongoing tunnel refactoring and tweaking such as #4459.
Current code repo: https://github.com/ip-v8/rust-ipv8