Get packet buffer size from operating system #330

Open
XAMPPRocky opened this issue Jul 12, 2021 · 2 comments
Labels
area/networking: Related to networking I/O
area/performance: Anything to do with Quilkin being slow, or making it go faster.
good first issue: Good for newcomers
help wanted: Extra attention is needed
kind/cleanup: Refactoring code, fixing up documentation, etc.
priority/low: Issues that don't need to be addressed in the near term.

Comments

@XAMPPRocky
Collaborator

Currently Quilkin always allocates a u16::MAX length buffer for every message, which is incorrect for a number of reasons. Firstly, the maximum payload is slightly less than that, as it's really 65535 - (IP + UDP headers). Secondly, it's far too large compared to realistic workloads, which are typically in the 200–500 byte range (to prevent IPv4 fragmentation). Finally, it's much bigger than the limit most OSes set for UDP payloads; for example, macOS has a default limit of 9216 bytes.

Just testing #321 with different workloads and buffer sizes, having a smaller buffer has a significant impact on performance. For example, a 1500 byte buffer gave a 15% increase in throughput for smaller workloads. So instead of always using the maximum, we should try to be a bit smarter about how much we allocate. A good default would be to query the OS for the maximum UDP buffer size. I also think it would be worth adding a server-level option for setting a buffer limit manually, allowing you to lower it when you're using a protocol with a smaller MTU.

quilkin/src/proxy/server.rs

Lines 240 to 242 in 35005f0

// Initialize a buffer for the UDP packet. We use the maximum size of a UDP
// packet, which is the maximum value of a 16 bit integer.
let mut buf = [0; 1 << 16];
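
A minimal sketch of what querying the platform limit could look like (none of this is existing Quilkin code: the shell-out to sysctl, the net.inet.udp.maxdgram name, and the u16::MAX fallback are illustrative assumptions):

```rust
/// Best-effort query of the OS limit for a single UDP datagram.
/// Falls back to the theoretical maximum when the platform gives no answer.
fn max_udp_payload() -> usize {
    // macOS exposes its per-datagram limit (9216 by default) via sysctl.
    #[cfg(target_os = "macos")]
    {
        use std::process::Command;

        if let Ok(output) = Command::new("sysctl")
            .args(["-n", "net.inet.udp.maxdgram"])
            .output()
        {
            if let Ok(value) = String::from_utf8_lossy(&output.stdout).trim().parse() {
                return value;
            }
        }
    }

    // Elsewhere, or if the query fails, fall back to the largest possible UDP payload.
    usize::from(u16::MAX)
}
```

Whatever value comes back would only be a default; the manual server-level option suggested above could still lower it for protocols with a smaller MTU.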

@XAMPPRocky added the good first issue, help wanted, area/performance, kind/cleanup and priority/medium labels on Jul 12, 2021
@iffyio
Collaborator

iffyio commented Jul 14, 2021

The u16::MAX buffer isn't allocated per packet; notice it's outside of the loop, so we reserve that space inline once on startup (the IP + UDP header is so small it shouldn't be significant here). For each packet we allocate its exact size:

.send((recv_addr, (&buf[..size]).to_vec()))

It's interesting that there's a significant difference with different buffer sizes then. I can't really see why that would be the case, but I can imagine the future will be pretty big due to the 64K buffer, so moving it at any point will be a more expensive operation. I thought this would only be done once on startup too, but maybe (pure speculation :) tokio internally moves futures, e.g. while it schedules tasks? Otherwise I suspect the benchmark is noisy, such that startup changes are significantly affecting its result.

I'd suggest instead moving the buffer to the heap (i.e. using a Vec) in any case to avoid the big-future issue and see how far that goes; I suspect that would resolve this issue. Adding some cross-platform code to detect the OS buffer size complicates things, and this doesn't feel worthy of a config option, so it would be nice to avoid those if we can.
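
A rough sketch of that change, assuming a tokio UdpSocket and an unbounded mpsc channel (recv_loop, socket and packet_tx are placeholder names for illustration, not the actual items in src/proxy/server.rs):

```rust
use std::net::SocketAddr;
use tokio::{net::UdpSocket, sync::mpsc};

/// Receive loop with the 64K buffer moved to the heap: the future then only
/// holds a Vec (pointer, length, capacity), so moving it no longer copies 64 KiB.
async fn recv_loop(
    socket: UdpSocket,
    packet_tx: mpsc::UnboundedSender<(SocketAddr, Vec<u8>)>,
) -> std::io::Result<()> {
    let mut buf = vec![0u8; 1 << 16];
    loop {
        let (size, recv_addr) = socket.recv_from(&mut buf).await?;
        // Each forwarded packet is still copied into an exactly-sized Vec,
        // just like the existing `(&buf[..size]).to_vec()` call.
        if packet_tx.send((recv_addr, buf[..size].to_vec())).is_err() {
            return Ok(()); // receiver dropped; stop the loop
        }
    }
}
```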

@XAMPPRocky
Collaborator Author

Well, you can try the benchmark yourself; it records the previous run and tells you the diff, and it will tell you if it's within the noise threshold. I included an example output (though not from the same run). Some of it is probably noise, but yeah, it's also not great that we're allocating 65 KB on the stack, or in general, because if we want to run a lot of instances of Quilkin it will add up pretty quickly. We'd be able to get a more fine-grained view of the performance in tokio with #317.

Adding some cross-platform code to detect the OS buffer size complicates things, and this doesn't feel worthy of a config option, so it would be nice to avoid those if we can.

Yeah, I wouldn't want to add the OS code to Quilkin itself, not least because it would be generally more useful to other projects, but if someone in the community makes (or has already made) a cross-platform lib that handles all the complexity and we just get the size (similar to num_cpus), I wouldn't see an issue with adding it.

(Screenshot: example benchmark output, 2021-07-14 09:02:49)
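
Purely to illustrate the shape of the API described above, something num_cpus-like might read as below (the udp_limits crate and its max_datagram_size function are hypothetical and do not exist):

```rust
// Hypothetical: `udp_limits` and `max_datagram_size()` do not exist; they only
// sketch the kind of one-call, cross-platform API such a community crate could expose.
fn packet_buffer_size() -> usize {
    udp_limits::max_datagram_size().unwrap_or(usize::from(u16::MAX))
}
```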

iffyio added a commit that referenced this issue Jul 15, 2021
Allocating the 64K buffer on the stack makes the future expensive to move.
This allocates the buffer on the heap instead.

Noticed some significant perf improvements in load tests
via https://github.com/majek/dump/tree/master/how-to-receive-a-million-packets

Tried using the naive benchmark from #321 but that doesn't seem
consistent atm - constantly got perf regressions/improvements on reruns even with
no code change (I think either running both the proxy and the mock server within the same process/scheduler
adds too much noise, or we don't have a large enough unit of work)

Relates to #330
@XAMPPRocky added the priority/low label and removed the priority/medium label on Jul 15, 2021
markmandel added a commit that referenced this issue Jul 19, 2021
* Move packet buffer to heap

Allocating the 64K buffer on the stack makes the future expensive to move.
This allocates the buffer on the heap instead.

Noticed some significant perf improvements in load tests
via https://github.com/majek/dump/tree/master/how-to-receive-a-million-packets

Tried using the naive benchmark from #321 but that doesn't seem
consistent atm - constantly got perf regressions/improvements on reruns even with
no code change (I think either running both the proxy and the mock server within the same process/scheduler
adds too much noise, or we don't have a large enough unit of work)

Relates to #330

* Initialize buffer

Co-authored-by: Mark Mandel <[email protected]>
@XAMPPRocky added the area/networking label on Oct 11, 2021