Get packet buffer size from operating system #330

Open
XAMPPRocky opened this issue Jul 12, 2021 · 2 comments
Labels
area/networking: Related to networking I/O
area/performance: Anything to do with Quilkin being slow, or making it go faster.
good first issue: Good for newcomers
help wanted: Extra attention is needed
kind/cleanup: Refactoring code, fixing up documentation, etc.
priority/low: Issues that don't need to be addressed in the near term.

Comments

@XAMPPRocky
Collaborator

Currently Quilkin always allocates a u16::MAX length buffer for every message, which is incorrect for a number of reasons. Firstly, the maximum payload is slightly less than that, as it's really 65535 - (IP + UDP headers). Secondly, it's far too large compared to realistic workloads, which are typically in the 200–500 byte range (to prevent IPv4 fragmentation). Finally, it's much bigger than the limit most OSes set for UDP payloads; for example, macOS has a default limit of 9216 bytes.

Just testing #321 with different workloads and buffer sizes, having a smaller buffer has a significant impact on performance. For example, a 1500 byte buffer gave a 15% increase in throughput for smaller workloads. So instead of always using the maximum, we should try to be a bit smarter about how much we allocate. A good default would be to query the OS for the maximum UDP buffer size. I also think it would be worth adding a server-level option for setting a buffer limit manually, allowing you to lower it when you're using a protocol with a smaller MTU.

quilkin/src/proxy/server.rs

Lines 240 to 242 in 35005f0

// Initialize a buffer for the UDP packet. We use the maximum size of a UDP
// packet, which is the maximum value of a 16 bit integer.
let mut buf = [0; 1 << 16];
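
A minimal sketch of what querying the platform limit could look like (none of this is existing Quilkin code: the shell-out to sysctl, the net.inet.udp.maxdgram name, and the u16::MAX fallback are illustrative assumptions):

```rust
/// Best-effort query of the OS limit for a single UDP datagram.
/// Falls back to the theoretical maximum when the platform gives no answer.
fn max_udp_payload() -> usize {
    // macOS exposes its per-datagram limit (9216 by default) via sysctl.
    #[cfg(target_os = "macos")]
    {
        use std::process::Command;

        if let Ok(output) = Command::new("sysctl")
            .args(["-n", "net.inet.udp.maxdgram"])
            .output()
        {
            if let Ok(value) = String::from_utf8_lossy(&output.stdout).trim().parse() {
                return value;
            }
        }
    }

    // Elsewhere, or if the query fails, fall back to the largest possible UDP payload.
    usize::from(u16::MAX)
}
```

Whatever value comes back would only be a default; the manual server-level option suggested above could still lower it for protocols with a smaller MTU.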

@XAMPPRocky added the good first issue, help wanted, area/performance, kind/cleanup and priority/medium labels on Jul 12, 2021
@iffyio
Collaborator

iffyio commented Jul 14, 2021

The u16::MAX buffer isn't allocated per packet; notice it's outside of the loop, so we reserve that space inline once on startup (the IP + UDP header is so small it shouldn't be significant here). For each packet we allocate its exact size:

.send((recv_addr, (&buf[..size]).to_vec()))

It's interesting that there's a significant difference with different buffer sizes then. I can't really see why that would be the case, but I can imagine the future will be pretty big due to the 64K buffer, so moving it at any point will be a more expensive operation. I thought this would only be done once on startup too, but maybe (pure speculation :) tokio internally moves futures, e.g. while it schedules tasks? Otherwise I suspect the benchmark is noisy, such that startup changes are significantly affecting its result.

I'd suggest instead moving the buffer to the heap (i.e. using a Vec) in any case to avoid the big-future issue and see how far that goes; I suspect that would resolve this issue. Adding some cross-platform code to detect the OS buffer size complicates things, and this doesn't feel worthy of a config option, so it would be nice to avoid those if we can.
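
A rough sketch of that change, assuming a tokio UdpSocket and an unbounded mpsc channel (recv_loop, socket and packet_tx are placeholder names for illustration, not the actual items in src/proxy/server.rs):

```rust
use std::net::SocketAddr;
use tokio::{net::UdpSocket, sync::mpsc};

/// Receive loop with the 64K buffer moved to the heap: the future then only
/// holds a Vec (pointer, length, capacity), so moving it no longer copies 64 KiB.
async fn recv_loop(
    socket: UdpSocket,
    packet_tx: mpsc::UnboundedSender<(SocketAddr, Vec<u8>)>,
) -> std::io::Result<()> {
    let mut buf = vec![0u8; 1 << 16];
    loop {
        let (size, recv_addr) = socket.recv_from(&mut buf).await?;
        // Each forwarded packet is still copied into an exactly-sized Vec,
        // just like the existing `(&buf[..size]).to_vec()` call.
        if packet_tx.send((recv_addr, buf[..size].to_vec())).is_err() {
            return Ok(()); // receiver dropped; stop the loop
        }
    }
}
```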

@XAMPPRocky
Collaborator Author

Well, you can try the benchmark yourself; it records the previous run and tells you the diff, and it will tell you if it's within the noise threshold. I included an example output (though not from the same run). Some of it is probably noise, but yeah, it's also not great that we're allocating 65 KB on the stack, or in general, because if we want to run a lot of instances of Quilkin it will add up pretty quickly. We'd be able to get a more fine-grained view of the performance in tokio with #317.

Adding some cross-platform code to detect the OS buffer size complicates things, and this doesn't feel worthy of a config option, so it would be nice to avoid those if we can.

Yeah, I wouldn't want to add the OS code to Quilkin itself, not least because it would be generally more useful to other projects, but if someone in the community makes (or has already made) a cross-platform lib that handles all the complexity and we just get the size (similar to num_cpus), I wouldn't see an issue with adding it.

(Screenshot: example benchmark output, 2021-07-14 09:02:49)
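
Purely to illustrate the shape of the API described above, something num_cpus-like might read as below (the udp_limits crate and its max_datagram_size function are hypothetical and do not exist):

```rust
// Hypothetical: `udp_limits` and `max_datagram_size()` do not exist; they only
// sketch the kind of one-call, cross-platform API such a community crate could expose.
fn packet_buffer_size() -> usize {
    udp_limits::max_datagram_size().unwrap_or(usize::from(u16::MAX))
}
```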

iffyio added a commit that referenced this issue Jul 15, 2021
Allocating the 64K buffer on the stack makes the future expensive to move.
This allocates the buffer on the heap instead.

Noticed some significant perf improvements in load tests
via https://github.com/majek/dump/tree/master/how-to-receive-a-million-packets

Tried using the naive benchmark from #321 but that doesn't seem
consistent atm - constantly got perf regressions/improvements on reruns even with
no code change (I think either running both the proxy and the mock server within the same process/scheduler
adds too much noise, or we don't have a large enough unit of work)

Relates to #330
@XAMPPRocky added the priority/low label and removed the priority/medium label on Jul 15, 2021
markmandel added a commit that referenced this issue Jul 19, 2021
* Move packet buffer to heap

Allocating the 64K buffer on the stack makes the future expensive to move.
This allocates the buffer on the heap instead.

Noticed some significant perf improvements in load tests
via https://github.com/majek/dump/tree/master/how-to-receive-a-million-packets

Tried using the naive benchmark from #321 but that doesn't seem
consistent atm - constantly got perf regressions/improvements on reruns even with
no code change (I think either running both the proxy and the mock server within the same process/scheduler
adds too much noise, or we don't have a large enough unit of work)

Relates to #330

* Initialize buffer

Co-authored-by: Mark Mandel <[email protected]>
@XAMPPRocky added the area/networking label on Oct 11, 2021