feat: don't allocate in UDP recv path #2184
Conversation
Force-pushed ea91ae6 to c2520e2
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

@@           Coverage Diff            @@
##             main    #2184    +/-   ##
========================================
  Coverage   95.39%   95.39%
========================================
  Files         112      112
  Lines       36373    36447    +74
========================================
+ Hits        34697    34769    +72
- Misses       1676     1678     +2

View full report in Codecov by Sentry.
Failed Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

All results

Succeeded Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server
Force-pushed c2520e2 to 0dcc9b2
Benchmark results

Performance differences relative to b5e0374.

- coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
  time: [98.917 ns 99.193 ns 99.481 ns]
  change: [-0.9459% -0.5340% -0.0999%] (p = 0.02 < 0.05)
- coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
  time: [117.01 ns 117.37 ns 117.75 ns]
  change: [-0.8819% -0.5415% -0.2079%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 10+1 entries: Change within noise threshold.
  time: [116.51 ns 116.85 ns 117.29 ns]
  change: [-1.3921% -0.8099% -0.3062%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
  time: [96.717 ns 101.78 ns 112.97 ns]
  change: [-2.1810% +0.6453% +6.4132%] (p = 0.83 > 0.05)
- RxStreamOrderer::inbound_frame(): Change within noise threshold.
  time: [111.61 ms 111.67 ms 111.74 ms]
  change: [-0.3427% -0.2718% -0.1991%] (p = 0.00 < 0.05)
- transfer/pacing-false/varying-seeds: No change in performance detected.
  time: [25.850 ms 26.759 ms 27.654 ms]
  change: [-7.9363% -3.0643% +2.2316%] (p = 0.24 > 0.05)
- transfer/pacing-true/varying-seeds: No change in performance detected.
  time: [33.889 ms 35.649 ms 37.426 ms]
  change: [-6.8098% -0.1727% +6.8300%] (p = 0.96 > 0.05)
- transfer/pacing-false/same-seed: No change in performance detected.
  time: [25.294 ms 25.955 ms 26.623 ms]
  change: [-6.4483% -2.4512% +1.7271%] (p = 0.25 > 0.05)
- transfer/pacing-true/same-seed: No change in performance detected.
  time: [39.871 ms 41.886 ms 43.943 ms]
  change: [-9.8397% -3.6845% +3.0139%] (p = 0.27 > 0.05)
- 1-conn/1-100mb-resp/mtu-1500 (aka. Download)/client: No change in performance detected.
  time: [871.83 ms 880.49 ms 889.33 ms]
  thrpt: [112.44 MiB/s 113.57 MiB/s 114.70 MiB/s]
  change: time: [-0.7754% +0.5489% +1.9642%] (p = 0.44 > 0.05); thrpt: [-1.9264% -0.5459% +0.7815%]
- 1-conn/10_000-parallel-1b-resp/mtu-1500 (aka. RPS)/client: No change in performance detected.
  time: [316.33 ms 319.33 ms 322.42 ms]
  thrpt: [31.015 Kelem/s 31.315 Kelem/s 31.613 Kelem/s]
  change: time: [-1.3503% -0.0743% +1.3366%] (p = 0.92 > 0.05); thrpt: [-1.3190% +0.0743% +1.3688%]
- 1-conn/1-1b-resp/mtu-1500 (aka. HPS)/client: No change in performance detected.
  time: [33.780 ms 34.009 ms 34.259 ms]
  thrpt: [29.189 elem/s 29.404 elem/s 29.603 elem/s]
  change: time: [-1.5083% -0.5383% +0.4334%] (p = 0.29 > 0.05); thrpt: [-0.4315% +0.5412% +1.5314%]
- 1-conn/1-100mb-resp/mtu-65536 (aka. Download)/client: 💚 Performance has improved.
  time: [111.51 ms 111.80 ms 112.08 ms]
  thrpt: [892.23 MiB/s 894.46 MiB/s 896.77 MiB/s]
  change: time: [-2.0616% -1.5830% -1.0883%] (p = 0.00 < 0.05); thrpt: [+1.1003% +1.6085% +2.1050%]
- 1-conn/10_000-parallel-1b-resp/mtu-65536 (aka. RPS)/client: No change in performance detected.
  time: [315.47 ms 318.63 ms 321.84 ms]
  thrpt: [31.072 Kelem/s 31.384 Kelem/s 31.699 Kelem/s]
  change: time: [-1.0625% +0.4502% +1.9110%] (p = 0.55 > 0.05); thrpt: [-1.8752% -0.4482% +1.0739%]
- 1-conn/1-1b-resp/mtu-65536 (aka. HPS)/client: No change in performance detected.
  time: [33.916 ms 34.102 ms 34.308 ms]
  thrpt: [29.148 elem/s 29.324 elem/s 29.484 elem/s]
  change: time: [-0.2857% +0.5354% +1.3188%] (p = 0.21 > 0.05); thrpt: [-1.3016% -0.5325% +0.2865%]

Client/server transfer results: Transfer of 33554432 bytes over loopback.
Force-pushed 7c23ff6 to af47ac2
This is the final pull request in the stack. All commits but the first (08a0003) have been reviewed and merged through the three previous stacked pull requests. @martinthomson and @larseggert, can you do one more review?
LGTM.
Previously the payload type of `Datagram` was `Vec<u8>`, so a `Datagram` always owned its UDP datagram payload. This change makes `Datagram` generic over the payload type, so it can represent either (a) an owned payload allocation (i.e. `Vec<u8>`) or (b) a view into an existing payload (e.g. a long-lived receive buffer via `&[u8]`). The default payload type remains `Vec<u8>`, so existing usage of `Datagram` does not break.
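The shape of the change can be sketched as follows. This is an illustrative reconstruction, not neqo's actual definition: field names, the `Deref` impl, and the accessor are assumptions; only the defaulted generic payload parameter is the point.

```rust
// Sketch: a Datagram generic over its payload type D, defaulting to an
// owned Vec<u8> so existing call sites keep compiling unchanged.
use std::net::SocketAddr;
use std::ops::Deref;

pub struct Datagram<D = Vec<u8>> {
    src: SocketAddr,
    dst: SocketAddr,
    payload: D,
}

impl<D: AsRef<[u8]>> Datagram<D> {
    pub fn new(src: SocketAddr, dst: SocketAddr, payload: D) -> Self {
        Self { src, dst, payload }
    }

    pub fn source(&self) -> SocketAddr {
        self.src
    }
}

// Both owned and borrowed payloads expose the bytes the same way.
impl<D: AsRef<[u8]>> Deref for Datagram<D> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        self.payload.as_ref()
    }
}

fn main() {
    let src: SocketAddr = "127.0.0.1:443".parse().unwrap();
    let dst: SocketAddr = "127.0.0.1:4433".parse().unwrap();

    // (a) owned payload -- the default type parameter, Datagram<Vec<u8>>.
    let owned: Datagram = Datagram::new(src, dst, vec![0u8; 4]);

    // (b) borrowed view into a long-lived receive buffer.
    let recv_buf = [1u8, 2, 3, 4];
    let borrowed: Datagram<&[u8]> = Datagram::new(src, dst, &recv_buf[..]);

    assert_eq!(owned.len(), 4);
    assert_eq!(&borrowed[..], &[1u8, 2, 3, 4][..]);
    assert_eq!(borrowed.source(), src);
}
```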
Previously `process_input` (and the like) took a `Datagram<Vec<u8>>` as input. In other words, it required allocating each UDP datagram payload in a dedicated `Vec<u8>` before passing it to `process_input`. With this patch, `process_input` accepts a `Datagram<&[u8]>`, i.e. a `Datagram` with a borrowed view into an existing buffer (`&[u8]`), e.g. a long-lived receive buffer.
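A minimal sketch of the signature change (function and field names are assumptions, not the exact neqo API):

```rust
// Simplified Datagram for illustration; neqo's real type carries
// addresses, ECN marks, etc.
pub struct Datagram<D = Vec<u8>> {
    payload: D,
}

// Before: the caller had to hand over an owned allocation per datagram.
fn process_input_owned(d: Datagram<Vec<u8>>) -> usize {
    d.payload.len()
}

// After: the payload is a borrowed view into a caller-owned buffer, so
// no per-datagram Vec<u8> is required.
fn process_input(d: Datagram<&[u8]>) -> usize {
    d.payload.len()
}

fn main() {
    let recv_buf = vec![0u8; 1200]; // long-lived receive buffer

    // Old style: clone/allocate just to satisfy the owned signature.
    let owned = Datagram { payload: recv_buf.clone() };
    assert_eq!(process_input_owned(owned), 1200);

    // New style: borrow a slice of the buffer, no allocation.
    let borrowed = Datagram { payload: &recv_buf[..] };
    assert_eq!(process_input(borrowed), 1200);
}
```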
Previously `recv_inner` would return `Datagram<Vec<u8>>`. In other words, it would allocate a new `Vec<u8>` for each UDP datagram payload. Now `recv_inner` reads into a provided buffer and returns `Datagram<&[u8]>`, i.e. a view into the provided buffer, without allocating.
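The lifetime relationship can be sketched like this. The names (`recv_inner`, `fill_from_socket`) and the payload-only `Datagram` are illustrative assumptions; the real function wraps an actual socket read.

```rust
// The returned Datagram borrows from the caller's buffer, so the
// borrow checker ties its lifetime to that buffer: no allocation, and
// no way to keep the view alive past the buffer.
pub struct Datagram<'a> {
    pub payload: &'a [u8],
}

// Stand-in for the socket read; a real implementation would call
// recv_from/recvmmsg here and return the number of bytes received.
fn fill_from_socket(buf: &mut [u8]) -> usize {
    let pkt = [0x40u8, 1, 2, 3]; // pretend this arrived off the wire
    buf[..pkt.len()].copy_from_slice(&pkt);
    pkt.len()
}

fn recv_inner<'a>(buf: &'a mut [u8]) -> Datagram<'a> {
    let n = fill_from_socket(buf);
    Datagram { payload: &buf[..n] }
}

fn main() {
    let mut recv_buf = [0u8; 65535]; // long-lived, reused across calls
    let d = recv_inner(&mut recv_buf);
    assert_eq!(d.payload, &[0x40u8, 1, 2, 3][..]);
}
```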
Pass a long-lived receive buffer to `neqo_udp::recv_inner`, receiving an iterator of `Datagram<&[u8]>`s pointing into that buffer, thus no longer allocating in the UDP receive path.
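To illustrate why an iterator of borrowed views works here: a single socket read can deliver multiple coalesced datagrams (e.g. via GRO), and yielding slices of one reused buffer avoids any per-datagram allocation. This sketch assumes fixed-size segments and a hypothetical `datagrams` helper; it is not the `neqo_udp` API.

```rust
// One receive buffer, many borrowed datagram views: each item is a
// zero-copy slice into `buf`, split at the segment size.
fn datagrams(buf: &[u8], segment_size: usize) -> impl Iterator<Item = &[u8]> {
    buf.chunks(segment_size)
}

fn main() {
    // Pretend one socket read filled this long-lived buffer with three
    // coalesced 2-byte datagrams.
    let recv_buf = [1u8, 2, 3, 4, 5, 6];
    let views: Vec<&[u8]> = datagrams(&recv_buf, 2).collect();
    assert_eq!(views, vec![&[1u8, 2][..], &[3u8, 4][..], &[5u8, 6][..]]);
    // No Vec<u8> was allocated per datagram; only the Vec of slice
    // references above, and that exists only for this demonstration.
}
```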
Force-pushed af47ac2 to 2b322d7
@mxinden is this good to go? Nothing more pending?
Nothing else pending from my end, @larseggert. Please merge if you have no further comments. I will propose a new release once merged. Excited to get this into Firefox.
Thank you for the help here! For early CPU profiles of this patch in Firefox, see #2198 (comment).
This is the final stacked pull request combining the following commits:

feat(common): make Datagram generic over payload type

> Previously the payload type of `Datagram` was `Vec<u8>`, thus the `Datagram` always owned the UDP datagram payload.
>
> This change enables `Datagram` to represent both (a) an owned payload allocation (i.e. `Vec<u8>`) and (b) a view into an existing payload (e.g. a long-lived receive buffer via `&[u8]`).
>
> The default payload type stays `Vec<u8>`, thus not breaking existing usage of `Datagram`.

feat(transport): accept borrowed instead of owned input datagram

> Previously `process_input` (and the like) took a `Datagram<Vec<u8>>` as input. In other words, it required allocating each UDP datagram payload in a dedicated `Vec<u8>` before passing it to `process_input`.
>
> With this patch, `process_input` accepts a `Datagram<&[u8]>`. In other words, it accepts a `Datagram` with a borrowed view into an existing buffer (`&[u8]`), e.g. a long-lived receive buffer.

feat(udp): return borrowed Datagram on receive

> Previously `recv_inner` would return `Datagram<Vec<u8>>`. In other words, it would allocate a new `Vec<u8>` for each UDP datagram payload.
>
> Now `recv_inner` reads into a provided buffer and returns `Datagram<&[u8]>`, i.e. it returns a view into the provided buffer without allocating.

feat(bin): don't allocate in UDP recv path

> Pass a long-lived receive buffer to `neqo_udp::recv_inner`, receiving an iterator of `Datagram<&[u8]>`s pointing into that buffer, thus no longer allocating in the UDP receive path.

Extracted out of #2093.
Part of #1693.

Pull request stack: