Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large NPM package response data causes timeout #337

Open
maxfisher-g opened this issue May 10, 2023 · 5 comments
Open

Large NPM package response data causes timeout #337

maxfisher-g opened this issue May 10, 2023 · 5 comments

Comments

@maxfisher-g
Copy link
Contributor

maxfisher-g commented May 10, 2023

After fixes for #139, remaining packages that still cause timeouts all have very large response data sizes (e.g. 6MB)

@calebbrown
Copy link
Contributor

Go appears to enable compression (gzip) by default on requests.

@calebbrown
Copy link
Contributor

Looking into this further, NPM supports HTTP2. I suspect that there is some weird behavior w.r.t timeouts and HTTP2 multiplexing.

@calebbrown
Copy link
Contributor

Inspecting the network traffic, there is only 1 connection being opened to https://registry.npmjs.org/, so yes, this is multiplexing queries over a single TCP+TLS connection to NPM.

This means that the large repos are congesting the single multiplexed HTTP2 connection. While the aggregate of the responses will be received faster than individual HTTP1 connections, each individual response is slower than if it was not multiplexed.

Some ideas for improving the performance:

  • remove the limited workers, and just fire all the requests concurrently.
  • add a small sleep (e.g. 0.1s) between each request to give the server some space.
  • increase the timeout significantly (e.g. 1m30s) to allow for the slower responses.

@maxfisher-g
Copy link
Contributor Author

Thanks for investigating this @calebbrown!

calebbrown added a commit that referenced this issue May 11, 2023
- 10 seconds should space out requests more to reduce congestion.
- Early requests don't have jitter appled to avoid slowing down small
  batch sizes.

See #337

Signed-off-by: Caleb Brown <[email protected]>
calebbrown added a commit that referenced this issue May 11, 2023
…w. (#340)

* Add a threshold so small batches don't have jitter. Bump jitter window.

- 10 seconds should space out requests more to reduce congestion.
- Early requests don't have jitter appled to avoid slowing down small
  batch sizes.

See #337

Signed-off-by: Caleb Brown <[email protected]>

* Switch to a simpler toggle.

Signed-off-by: Caleb Brown <[email protected]>

---------

Signed-off-by: Caleb Brown <[email protected]>
@calebbrown
Copy link
Contributor

We still have errors here. Another attempt to reduce this will be to add an LRU cache and ETag "If-None-Match" checking on requets to NPM for a given package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants