Ensure bundling subresources doesn't cost extra bandwidth #594

Open
jyasskin opened this issue Sep 1, 2020 · 12 comments
Labels
discuss Needs a verbal or face-to-face discussion

Comments

@jyasskin
Member

jyasskin commented Sep 1, 2020

In #551, @briankanderson and @kuro68k worried that once folks are putting their subresources in bundles, it will be more difficult for browsers or extensions to avoid transferring a subset of those resources in order to save time or money. For example, browsers' "disable images" and "disable javascript" options should be able to block just those kinds of subresources without preventing the enabled kinds of subresources from loading. It sounds like there are also kinds of resources beyond "all scripts" that folks on satellite links want to block, although "ads" is adequately covered by #551.

Some of @yoavweiss' ideas around having the browser express which resources are already cached, and having the server adaptively generate a subset of the bundle, could be helpful here, but

  1. We have to make sure the default is good even for servers that can't dynamically subset bundles, and
  2. I think we need more information than just the list of resource URLs. Maybe we should also provide the preload "as" value?
@yoavweiss
Collaborator

I think there are 2 separate cases here. There's "bundles as a way to optimize subresource delivery" and "bundles as a way to deliver whole sites".
I think this problem may be more related to the former.

When using bundles as an optimization, we could encourage (and maybe even enforce) that subresources would be for a certain destination (e.g. using an "as" attribute), so that if they need to be blocked based on their type, the browser can easily do that. Similarly, the browser would be able to block individual resources based on their URL (using the "resources" attribute).
While the latter bit would only work for dynamic servers, the former would work for static ones as well.
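To make the type-based case concrete, here is a minimal sketch of how a browser might filter declared subresources by destination before loading them from a bundle. It assumes each bundled resource carries a per-resource "as" value; the `BundledResource` class and `resources_to_load` function are hypothetical names for illustration, not part of any spec or browser API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundledResource:
    url: str
    destination: str  # hypothetical per-resource "as" value, e.g. "image" or "script"


def resources_to_load(declared, blocked_destinations):
    """Keep only declared resources whose destination the user has not disabled."""
    return [r for r in declared if r.destination not in blocked_destinations]


declared = [
    BundledResource("https://example.com/app.js", "script"),
    BundledResource("https://example.com/hero.jpg", "image"),
    BundledResource("https://example.com/site.css", "style"),
]

# A user with "disable images" enabled still gets scripts and styles.
allowed = resources_to_load(declared, blocked_destinations={"image"})
print([r.url for r in allowed])
```

Filtering like this only saves bandwidth if the skipped resources are not transferred anyway, which for a static server presumably means one bundle per destination.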

All this falls apart when we want to deliver full applications in web bundles, as we'd now need both an "as" attribute per resource and a dynamic server.

So I guess my question is: is it enough to cover the "subresource optimization" case? Or do we need to cover the "app delivery" case as well? And if we need to cover both, are we willing to take an RTT hit for the latter?

@kuro68k

kuro68k commented Sep 1, 2020

Even one bundle per resource type wouldn't really help, because the site operator will likely bundle all the images that get blocked anyway. It's not just ads: some browsers/add-ons block images over a certain size (say 50k bytes).

The only way this can be mitigated is if the browser is presented with a complete manifest of all resources with enough detail to decide which ones it wants, and then the bundle is dynamically generated based on its decision.
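A rough sketch of that flow, under stated assumptions: the manifest is a list of entries with `url`, `type`, and `size` fields (a made-up shape, not a defined format), the client filters it against the user's settings, and the server builds a bundle containing only what was requested. `choose_urls` and `build_bundle` are hypothetical helpers.

```python
def choose_urls(manifest, max_bytes, blocked_types):
    """Client side: pick manifest entries that pass the user's filters."""
    return [
        entry["url"]
        for entry in manifest
        if entry["type"] not in blocked_types and entry["size"] <= max_bytes
    ]


def build_bundle(all_resources, wanted_urls):
    """Server side: subset the full resource map to what the client asked for."""
    return {url: all_resources[url] for url in wanted_urls if url in all_resources}


# Assumed manifest shape; sizes are in bytes.
manifest = [
    {"url": "/index.html", "type": "document", "size": 4_000},
    {"url": "/photo.jpg", "type": "image", "size": 120_000},
    {"url": "/icon.png", "type": "image", "size": 2_000},
    {"url": "/player.js", "type": "script", "size": 30_000},
]
all_resources = {e["url"]: b"x" * 10 for e in manifest}  # placeholder bodies

wanted = choose_urls(manifest, max_bytes=50_000, blocked_types={"script"})
bundle = build_bundle(all_resources, wanted)
print(sorted(bundle))
```

Fetching the manifest before requesting the bundle is where the extra round trip mentioned earlier in the thread would come from.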

@KenjiBaheux
Collaborator

Yes, this could be an option. Another option (a short-term one): disable WBN support if any of the "don't load resources of type X" settings is enabled.

@kuro68k do you have examples for the "add-ons blocking images over a certain size"? I'm wondering how it's done in practice.

For the "app delivery" scenario, I'm wondering if this could be based on client-hints. @yoavweiss is that what you had in mind given the "one extra RTT" hint?

@kuro68k

kuro68k commented Sep 1, 2020

@KenjiBaheux uBlock Origin supports blocking all resources over X bytes.

There are other dangers here, e.g. if the browser says "don't bother sending that ad image", the site could refuse to let the user view other content. That is possible today, but it is usually implemented in JavaScript, where it can be blocked, while this makes having the server do it much easier.

@yoavweiss
Collaborator

> @KenjiBaheux uBlock Origin supports blocking all resources over X bytes.

I'm curious how they do that technically. The browser is not aware of the size of the resources it is about to download until they are downloaded, which in many cases (e.g. resources without a Content-Length header) means after exceeding said quota.

Otherwise, it seems rather unsafe to apply that to no-cors fetched cross-origin resources, as that can reveal information about the user's state. See my talk about this for more details.

@kuro68k

kuro68k commented Sep 1, 2020

> > @KenjiBaheux uBlock Origin supports blocking all resources over X bytes.
>
> I'm curious how they do that technically. The browser is not aware of the size of the resources it is about to download until they are downloaded, which in many cases (e.g. resources without a Content-Length header) means after exceeding said quota.
>
> Otherwise, it seems rather unsafe to apply that to no-cors fetched cross-origin resources, as that can reveal information about the user's state. See my talk about this for more details.

According to the documentation, it does contact the server to fetch the size of the object; it then simply doesn't download the object if it exceeds the threshold.
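As an illustration of that kind of check (not uBlock Origin's actual implementation, which isn't shown in this thread), the sketch below decides from the headers of a size probe, such as a HEAD response, whether to fetch the body. Refusing the download when Content-Length is missing is an assumed policy, and `should_download` is a hypothetical name.

```python
def should_download(head_headers, max_bytes):
    """Decide from HEAD response headers whether to fetch the body.

    Returns False when Content-Length is missing or malformed, since the
    size cannot be known up front (an assumed policy for this sketch).
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in head_headers.items()}
    raw = lowered.get("content-length")
    if raw is None or not raw.strip().isdigit():
        return False
    return int(raw) <= max_bytes


print(should_download({"Content-Length": "20480"}, max_bytes=50_000))
print(should_download({"Content-Length": "900000"}, max_bytes=50_000))
print(should_download({"Transfer-Encoding": "chunked"}, max_bytes=50_000))
```

A check like this works per resource; it has no obvious equivalent once many resources arrive inside a single bundle response, which is the concern raised in this issue.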

@jjdelc

jjdelc commented Sep 20, 2020

Serving an all-or-nothing bundle of a site would break many important HTML bandwidth-saving features, for example:

  • How does this play at all with the img tag's loading=lazy attribute?
  • In the case of responsive images, which one should be downloaded? or all of them?
  • What about file format support? The <video> tag can provide sources in multiple formats (mp4, webm, av1) depending on the browser's support.
  • How does this affect browser caching of resources between pages? Every bundle needs to ship all of its deps every time, rendering ETags, Expires headers and all those smarts useless.
  • On top of the client's bandwidth, this seems to also increase the server's bandwidth if resources need to be uploaded on every page that the client requests (e.g. article_a.html, article_b.html)?

@yoavweiss
Collaborator

A bundle doesn't have to include all of the site's resources.

> • How does this play at all with the img tag's loading=lazy attribute?

One can imagine a bundle excluding all out-of-viewport images, or adding them to the end of it, if an offline scenario is likely.

> • In the case of responsive images, which one should be downloaded? or all of them?

One can imagine Client Hints being used to subset non-required media on the server side.

> • What about file format support? The <video> tag can provide sources in multiple formats (mp4, webm, av1) depending on the browser's support.

Content negotiation (e.g. the Accept header) can be used to subset unneeded file formats.
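A minimal sketch of Accept-based subsetting, assuming the server knows the media type of each variant it could put in the bundle. The parser is deliberately simplified: it ignores wildcards such as `image/*` and does not rank by preference. `accepted_types` and `subset_variants` are hypothetical names.

```python
def accepted_types(accept_header):
    """Parse an Accept header into the set of media types with q > 0."""
    accepted = set()
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if q > 0:
            accepted.add(media_type)
    return accepted


def subset_variants(variants, accept_header):
    """Keep only variants whose media type the client explicitly accepts."""
    accepted = accepted_types(accept_header)
    return [v for v in variants if v["type"] in accepted]


variants = [
    {"url": "/hero.avif", "type": "image/avif"},
    {"url": "/hero.webp", "type": "image/webp"},
    {"url": "/hero.jpg", "type": "image/jpeg"},
]
kept = subset_variants(variants, "image/webp,image/jpeg;q=0.8,image/avif;q=0")
print([v["url"] for v in kept])
```

Like the other subsetting ideas in this thread, this requires a server that can assemble the bundle per request.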

> • How does this affect browser caching of resources between pages? Every bundle needs to ship all of its deps every time, rendering ETags, Expires headers and all those smarts useless.

See https://docs.google.com/document/d/11t4Ix2bvF1_ZCV9HKfafGfWu82zbOD7aUhZ_FyDAgmA/edit# for a proposal on that front.

> • On top of the client's bandwidth, this seems to also increase the server's bandwidth if resources need to be uploaded on every page that the client requests (e.g. article_a.html, article_b.html)?

I'm not sure I get your point here.

@kuro68k

kuro68k commented Sep 21, 2020

Does it seem at all likely that sites will generate a custom bundle for every browser request based on what specific resources that browser would like to receive? And if so will it end up being any better than not using a bundle?

As it stands it looks likely that any kind of content filtering system (ad blocker, privacy settings in the browser, disabling auto-play videos, bandwidth saving proxies or disabling large object downloading in browser settings etc.) will require completely disabling the use of bundles.

Furthermore they break one of the most fundamental concepts of the World Wide Web: that the client is responsible for selecting resources and displaying them as the user desires. It gives control of the user's software, of their network connection, to the server operator which requires a level of trust that many users are unwilling to give, even before considering the cost and bandwidth ramifications.

@yoavweiss
Collaborator

You seem to be assuming that the main and only use of bundles is sending the entire contents of the site with them. I don't think that's an accurate assumption.

@kuro68k

kuro68k commented Sep 21, 2020

No such assumption, but in any case the onus is on those promoting this idea to demonstrate that it won't cause these kinds of issues and so far nobody has proposed a solution that allows for the current level of fine-grained control that browsers have.

@jjdelc

jjdelc commented Sep 22, 2020

> I'm not sure I get your point here.

In the server's case, if it needs to serve a bundle for, say, every article, it would have to send the same CSS/JS/image files every time, even when the browser already has them cached, increasing the server's bandwidth consumption.

I read the linked GDoc. It looks like a lot of dynamic machinery is needed to generate bundles on the fly according to each of the cached states the client could be in. It starts to look a lot like a more complex version of HTTP2 with pipelining.
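To give a sense of the per-request work involved, here is a minimal sketch of cache-aware subsetting. It assumes the client somehow reports the ETags it already holds; how that would be signalled is not specified here and is not taken from the linked document. `resources_to_send` is a hypothetical helper.

```python
def resources_to_send(server_etags, client_etags):
    """Return URLs whose server ETag differs from what the client reports cached."""
    return [
        url
        for url, etag in server_etags.items()
        if client_etags.get(url) != etag
    ]


server_etags = {"/site.css": '"v3"', "/app.js": '"v7"', "/article_b.html": '"b1"'}
# After visiting article_a.html the client already holds the shared CSS/JS,
# but its copy of app.js is stale.
client_etags = {"/site.css": '"v3"', "/app.js": '"v6"'}
print(resources_to_send(server_etags, client_etags))
```

Even in this toy form, the bundle contents depend on each client's cache state, so the response can no longer be a single static file.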

With the bundles being dynamic, it sounds like the idea of being able to have a neat bundle weakens. Would this be an HTTP server plugin?

@jyasskin added the "discuss" label (Needs a verbal or face-to-face discussion) on Oct 23, 2020
5 participants