Update content-addressable-bundles.md #639

Closed
wants to merge 6 commits into from
Changes from 3 commits
107 changes: 83 additions & 24 deletions explainers/content-addressable-bundles.md
@@ -12,7 +12,8 @@ other bundles.

## Participate

- [WICG/webpackage](https://github.com/WICG/webpackage/issues)
- [WICG/webpackage](https://github.com/WICG/webpackage/)
([#638](https://github.com/WICG/webpackage/issues/638))

<!-- TOC -->

@@ -77,40 +78,62 @@ This explainer proposes a new approach on the top of [Subresource loading with
Web Bundles], aiming for more flexibility in how resources are grouped together as
bundles. This allows web developers to split code into various bundles which can
then be loaded on demand or in parallel, and also provides a new capability to
express a dependency between bundles, which can be used for a browser to improve
a loading performance.
express a dependency between bundles.

## Goals

- This proposal aims to support [Code Splitting], as [webpack] or other bundlers
already support as a user-land solution. Smaller bundles, if used correctly,
can have a major impact on load time.
- This proposal aims to support [Code Splitting] use cases, as bundlers like
[Webpack](https://webpack.js.org/guides/code-splitting/) or [Browserify]
already support as a user-land solution.

As an application grows in complexity or is maintained, CSS and JavaScript
files or bundles grow in byte size, especially as the number and size of
included third-party libraries increase.

Smaller bundles, if used correctly, can have a major impact on load time.
Features required at page load can be downloaded immediately with additional
bundles being lazy loaded after the page or application is interactive, thus
improving performance.

- This proposal aims to add a new capability to declare dependencies between
bundles inside of a _bundle itself_. This self-descriptiveness makes
on-demand lazy loading possible without any additional configuration
outside of the bundle.

A good analogy would be dynamic shared libraries, like `*.so` files. They have
a dependencies section to declare dependencies on other libraries.

- Non-opinionated about bundle granularity. There are trade-offs in how a site
composes its resources into bundles in order to balance various factors like
total bytes transferred, loading latency, or cache granularity. Instead of an
_all-or-nothing_ bundle, this proposal aims to provide a way to express a
dependency graph of bundles. The use of bundlers is an established practice in
Web development. Bundlers, such as [webpack], [skypack], would know much about
which resources should be grouped as a bundle, and might want to express their
intent as a dependency graph of bundles, considering various trade-offs. They
wouldn't want to lose this information in building an all-or-nothing bundle,
and a browser wants to know it to improve a loading performance.
Web development. Bundlers would know much about which resources should be
grouped as a bundle, and might want to express their intent as a dependency
graph of bundles, considering various trade-offs.

- Bundles can be served from a static content server. The proposal doesn't
require any smart server, such as dynamically assembling resources into a
bundle. Bundles should be statically generated by a bundler and can be
copied to a static content server. This proposal assumes that this is an
copied to a static content server. This proposal assumes that this is
important for wide adoption.

- The proposal aims to give a browser an opportunity to improve its cache
efficiency by introducing _immutability_ to a bundle. If a bundle's URL
doesn't change, we assume the bundle's contents are _exactly_ the same. This is
not left to convention. The proposal aims to _enforce_ immutability by
introducing a Content-Addressable Hash, which is conceptually similar to a
[Git] commit ID you might be familiar with. Content-Addressability gives web
developers reproducible builds as well as giving a browser an opportunity to
improve its cache efficiency.
- This proposal aims to achieve Content-Addressability for a bundle. If a
bundle's URL doesn't change, we can assume the bundle's contents are _exactly_
the same. This is not left to convention. The proposal aims to _enforce_
immutability by introducing a Content-Addressable Hash, which is conceptually
similar to a [Git] commit ID you might be familiar with.

Content-Addressability gives web developers fearless reproducible builds, and
prevents some kinds of attacks, such as unexpected manipulation of fetched
resources. Brave
[raised the concern](https://brave.com/webbundles-harmful-to-content-blocking-security-tools-and-the-open-web/)
that bundling systems could be used in a way where URLs are "rotated" between
different requests, making URLs less meaningful/stable. Content-Addressability
prevents this kind of undesired behavior on the server side.

Content-Addressability also encourages fearless sharing of bundles, instead of
an embed-everything-for-safety strategy.
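
As a rough illustration of the Content-Addressability goal above, the sketch below derives a bundle URL from the bundle's bytes and checks a fetched body against the hash embedded in that URL. The hash function (SHA-256), the 40-character truncation, the `name.<hash>.wbn` URL layout, and the helper names are all assumptions made for this example; the explainer does not define them.

```python
import hashlib


def content_address(body: bytes, digest_chars: int = 40) -> str:
    """Return a hex digest prefix of the bundle body (SHA-256 is an assumption)."""
    return hashlib.sha256(body).hexdigest()[:digest_chars]


def bundle_url(base: str, name: str, body: bytes) -> str:
    """Build a hypothetical URL of the form <base>/<name>.<hash>.wbn."""
    return f"{base}/{name}.{content_address(body)}.wbn"


def verify_bundle(url: str, body: bytes) -> bool:
    """Check that the body matches the hash component embedded in the URL."""
    # The hash is the second-to-last dot-separated component of the URL.
    expected = url.rsplit(".", 2)[-2]
    return content_address(body, len(expected)) == expected


if __name__ == "__main__":
    body = b"console.log(1);"
    url = bundle_url("https://example.com/app", "index", body)
    print(url)
    print(verify_bundle(url, body))                # same bytes: accepted
    print(verify_bundle(url, b"console.log(2);"))  # changed bytes: rejected
```

Under this scheme a server cannot serve different contents under the same URL without the mismatch being detectable, which is the property the "rotated URLs" concern is about.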

## Non-Goals

@@ -155,6 +178,8 @@ number. The bundle should have the following fields:
section writes down a hash as just 4 characters, however, a hash would be
much longer in real cases, like 40 characters used in a Git commit ID.

TODO(hayato): Consider using [Subresource Integrity] if that is applicable.

2. bundle's main-resource URL: `https://example.com/app/index.js`

Note: The current WebBundle format also defines a main-resource URL. Now, a
@@ -425,9 +450,13 @@ formal procedure, however, there are several possible approaches:
space-consuming.

For example, given that a browser has the following dependency graph of bundles,
where the bodies of `bundle-C`, `bundle-D`, `bundle-E` are missing, the
browser can start to fetch them again, possibly in parallel, without waiting
to fetch `bundle-C`.
where the bodies of `bundle-C`, `bundle-D`, `bundle-E` are omitted to save
cache disk space, the browser can start to fetch them again, possibly in
parallel. Note that a browser can start to fetch `bundle-D` and `bundle-E`
immediately without waiting to fetch `bundle-C` (and scan its index
Review comment (Collaborator):

My understanding is:

  • Web developers should write multiple webbundle link elements in the HTML to fetch the webbundles in parallel if the loading performance is important.
  • This nested webbundle format should be used when lazy-loading is important, and the loading performance is not important.

Is my understanding correct?

If so, I don't understand the motivation for browser developers to introduce a new cache mechanism to keep the dependency information. Could you please explain this?

Reply (Collaborator Author):

A good question!

It's fine for a browser to prefetch nested bundles even if a user doesn't request them explicitly.

  • If a user requests prefetch via <link> elements explicitly, the browser should prefetch them.
  • If a user doesn't request it via <link> elements, a browser doesn't have to prefetch them, but may do so, enabling the content to load instantly if and when the user requests it.

section) because Content-Addressability guarantees that `bundle-C`'s content
and its dependencies have not been changed. `bundle-C` always depends on
exactly the same `bundle-D` and `bundle-E` addressed by their URLs.

```
bundle-A
@@ -437,6 +466,14 @@ formal procedure, however, there are several possible approaches:
└── bundle-E (body is missing)
```

- It might be nice to share cache storage for CABs among origins. However,
modern browsers split HTTP Cache storage per origin for various reasons,
especially security. Thus, sharing cache storage among origins is
fundamentally challenging. Security and privacy must be
prioritized.

TODO(hayato): Explore this problem space. Feedback is welcome.

- Individual inline resources in a bundle must not interfere with a browser's HTTP
Cache of its resolved URL.
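
The missing-body example above can be sketched as a walk over a cached dependency graph. This is only an illustration under stated assumptions: the `CachedBundle` type, the `has_body` flag, and the exact graph shape (here `bundle-A` depends on `bundle-C`, which depends on `bundle-D` and `bundle-E`) are hypothetical and not defined by the explainer. The point it shows is that, because dependencies are immutable, every missing body can be identified from the cached graph alone, so all of them can be scheduled at once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachedBundle:
    """Cached metadata for one bundle; the body itself may have been evicted."""
    url: str
    deps: List["CachedBundle"] = field(default_factory=list)
    has_body: bool = True


def missing_bodies(root: CachedBundle) -> List[str]:
    """Return URLs of all reachable bundles whose bodies are not cached.

    The walk uses only cached dependency metadata, so nothing has to be
    fetched or parsed before the full list is known.
    """
    pending: List[str] = []
    seen = set()
    stack = [root]
    while stack:
        bundle = stack.pop()
        if bundle.url in seen:
            continue  # shared dependencies are visited once
        seen.add(bundle.url)
        if not bundle.has_body:
            pending.append(bundle.url)
        # Reverse so dependencies are visited in their declared order.
        stack.extend(reversed(bundle.deps))
    return pending


if __name__ == "__main__":
    d = CachedBundle("bundle-D", has_body=False)
    e = CachedBundle("bundle-E", has_body=False)
    c = CachedBundle("bundle-C", deps=[d, e], has_body=False)
    a = CachedBundle("bundle-A", deps=[c])
    # All three URLs are known up front and could be fetched in parallel.
    print(missing_bodies(a))
```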

@@ -449,6 +486,16 @@ formal procedure, however, there are several possible approaches:

TODO(hayato): [Description of the end-user scenario]

### Application Developers using bundlers

TODO(hayato): Describe key scenarios.

### CDNs (Content Delivery Networks)

TODO(hayato): Describe key scenarios. How would this feature make CDNs happy, and
which features are important to reduce their disk space and/or the total
bandwidth? Feedback is welcome.

## Considered alternatives

### [Resource Bundles]
@@ -613,12 +660,18 @@ Not yet.
- [Web Bundles]
- [Subresource Loading with Web Bundles]
- [Resource Bundles]
- [Resource Batch Preloading]
- [Bundling for the Web]
- [Subresource Integrity]
- [Webpack]
- [Browserify]
- [Import maps]
- [Dynamic bundle serving with Web Bundles]
- [NixOS]
- [Cargo]

[webpack]: https://webpack.js.org/
[browserify]: http://browserify.org/
[skypack]: https://www.skypack.dev/
[git]: https://git-scm.com/
[get started with web bundles]: https://web.dev/web-bundles/
@@ -637,4 +690,10 @@ Not yet.
[deps.ts]:
https://deno.land/manual/linking_to_external_code#it-seems-unwieldy-to-import-urls-everywhere
[first contentful paint]: https://web.dev/first-contentful-paint/
[code splitting]: https://webpack.js.org/guides/code-splitting/
[code splitting]:
https://developer.mozilla.org/en-US/docs/Glossary/Code_splitting
[import-maps]: https://github.com/WICG/import-maps
[resource batch preloading]:
https://gist.github.com/littledan/e01801001c277b0be03b1ca54788505e
[subresource integrity]:
https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity