Discussion of Scale-out Architecture for High-volume Repositories TAP (TAP 21) #190

Open
ergonlogic opened this issue Nov 25, 2024 · 19 comments

Comments

@ergonlogic

TAP-21 originated with theupdateframework/specification/issues/309. Following discussion at the TUF community meeting on 2024-11-01, we've drafted TAP-21 and submitted a PR (#189).

In addition to broader discussions of TAP-21, we are specifically seeking feedback to:

  • Validate our motivation and rationale
  • Validate the calculations we undertook (See: TAP-21 metadata overhead calculator)
  • Better understand how to approach the following sections (which are sparse, atm):
    • Specification
    • Backwards Compatibility
    • Augmented Reference Implementation

We have thoughts on the sparse sections, but have not yet been able to articulate them clearly enough to include in the TAP.

@jku
Member

jku commented Nov 26, 2024

You mention TOFU, but it's a little unclear how the client reacts when the top-level repository changes the "initial sub-repo metadata".

Assume the client has cached earlier sub-repo metadata and the new sub-repo metadata is not compatible with what is in the client cache (as compatibility can't be guaranteed). How does the client react?

@ergonlogic
Author

ergonlogic commented Nov 26, 2024

You mention TOFU, but it's a little unclear how the client reacts when the top-level repository changes the "initial sub-repo metadata".
Assume the client has cached earlier sub-repo metadata and the new sub-repo metadata is not compatible with what is in the client cache (as compatibility can't be guaranteed). How does the client react?

By "initial sub-repo metadata", we mean only 1.root.json for each sub-repo. For each relevant subrepo, the client must then follow the usual TUF procedure of trying to download and validate 2.root.json for the sub-repo, etc.

Under normal circumstances, key rotation for a sub-repo would involve deploying a new n.root.json to the sub-repo. If any non-root keys were rotated, we'd also need to re-sign the corresponding metadata, etc. No changes to the top-level metadata would occur in this process.

So, the scenario you describe should not occur under normal operations. If it were to occur, presumably the top-level repo has to be considered canonical, since it is the root of trust (ie. the top-level repo's 1.root.json ships with the client).

The client behaviour should presumably be to:

  1. Update top-level TUF repo metadata
  2. For each sub-repo, verify that its cached sub-repo initial root metadata remains valid
    1. If valid, proceed as normal (update sub-repo metadata, etc.)
    2. If not valid, remove all of the sub-repo's TUF metadata, including initial root metadata, download the new initial root metadata for the sub-repo, validate it (against top-level TUF metadata), then proceed with normal sub-repo operations.
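
To make this concrete, here is a minimal sketch of that per-sub-repo flow. Every helper name below is hypothetical (this is not python-tuf API); it only stands in for the corresponding step listed above:

```python
import os
import shutil

def update_sub_repo(sub_repo: str, cache_dir: str) -> None:
    """Hypothetical sketch of the client behaviour described above; the
    helpers do not exist in python-tuf and only illustrate the steps."""
    # Step 1 has already happened: the top-level repo was refreshed, and it
    # vouches for the current initial root metadata (1.root.json) of each sub-repo.
    trusted_initial_root = fetch_initial_root_from_top_level(sub_repo)

    sub_repo_cache = os.path.join(cache_dir, sub_repo)
    cached_initial_root = load_cached_initial_root(sub_repo_cache)  # None if absent

    # Step 2: verify that the cached initial root metadata is still the one
    # the top-level repo vouches for.
    if cached_initial_root is not None and cached_initial_root != trusted_initial_root:
        # Step 2.2: not valid -- discard all cached metadata for this sub-repo
        # and start over from the new initial root metadata.
        shutil.rmtree(sub_repo_cache, ignore_errors=True)
        cached_initial_root = None

    if cached_initial_root is None:
        store_initial_root(sub_repo_cache, trusted_initial_root)

    # Step 2.1 / 2.2: proceed with the normal TUF client workflow for the
    # sub-repo (2.root.json, 3.root.json, ..., then timestamp/snapshot/targets).
    run_normal_tuf_update(sub_repo, sub_repo_cache)
```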

@jku
Member

jku commented Nov 27, 2024

For each sub-repo, verify that its cached sub-repo initial root metadata remains valid

This is currently not part of the spec but I believe it would be a good addition: verify-from-bootstrap feature in python-tuf

So, the scenario you describe should not occur under normal operations.

If this scenario is not expected to happen... then I'm thinking I maybe have not understood who the signers/key owners of the sub-repo are. Can you talk more about that?

The reason I'm thinking I don't quite understand the setup is this:

  • if the signers are project maintainers (or project-specific release automation), then signing keys will be lost and some maintainers will turn out to be malicious: In a high-volume repository the top-level will need to intervene somewhat regularly
  • if the signers are repository automation (in other words, the keys are KMS keys controlled by the repository), I can see how content in the top-level repo does not need to change ... but then I don't see why sub-repos exist at all -- a single signing key (delivered as an artifact in the top-level repo) would work just as well.

@ergonlogic
Author

  • if the signers are project maintainers (or project-specific release automation), then signing keys will be lost and some maintainers will turn out to be malicious: In a high-volume repository the top-level will need to intervene somewhat regularly

I believe the client behaviour described above ought to handle this scenario reasonably well. But please correct me if you see a flaw in that process. We're working on incorporating it into the Specification section of the TAP.

  • if the signers are repository automation (in other words, the keys are KMS keys controlled by the repository), I can see how content in the top-level repo does not need to change ... but then I don't see why sub-repos exist at all -- a single signing key (delivered as an artifact in the top-level repo) would work just as well.

First off, we are primarily targeting this scenario ("signers are repository automation"). I'll update the TAP to reflect this.

That said, we considered re-using the same root metadata across all the sub-repos. However, when an online key (targets, snapshot or timestamp) is rotated, all of the metadata signed by that key needs to be re-signed. For a high-volume repository this can take a (relatively) long time. Separate root metadata per sub-repo allows for an incremental roll-out of new online keys.

Note that this still allows implementers to re-use keys across sub-repos. In fact, I think this would be the recommended approach.

@jku
Member

jku commented Nov 28, 2024

I meant that I don't understand what the advantage of TAP-21 in general is for the "signers are repository automation" case.

Compare to a setup where there are no sub-repos and the top-level TUF repo only contains a set of public keys as artifacts (no project indexes or packages are added to TUF): These keys are used to sign the project indexes and/or actual packages outside of tuf by the repository automation, and clients get these keys using TUF and use them to verify indexes/packages. So TUF is only used as a repository signing key rotation mechanism, not as a delegation mechanism. In this setup:

  • TUF repository is very simple to maintain and very slow moving (repository only changes if signing key changes)
  • client bandwidth use and number of requests is minimal
  • upload latency is minimal
  • the security posture seems roughly similar to me when compared to TAP-21
    • neither provides global snapshot
    • repository online key compromise is still "total" (all packages can be signed by attacker) but recoverable
    • the one clear advantage TAP-21 seems to have is the built-in timeliness check from the timestamp role: this can be mitigated by making the signatures expire, e.g. via signing certs

It's entirely possible I've missed something: Apart from the theoretical ability to switch to "project signing" at a later date, do you believe TAP-21 has real security advantages over the simple setup described above?

@ergonlogic
Author

Compare to a setup where there are no sub-repos and the top-level TUF repo only contains a set of public keys as artifacts (no project indexes or packages are added to TUF): These keys are used to sign the project indexes and/or actual packages outside of tuf by the repository automation, and clients get these keys using TUF and use them to verify indexes/packages. So TUF is only used as a repository signing key rotation mechanism, not as a delegation mechanism.

How do clients get the index & package signatures in this scenario? That's the payload of the sub-repos.

@jku
Member

jku commented Nov 29, 2024

Compare to a setup where there are no sub-repos and the top-level TUF repo only contains a set of public keys as artifacts (no project indexes or packages are added to TUF): These keys are used to sign the project indexes and/or actual packages outside of tuf by the repository automation, and clients get these keys using TUF and use them to verify indexes/packages. So TUF is only used as a repository signing key rotation mechanism, not as a delegation mechanism.

How do clients get the index & package signatures in this scenario? That's the payload of the sub-repos.

I suppose there are many ways to handle that, but for discussion let's say well-known URLs based on the artifact URL: if the client wants to download "$PACKAGE_URL", the signature is at "${PACKAGE_URL}.signature"
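
For concreteness, a rough sketch of the client side of that convention, assuming Ed25519 repository signing keys whose public halves were obtained (and kept fresh) via the top-level TUF repo; the function below is illustrative only:

```python
import requests
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def download_and_verify(package_url: str, repo_key: Ed25519PublicKey) -> bytes:
    """Fetch a package and its detached signature from the well-known URL,
    then verify the signature with the repository key delivered via TUF."""
    package = requests.get(package_url, timeout=30).content
    signature = requests.get(f"{package_url}.signature", timeout=30).content
    try:
        repo_key.verify(signature, package)  # raises InvalidSignature on mismatch
    except InvalidSignature as exc:
        raise RuntimeError(f"signature verification failed for {package_url}") from exc
    return package
```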

@ergonlogic
Author

the one clear advantage TAP-21 seems to have is the built-in timeliness check from the timestamp role [...]

This is an advantage of TUF, not TAP-21 specifically. TUF also affords protection against a variety of attacks that the "simple setup" appears not to provide. For example, TUF mitigates against endless data attacks by including the size of downloaded files within TUF metadata.

[lack of timeliness checks] can be mitigated by making the signatures expire, e.g. via signing certs

I don't doubt that there are alternative mechanisms to provide the same protections as TUF. But each would presumably complicate the alternative solution. This, in turn, undermines the argument for its simplicity. Why re-invent the wheel?

do you believe TAP-21 has real security advantages over the simple setup described above?

TAP-21 preserves all but a handful of the protections afforded by TUF. So, yes.

@trishankatdatadog
Member

Left some comments in your TAP PR, @ergonlogic. I think your real problem there is different than the one you have proposed a solution for. As I suggested there, do you think getting on a call to discuss your problem in its entire context would help serve you better?

@jku
Member

jku commented Dec 2, 2024

For example, TUF mitigates against endless data attacks by including the size of downloaded files within TUF metadata.

Sure, but this is pretty much a form of DoS, which the attacker could do anyway if they control the repository or the mirror. I don't think this makes a meaningful difference between the two models.

I don't doubt that there are alternative mechanisms to provide the same protections as TUF. But each would presumably complicate the alternative solution. This, in turn, undermines the argument for its simplicity. Why re-invent the wheel?

I think you may be underestimating the complexity of running the sub-repositories. The two models being compared have, in my opinion, vastly different levels of simplicity -- and this overall difference would not be changed by some small changes in either one

@jku
Member

jku commented Dec 2, 2024

TAP-21 preserves all but a handful of the protections afforded by TUF.

The TUF "security model" refers to an idealized repository that we've found out does not usually exist. I'd really like to see specific relevant attacks that are being protected from: DOS protection is nice but not that persuasive...

(I'm not trying to be difficult by the way: There may well be attacks that this protects from, I just haven't been able to find any that a simpler solution wouldn't handle)

@ergonlogic
Author

Left some comments in your TAP PR, @ergonlogic. I think your real problem there is different than the one you have proposed a solution for.

Thank you, I've responded in the PR.

In the #tuf Slack channel, @mnm678 reviewed our metadata download calculations, and surmised that previous calculations (for PEP-458) underestimated the size of snapshot metadata.

The problems we describe in the TAP are not purely theoretical. We are observing them IRL. That said, maybe we've misinterpreted the root cause. Could you elaborate?

As I suggested there, do you think getting on a call to discuss your problem in its entire context would help serve you better?

I'm happy to discuss further on a call. FWIW, we did discuss this on the last community call. Is there another one coming up this week? If not, I'm happy to chat in another venue.

@ergonlogic
Author

I think you may be underestimating the complexity of running the sub-repositories.

Perhaps. However, based on my experience implementing Rugged, I don't think it would add too much complexity. Each sub-repo is just a simple TUF repo, after all.

The two models being compared have, in my opinion, vastly different levels of simplicity --

Agreed. TAP-21 is implementing TUF for sub-repos, whereas the alternative is essentially just publishing some signatures.

and this overall difference would not be changed by some small changes in either one

I disagree. I think you're comparing apples and oranges. For an alternative to actually provide security comparable to TUF, it's reasonable to think that it would become significantly more complex than just publishing signatures.

@ergonlogic
Author

I'd really like to see specific relevant attacks that are being protected from [...] that a simpler solution wouldn't handle

Within a sub-repo, all of the protections afforded by TUF are present.

As far as I can tell, the "simple" alternative does not appear to protect against rollback or indefinite freeze attacks. For example, an attacker could present an older version of an index, along with its previously published signature. Since the signature was generated by a valid key and matched the older index, a client would presumably validate it.
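
For comparison, a simplified version of the checks a TUF client applies to timestamp/snapshot metadata when updating a sub-repo (conceptual only, not python-tuf code); the detached-signature scheme has no signed version number or expiry date to check against:

```python
from datetime import datetime, timezone

def check_freshness(new_version: int, trusted_version: int, expires: datetime) -> None:
    # Rollback protection: never accept metadata older than the version we
    # already trust for this role.
    if new_version < trusted_version:
        raise ValueError(f"rollback: got version {new_version}, trusted {trusted_version}")
    # Freeze protection: the signed expiry bounds how long stale metadata can
    # be replayed to the client.
    if expires <= datetime.now(timezone.utc):
        raise ValueError("metadata has expired; repository may be frozen")
```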

I'm not trying to be difficult by the way

I'm sorry if I am being defensive. I guess I just don't really understand how showing that TUF (not TAP-21, but TUF itself) is more secure than a theoretical rudimentary alternative is relevant.

Registries are free to adopt TUF or not. Is it overkill for sub-repos? Perhaps. But I'd prefer to err on the side of caution. TUF itself, PHP-TUF, Rugged, and other implementations have all passed security audits. It doesn't seem to me that starting from scratch is a worthwhile endeavour.

@trishankatdatadog
Member

I'm happy to discuss further on a call. FWIW, we did discuss this on the last community call. Is there another one coming up this week? If not, I'm happy to chat in another venue.

No need to discuss again on the community call, but let me reach out to you over DM on the CNCF Slack to chat separately. Thanks!

@ergonlogic
Author

At the recent TUF Community meeting, we discussed TAP-21:

  1. The majority of the discussion centered on the motivations behind TAP-21, and @JustinCappos's suggestion that there were likely mechanisms already in TUF, or in other TAPs (eg. TAP-16), that might address those issues.
  2. @jku also raised concerns about the safety of relying on the proposed top-level repo for initial root metadata, since there would be no way to detect replacement of said metadata.

We ran out of time at that meeting, and so scheduled a follow-up call.

In the follow-up call, we discussed:

  1. Further context about the Drupal.org (~50k packages w/ ~10 releases each, ~5 releases/day) and Packagist registries (~500k packages w/ ~10 releases each, up to multiple releases/second), for which TAP-21 is being contemplated. TUF integration in Composer assumes target names are structured like <vendor>/<package>/<version>. It was noted that:
    • It's relatively rare that a new vendor is added
    • It's relatively uncommon that a new package ("project") is created.
    • It's very common to add new releases for existing packages.
  2. Justin then proposed a repository layout that could address concerns around metadata churn:
    • Top level Targets metadata delegates to a Targets file per vendor.
      • delegation in top-level targets: <vendorname>-*
    • Each "vendor" Targets file delegates to a new target file per package.
      • delegation in vendor targets: <vendorname>-<packagename>-*
    • Each package's target file contains the metadata for all releases of that package (and perhaps the un-versioned registry index files, eg. package.json).
    • This achieves a reduction in targets metadata churn similar to what TAP-21 proposes, since the package-level targets files only contain metadata about a single package. However, this could result in 500k+ entries in Snapshot metadata. (A sketch of this layout follows after this list.)
  3. Most of the remaining time was spent discussing TAP-16 (ie. Merkle Trees), and how efficiently it would allow such a huge volume of snapshot metadata to be generated and verified.
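
To make the proposed layout easier to picture, here is a rough sketch of the three levels of targets metadata, written as plain JSON-style structures rather than actual metadata files. The vendor/package names, key IDs, hashes, and lengths are invented; role names and path patterns are illustrative only, and the patterns are written here with the slash-separated target-name structure used by Composer (the notes above wrote them with dashes):

```python
# Top-level targets: one delegation per vendor (changes only when a vendor is added).
top_level_targets = {
    "_type": "targets",
    "targets": {},
    "delegations": {
        "keys": {"<online-keyid>": "..."},  # placeholder key definition
        "roles": [
            {"name": "acme", "keyids": ["<online-keyid>"], "threshold": 1,
             "terminating": True, "paths": ["acme/*"]},
        ],
    },
}

# Vendor targets ("acme"): one delegation per package (changes only when a package is added).
vendor_targets_acme = {
    "_type": "targets",
    "targets": {},
    "delegations": {
        "keys": {"<online-keyid>": "..."},
        "roles": [
            {"name": "acme-widget", "keyids": ["<online-keyid>"], "threshold": 1,
             "terminating": True, "paths": ["acme/widget/*"]},
        ],
    },
}

# Package targets ("acme-widget"): entries for every release of that one
# package, plus (perhaps) the un-versioned index file.
package_targets_acme_widget = {
    "_type": "targets",
    "targets": {
        "acme/widget/1.0.0": {"length": 123456, "hashes": {"sha256": "..."}},
        "acme/widget/1.0.1": {"length": 123789, "hashes": {"sha256": "..."}},
        "acme/widget/package.json": {"length": 2048, "hashes": {"sha256": "..."}},
    },
}
```

Note that Snapshot metadata would still carry one entry per delegated targets file, which is where the 500k+ figure above comes from.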

@ergonlogic
Author

Each package's target file contains the metadata for all releases of that package (and perhaps the un-versioned registry index files, eg. package.json).

This achieves a reduction in targets metadata churn similar to what TAP-21 proposes, since the package-level targets files only contain metadata about a single package. However, this could result in 500k+ entries in Snapshot metadata

Reflecting on this further, this approach would seem to also require a large volume of target delegation rules. Unlike Snapshot metadata, delegations would be slow-moving, as they would only change when new packages are added to the registry. Even so, a single level of intermediate delegations (vendor) could still result in some fairly large downloads, but at a relatively low frequency.

@JustinCappos
Member

Even so, a single level of intermediate delegations (vendor) could still result in some fairly large downloads, but at a relatively low frequency.

Right. Most vendors wouldn't have this. The few that would could use hash bin delegations within their role to avoid this cost.
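
For reference, hash bin delegations split a role's targets across a fixed set of bins chosen by hashing the target path, so a vendor with many packages only forces clients to fetch the one small bin they need. A rough sketch of the bin-selection idea (the bin count, naming, and vendor name are arbitrary examples):

```python
import hashlib

NUM_BINS = 256  # example: 256 bins, addressed by the first byte of the hash

def bin_for_target(target_path: str) -> str:
    """Pick the delegated bin role responsible for a given target path."""
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    prefix = digest[:2]  # first hex byte -> "00".."ff"
    return f"acme-bin-{prefix}"

# The vendor role would then delegate to each bin using
# "path_hash_prefixes": ["00"], ["01"], ... instead of explicit "paths".
print(bin_for_target("acme/widget/1.0.1"))  # e.g. "acme-bin-3f"
```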

@ergonlogic
Author

Reviewing TAP 4, I note that it describes the core use-case of TAP-21:

Use case 1: obtaining different targets from different repositories
It may be desirable to use the same instance of TUF to download and verify different targets hosted on different repositories. For example, a user might want to obtain some Python packages from their maintainers, and others from PyPI. In this way, one can securely access all Python packages, regardless of where they are hosted, and without the need for a different client tool instance (e.g., copy of pip) for each repository.

So, it looks like there's already an accepted mechanism to delegate targets to separate repos. It also provides guidance for where the client should look for sub-repository root metadata:

The map file contains a dictionary that holds two keys, "repositories" and "mapping." The value of the "repositories" key is another dictionary that lists the URLs for a set of repositories. Each key in this dictionary is a repository name, and its value is a list of URLs. The repository name also corresponds to the name of the local directory on the TUF client where metadata files would be cached. Crucially, this is where the root metadata file for a repository is located. [Emphasis added]

I haven't found any guidance (yet) about how to provide this root metadata in a trustworthy fashion. For a handful of well-known or private repos, I suppose this could be done manually. However, for the thousands of sub-repos that TAP-21 contemplates, a manual approach is completely unmanageable.

This leads back to @jku's concern. Based on subsequent offline discussions, I believe that this concern amounts to:

  • TAP-21 (currently) proposes that we treat the initial root metadata for each sub-repo as a target.
  • Since this just treats each root metadata as a blob of text, we have no way of preventing it from being completely replaced in the top-level repo, as opposed to updated with a newer version.

However, since the root metadata format is prescriptively defined in TUF, we don't need to treat it as a simple target. Instead of the top-level containing only targets metadata, we could perhaps add a role (e.g. repositories.json) that includes the version of each repository's root metadata. This ought to allow us to guard against arbitrary changes to sub-repo root metadata.
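
To make the idea concrete, a very rough sketch of what such a role's signed payload might contain; the role name, type, field names, and whether a hash belongs alongside the version are all invented and open to discussion:

```python
# Hypothetical payload for a "repositories.json" role in the top-level repo.
# Pinning each sub-repo's current root version (and perhaps a hash) would let
# clients distinguish a legitimate root rotation from wholesale replacement of
# a sub-repo's root metadata.
repositories_payload = {
    "_type": "repositories",  # invented role type
    "version": 7,
    "expires": "2025-06-01T00:00:00Z",
    "repositories": {
        "example-project-a": {"root_version": 3, "root_sha256": "..."},
        "example-project-b": {"root_version": 1, "root_sha256": "..."},
    },
}
```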
