Worker Identity and the Worker Key #157
Comments
@taskcluster/services-reviewers, let me know if you have questions or comments.
I think the join-until-artifact-expiration could be better accomplished by just setting an "expiration" value per workerPool, and setting that to 1 year for level-3 workers (or whatever the maximum artifact lifetime we want is). That saves a join and simplifies the model a bit. To allow revocation, we could store keys in a separate table and the …
I think this works, as long as the 1-year expiry is counted from when the most recent task run on that worker completed... otherwise there will be some window where the artifact exists but the worker doesn't. If the worker lives less than 1 day, this may not be a big deal. If a worker lives for months (e.g. hardware), we may have issues unless we refresh the key or rotate workerIds regularly. We may still want to pad this.
This made me realize that there's a period where the key is valid to sign new artifacts (the lifespan of the worker), and a period where the key is retrievable to verify signatures, but shouldn't be able to sign any new artifacts (the period between the worker going away and the final artifact expiring). I'm not sure how much we should address this: maybe a …
Could the workers generate the key and pass only the public key to the manager? That would prevent the manager from having access to sensitive key material. Optional: could we leverage cloud features and use KMSs to hold those keys? That would remove the need to store & operate keys on the workers themselves, and would move the security control to the cloud provider instead. (Caveat: KMSs may not support signing operations.)
This is possible, yes. The upside is the private key would never be transported over the wire or known by the manager, plus we don't run the risk of running low on entropy if we generate a large number of keypairs (not sure if this is as large a concern in newer crypto than, say, gpg). There is the potential for reusing keys or having some weaker algorithm on the workers, but we can address this with, say, worker runner, which can guarantee a specific version of the worker is installed. So yes, let's go with the generate-key-on-worker model.
Dustin pointed out that with the generate-key-on-worker model, this is an implementation detail. The cloud worker instance could potentially get the public key from the KMS, and submit that to the worker manager. We'd need to research the KMSs to a) make sure they support signing, and b) find out which signing algorithms they support, because that may influence our decision about what flavor of signing we use in general. I'm under the impression that KMSs are only an option for cloud instances, and we'll still have to support key generation on the worker for hardware workers, so if we go this route, we'll need to support a hybrid approach.
Do we build artifacts on hardware workers? I genuinely don't know.
Yes, we have PGO profiles we generate on hardware, which we download and use to build release builds. I suppose we could determine whether these are low-risk enough to not need worker keys. |
More points from discussion Wednesday:
Hm. This issue covers 1) taskcluster-provisioned cloud instances, and 2) hardware workers. We have a third type of worker we'll need to cover in the firefoxci cluster: scriptworkers. The mac signers are hardware, so they could follow the pattern for (2). All other scriptworkers are currently docker containers running in k8s. If we're able to handle those in the cloud-provisioned solution, great. Otherwise we may need to use the hardware solution for them, or think of a third way.
I suspect that the worker side of this functionality would be implemented in worker-runner, so it would "just work" for anything that uses the "static" provider. Depending on how dynamic that k8s deployment is, that might be easy or hard :) |
(This is related to #156, but probably needs a few more questions answered.)
I can open an RFC once we have an initial consensus.
The goal is to provide an Artifact Integrity guarantee that a given artifact was generated by a worker under our control.
In this model, the worker manager will provide a key for each provisioned worker.
Keypair
We've gone back and forth between PKI and no PKI. In the PKI model, we would have an intermediate cert on the Worker Manager, and sign the worker cert with it. We would trust the root cert and verify signatures through the chain of trust. This brings up questions around key rotation and revocation that we should address if we go this route.
In the non-PKI model, we could generate a small unique keypair, possibly ed25519, per worker instance. As long as the public key is associated with the worker on the Worker Manager, we can verify its signatures. This means we'll need to keep the worker information in Worker Manager as long as we need to verify its artifacts. We also need to decide if we generate the keypair in the Worker Manager and send the private key to the worker, or if we generate the keypair on the worker and send the public key to the Worker Manager.
This is the "Worker Key". We're currently assuming we're going with the non-PKI model.
Cloud provisioned workers
As I understand it, cloud-provisioned workers have an identity document from the cloud provider. Once the worker identity is verified, we can store the public key with the rest of the worker information. If the key generation happened on the Worker Manager, we can pass the private key down to the worker.
Hardware workers
Security here will be colo- and subnet-based. We need some way to add a keypair to the hardware workers, and to get the public key into Worker Manager.
Key rotation / reused workerIds
We can generate a new key for every cloud instance, especially if they're short-lived. If we reuse cloud `workerId`s, we need to be able to either return a set of valid public keys, or perhaps add the datetime the artifact was created to the public key request. We may also want to be able to rotate keys on a hardware worker without changing its `workerId`.
Public Key query endpoint
For the non-PKI solution, the Worker Manager will keep track of each worker's public key(s), and either return the set of valid public keys for a given workerId, or the valid public key for a given datetime.
Preserve important worker history until artifact expiration
For the non-PKI solution, the Worker Manager will need to keep track of the important (read: level 3) workers until their artifacts expire. Likely we'll need to specify which worker pools are "important" in configs, and we'll need a `join` in postgres to find the latest-expiring artifacts uploaded by this `workerId`.
Artifact content signature
The ContentSha256 of an artifact guarantees that the artifact has not been modified between artifact upload and artifact download. By signing this ContentSha256 with the Worker Key, we also show that the artifact was uploaded by a worker under our control.