workers should support explicit, verified inputs with a manifest #89
I'm not sure what this means. I don't think we could limit the inputs a task consumes. Can you make a more detailed proposal?
If we have this, we could then potentially set up the firewall to disallow outbound connections during the task, which would force the task to consume only its declared inputs. That's not a requirement, but this RFC would be a first step towards being able to do it. What details would you like? Essentially, CoT verification of inputs will always be patchy and hacky as long as the task can download inputs in any number of ways: mh configs, env vars, mach commands, etc. By pre-defining the inputs in the task definition in a standardized way, we can verify them in a standardized way. One way would be to standardize on
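To make the idea of a standardized declaration concrete, here is a minimal Python sketch. The `inputs` payload field, its `taskId`/`path`/`url`/`sha256` keys, and the helper names are hypothetical, chosen for illustration rather than taken from any existing worker schema. The point is that a verifier reads one field instead of reverse-engineering mh configs, env vars, and mach commands.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical task definition: the "inputs" layout is an assumption for
# illustration, not an existing taskcluster payload schema.
EXAMPLE_TASK = {
    "payload": {
        "inputs": [
            {"taskId": "BUILD_TASK_ID", "path": "public/build/target.tar.bz2",
             "sha256": "ab" * 32},
            {"url": "https://example.com/tooltool/clang.tar.xz",
             "sha256": "cd" * 32},
        ],
        "command": ["./repackage.sh"],
    }
}


@dataclass(frozen=True)
class DeclaredInput:
    source: str              # upstream taskId or plain URL
    path: Optional[str]      # artifact path, only when source is a taskId
    sha256: Optional[str]    # expected digest, if the task declared one


def declared_inputs(task: dict) -> List[DeclaredInput]:
    """Read every input from the single standardized field."""
    result: List[DeclaredInput] = []
    for entry in task["payload"].get("inputs", []):
        if "taskId" in entry:
            result.append(DeclaredInput(entry["taskId"], entry["path"], entry.get("sha256")))
        elif "url" in entry:
            result.append(DeclaredInput(entry["url"], None, entry.get("sha256")))
        else:
            raise ValueError(f"input entry has neither taskId nor url: {entry!r}")
    return result


def upstream_task_ids(task: dict) -> Set[str]:
    """Task IDs that chain-of-trust verification would need to trace back to."""
    return {i.source for i in declared_inputs(task) if i.path is not None}
```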
What I don't understand is that we can write as much as we want in the task definition, but the task is still arbitrary code and can do what it wants. We can firewall a little, but most of what we talk to is on Heroku, S3, or EC2, so that's a pretty blunt instrument. It certainly couldn't, for example, limit downloads to tooltool artifacts with a particular hash, or to artifacts from a whitelist of taskIds.
Sure, someone can add something rogue. But for the official inputs, e.g. the build for a repackage task or the complete MARs for a partial regeneration task, we can declare them and make sure their shas/sigs have not been modified before download. Otherwise we have to put in breadcrumbs for
I'd rather know that repackage would die if it downloaded an artifact whose sha differs from the one the build uploaded.
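A sketch of that fail-closed behaviour: the worker hashes the input while writing it to disk and aborts if the digest differs from the sha the build uploaded. Where the expected sha comes from (a manifest entry, the build's CoT artifact) is an assumption here, as are the names `copy_verified` and `ChecksumMismatch`.

```python
import hashlib
import io
from typing import BinaryIO


class ChecksumMismatch(Exception):
    """Raised when a downloaded input does not match its expected sha256."""


def copy_verified(src: BinaryIO, dst: BinaryIO, expected_sha256: str,
                  chunk_size: int = 1 << 16) -> str:
    """Stream src into dst while hashing; raise if the digest is not the expected one."""
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha256.lower():
        # Fail closed: the caller should discard whatever was written to dst.
        raise ChecksumMismatch(f"expected {expected_sha256}, got {actual}")
    return actual
```

In a worker, `src` could be the response object from `urllib.request.urlopen()` and `dst` a temporary file that is deleted when `ChecksumMismatch` is raised, so a file that failed verification never reaches the task.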
This could be done through something like the artifact service I've seen mentioned in comments. Scopes could potentially allow modifying that, but signing could then compare the sha from the build task's CoT artifact against the sha from the artifact service, rather than having to download and verify every artifact. For signing to know that it needs to verify a given artifact's sha, repackage would need a standardized way to say "the build's artifact is an input I rely on", rather than us adding checks for all the various ways of specifying inputs that we have today.
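The comparison signing would do could look roughly like this Python sketch. It assumes the build's chain-of-trust artifact lists artifact shas under an `artifacts` key (similar in shape to the CoT JSON scriptworker consumes) and that a hypothetical artifact service returns a path-to-sha mapping; neither the function name nor the service interface is an existing API. Only the paths repackage declared as inputs are checked, so nothing has to be downloaded.

```python
from typing import Dict, Iterable, List


def find_sha_mismatches(
    cot_artifact: dict,
    service_shas: Dict[str, str],
    relied_on_paths: Iterable[str],
) -> List[str]:
    """Compare recorded shas for the declared inputs only.

    cot_artifact: upstream (build) task's chain-of-trust document, assumed to
        map artifact paths to {"sha256": ...} under "artifacts".
    service_shas: sha256 per path as reported by a hypothetical artifact service.
    relied_on_paths: artifacts the downstream task declared as inputs.
    """
    problems: List[str] = []
    recorded = cot_artifact.get("artifacts", {})
    for path in relied_on_paths:
        cot_sha = recorded.get(path, {}).get("sha256")
        svc_sha = service_shas.get(path)
        if cot_sha is None:
            problems.append(f"{path}: not listed in upstream chain-of-trust artifact")
        elif svc_sha is None:
            problems.append(f"{path}: unknown to artifact service")
        elif cot_sha != svc_sha:
            problems.append(f"{path}: CoT sha {cot_sha} != service sha {svc_sha}")
    return problems
```

Artifacts that are not declared inputs (logs, for instance) are never looked at, which is what keeps the check cheap.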
So if I can boil this down, the idea is to have a standard way for a task to describe its "upstream" inputs. This would make CoT verification easier, and would also make task implementation easier, since the worker would download those inputs before the task begins. Generic-worker supports something like this in the [...]. @escapewindow, can you have a look at that functionality? If it suits, then maybe the proposal here is to replicate that functionality in taskcluster-worker and, once everything is using taskcluster-worker (#10), start switching in-tree tasks to that approach. I think @petemoore was interested in "plugging" tasks together this way (which probably explains why generic-worker has this support!)
I think either (a) an artifacts service that stores and returns artifact shas (we'd need to verify that no one is modifying the artifacts and shas at rest), or (b) [...]. If (a), we may want to record our mounts' paths and shas in the chain of trust artifact, so the scriptworker verification step can compare the downloaded sha against the uploaded sha. If (b), we're moving towards the taskcluster-supported workers verifying the chain of trust artifact, which we may want at some point anyway.
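For option (a), a worker-side sketch of recording mounts' paths and shas might look like the following. The layout of the resulting section (mount path mapped to relative file path and sha256) is an assumption, not the current chain-of-trust artifact format; scriptworker would then compare these recorded digests against the shas the upstream task uploaded.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_mounts(mount_paths: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Build a hypothetical "mounts" section for a chain-of-trust artifact.

    File mounts are keyed by their file name; directory mounts are hashed
    file by file, keyed by the path relative to the mount point.
    """
    section: Dict[str, Dict[str, str]] = {}
    for mount in mount_paths:
        root = Path(mount)
        if root.is_file():
            section[mount] = {root.name: sha256_file(root)}
        else:
            section[mount] = {
                p.relative_to(root).as_posix(): sha256_file(p)
                for p in sorted(root.rglob("*"))
                if p.is_file()
            }
    return section
```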
For artifact SHA validation, we can update generic-worker (and other workers) to use the new Artifact API from #7 that jhford has been working on. This should take care of SHA validation of taskcluster artifacts. For URL SHA validation, we could make the sha-256 (or other algorithm) value an optional parameter in the payload, so that the task fails if the checksum is incorrect. Chain of trust could then assert that the checksum(s) are included in the payload. That gives us the flexibility not to force every task definition to state checksums, while still letting us enforce it via chain of trust where we want to.
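A sketch of the optional-but-enforceable checksum idea, in Python. The `inputs` list with `url` and optional `sha256` keys, `PolicyViolation`, and the `enforce` flag are hypothetical. The worker-side mismatch check is a separate step; this only shows the chain-of-trust assertion that checksums were declared at all.

```python
import re
from typing import List

# A sha256 hex digest: exactly 64 lowercase hex characters.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


class PolicyViolation(Exception):
    """Raised when chain of trust requires checksums and some are missing."""


def missing_checksums(payload: dict) -> List[str]:
    """List URL inputs whose optional sha256 is absent or malformed."""
    problems: List[str] = []
    for entry in payload.get("inputs", []):
        if "url" not in entry:
            continue  # taskcluster artifacts are covered by the Artifact API instead
        sha = entry.get("sha256")
        if sha is None:
            problems.append(f"{entry['url']}: no sha256 declared")
        elif not _SHA256_HEX.fullmatch(sha):
            problems.append(f"{entry['url']}: sha256 is not 64 lowercase hex chars")
    return problems


def cot_check_payload(payload: dict, enforce: bool) -> None:
    """Only fail when policy for this kind of task enforces checksums."""
    problems = missing_checksums(payload)
    if enforce and problems:
        raise PolicyViolation("; ".join(problems))
```

Keeping the field optional in the payload and strict only in the verifier matches the flexibility described above: ordinary tasks are unaffected, while tasks feeding signing can be held to the stricter rule.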
We have a number of inputs that go into Gecko tasks:
We define those in various ways: requirements files, tooltool files, docker image task definition locations, env vars, etc. Auditing or verifying the inputs to a task is a very complex job right now.
If we could define explicit inputs to a task,
That's much easier to audit. It could also be the first step towards limiting outbound traffic once the task starts. This reminds me of @petemoore's inputs/outputs proposal for tasks, where tasks can be chained like command-line pipes, although it's not one-dimensional (many-to-many piping).
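The pipe analogy can be illustrated with a small Python sketch: each task declares named outputs and consumes `(producer, output)` pairs, so one task can feed several consumers and one consumer can read from several producers. Task labels and field names are made up for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical declarations: labels and field names are illustrative only.
TASKS: Dict[str, dict] = {
    "build": {"outputs": {"target.tar.bz2"}, "inputs": []},
    "toolchain": {"outputs": {"clang.tar.xz"}, "inputs": []},
    "repackage": {"outputs": {"target.dmg"},
                  "inputs": [("build", "target.tar.bz2"), ("toolchain", "clang.tar.xz")]},
    "partials": {"outputs": {"partial.mar"},
                 "inputs": [("repackage", "target.dmg"), ("build", "target.tar.bz2")]},
}


def dangling_inputs(tasks: Dict[str, dict]) -> List[Tuple[str, str, str]]:
    """(consumer, producer, output) triples where the producer never declares that output."""
    missing = []
    for name, spec in tasks.items():
        for producer, output in spec["inputs"]:
            if output not in tasks.get(producer, {}).get("outputs", set()):
                missing.append((name, producer, output))
    return missing


def run_order(tasks: Dict[str, dict]) -> List[str]:
    """Order tasks so every producer comes before its consumers; raises CycleError on loops."""
    graph: Dict[str, Set[str]] = {
        name: {producer for producer, _ in spec["inputs"]}
        for name, spec in tasks.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Because every edge is explicit, the same declarations could serve both the worker (what to fetch before starting) and chain of trust (what to verify).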