Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Worker Machine Image Building Automation #164

Closed
wants to merge 7 commits into from

Conversation

petemoore
Copy link
Member

@petemoore petemoore commented Sep 8, 2020

I consider this ready for review now.

There is still one open point for me, which is I haven't quite yet worked out if /config/imagesets.yml should be updated by automation or not, as we would then get a flurry of commits by automation every time a new image set is built. Not decided if this is a good or bad thing. An alternative might be to publish the data somewhere, but not commit back to source code. I'm not sure that is a great idea either, so at the moment tending to thinking explicit commit by automation is best.

@petemoore petemoore self-assigned this Sep 8, 2020
@petemoore petemoore marked this pull request as ready for review September 22, 2020 13:23
@sciurus
Copy link

sciurus commented Sep 22, 2020

For config updates, we already have a process for doing this in an automated fashion for firefox-ci-tc. I suspect it makes more sense to leverage this than to build something new for community-tc.

@petemoore
Copy link
Member Author

For config updates, we already have a process for doing this in an automated fashion for firefox-ci-tc. I suspect it makes more sense to leverage this than to build something new for community-tc.

@sciurus That's great - do you know if there are any docs that describe the process? Thanks!

@sciurus
Copy link

sciurus commented Sep 22, 2020

@sciurus That's great - do you know if there are any docs that describe the process? Thanks!

Not great ones. I threw together https://github.com/mozilla-services/cloudops-docs/blob/master/Services/Taskcluster/CI%20Admin.md and am happy to answer questions.

from community-tc-config git commit SHA to imageset names is persisted in
community-tc-config repo directly.
* If there are changes to any files which are used to produce the image set
images, a new image set will be built, with name `<imageset>-<PR>`. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For PRs with multiple revisions, how can we be sure that a given worker running <imageset>-<PR> has the most recent changes from that PR? Is there a piece of metadata that will contain the hash of the most recent commit contained in the image?

Copy link
Contributor

@djmitche djmitche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment, this seems to take a great deal of care to generate CoT keys. Does anything in the community cluster use CoT (aside from this process itself)? I believe the answer is "no". If that's the case, could we just disable CoT on the worker images, and not worry about generating keys?

* Only `github.com/mozilla/community-tc-config:branch:*` has the required scopes to
create tasks for this worker type
* (Optional) we add a policy to taskcluster-auth that enforces that the
following scopes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of enforcement is done via tc-admin checks in community-tc-config, rather than in the auth service.

Note, client `static/taskcluster/queue` shouldn't require such scopes, unless
we are forced to create the `proj-taskcluster.worker-image-builder` tasks from
a decision task. The author is currently unaware of a reason this would be
required.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what the implications of this note are?

* Once built, a staging worker pool is dynamically created that uses the
new image set, and a test task is created from a task template in the imageset
directory to test the new image set (I don't think we can use a hook here since
the commit SHA is a variable input).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooks support variable input

* Access to image building workers is restricted to only a small whitelist of
trusted employees
* The public chain of trust key for the worker image builder is published to
one or more public places
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What validates the CoT signatures?

* For this, a reliable mechanism will be needed that determines which commit
of community-tc-config a given imageset is built from. Perhaps this mapping
from community-tc-config git commit SHA to imageset names is persisted in
community-tc-config repo directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs more resolution. Storing data in the repo automatically requires GitHub credentials with write access, which if disclosed would circumvent the other security measures here.

@petemoore
Copy link
Member Author

As a general comment, this seems to take a great deal of care to generate CoT keys. Does anything in the community cluster use CoT (aside from this process itself)? I believe the answer is "no". If that's the case, could we just disable CoT on the worker images, and not worry about generating keys?

@djmitche So I think Chain of Trust is a generally useful feature in the worker, but I think it has an unfortunate name, since it makes it sound like it is only relevant for the mozilla firefox CI chain of trust validation process. However, it is simply an artifact signing feature, that provides a guarantee that the artifacts that were posted against a given task were signed by a key that was on the worker itself. It would be reasonable for all tasks to always have the feature enabled, as it costs very list and offers protection against taskcluster credential breaches (it isn't possible to obtain the signing keys via a taskcluster credential breach, the signing key would also need to be breached).

Perhaps we should rename the feature to signedArtifacts instead. I certainly feel like it is a useful security feature in any production deployment of taskcluster.

I'll respond to the other points tomorrow - many thanks!

@escapewindow
Copy link
Contributor

We could retitle #158 to something like obsolete chain of trust keys and simplify maintenance and deployment :)

@petemoore
Copy link
Member Author

Closing for now as I'm not actively working on it.

@petemoore petemoore closed this Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants