github repo specific schedulerId #143

Open
petemoore opened this issue Mar 19, 2019 · 2 comments

@petemoore
Member

petemoore commented Mar 19, 2019

This issue has been extracted (and slightly rewritten) from this comment of issue #16.

Tasks created by taskcluster-github have "schedulerId": "taskcluster-github".

Cancelling a taskcluster-github-created task requires the scope queue:cancel-task:taskcluster-github/<taskGroupId>/<taskId>. Since taskGroupId and taskId do not follow a repo-specific naming pattern, queue:cancel-task:taskcluster-github/* is the only scope that allows cancelling any taskcluster-github task for a given repo, and it cannot be restricted to an individual github repo.

Using a unique github schedulerId per repo would lift this limitation. If tasks created for repo github.com/foo/bar were to have (e.g.) "schedulerId": "github-foo-bar", then cancelling a task would require queue:cancel-task:github-foo-bar/<taskGroupId>/<taskId> rather than queue:cancel-task:taskcluster-github/<taskGroupId>/<taskId>. It would then be straightforward to grant queue:cancel-task:github-foo-bar/* to roles/clients that should be able to cancel any task for this repo only. They would not be able to cancel tasks for other github repos, as they can today.
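
To illustrate the effect on scope checks, here is a minimal sketch in Go. It assumes the usual Taskcluster rule that a held scope ending in * satisfies any required scope sharing the prefix before the *. The helper names satisfies and cancelScope, and the group/task IDs, are hypothetical and only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// satisfies reports whether a held scope grants a required scope, assuming
// the rule: exact match, or the held scope ends in "*" and the text before
// the "*" is a prefix of the required scope.
func satisfies(held, required string) bool {
	if held == required {
		return true
	}
	if strings.HasSuffix(held, "*") {
		return strings.HasPrefix(required, strings.TrimSuffix(held, "*"))
	}
	return false
}

// cancelScope builds the scope needed to cancel a single task.
func cancelScope(schedulerId, taskGroupId, taskId string) string {
	return "queue:cancel-task:" + schedulerId + "/" + taskGroupId + "/" + taskId
}

func main() {
	// Hypothetical IDs, for illustration only.
	fooBarTask := cancelScope("github-foo-bar", "group1", "task1")
	otherTask := cancelScope("github-other-repo", "group2", "task2")

	held := "queue:cancel-task:github-foo-bar/*"
	fmt.Println(satisfies(held, fooBarTask)) // true
	fmt.Println(satisfies(held, otherTask))  // false
}
```

With a shared schedulerId of taskcluster-github, both tasks would instead fall under queue:cancel-task:taskcluster-github/*, which is the limitation described above.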

Note, one complication is that schedulerIds are currently limited to ^([a-zA-Z0-9-_]*)$ with a maximum length of 38 chars, so the github org/user and repository name cannot simply be embedded in the schedulerId, since the result will not necessarily match the required pattern or fit within the length limit. We should therefore define the schedulerId as a function of the org/user and repo name that satisfies the following properties:

  1. (Required) It always returns a schedulerId that conforms to the required regexp for schedulerId.
  2. (Required) It returns a schedulerId that is unique per repo.
  3. (Preferred) The github org/user and repo name are reasonably easy to determine from the schedulerId (i.e. the function is reverse-engineerable), or if not, it is a simple and well-defined lexical function that users could implement themselves to predict the schedulerId in any tooling they may wish to create.
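
Property 1 is easy to check mechanically. The sketch below assumes the pattern and 38-character limit quoted above. The name validSchedulerId is hypothetical, and the character class is written with the hyphen last, which is intended to match the same set of characters as the quoted pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxSchedulerIdLen is the length limit quoted in this issue.
const maxSchedulerIdLen = 38

// schedulerIdPattern accepts ASCII letters, digits, '_' and '-'.
var schedulerIdPattern = regexp.MustCompile(`^[a-zA-Z0-9_-]*$`)

// validSchedulerId reports whether id meets both the pattern and the length
// limit (property 1).
func validSchedulerId(id string) bool {
	return len(id) <= maxSchedulerIdLen && schedulerIdPattern.MatchString(id)
}

func main() {
	candidates := []string{
		"taskcluster-github",
		"github-foo-bar",
		"foo/bar",            // '/' is not allowed
		"github.com-foo-bar", // '.' is not allowed
	}
	for _, id := range candidates {
		fmt.Printf("%q valid=%v\n", id, validSchedulerId(id))
	}
}
```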

One example of such a function (written here in Go) could be the schedulerId function below:

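import (
	"crypto/sha256"
	"fmt"
)

// schedulerId maps a github org/user and repo name to a schedulerId of at
// most 38 characters: "gh-" plus up to 35 characters of qualified repo name.
func schedulerId(userOrOrg, repoName string) string {
	qualifiedRepo := stripASCII(userOrOrg) + "-" + stripASCII(repoName)
	if len(qualifiedRepo) <= 35 {
		return "gh-" + qualifiedRepo
	}
	// Too long: keep the first 30 characters and append the first 5 hex
	// digits of the SHA-256 hash of the full qualified name.
	return "gh-" + qualifiedRepo[0:30] + hash(qualifiedRepo)[0:5]
}

// hash returns the hex-encoded SHA-256 digest of orig.
func hash(orig string) (hashed string) {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(orig)))
}

// stripASCII keeps only the ASCII letters and digits of orig, dropping
// everything else (including '-', '_' and '.').
func stripASCII(orig string) (stripped string) {
	for _, char := range orig {
		if (char >= '0' && char <= '9') || (char >= 'a' && char <= 'z') || (char >= 'A' && char <= 'Z') {
			stripped += string(char)
		}
	}
	return
}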
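
For the "reasonably easy to determine" property, a rough sketch of the reverse direction might look like this. parseSchedulerId is a hypothetical helper, not part of any Taskcluster API, and it only handles IDs shorter than 38 characters, since a full-length ID may carry a truncated name plus hash suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSchedulerId tries to recover the stripped org/user and repo name from
// an ID produced by the schedulerId function above.
func parseSchedulerId(id string) (userOrOrg, repoName string, ok bool) {
	// IDs of the full 38 characters may be truncated and hashed, so they are
	// treated as not reversible here.
	if !strings.HasPrefix(id, "gh-") || len(id) >= 38 {
		return "", "", false
	}
	// stripASCII removes every '-' from the names, so the only hyphen left
	// after the "gh-" prefix is the org/repo separator.
	parts := strings.SplitN(strings.TrimPrefix(id, "gh-"), "-", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	fmt.Println(parseSchedulerId("gh-foo-bar"))
	fmt.Println(parseSchedulerId("gh-taskcluster-genericworker"))
}
```

Note that only the stripped names come back: a repo named generic-worker is recovered as genericworker, so the mapping is predictable but not fully reversible.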
@djmitche
Contributor

The character limit is 38 now.

Note that there is another threat to uniqueness: we want to prevent, for example, someone creating an org named taskcluster-generic and a repo named worker and getting the same schedulerId as taskcluster/generic-worker.
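
As a quick, hedged check of this concern against the function sketched in the issue: that function strips hyphens before joining, so the exact pair above should map to different IDs, but names that differ only in stripped characters map to the same one. The functions are copied from the issue; the org and repo names in main are made-up examples, assuming such names can coexist on GitHub.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// schedulerId, hash and stripASCII are copied from the sketch in the issue.
func schedulerId(userOrOrg, repoName string) string {
	qualifiedRepo := stripASCII(userOrOrg) + "-" + stripASCII(repoName)
	if len(qualifiedRepo) <= 35 {
		return "gh-" + qualifiedRepo
	}
	return "gh-" + qualifiedRepo[0:30] + hash(qualifiedRepo)[0:5]
}

func hash(orig string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(orig)))
}

func stripASCII(orig string) (stripped string) {
	for _, char := range orig {
		if (char >= '0' && char <= '9') || (char >= 'a' && char <= 'z') || (char >= 'A' && char <= 'Z') {
			stripped += string(char)
		}
	}
	return
}

func main() {
	// Each entry is two (org, repo) pairs to compare.
	pairs := [][2][2]string{
		{{"taskcluster-generic", "worker"}, {"taskcluster", "generic-worker"}},
		{{"foo-bar", "baz"}, {"foobar", "baz"}},
		{{"foo", "bar.baz"}, {"foo", "bar-baz"}},
	}
	for _, p := range pairs {
		a := schedulerId(p[0][0], p[0][1])
		b := schedulerId(p[1][0], p[1][1])
		fmt.Printf("%s/%s -> %s | %s/%s -> %s | collide=%v\n",
			p[0][0], p[0][1], a, p[1][0], p[1][1], b, a == b)
	}
}
```

Any final function would need an unambiguous encoding of the org/repo boundary and of the stripped characters to rule out both kinds of collision.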

@djmitche
Contributor

Oops, I thought I marked for @owlishDeveloper's review but it's not a PR. Anyway, please take a look!
