- Comments: #189
- Proposed by: @Alphare and @ahal
Add API endpoints to query the definition, status or index paths of multiple tasks in a single call.
When looking at Decision task profiles in Gecko, it was noticed that nearly 70% of the runtime (representing ~3 minutes) was spent waiting on queries to two Taskcluster APIs:
/task/<taskId>/status
/task/<indexPath>
Each individual call is fairly quick, but Gecko Taskgraph's optimization phase can make thousands of these requests. An API that returns all the information Taskgraph needs in a handful of requests would greatly reduce the time the Queue and Index services spend looking things up in the database, as well as the time Gecko Decision tasks spend waiting on the network.
Note: Taskgraph doesn't actually use the `/task/<taskId>` endpoint here, but this endpoint is adjacent to the other two, so for consistency it may make sense to implement a batch API for that as well.
A proof of concept was created in which the requests to Taskcluster were simulated so that all data could be obtained in a single API call. The overall Decision task time was reduced by ~3 minutes.
The following new APIs will be created:
- Endpoint:
/tasks
- HTTP GET:
- Request body consisting of a JSON object:
{ "taskIds": [<taskId>] }
- Response body:
{ "tasks": { <taskId>: <same format as `queue.task(<taskId>)`> }, "continuationToken": <continuation token> }
- Endpoint:
/tasks/status
- HTTP GET:
- Request body consisting of a JSON object:
{ "taskIds": [<taskId>] }
- Response body:
{ "statuses": { <taskId>: <same format as `queue.status(<taskId>)`> }, "continuationToken": <continuation token> }
- Endpoint:
/tasks/indexes
- HTTP GET:
- Request body consisting of a JSON object:
{ "indexes": [<indexPath>] }
- Response body:
{ "tasks": [<same format as `index.findTask(<indexPath>)`>], "continuationToken": <continuation token> }
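As a rough illustration of the shapes listed above, the following Python sketch builds the two request bodies Taskgraph would need and reads a `/tasks/indexes` response. The helper names are hypothetical, and the sketch assumes each entry in `tasks` carries `namespace` and `taskId` fields as `index.findTask` responses do today; the final API may differ.

```python
import json
from typing import Dict, Iterable


def build_status_request(task_ids: Iterable[str]) -> str:
    """Serialize the proposed request body for /tasks/status."""
    return json.dumps({"taskIds": list(task_ids)})


def build_index_request(index_paths: Iterable[str]) -> str:
    """Serialize the proposed request body for /tasks/indexes."""
    return json.dumps({"indexes": list(index_paths)})


def task_ids_by_index_path(response: dict) -> Dict[str, str]:
    """Map each index path to its taskId.

    Assumption: every entry in response["tasks"] has "namespace" and
    "taskId" keys, mirroring the current index.findTask response.
    """
    return {entry["namespace"]: entry["taskId"] for entry in response["tasks"]}
```

Keying the index results by `namespace` lets a caller match results back to the paths it asked for without relying on the order of the returned list.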
Each endpoint will return up to 1000 results. If this number is exceeded, a `continuationToken` will be provided.
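A consumer would then need to keep requesting pages until no token comes back. The sketch below shows one way that loop could look for `/tasks/status`. The RFC does not say how the token is sent back to the server, so the transport is abstracted behind a hypothetical `fetch_page(task_ids, continuation_token)` callable, and the fake implementation and its status payloads are placeholders for demonstration only.

```python
from typing import Callable, Dict, List, Optional

PAGE_LIMIT = 1000  # from the RFC: each endpoint returns up to 1000 results

# Hypothetical transport: (task_ids, continuation_token) -> response dict
FetchPage = Callable[[List[str], Optional[str]], dict]


def fetch_all_statuses(task_ids: List[str], fetch_page: FetchPage) -> Dict[str, dict]:
    """Collect statuses for all task_ids, following continuation tokens."""
    statuses: Dict[str, dict] = {}
    token: Optional[str] = None
    while True:
        response = fetch_page(task_ids, token)
        statuses.update(response["statuses"])
        token = response.get("continuationToken")
        if not token:  # no token means this was the last page
            return statuses


def make_fake_fetch_page(limit: int = PAGE_LIMIT) -> FetchPage:
    """In-memory stand-in for the service; the token is just an offset."""
    def fetch_page(task_ids: List[str], token: Optional[str]) -> dict:
        start = int(token) if token else 0
        page = task_ids[start:start + limit]
        # Placeholder payload, not the real queue.status format.
        response = {"statuses": {t: {"state": "completed"} for t in page}}
        if start + limit < len(task_ids):
            response["continuationToken"] = str(start + limit)
        return response
    return fetch_page
```

With the fake transport, `fetch_all_statuses(ids, make_fake_fetch_page())` makes three calls for 2500 task ids and returns one merged dictionary.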
There are no compatibility or security concerns, since all new APIs essentially wrap existing APIs.
- Do we bother implementing `/tasks` as well even though Taskgraph wouldn't benefit much?
- Should `/tasks/indexes` also allow listing multiple tasks under multiple namespaces? Or should we enforce index paths pointing to specific tasks?
- Should we bother with continuationTokens? Or simply set a limit and force consumers to chunk their own task ids and index paths if they exceed the limit?
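For the last question, the alternative to continuation tokens is client-side chunking. A minimal sketch of what consumers would have to do is shown below; the `chunked` helper is hypothetical, and the default of 1000 simply reuses the result limit proposed in this RFC.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each chunk would be sent as its own request and the responses merged by the caller, which keeps the server stateless at the cost of pushing the limit into every consumer.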
<Once the RFC is decided, these links will provide readers a way to track the implementation through to completion, and to know if they are running a new enough version to take advantage of this change. It's fine to update this section using short PRs or pushing directly to master after the RFC is decided>