Skip to content

Latest commit

 

History

History
 
 

client-py

Taskcluster Client for Python

Download License

A Taskcluster client library for Python.

This library is a complete interface to Taskcluster in Python. It provides both synchronous and asynchronous interfaces for all Taskcluster API methods.

Usage

For a general guide to using Taskcluster clients, see Calling Taskcluster APIs.

Setup

Before calling an API end-point, you'll need to create a client instance. There is a class for each service, e.g., Queue and Auth. Each takes the same options, described below. Note that only rootUrl is required, and it's unusual to configure any other options aside from credentials.

For each service, there are sync and async variants. The classes under taskcluster (e.g., taskcluster.Queue) operate synchronously. The classes under taskcluster.aio (e.g., taskcluster.aio.Queue) are asynchronous.

Authentication Options

Here is a simple set-up of an Index client:

import taskcluster
index = taskcluster.Index({
  'rootUrl': 'https://tc.example.com',
  'credentials': {'clientId': 'id', 'accessToken': 'accessToken'},
})

The rootUrl option is required as it gives the Taskcluster deployment to which API requests should be sent. Credentials are only required if the request is to be authenticated -- many Taskcluster API methods do not require authentication.

In most cases, the root URL and Taskcluster credentials should be provided in standard environment variables. Use taskcluster.optionsFromEnvironment() to read these variables automatically:

auth = taskcluster.Auth(taskcluster.optionsFromEnvironment())

Note that this function does not respect TASKCLUSTER_PROXY_URL. To use the Taskcluster Proxy from within a task:

auth = taskcluster.Auth({'rootUrl': os.environ['TASKCLUSTER_PROXY_URL']})

Authorized Scopes

If you wish to perform requests on behalf of a third-party that has small set of scopes than you do. You can specify which scopes your request should be allowed to use, in the authorizedScopes option.

opts = taskcluster.optionsFromEnvironment()
opts['authorizedScopes'] = ['queue:create-task:highest:my-provisioner/my-worker-type']
queue = taskcluster.Queue(opts)

Other Options

The following additional options are accepted when constructing a client object:

  • signedUrlExpiration - default value for the expiration argument to buildSignedUrl
  • maxRetries - maximum number of times to retry a failed request

Calling API Methods

API methods are available as methods on the corresponding client object. For sync clients, these are sync methods, and for async clients they are async methods; the calling convention is the same in either case.

There are four calling conventions for methods:

client.method(v1, v1, payload)
client.method(payload, k1=v1, k2=v2)
client.method(payload=payload, query=query, params={k1: v1, k2: v2})
client.method(v1, v2, payload=payload, query=query)

Here, v1 and v2 are URL parameters (named k1 and k2), payload is the request payload, and query is a dictionary of query arguments.

For example, in order to call an API method with query-string arguments:

await queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g',
  query={'continuationToken': previousResponse.get('continuationToken')})

Generating URLs

It is often necessary to generate the URL for an API method without actually calling the method. To do so, use buildUrl or, for an API method that requires authentication, buildSignedUrl.

import taskcluster

index = taskcluster.Index(taskcluster.optionsFromEnvironment())
print(index.buildUrl('findTask', 'builds.v1.latest'))
secrets = taskcluster.Secrets(taskcluster.optionsFromEnvironment())
print(secret.buildSignedUrl('get', 'my-secret'))

Note that signed URLs are time-limited; the expiration can be set with the signedUrlExpiration option to the client constructor, or with the expiration keyword arguement to buildSignedUrl, both given in seconds.

Generating Temporary Credentials

If you have non-temporary taskcluster credentials you can generate a set of temporary credentials as follows. Notice that the credentials cannot last more than 31 days, and you can only revoke them by revoking the credentials that was used to issue them (this takes up to one hour).

It is not the responsibility of the caller to apply any clock drift adjustment to the start or expiry time - this is handled by the auth service directly.

import datetime

start = datetime.datetime.now()
expiry = start + datetime.timedelta(0,60)
scopes = ['ScopeA', 'ScopeB']
name = 'foo'

credentials = taskcluster.createTemporaryCredentials(
    # issuing clientId
    clientId,
    # issuing accessToken
    accessToken,
    # Validity of temporary credentials starts here, in timestamp
    start,
    # Expiration of temporary credentials, in timestamp
    expiry,
    # Scopes to grant the temporary credentials
    scopes,
    # credential name (optional)
    name
)

You cannot use temporary credentials to issue new temporary credentials. You must have auth:create-client:<name> to create a named temporary credential, but unnamed temporary credentials can be created regardless of your scopes.

Handling Timestamps

Many taskcluster APIs require ISO 8601 time stamps offset into the future as way of providing expiration, deadlines, etc. These can be easily created using datetime.datetime.isoformat(), however, it can be rather error prone and tedious to offset datetime.datetime objects into the future. Therefore this library comes with two utility functions for this purposes.

dateObject = taskcluster.fromNow("2 days 3 hours 1 minute")
  # -> datetime.datetime(2017, 1, 21, 17, 8, 1, 607929)
dateString = taskcluster.fromNowJSON("2 days 3 hours 1 minute")
  # -> '2017-01-21T17:09:23.240178Z'

By default it will offset the date time into the future, if the offset strings are prefixed minus (-) the date object will be offset into the past. This is useful in some corner cases.

dateObject = taskcluster.fromNow("- 1 year 2 months 3 weeks 5 seconds");
  # -> datetime.datetime(2015, 10, 30, 18, 16, 50, 931161)

The offset string is ignorant of whitespace and case insensitive. It may also optionally be prefixed plus + (if not prefixed minus), any + prefix will be ignored. However, entries in the offset string must be given in order from high to low, ie. 2 years 1 day. Additionally, various shorthands may be employed, as illustrated below.

  years,    year,   yr,   y
  months,   month,  mo
  weeks,    week,         w
  days,     day,          d
  hours,    hour,         h
  minutes,  minute, min
  seconds,  second, sec,  s

The fromNow method may also be given a date to be relative to as a second argument. This is useful if offset the task expiration relative to the the task deadline or doing something similar. This argument can also be passed as the kwarg dateObj

dateObject1 = taskcluster.fromNow("2 days 3 hours");
dateObject2 = taskcluster.fromNow("1 year", dateObject1);
taskcluster.fromNow("1 year", dateObj=dateObject1);
  # -> datetime.datetime(2018, 1, 21, 17, 59, 0, 328934)

Generating SlugIDs

To generate slugIds (Taskcluster's client-generated unique IDs), use taskcluster.slugId(), which will return a unique slugId on each call.

In some cases it is useful to be able to create a mapping from names to slugIds, with the ability to generate the same slugId multiple times. The taskcluster.stableSlugId() function returns a callable that does just this.

gen = taskcluster.stableSlugId()
sometask = gen('sometask')
assert gen('sometask') == sometask  # same input generates same output
assert gen('sometask') != gen('othertask')

gen2 = taskcluster.stableSlugId()
sometask2 = gen('sometask')
assert sometask2 != sometask  # but different slugId generators produce
                              # different output

Scope Analysis

The scopeMatch(assumedScopes, requiredScopeSets) function determines whether one or more of a set of required scopes are satisfied by the assumed scopes, taking *-expansion into account. This is useful for making local decisions on scope satisfaction, but note that assumed_scopes must be the expanded scopes, as this function cannot perform expansion.

It takes a list of a assumed scopes, and a list of required scope sets on disjunctive normal form, and checks if any of the required scope sets are satisfied.

Example:

requiredScopeSets = [
    ["scopeA", "scopeB"],
    ["scopeC:*"]
]
assert scopesMatch(['scopeA', 'scopeB'], requiredScopeSets)
assert scopesMatch(['scopeC:xyz'], requiredScopeSets)
assert not scopesMatch(['scopeA'], requiredScopeSets)
assert not scopesMatch(['scopeC'], requiredScopeSets)

Pagination

Many Taskcluster API methods are paginated. There are two ways to handle pagination easily with the python client. The first is to implement pagination in your code:

import taskcluster
queue = taskcluster.Queue({'rootUrl': 'https://tc.example.com'})
i = 0
tasks = 0
outcome = queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g')
while outcome.get('continuationToken'):
    print('Response %d gave us %d more tasks' % (i, len(outcome['tasks'])))
    if outcome.get('continuationToken'):
        outcome = queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g', query={'continuationToken': outcome.get('continuationToken')})
    i += 1
    tasks += len(outcome.get('tasks', []))
print('Task Group %s has %d tasks' % (outcome['taskGroupId'], tasks))

There's also an experimental feature to support built in automatic pagination in the sync client. This feature allows passing a callback as the 'paginationHandler' keyword-argument. This function will be passed the response body of the API method as its sole positional arugment.

This example of the built in pagination shows how a list of tasks could be built and then counted:

import taskcluster
queue = taskcluster.Queue({'rootUrl': 'https://tc.example.com'})

responses = []

def handle_page(y):
    print("%d tasks fetched" % len(y.get('tasks', [])))
    responses.append(y)

queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g', paginationHandler=handle_page)

tasks = 0
for response in responses:
    tasks += len(response.get('tasks', []))

print("%d requests fetch %d tasks" % (len(responses), tasks))

Pulse Events

This library can generate exchange patterns for Pulse messages based on the Exchanges definitions provded by each service. This is done by instantiating a <service>Events class and calling a method with the name of the vent. Options for the topic exchange methods can be in the form of either a single dictionary argument or keyword arguments. Only one form is allowed.

from taskcluster import client
qEvt = client.QueueEvents({rootUrl: 'https://tc.example.com'})
# The following calls are equivalent
print(qEvt.taskCompleted({'taskId': 'atask'}))
print(qEvt.taskCompleted(taskId='atask'))

Note that the client library does not provide support for interfacing with a Pulse server.

Logging

Logging is set up in taskcluster/__init__.py. If the special DEBUG_TASKCLUSTER_CLIENT environment variable is set, the __init__.py module will set the logging module's level for its logger to logging.DEBUG and if there are no existing handlers, add a logging.StreamHandler() instance. This is meant to assist those who do not wish to bother figuring out how to configure the python logging module but do want debug messages

Uploading and Downloading Objects

The Object service provides an API for reliable uploads and downloads of large objects. This library provides convenience methods to implement the client portion of those APIs, providing well-tested, resilient upload and download functionality. These methods will negotiate the appropriate method with the object service and perform the required steps to transfer the data.

All methods are available in both sync and async versions, with identical APIs except for the async/await keywords.

In either case, you will need to provide a configured Object instance with appropriate credentials for the operation.

NOTE: There is an helper function to upload s3 artifacts, taskcluster.helper.upload_artifact, but it is deprecated as it only supports the s3 artifact type.

Uploads

To upload, use any of the following:

  • await taskcluster.aio.upload.uploadFromBuf(projectId=.., name=.., contentType=.., contentLength=.., uploadId=.., expires=.., maxRetries=.., objectService=.., data=..) - asynchronously upload data from a buffer full of bytes.
  • await taskcluster.aio.upload.uploadFromFile(projectId=.., name=.., contentType=.., contentLength=.., uploadId=.., expires=.., maxRetries=.., objectService=.., file=..) - asynchronously upload data from a standard Python file. Note that this is probably what you want, even in an async context.
  • await taskcluster.aio.upload(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., readerFactory=..) - asynchronously upload data from an async reader factory.
  • taskcluster.upload.uploadFromBuf(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., data=..) - upload data from a buffer full of bytes.
  • taskcluster.upload.uploadFromFile(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., file=..) - upload data from a standard Python file.
  • taskcluster.upload(projectId=.., name=.., contentType=.., contentLength=.., expires=.., uploadId=.., maxRetries=.., objectService=.., readerFactory=..) - upload data from a sync reader factory.

A "reader" is an object with a read(max_size=-1) method which reads and returns a chunk of 1 .. max_size bytes, or returns an empty string at EOF, async for the async functions and sync for the remainder. A "reader factory" is an async callable which returns a fresh reader, ready to read the first byte of the object. When uploads are retried, the reader factory may be called more than once.

The uploadId parameter may be omitted, in which case a new slugId will be generated.

Downloads

To download, use any of the following:

  • await taskcluster.aio.download.downloadToBuf(name=.., maxRetries=.., objectService=..) - asynchronously download an object to an in-memory buffer, returning a tuple (buffer, content-type). If the file is larger than available memory, this will crash.
  • await taskcluster.aio.download.downloadToFile(name=.., maxRetries=.., objectService=.., file=..) - asynchronously download an object to a standard Python file, returning the content type.
  • await taskcluster.aio.download.download(name=.., maxRetries=.., objectService=.., writerFactory=..) - asynchronously download an object to an async writer factory, returning the content type.
  • taskcluster.download.downloadToBuf(name=.., maxRetries=.., objectService=..) - download an object to an in-memory buffer, returning a tuple (buffer, content-type). If the file is larger than available memory, this will crash.
  • taskcluster.download.downloadToFile(name=.., maxRetries=.., objectService=.., file=..) - download an object to a standard Python file, returning the content type.
  • taskcluster.download.download(name=.., maxRetries=.., objectService=.., writerFactory=..) - download an object to a sync writer factory, returning the content type.

A "writer" is an object with a write(data) method which writes the given data, async for the async functions and sync for the remainder. A "writer factory" is a callable (again either async or sync) which returns a fresh writer, ready to write the first byte of the object. When uploads are retried, the writer factory may be called more than once.

Artifact Downloads

Artifacts can be downloaded from the queue service with similar functions to those above. These functions support all of the queue's storage types, raising an error for error artifacts. In each case, if runId is omitted then the most recent run will be used.

  • await taskcluster.aio.download.downloadArtifactToBuf(taskId=.., runId=.., name=.., maxRetries=.., queueService=..) - asynchronously download an object to an in-memory buffer, returning a tuple (buffer, content-type). If the file is larger than available memory, this will crash.
  • await taskcluster.aio.download.downloadArtifactToFile(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., file=..) - asynchronously download an object to a standard Python file, returning the content type.
  • await taskcluster.aio.download.downloadArtifact(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., writerFactory=..) - asynchronously download an object to an async writer factory, returning the content type.
  • taskcluster.download.downloadArtifactToBuf(taskId=.., runId=.., name=.., maxRetries=.., queueService=..) - download an object to an in-memory buffer, returning a tuple (buffer, content-type). If the file is larger than available memory, this will crash.
  • taskcluster.download.downloadArtifactToFile(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., file=..) - download an object to a standard Python file, returning the content type.
  • taskcluster.download.downloadArtifact(taskId=.., runId=.., name=.., maxRetries=.., queueService=.., writerFactory=..) - download an object to a sync writer factory, returning the content type.

Integration Helpers

The Python Taskcluster client has a module taskcluster.helper with utilities which allows you to easily share authentication options across multiple services in your project.

Generally a project using this library will face different use cases and authentication options:

  • No authentication for a new contributor without Taskcluster access,
  • Specific client credentials through environment variables on a developer's computer,
  • Taskcluster Proxy when running inside a task.

Shared authentication

The class taskcluster.helper.TaskclusterConfig is made to be instantiated once in your project, usually in a top level module. That singleton is then accessed by different parts of your projects, whenever a Taskcluster service is needed.

Here is a sample usage:

  1. in project/__init__.py, no call to Taskcluster is made at that point:
from taskcluster.helper import Taskcluster config

tc = TaskclusterConfig('https://community-tc.services.mozilla.com')
  1. in project/boot.py, we authenticate on Taskcuster with provided credentials, or environment variables, or taskcluster proxy (in that order):
from project import tc

tc.auth(client_id='XXX', access_token='YYY')
  1. at that point, you can load any service using the authenticated wrapper from anywhere in your code:
from project import tc

def sync_usage():
    queue = tc.get_service('queue')
    queue.ping()

async def async_usage():
    hooks = tc.get_service('hooks', use_async=True)  # Asynchronous service class
    await hooks.ping()

Supported environment variables are:

  • TASKCLUSTER_ROOT_URL to specify your Taskcluster instance base url. You can either use that variable or instanciate TaskclusterConfig with the base url.
  • TASKCLUSTER_CLIENT_ID & TASKCLUSTER_ACCESS_TOKEN to specify your client credentials instead of providing them to TaskclusterConfig.auth
  • TASKCLUSTER_PROXY_URL to specify the proxy address used to reach Taskcluster in a task. It defaults to http://taskcluster when not specified.

For more details on Taskcluster environment variables, here is the documentation.

Loading secrets across multiple authentications

Another available utility is taskcluster.helper.load_secrets which allows you to retrieve a secret using an authenticated taskcluster.Secrets instance (using TaskclusterConfig.get_service or the synchronous class directly).

This utility loads a secret, but allows you to:

  1. share a secret across multiple projects, by using key prefixes inside the secret,
  2. check that some required keys are present in the secret,
  3. provide some default values,
  4. provide a local secret source instead of using the Taskcluster service (useful for local development or sharing secrets with contributors)

Let's say you have a secret on a Taskcluster instance named project/foo/prod-config, which is needed by a backend and some tasks. Here is its content:

common:
  environment: production
  remote_log: https://log.xx.com/payload

backend:
  bugzilla_token: XXXX

task:
  backend_url: https://backend.foo.mozilla.com

In your backend, you would do:

from taskcluster import Secrets
from taskcluster.helper import load_secrets

prod_config = load_secrets(
  Secrets({...}),
  'project/foo/prod-config',

  # We only need the common & backend parts
  prefixes=['common', 'backend'],

  # We absolutely need a bugzilla token to run
  required=['bugzilla_token'],

  # Let's provide some default value for the environment
  existing={
    'environment': 'dev',
  }
)
  # -> prod_config == {
  #     "environment": "production"
  #     "remote_log": "https://log.xx.com/payload",
  #     "bugzilla_token": "XXXX",
  #   }

In your task, you could do the following using TaskclusterConfig mentionned above (the class has a shortcut to use an authenticated Secrets service automatically):

from project import tc

prod_config = tc.load_secrets(
  'project/foo/prod-config',

  # We only need the common & bot parts
  prefixes=['common', 'bot'],

  # Let's provide some default value for the environment and backend_url
  existing={
    'environment': 'dev',
    'backend_url': 'http://localhost:8000',
  }
)
  # -> prod_config == {
  #     "environment": "production"
  #     "remote_log": "https://log.xx.com/payload",
  #     "backend_url": "https://backend.foo.mozilla.com",
  #   }

To provide local secrets value, you first need to load these values as a dictionary (usually by reading a local file in your format of choice : YAML, JSON, ...) and providing the dictionary to load_secrets by using the local_secrets parameter:

import os
import yaml

from taskcluster import Secrets
from taskcluster.helper import load_secrets

local_path = 'path/to/file.yml'

prod_config = load_secrets(
  Secrets({...}),
  'project/foo/prod-config',

  # We support an optional local file to provide some configuration without reaching Taskcluster
  local_secrets=yaml.safe_load(open(local_path)) if os.path.exists(local_path) else None,
)

Compatibility

This library is co-versioned with Taskcluster itself. That is, a client with version x.y.z contains API methods corresponding to Taskcluster version x.y.z. Taskcluster is careful to maintain API compatibility, and guarantees it within a major version. That means that any client with version x.* will work against any Taskcluster services at version x.*, and is very likely to work for many other major versions of the Taskcluster services. Any incompatibilities are noted in the Changelog.