In-Task Metrics #52
Comments
Another option here, as far as where to put this, is talking with the other teams within EngOps to get their perspective. They might also already have an in-house solution for doing this.
Another option that I've used before for things like this is to have another file descriptor available to the task for writing metrics in some format, so tasks don't have to dirty up their stdout with metrics information. We could support both methods for writing metrics if we want. Other options include an HTTP endpoint or something, but file descriptors are pretty convenient from bash etc.
I like the fd idea. It's pretty easy to write to an fd from anything, even shell.
Inside tasks running on taskcluster we do a lot of steps that would be interesting to measure, things like "time to clone gecko", "firefox build time", or "how often do we have a clobber build". It would be convenient for task-writers to record these metrics by printing special annotations in the log, like ### BEGIN my-metric-name and ### END my-metric-name. If workers extracted such annotations along with timestamps and reported them to a service that aggregated them, we would be able to easily build statistics on many different things. The service aggregating these metrics would have to index by when the metric was recorded, as well as by the task.tags of the task the metric was recorded from, such that we can slice and dice a metric by tags. As an example, we might want to look at the median, 95th percentile, and mean for the firefox-build-time metric over all tasks with tags level=*, kind=debug and platform=linux64.
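The worker-side extraction step could look roughly like this; the ISO-timestamp prefix on each log line and the simple BEGIN/END pairing are assumptions, not a settled format:

```python
import re
from datetime import datetime

# Match lines of the (assumed) form: "<iso-timestamp> ### BEGIN|END <name>"
ANNOTATION = re.compile(r"^(\S+) ### (BEGIN|END) (\S+)$")

def extract_metrics(log_lines):
    """Pair each BEGIN annotation with its END and return elapsed seconds."""
    starts, metrics = {}, {}
    for line in log_lines:
        m = ANNOTATION.match(line)
        if not m:
            continue  # ordinary log output, not an annotation
        ts = datetime.fromisoformat(m.group(1))
        name = m.group(3)
        if m.group(2) == "BEGIN":
            starts[name] = ts
        elif name in starts:
            metrics[name] = (ts - starts.pop(name)).total_seconds()
    return metrics

log = [
    "2023-01-01T10:00:00 ### BEGIN clone-gecko",
    "2023-01-01T10:03:12 regular task output is ignored",
    "2023-01-01T10:05:30 ### END clone-gecko",
]
print(extract_metrics(log))  # {'clone-gecko': 330.0}
```

Unpaired BEGINs (e.g. a task killed mid-step) are silently dropped here; a real worker would probably want to report those too.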
Extracting metrics from logs is a bit of work, but the hard part would be to index and aggregate the metrics in a scalable manner. Presumably, we would have to throw everything into a relational database, or perhaps a time-series database like influxdb. It might also be worthwhile to look at data warehouse solutions for inspiration, or to look into options for on-the-fly aggregation using t-digests, granted that probably won't work considering the explosive dimensionality of task.tags.