Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#102 - Add overhead value to TestTask #243

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions mozci/queries/test_task_overhead.query
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
from: unittest
format: list
groupby: task.id
limit: 20000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need such a high limit? If so, we might want to use destination: url like https://github.com/mozilla/mozci/blob/737c4cc0810bd745d13c90c511d18afff6baee20/mozci/queries/push_revisions.query.

Copy link
Contributor Author

@worldomonation worldomonation Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we do not - this was just carried over from the working query. It should return only 1 row. 1000 ought to be enough, though I'd like @klahnakoski to weigh in.

select:
- {value: action.start_time, name: task_min, aggregate: min}
- {value: action.end_time, name: task_max, aggregate: max}
- {value: result.start_time, name: group_min, aggregate: min}
- {value: result.end_time, name: group_max, aggregate: max}
where:
- in: {task.id: {$eval: task_id}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having a separate query for this (or maybe in addition), I think we could refactor the existing push_task_results_from_unittest.query to include this information as well, saving ourselves from running a separate query.

I'd be fine if you decide to file a follow-up issue to implement this and save it for later.

38 changes: 38 additions & 0 deletions mozci/task.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,15 @@
import json
import os
from abc import ABC, abstractmethod
from argparse import Namespace
from dataclasses import dataclass, field
from enum import Enum
from inspect import signature
from statistics import median
from typing import Dict, List, Optional

import requests
from adr.query import run_query
from adr.util import memoized_property
from loguru import logger
from urllib3.response import HTTPResponse
Expand Down Expand Up @@ -358,6 +360,30 @@ def configuration(self):
parts = config.split("-")
return "-".join(parts[:-1] if parts[-1].isdigit() else parts)

@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be memoized

def overhead(self):
worldomonation marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defined on a specific task, but we will have to define overheads, total_overhead, median_overhead on LabelSummary to actually use the information (e.g. the median overhead for a given suite on a given platform/configuration).

For task-level scheduling, we can just consider the sum of the median durations of all scheduled tasks.
For manifest-level scheduling, for each task we'll have to sum the durations of the groups in the task and sum the overhead associated with the task (I think by having a median overhead by suite and platform/configuration, we might be precise enough; maybe even just platform/configuration?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I will implement them while I wait for review from ahal and ekyle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@worldomonation I will require the group-level run times, but I plan to compute the aggregates after the data has been put into the database. Specifically, there is a BigQuery database with some of the data generated by mozci; my plan is to use include what these functions generate and form queries from there.

"""Calculate the overhead of a task.

The methodology is simple: each task (action) has a start/end time.
Each group also has a start/end time. Take the earliest known group start
and latest known group end time, ensure the two falls somewhere in between
task start/end.

This definition of overhead does not take into account inter-group overhead
eg. restarting browser, teardown, etc.

Returns:
float: difference between task start/end and group start/end times.
"""
data = run_query("test_task_overhead", Namespace(task_id=self.id))["data"].pop()
# Sanity check to ensure group start/end times are within task start/end.
if data["task_min"] < data["group_min"] or data["task_max"] > data["group_max"]:
logger.warning(f"task f{self.id} has inconsistent group duration.")

return (data["group_min"] - data["task_min"]) + (
data["task_max"] - data["group_max"]
)


# Don't perform type checking because of https://github.com/python/mypy/issues/5374.
@dataclass # type: ignore
Expand Down Expand Up @@ -496,6 +522,18 @@ def total_duration(self):
def median_duration(self):
return median(self.durations)

@property
def overheads(self):
return [task.overhead for task in self.tasks]

@property
def total_overheads(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shouldn't be pluralized

return sum(self.overheads)

@property
def median_overhead(self):
return median(self.overheads)

@memoized_property
def status(self):
overall_status = None
Expand Down