[TRIS-722] Custom Tagging #2144
base: master
Conversation
Testing setup:
- Environment variable:
- Step functions create command:
- Flow:
- Output/Result:
- Additional notes:
Notes for testing: the config variable METAFLOW_BATCH_EMIT_TAGS=True needs to be set in order for tags to have any effect (even validation).

For a better UX, there could be a check somewhere in the path so that if AWS tags are being set but emit_tags is false, we print a warning about it. Validation could also perhaps be run despite the emit_tags feature flag, in order to catch issues.
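For illustration, a minimal sketch of such a warning check, assuming the feature flag is exposed as BATCH_EMIT_TAGS in metaflow_config; the helper name and its call site are hypothetical, not the PR's code:

```python
# Hypothetical sketch: warn when AWS tags are configured but tag emission
# is disabled, instead of silently dropping them.
from metaflow.metaflow_config import BATCH_EMIT_TAGS  # assumed flag name


def warn_if_tags_ignored(aws_tags, echo):
    # `echo` is whatever logging/echo callable the caller has at hand.
    if aws_tags and not BATCH_EMIT_TAGS:
        echo(
            "aws_tags were provided but METAFLOW_BATCH_EMIT_TAGS is not "
            "enabled; the tags will be ignored. Set "
            "METAFLOW_BATCH_EMIT_TAGS=True to emit them."
        )
```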
we need to add logic similar to what is present in e.g. kubernetes_decorator.py#runtime_step_cli, which json.dumps some attributes into the cli_args commands, namely the new aws_tags in this case. Otherwise the passed-in value will not be handled correctly in the CLI.
for example, replacing the cli_args.command_options.update(self.attributes) with the following should work:

```python
# dictionaries need to be json dumped for the CLI
json_types = ["aws_tags"]
for k, v in self.attributes.items():
    if k in json_types:
        cli_args.command_options[k] = json.dumps(v)
    else:
        cli_args.command_options[k] = v
```
also on the batch_cli.py side, the --aws_tags option needs to be

```python
@click.option("--aws-tags", multiple=False, type=JSONTypeClass(), default=None, help="AWS tags.")
```

in order to convert types correctly. Note the multiple=False as well; was there a reason to have the plural option act as multiple?
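For reference, a minimal self-contained sketch of how a JSON-typed click option converts its argument; the JSONType class here is a stand-in for Metaflow's JSONTypeClass, and the command is hypothetical:

```python
# Sketch: a click option that parses a JSON object from the command line,
# analogous to what JSONTypeClass does for --aws-tags.
import json

import click


class JSONType(click.ParamType):
    name = "JSON"

    def convert(self, value, param, ctx):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            self.fail("%r is not valid JSON" % value, param, ctx)


@click.command()
@click.option("--aws-tags", type=JSONType(), default=None, help="AWS tags.")
def step(aws_tags):
    # Invoked as: python demo.py --aws-tags '{"team": "data"}'
    click.echo(repr(aws_tags))  # -> {'team': 'data'} (a real dict)


if __name__ == "__main__":
    step()
```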
```python
if not isinstance(BATCH_DEFAULT_TAGS, dict):
    raise BatchException(
        "The BATCH_DEFAULT_TAGS config option must be a dictionary of key-value tags."
    )
```

```python
for name, value in BATCH_DEFAULT_TAGS.items():
    aws_tag = {'key': name, 'value': value}
    validate_aws_tag(aws_tag)
    job.tag(name, value)
```

```python
if step_function_tags is not None:
    aws_tags_list = []
    for tag in step_function_tags:
        key_value = tag.split("=", 1)
        if len(key_value) == 2:
            aws_tags_list.append({
                'key': key_value[0],
                'value': key_value[1]
            })
    for tag in aws_tags_list:
        validate_aws_tag(tag)
        job.tag(tag['key'], tag['value'])
```

```python
if aws_tags is not None:
    if not isinstance(aws_tags, dict):
        raise BatchException("tags must be a dictionary of key-value tags.")
    decorator_aws_tags_list = [
        {'key': key, 'value': val} for key, val in aws_tags.items()
    ]
    for tag in decorator_aws_tags_list:
        validate_aws_tag(tag)
        job.tag(tag['key'], tag['value'])
```
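For context, a sketch of the checks a validator like validate_aws_tag could perform, based on the tag restrictions AWS documents (keys 1-128 characters, values up to 256, a limited character set, no "aws:" prefix). Note that the PR's validator is evidently stricter than AWS's documented set, since the example flow later in this thread expects {"inv@lid": "t@g"} to fail even though @ is in AWS's allowed characters; treat this purely as an illustrative guess, not the PR's implementation:

```python
import re

# Characters AWS documents as allowed in tag keys and values:
# letters, digits, whitespace, and _ . : / = + - @
_ALLOWED = re.compile(r"^[\w\s.:/=+\-@]*$", re.UNICODE)


def validate_aws_tag(tag):
    """Illustrative validator for a {'key': ..., 'value': ...} tag dict."""
    key, value = tag["key"], tag["value"]
    if not 1 <= len(key) <= 128:
        raise ValueError("tag key must be 1-128 characters: %r" % key)
    if len(value) > 256:
        raise ValueError("tag value must be at most 256 characters: %r" % value)
    if not (_ALLOWED.match(key) and _ALLOWED.match(value)):
        raise ValueError("tag contains characters AWS does not allow: %r" % tag)
    if key.lower().startswith("aws:"):
        raise ValueError("tag keys starting with 'aws:' are reserved: %r" % key)
```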
I didn't test this part thoroughly, but there is one significant caveat with running validations in the Batch setup phase, namely: when validation fails, the metadata for the run/step/task has already been registered at that point, leading to a failed task with no logs.

An alternative approach with a better UX would be to perform Batch tag validation inside batch_decorator.py#step_init, which happens before job submission. This allows erroring out fairly quickly on malformed tags. Is there any issue with moving the validation and the BATCH_DEFAULT_TAGS usage entirely into the decorator?
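A rough sketch of that suggestion, following the step_init signature Metaflow's step decorators use; the body is only an assumed wiring of the PR's BATCH_DEFAULT_TAGS and validate_aws_tag names, not tested code:

```python
# Illustrative sketch, not the PR's code: validate tags at flow startup,
# before any metadata is registered or a Batch job is submitted.
class BatchDecorator(StepDecorator):
    name = "batch"

    def step_init(
        self, flow, graph, step, decos, environment, flow_datastore, logger
    ):
        # Merge config-level defaults with decorator-level tags (assumed order).
        tags = dict(BATCH_DEFAULT_TAGS or {})
        tags.update(self.attributes.get("aws_tags") or {})
        for key, value in tags.items():
            # validate_aws_tag raises on malformed tags, failing fast here
            # instead of after job submission with no logs.
            validate_aws_tag({"key": key, "value": value})
```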
also, can you verify after the changes that the following flows run as expected, both with batch and with step-functions?

Expected to fail:

```python
from metaflow import step, FlowSpec, batch


class BatchTags(FlowSpec):
    @batch(aws_tags={"inv@lid": "t@g"})
    @step
    def start(self):
        print("Starting 👋")
        self.next(self.end)

    @step
    def end(self):
        print("Done! 🏁")


if __name__ == "__main__":
    BatchTags()
```

Expected to run:

```python
from metaflow import (
    FlowSpec,
    step,
    batch,
)

custom_step_tags = {
    "goodbye": "world",
    "hello": "universe",
}


class BatchTags2(FlowSpec):
    @step
    def start(self):
        self.next(self.hello)

    @batch(aws_tags=custom_step_tags)
    @step
    def hello(self):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    BatchTags2()
```
Options to add custom tags are applied in the following order (a merge sketch follows the list):

1. environment variable dictionary,
2. step functions CLI create flag,
3. batch decorator variable dictionary.
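A minimal sketch of that layering, assuming later sources override earlier ones on key collisions; the variable names and sample values are illustrative, not the PR's:

```python
# Illustrative only: merge the three tag sources, later updates winning
# on key collisions (assumed precedence, not confirmed by the PR).
env_default_tags = {"team": "data", "env": "dev"}  # environment variable dictionary
cli_create_tags = {"env": "prod"}                  # step functions CLI create flag
decorator_tags = {"owner": "etl"}                  # @batch(aws_tags=...) dictionary

merged_tags = {}
for source in (env_default_tags, cli_create_tags, decorator_tags):
    merged_tags.update(source)

print(merged_tags)  # {'team': 'data', 'env': 'prod', 'owner': 'etl'}
```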
This PR builds on top of PR #1628. That original PR did not work when testing, but I would like to credit it here, as some of its changes are still used.