Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: allow mustache templates in job environment variables #6506

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Dec 13, 2024

The job shell expands mustache templates for input/output filenames as well as in command arguments. It would also be useful to support this template expansion in environment variables, but currently this is not supported. It probably isn't efficient to wholesale expand mustache templates in all environment variables, since the common case would be no variables to expand. Also, some environment variables may happen to contain mustache templates which should not be expanded.

This PR introduces a new env-expand shell builtin plugin which applies mustache expansion to any environment variables in a new env-expand shell options key in jobspec.

On the cli side, the get_filtered_environment() function is extended to look for environment variable values specified on the command line and pass these back in a dictionary separate from the main environment. This dictionary is then saved in the attributes.system.shell.options.env-expand key for use by the shell plugin.

This means that only environment variables specified on the command line or in a file as VAR=TEMPLATE where TEMPLATE contains {{}} will be expanded. Variables in the environment that already do contain {{}} pass through the normal environment dictionary unmodified. Additionally, the env-expand shell plugin only has to process the minimum number of variables.

This is a WIP so I can get feedback on the approach before writing tests.

The current use case here is to place files in the job's tmpdir and set environment variables in the job to point to these files, e.g. to extend rc1:

flux alloc --add-file=rc1.d/rc1:700=`#!/bin/sh\nflux exec dosomething` --env=FLUX_RC_EXTRA={{tmpdir}}

@grondo grondo force-pushed the mustache-env branch 2 times, most recently from 3eb221f to 8185f27 Compare December 14, 2024 01:22
@garlick
Copy link
Member

garlick commented Dec 14, 2024

This seems like it could be really powerful. Nice!

Problem: It would be useful to allow expansion of mustache templates
in envionment variables in the job shell. However, expanding templates
in the entire jobspec environment dictionary would be inefficient,
since values with templates would be by far the uncommon case. Also,
environment variables may legitimately contain values that look like
mustache templates but should not be expanded.

Add a new shell builtin plugin for expanding select environment
variables in a `env-expand` shell options dictionary. This allows
the jobspec to opt-in to mustache expansion for a select set of
environment variables, solving the issues noted above.
Problem: It would be useful to expand mustache templates in environment
variable values, but this is currently not supported in the submission
cli commands.

When processing the `--env*` set of options, detect any provided
environment variable values that appear to contain a mustache
template. Return these in a separate dict from the main environ
dictionary and add them to `attributes.system.shel.optionsenv-expand`
in jobspec.  The job shell will then expand these variables before
running the job.
Problem: Environment variables in the env-expand jobspec options
dictionary cannot be expanded to different values per task because
the env-expand plugin only runs at the shell.init callback.

Add a task_expand_env() function in the env-expand plugin that runs
at task.init and thus is able to set per-task environment variables.

To avoid processing the same mustache templates multiple times,
keep the original env_expand() function to expand non task-specific
templates.  When a template is detected to be fully expanded, it is
removed from the env-expand object so that it is not re-processed
per task.
Problem: The shell env-expand plugin now supports task-specific
template expansion, but the shell doesn't provide any task tags.

Add support to the job shell for expanding the following task-specific
mustache tags:

 - {{taskid}}, {{task.id}}, or {{task.rank}} - global task rank
 - {{task.index}} or {{task.localid}} - local task rank
Problem: The job shell supports a limited set of mustache tags.

Expand the set of supported tags, including:

 - {{nnodes}} and {{size}} for the total node and task count
 - A set of node.* tags including:
  - {{node.id}}: index of this shell relative to job
  - {{node.ntasks}}: number of tasks local to this node
  - {{node.name}}: hostname
  - {{node.ncores}}: number of allocated cores on this node
  - {{node.ngpus}}: number of allocated gpus on this node
  - {{node.taskids}: task idset, e.g. 0-3
  - {{node.coreids}}: core idset, e.g. "0-3"
  - {{node.gpuids}}: gpu idset e.g. "0"
Copy link

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 27.77778% with 78 lines in your changes missing coverage. Please review.

Project coverage is 83.56%. Comparing base (23e0f78) to head (f5ecedc).

Files with missing lines Patch % Lines
src/shell/shell.c 9.61% 47 Missing ⚠️
src/shell/env-expand.c 36.95% 29 Missing ⚠️
src/bindings/python/flux/cli/base.py 80.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6506       +/-   ##
===========================================
+ Coverage   53.67%   83.56%   +29.89%     
===========================================
  Files         475      526       +51     
  Lines       79573    87797     +8224     
===========================================
+ Hits        42709    73369    +30660     
+ Misses      36864    14428    -22436     
Files with missing lines Coverage Δ
src/shell/builtins.c 92.30% <ø> (ø)
src/bindings/python/flux/cli/base.py 95.49% <80.00%> (ø)
src/shell/env-expand.c 36.95% <36.95%> (ø)
src/shell/shell.c 74.79% <9.61%> (+26.38%) ⬆️

... and 445 files with indirect coverage changes

@grondo
Copy link
Contributor Author

grondo commented Dec 16, 2024

I've just pushed an update (though this PR should still be considered an experimental WIP) that adds per-task environment variable mustache template expansion, along with some new task-specific mustache tags:

  • {{taskid}}, {{task.id}}, or {{task.rank}} - global task rank
  • {{task.index}} or {{task.localid}} - local task rank

Additionally, a set of job and node specific mustache tags are added:

  • {{nnodes}} and {{size}} for the total node and task count
  • A set of node.* tags including:
    • {{node.id}}: index of this shell relative to job
    • {{node.ntasks}}: number of tasks local to this node
    • {{node.name}}: hostname
    • {{node.ncores}}: number of allocated cores on this node
    • {{node.ngpus}}: number of allocated gpus on this node
    • {{node.taskids}: task idset, e.g. 0-3
    • {{node.coreids}}: core idset, e.g. "0-3"
    • {{node.gpuids}}: gpu idset e.g. "0"

With this, users could "opt-in" to having some equivalent environment variables to the 60 or set by default by Slurm.

This could also be useful in the cli plugins prototyped in #2875.

Finally, if the shell output plugin is modified to render the output file name per task, we can possibly close #4696, since a unique template per-task or per-node would result in per-task or per-node output files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants