Allow tool access to Flux while running outside an allocation #6546

Open
ardangelo opened this issue Jan 10, 2025 · 6 comments

@ardangelo

We would like to support attaching our tools to jobs that run outside of a batch job or allocation. The tools need Flux utilities such as flux archive, as well as access to the Flux API, while not inside an allocation.

Ideally there would be a way to enter, or provide, the context of the running job in order to use flux archive and access the Flux API.

For example, using flux archive to add a tool support file for a job launched with flux run fails with the error:

error storing blob: Request requires owner credentials
@grondo
Contributor

grondo commented Jan 10, 2025

Perhaps obvious, but flux archive requires write access to the KVS to store target files for extraction. Its use is also discouraged at the system instance level since files in the KVS of a long-running instance may bloat the content store. Additionally, most system instances of Flux are configured with a flat TBON (all compute node brokers are direct children of rank 0), so flux archive may not scale very well at all.

I wonder if we can come up with some alternate solution vs flux archive for the case where the target job is running in a multi-user instance. One idea would be to somehow use the subprocess server in the job shell (running as the job owner) to copy the files. I wonder if @garlick has any other ideas.

@garlick
Member

garlick commented Jan 10, 2025

See also #5697 for a review of the use case - is that description still accurate @ardangelo?

@garlick
Member

garlick commented Jan 10, 2025

I had forgotten but flux exec --jobid only works for the instance owner :-(

 garlick@picl0:~$ flux batch -N1 --wrap sleep 3600
ƒ6JHrmF78sZ
 garlick@picl0:~$ flux exec --jobid $(flux job last) hostname
flux-exec: Error: rank 0: hostname: Request requires owner credentials

From shell/rexec.c

/* The embedded subprocess server restricts access based on FLUX_ROLE_OWNER,
 * but this shell cannot trust message credentials if they are passing through
 * a Flux instance running as a different user (e.g. the "flux" user in a
 * system instance).  If that user were compromised, they could run arbitrary
 * commands as any user that currently has a job running.  Therefore, this
 * additional check ensures that we only trust an instance running as the same
 * user.
 * 
 * For good measure, check that the shell userid matches the credential
 * userid. After the above check, this could only fail in test where the
 * owner can be mocked.
 */
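
The decision that comment describes can be sketched roughly as below. This is only an illustration of the three conditions, not the code in shell/rexec.c: every `sketch_*` / `SKETCH_*` name, the struct layout, and the role bit value are hypothetical and chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for the owner role bit. The real FLUX_ROLE_OWNER
 * definition lives in flux-core and is not reproduced here. */
#define SKETCH_ROLE_OWNER 0x1u

/* Hypothetical credential pair carried by a request message. */
struct sketch_cred {
    uid_t userid;       /* user the message claims to come from */
    uint32_t rolemask;  /* roles attached to the message */
};

/* Return true only when all conditions from the comment hold:
 *   1. the message carries the owner role,
 *   2. the enclosing instance runs as the same user as the shell,
 *      so its message credentials can be trusted,
 *   3. the credential userid matches the shell userid.
 */
bool sketch_rexec_allowed (uid_t shell_uid,
                           uid_t instance_owner_uid,
                           struct sketch_cred cred)
{
    if (!(cred.rolemask & SKETCH_ROLE_OWNER))
        return false;
    if (instance_owner_uid != shell_uid)
        return false;
    return cred.userid == shell_uid;
}
```

Under these assumptions, a job shell running as one user inside an instance owned by a different user (e.g. the "flux" user) fails condition 2 regardless of what the message credentials say, which matches the refusal shown in the session above.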

@grondo
Contributor

grondo commented Jan 10, 2025

I can't remember if we had any ideas to address that? Perhaps an optional munge credential that validates message credentials?

@garlick
Member

garlick commented Jan 10, 2025

Good question, I'll open a separate issue.

@ardangelo
Author

> See also #5697 for a review of the use case - is that description still accurate @ardangelo?

Yes, we basically need to be able to perform the same operations as in an allocation / batch but outside an allocation.
