Replies: 1 comment
@ardangelo and @jameshcorbett: This is a catch-all issue ticket and we should move this to flux-core Discussions. I will move it there. If we don't like it there, we can always move this back to the Issue tracker.
I have written up a document with requirements for Flux to support our CTI tool launch library and products in general. How should I create individual issues for this project? Should I create one for every level-3 header?
Flux CDST requirements
General MPIR support
For more information on MPIR, see the specification document at https://www.mpi-forum.org/docs/mpir-specification-03-01-2018.pdf. This section can be expanded in individual issues.
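For readers unfamiliar with MPIR, the core data a tool consumes is the process table: an array of MPIR_PROCDESC entries (host_name, executable_name, pid) whose length is given by MPIR_proctable_size. Below is a minimal sketch of that layout using Python's ctypes. The struct fields follow the MPIR specification; the helper proctable_to_dicts is a hypothetical name used only for illustration, and a real tool would read the table from the launcher process's memory rather than build it locally.

```python
import ctypes


class MPIR_PROCDESC(ctypes.Structure):
    """One process table entry, laid out as in the MPIR specification."""

    _fields_ = [
        ("host_name", ctypes.c_char_p),        # node the process runs on
        ("executable_name", ctypes.c_char_p),  # path of the launched binary
        ("pid", ctypes.c_int),                 # process ID on that node
    ]


def proctable_to_dicts(proctable, size):
    """Convert `size` MPIR_PROCDESC entries into plain dictionaries.

    Hypothetical helper: `proctable` is any indexable sequence of
    MPIR_PROCDESC, standing in for MPIR_proctable.
    """
    return [
        {
            "host": proctable[i].host_name.decode(),
            "executable": proctable[i].executable_name.decode(),
            "pid": proctable[i].pid,
        }
        for i in range(size)
    ]
```

A tool such as CTI needs the launcher to expose this information in some form so that it can locate every rank of the job.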
CTI support
Reliably detecting the presence of Flux WLM
All supported CTI workload manager implementations have a method to determine whether the implementation is supported in the current environment. It may be simple, such as the Slurm implementation checking the output of slurm --version, or more complex, such as the Cray PALS implementation ensuring that it is running on a Shasta machine and that the current user has an accessible API authentication token.
Service authentication
Once authenticated, service interactions should be able to take place without any manual user input. For example, upon initialization, the Cray PALS implementation reads the user's local token and uses it to authenticate all subsequent service requests.
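As an illustration of what "no manual user input" means for a client library, here is a minimal sketch that loads a token once at initialization and reuses it for every request. Everything in it is an assumption made for illustration: the token file location, the ServiceSession and TokenError names, and the bearer-style header are hypothetical and do not describe the actual PALS or Flux interfaces.

```python
from pathlib import Path


class TokenError(RuntimeError):
    """Raised when no usable token can be loaded."""


def load_token(path):
    """Read a token from a file, failing fast instead of prompting the user."""
    try:
        token = Path(path).read_text(encoding="utf-8").strip()
    except OSError as exc:
        raise TokenError(f"cannot read token file {path}") from exc
    if not token:
        raise TokenError(f"token file {path} is empty")
    return token


class ServiceSession:
    """Hypothetical session that authenticates once, at construction."""

    def __init__(self, token_path):
        # The token is read a single time here; later requests reuse it.
        self._token = load_token(token_path)

    def auth_headers(self):
        # Assumed header format, used only as an example.
        return {"Authorization": f"Bearer {self._token}"}
```

Failing at initialization, rather than on the first request, also gives the tool a natural way to report that the workload manager is unusable in the current environment.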
Redirecting or acquiring job input / output / standard error
It is much easier to work programmatically with a running job if standard input can be supplied from an arbitrary file descriptor or file, and output / error can be redirected to a file descriptor or file. This matters most when attaching to an already running job, where there should be a way to acquire the job's input / output / error.
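The kind of redirection being requested can be sketched with a local process standing in for a job launcher. The example below uses only Python's standard subprocess module; run_redirected is a hypothetical helper name, and nothing here reflects an existing Flux interface.

```python
import subprocess


def run_redirected(argv, stdin_path, stdout_path, stderr_path):
    """Run argv with stdin read from a file and stdout/stderr written to files.

    Returns the exit code of the process. A local process is used here as a
    stand-in for a job launched through the workload manager.
    """
    with open(stdin_path, "rb") as fin, \
         open(stdout_path, "wb") as fout, \
         open(stderr_path, "wb") as ferr:
        completed = subprocess.run(
            argv, stdin=fin, stdout=fout, stderr=ferr, check=False
        )
    return completed.returncode
```

The same call accepts already open file descriptors in place of the file objects, which is the case a tool library usually cares about.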
Query job status and data
The following job data should be queryable from Flux:
Allocation management
It should be possible to detect whether the current terminal is running inside an existing resource allocation, if such a facility is provided by Flux.
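Both this check and the workload manager detection described earlier come down to probing the environment without user interaction. A minimal sketch follows, under two assumptions that would need confirmation from the Flux developers: that the flux command being on PATH indicates Flux is installed, and that a set FLUX_URI environment variable indicates the process is running inside a Flux instance. Both function names are hypothetical.

```python
import os
import shutil


def flux_command_available(path=None):
    """Return True if a `flux` executable is found on the search path.

    `path` overrides PATH, which is mainly useful for testing.
    """
    return shutil.which("flux", path=path) is not None


def inside_flux_instance(environ=None):
    """Return True if FLUX_URI is set to a non-empty value.

    Assumption: Flux sets FLUX_URI for processes started within an instance.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("FLUX_URI"))
```

An environment variable is only a heuristic; a dedicated query provided by Flux would be more reliable, which is the point of this requirement.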
Programmatic job launch
Job launch requests should support:
Running job management
The Flux service should provide facilities to:
ATP support
For native ATP tool support, Flux should provide support for running pre-launch scripts, or other plugin support that can run arbitrary binaries. This mechanism must be able to set environment variables in the job before it is launched on compute nodes, and must run at a point in job launch at which the MPIR proctable information described above is available.
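The environment requirement can be illustrated with a local sketch: a pre-launch step computes extra variables, and the application then starts with them already present. The helper name launch_with_env and the variable used in the usage note are hypothetical; a local subprocess stands in for the job launch, and this is not a description of how ATP or Flux implement the step.

```python
import os
import subprocess


def launch_with_env(argv, extra_env, base_env=None):
    """Start argv with extra_env merged over the base environment.

    The merge happens before the process exists, mirroring the requirement
    that variables be set before the job is launched on compute nodes.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(extra_env)
    return subprocess.run(
        argv, env=env, capture_output=True, text=True, check=False
    )
```

For example, launch_with_env(["./app"], {"EXAMPLE_TOOL_VAR": "1"}) would start a hypothetical ./app binary with EXAMPLE_TOOL_VAR visible from its first instruction.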