
launch a multiuser instance as a job #5531

Open · 1 task done · Tracked by #5201

grondo opened this issue Nov 3, 2023 · 9 comments

Comments

@grondo
Contributor

grondo commented Nov 3, 2023

It would be useful to be able to launch a subinstance in a system instance which is also capable of running multiuser jobs.
This feature, in combination with user-based access controls, could offer one way to implement dedicated access time (DAT).

This is a tracking issue to discuss the implementation and track any bugs that need to be fixed to get a basic implementation.

Tasks

@grondo
Contributor Author

grondo commented Nov 3, 2023

Some notes offline from @garlick:

  • It might be nice if the DAT local socket ended up in /run/flux like the system instance one, but with a different name. Then we wouldn't have to audit security for the other things in the rundir.
  • need to disable the doom timeout
  • might want to launch the instance with a large fanout, similar to the system instance
  • would be nice to have a new frontend command for this purpose which encodes all of the above (see the sketch after these lists)

Other notes found experimentally:

  • For systemd/sdbus support in the DAT instance, XDG_RUNTIME_DIR=/run/user/$UID and DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus may need to be exported to the job
  • if the connector socket is not moved to /run/flux, then chmod +x $(flux getattr rundir) on all ranks will be necessary
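
Putting these notes together, a rough sketch of a launch command (untested; the tbon.topo value and the use of the exit-timeout shell option and --env here are my assumptions, not a vetted recipe):

# disable the doom exit timeout, flatten the TBON for a large fanout,
# and export the variables needed for systemd/sdbus support
$ flux alloc -N2 -o exit-timeout=none \
    --broker-opts=-Stbon.topo=kary:0 \
    --env=XDG_RUNTIME_DIR=/run/user/$UID \
    --env=DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$UID/bus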

@garlick
Member

garlick commented Nov 4, 2023

If (non-critical) nodes crash during a DAT, the DAT instance continues but we don't have a way to re-add them. #5184 (FLUB bootstrap) might be one piece of a solution...

@grondo
Contributor Author

grondo commented Aug 8, 2024

This is a minimal proof-of-concept of launching a multi-user capable subinstance:

As user flux:

$ flux alloc -N2 --conf=access.allow-guest-user=true --conf=exec.imp=$(flux config get exec.imp)
$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)'

Then as another valid user on the system:

$ flux jobs -A
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
 ƒMDKKUWCR4B flux     flux        R      2      2   1.517m pi[3-4]
$ flux uptime
 13:38:57 run 2.9m,  owner flux,  depth 1,  size 2
$ flux run -N2 id
uid=1001(grondo) gid=1001(grondo) groups=1001(grondo),100(users)
uid=1001(grondo) gid=1001(grondo) groups=1001(grondo),100(users)

For testing purposes, more complex configuration could be placed in a test.toml provided to --conf, e.g. to support the job prolog on a multi-user subinstance:

[job-manager]
plugins = [ { load = "perilog.so" } ]

[job-manager.prolog]
command = [
  "flux",
  "perilog-run",
  "prolog",
  "--timeout=10s",
  "--with-imp",
  "-e",
  "prolog"
]

Then just add --conf=test.toml to the command line above.
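
For example (assuming multiple --conf options on one command line are merged):

$ flux alloc -N2 --conf=access.allow-guest-user=true \
    --conf=exec.imp=$(flux config get exec.imp) --conf=test.toml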

One potential issue: if the subinstance were to use sdexec or housekeeping, I think there is potential for name collisions between systemd units at different instance levels.

@grondo
Contributor Author

grondo commented Aug 8, 2024

BTW, there was a question about whether users would have access to the URI for child instances launched by the flux user. Obviously they do, because the example above just works. This is because the URI is set via a memo event in the job's eventlog, which is in turn added to the job's user annotations available from the job-list module. (This is also how jobs that are subinstances of flux come to appear in blue in the output of flux jobs, when color is available.)
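
For illustration, either of the following should resolve that annotation for the job from the earlier example (assuming flux uri's jobid resolver works the way I expect here):

$ flux uri --remote ƒMDKKUWCR4B
$ flux proxy ƒMDKKUWCR4B flux uptime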

@wihobbs
Member

wihobbs commented Aug 28, 2024

This one is a bit of a head-scratcher. Seen in the prolog of a multiuser instance:

[flux@fluke2:flux]$ flux alloc -N2 --requires=host:fluke[131-132] --conf=/var/flux/hobbs.toml
[flux@fluke131:tmp]$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)' && chmod a+rx /tmp/flux/flux-x6McZ6/ && flux exec sh -c 'chmod uo+x /tmp/flux'
[flux@fluke131:tmp]$ flux alloc -N1 hostname
Aug 28 10:28:29.863952 PDT job-manager.err[0]: fBXJFMS7: prolog: stderr: fluke132 (rank 1): flux-job: Operation not permitted
fluke132

The error was triggered by calling flux job info $FLUX_JOB_ID jobspec.

For the record, id in the prolog shows:

2024-08-28T17:28:37.025461Z job-manager.info[0]: fBXJFMS7: epilog: stdout: uid=0(root) gid=755(flux) groups=755(flux),3172(iseepids)

Maybe the gid also needs to be root to run this operation?

This is similar to what we do in the system-instance prolog, except I'm not using --sdexec with flux perilog-run.

@grondo
Contributor Author

grondo commented Aug 28, 2024

Oh, you might need to ensure that access.allow-root-owner is set to true, since the prolog runs as root.
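
For the record, a minimal sketch of that fix applied to the earlier PoC command line (the same keys could go in an [access] table in a TOML file passed to --conf instead):

$ flux alloc -N2 --conf=access.allow-guest-user=true \
    --conf=access.allow-root-owner=true \
    --conf=exec.imp=$(flux config get exec.imp)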

@wihobbs
Member

wihobbs commented Aug 28, 2024

That did it, thanks @grondo!

@wihobbs
Member

wihobbs commented Oct 15, 2024

As a final summary, here is how to start a multiuser instance as a job under the instance owner of a system instance:

[flux@fluke2:~]$ cat /var/flux/conf.toml
# Test configuration file for launching multiuser Flux
# instance as a job

[access]
allow-guest-user = true
allow-root-owner = true

[job-manager]
plugins = [ { load = "perilog.so" } ]

[ingest.validator]
plugins = [ "jobspec" ]

[exec]
imp = "/usr/libexec/flux/flux-imp"

# Note you could add a job-manager.prolog and epilog here.
# This will require a separate imp configuration in a specific /etc
# directory and will overwrite previous imp configs for that node/
# system. Proceed with caution.

[flux@fluke2:~]$ flux alloc -N2 --conf=/var/flux/conf.toml
[flux@fluke131:~]$ flux resource list
     STATE PROPERTIES NNODES NCORES NGPUS NODELIST
      free batch           2      4     0 fluke[131-132]
 allocated                 0      0     0
      down                 0      0     0

Note the need for some directory permission mangling so that guest users can traverse to the local connector socket:

[flux@fluke131:~]$ flux exec sh -c 'chmod uo+x $(flux getattr rundir)' && \
      chmod a+rx /tmp/flux/flux-*/ && \
      flux exec sh -c 'chmod uo+x /tmp/flux'

And then, as another user on the system:

(s=130,d=0)  fluke2 ~ $ whoami
hobbs17
(s=130,d=0)  fluke2 ~ $ flux jobs -u flux
       JOBID QUEUE    USER     NAME       ST NTASKS NNODES     TIME INFO
 fAMC368J3a7 batch    flux     flux        R      2      2   54.42s fluke[131-132]
(s=130,d=0)  fluke2 ~ $ flux proxy fAMC368J3a7
(s=2,d=1)  fluke2 ~ $ flux run -N2 hostname
fluke131
fluke132

@wihobbs
Member

wihobbs commented Oct 15, 2024

Correction: "will overwrite" in that comment is a bit of a misstatement.

The TOML tables read from /etc/flux/imp/conf.d in alphabetical order can overwrite previously read tables if there is a conflict: i.e., if the run table is defined in both a-imp.toml and b-imp.toml, b-imp.toml's run table prevails in its entirety, even if the actual keys in the two run tables do not have conflicting names.
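
A hypothetical illustration of that pitfall (file names and keys invented for the example):

# /etc/flux/imp/conf.d/a-imp.toml  (read first)
[run.prolog]
path = "/etc/flux/system/prolog"          # hypothetical command

# /etc/flux/imp/conf.d/b-imp.toml  (read last)
[run.housekeeping]
path = "/etc/flux/system/housekeeping"    # hypothetical command

If the whole run table from the last file read prevails as described above, run.prolog is lost even though the individual key names never conflicted.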
