How to make a modular job shell plugin #3232
-
I am currently working on a data-staging job shell plugin for flux-sched. The staging currently happens in four stages. Down the road, there could also be a fifth stage where the plugin signals that the staging is complete (to enable async staging, where the staging does not block the starting of the job). Currently, all of these stages are implemented within the same job shell plugin. Stage 3 currently relies on the scheduler including the mountpoint in the emitted R. Stage 4 currently uses Boost's filesystem::copy to perform the staging.

It would be great if stages 3 and 4 could be abstracted out into their own plugins, so that we can replace the simple default functionality with more powerful functionality (e.g., parallel staging with something like dcp) or with vendor-specific mechanisms.

In a previous coffee call, @grondo helpfully described that the job shell provides two modularity options: services and the plugstack. My understanding of how services would work is that stages 3 and 4 would be spun out into their own job shell plugins, each registering its own service with the shell. I'm not quite able to determine how the plugstack would be used for this, though.
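(For illustration of the "services" option: a minimal sketch, assuming a data-staging plugin registers a per-job service method with flux_shell_service_register(3). The "copy" method name and the "src"/"dst" payload keys are hypothetical, not part of any existing plugin.)

```c
/* Hypothetical sketch: a data-staging shell plugin exposing a "copy"
 * service method on the job's shell service. Method name and payload
 * keys are made up for illustration. */
#include <errno.h>
#include <flux/core.h>
#include <flux/shell.h>

static void copy_cb (flux_t *h,
                     flux_msg_handler_t *mh,
                     const flux_msg_t *msg,
                     void *arg)
{
    const char *src, *dst;

    if (flux_request_unpack (msg, NULL, "{s:s s:s}",
                             "src", &src,
                             "dst", &dst) < 0)
        goto error;

    /* ... perform (or initiate) the staging copy here ... */

    flux_respond (h, msg, NULL);
    return;
error:
    flux_respond_error (h, msg, errno, NULL);
}

int flux_plugin_init (flux_plugin_t *p)
{
    flux_shell_t *shell = flux_plugin_get_shell (p);

    if (!shell)
        return -1;
    /* Register "<job service>.copy" so other components can send it an RPC. */
    return flux_shell_service_register (shell, "copy", copy_cb, NULL);
}
```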
-
Using flux_plugin_register(3), a shell plugin can register a callback for any topic glob, e.g. datastaging.* or datastaging.copy, not just shell.* topics. In a way, you could think of a shell plugin as "subscribing" to events that are emitted by the shell via the plugin interfaces. Since a shell plugin can subscribe to any event, and another shell plugin can "call" any event (via flux_shell_plugstack_call(3)), this allows any plugin to call a function from any other shell plugin.

As long as all the appropriate plugins have been loaded by the shell and have "subscribed" to the right topic via flux_plugin_register(3), this approach should work.

BTW, every shell plugin has an associated "name", and only one plugin for each name can be loaded at a time. This is how you can override builtin shell plugins, for example (the last loaded plugin of a given name wins).

Hopefully this helped and didn't cause more confusion!
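(A minimal sketch of that pattern, assuming a hypothetical datastaging.copy topic: one plugin registers a handler for the topic, and any other plugin can then invoke it through flux_shell_plugstack_call(3). Plugin and topic names are illustrative only.)

```c
#include <flux/core.h>
#include <flux/shell.h>

/* Plugin A: "subscribes" to the hypothetical datastaging.copy topic. */
static int copy_cb (flux_plugin_t *p,
                    const char *topic,
                    flux_plugin_arg_t *args,
                    void *data)
{
    /* ... perform the copy here ... */
    return 0;
}

int flux_plugin_init (flux_plugin_t *p)
{
    /* "datastaging" is this plugin's name; the table maps topics to callbacks. */
    static struct flux_plugin_handler tab[] = {
        { "datastaging.copy", copy_cb, NULL },
        { NULL, NULL, NULL },
    };
    return flux_plugin_register (p, "datastaging", tab);
}

/* Plugin B (a separate .so): calls whichever plugin handles datastaging.copy. */
static int call_copy (flux_plugin_t *p)
{
    flux_shell_t *shell = flux_plugin_get_shell (p);
    flux_plugin_arg_t *args;
    int rc;

    if (!shell || !(args = flux_plugin_arg_create ()))
        return -1;
    rc = flux_shell_plugstack_call (shell, "datastaging.copy", args);
    flux_plugin_arg_destroy (args);
    return rc;
}
```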
-
Follow-up question. With plugstack, it looks relatively easy to pass information down into the plugin/call via the plugin args, but is there a way for the plugin being called to pass information (e.g., the mount point) back up to the caller?
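(A sketch of the "down" direction, reusing the hypothetical datastaging.copy topic and src/dst keys from above: the caller packs input arguments before the plugstack call, and the callee unpacks them.)

```c
#include <flux/core.h>
#include <flux/shell.h>

/* Caller side: pack "input" args, then invoke the plugstack topic. */
static int request_copy (flux_shell_t *shell, const char *src, const char *dst)
{
    flux_plugin_arg_t *args;
    int rc = -1;

    if (!(args = flux_plugin_arg_create ()))
        return -1;
    if (flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_IN,
                              "{s:s s:s}",
                              "src", src,
                              "dst", dst) == 0)
        rc = flux_shell_plugstack_call (shell, "datastaging.copy", args);
    flux_plugin_arg_destroy (args);
    return rc;
}

/* Callee side: unpack the same keys inside the datastaging.copy callback. */
static int copy_cb (flux_plugin_t *p,
                    const char *topic,
                    flux_plugin_arg_t *args,
                    void *data)
{
    const char *src, *dst;

    if (flux_plugin_arg_unpack (args, FLUX_PLUGIN_ARG_IN,
                                "{s:s s:s}",
                                "src", &src,
                                "dst", &dst) < 0)
        return -1;
    /* ... copy src to dst ... */
    return 0;
}
```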
-
Yes, the plugin being called can set "output" arguments in the args it receives, e.g. flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT, "{s:s}", "mountpt", "/foo"), and the caller can then read them back after the plugstack call with flux_plugin_arg_unpack(3) using FLUX_PLUGIN_ARG_OUT. If multiple plugins could be called as part of the callback, then you might also want to use the FLUX_PLUGIN_ARG_REPLACE flag, which replaces any output args set by earlier plugins instead of merging into them.
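(A sketch of the full round trip, again with hypothetical topic and key names: the callee sets an output argument, and the caller reads it back after flux_shell_plugstack_call(3) returns.)

```c
#include <stdio.h>
#include <flux/core.h>
#include <flux/shell.h>

/* Callee: set an "output" arg (hypothetical "mountpt" key) for the caller. */
static int mount_cb (flux_plugin_t *p,
                     const char *topic,
                     flux_plugin_arg_t *args,
                     void *data)
{
    return flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                                 "{s:s}",
                                 "mountpt", "/foo");
}

/* Caller: invoke the topic, then unpack the output args.
 * The unpacked string points into args, so copy it out before destroying. */
static int get_mountpoint (flux_shell_t *shell, char *buf, size_t len)
{
    flux_plugin_arg_t *args;
    const char *mountpt;
    int rc = -1;

    if (!(args = flux_plugin_arg_create ()))
        return -1;
    if (flux_shell_plugstack_call (shell, "datastaging.mount", args) == 0
        && flux_plugin_arg_unpack (args, FLUX_PLUGIN_ARG_OUT,
                                   "{s:s}",
                                   "mountpt", &mountpt) == 0) {
        snprintf (buf, len, "%s", mountpt);
        rc = 0;
    }
    flux_plugin_arg_destroy (args);
    return rc;
}
```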
-
I embarrassingly forgot about the data staging prototype in flux-sched when working on #4789, which was merged yesterday and is documented in flux-filemap(1) and flux-shell(1).

The design of the data-staging prototype would appear to be the basis for coral2 rabbit support in Flux, in which one can request storage as a resource using a post-v1 jobspec, and then fluxion allocates storage represented as JGF under the scheduling key in R. On the surface of it, the new core functionality overlaps with part of what the prototype does.

If possible I'd like to revive @SteVwonder's idea in this discussion (if I am understanding it correctly) about having this plugin outsource steps 3 and 4 above. If we can get this plugin to pass the storage mount point to the new core plugin, the two could work together rather than duplicating the staging logic.

@jameshcorbett and/or @trws, can you confirm that is still the plan for rabbit support on coral2?
-
It's a shell plugin provided by fluxion, and actually it's installed on all our production systems :-) So maybe we ought to rethink that if it's been overtaken by events, or even consider removing it entirely. I can open an issue on that.

Just talking with @grondo offline, it sounds like the coral2 plan involves setting DWS attributes in jobspec v1 to request storage resources, not using the post-v1 jobspec that seems to be assumed by the data-staging prototype.