How to make a modular job shell plugin #3232

SteVwonder · 2020-09-28T18:14:53Z

SteVwonder
Sep 28, 2020
Maintainer

I am currently working on a datastaging job shell plugin, and I'm looking to make it as modular as possible. The four stages of the plugin are:

Validate 'staging' attributes section in jobspec
Parse R and Jobspec to determine what data to stage and which storage resource to stage it into (for the current rank)
Get the mountpoint for the storage resource that we are staging into
Perform the actual data staging/copy

Down the road, there could also be a fifth stage where the plugin signals that the staging is complete (to enable async staging, where the staging does not block the starting of the job).

Currently, all of these stages are implemented within the same job shell plugin. Stage 3 currently relies on the scheduler including the mountpoint in the emitted R. Stage 4 currently uses Boost's filesystem::copy to perform the staging. It would be great if stages 3 and 4 could be abstracted out into their own plugins, so that we can replace the simple default functionality with more powerful functionality (e.g., parallel staging with something like dcp) or with vendor-specific mechanisms.

In a previous coffee call, @grondo helpfully described that the job shell provides two modularity options: services and the plugstack. My understanding of how services would work is that stages 3 and 4 would be spun out into their own job shell plugins and register their own services with flux_shell_service_register. The datastaging plugin would then call these services with flux_shell_rpc_pack. To swap out the default stage 3 or 4 for different functionality, you would simply unload the default stage3/4 plugin and load a different job shell plugin that provides the same service as the stage 3/4 plugin.

I'm not quite able to determine how the plugstack would be used. It seems clear to me that the datastaging plugin would call flux_shell_plugstack_call to initiate stages 3/4, but what is not clear to me is how the plugstack gets created. Would the functionality in stages 3 and 4 still need to be spun out into separate plugins/shared libraries so that the datastaging plugin could call plugstack_load on them? Would the stage3/4 plugins call flux_plugin_register and flux_plugin_add_handler to register their services?

Answered by grondo

Sep 28, 2020

I'm not quite able to determine how the plugstack would be used.

Using flux_plugin_register(3) a shell plugin can register a callback for any topic glob, e.g. datastaging.* or datastaging.copy, not just shell.* topics. In a way, you could think of a shell plugin as "subscribing" to events that are emitted by the shell via the plugin interfaces. Since a shell plugin can subscribe to any event, and another shell plugin can "call" any event (via flux_shell_plugstack_call(3)), this allows any plugin to call a function from any other shell plugin.

what is not clear to me is how the plugstack gets created. Would the functionality in stages 3 and 4 still need to be spun out into separate plug…

View full answer

grondo · 2020-09-28T19:11:04Z

grondo
Sep 28, 2020
Maintainer

I'm not quite able to determine how the plugstack would be used.

Using flux_plugin_register(3) a shell plugin can register a callback for any topic glob, e.g. datastaging.* or datastaging.copy, not just shell.* topics. In a way, you could think of a shell plugin as "subscribing" to events that are emitted by the shell via the plugin interfaces. Since a shell plugin can subscribe to any event, and another shell plugin can "call" any event (via flux_shell_plugstack_call(3)), this allows any plugin to call a function from any other shell plugin.

what is not clear to me is how the plugstack gets created. Would the functionality in stages 3 and 4 still need to be spun out into separate plugins/shared libraries so that the datastaging plugin could call plugstack_load on them? Would the stage3/4 plugins call flux_plugin_register and flux_plugin_add_handler to register their services?

As long as all the appropriate plugins have been loaded by the shell and have "subscribed" to the right topic via flux_plugin_add_handler(3) or flux_plugin_register(3), then the datastaging plugin should be able to call the right functions in the stage 3/4 plugins just by using flux_shell_plugstack_call(3). The tricky part is making sure only the right plugins are called for the type of datastaging being done. This can be done by either making sure the shell only loads the appropriate plugin(s) on a given system (e.g. via the shell's intirc), or the shell could load all possible plugins and the type of staging could be passed down to the plugins via flux_plugin_arg_t * argument of flux_shell_plugstack_call(3).

BTW, every shell plugin has an associated "name", and only one plugin for each name can be loaded at a time. This is how you can override builtin shell plugins for example (the last loaded plugin of a given name wins).

Hopefully this helped and didn't cause more confusion!

1 reply

SteVwonder Sep 28, 2020
Maintainer Author

That's super helpful. Thanks @grondo!

SteVwonder · 2020-09-29T04:47:55Z

SteVwonder
Sep 29, 2020
Maintainer Author

Follow up question. With plugstack, it looks relatively easy to pass information down into the plugin/call via the flux_plugin_arg_t. Is there an equivalent mechanism for getting a "response" from the plugin other than through the mutation of the void *data that is passed to flux_plugin_add_handler? I was hoping to make getting the mountpoint of a device a plugin so that we can easily support different vendor-specific APIs, but that requires getting the mountpoint back from the plugin.

0 replies

grondo · 2020-09-29T15:03:24Z

grondo
Sep 29, 2020
Maintainer

Is there an equivalent mechanism for getting a "response" from the plugin other than through the mutation of the void *data that is passed to flux_plugin_add_handler?

Yes, the plugin being called can set "output" arguments in the flux_plugin_arg_t with flux_plugin_arg_set() or flux_plugin_arg_pack() using the FLUX_PLUGIN_ARG_OUT flag, e.g.

    flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT, "{s:s}", "mountpt", "/foo")

If multiple plugins could be called as part of the callback, then you might also want to use the FLUX_PLUGIN_ARG_UPDATE flag, which would update the output args instead of overwriting them.

0 replies

garlick · 2022-12-09T17:00:02Z

garlick
Dec 9, 2022
Maintainer

I embarassingly forgot about the data staging prototype in flux-sched when working on #4789 which was merged yesterday and is documented in flux-filemap(1) and flux-shell(1). The design of the data-staging prototype would appear to be the basis for coral2 rabbit support in Flux, in which one can request storage as a resource using a post-v1 jobspec, and then fluxion allocates storage represented as JGF under the scheduling key in R.

On the surface of it, the new core stage-in plugin should take over this task by extracting the mount point from R on each shell and using it as a copy destination. However, that would require that flux-core be able to parse this out of the JGF, when the scheduling key in R is intended to be opaque to core.

If possible I'd like to revive @SteVwonder 's idea in this discussion (If I am understanding ok) about having this plugin outsource steps 3 and 4 above. If we can get this plugin to pass the storage mount point to the stage-in plugin in core through the plugstack, then I think it could transparently work with the rabbits.

@jameshcorbett and/or @trws can you confirm that is still the plan for rabbit support on coral2?

8 replies

jameshcorbett Dec 9, 2022
Maintainer

The plan to stuff opaque rabbit scheduling information into R is separate from any interactions with data-staging I guess.

jameshcorbett Dec 9, 2022
Maintainer

Perhaps it makes more sense if I add that Flux isn't responsible for doing any rabbit data movement? HPE software handles all of that.

trws Dec 9, 2022
Maintainer

We should probably discuss that, at least in terms of whether we should make sure that whatever flux's native interface for that will be will also compose with the rabbit setup. We need to have staging of some kind in flux in general, so while having HPE-specific support makes sense to me as a way to use the native setup they provide and use the rabbits more easily, I want to make sure we don't cut ourselves off at the knees for other systems.

garlick Dec 9, 2022
Maintainer

Right it does seem like that should be possible with a little "shimming" between the fluxion JGF and the stuff in core. I opened #4813 to track that notion from the core end. At minimum, the generic support in core ought to keep an eye out for use cases and ideas from the coral2 implementation. I already snagged a few of those by looking at open issues and the data-staging source code and opened issues.

SteVwonder Dec 9, 2022
Maintainer Author

Here is the data staging plug-in PR: flux-framework/flux-sched#743

It's already merged into flux-sched. I forget all the nitty gritty details. But the PR should explain.

garlick · 2022-12-09T19:17:09Z

garlick
Dec 9, 2022
Maintainer

It's a shell plugin provided by fluxion, and actually it's installed on all our production systems :-)

So maybe we ought to rethink that if it's been overtaken by events, or even consider removing it entirely. I can open an issue on that.

Just talking with @grondo offline, it sounds like the coral2 plan involves setting DWS attributes in jobspec v1 to request storage resources, not using this post-v1 jobspec that seems to be assumed by data-staging. If the part that reads the mount point out of R and does stuff with it is being provided by HPE I'm not sure that this thing has any role at this point.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to make a modular job shell plugin #3232

{{title}}

Replies: 5 comments 9 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

How to make a modular job shell plugin #3232

SteVwonder Sep 28, 2020 Maintainer

Replies: 5 comments · 9 replies

grondo Sep 28, 2020 Maintainer

SteVwonder Sep 28, 2020 Maintainer Author

SteVwonder Sep 29, 2020 Maintainer Author

grondo Sep 29, 2020 Maintainer

garlick Dec 9, 2022 Maintainer

jameshcorbett Dec 9, 2022 Maintainer

jameshcorbett Dec 9, 2022 Maintainer

trws Dec 9, 2022 Maintainer

garlick Dec 9, 2022 Maintainer

SteVwonder Dec 9, 2022 Maintainer Author

garlick Dec 9, 2022 Maintainer

SteVwonder
Sep 28, 2020
Maintainer

Replies: 5 comments 9 replies

grondo
Sep 28, 2020
Maintainer

SteVwonder Sep 28, 2020
Maintainer Author

SteVwonder
Sep 29, 2020
Maintainer Author

grondo
Sep 29, 2020
Maintainer

garlick
Dec 9, 2022
Maintainer

jameshcorbett Dec 9, 2022
Maintainer

jameshcorbett Dec 9, 2022
Maintainer

trws Dec 9, 2022
Maintainer

garlick Dec 9, 2022
Maintainer

SteVwonder Dec 9, 2022
Maintainer Author

garlick
Dec 9, 2022
Maintainer