Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update scr and axl spack packages to handle BBAPI #288

Closed
adammoody opened this issue Mar 17, 2021 · 9 comments
Closed

Update scr and axl spack packages to handle BBAPI #288

adammoody opened this issue Mar 17, 2021 · 9 comments
Assignees

Comments

@adammoody
Copy link
Contributor

The scr package has a variant for the async mode, like async_api=IBM_BBAPI, e.g.,:

bin/spack install scr resource_manager=LSF async_api=IBM_BBAPI -fortran %xl ^spectrum-mpi

The scr package turns on some cmake flags that then cause the build to fail. Otherwise, I don't think SCR does anything with that flag since the code has been moved to AXL.

  1. The axl package has a "daemon" option that should probably be removed.
  2. Should there be a "pthreads" variant? It's probably safe to always build with pthreads, in which case a variant is not needed.
  3. Do we need a "bbapi" variant in axl or should we just drop it and rely axl's cmake auto-detecting and auto-building with the bb software? If we decide to keep the bbvariant in axl, does it actually work?
  4. And should the scr package pass through its bbapi variant setting to its axl dependency, or should we require spack users to specify axl as a depencency? The feeling on the call was that we should keep that in scr and pass it through if needed.
@CamStan
Copy link
Member

CamStan commented Mar 17, 2021

Was thinking a little about some of this earlier.

  1. Do we need a "bbapi" variant in axl or should we just drop it and rely axl's cmake auto-detecting and auto-building with the bb software? If we decide to keep the bbvariant in axl, does it actually work?

At first glance, I don't know that we do need this currently. It looks like axl will try to find the bb software automatically, but there's not a cmake option for this as it does it independently of the AXL_ASYNC_API option.

Which begs the question, does AXL_ASYNC_API (and in turn, SCR_ASYNC_API) serve any purpose at the moment? If not, maybe at least remove the variants from the spackages, if not as a cmake option altogether (unless there's a plan to eventually use it in some way).
Afterthought: realized the async_api variant is probably there to support spack install scr@legacy

From what I can tell (correct me if I'm wrong @tonyhutter), if BBAPI is found, it is preferred, but if it fails or is unsupported, it will error rather than fall back to pthreads unless the user has set -DENABLE_BBAPI_FALLBACK (ECP-VeloC/AXL#64).

So the axl spackage might need a variant to enable/disable that fallback at least (e.g., bbapi_fallback).

Which I suppose comes back to your original question; do we want the user to be able to opt out of using BBAPI, even when it is available (which they currently cannot, I believe). Or do we want to always require them to use it when it's automatically found, and optionally fallback to pthreads if it's found but unsupported?


  1. And should the scr package pass through its bbapi variant setting to its axl dependency, or should we require spack users to specify axl as a depencency? The feeling on the call was that we should keep that in scr and pass it through if needed.

My biased opinion on this is to set the variants in the scr package and pass them through to dependencies. I think this retains one of the advantages of Spack, which is to only worry about the package you are installing and not so much on what's going on underneath.

If that makes sense, cool, if not, here's the explosion of thoughts my mind went though (feel free to skip 😆)


My mind sees this as a bottom-up vs top-down approach, for example:

Say we've added the bbapi_fallback variant mentioned above to axl

If I'm installing SCR without Spack I have to learn that SCR's dependencies exist and then install them all first. If I want to enable the BBAPI fallback (or any option), I also need to remember to pass the cmake option to AXL when building. I then tell SCR where all these installs of its dependencies are when building.

Even with scr-top or the bootstrap script (which appears to also be out-of-date) doing this for me, I still have to know the option is in AXL and edit scr-top/bootstrap and add the option to the AXL configure beforehand.

However, with Spack we can still hide this dependency management from the user and still make SCR look like one entity.

If I'm installing SCR with Spack, I'd think, "how do I want SCR to behave? I want to (insert behavior) fall back to pthreads if BBAPI isn't supported, okay, tell Spack to install SCR with that support."

spack install scr+bbapi_fallback resource_manager=LSF

which does make the spackage more complicated as we'd then need something like

depends_on('axl+bbapi_fallback', when='+bbapi_fallback')

vs

"How do I want SCR to behave? I want to fall back to pthreads if BBAPI isn't supported, okay tell Spack to install SCR with that support. Oh, SCR doesn't handle that, AXL does. What is AXL? Oh, it's a dependency of SCR. Okay, to get SCR to behave how I want, I need to tell a dependency of SCR I didn't know about till now how to behave."

spack install scr resource_manager=LSF ^axl+bbapi_fallback

Now we're back to the level of knowledge the end user needs about our dependencies when installing with cmake.

This could become more complicated as additional options are ever needed, especially for multiple dependencies.
For example:

Assuming the KVTREE_FILE_LOCK option for kvtree is still valid (which the spackage doesn't currently have, but it's still in the scr spackage), the user has to learn about two dependencies vs doing it all in scr:

spack install scr+bbapi_fallback resource_manager=LSF file_lock=FNCTL

vs

spack install scr resource_manager=LSF ^axl+bbapi_fallback ^kvtree file_lock=FNCTL

I may be way off track here. Feel free to point out any assumptions I've made or things I've missed.

@tonyhutter
Copy link
Contributor

tonyhutter commented Mar 17, 2021

@CamStan I'm not sure what xfer type SCR uses by default.

Overall, the user should be specifying the xfer type for the storage descriptor in scr.conf. That is, use FLUSH=BBAPI or FLUSH=PTHREAD for the particular storage tier. (Note FLUSH was previously named TYPE). Here's the relevant commit:

commit cc5b4707a81cbbc47b7f2822094f92c093289a51
Author: Tony Hutter <[email protected]>
Date:   Wed Jan 29 14:23:01 2020 -0800

    Specify custom STORE transfer type in scr.conf
    
    Update the TYPE definition in the STORE descriptors to
    allow you to specify which AXL transfer type SCR should use
    to copy files into the STORE.  For example:
    
       STORE=/tmp          GROUP=NODE   COUNT=1
       STORE=/ssd          GROUP=NODE   COUNT=3  TYPE=bbapi
       STORE=/dev/persist  GROUP=NODE   COUNT=1  ENABLED=1  MKDIR=0
       STORE=/p/lscratcha  GROUP=WORLD  TYPE=pthread
    
    The old TYPE definitions (like "DATAWARP" and "POSIX") are mapped
    to the AXL types for backwards compatibility.

If a user specifies FLUSH=BBAPI in scr.conf and BBAPI support wasn't built into AXL, they should just get an error at runtime.

@adammoody
Copy link
Contributor Author

@CamStan , yes, I agree with your opinion on taking the top-down approach. Your example is awesome :-)

Nice find on the kvtree file_lock item. Let's add that as an item to the v3.0 list. If you find any other unsupported or missing variants, let's consider fixing or dropping each of them.

One thing that may be useful as an SCR variant would be to enable the user to specify SCR's default AXL transfer method. It defaults to SYNC, but maybe the user would want to set that at build time.

https://github.com/LLNL/scr/blob/develop/src/scr_conf.h#L205-L208

That's not critical, since people can set it at runtime in various ways. We could list that as a "nice to have".

@adammoody
Copy link
Contributor Author

adammoody commented Mar 17, 2021

Skimming the scr package, I see it defaults to write the scr config file to scr.conf:

https://github.com/spack/spack/blob/e57053bd32f9b0857dfffe03a7c9782ea6057a8e/var/spack/repos/builtin/packages/scr/package.py#L69

Since there is no prefix, does that get placed within the spack install directory?

Oh, this line suggests that it does:
https://github.com/spack/spack/blob/e57053bd32f9b0857dfffe03a7c9782ea6057a8e/var/spack/repos/builtin/packages/scr/package.py#L71

v3.0 item: Let's update readthedocs to describe how someone finds that file after a spack install so they can edit it.

@CamStan
Copy link
Member

CamStan commented Mar 18, 2021

Nice find on the kvtree file_lock item. Let's add that as an item to the v3.0 list.

Created an item. Can add more variants to it as we find them.

One thing that may be useful as an SCR variant would be to enable the user to specify SCR's default AXL transfer method. It defaults to SYNC, but maybe the user would want to set that at build time.

Might be able to ask this question about several runtime options (e.g., redundancy scheme, etc). Agreed these could be added to a "nice to have," but it might be simpler to only worry about variants for build options for v3.0. Granted, the longer I stare at this the more confused I get about which options are build-time only. 😅

Since there is no prefix, does that get placed within the spack install directory?

Yes, when the path provided is not an absolute path, I believe that is correct.
https://github.com/spack/spack/blob/e57053bd32f9b0857dfffe03a7c9782ea6057a8e/var/spack/repos/builtin/packages/scr/package.py#L119-L121
and
https://github.com/spack/spack/blob/e57053bd32f9b0857dfffe03a7c9782ea6057a8e/var/spack/repos/builtin/packages/scr/package.py#L103-L108

For reference when updating readthedocs, the spack install directory can be retrieved with spack location -i <package name>.

@adammoody
Copy link
Contributor Author

Thanks, @CamStan . I should have browsed through the file a bit more. Cool. That's some slick code. Nice.

@CamStan
Copy link
Member

CamStan commented Apr 16, 2021

@adammoody, @gonsie, I've been jumping back and forth on this, but have come to the conclusion that we should simply pull the async_api variant out of the AXL spackage (and likely the SCR package as well).
Currently, this variant looks like:

variant('async_api', default='daemon',
        description="Set of async transfer APIs to enable",
        values=['cray_dw', 'ibm_bbapi', 'daemon', 'none'], multi=True,
        validator=async_api_validator)

# not-yet implemented functionality
conflicts('async_api=cray_dw', when='@0.1.0')
conflicts('async_api=ibm_bbapi', when='@0.1.0')

However, daemon is now gone, and finding BBAPI is now automated, so they both can be removed.
The variant would then look like:

variant('async_api', default='none',
        description="Set of async transfer APIs to enable",
        values=['cray_dw', 'none'], multi=True,
        validator=async_api_validator)
 
# not-yet implemented functionality
conflicts('async_api=cray_dw', when='@0.1.0')

I think it was also mentioned that our datawarp support likely doesn't work (is that correct?) as we haven't tested with it lately.

We could add intel_cppr to the list, but since it's not implemented within AXL we'd also need to make it conflict when used at all, so seems pointless:

conflicts('async_api=intel_cppr', when='@0.1.0:')

It was also discussed on our latest call that, as we do ensure support for these options, we will likely automate them as well (like with BBAPI) and so they'd get removed from this option anyway.

Lastly, since we also mentioned on our last call that we should deprecate the older AXL versions, we don't need to ensure this variant still exists/works to support them. So we can simplify the package to only what is needed/works for the newest release or ones to come.

Any thoughts or objections?

@adammoody
Copy link
Contributor Author

Yes, I agree. It looks like we could just drop the async_api variant from AXL.

If I had to bet on it, my guess is that our AXL Datawarp code does not work. I think it was ported over from SCR to AXL but without testing, since we don't have a machine where we can easily test it. It's possible that ECP users at LANL or NERSC might want it, and I'm not clear on Datawarp status on upcoming systems.

Let's create an issue for AXL to either refresh Datawarp or decide to drop it. If we decide to keep it, we can auto-detect it during the AXL build like BBAPI.

For CPPR, we can check whether that project is still active. We could check with Argonne or Intel.
http://sc16.supercomputing.org/sc-archive/tech_poster/tech_poster_pages/post255.html

@CamStan
Copy link
Member

CamStan commented Apr 30, 2021

Addressed in spack/spack#23044 and spack/spack#23378

@CamStan CamStan closed this as completed Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants