Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update spack_build.yml to fix cmake pulling llvm #698

Open
wants to merge 6 commits into
base: qn-solver-cleanup
Choose a base branch
from

Conversation

cameronrutherford
Copy link
Collaborator

Fix failing spack pipelines

@cameronrutherford
Copy link
Collaborator Author

@nychiang found this potential fix by looking at why llvm was building and disabling relevant package features

@@ -129,7 +129,7 @@ jobs:
cat << EOF > spack.yaml
spack:
specs:
- ${{ matrix.spack_spec }} target=x86_64_v2
- ${{ matrix.spack_spec }} ^openblas ^cmake~qtgui~ncurses ^libevent~openssl target=x86_64_v2
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nychiang seems like my specification of CMake variant worked for the HiOp package, but not for low level packages like openblas.

Do you want me to try and fix this, or can you try and take it from here?

Copy link
Collaborator

@nychiang nychiang Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see what you did change here, and it seems like solving the problem.
However, now it introduce more problems, complaining that some packages are not installed:
[email protected]/bp54t5w: PackageNotInstalledError: package not installed
These packages have been pre-builded, haven't them? @cameronrutherford

@cameronrutherford
Copy link
Collaborator Author

@nychiang now it seems like my openblas specification didn't work properly, so we will have to do a better job enforcing openblas / revert a change I made.

Do you want me to keep debugging this? Are you following what I'm doing/how I'm fixing these issues?

Copy link
Collaborator Author

@cameronrutherford cameronrutherford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on getting the final build to use the right openblas...

@@ -141,10 +141,15 @@ jobs:
padded_length: False
mirrors:
local-buildcache: oci://${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
# spack: https://binaries.spack.io/develop
spack: https://binaries.spack.io/develop
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a bad idea

- hiop@develop~mpi~raja~shared~kron~sparse ^openblas ^libevent~openssl
- hiop@develop~mpi+raja~shared~kron~sparse ^openblas ^libevent~openssl
- hiop@develop+mpi~raja~shared~kron~sparse ^openmpi
- hiop@develop~mpi~raja~shared~kron~sparse
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nychiang this build is failing because for some reason it's trying to build libflame, and not openblas.

Looks at the concretization comparison between builds, and you'll see that a 3 hour build would make no sense for the smallest build of the batch here.

Copy link
Collaborator Author

@cameronrutherford cameronrutherford Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would that spec possibly need python for this build?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python is required via
hiop -> amdlibflame -> aocl-utils -> doxygen -> graphviz -> gtkplus -> meson -> python-xxx
why do we need amdlibflame? Concretization shows that it installs a lot of packages.
@cameronrutherford

Comment on lines 149 to 150
blas:
require: "openblas"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this didn't work. It worked for cmake.

I would consider reporting this as a spack bug, but it's near impossible to debug GitHub actions runners sometimes since they run in a special VM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't work either... Trying more extreme measures.

Comment on lines 132 to 137
- ${{ matrix.spack_spec }} target=x86_64_v2
- openblas
- libevent~openssl
concretizer:
reuse: dependencies
reuse: false
unify: true
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this gets openblas to always be built...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got what I wanted... I meant to just have openblas be a part of the concretization, but it appears as though it's always building from source now, and not pulling from the cache. I think reverting to ^openblas in the spec might fix this... I also am trying to better remember where libevent gets pulled in, but I digress.

@cameronrutherford
Copy link
Collaborator Author

Guess it's working...

@nychiang
Copy link
Collaborator

nychiang commented Nov 8, 2024

Guess it's working...

A mystery for me. Thanks a lot!

Copy link
Collaborator

@nychiang nychiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just some questions.

Comment on lines 151 to 153
# https://spack.readthedocs.io/en/latest/packages_yaml.html#setting-requirements-on-virtual-specs
cmake:
require: "~qtgui~ncurses"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cameronrutherford how did you figure out these parameters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was part of my initial debugging.

As we are using spack@develop, we are always getting the latest and "greatest" changes. In this case we inherited some changes to CMake that seemed to make it build these options when it wasn't previously.

This forces any build of CMake to have these options disabled.

See this job's concretization step for an example of what I mean. Compare that to a later passing pipeline here. Notice how the updated config results in a much smaller build.

Sure we could probably get the more complicated builds to work too, but a minimal build is faster, and less error prone - we shouldn't build things we don't need.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What pointed me to this was noticing the delta in the concretizations. I looked through the logs, noticed that we were having builds fail on a dependency we don't normally build, and then figured out what package was to blame. Then I worked with that package's spack build config to figure out what to enable / disable.

https://packages.spack.io is a great way to get this information quickly.

Comment on lines 136 to 137
reuse: true
unify: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cameronrutherford how did you figure out these?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse true is just a sane default. It makes sure you always pull a version of a package from a build cache when possible. There are different options as detailed here.

I was having trouble with the blas provider specification, so I decided to put it in the core spack spec. Then, by specifying unify to be true, I was able to ensure the version of hiop was also built with the version of blas that I wanted.

Comment on lines +108 to +111
blas_provider:
- openblas
compiler:
- gcc
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add other variants of your current build, or add compiler / blas specifications as wanted.

Would be easy enough to extend this to build / test with different MPI providers as well.

Comment on lines +155 to +160
cmake:
# We don't need CMake GUI features for minimal container builds
require: "~qtgui~ncurses"
libevent:
# Building OpenSSL was causing errors
require: "~openssl"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was finding spack's ^ syntax in the spec to not always specify variants as I wanted. This is a more sustainable way to always enforce certain options on desired pacakges.

@cameronrutherford
Copy link
Collaborator Author

cameronrutherford commented Nov 8, 2024

Some final notes:

  • Disable the spack public mirror as you have issues. Normally issues are reported and fixed relatively soon after you notice them, but feel free to ping the spack slack / make an issue
  • Checking for external packages and concretizing are still the longest parts of the jobs - this is possible to speed up, but the tradeoff is that you wouldn't be getting rolling spack updates
  • Feel free to extend these builds to other platforms / GPU runners / MPIs / compilers / OSs as you find convenient
  • Logging is currently quite verbose. You can minimize / maximize it as needed
  • The GHCR registry you have configured might need to be cleared out once every so often. Re-generating it from scratch isn't so bad, just will have initial builds take a while

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants