rzadams host-configs #1447

Merged
merged 15 commits into from
Oct 23, 2024

bmhan12
Contributor

@bmhan12 bmhan12 commented Oct 15, 2024

This PR:

- Relates to #1375
- Closes #1413

@bmhan12 bmhan12 added the Build system and Hip labels Oct 15, 2024
Member

@kennyweiss kennyweiss left a comment

Thanks @bmhan12 !

Looks great overall, but before approving, I want to double check that we'll still be generating host-configs for tioga and rzvernal when we run build_tpls.

# Devtools
#------------------------------------------------------------------------------

# ClangFormat disabled due to llvm and devtools not in spec
Member

Outside the scope of this PR, but should we enable devtools on toss4_cray at this point?

Contributor Author

Yes!
Added an issue to track this: #1449

Comment on lines 21 to 27
"__comment__":"# Use amdgpu_target=gfx942 for rzadams",
"__comment__":"# Use amdgpu_target=gfx90a for tioga/rzvernal",
"__comment__":"# Use amdgpu_target=gfx908 for rznevada",
"__comment__":"# -Wno-int-conversion flag needed for building HDF5",
"toss_4_x86_64_ib_cray":
[ "[email protected]~openmp+mfem+c2c+profiling+rocm amdgpu_target=gfx90a ^hip@5.6.0 ^hsa-rocr-dev@5.6.0 ^llvm-amdgpu@5.6.0 ^rocprim@5.6.0 ^raja~openmp+rocm ^umpire~openmp+rocm ^hdf5 cflags=-Wno-int-conversion",
"[email protected]~openmp+mfem+c2c+profiling+rocm amdgpu_target=gfx90a ^hip@5.7.1 ^hsa-rocr-dev@5.7.1 ^llvm-amdgpu@5.7.1 ^rocprim@5.7.1 ^raja~openmp+rocm ^umpire~openmp+rocm ^hdf5 cflags=-Wno-int-conversion" ],
[ "[email protected]~openmp+mfem+c2c+profiling+rocm amdgpu_target=gfx942 ^hip@6.1.2 ^hsa-rocr-dev@6.1.2 ^llvm-amdgpu@6.1.2 ^rocprim@6.1.2 ^raja~openmp+rocm ^umpire~openmp+rocm ^hdf5 cflags=-Wno-int-conversion",
"[email protected]~openmp+mfem+c2c+profiling+rocm amdgpu_target=gfx942 ^hip@6.2.1 ^hsa-rocr-dev@6.2.1 ^llvm-amdgpu@6.2.1 ^rocprim@6.2.1 ^raja~openmp+rocm ^umpire~openmp+rocm ^hdf5 cflags=-Wno-int-conversion" ],
Member

Does this mean that specs.yaml will no longer be generating host-configs for tioga and rzvernal (gfx90a)?
I think we still want that, especially for our CI.
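Each spec in the snippet above pins `hip`, `hsa-rocr-dev`, `llvm-amdgpu`, and `rocprim` to the same ROCm release. A minimal sketch of a check for that pattern might look like the following. It is not part of this PR or of Axom's scripts: the function names are hypothetical, and the regular expression only handles `^name@version` dependencies written the way they appear here.

```python
import re

# Packages that the specs in this PR pin to a single ROCm release.
ROCM_PACKAGES = ("hip", "hsa-rocr-dev", "llvm-amdgpu", "rocprim")


def rocm_versions(spec):
    """Map each pinned ROCm dependency in a spec string to its version."""
    versions = {}
    # Matches dependencies of the form "^name@version"; dependencies
    # without a version (e.g. "^raja~openmp+rocm") are skipped.
    for name, version in re.findall(r"\^([\w-]+)@([\w.]+)", spec):
        if name in ROCM_PACKAGES:
            versions[name] = version
    return versions


def has_consistent_rocm(spec):
    """Return True when all four ROCm packages share one pinned version."""
    versions = rocm_versions(spec)
    all_pinned = len(versions) == len(ROCM_PACKAGES)
    return all_pinned and len(set(versions.values())) == 1
```

Running `has_consistent_rocm` over every entry of a spec list before calling build_tpls.py could catch a partial version bump, for example one where `rocprim` is left on the previous release.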

Contributor Author

@bmhan12 bmhan12 Oct 16, 2024

> Does this mean that specs.yaml will no longer be generating host-configs for tioga and rzvernal (gfx90a)? I think we still want that, especially for our CI.

You can still use specs.json, but you have to manually switch the target to gfx90a in it before calling build_tpls.py, because both rzadams and tioga/rzvernal identify as toss_4_x86_64_ib_cray. We had to do the same for a while to support both rznevada and tioga.

I'm not sure what the better way to handle the two-machine case is.
We could add some machine checking in the Python scripts (parsing the hostname or another environment variable?), either at the spack package recipe level or in llnl_lc_build_tools.py.
I'm not sure either of those gives the level of visibility we want for something that is part of the spack spec.

This Teams discussion is relevant (performing the machine check at the uberenv level): link
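As a rough illustration of the hostname-based check floated above, here is a minimal sketch. The machine-to-target mapping comes from the comments in the specs.json snippet under review; everything else (the function names, stripping the node number from the hostname, rewriting the spec with a regular expression) is an assumption and not how llnl_lc_build_tools.py currently works.

```python
import re

# Mapping taken from the "__comment__" entries in specs.json in this PR.
# Keying it on the machine name is an assumption made for this sketch.
AMDGPU_TARGET_BY_MACHINE = {
    "rzadams": "gfx942",
    "tioga": "gfx90a",
    "rzvernal": "gfx90a",
    "rznevada": "gfx908",
}


def amdgpu_target_for_host(hostname):
    """Return the GPU target for a hostname such as 'rzadams1004', or None."""
    # Drop any domain suffix, then the trailing node number.
    short_name = hostname.split(".")[0]
    machine = re.sub(r"\d+$", "", short_name)
    return AMDGPU_TARGET_BY_MACHINE.get(machine)


def set_amdgpu_target(spec, target):
    """Replace the amdgpu_target value in a Spack spec string."""
    return re.sub(r"amdgpu_target=\S+", "amdgpu_target=" + target, spec)
```

A caller could pass `socket.gethostname()` to `amdgpu_target_for_host` and, when the result is not `None`, apply `set_amdgpu_target` to each toss_4_x86_64_ib_cray spec before the build. That would replace the manual edit of specs.json, at the cost of hiding part of the spec inside the script.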

Member

@rhornung67 rhornung67 Oct 16, 2024

We could build a single TPL set that supports both compute architectures by setting CMAKE_HIP_ARCHITECTURES to a list of valid architectures.

Unfortunately, the Cray Fortran compiler cannot handle multiple architectures. After checking with a few app folks, it seems that they no longer develop or support their codes on MI250X machines because of this.

If we want to maintain GitLab CI testing for HIP on the CZ, we'll have to build for MI250X until tuolumne becomes GA.

Contributor Author

> We could build a single TPL set that supports both compute architectures by setting CMAKE_HIP_ARCHITECTURES to a list of valid architectures.

This works: setting amdgpu_target=gfx90a,gfx942 lets a single TPL set work for both rzvernal and rzadams.

> Unfortunately, the Cray Fortran compiler cannot handle multiple architectures. Checking with a few app folks, it seems that they no longer do development or support their codes on MI250X machines because of this.

We are currently testing only amdclang++ with amdflang, so I did not run into this issue.
We removed the crayCC with crayftn configs to sort out issues on CI with older rocm versions: #1273

I'll take a look at creating a config with the Cray compilers.
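For reference, the same list of architectures is written differently in the two places discussed in this thread: Spack multi-valued variants are comma-separated, while a CMake list is semicolon-separated. The sketch below formats both from one Python list. The helper names are hypothetical, and emitting the CMake value as a `set(... CACHE STRING "")` line is an assumption about how a host-config would carry it.

```python
def spack_target_variant(targets):
    """Format targets as a multi-valued Spack variant (comma-separated)."""
    return "amdgpu_target=" + ",".join(targets)


def cmake_hip_architectures_entry(targets):
    """Format targets as a CMake cache entry (semicolon-separated list)."""
    value = ";".join(targets)
    return 'set(CMAKE_HIP_ARCHITECTURES "' + value + '" CACHE STRING "")'


# The two architectures discussed in this thread.
targets = ["gfx90a", "gfx942"]
spack_variant = spack_target_variant(targets)
cmake_entry = cmake_hip_architectures_entry(targets)
```

Deriving both strings from one list keeps the spec and the host-config from drifting apart if a target is added or dropped later.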

Member

Thanks for sorting these issues @bmhan12

@bmhan12 bmhan12 merged commit 270953a into develop Oct 23, 2024
13 checks passed
@bmhan12 bmhan12 deleted the feature/han12/rzadams branch October 23, 2024 17:07
Labels
Build system, Hip
Development

Successfully merging this pull request may close these issues.

Add host-config for LC's rzAdams platform
4 participants