compile error on NOAA Jet #1719

Closed
MicroTed opened this issue Apr 21, 2023 · 9 comments

Comments

@MicroTed
Contributor

I've been working on an update for the NSSL microphysics scheme, and I'm getting an odd error on Jet at the final link stage:

[ 99%] Linking Fortran executable ufs_model
/apps/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin/libifcoremt.a(for_diags_intel.o): In function `for__io_return':
for_diags_intel.c:(.text+0xcf2): relocation truncated to fit: R_X86_64_PC32 against symbol `message_catalog' defined in COMMON section in /apps/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin/libifcoremt.a(for_diags_intel.o)

The same code compiles fine on Mac (GNU/x86) and Hera (Intel 2021), and the unmodified microphysics code links fine on Jet. I have no idea why the problem arises only on Jet, but maybe the newer Intel compiler has something to do with it? The updated NSSL code adds three variables, their associated rate arrays, and the code to compute them.

It seems to be a memory addressing issue (based on a Google search), but I'm not familiar with diagnosing that.
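To make the linker message above more concrete, here is a minimal sketch, assuming the standard x86-64 System V psABI definition of R_X86_64_PC32: the linker must store S + A - P (symbol address plus addend minus the address of the patched field) in a signed 32-bit field. The function names and addresses below are hypothetical and only illustrate when the linker gives up with "relocation truncated to fit".

```python
# Sketch: does an R_X86_64_PC32 relocation fit?
# R_X86_64_PC32 stores S + A - P in a signed 32-bit field; if the value
# falls outside that range, the linker cannot encode the reference.

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def pc32_value(symbol_addr: int, addend: int, place_addr: int) -> int:
    """Value the linker must store for an R_X86_64_PC32 relocation."""
    return symbol_addr + addend - place_addr


def pc32_fits(symbol_addr: int, addend: int, place_addr: int) -> bool:
    """True if the PC-relative displacement fits in a signed 32-bit field."""
    return INT32_MIN <= pc32_value(symbol_addr, addend, place_addr) <= INT32_MAX


if __name__ == "__main__":
    # Hypothetical addresses: a reference in code near 4 MiB, and a COMMON
    # (.bss) symbol placed 1 GiB or 3 GiB further up, e.g. behind very
    # large sections. An addend of -4 is typical for RIP-relative operands.
    text_site = 0x400000
    near_sym = text_site + (1 << 30)
    far_sym = text_site + 3 * (1 << 30)
    print(pc32_fits(near_sym, -4, text_site))  # True
    print(pc32_fits(far_sym, -4, text_site))   # False
```

In this picture, the error means that `message_catalog` ended up more than about 2 GiB away from the code in `for_diags_intel.o` that references it.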

@jkbk2004
Collaborator

If the problem persists, it might be worth connecting this with #1707. Can you point us to the experiment path, so that we can give it a try with spack-stack?

@MicroTed
Contributor Author

Sure. On Jet, the build directory is

/mnt/lfs4/NAGAPE/hpc-wof1/mansell/ufs/ufs-srw-210/sorc/ufs_dev (build_ideal2)

This directory also has some changes to atmos_cubed_sphere to support idealized doubly-periodic test cases, but the same is true on Hera, where it compiles and runs fine. When I have some time, I can set up a test on Jet where the only changes are to CCPP.

@zach1221 zach1221 moved this to In Progress in Backlog: platforms and RT May 16, 2023
@zach1221
Collaborator

I attempted to run a test (control_CubedSphereGrid) against these changes with the new spack-stack on Jet and received an error similar to the one above. It fails at the "Linking Fortran executable ufs_model" step of the compile. Logs: /lfs4/HFIP/h-nems/Zachary.Shrader/RT_RUNDIRS/Zachary.Shrader/FV3_RT/rt_54376/compile_001

@MicroTed
Contributor Author

MicroTed commented Jun 2, 2023

From some web searching, it may work to use the option -mcmodel=medium (the default is 'small'), but that doesn't make sense, because the code compiles and links without problems on Hera (also with Intel compilers, but a slightly older version). It's strange. Also, I'm not sure how to add that compile option everywhere to test whether it would work. It would be better not to need it, because it might affect performance.
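For context on what -mcmodel changes, here is a simplified sketch of the x86-64 code models as described in the System V psABI: under small, all code and data must fit in the low 2 GiB; under medium, only code and small data must, while large data may be placed anywhere; large makes no such assumption. The function name, the split into "small" and "large" data, and the example sizes are illustrative assumptions, not measurements from the Jet build.

```python
# Simplified model of the x86-64 System V code models (-mcmodel=...).
# Assumption: sizes are in bytes and the image starts near address 0.
TWO_GIB = 2 ** 31


def smallest_code_model(text_bytes: int, small_data_bytes: int,
                        large_data_bytes: int) -> str:
    """Return the least permissive code model that can hold this layout.

    small  : all code and data must fit in the low 2 GiB
    medium : code and small data in the low 2 GiB; large data anywhere
    large  : no placement assumptions (64-bit addressing throughout)
    """
    low_region = text_bytes + small_data_bytes
    if low_region + large_data_bytes <= TWO_GIB:
        return "small"
    if low_region <= TWO_GIB:
        return "medium"
    return "large"


if __name__ == "__main__":
    MiB = 1 << 20
    # Hypothetical sizes only.
    print(smallest_code_model(300 * MiB, 50 * MiB, 100 * MiB))   # small
    print(smallest_code_model(300 * MiB, 50 * MiB, 2048 * MiB))  # medium
```

One caveat: objects built with the small model, such as the prebuilt libifcoremt.a in the error above, still use 32-bit PC-relative references, so building the model code with medium presumably helps only to the extent that it moves whatever is oversized out of their reach.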

@MicroTed
Contributor Author

FYI: I was able to get a successful build by adding -mcmodel=medium to the Fortran flags in Intel.cmake. I'm not sure that's the best option, though. The current release code compiles OK, so the problem is somehow caused by the updated NSSL microphysics subroutine I'm testing. The update adds a non-trivial amount of code to the main subroutine. I don't know whether it would help to split some of that into separate routines, or whether the total amount of code is the issue. The static work arrays don't seem to play a role, because changing their size from (500) to (1) doesn't help.

@MicroTed
Contributor Author

Another qualifier is that the error shows up on Jet with -DDEBUG=ON but not when debug is off. On Hera, the build completes either way.

@jkbk2004
Collaborator

@MicroTed Is this issue related to PR #1924? If #1924 runs OK on Jet, that would be progress on this issue, I guess?

@MicroTed
Contributor Author

@jkbk2004 Yes, the PR implements the 'mcmodel' fix for Jet. I think I understand now that Jet has this issue because it compiles for multiple CPU types, whereas Hera has only one type. That results in larger object files on Jet, which then hit the size limit. What is curious is that the explicit mcmodel=small in ESMF appears to override the medium setting, and it still works; I'm not completely sure how that works.

We could probably close this out.

@zach1221
Collaborator

zach1221 commented Nov 1, 2023

OK, @MicroTed. Closing this out for the time being. If it recurs during continued testing of #1924, we can reopen it.

@zach1221 zach1221 closed this as completed Nov 1, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Backlog: platforms and RT Nov 1, 2023
Projects
Status: Done
Development

No branches or pull requests

4 participants