Runs of GEOSgcm with GNU (13 or 14) are now failing with:
EXTDATA: DEBUG: ExtData Run_: READ_LOOP: Done
[borgj101:222883] *** An error occurred in MPI_Wait
[borgj101:222883] *** reported by process [2176581633,0]
[borgj101:222883] *** on communicator MPI COMMUNICATOR 22 CREATE FROM 21
[borgj101:222883] *** MPI_ERR_TRUNCATE: message truncated
[borgj101:222883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[borgj101:222883] *** and potentially your MPI job)
>> Error << /discover/swdev/gmao_SIteam/MPI/openmpi/4.1.6-SLES15/gcc-13.2.0/bin/mpirun -np 96 /discover/nobackup/mathomp4/Experiments/stock-2024Oct04-1day-c24-GNU-DEBUG-ExtDataDebug/scratch/GEOSgcm.x --logging_config logging.yaml: status = 15; at /gpfsm/dnb34/mathomp4/SystemTests/builds/AGCM_GNU/CURRENT/GEOSgcm/install-Debug/bin/esma_mpirun line 377.
GEOSgcm Run Status: -1
So after ExtData runs, we crash. Running with DDT showed it crashing here (MAPL/base/MAPL_EsmfRegridder.F90, lines 1344 to 1351 in d543cf8):
    if (spec%regrid_method /= REGRID_METHOD_NEAREST_STOD) then
       call ESMF_FieldRegrid(src_field, dst_field, &
            & routeHandle=route_handle, &
            & dynamicMask=this%dynamic_mask, &
            & termorderflag=ESMF_TERMORDER_SRCSEQ, &
            & zeroregion=ESMF_REGION_SELECT, &
            & rc=status)
       _VERIFY(status)
with an error code of ESMC_RC_NOT_IMPL (which seems to be the catch-all for ESMF_FieldRegrid failures).
Now, tracking down in my nightly tests when the GNU develop Debug runs started failing, I happened on the exact same time that @bena-nasa converted ExtData to containers and things became non-zero-diff (see #3025), aka PR #3007.
So, as a final test, I did a cherry-pick of commit d888902 (aka before #3007):
and that does run. So something between d888902...cae216f GNU does not like. :(
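One way to narrow that range down would be `git bisect`. This is only a minimal sketch, not something that has been run for this issue: it assumes you are in a MAPL clone, and `bisect_range` and the test command passed to it are hypothetical. The test command would need to build with GNU Debug, run the failing case, and exit 0 when the run succeeds and non-zero when it crashes.

```shell
#!/bin/sh
# Sketch: bisect between a known-good and a known-bad commit.
# bisect_range is a hypothetical helper, not part of MAPL or GEOSgcm.
bisect_range() {
    good="$1"       # known-good commit, e.g. d888902
    bad="$2"        # known-bad commit, e.g. cae216f
    test_cmd="$3"   # command that exits 0 on a good run, non-zero on a crash

    # git bisect start takes the bad commit first, then the good one.
    git bisect start "$bad" "$good" || return 1

    # git bisect run treats exit 0 as good, 1-127 (except 125) as bad,
    # and 125 as "skip this commit" (useful when a commit fails to build).
    git bisect run sh -c "$test_cmd"
    status=$?

    # Return the working tree to where it was before bisecting.
    git bisect reset
    return "$status"
}
```

Usage would look something like `bisect_range d888902 cae216f ./test_gnu_debug.sh`, where `./test_gnu_debug.sh` is a placeholder for whatever script builds and runs the c24 GNU Debug case. The first bad commit is printed by `git bisect run` before the reset.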