cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:183): assert (s) failed #7195

Open
jeffhammond opened this issue Nov 1, 2024 · 4 comments

@jeffhammond
Member

I am using commit 0b1a4ba and see this error with every MPI test, but not when running hostname:

[proxy:0@host] cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:183): assert (s) failed

As far as I can see, it has no impact on MPI test behavior.

I built MPICH with NVHPC 24.9 compilers and UCX.

mpichversion
MPICH Version:      4.3.0a1
MPICH Release date: unreleased development copy
MPICH ABI:          0:0:0
MPICH Device:       ch4:ucx
MPICH configure:    CC=nvc CXX=nvc++ FC=nvfortran --with-device=ch4:ucx --prefix=~/MPI/mpich-nvhpc-ch4-ucx-install
MPICH CC:           nvc     --diag_suppress=branch_past_initialization -O2
MPICH CXX:          nvc++
MPICH F77:          nvfortran
MPICH FC:           nvfortran
MPICH features:     threadcomm

I ran rm -rf /tmp/* just in case there were some remnants there, but saw no change.

@jeffhammond
Member Author

For example, cpi prints the assert messages but still succeeds.

~/mpich-nvhpc-ch4-ucx-install/bin/mpirun -n 4 ./cpi
[proxy:0@oppenheimer] cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:183): assert (s) failed
Process 0 of 4 is on oppenheimer
Process 1 of 4 is on oppenheimer
Process 3 of 4 is on oppenheimer
Process 2 of 4 is on oppenheimer
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000027
[proxy:0@oppenheimer] cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:183): assert (s) failed

@jeffhammond
Member Author

jeffhammond commented Nov 7, 2024

This is the output after adding a printf of the two arguments to the failing assert:

~/MPI/mpich/build$ ~/MPI/mpich-nvhpc-ch4-ucx-install/bin/mpirun -n 4 ./examples/cpi
[proxy:0@oppenheimer] cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:185): assert (s) failed
Process 0 of 4 is on oppenheimer
Process 1 of 4 is on oppenheimer
Process 2 of 4 is on oppenheimer
Process 3 of 4 is on oppenheimer
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000027
[proxy:0@oppenheimer] cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:185): assert (s) failed
s=�_�
status=0
s=(null)
status=0
s=(null)
status=0

@jeffhammond
Member Author

This is the output after adding a printf of the two arguments to the failing assert:

~/MPI/mpich/build$ ~/MPI/mpich-nvhpc-ch4-ucx-install/bin/mpirun -n 4 ./examples/cpi
[proxy:0@oppenheimer] cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:185): assert (s) failed
Process 0 of 4 is on oppenheimer
Process 1 of 4 is on oppenheimer
Process 2 of 4 is on oppenheimer
Process 3 of 4 is on oppenheimer
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000027
[proxy:0@oppenheimer] cache_put_flush (../../../../src/pm/hydra/proxy/pmip_pmi.c:185): assert (s) failed
s=@��
status=(null)
s=(null)
status=(null)
s=(null)
status=(null)
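
A note on reading the output above: the garbage after `s=` and the `status=(null)` lines are consistent with a pointer and an integer zero being passed to `%s`, but that is an assumption, since the printf calls themselves are not shown. A minimal sketch of printing both values with matching conversion specifiers follows; `struct kv` and `format_lookup` are hypothetical stand-ins, not Hydra names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the looked-up entry type; not Hydra's struct. */
struct kv {
    const char *key;
    const char *val;
};

/* Format a lookup result and a status code into buf.
 * %p expects a void pointer and %d expects an int, so neither argument
 * is interpreted as a C string. Returns the value snprintf returns. */
int format_lookup(char *buf, size_t len, struct kv *s, int status)
{
    return snprintf(buf, len, "s=%p status=%d", (void *) s, status);
}
```

The text produced by `%p` for a null pointer is implementation-defined, so the exact `s=` text varies by platform.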

@jeffhammond
Member Author

I'm not really sure what I am looking at, but the only value I can reason about is *p=-allgather-shm-1-0, which comes from this:

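    while (p) {
        struct pmip_kvs *s = NULL;
        /* Look up the batched key *p in the process group's KVS hash. */
        HASH_FIND_STR(pg->kvs, *p, s);
        if (s == NULL) {
            /* Debug output: dump the lookup state before the assert fires. */
            printf("s=%p\n", (void *) s);
            printf("pg->kvs=%p\n", (void *) pg->kvs);
            printf("p=%p\n", (void *) p);
            printf("*p=%s\n", *p);
            printf("status=%d\n", status);
        }
        HYDU_ASSERT(s, status);
        PMIU_cmd_add_str(&pmi, s->key, s->val);

        p = (const char **) utarray_next(pg->kvs_batch, p);
    }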
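
One way to read the loop above: every key queued in `pg->kvs_batch` is expected to have a matching entry in `pg->kvs`, and the assert fires when a queued key (here `-allgather-shm-1-0`) has none. A minimal sketch of that check, with plain arrays standing in for the uthash table and the utarray batch; `struct kv`, `find_key`, and `first_missing` are hypothetical names, not Hydra's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stored key/value entry. */
struct kv {
    const char *key;
    const char *val;
};

/* Linear search standing in for HASH_FIND_STR; returns NULL if absent. */
const struct kv *find_key(const struct kv *store, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(store[i].key, key) == 0)
            return &store[i];
    }
    return NULL;
}

/* Return the first batched key that has no entry in the store,
 * or NULL if every batched key resolves. */
const char *first_missing(const struct kv *store, size_t n,
                          const char *const *batch, size_t nbatch)
{
    for (size_t i = 0; i < nbatch; i++) {
        if (find_key(store, n, batch[i]) == NULL)
            return batch[i];
    }
    return NULL;
}
```

Reporting the missing key by name, rather than only asserting on the null result, would show which key was batched without ever being stored.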
