Hi, I am trying to measure the overhead of creating a UCC communication context on a 4-node cluster with 64 slots per node.
This is a graph of the average time spent in ucc_context_create and ucc_context_destroy over 50 iterations. I was surprised to see that context creation takes constant time across the 4 nodes. Is this the expected behavior?
I also measured the OOB allgather operation (the time between when the request is created and when it completes), and it doesn't come close to this 3 s mark. Can anyone shed some light on this?
> This is a graph of calling ucc_context_create and ucc_context_destroy for 50 iterations and the average time spent on each operation. I was surprised to see that context creation takes constant time across the 4 nodes. Is this the expected behavior?
ucc_context_create time depends on multiple factors, e.g. which UCC TLs you are using and whether it is a global context or not. Constant time is expected if you run with TL UCP only, since we don't connect endpoints in advance; instead, a connection between peers is established only when it is really needed.
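To illustrate why lazy connection makes creation time independent of the number of peers, here is a minimal toy model in C. It is not UCC or UCX code: `toy_ctx_t`, `toy_ctx_create`, `toy_send`, and `toy_ctx_destroy` are hypothetical names made up for this sketch, and the "connection" is just a flag flip standing in for real endpoint setup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: not UCC or UCX code. All names here are hypothetical. */
typedef struct {
    size_t n_peers;     /* number of ranks in the job */
    bool  *connected;   /* connected[i] is true once peer i has been used */
    size_t n_connects;  /* how many connections were actually established */
} toy_ctx_t;

/* Creation only allocates bookkeeping; it talks to no peer. */
static toy_ctx_t *toy_ctx_create(size_t n_peers)
{
    toy_ctx_t *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL) {
        return NULL;
    }
    ctx->connected = calloc(n_peers, sizeof(bool));
    if (ctx->connected == NULL) {
        free(ctx);
        return NULL;
    }
    ctx->n_peers    = n_peers;
    ctx->n_connects = 0;
    return ctx;
}

/* The connection cost is paid on the first send to a given peer.
 * Returns 0 on success, -1 if the peer index is out of range. */
static int toy_send(toy_ctx_t *ctx, size_t peer)
{
    if (peer >= ctx->n_peers) {
        return -1;
    }
    if (!ctx->connected[peer]) {
        ctx->connected[peer] = true;  /* stand-in for real endpoint setup */
        ctx->n_connects++;
    }
    return 0;
}

static void toy_ctx_destroy(toy_ctx_t *ctx)
{
    free(ctx->connected);
    free(ctx);
}
```

In this model, a context created for 256 peers has established zero connections until `toy_send` is called, and repeated sends to the same peer connect only once. A benchmark that times creation alone therefore never sees the per-peer cost.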
> I measured the timings for the oob allgather operation (time between when the request was created and when the request completes) and it doesn't come close to this 3s mark. Can anyone shed some light on this?
Again, it depends on the TLs being used. Context creation for TL UCP includes initializing the UCP context and the UCP worker.
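One way to see which part of context creation dominates is to time each phase separately rather than only the whole call. The sketch below is a generic harness in C, assuming a POSIX system with `clock_gettime`. `phase_fn`, `time_phases`, and `stub_phase` are hypothetical names for this example, and the stub does no real work; in an actual benchmark each phase would wrap a real call such as `ucc_context_create`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std modes */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one phase of setup or teardown.
 * Returns 0 on success, nonzero on failure. */
typedef int (*phase_fn)(void *arg);

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs all phases in order, `iters` times, and stores the mean time of
 * phase i (in seconds) in avg_sec[i].
 * Returns 0 on success, -1 if iters <= 0 or any phase fails. */
static int time_phases(const phase_fn *phases, size_t n_phases, void *arg,
                       int iters, double *avg_sec)
{
    if (iters <= 0) {
        return -1;
    }
    for (size_t i = 0; i < n_phases; i++) {
        avg_sec[i] = 0.0;
    }
    for (int it = 0; it < iters; it++) {
        for (size_t i = 0; i < n_phases; i++) {
            double t0 = now_sec();
            if (phases[i](arg) != 0) {
                return -1;
            }
            avg_sec[i] += now_sec() - t0;
        }
    }
    for (size_t i = 0; i < n_phases; i++) {
        avg_sec[i] /= iters;
    }
    return 0;
}

/* Stand-in phase for illustration: counts how often it was invoked. */
static int stub_phase(void *arg)
{
    (*(int *)arg)++;
    return 0;
}
```

Passing an array such as `{create_phase, destroy_phase}` with `iters = 50` would reproduce the averaging described in the question, while adding finer-grained phases would show how much of the total is spent outside the OOB allgather.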