Question about Group Assignment for P2P Primitives #1246

Open
arkhadem opened this issue Jul 9, 2024 · 1 comment

arkhadem commented Jul 9, 2024

Hi Everyone,

I am using RCCL 6.1.2. For P2P tests such as scatter and gather, the logic in the appendWorkElemP2p and finishWorkP2p functions of enqueue.cc appears to work as follows:

  • If a channel has both send and recv, the group count is 2: two 64-thread warps handle send and two handle recv.
  • If a channel has only recv, the group count is 1: all four warps handle recv.
  • If a channel has only send, the group count is still 2: two warps handle send and the other two are idle!
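
To make the three cases concrete, here is a minimal sketch that models the split described above. It is not the RCCL source: the names `WarpSplit`, `splitWarps`, and `kWarpsPerChannel` are hypothetical, and the total of four warps per channel is an assumption taken from the counts in the list.

```cpp
// Hypothetical model of the observed behavior; not code from enqueue.cc.
struct WarpSplit {
    int ngroups;    // number of groups the warps are divided into
    int sendWarps;  // warps working on send
    int recvWarps;  // warps working on recv
    int idleWarps;  // warps left without work
};

// Assumption: four 64-thread warps per channel, as implied by the list above.
constexpr int kWarpsPerChannel = 4;

WarpSplit splitWarps(bool hasSend, bool hasRecv) {
    if (!hasSend && !hasRecv) {
        return {0, 0, 0, kWarpsPerChannel};
    }
    // As described: any send on the channel forces two groups,
    // even when no recv is paired with it.
    const int ngroups = hasSend ? 2 : 1;
    const int perGroup = kWarpsPerChannel / ngroups;
    const int sendWarps = hasSend ? perGroup : 0;
    const int recvWarps = hasRecv ? perGroup : 0;
    return {ngroups, sendWarps, recvWarps,
            kWarpsPerChannel - sendWarps - recvWarps};
}
```

Under this model, `splitWarps(true, false)` yields two send warps and two idle warps, which is the case the question is about.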

I would like to understand the reasoning behind the last case, where a channel has only send primitives. It arises in the following situations:

  • In gather, all GPUs except the root have only send primitives to the root.
  • In scatter, the root GPU has only send primitives to all other GPUs.
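
As a rough illustration of which ranks end up in the send-only case, the sketch below labels each rank for a gather and a scatter. The names `P2pRole`, `gatherRoles`, and `scatterRoles` are hypothetical, and treating the remaining ranks as recv-only (ignoring any local copy the root makes to itself) is an assumption, not something confirmed from the RCCL source.

```cpp
#include <vector>

// Hypothetical labels for what a rank issues in a P2P-based collective.
enum class P2pRole { SendOnly, RecvOnly };

// Gather: every non-root rank only sends to the root.
// Assumption: the root is treated as recv-only (self-copy ignored).
std::vector<P2pRole> gatherRoles(int nRanks, int root) {
    std::vector<P2pRole> roles(nRanks, P2pRole::SendOnly);
    roles[root] = P2pRole::RecvOnly;
    return roles;
}

// Scatter: the root only sends to the other ranks.
// Assumption: non-root ranks are treated as recv-only.
std::vector<P2pRole> scatterRoles(int nRanks, int root) {
    std::vector<P2pRole> roles(nRanks, P2pRole::RecvOnly);
    roles[root] = P2pRole::SendOnly;
    return roles;
}
```

With eight ranks and root 0, this gives seven send-only ranks for gather and one send-only rank for scatter.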

Best,
Alireza

@schung-amd

Hi @arkhadem, are you seeing this in practice, or inferring it from the source code? If the former, what's your hardware configuration (i.e., the number and type of GPUs)?
