Question about ucc topology #1034

Open
MC952-arch opened this issue Oct 12, 2024 · 1 comment

Comments

@MC952-arch

Hello, I have a question about the design of the UCC topology. The topology structs are defined in ucc_topo.h:

typedef struct ucc_context_topo {
    ucc_proc_info_t *procs;
    ucc_rank_t       n_procs;
    ucc_rank_t       nnodes;
    ucc_rank_t       min_ppn;       /*< smallest ppn value across the nodes
                                        spanned by the group of processes
                                        defined by ucc_addr_storage_t */
    ucc_rank_t       max_ppn;       /*< biggest ppn across the nodes */
    ucc_rank_t       max_n_sockets; /*< max number of different sockets
                                        on a node */
    uint32_t         sock_bound;    /*< global flag, 1 if processes are bound
                                        to sockets */
    ucc_rank_t       max_n_numas;   /*< max number of different numa domains
                                        on a node */
    uint32_t         numa_bound;    /*< global flag, 1 if processes are bound
                                        to numa nodes */
} ucc_context_topo_t;

typedef struct ucc_addr_storage ucc_addr_storage_t;

/* This topo structure is initialized over a SUBSET of processes
from ucc_context_topo_t.

For example, if ucc_context_t is global then address exchange
is performed during ucc_context_create and we have ctx wide
ucc_addr_storage_t. So, we init ucc_context_topo_t on ucc_context.

Then, ucc_team is a subset of ucc_context mapped via team->ctx_map.
It represents a subset of ranks and we can initialize ucc_topo_t
for that subset, ie for a team. */
typedef struct ucc_topo {
    ucc_context_topo_t *topo;                 /*< Cached pointer of the ctx topo */
    ucc_sbgp_t          sbgps[UCC_SBGP_LAST]; /*< LOCAL sbgps initialized on demand */
    ucc_sbgp_t         *all_sockets;          /*< array of socket sbgps, init on demand */
    int                 n_sockets;
    ucc_sbgp_t         *all_numas;            /*< array of numa sbgps, init on demand */
    int                 n_numas;
    ucc_rank_t          node_leader_rank_id;  /*< defines which rank on a node will be
                                                  node leader. Similar to local node rank.
                                                  currently set to 0, can be selected differently
                                                  in the future */
    ucc_rank_t          node_leader_rank;     /*< actual rank of the node leader in the original
                                                  (ucc_team) ranking */
    ucc_subset_t        set;                  /*< subset of procs from the ucc_context_topo.
                                                  for ucc_team topo it is team->ctx_map */
    ucc_rank_t          min_ppn;              /*< min ppn across the nodes for a team */
    ucc_rank_t          max_ppn;              /*< max ppn across the nodes for a team */
    ucc_rank_t          min_socket_size;      /*< min number of processes on a socket,
                                                  across all nodes of a team */
    ucc_rank_t          max_socket_size;      /*< max number of processes on a socket,
                                                  across all nodes of a team */
    ucc_rank_t          min_numa_size;        /*< min number of processes on a numa,
                                                  across all nodes of a team */
    ucc_rank_t          max_numa_size;        /*< max number of processes on a numa,
                                                  across all nodes of a team */
} ucc_topo_t;

They are mostly concerned with node-level items such as processors, sockets, and NUMA domains. What are they designed for? Also, where can I find the intra-node link info (NVLink, PCIe, etc.) and the inter-node network info (RDMA, TCP/IP, etc.)?

Thanks for your reply in advance.

@Sergei-Lebedev
Contributor

Processor, socket and NUMA information is used in many places in UCC. For example:

  1. In CL HIER we build groups (sbgps), which are combinations of ranks that share some resource: SBGP NODE (all ranks on the same node), SBGP SOCKET (all ranks on the same socket), and so on.
  2. TL UCP uses topology information to permute rings and trees for better locality.
  3. Topology information is also used to identify common hardware configurations for which best-known parameters are provided (TL/UCP: Grace tuning #1027).

NVLink topology discovery is part of TL CUDA (https://github.com/openucx/ucc/blob/master/src/components/tl/cuda/tl_cuda_topo.h), though it may make sense to move it to the UCC topo.

We don't have network topology discovery yet, but it's in our backlog.
