Expose the private network interface name in a system environment variable #2220

Open
jvstme opened this issue Jan 23, 2025 · 0 comments
Labels
enhancement A non-feature improvement

Comments

Collaborator

jvstme commented Jan 23, 2025

Problem

A node in a distributed task may need to know the name of its private network interface, because distributed training frameworks sometimes fail to discover the correct interface automatically. This is currently the case when running PyTorch with NCCL on Vultr.

Currently, there isn't an easy way for a node to find out its private network interface name.

Solution

Provide a new system environment variable containing the node's private network interface name(s).

This can probably be implemented by comparing the node's private IP address with the list of its network interfaces or by checking the route to other nodes in the cluster.
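Both ideas can be sketched roughly as follows. This is only an illustration, not how dstack implements or will implement it: it assumes a Linux node with iproute2 that supports JSON output (`ip -j`), and the function names `interfaces_with_address`, `interface_for_route`, and `run_ip` are hypothetical helpers made up for this example.

```python
import json
import subprocess
from typing import List, Optional


def interfaces_with_address(addr_json: str, private_ip: str) -> List[str]:
    """Return the names of interfaces that carry private_ip.

    addr_json is expected to be the output of `ip -j addr show`.
    """
    names = []
    for iface in json.loads(addr_json):
        for addr in iface.get("addr_info", []):
            if addr.get("local") == private_ip:
                names.append(iface["ifname"])
                break  # one match per interface is enough
    return names


def interface_for_route(route_json: str) -> Optional[str]:
    """Return the outgoing device reported by `ip -j route get <peer-ip>`."""
    routes = json.loads(route_json)
    return routes[0].get("dev") if routes else None


def run_ip(*args: str) -> str:
    """Run `ip -j ...` and return its stdout (assumes iproute2 is installed)."""
    result = subprocess.run(
        ["ip", "-j", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder addresses; a real implementation would take them from the
    # node's known private IP and the IPs of the other nodes in the cluster.
    print(interfaces_with_address(run_ip("addr", "show"), "10.0.0.5"))
    print(interface_for_route(run_ip("route", "get", "10.0.0.6")))
```

The address comparison can return several names if the same IP is configured on more than one interface, while the route check returns at most one device per peer, so the two approaches may need to be combined.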

Requires research: it is not yet clear whether and how dstack should provide multiple interface names when they are available, e.g. on AWS instances with multiple EFA attachments (see #1804).

Workaround

The node can try parsing the output of ifconfig or ip and comparing it with existing system environment variables, such as DSTACK_MASTER_NODE_IP and DSTACK_NODES_IPS.
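A minimal sketch of this workaround, under a few assumptions: the node runs Linux with iproute2 that supports `ip -j`, DSTACK_NODES_IPS holds the nodes' private IPs separated by newlines or other whitespace, and the node's own private IP is among them. The helpers `parse_nodes_ips` and `find_private_interfaces` are hypothetical names used only for this example.

```python
import json
import os
import subprocess
from typing import List


def parse_nodes_ips(value: str) -> List[str]:
    """Split DSTACK_NODES_IPS into addresses (assumed whitespace-delimited)."""
    return value.split()


def find_private_interfaces(interfaces: List[dict], nodes_ips: List[str]) -> List[str]:
    """Return interfaces whose local address is one of the cluster node IPs.

    interfaces is the parsed output of `ip -j addr show`.
    """
    wanted = set(nodes_ips)
    return [
        iface["ifname"]
        for iface in interfaces
        if any(a.get("local") in wanted for a in iface.get("addr_info", []))
    ]


if __name__ == "__main__":
    nodes_ips = parse_nodes_ips(os.environ.get("DSTACK_NODES_IPS", ""))
    output = subprocess.run(
        ["ip", "-j", "addr", "show"], check=True, capture_output=True, text=True
    ).stdout
    # Print a comma-separated list so the result can be reused in a shell script.
    print(",".join(find_private_interfaces(json.loads(output), nodes_ips)))
```

The printed name could then be passed to the training framework, for example through NCCL's NCCL_SOCKET_IFNAME environment variable, before the training process starts.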

Additional information

Do after or along with #2219.

Would you like to help us implement this feature by sending a PR?

Yes

jvstme added the enhancement label Jan 23, 2025