
Multi-GPU not supported for Windows #111

Open
KonradDanielewski opened this issue Nov 16, 2023 · 2 comments

Comments

@KonradDanielewski
Contributor

KonradDanielewski commented Nov 16, 2023

More of an FYI than a bug report: native Windows builds of NCCL are not available via conda-forge (and, according to the NVIDIA docs, NCCL does not support Windows at all). I don't know whether there is a build precompiled with CUDA or one made specifically for Windows.

I'll try to compile a system-agnostic build and check whether it works. I ran into this because I have a dataset of 84 recordings (45k frames each) that doesn't fit on a single RTX 4090, so I tried to spread it across two GPUs by running:

from jax_moseq.utils import set_mixed_map_gpus
set_mixed_map_gpus(2)

Then, when running:

model = kpms.init_model(data, pca=pca, **config()) 

it throws:

C:\anaconda3\envs\keypoint_moseq_gpu\lib\site-packages\jax\_src\dispatch.py:380: UserWarning:

The jitted function resample_discrete_stateseqs includes a pmap. Using jit-of-pmap can lead to inefficient data movement, as the outer jit does not preserve sharded data representations and instead collects input and output arrays onto a single device. Consider removing the outer jit unless you know what you're doing. See https://github.com/google/jax/issues/2926.

2023-11-16 15:30:23.348349: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 1 failed: UNIMPLEMENTED: NCCL support is not available: this binary was not built with a CUDA compiler, which is necessary to build the NCCL source library.
2023-11-16 15:30:34.961307: F external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2298] Replicated computation launch failed, but not all replicas terminated. Aborting process to work around deadlock. Failure message (there may have been multiple failures, see the error log for all failures):

NCCL support is not available: this binary was not built with a CUDA compiler, which is necessary to build the NCCL source library.

I found this issue, which may be useful for implementing a solution on Windows: tensorflow/tensorflow#21470
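Until NCCL is available on Windows, one way to avoid the crash is to only enable multi-GPU mode where it can work. The sketch below is a guess at such a guard, not part of jax_moseq: `choose_num_gpus` and `configure_gpus` are hypothetical helper names, and the rule "Windows means one GPU" is an assumption based on the error above. Only `set_mixed_map_gpus` (from the original report) and `jax.local_device_count()` come from the actual libraries.

```python
import platform


def choose_num_gpus(system_name, n_devices):
    """Hypothetical rule: how many GPUs to request for multi-GPU fitting.

    Assumption: pmap across GPUs needs NCCL, which the Windows builds
    discussed here lack, so fall back to a single device on Windows.
    """
    if system_name == "Windows" or n_devices < 2:
        return 1
    return n_devices


def configure_gpus():
    """Hypothetical helper: call set_mixed_map_gpus only where it can work."""
    import jax  # imported lazily so choose_num_gpus works without JAX installed
    from jax_moseq.utils import set_mixed_map_gpus

    n = choose_num_gpus(platform.system(), jax.local_device_count())
    if n > 1:
        set_mixed_map_gpus(n)  # same call as in the original report
    return n
```

On Windows this leaves jax_moseq on its default single-device path instead of hitting the NCCL abort.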

@calebweinreb
Contributor

Thanks for the info! So far we haven't had many users trying to use multiple GPUs on Windows, so we haven't seen this yet. Keep me posted if you figure out a solution! I wonder if using system installs of CUDA/cuDNN would help?

@KonradDanielewski
Contributor Author

KonradDanielewski commented Dec 6, 2023


Multi-GPU support depends on NCCL (NVIDIA Collective Communications Library), not just on CUDA/cuDNN. There is apparently a system-agnostic version available; I'll try to compile it, add it to my CUDA installation, and see if it works.
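To check whether a given install can run cross-GPU collectives at all, independently of keypoint-moseq, a tiny `pmap` with a `psum` may serve as a smoke test. This is a sketch under the assumption that, on a multi-GPU machine, XLA lowers the `psum` to an NCCL all-reduce, so a build without NCCL should fail here the same way as in the log above. `pmap_psum_smoke_test` is a hypothetical name; `jax.pmap`, `jax.lax.psum` and `jax.local_device_count` are standard JAX APIs.

```python
import jax
import jax.numpy as jnp


def pmap_psum_smoke_test():
    """Run a minimal pmap with a cross-device all-reduce (psum).

    Returns the number of local devices and the per-device results.
    Each device should end up holding 0 + 1 + ... + (n - 1).
    """
    n = jax.local_device_count()
    # One scalar per local device; pmap maps over the leading axis.
    x = jnp.arange(n, dtype=jnp.float32)
    summed = jax.pmap(lambda v: jax.lax.psum(v, axis_name="i"), axis_name="i")(x)
    return n, summed
```

On a single device this trivially succeeds, so the test is only informative when two or more GPUs are visible.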

Technically it's not a big issue: I can just use a subset of the data to train the model, which should be fine as long as I shuffle properly across all the groups.
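Picking that subset so every experimental group stays represented could look like the sketch below. It assumes recordings are keyed by name and that you have a mapping from recording name to group label; `stratified_subset` and `recording_groups` are hypothetical names, not keypoint-moseq API.

```python
import random
from collections import defaultdict


def stratified_subset(recording_groups, fraction, seed=0):
    """Pick roughly `fraction` of the recordings from every group.

    recording_groups: dict mapping recording name -> group label.
    At least one recording is kept per group; the result is sorted
    and deterministic for a given seed.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_group = defaultdict(list)
    for name, group in recording_groups.items():
        by_group[group].append(name)
    rng = random.Random(seed)
    selected = []
    for group in sorted(by_group):
        names = sorted(by_group[group])  # fixed order before shuffling
        rng.shuffle(names)
        k = max(1, round(fraction * len(names)))
        selected.extend(names[:k])
    return sorted(selected)
```

The selected names can then be used to filter any per-recording dict, e.g. `{k: coordinates[k] for k in subset}`, before fitting. Stratifying per group rather than shuffling all 84 recordings together guarantees no group is accidentally under-sampled.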
