This is more of an FYI than a bug report. Native Windows NCCL is not available via conda-forge (the NVIDIA docs don't list a Windows build either), and I don't know whether there is one that ships precompiled with CUDA or anything else specifically for Windows.
I'll try to compile a system-agnostic one and check if it works. I ran into this because I have a dataset of 84 recordings (45k frames each) that doesn't fit on a single 4090, but when I try to run on multiple GPUs, it throws:
C:\anaconda3\envs\keypoint_moseq_gpu\lib\site-packages\jax\_src\dispatch.py:380: UserWarning:
The jitted function resample_discrete_stateseqs includes a pmap. Using jit-of-pmap can lead to inefficient data movement, as the outer jit does not preserve sharded data representations and instead collects input and output arrays onto a single device. Consider removing the outer jit unless you know what you're doing. See https://github.com/google/jax/issues/2926.
2023-11-16 15:30:23.348349: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 1 failed: UNIMPLEMENTED: NCCL support is not available: this binary was not built with a CUDA compiler, which is necessary to build the NCCL source library.
2023-11-16 15:30:34.961307: F external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2298] Replicated computation launch failed, but not all replicas terminated. Aborting process to work around deadlock. Failure message (there may have been multiple failures, see the error log for all failures):
NCCL support is not available: this binary was not built with a CUDA compiler, which is necessary to build the NCCL source library.
Thanks for the info! So far we haven't had many users trying to use multiple GPUs on Windows so haven't seen this yet. Keep me posted if you figure out a solution! I wonder if using system installs of CUDA/cudnn would help?
Multi-GPU execution depends on NCCL (NVIDIA Collective Communications Library). There is apparently a system-agnostic version available; I'll try to compile it, add it to my CUDA installation, and see if it works.
Technically it's not a big issue: I can just train the model on part of the data, which should be fine as long as I shuffle properly across all the groups.
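For anyone wanting to try the same workaround, here is a minimal sketch of picking a subset of recordings so that every group stays represented. It is plain Python and not tied to any keypoint-moseq function; the recording names, the group labels, the 4-group split, and the `subsample_recordings` helper are all made up for illustration.

```python
import random
from collections import defaultdict


def subsample_recordings(groups, fraction, seed=0):
    """Pick roughly `fraction` of the recordings from every group.

    `groups` maps a recording name to its group label. Both are
    hypothetical placeholders, not names from any real project.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Collect recording names per group label.
    by_group = defaultdict(list)
    for name, label in sorted(groups.items()):
        by_group[label].append(name)

    chosen = []
    for label in sorted(by_group):
        names = by_group[label]
        rng.shuffle(names)
        # Keep at least one recording per group so no group drops out.
        k = max(1, round(len(names) * fraction))
        chosen.extend(names[:k])
    return sorted(chosen)


# Hypothetical layout: 84 recordings split evenly over 4 groups.
groups = {f"rec_{i:02d}": f"group_{i % 4}" for i in range(84)}
subset = subsample_recordings(groups, fraction=0.5)
print(len(subset), "of", len(groups), "recordings selected")
```

The returned names can then be used to filter whatever you load before fitting on a single GPU. Sampling per group rather than over the whole pool is a deliberate choice here: a plain random draw could leave a small group with few or no recordings.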
I also found this issue, which may be useful for implementing a solution for Windows: tensorflow/tensorflow#21470