I'm training a large model (114M sentences) on 2 GPUs, but nvidia-smi shows a GPU-parallelism problem during training.
The second GPU stays at 98-99% utilization the whole time, which looks fine, but the first GPU's utilization fluctuates: sometimes 11%, other times 45%, 95%, etc.
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0 Off |                  N/A |
| 39%   56C    P2              55W / 170W |  11343MiB / 12288MiB |     11%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3060        Off | 00000000:02:00.0 Off |                  N/A |
| 37%   43C    P2              48W / 170W |  11343MiB / 12288MiB |     99%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```
Is this normal, or do I have a build/configuration problem?
train-marian.txt
```
[2023-05-18 03:17:45] Using synchronous SGD
[2023-05-18 03:17:57] [training] Batches are processed as 1 process(es) x 2 devices/process
```
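To tell whether GPU 0's 11% reading is just a momentary dip between synchronization points or a sustained imbalance, something like the sketch below can log per-GPU utilization once per second. This is only an illustrative helper (it assumes the pynvml / nvidia-ml-py package is installed) and is not part of Marian itself:

```python
# Hypothetical monitoring sketch (not part of Marian): sample per-GPU
# utilization over time with the pynvml bindings (nvidia-ml-py).
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    for _ in range(60):  # sample once per second for ~1 minute
        utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
        print("  ".join(f"GPU{i}: {u:3d}%" for i, u in enumerate(utils)))
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()
```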