INFO[0000] Not collecting DCP metrics: This request is serviced by a module of DCGM that is not currently loaded #398
I just found out that, unfortunately, DCGM does not support GTX/RTX GPUs, as pointed out by this comment. It would be really useful to add this to the documentation, since it is easy to build a cloud with GTX/RTX GPUs. Is there a similar tool that does the same thing for GTX/RTX cards, apart from profiling with nsys/ncu, of course? I just want to monitor SM occupancy over time without interfering with the running programs.
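On consumer GPUs, one lightweight fallback is to poll `nvidia-smi`. This is a sketch, not a replacement for DCGM's profiling metrics: `utilization.gpu` is the percentage of time in the sample period during which at least one kernel was running, which is not SM occupancy. The helper `avg_gpu_util` is hypothetical and assumes the three-column CSV layout produced by the query in its comment; with several GPUs it averages across all of them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average the utilization.gpu column of nvidia-smi CSV output.
# Assumed input, one sample per line, as produced by:
#   nvidia-smi --query-gpu=timestamp,index,utilization.gpu \
#              --format=csv,noheader,nounits -lms 500
# e.g. "2024/01/01 00:00:00.000, 0, 40"
avg_gpu_util() {
  awk -F', *' '
    NF >= 3 { sum += $3; n++ }   # third field = utilization.gpu (%)
    END {
      if (n > 0) printf "%.1f\n", sum / n
      else print "n/a"           # no samples were read
    }'
}

# Usage (needs an NVIDIA driver; timeout stops the polling loop after 10 s):
#   timeout 10 nvidia-smi --query-gpu=timestamp,index,utilization.gpu \
#     --format=csv,noheader,nounits -lms 500 | avg_gpu_util
```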
I also encountered the same problem. How can I solve it? nvidia-smi logs:
Hi, have you fixed it? Thanks!
Hello! No, there is no way to fix it on consumer-grade GPUs. Unfortunately, this tool is built specifically for data center (cloud) GPUs. Hopefully NVIDIA will add a similar tool for consumer-grade GPUs in the future.
I see, thank you!
Hello!
I built dcgm-exporter from source with:
git clone https://github.com/NVIDIA/dcgm-exporter.git
cd dcgm-exporter
make binary
Then I created a custom metrics file with:
Finally, I started dcgm-exporter with the custom metrics:
This gives me:
Looking at http://localhost:9400/metrics shows no metrics, so I assume they are not being collected (or not enabled), which is what the dcgm-exporter logs actually say.
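To check whether the profiling (DCP) fields are actually exported, the scraped page can be filtered for the `DCGM_FI_PROF_` prefix instead of inspected by eye. This is a minimal sketch: it assumes dcgm-exporter names its series after the DCGM field identifiers (for example `DCGM_FI_PROF_SM_ACTIVE`) and that the exporter listens on port 9400 as above; `count_prof_series` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Prometheus sample lines whose metric name starts
# with DCGM_FI_PROF_ (the DCGM profiling/DCP fields). Reads exposition text on stdin.
# "# HELP" / "# TYPE" lines are not counted because they start with '#'.
count_prof_series() {
  # grep -c prints 0 but exits 1 when nothing matches; do not treat that as an error.
  grep -c '^DCGM_FI_PROF_' || true
}

# Usage against a running exporter:
#   n=$(curl -s http://localhost:9400/metrics | count_prof_series)
#   [ "$n" -gt 0 ] || echo "no DCP metrics exported (check the exporter log)"
```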
I have also tried the dcgm-exporter Docker images nvcr.io/nvidia/k8s/dcgm-exporter:3.3.8-3.6.0-ubuntu22.04 (the latest) and nvcr.io/nvidia/k8s/dcgm-exporter:3.3.0-3.2.0-ubuntu22.04 (which matches my driver, shipped with CUDA 12.2), started with:
But it gives me the same output.
How should I deal with this issue? And how do I enable these metrics?