SMHP: slurm exporter to report gpu metrics #181
Nice!
@@ -10,7 +10,7 @@ if sudo systemctl is-active --quiet slurmctld; then
 echo "Go is already installed."
 fi
 echo "This was identified as the controller node because Slurmctld is running. Begining SLURM Exporter Installation"
-git clone -b 0.20 https://github.com/vpenso/prometheus-slurm-exporter.git
+git clone -b development https://github.com/vpenso/prometheus-slurm-exporter.git
Pin to a tag, not the development branch.
-gpus-acct throws an error with v0.20, contrary to the documentation (which says >=0.19). A few upstream issues link this to the Slurm version. The development branch works, and I can pin to a specific commit.
If the development branch (pinned or not) is not preferred, I need to test whether the main branch works. Otherwise, the options are either to drop -gpus-acct or to pin to the head of the development branch (its latest commit was two years ago anyway).
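For reference, pinning to a specific commit could look roughly like the sketch below. This is only an illustration, not the change in this PR: `clone_pinned` is a hypothetical helper name, and `PINNED_COMMIT` is a placeholder for whichever development-branch SHA ends up being chosen.

```shell
#!/usr/bin/env bash
# Sketch: clone a repository and check out one exact commit, so that
# later pushes to the branch cannot change what gets built.
# clone_pinned is a hypothetical helper, not part of this PR.
clone_pinned() {
    local repo_url="$1" commit="$2" dest="$3"
    # A plain clone fetches all branches, so a commit on any of them is reachable.
    git clone --quiet "$repo_url" "$dest" || return 1
    # --detach makes it explicit that no branch is being tracked.
    git -C "$dest" checkout --quiet --detach "$commit" || return 1
    # Print the resolved commit so the install log records what was built.
    git -C "$dest" rev-parse HEAD
}

# Example call (PINNED_COMMIT is a placeholder; substitute the real SHA):
# clone_pinned https://github.com/vpenso/prometheus-slurm-exporter.git "$PINNED_COMMIT" prometheus-slurm-exporter
```

Compared with `git clone -b development`, this keeps the install reproducible even if the branch moves, at the cost of having to bump the SHA by hand.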
Can you raise an issue on https://github.com/vpenso/prometheus-slurm-exporter to get another release cut?
Issue #, if available: N/A
Description of changes: Prometheus Slurm exporter to report GPU metrics (total, allocated).
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.