[WIP] start of work to add prometheus plugin #5215
Closed
Problem: we want to report metrics from Flux to Prometheus.
Solution: write a plugin!
This will close #5214
These early steps create the plugin skeleton and add build logic that enables the plugin when particular flags are present. Next I want to try installing this, adding the module load to rc1, and seeing what happens. I think we likely want metric updates to be triggered from elsewhere, but since I don't fully understand how modules run, I'm not sure yet.
I'm opening this as a draft because I have some questions I'd like help answering. I have the basic structure of my plugin and I am currently wondering:
And more for discussion with others who care about the operator: what metrics (and thus which abstractions for them, namely gauge, counter, and histogram) are we interested in for our autoscaler?
Also note that, associated with this work, I've started a "how to write a plugin" document because I couldn't really find anything: https://gist.github.com/vsoch/756f10b52f7889e1b781ccdc599fa8cc. I didn't add my faux test directory because it's largely useless.
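To make the metric-type question concrete, here is a minimal sketch of how the three abstractions differ, written in plain Python and rendered in the Prometheus text exposition format. This is not the plugin's code. The class names and the metric names (`flux_jobs_submitted_total`, `flux_queue_pending_jobs`, `flux_job_runtime_seconds`) are hypothetical, chosen only for illustration.

```python
class Counter:
    """A value that only goes up, e.g. total jobs submitted."""

    def __init__(self, name, help_text):
        self.name, self.help_text, self.value = name, help_text, 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount

    def render(self):
        return [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
            f"{self.name} {self.value:g}",
        ]


class Gauge:
    """A value that can go up or down, e.g. jobs currently pending."""

    def __init__(self, name, help_text):
        self.name, self.help_text, self.value = name, help_text, 0.0

    def set(self, value):
        self.value = float(value)

    def render(self):
        return [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} gauge",
            f"{self.name} {self.value:g}",
        ]


class Histogram:
    """Observations counted into cumulative buckets, e.g. job runtimes."""

    def __init__(self, name, help_text, bounds):
        self.name, self.help_text = name, help_text
        self.bounds = sorted(bounds)
        self.counts = [0] * len(self.bounds)  # cumulative count per upper bound
        self.total = 0.0  # sum of all observed values
        self.count = 0    # number of observations

    def observe(self, value):
        # Buckets are cumulative: an observation lands in every bucket
        # whose upper bound is greater than or equal to it.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.counts[i] += 1
        self.total += value
        self.count += 1

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} histogram",
        ]
        for bound, count in zip(self.bounds, self.counts):
            lines.append(f'{self.name}_bucket{{le="{bound:g}"}} {count}')
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {self.count}')
        lines.append(f"{self.name}_sum {self.total:g}")
        lines.append(f"{self.name}_count {self.count}")
        return lines


def render_all(metrics):
    """Join every metric into one scrape response body."""
    return "\n".join(line for m in metrics for line in m.render()) + "\n"
```

For an autoscaler, a gauge seems the natural fit for "how much work is waiting right now", a counter for totals whose rate matters, and a histogram for distributions such as runtimes. That mapping is a starting point for discussion rather than a decision.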
I'm going to test building this into a container with a modified rc1, and maybe my first question will be partially answered if I only see the print once. I basically want this to start and keep running (eventually that will mean the metrics server) when the Flux broker starts. I know there are ways to do it with `flux cron` and a Python script, but I'd rather try the harder approach first, since it will be better in the long run (and indeed I'm learning a lot and enjoying that).