[WIP] start of work to add prometheus plugin #5215

Closed


vsoch (Member) commented May 28, 2023

Problem: we want to report metrics from flux to Prometheus
Solution: write a plugin!

This will close #5214

These early steps create the plugin skeleton and add build logic that enables the plugin when particular flags are present. Next I want to try installing it, adding the module load to rc1, and seeing what happens. I think we likely want metric updates to be triggered from elsewhere, but since I don't fully understand how modules run, I'm not sure yet.

I'm opening this as a draft because I have some questions I'd like help answering. I have the basic structure of my plugin and am currently wondering:

  • Is mod_main only called when the module is initially loaded? What else triggers a plugin / module, and do we need to add logic somewhere else in the code? E.g., "if the queue size changes, update the metric if the module is active."
  • How do I get queue metrics from flux?
  • How do I use configure.ac to get the library paths to add later to prometheus/Makefile.am? Is it automatic (e.g., will the variables LIBPROM and LIBPROMHTTP exist)?

And more for discussion with others who care about the operator: which metrics (and thus which abstractions for them, namely gauge, counter, and histogram) are we interested in for our autoscaler?
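
To make the gauge / counter / histogram distinction concrete, here is a minimal, dependency-free Python sketch of the three abstractions and how each might render in the Prometheus text exposition format. It is only an illustration under stated assumptions: it is not the code in this PR and not the C Prometheus client API, and the class names and any metric names used with them are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Gauge:
    """A value that can go up or down, e.g. the current queue depth."""
    name: str
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def render(self) -> str:
        return f"# TYPE {self.name} gauge\n{self.name} {self.value:g}\n"


@dataclass
class Counter:
    """A value that only ever increases, e.g. total jobs submitted."""
    name: str
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount

    def render(self) -> str:
        return f"# TYPE {self.name} counter\n{self.name} {self.value:g}\n"


@dataclass
class Histogram:
    """Observations counted into buckets, e.g. job runtimes in seconds."""
    name: str
    bounds: list  # sorted, inclusive upper bounds of the finite buckets
    counts: list = field(init=False, default_factory=list)
    total: float = 0.0

    def __post_init__(self) -> None:
        # One slot per finite bound, plus a final slot for the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # Index of the first bound >= value; len(bounds) means +Inf.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value

    def render(self) -> str:
        lines = [f"# TYPE {self.name} histogram"]
        running = 0  # exposition buckets are cumulative
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            label = "+Inf" if bound == float("inf") else f"{bound:g}"
            lines.append(f'{self.name}_bucket{{le="{label}"}} {running}')
        lines.append(f"{self.name}_sum {self.total:g}")
        lines.append(f"{self.name}_count {running}")
        return "\n".join(lines) + "\n"
```

For an autoscaler, a gauge is probably the natural fit for "how many jobs are pending right now", a counter for monotonically growing totals, and a histogram for distributions such as runtimes. A hypothetical use would be `Gauge("flux_queue_pending_jobs").set(3)` followed by `render()`.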

Also note that, associated with this work, I've started a "how to write a plugin" document (https://gist.github.com/vsoch/756f10b52f7889e1b781ccdc599fa8cc) because I couldn't really find anything. I didn't add my faux test directory because it's largely useless.

I'm going to test building this into a container with a modified rc1, and maybe my first question will be partially answered if I only see the print once. I basically want this to fire up and be running (eventually that will mean the metrics server) when the Flux broker starts. I know there are ways to do it with flux cron and a Python script, but I'd rather try the harder approach first, which should be better (and indeed I'm learning a lot and enjoying it).

When I ran the original daemon, it did not seem to save
state. I looked at the other daemons and saw they would
create a separate context to save to the handle, so I
am trying that here.

Signed-off-by: vsoch <[email protected]>
codecov bot commented May 29, 2023

Codecov Report

Merging #5215 (0d7db15) into master (c2964c9) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5215      +/-   ##
==========================================
- Coverage   83.66%   83.66%   -0.01%     
==========================================
  Files         444      444              
  Lines       75701    75701              
==========================================
- Hits        63338    63333       -5     
- Misses      12363    12368       +5     

see 6 files with indirect coverage changes

vsoch (Member, Author) commented May 31, 2023

No longer necessary. Anyone interested can see https://github.com/converged-computing/prometheus-flux and, for the actual solution I used, https://github.com/converged-computing/flux-metrics-api - TL;DR: the Flux broker serves a Kubernetes custom metrics API directly that can advise an autoscaler/v2. I'll probably do a writeup since I learned a lot (and so future me doesn't forget). Thanks!
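
For readers unfamiliar with what "serving a custom metrics API" involves, the sketch below builds the kind of JSON body such an endpoint might return for one metric. It is an assumption-laden illustration, not code from flux-metrics-api: the field layout follows my understanding of the Kubernetes `custom.metrics.k8s.io/v1beta2` `MetricValueList` type, and the function name, metric name, and the `flux-metrics` service name are all hypothetical.

```python
import json
from datetime import datetime, timezone


def metric_value_list(metric_name, value, namespace="default",
                      service="flux-metrics"):
    """Build a MetricValueList-shaped dict for a single metric.

    Assumption: layout mirrors custom.metrics.k8s.io/v1beta2; the
    describedObject here is a hypothetical Service fronting the broker.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "kind": "MetricValueList",
        "apiVersion": "custom.metrics.k8s.io/v1beta2",
        "metadata": {},
        "items": [
            {
                "describedObject": {
                    "kind": "Service",
                    "namespace": namespace,
                    "name": service,
                    "apiVersion": "v1",
                },
                "metric": {"name": metric_name},
                "timestamp": now,
                # Kubernetes quantities are serialized as strings.
                "value": str(value),
            }
        ],
    }


def to_json(payload):
    """Serialize the payload the way an HTTP handler might return it."""
    return json.dumps(payload, indent=2)
```

An autoscaler would poll an endpoint returning a body like this and compare `value` with its target, which is why a gauge-like "current queue depth" reading is a convenient thing to expose.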

vsoch closed this May 31, 2023
Development

Successfully merging this pull request may close these issues.

Discussion for Prometheus plugin (example use case autoscaling)