Prometheus Metrics #678

Open
LukeHandle opened this issue May 6, 2021 · 3 comments

@LukeHandle

LukeHandle commented May 6, 2021

  • Likely using https://github.com/akoutmos/prom_ex

  • Namespace/key prefix under glimesh_

  • To state the obvious, both the labels chosen and the set of values each label can take must be tightly controlled

    • If a label can take more than, say, 50 values, that is probably bad (eg. user ID is an awful label)
    • Prometheus is not at all suitable for measuring anything identifiable at a user or request level
  • Do we care about exposing these metrics?

    • If so, will likely need some level of auth in front
    • Basic auth, or static env based Bearer I would expect?
    • I don't think there is any proxy in front of Glimesh in prod (just a DO LB?), so we can't add protection there
  • Will probably be scraped every 5 to 15 seconds or so
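
For the "static env based Bearer" option above, a minimal sketch of the check itself, written in Python purely for illustration (it is not Phoenix or PromEx code). The environment variable name `GLIMESH_METRICS_TOKEN` and the function name `is_authorized` are assumptions made up for this example. The two points it tries to show are failing closed when no token is configured and comparing tokens in constant time.

```python
import hmac
import os


def is_authorized(authorization_header, expected_token):
    """Return True only for a header of the form 'Bearer <expected_token>'."""
    # Fail closed: an unset or empty token must never grant access.
    if not expected_token or not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())


# Hypothetical variable name; the real deployment may choose something else.
EXPECTED_TOKEN = os.environ.get("GLIMESH_METRICS_TOKEN", "")
```

The scraper would then be configured with the same token, and the metrics endpoint would return 401 whenever `is_authorized` is false.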

Metric Ideas

Comments appreciated - I'm sure some of these are either not possible or just not worth doing considering Phoenix etc.

  • There are probably 2 different areas where I would expect histogram-based duration metrics

    • On specific (say, 30 max?) incoming HTTP paths

      • glimesh_http_request_duration_seconds histogram. Labels for code (HTTP status code), method (HTTP method), handler
      • Not sure how this works in practice with Elixir (?)
      • /api/oauth/token and /api would be suitable for this I think? Not sure what other paths there are (excluding websocket)
      • Maybe category pages? (eg. /streams/gaming, /streams/education?)
      • Again, this might make zero sense with Elixir / LiveViews etc. - I am very naïve.
    • Specific functions that you wrap

      • eg. a glimesh_api_processing_duration_seconds histogram, then a source, resource, or handler label?
      • This would be similar to how you use Appsignal.instrument I think
      • Other DB specific calls under their own metric name?
      • Any external calls (especially if they are in response to a user request)? eg. Mailgun ?
  • Gauges and Counters?

    • Gauges for current total stream viewer counts? And Streamer counts?
    • Gauge for current websocket connections?
    • Counter for every websocket connection since service start? (a duration histogram makes no sense for a long-lived connection)
    • Counter for errors thrown?
    • Gauge for current threads etc. ?
    • Gauge of Allocated bytes of memory?
    • Counter process_cpu_seconds_total for "Total user and system CPU time spent in seconds"?
    • Gauge process_start_time_seconds "Start time of the process since unix epoch in seconds."
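
To make the "30 max" handler idea and the histogram shape concrete, here is a rough sketch in Python (for illustration only, not how PromEx does it internally). The allowlist, the bucket bounds, and the names `handler_label` and `Histogram` are all assumptions for this example. Unknown paths collapse into a single `other` value so the label stays bounded, and buckets follow the Prometheus `le` (less than or equal) convention with cumulative counts.

```python
import bisect

# Hypothetical allowlist; a real one would be derived from the router.
KNOWN_HANDLERS = ("/api/oauth/token", "/api", "/streams/gaming", "/streams/education")


def handler_label(path):
    """Map a raw request path to one of a small, fixed set of label values."""
    path = path.split("?", 1)[0].rstrip("/") or "/"
    return path if path in KNOWN_HANDLERS else "other"


class Histogram:
    """Tiny in-memory histogram keyed by a tuple of label values."""

    def __init__(self, buckets=(0.05, 0.1, 0.5, 1.0)):  # illustrative bounds, in seconds
        self.bounds = tuple(buckets)
        self.counts = {}  # labels -> per-bucket counts, last slot is +Inf
        self.sums = {}    # labels -> sum of observed seconds

    def observe(self, labels, seconds):
        counts = self.counts.setdefault(labels, [0] * (len(self.bounds) + 1))
        # bisect_left finds the first bound >= seconds, matching `le` semantics.
        counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.sums[labels] = self.sums.get(labels, 0.0) + seconds

    def cumulative(self, labels):
        """Return (upper_bound, cumulative_count) pairs as Prometheus exposes them."""
        running, out = 0, []
        for bound, count in zip(self.bounds + (float("inf"),), self.counts[labels]):
            running += count
            out.append((bound, running))
        return out
```

With labels of `(code, method, handler)`, the number of series is bounded by the product of the distinct values of each label times the bucket count, which is why the handler set needs a hard cap.
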
@LukeHandle changed the title from "Proemtheus Metrics" to "Prometheus Metrics" on May 6, 2021
@wolfcomp
Contributor

If we want to use https://github.com/akoutmos/prom_ex, we would need to wait for the next version. The current version on Hex has a bug, triggered when the response body of an HTTP request is nil, that has been fixed but not yet released (prom_ex pull request 54).

@clone1018
Member

Did a minimal amount of work towards this here: #725

@LukeHandle
Author

I think PromEx covered most of the points tbh. What's left is mostly wrapping specific things, eg. Mailgun calls, Spaces uploads, stream viewer counts...
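
As a rough illustration of what "wrapping specific things" could look like, here is a Python sketch (not Elixir, and not an existing Glimesh or PromEx helper). The decorator name `instrumented`, the `source` values, and `send_email` are hypothetical stand-ins. Each wrapped call records its duration under a fixed `source` label and bumps an error counter when it raises.

```python
import time
from collections import defaultdict
from functools import wraps

durations = defaultdict(list)  # source -> observed durations in seconds
errors = defaultdict(int)      # source -> number of calls that raised


def instrumented(source, clock=time.monotonic):
    """Record duration and errors for a call under a fixed `source` label."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return fn(*args, **kwargs)
            except Exception:
                errors[source] += 1
                raise  # instrumentation must not swallow the failure
            finally:
                # Runs on success and on failure, so failed calls are timed too.
                durations[source].append(clock() - start)
        return wrapper
    return decorator


@instrumented("mailgun")
def send_email(to):
    # Hypothetical stand-in for a real external call.
    return f"queued:{to}"
```

The `source` value is chosen at the wrap site rather than taken from call arguments, which keeps the label set small and fixed.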

