Adding prometheus endpoint in tron - TRON-2124 #944

Merged
merged 1 commit into from
Mar 5, 2024
1 change: 1 addition & 0 deletions requirements-minimal.txt
@@ -10,6 +10,7 @@ ipython
Jinja2>=3.1.2
lockfile
moto
prometheus-client==0.20.0
Review comment (Member):

we should remove this pin from requirements-minimal.txt: leaving it in here means "any version other than 0.20.0 will not work"

i.e., this line should contain just the dependency name

psutil
py-bcrypt
pyasn1
1 change: 1 addition & 0 deletions requirements.txt
@@ -51,6 +51,7 @@ oauthlib==3.1.0
parso==0.7.0
pexpect==4.7.0
pickleshare==0.7.5
prometheus-client==0.20.0
prompt-toolkit==3.0.38
psutil==5.6.6
ptyprocess==0.6.0
1 change: 1 addition & 0 deletions tests/api/resource_test.py
@@ -169,6 +169,7 @@ def test__init__(self):
b"metrics",
b"status",
b"events",
b"prom-metrics",
b"",
]
assert set(expected_children) == set(self.resource.children)
2 changes: 2 additions & 0 deletions tron/api/resource.py
@@ -8,6 +8,7 @@
import traceback

import staticconf
from prometheus_client.twisted import MetricsResource as MetricsResourceProm

from tron.config.static_config import get_config_watcher
from tron.config.static_config import NAMESPACE
@@ -500,6 +501,7 @@ def __init__(self, mcp):
self.putChild(b"status", StatusResource(mcp))
self.putChild(b"events", EventsResource())
self.putChild(b"metrics", MetricsResource())
self.putChild(b"prom-metrics", MetricsResourceProm())
self.putChild(b"", self)

@AsyncResource.bounded
7 changes: 7 additions & 0 deletions tron/kubernetes.py
@@ -15,6 +15,7 @@
from twisted.internet.defer import logError

import tron.metrics as metrics
import tron.prom_metrics as prom_metrics
from tron import __version__
from tron.actioncommand import ActionCommand
from tron.config.schema import ConfigFieldSelectorSource
@@ -90,7 +91,13 @@ def report_resources(self, decrement: bool = False) -> None:
Update internal resource utilization statistics of all tronjobs running for this task's Tron master.
"""
# TODO(TRON-1612): these should eventually be Prometheus metrics
# these should be replaced with gauges in prometheus
multiplier = -1 if decrement else 1
# prometheus gauges
prom_metrics.tron_cpu_gauge.inc(self.task_config.cpus * multiplier)
prom_metrics.tron_memory_gauge.inc(self.task_config.memory * multiplier)
prom_metrics.tron_disk_gauge.inc(self.task_config.disk * multiplier)

metrics.count("tron.mesos.cpus", self.task_config.cpus * multiplier)
metrics.count("tron.mesos.mem", self.task_config.memory * multiplier)
metrics.count("tron.mesos.disk", self.task_config.disk * multiplier)
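
The signed-multiplier accounting in this hunk can be sketched without any dependencies. This is only an illustration, not Tron's code: `FakeGauge`, `cpu_gauge`, and the standalone `report_resources` function are hypothetical stand-ins, and `FakeGauge` assumes nothing about `prometheus_client.Gauge` beyond an `inc` method that also accepts negative amounts.

```python
class FakeGauge:
    """Hypothetical stand-in for prometheus_client.Gauge (not the real class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        # A gauge may go up or down, so negative amounts are allowed.
        self.value += amount


cpu_gauge = FakeGauge("example_cpus")


def report_resources(cpus: float, decrement: bool = False) -> None:
    # Same pattern as the diff: add the task's CPUs when it starts,
    # subtract the same amount when it finishes.
    multiplier = -1 if decrement else 1
    cpu_gauge.inc(cpus * multiplier)


report_resources(2.0)                  # task A starts
report_resources(0.5)                  # task B starts
report_resources(2.0, decrement=True)  # task A finishes
print(cpu_gauge.value)                 # 0.5, only task B is still counted
```

The gauge stays accurate only if every increment is matched by exactly one decrement for the same task, which is why both calls go through the same function with the same `task_config` values.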
6 changes: 6 additions & 0 deletions tron/prom_metrics.py
@@ -0,0 +1,6 @@
from prometheus_client import Gauge


tron_cpu_gauge = Gauge("tron_k8s_cpus", "Measuring CPU for tron jobs on K8s")
tron_memory_gauge = Gauge("tron_k8s_mem", "Measuring memory for tron jobs on K8s")
tron_disk_gauge = Gauge("tron_k8s_disk", "Measuring disk for tron jobs on K8s")
Review comment (Member):

suggestion (feel free to ignore):

tron_cpu_gauge = Gauge("tron_k8s_cpus", "Total number of CPUs allocated to Tron-launched containers")
tron_memory_gauge = Gauge("tron_k8s_mem", "Total amount of memory allocated to Tron-launched containers (in megabytes)")
tron_disk_gauge = Gauge("tron_k8s_disk", "Total amount of disk allocated to Tron-launched containers (in megabytes)")