Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[agent] Ray metrics use the name parsing as Prometheus #50443

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

israbbani
Copy link
Contributor

We will fail fast when trying to create metrics that do not pass validation in Prometheus. Added a unit test to make sure invalid metric names are not allowed.

Closes #40586

@israbbani israbbani added the go add ONLY when ready to merge, run all tests label Feb 11, 2025
@israbbani israbbani marked this pull request as ready for review February 11, 2025 21:57
@jjyao jjyao assigned edoakes and unassigned jjyao Feb 11, 2025
a private member of the Metric class

Signed-off-by: Ibrahim Rabbani <[email protected]>
# Copied from Prometheus Python Client. While the regex is not part of the public API
# for Prometheus, it's not expected to change
# https://github.com/prometheus/client_python/blob/46eae7bae88f76951f7246d9f359f2dd5eeff110/prometheus_client/validation.py#L4
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
_METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

to indicate it's not a public API (since this is in public API namespace)

Comment on lines +1043 to +1046
with pytest.raises(ValueError):
Metric("faulty-metric")
with pytest.raises(ValueError):
Metric("1cannotstartwithnumber")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with pytest.raises(ValueError):
Metric("faulty-metric")
with pytest.raises(ValueError):
Metric("1cannotstartwithnumber")
with pytest.raises(ValueError, match="string to match in the exception message"):
Metric("faulty-metric")
with pytest.raises(ValueError, match="string to match in the exception message"):
Metric("1cannotstartwithnumber")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly I've never written much python before so I'm learning.

Doesn't this pattern effectively make the exception message part of the class API i.e. if you change the message, the tests break? Is that the intention?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes -- it's mostly a way to enforce it to ourselves that we're providing good error messages (for example if someone in the future added a new validation clause this would force them to explicitly acknowledge they changed the user-facing error message)

raise ValueError("Empty name is not allowed. Please provide a metric name.")
if not METRIC_NAME_RE.match(name):
raise ValueError(
f"Invalid metric name: {name}. Does not match regex: {METRIC_NAME_RE.pattern}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex's are really hard to read. maybe for common error types such as starting with an int or special character, we can also add handle-rolled validation to provide a better error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing the error message to:
Invalid metric name: {name}. Names can only start with letters or _. Names can only contain letters, numbers, and _

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds great

@edoakes
Copy link
Contributor

edoakes commented Feb 11, 2025

Validation failure in Ray Data tests! Interesting...

https://buildkite.com/ray-project/premerge/builds/34512#0194f74f-c2b7-4a9b-b0fb-9a9895a72175/1563-1597

@israbbani
Copy link
Contributor Author

israbbani commented Feb 12, 2025

It looks like our integration tests with Mars are failing. Apparently they create ray metrics with names that contain ..

What do next steps for this look like?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Agent] Make sure Ray metrics perform same validation as Prometheus to weed out invalid names early on
3 participants