feat(llmobs): support joining custom evaluations via tags (#11535)
This PR implements the `LLMObs.submit_evaluation_for` method, which gives users two options for joining custom evaluations:
- by tag, via the `span_with_tag_value` argument, which accepts a dictionary containing `tag_key` and `tag_value` keys
- by span, via the `span` argument, which accepts a dictionary containing `span_id` and `trace_id` keys

There are also a couple of behavior differences between `submit_evaluation_for` and `submit_evaluation`. In the new method, we:
- throw whenever a required argument is the wrong value or type
- remove the `metadata` argument
- move the warning log for a missing API key to the eval metric writer's `periodic` method

Other changes:

#### Eval metric writer
Update the eval metric writer to write to the `v2` eval metric endpoint. The main difference with this endpoint is that it accepts a `join_on` field that holds joining information instead of top-level trace and span ID fields.

#### Deprecate `submit_evaluation`
Deprecates `submit_evaluation`. **I've set the removal version to be `3.0.0`.**

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: lievan <[email protected]>
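The two join options and the stricter argument checking described above can be illustrated with a small standalone sketch. This is not the ddtrace implementation: `resolve_join_on` and `_require_str_keys` are hypothetical names, the choice of `TypeError`/`ValueError` is an assumption, and the `{"tag": {"key": ..., "value": ...}}` shape returned for tag joins is assumed for illustration (only the span shape appears in this commit's test cassette).

```python
from typing import Dict, Optional, Sequence


def _require_str_keys(value: object, keys: Sequence[str], name: str) -> None:
    """Raise if `value` is not a dict holding a non-empty string for every key."""
    if not isinstance(value, dict):
        raise TypeError(f"`{name}` must be a dictionary.")
    for key in keys:
        if not isinstance(value.get(key), str) or not value[key]:
            raise TypeError(f"`{name}` must contain a non-empty string `{key}`.")


def resolve_join_on(
    span: Optional[Dict[str, str]] = None,
    span_with_tag_value: Optional[Dict[str, str]] = None,
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: validate the join arguments and return join info."""
    # Exactly one of the two join options may be supplied.
    if (span is None) == (span_with_tag_value is None):
        raise ValueError("Specify exactly one of `span` or `span_with_tag_value`.")
    if span is not None:
        _require_str_keys(span, ("span_id", "trace_id"), "span")
        return {"span": {"span_id": span["span_id"], "trace_id": span["trace_id"]}}
    _require_str_keys(span_with_tag_value, ("tag_key", "tag_value"), "span_with_tag_value")
    # Assumed shape for tag-based joins; not confirmed by this commit's cassette.
    return {
        "tag": {
            "key": span_with_tag_value["tag_key"],
            "value": span_with_tag_value["tag_value"],
        }
    }
```

Raising early on malformed input, rather than logging and returning, matches the behavior change listed above for the new method.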
Showing 15 changed files with 648 additions and 162 deletions.
17 changes: 17 additions & 0 deletions
releasenotes/notes/submit-evaluation-for-01096d803d969e3e.yaml
@@ -0,0 +1,17 @@
+---
+features:
+  - |
+    LLM Observability: This introduces the `LLMObs.submit_evaluation_for` method, which provides the ability to join a custom evaluation
+    to a span using a tag key-value pair on the span. The tag key-value pair is expected to uniquely identify a single span.
+    Tag-based joining is an alternative to the existing method of joining evaluations to spans using trace and span IDs.
+    Example usage:
+    - Evaluation joined by tag: `LLMObs.submit_evaluation_for(span_with_tag_value={"tag_key": "message_id", "tag_value": "dummy_message_id"}, label="rating", ...)`.
+    - Evaluation joined by trace/span ID: `LLMObs.submit_evaluation_for(span={"trace_id": "...", "span_id": "..."}, label="rating", ...)`.
+deprecations:
+  - |
+    LLM Observability: `LLMObs.submit_evaluation` is deprecated and will be removed in ddtrace 3.0.0.
+    As an alternative to `LLMObs.submit_evaluation`, you can use `LLMObs.submit_evaluation_for` instead.
+    To migrate, replace `LLMObs.submit_evaluation(span_context={"span_id": ..., "trace_id": ...}, ...)` with:
+    `LLMObs.submit_evaluation_for(span={"span_id": ..., "trace_id": ...}, ...)`
+    You may also join an evaluation to a span using a tag key-value pair like so:
+    `LLMObs.submit_evaluation_for(span_with_tag_value={"tag_key": ..., "tag_value": ...}, ...)`.
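The migration described in the deprecation note amounts to renaming one keyword argument and dropping another. A minimal sketch, assuming existing call sites pass keyword arguments that can be collected in a dictionary: `migrate_submit_evaluation_kwargs` is a hypothetical helper, not part of ddtrace, and it assumes that arguments other than `span_context` and `metadata` carry over unchanged.

```python
from typing import Any, Dict


def migrate_submit_evaluation_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: reshape `submit_evaluation` kwargs for `submit_evaluation_for`."""
    new_kwargs = dict(old_kwargs)  # copy so the caller's dictionary is left untouched
    # The deprecated `span_context` argument is replaced by `span`.
    if "span_context" in new_kwargs:
        new_kwargs["span"] = new_kwargs.pop("span_context")
    # The commit message states that the new method removes the `metadata` argument.
    new_kwargs.pop("metadata", None)
    return new_kwargs
```

With ddtrace installed, the result could then be passed along as `LLMObs.submit_evaluation_for(**migrate_submit_evaluation_kwargs(old_kwargs))`.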
15 changes: 8 additions & 7 deletions
...lmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.send_score_metric.yaml
15 changes: 8 additions & 7 deletions
...s_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_categorical_metric.yaml
@@ -1,15 +1,16 @@
 interactions:
 - request:
-    body: '{"data": {"type": "evaluation_metric", "attributes": {"metrics": [{"span_id":
-      "12345678901", "trace_id": "98765432101", "metric_type": "categorical", "categorical_value":
-      "very", "label": "toxicity", "ml_app": "dummy-ml-app", "timestamp_ms": 1724249500253}]}}}'
+    body: '{"data": {"type": "evaluation_metric", "attributes": {"metrics": [{"join_on":
+      {"span": {"span_id": "12345678901", "trace_id": "98765432101"}}, "metric_type":
+      "categorical", "categorical_value": "very", "label": "toxicity", "ml_app": "dummy-ml-app",
+      "timestamp_ms": 1732568297307}]}}}'
     headers:
       Content-Type:
       - application/json
       DD-API-KEY:
       - XXXXXX
     method: POST
-    uri: https://api.datad0g.com/api/intake/llm-obs/v1/eval-metric
+    uri: https://api.datad0g.com/api/intake/llm-obs/v2/eval-metric
   response:
     body:
       string: '{"status":"error","code":403,"errors":["Forbidden"],"statuspage":"http://status.datadoghq.com","twitter":"http://twitter.com/datadogops","email":"[email protected]"}'
@@ -21,7 +22,7 @@ interactions:
       content-type:
       - application/json
       date:
-      - Wed, 21 Aug 2024 14:11:40 GMT
+      - Mon, 25 Nov 2024 20:58:17 GMT
       strict-transport-security:
       - max-age=31536000; includeSubDomains; preload
       x-content-type-options:
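The request-body change in this cassette (top-level `span_id` and `trace_id` in `v1`, a nested `join_on` object in `v2`) can be sketched as a small transformation. The function names `to_v2_metric` and `build_request_body` are hypothetical and this is not the eval metric writer's actual code; the field names and the envelope structure are taken from the recorded request bodies above.

```python
import json
from typing import Any, Dict, List


def to_v2_metric(v1_metric: Dict[str, Any]) -> Dict[str, Any]:
    """Move the top-level span and trace IDs of a v1 metric under `join_on`."""
    rest = dict(v1_metric)  # copy so the input is not mutated
    span_id = rest.pop("span_id")
    trace_id = rest.pop("trace_id")
    return {"join_on": {"span": {"span_id": span_id, "trace_id": trace_id}}, **rest}


def build_request_body(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap metrics in the envelope seen in the recorded request body."""
    return {"data": {"type": "evaluation_metric", "attributes": {"metrics": metrics}}}


if __name__ == "__main__":
    v1_metric = {
        "span_id": "12345678901",
        "trace_id": "98765432101",
        "metric_type": "categorical",
        "categorical_value": "very",
        "label": "toxicity",
        "ml_app": "dummy-ml-app",
        "timestamp_ms": 1732568297307,
    }
    print(json.dumps(build_request_body([to_v2_metric(v1_metric)])))
```

Nesting the identifiers under `join_on` leaves room for other join keys, such as the tag-based join introduced in this commit, without adding more top-level fields.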