feat(openai): add token usage stream options to request #11606
Conversation
Benchmarks: Benchmark execution time: 2025-01-08 22:33:48. Comparing candidate commit 2da832f in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.
Datadog Report: Branch report: ✅ 0 Failed, 87 Passed, 1468 Skipped, 4m 10.11s total duration (35m 16.92s time saved).
This LGTM. I tested it out with a simple dummy script. It looks like everything is working as expected (e.g. we return the usage chunk when the user requests it, but extract it out under the hood when they leave out that option).
Just curious, why do we want to avoid requesting the usage information from OpenAI if the user explicitly sets include_usage to False? If we do the extraction under the hood, isn't this a benign change on our end?
You're not wrong that this is strictly a benign change on our part, but the overriding principle here is to minimize disrupting user applications where possible. If a user decides to explicitly reject token usage from OpenAI, then regardless of how strange that use case is (also likely a very rare use case IMO), we should respect that - it's one thing for us to infer/add kwargs, but it's another for us to set a user-provided kwarg to the opposite value specified by the user. This way, we'll be able to keep most people happy, just with the explicit caveat that users in this specific case shouldn't expect 100% accurate token counts. Does that make sense @ncybul?
Just one question, otherwise looks good to me!
well done 👏
This PR adds special casing such that any user's OpenAI streamed chat/completion request will, unless explicitly specified otherwise, include token usage as part of the streamed response by default.
Motivation
OpenAI streamed responses have historically not provided token usage details as part of the streamed response. However, earlier this year OpenAI added a `stream_options: {"include_usage": True}` kwarg option to the chat/completions API, which provides token usage details in an additional stream chunk at the end of the streamed response. If this kwarg option is not specified by the user, token usage is not provided by OpenAI, and our current behavior is to make a best effort to 1) use the `tiktoken` library to calculate token counts, or 2) use a very crude heuristic to estimate token counts. Neither is ideal, as neither alternative takes function/tool calling into account. It is simpler and more accurate to just request the token counts from OpenAI directly.
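For reference, here is a minimal sketch (not part of this PR) of how a caller opts into usage reporting with the OpenAI Python client; the model name and messages are placeholders. With `include_usage` set, the final chunk carries the aggregate token counts and has no message content:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.usage is not None:
        # Final chunk: no content, only aggregate token counts.
        print(chunk.usage.prompt_tokens, chunk.usage.completion_tokens, chunk.usage.total_tokens)
    elif chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
```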
Proposed design
There are two major components to this feature (a rough sketch follows this list):
1. If the user does not specify `stream_options: {"include_usage": True}` as a kwarg on the chat/completions call, we need to manually insert it into the kwargs before the request is made.
2. If the user does not specify `stream_options: {"include_usage": True}` as a kwarg on the chat/completions call but we add that option on the integration side, the returned streamed response will include an additional chunk (with empty content) at the end containing token usage information. To avoid disrupting user applications with one more chunk (with different content/fields) than expected, the integration should automatically extract that last chunk under the hood.

Note: if a user does explicitly specify `stream_options: {"include_usage": False}`, then we must respect their intent and avoid adding token usage to the kwargs. We'll note in the release note that we cannot guarantee 100% accurate token counts in this case.
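A rough sketch of how these two pieces could fit together; the function names, the `added_by_integration` flag, and the metric names are illustrative assumptions rather than the integration's actual code:

```python
def _inject_stream_options(kwargs):
    # Component 1: only add the option when the user has not expressed a preference.
    if kwargs.get("stream") and "stream_options" not in kwargs:
        kwargs["stream_options"] = {"include_usage": True}
        return True  # we added it, so the extra usage chunk must be hidden later
    return False


def _filter_usage_chunks(chunks, added_by_integration, span):
    # Component 2: record token usage on the span, but only swallow the extra
    # chunk if the integration (not the user) requested it.
    for chunk in chunks:
        usage = getattr(chunk, "usage", None)
        if usage is not None:
            # Metric names here are placeholders for whatever the integration records.
            span.set_metric("openai.response.usage.prompt_tokens", usage.prompt_tokens)
            span.set_metric("openai.response.usage.completion_tokens", usage.completion_tokens)
            if added_by_integration:
                continue  # hide the chunk the user did not ask for
        yield chunk
```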
Streamed reading logic change
Additionally, we make a change to the `__iter__`/`__aiter__` methods of our traced streamed responses. Previously we returned the traced streamed response (and relied on the underlying `__next__`/`__anext__` methods), but to ensure spans are finished even if the streamed response is not fully consumed, we change the `__iter__`/`__aiter__` methods to implement the stream consumption using a try/except/finally (see the sketch below).

Note: this only applies to `__iter__()`/`__aiter__()`, since directly calling `__next__()`/`__anext__()` individually will not let us know when the overall response is fully consumed.

Note: if users consume the stream via `__aiter__()` and break early, they are still responsible for calling `resp.close()`, since asynchronous generators do not automatically close when the context manager is exited (this is held until `close()` is called, either manually or by the garbage collector).
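A simplified sketch of what the reworked `__iter__` could look like (the async variant is analogous); the `TracedStream` class name is illustrative and the real tracer code differs in detail, but the key point is that the span is finished in `finally` regardless of how iteration ends:

```python
import sys


class TracedStream:
    """Illustrative wrapper around a streamed response; not the actual integration class."""

    def __init__(self, wrapped, span):
        self._wrapped = wrapped
        self._span = span

    def __iter__(self):
        try:
            for chunk in self._wrapped:
                yield chunk
        except Exception:
            # Record the error on the span before propagating it.
            self._span.set_exc_info(*sys.exc_info())
            raise
        finally:
            # Runs on normal exhaustion, on error, and when the caller breaks
            # early (the generator's close() raises GeneratorExit, which also
            # lands here), so the span is always finished.
            self._span.finish()
```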
Testing
This PR modifies the existing OpenAI streamed completion/chat completion tests to be simpler (using snapshots where possible instead of making large numbers of tedious assertions) and to add coverage for the token extraction behavior (existing tests remove the `include_usage: True` option to assert that the automatic extraction works, and we add a couple of tests asserting our original behavior when `include_usage: False` is explicitly set).
Checklist
Reviewer Checklist