feat(openai): add token usage stream options to request (#11606)
This PR adds special casing so that any user's OpenAI streamed chat/completion request will, unless explicitly specified otherwise, include token usage as part of the streamed response by default.

### Motivation

OpenAI streamed responses have historically not provided token usage details as part of the streamed response. However, earlier this year OpenAI added a `stream_options: {"include_usage": True}` kwarg option to the chat/completions API that provides token usage details in an additional stream chunk at the end of the streamed response. If the user does not specify this option, OpenAI does not provide token usage, and our current behavior is a best effort to either 1) use the `tiktoken` library to calculate token counts, or 2) use a very crude heuristic to estimate them. Neither is ideal, as neither alternative takes function/tool calling into account. **It is simpler and more accurate to request the token counts from OpenAI directly.**

### Proposed design

There are two major components to this feature:

1. If a user does not specify `stream_options: {"include_usage": True}` as a kwarg on the chat/completions call, we need to manually insert it into the kwargs before the request is made.
2. If a user does not specify `stream_options: {"include_usage": True}` but we add that option on the integration side, the returned streamed response will include an additional chunk (with empty content) at the end containing token usage information. To avoid disrupting user applications with one more chunk (with different content/fields) than expected, the integration automatically extracts that last chunk under the hood.

Note: if a user does explicitly specify `stream_options: {"include_usage": False}`, we must respect their intent and avoid adding token usage into the kwargs.
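The kwarg-injection logic above can be sketched roughly as follows. This is a minimal illustration, not the actual integration code; the function name `inject_stream_options` and the `(kwargs, injected)` return shape are assumptions made for the example:

```python
def inject_stream_options(kwargs):
    """Default stream_options={"include_usage": True} for streamed requests.

    Returns (kwargs, injected), where `injected` indicates that we added the
    option ourselves and must therefore strip the extra usage chunk from the
    streamed response before handing it back to the user.
    """
    if not kwargs.get("stream"):
        # Only streamed requests get the extra usage chunk.
        return kwargs, False
    stream_options = kwargs.setdefault("stream_options", {})
    if "include_usage" in stream_options:
        # Respect the user's explicit choice, whether True or False.
        return kwargs, False
    stream_options["include_usage"] = True
    return kwargs, True
```

Returning the `injected` flag (rather than mutating silently) lets the response-wrapping code know whether the trailing usage-only chunk was user-requested or integration-added, which decides whether it should be yielded or swallowed.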
We'll note in the release note that we cannot guarantee 100% accurate token counts in this case.

### Streamed reading logic change

Additionally, we change the `__iter__/__aiter__` methods of our traced streamed responses. Previously we returned the traced streamed response and relied on the underlying `__next__/__anext__` methods, but to ensure spans are finished even if the streamed response is not fully consumed, the `__iter__/__aiter__` methods now implement the stream consumption themselves using a try/catch/finally.

Notes:

1. This only applies when users iterate via `__iter__()/__aiter__()`; directly calling `__next__()/__anext__()` individually does not let us know when the overall response is fully consumed.
2. Users who use `__aiter__()` and break early are still responsible for calling `resp.close()`, since asynchronous generators do not automatically close when the context manager is exited (they stay open until `close()` is called, either manually or by the garbage collector).

### Testing

This PR simplifies the existing OpenAI streamed completion/chat completion tests (using snapshots where possible instead of large numbers of tedious assertions) and adds coverage for the token extraction behavior: existing tests remove `include_usage: True` options to assert that the automatic extraction works, and a couple of new tests assert our original behavior when `include_usage: False` is explicitly set.
## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)