Add TTFT (time to first token) to Langfuse traces #8385
Comments
Note to self: TTFT in Langfuse is automatically calculated when … This could be done by attaching a custom stream callback to the chat generator (most likely from our LangfuseTracer), consuming the first token in that callback, and calling the generation span update, perhaps directly in the callback. How we can eventually do this in async calls as well remains to be investigated.
This was, in fact, not so hard to do. Forget the recommendation above. We simply need to timestamp the first chunk received from the LLM, and we should do this across all LLM chat generators. When streaming, we don't have data for prompt and completion tokens available. What's really interesting is that even if we set "prompt_tokens" and "completion_tokens" to 0 (see branch above), Langfuse somehow counts them correctly; I'll check with them how this is actually done. The trace is depicted below.
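A minimal sketch of what first-chunk timestamping could look like. The names (`FirstTokenTimer`, the `StreamingChunk` stand-in) are illustrative assumptions, not Haystack's actual API:

```python
import time
from dataclasses import dataclass


@dataclass
class StreamingChunk:
    """Illustrative stand-in for the chunk object a chat generator streams."""
    content: str


class FirstTokenTimer:
    """Wraps a user streaming callback and records when the first chunk arrives.

    The recorded timestamp can later be attached to the generation span so that
    Langfuse can derive TTFT. Hypothetical helper, not actual Haystack code.
    """

    def __init__(self, user_callback=None):
        self.user_callback = user_callback
        self.first_chunk_time = None  # epoch seconds of the first streamed chunk

    def __call__(self, chunk: StreamingChunk) -> None:
        # Only the very first chunk sets the timestamp; later chunks are passed through.
        if self.first_chunk_time is None:
            self.first_chunk_time = time.time()
        if self.user_callback is not None:
            self.user_callback(chunk)
```

The generator would invoke this wrapper instead of the raw callback, and the tracer would read `first_chunk_time` when closing the generation span.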
Interesting, I had never thought about this approach, but I guess it should work. So basically we store … By the way, for OpenAI and the latest versions of Azure OpenAI models, if you set …
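For reference, OpenAI's streaming Chat Completions API can report token usage when `stream_options={"include_usage": True}` is passed; the usage payload then arrives on the final streamed chunk, whose `choices` list is empty. A sketch of pulling it out of a stream, with `SimpleNamespace` standing in for the SDK's chunk objects:

```python
from types import SimpleNamespace


def usage_from_stream(chunks):
    """Return the usage payload from the last chunk that carries one, else None.

    With the real OpenAI SDK this would iterate the stream returned by
    client.chat.completions.create(..., stream=True,
                                   stream_options={"include_usage": True}).
    """
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return usage


# Simulated stream: content chunks have usage=None, the final chunk carries it.
stream = [
    SimpleNamespace(usage=None, choices=[SimpleNamespace(delta="Hi")]),
    SimpleNamespace(usage=None, choices=[SimpleNamespace(delta="!")]),
    SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=5, completion_tokens=2),
        choices=[],
    ),
]
```

This would let a streaming chat generator fill in real token counts on the Langfuse span instead of zeros.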
Aha, nice @LastRemote, that's why I collected those meta chunks, hoping that one day this would work. The change in the Langfuse tracer was minimal as well: a one-LOC change in …
I'll speak to @julian-risch about scheduling this change in the near future, but if you wish, feel free to create a PR that updates our chat generators with this change and we'll take it from there. I can review the PR and we can try out various chat generators together.
@vblagoje Okay, sure, I will make a PR.
@LastRemote Before we open a bunch of PRs or group all the changes into one PR for all chat generators, let's do one trial PR and set the standard for the other chat generators.
@vblagoje Sorry for the delayed response; I took some days off last week. Here we go: #8444. Note that I only made minimal changes, since I am not exactly sure how to enable …
By the way, I also attempted to support Anthropic (including Bedrock Anthropic) models. It seems there's a mismatch in the usage data format between the Langfuse API and Anthropic when updating the Langfuse span, which causes the operation to fail. I believe it would be better for Langfuse to provide direct support for the raw Anthropic format.
Hey @LastRemote, I'll check it out and get back to you. The most recent Anthropic Haystack release should work out of the box. I don't think Bedrock Anthropic works with Langfuse at the moment.
Is your feature request related to a problem? Please describe.
We are developing several chatbot-like applications that require streaming the response from the LLM. There are a couple of metrics to track, one of which is TTFT (time to first token): how long the user has to wait before seeing anything in the output dialog box. However, because of the way tracing spans are handled in the pipeline, the run invocation inside the component does not have direct access to the span, so we cannot log this information to the tracer.
Describe the solution you'd like
The simplest solution would be to give the component's run() method visibility into the tracing span. This could be a context variable that methods inside the component have access to, but I am not very confident about the exact approach here.
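One way the proposed context variable could be sketched with the stdlib `contextvars` module (the names `current_span` and `span_scope` are hypothetical, not Haystack's tracing API):

```python
import contextvars
from contextlib import contextmanager

# Hypothetical: holds the span that is active for the current component run,
# so code inside run() (including streaming callbacks) can reach it.
current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def span_scope(span):
    """Make `span` visible via current_span for the duration of the block."""
    token = current_span.set(span)
    try:
        yield span
    finally:
        # Restore whatever was active before, even if run() raised.
        current_span.reset(token)
```

A nice property of `contextvars` is that values propagate into asyncio tasks, which would also cover the async case mentioned earlier in the thread.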
Describe alternatives you've considered
The only temporary workaround right now is to directly manipulate the low-level tracing SDKs inside the streaming callback function, making a special callback that uploads the timestamp upon receiving the first SSE.
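That workaround might look roughly like this. `StubSpan` is a stand-in for a low-level tracing SDK span (e.g. a Langfuse generation object); only the closure pattern, not the SDK call names, is meant literally:

```python
from datetime import datetime, timezone


class StubSpan:
    """Stand-in for a low-level tracing SDK span object."""

    def __init__(self):
        self.completion_start_time = None

    def update(self, **fields):
        # A real SDK span would send these fields to the tracing backend.
        for name, value in fields.items():
            setattr(self, name, value)


def make_ttft_callback(span, inner=None):
    """Build a streaming callback that stamps `span` on the first SSE chunk."""
    seen_first = False

    def callback(chunk):
        nonlocal seen_first
        if not seen_first:
            seen_first = True
            span.update(completion_start_time=datetime.now(timezone.utc))
        if inner is not None:
            inner(chunk)

    return callback
```

The application passes `make_ttft_callback(span, user_callback)` as the generator's streaming callback; only the first chunk triggers the span update.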