
Add TTFT (time to first token) to Langfuse traces #8385

Open
LastRemote opened this issue Sep 19, 2024 · 9 comments
Labels: P2 Medium priority, add to the next sprint if no P1 available

LastRemote (Contributor) commented Sep 19, 2024

Is your feature request related to a problem? Please describe.
We are developing several chatbot-like applications that require streaming the response from the LLM. There are a couple of metrics to look at, one of which is TTFT (time to first token): how long the user has to wait before seeing anything in the output dialog box. However, due to the way tracing spans are handled in the pipeline, the run invocation inside the component does not have direct access to the span, so we are unable to log this information to the tracer.

Describe the solution you'd like
The simplest solution would be to make the tracing span visible from the component's run() method. This could be a context variable that methods inside the component have access to, but I am not confident about the exact approach here.
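
For illustration, one possible shape for this, assuming Python's contextvars; the variable name current_span and the set_tag call are hypothetical, not existing Haystack API:

    import contextvars

    # set by the tracer before calling component.run(), reset afterwards
    current_span = contextvars.ContextVar("current_span", default=None)

    # inside a component's run() method:
    span = current_span.get()
    if span is not None:
        # record whatever the span API offers, e.g. a first-token timestamp
        span.set_tag("completion_start_time", "2024-09-19T12:00:00Z")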

Describe alternatives you've considered
The only temporary workaround right now is to manipulate the low-level tracing SDKs directly inside the streaming callback function, i.e. write a special callback that uploads the timestamp upon receiving the first SSE chunk.
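
A minimal sketch of this workaround, assuming the Langfuse Python SDK (v2-style API) and a Haystack streaming callback; the names generation and on_chunk are illustrative, not part of Haystack:

    from datetime import datetime, timezone

    from langfuse import Langfuse

    langfuse = Langfuse()  # reads the LANGFUSE_* environment variables
    generation = langfuse.generation(name="chat-completion")

    first_chunk_seen = False

    def on_chunk(chunk):  # passed as streaming_callback to the chat generator
        global first_chunk_seen
        if not first_chunk_seen:
            first_chunk_seen = True
            # Langfuse derives TTFT from completion_start_time on the generation
            generation.update(completion_start_time=datetime.now(timezone.utc))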


julian-risch added the "P2 Medium priority, add to the next sprint if no P1 available" label on Sep 19, 2024
vblagoje (Member) commented Sep 23, 2024

Note to self:

TTFT in Langfuse is calculated automatically when completion_start_time (the timestamp of the first token) is provided on the generation span, i.e. just call update on the generation span with completion_start_time=datetime.now().

This could be done by attaching a custom streaming callback to the chat generator (most likely from our LangfuseTracer), consuming the first token in the callback, and calling the generation span's update, perhaps directly in that callback.

It remains to be investigated how we can do this for async calls as well.
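
A rough sketch of this idea, assuming a Haystack chat generator that exposes a streaming_callback attribute; wrap_streaming_callback is a hypothetical helper, not existing tracer code:

    from datetime import datetime, timezone

    def wrap_streaming_callback(original_callback, generation_span):
        state = {"first": True}

        def wrapped(chunk):
            # timestamp only the very first streamed chunk
            if state["first"]:
                state["first"] = False
                generation_span.update(
                    completion_start_time=datetime.now(timezone.utc)
                )
            if original_callback:
                original_callback(chunk)

        return wrapped

    # e.g. inside LangfuseTracer, before invoking the generator:
    # generator.streaming_callback = wrap_streaming_callback(
    #     generator.streaming_callback, span._span)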

vblagoje (Member) commented Sep 26, 2024

@LastRemote and @julian-risch

This was, in fact, not so hard to do. Forget the recommendation above. We simply need to timestamp the first chunk received from the LLM, and we should do this across all LLM chat generators. When streaming, we don't have data for prompt and completion tokens available, and what's really interesting is that even if we set "prompt_tokens" and "completion_tokens" to 0 (see branch above), Langfuse somehow counts them correctly. I'll check with them how this is actually done. The trace is depicted below.

[Image: Langfuse trace]
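
A sketch of what "timestamp the first chunk" could look like inside a chat generator; the meta layout mirrors the tracer change discussed below and is an assumption, not the merged implementation:

    from datetime import datetime, timezone

    def stream_and_collect(chunk_stream):
        first_chunk_time = None
        chunks = []
        for chunk in chunk_stream:
            if first_chunk_time is None:
                first_chunk_time = datetime.now(timezone.utc)
            chunks.append(chunk)
        meta = {
            "usage": {
                # read by the Langfuse tracer as completion_start_time
                "completion_start_time": (
                    first_chunk_time.isoformat() if first_chunk_time else None
                ),
            }
        }
        return chunks, meta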

LastRemote (Contributor, Author) commented:

> This was, in fact, not so hard to do. Forget the recommendation above. We simply need to timestamp the first chunk received from the LLM. […]

Interesting, I had never thought about this approach, but I guess it should work. So basically we store completion_first_chunk as part of the usage meta so it can be accessed inside haystack.component.output?

By the way, for OpenAI and the latest versions of the Azure OpenAI models, if you set stream_options accordingly, the last streaming chunk will include the actual usage data. Additionally, Langfuse will automatically count the usage tokens if the model is from OpenAI or Claude (see screenshot below).

[Screenshot: Langfuse usage view, 2024-09-26]
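
For reference, this is how the last-chunk usage behavior works with the official openai Python SDK (v1.x); the model name is just an example:

    from openai import OpenAI

    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
        stream_options={"include_usage": True},
    )
    usage = None
    for chunk in stream:
        if chunk.usage is not None:  # only the final chunk carries usage
            usage = chunk.usage
    print(usage)  # prompt_tokens=..., completion_tokens=..., total_tokens=...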

vblagoje (Member) commented:

Aha, nice @LastRemote, that's why I collected those meta chunks, hoping that one day this would work. The change in the Langfuse tracer was minimal as well: a one-LOC change in langfuse/tracer.py at ~line 151. We need to:

    span._span.update(
        usage=meta.get("usage") or None,
        model=meta.get("model"),
        completion_start_time=meta.get("usage", {}).get("completion_start_time"),
    )

I'll speak to @julian-risch about scheduling this change in the near future, but if you wish, feel free to create a PR that updates our chat generators with this change and we'll take it from there; I can review the PR, and we can try out the various chat generators together.

LastRemote (Contributor, Author) commented:

@vblagoje Okay sure, I will make a PR.

vblagoje changed the title from "Access to tracing span during component run invocation" to "Add TTFT (time to first token) to Langfuse traces" on Sep 30, 2024
vblagoje (Member) commented:

@LastRemote before we open a bunch of PRs or group all the changes into one PR for all chat generators, let's do one trial PR and set the standard for the other chat generators.

LastRemote (Contributor, Author) commented:

@vblagoje Sorry for the delayed response. I took some days off last week. Here we go: #8444

Note that I only made minimal changes, since I am not exactly sure how to enable include_usage in the OpenAI SDK. I have a customized OpenAI implementation based on httpx that works, but that would require a complete refactor.

LastRemote (Contributor, Author) commented Oct 8, 2024

By the way, I also attempted to support Anthropic (including Bedrock Anthropic) models. It seems there's a mismatch in the usage data format between the Langfuse API and Anthropic when updating the Langfuse span, which causes the operation to fail. I believe it would be better for Langfuse to provide direct support for the raw Anthropic format.
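
A hedged sketch of a possible workaround for this mismatch: map Anthropic's usage keys onto a shape Langfuse accepts. The Anthropic side ({"input_tokens", "output_tokens"}) comes from their API; the target keys ({"input", "output", "total"}) are an assumption based on Langfuse's generic usage schema:

    def anthropic_usage_to_langfuse(usage: dict) -> dict:
        # Anthropic reports e.g. {"input_tokens": 12, "output_tokens": 34}
        input_tokens = usage.get("input_tokens", 0)
        output_tokens = usage.get("output_tokens", 0)
        return {
            "input": input_tokens,
            "output": output_tokens,
            "total": input_tokens + output_tokens,
        }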

vblagoje (Member) commented Oct 8, 2024

Hey @LastRemote, I'll check it out and get back to you. The most recent Anthropic Haystack release should work out of the box. I don't think Bedrock Anthropic works with Langfuse at the moment.
