fix(vertex_ai/gemini): improve chunk parsing for streaming responses #8401

Open
wants to merge 3 commits into base: main

Conversation

miraclebakelaser
Contributor

This PR fixes bug #8143 by updating the vertex_ai streaming chunk parsing logic to remove only a leading data: prefix.

Relevant issues

#8143

Type

🐛 Bug Fix

Changes

    def _common_chunk_parsing_logic(self, chunk: str) -> GenericStreamingChunk:
        try:
-            chunk = chunk.replace("data:", "")
+            chunk = chunk.strip()
+            if chunk.startswith("data:"):
+                chunk = chunk[len("data:"):].strip()
             if len(chunk) > 0:
                 """
                 Check if initial chunk valid json
                 - if partial json -> enter accumulated json logic
                 - if valid - continue
                 """
                 if self.chunk_type == "valid_json":
                     return self.handle_valid_json_chunk(chunk=chunk)
                 elif self.chunk_type == "accumulated_json":
                     return self.handle_accumulated_json_chunk(chunk=chunk)

Fix incorrect removal of "data:" strings within response content.
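
For illustration only (not part of the PR diff), a minimal sketch of the behavioral difference on a made-up raw SSE chunk whose model text itself contains the string "data:":

# Old behavior: replace() strips every occurrence of "data:",
# including occurrences inside the model's own text.
raw = 'data: {"candidates": [{"content": {"parts": [{"text": "data: line 1"}]}}]}'
old = raw.replace("data:", "")
# -> ' {"candidates": [{"content": {"parts": [{"text": " line 1"}]}}]}'

# New behavior: remove only the leading SSE "data:" prefix.
new = raw.strip()
if new.startswith("data:"):
    new = new[len("data:"):].strip()
# -> '{"candidates": [{"content": {"parts": [{"text": "data: line 1"}]}}]}'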

Contributor

@ishaan-jaff left a comment


please add a test and / or unit test

-            chunk = chunk.replace("data:", "")
+            chunk = chunk.strip()
+            if chunk.startswith("data:"):
+                chunk = chunk[len("data:"):].strip()
Contributor


Is the .strip() needed? Wouldn't this remove a leading or trailing space that actually came from the LLM?

Contributor Author


In the case where self.chunk_type == "valid_json", the strip wouldn't have any effect on the LLM message content itself (added testing for it). I updated the handling to be more precise.

Enhance test coverage for ModelResponseIterator's chunk parsing logic, focusing on preserving 'data:' prefixes and surrounding spaces in Vertex AI and Google Studio Gemini streaming responses
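
For reference, a minimal sketch of such a preservation check (the helper below is hypothetical; the PR's actual tests exercise ModelResponseIterator directly, whose setup is not shown here):

import json

def strip_sse_data_prefix(chunk: str) -> str:
    # Hypothetical standalone helper mirroring the parsing change in this PR:
    # drop only a leading "data:" prefix, never occurrences inside the payload.
    chunk = chunk.strip()
    if chunk.startswith("data:"):
        chunk = chunk[len("data:"):].strip()
    return chunk

def test_data_prefix_inside_text_is_preserved():
    raw = 'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "data: line 1\\ndata: line 2"}]}}]}'
    parsed = json.loads(strip_sse_data_prefix(raw))
    text = parsed["candidates"][0]["content"]["parts"][0]["text"]
    assert text == "data: line 1\ndata: line 2"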
@miraclebakelaser
Contributor Author

please add a test and / or unit test

Added tests that cover cases where self.chunk_type == "valid_json". How are chunks formatted when self.chunk_type == "accumulated_json"?

I understand that in the case of self.chunk_type == "valid_json", the messages arrive in event stream format and are chunked by message:

chunk 1:
data: {"candidates": [{"content": {"role": "model","parts": [{"text": "data"}]}}],"usageMetadata": {},"modelVersion": "gemini-2.0-flash-exp","createTime": "2025-01-01T00:00:00.000000Z","responseId": "12345"}

chunk 2:
data: {"candidates": [{"content": {"role": "model","parts": [{"text": ": line "}]}}],"modelVersion": "gemini-2.0-flash-exp","createTime": "2025-01-01T00:00:00.000000Z","responseId": "12345"}

chunk 3:
data: {"candidates": [{"content": {"role": "model","parts": [{"text": "1\\ndata: line 2\\ndata: line 3\\ndata:"}]}}],"modelVersion": "gemini-2.0-flash-exp","createTime": "2025-01-01T00:00:00.000000Z","responseId": "12345"}

chunk 4:
data: {"candidates": [{"content": {"role": "model","parts": [{"text": " line 4\\n"}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 42,"candidatesTokenCount": 24,"totalTokenCount": 66,"promptTokensDetails": [{"modality": "TEXT","tokenCount": 42}],"candidatesTokensDetails": [{"modality": "TEXT","tokenCount": 24}]},"modelVersion": "gemini-2.0-flash-exp","createTime": "2025-01-01T00:00:00.000000Z","responseId": "12345"}

When self.chunk_type == "accumulated_json", do chunks have the same format as above (i.e. data: ...), or are they something like a JSON string split into pieces, like the following:

chunk 1:
{"candidates": [{"content": {"role": "model","parts

chunk 2:
": [{"text": "```"}]}}],"usageMetadata": {},"modelVersion": "gemini-1.5-flash-001","createTime":

I haven't encountered a situation where the code goes down this path, so your input would help me add a test for that case too.
