fix(langchain): pydantic output parser tagging does not throw (#11652)
MLOB-1973

## What does this PR do?

Fixes #11638

Adds extra checking around the tagging of JSON-like output parsers in streamed cases. These output parsers concatenate their output for us, so we do not need to append the chunks together ourselves.

It was previously thought that the only such type was `JsonOutputParser`, whose output can be `json.dumps`'d into a string tag. However, `PydanticOutputParser` inherits from `JsonOutputParser`, and its output cannot be JSON-dumped. For that parser, we now stringify the output instead.

To keep this from throwing in the future, I've wrapped the `json.dumps` call in a `try`/`except`. I've special-cased `PydanticOutputParser` so that it does not go through the more expensive `json.dumps` exception path. These are the only two JSON-type output parsers I've seen; should more be introduced, we'll log the incompatibility and fall back to `str` instead.

## Testing

For a script like:

```python
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Set up a parser
parser = PydanticOutputParser(pydantic_object=People)

# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
        ),
        ("human", "{query}"),
    ]
).partial(format_instructions=parser.get_format_instructions())

query = "Anna is 23 years old and she is 6 feet tall"

llm = ChatOpenAI()

chain = prompt | llm | parser

for event in chain.stream({"query": query}):
    print(event)
```

the output is tagged as follows on APM spans:

<img width="530" alt="Screenshot 2024-12-10 at 12 04 57 PM" src="https://github.com/user-attachments/assets/623b778a-bd6b-45ef-b56c-d0c3d2ada9ae">

and on LLMObs spans:

<img width="714" alt="Screenshot 2024-12-10 at 12 05 17 PM" src="https://github.com/user-attachments/assets/77bce189-a4ce-40a6-bd9f-fd2a189184f5">

without throwing errors.

## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

(cherry picked from commit 9ba734f)
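The fallback logic described in this PR can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual ddtrace code: `format_streamed_output` is a hypothetical helper name, and `JsonOutputParser` / `PydanticOutputParser` here are local stand-ins that mirror only the inheritance relationship of the real `langchain_core` classes.

```python
import json
import logging

log = logging.getLogger(__name__)


# Hypothetical stand-ins for the langchain_core parsers. Only the fact that
# PydanticOutputParser subclasses JsonOutputParser matters for this sketch.
class JsonOutputParser:
    pass


class PydanticOutputParser(JsonOutputParser):
    pass


def format_streamed_output(parser, output):
    """Hypothetical helper: turn the final output of a JSON-like parser into a string tag.

    JSON-like parsers already accumulate their streamed output, so `output`
    is the complete result and no chunk concatenation is needed here.
    """
    # Check the subclass first: Pydantic models are not JSON-serializable,
    # so stringify directly instead of paying for a json.dumps that raises.
    if isinstance(parser, PydanticOutputParser):
        return str(output)
    try:
        return json.dumps(output)
    except (TypeError, ValueError):
        # Some other JSON-type parser produced non-serializable output:
        # log the incompatibility and fall back to str() instead of throwing.
        log.warning(
            "Could not JSON-serialize output of %s; falling back to str()",
            type(parser).__name__,
        )
        return str(output)
```

The `isinstance` check for `PydanticOutputParser` has to come before the generic path because it is a subclass of `JsonOutputParser`; the `try`/`except` then only guards against JSON-type parsers that are not known ahead of time.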