Support structured outputs response format based on signature in JSON adapter #1881

dbczumar · 2024-12-03T07:48:57Z

Support structured outputs response format based on signature in JSON adapter

Signed-off-by: dbczumar <[email protected]>

dbczumar · 2024-12-03T07:49:54Z

dspy/adapters/json_adapter.py

+        A Pydantic model representing the `response_format` parameter for the LM request.
+    """
+
+    def filter_json_schema_extra(field_name: str, field_info: FieldInfo) -> FieldInfo:


This needs test coverage

dbczumar · 2024-12-03T07:51:12Z

dspy/adapters/json_adapter.py

+                        "Failed to obtain response using signature-based structured outputs"
+                        " response format: Falling back to default 'json_object' response format."
+                        " Exception: {e}"
+                    )


We expect to hit this case for tuples, until there's support for prefixItems in the OpenAI structured outputs API. Other vendors, e.g. Databricks, will likely lag even further behind (e.g. Databricks doesn't support anyOf currently, but OpenAI does), meaning that we could hit this case for additional output types

Signed-off-by: dbczumar <[email protected]>

dbczumar · 2024-12-03T07:53:27Z

tests/reliability/utils.py

@@ -31,7 +31,6 @@ def assert_program_output_correct(
        grading_guidelines = [grading_guidelines]

    with judge_dspy_configuration():
-        print("GUIDELINES", grading_guidelines)


Removing a leftover & unintentional debugging statement from test generation code

dbczumar · 2024-12-03T07:54:21Z

tests/reliability/test_pydantic_models.py

    )
    assert answer.certainty >= 0
    assert answer.certainty <= 1
    assert len(answer.comments) >= 2


-def test_color_classification_using_enum():
+@pytest.mark.parametrize("module", [dspy.Predict, dspy.ChainOfThought])


CoT fails this test case with chat and json adapters on master:

FAILED test_pydantic_models.py::test_color_classification_using_enum[llama-3.1-70b-instruct-ChainOfThought] - ValueError: Color.BLUE is not a valid name or value for the enum Color ================================================ 1 failed, 1 passed, 24 skipped, 26 deselected, 2 warnings in 0.22s

However, it passes on the PR branch :D

dbczumar · 2024-12-03T07:55:54Z

dspy/clients/lm.py

@@ -212,7 +216,7 @@ def copy(self, **kwargs):
        return new_instance


-@functools.lru_cache(maxsize=None)
+# @functools.lru_cache(maxsize=None)


This is a hack to get the implementation working end to end for test purposes (see also https://github.com/stanfordnlp/dspy/pull/1881/files#r1867211773). We need a proper fix before merge, e.g. #1862 (though it's not 100% clear to me why we need LRU caching here in the first place on top of the caching that LiteLLM is already providing)

cc @okhat I'm sure I'm missing something here - let me know if there's additional context motivating this lru_cache.

dbczumar · 2024-12-03T07:58:12Z

dspy/clients/lm.py

@@ -92,7 +92,7 @@ def __call__(self, prompt=None, messages=None, **kwargs):
            completion = cached_litellm_text_completion if cache else litellm_text_completion

        response = completion(
-            request=ujson.dumps(dict(model=self.model, messages=messages, **kwargs)),


When response_format is a pydantic model (recommended by LiteLLM), ujson.dumps() fails because pydantic models are not directly serializable using ujson.dumps(). This line diff is a temporary hack to get the implementation working end-to-end for test purposes. We need a proper solution before merge.

Hi @dbczumar, please let me know if this approach is worthy of a PR, I am happy to contribute. Custom adapter using this approach.

Signed-off-by: dbczumar <[email protected]>

dbczumar · 2024-12-10T04:14:49Z

dspy/adapters/json_adapter.py

+                try:
+                    response_format = _get_structured_outputs_response_format(signature)
+                    outputs = lm(**inputs, **lm_kwargs, response_format=response_format)
+                except Exception:


LM providers have differing levels of support for response_format fields. For example, Databricks doesn't support anyOf / allOf, but OpenAI does.

A blanket try/catch seems appropriate here to start.

dbczumar · 2024-12-10T04:15:07Z

dspy/adapters/json_adapter.py

+                    response_format = _get_structured_outputs_response_format(signature)
+                    outputs = lm(**inputs, **lm_kwargs, response_format=response_format)
+                except Exception:
+                    _logger.debug(


Debated a warning, but it seems too spammy

chenmoneygithub · 2024-12-10T06:44:21Z

dspy/adapters/json_adapter.py

+            # Recursively update fields of the nested model
+            nested_model = field_copy.annotation.__pydantic_model__
+            updated_fields = {
+                key: filter_json_schema_extra(key, value) for key, value in nested_model.__fields__.items()


Curious - why do we need recursive handling? nested_model.__fields__ should be Pydantic fields instead of DSPy fields, do they also have these DSPy internal attributes like __dspy_field_type?

In the vast majority of cases, hopefully not....

Though the following user error will silently produce this state, which is probably best to exclude from response_format because the program still runs

import dspy import pydantic class Obj(pydantic.BaseModel): a: int = dspy.OutputField() b: str class MySig(dspy.Signature): inp: str = dspy.InputField() outp: Obj = dspy.OutputField() print(MySig.schema())

{'$defs': {'Obj': {'properties': {'a': {'__dspy_field_type': 'output', 'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'string'}}, 'required': ['a', 'b'], 'title': 'Obj', 'type': 'object'}}, 'description': 'Given the fields `inp`, produce the fields `outp`.', 'properties': {'inp': {'__dspy_field_type': 'input', 'desc': '${inp}', 'prefix': 'Inp:', 'title': 'Inp', 'type': 'string'}, 'outp': {'$ref': '#/$defs/Obj', '__dspy_field_type': 'output', 'desc': '${outp}', 'prefix': 'Outp:'}}, 'required': ['inp', 'outp'], 'title': 'MySig', 'type': 'object'}

… adapter (#1881) * Fix Signed-off-by: dbczumar <[email protected]> * Fix Signed-off-by: dbczumar <[email protected]> * fix Signed-off-by: dbczumar <[email protected]> * Debug Signed-off-by: dbczumar <[email protected]> * Fix Signed-off-by: dbczumar <[email protected]> * Here Signed-off-by: dbczumar <[email protected]> * Here Signed-off-by: dbczumar <[email protected]> * Update json_adapter.py * Update json_adapter.py * Update json_adapter.py * Update json_adapter.py --------- Signed-off-by: dbczumar <[email protected]>

Fix

7a1a84f

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Dec 3, 2024

View reviewed changes

Fix

b7dfbb8

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Dec 3, 2024

View reviewed changes

dbczumar changed the title ~~[WIP] Support structured outputs response format based on signature in JSON adapter~~ Support structured outputs response format based on signature in JSON adapter Dec 10, 2024

dbczumar added 4 commits December 9, 2024 19:24

fix

ed9d504

Signed-off-by: dbczumar <[email protected]>

fix

6e01b2e

Signed-off-by: dbczumar <[email protected]>

Debug

7c0e03b

Signed-off-by: dbczumar <[email protected]>

Fix

5af146d

Signed-off-by: dbczumar <[email protected]>

dbczumar force-pushed the structured_outputs_pr branch from 5bfba54 to 5af146d Compare December 10, 2024 03:56

dbczumar marked this pull request as ready for review December 10, 2024 03:57

dbczumar and others added 6 commits December 9, 2024 19:58

Here

6007f61

Signed-off-by: dbczumar <[email protected]>

Here

68d8877

Signed-off-by: dbczumar <[email protected]>

Update json_adapter.py

b87cf96

Update json_adapter.py

90dc353

Update json_adapter.py

40dee38

Update json_adapter.py

80f9f34

dbczumar commented Dec 10, 2024

View reviewed changes

chenmoneygithub reviewed Dec 10, 2024

View reviewed changes

okhat merged commit f1fc6bc into stanfordnlp:main Dec 10, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support structured outputs response format based on signature in JSON adapter #1881

Support structured outputs response format based on signature in JSON adapter #1881

dbczumar commented Dec 3, 2024

dbczumar Dec 3, 2024

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024 •

edited

Loading

rohitgarud Dec 8, 2024 •

edited

Loading

dbczumar Dec 10, 2024

dbczumar Dec 10, 2024

chenmoneygithub Dec 10, 2024

dbczumar Dec 10, 2024 •

edited

Loading

Support structured outputs response format based on signature in JSON adapter #1881

Support structured outputs response format based on signature in JSON adapter #1881

Conversation

dbczumar commented Dec 3, 2024

dbczumar Dec 3, 2024

Choose a reason for hiding this comment

dbczumar Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

dbczumar Dec 3, 2024

Choose a reason for hiding this comment

dbczumar Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

dbczumar Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

dbczumar Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

dbczumar Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

rohitgarud Dec 8, 2024 • edited Loading

Choose a reason for hiding this comment

dbczumar Dec 10, 2024

Choose a reason for hiding this comment

dbczumar Dec 10, 2024

Choose a reason for hiding this comment

chenmoneygithub Dec 10, 2024

Choose a reason for hiding this comment

dbczumar Dec 10, 2024 • edited Loading

Choose a reason for hiding this comment

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024 •

edited

Loading

dbczumar Dec 3, 2024 •

edited

Loading

rohitgarud Dec 8, 2024 •

edited

Loading

dbczumar Dec 10, 2024 •

edited

Loading