Extend our eval library / workflow code to support LLM-as-judge #862

aittalam · 2025-02-13T12:40:17Z

See rationale here.

The goal of this task is to have all the code in place to run a pipeline that provides LLM-as-judge evaluation of a given dataset. This means following the design we aligned on and extending the eval library with an aggregate method, e.g. to get a mean score from models.
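To make the "aggregate method" idea concrete, here is a minimal sketch of per-sample judge scoring followed by a mean. It is not the eval library's actual API: `Judge`, `evaluate_with_judge`, and `toy_judge` are hypothetical names, and the judge is a stand-in function rather than a real LLM call.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple, Union

# Hypothetical judge signature (an assumption, not the library's API):
# takes the input text and a model's output, returns a numeric score.
Judge = Callable[[str, str], float]


def evaluate_with_judge(
    samples: Iterable[Tuple[str, str]], judge: Judge
) -> Dict[str, Union[List[float], float]]:
    """Score each (input, output) pair with the judge, then aggregate."""
    scores = [judge(inp, out) for inp, out in samples]
    if not scores:
        raise ValueError("no samples to evaluate")
    # The aggregate step: keep per-sample scores and report their mean.
    return {"scores": scores, "mean": mean(scores)}


def toy_judge(inp: str, out: str) -> float:
    # Stand-in for an LLM call: 1.0 for a non-empty answer, else 0.0.
    return 1.0 if out.strip() else 0.0


result = evaluate_with_judge(
    [("Summarize A", "A short summary."), ("Summarize B", "")],
    toy_judge,
)
print(result["mean"])
```

Keeping the per-sample scores next to the aggregate is one possible design choice; it lets other aggregates (median, per-model breakdowns) be added later without re-running the judge.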

github-project-automation bot added this to Lumigator Public Roadmap H1 2025 Feb 13, 2025

aittalam self-assigned this Feb 13, 2025

ividal added the backend and api (Changes which impact API/presentation layer) labels Feb 13, 2025
