Align on architecture for running LLM-as-judge evaluation #861

aittalam · 2025-02-13T12:37:48Z

See rationale here.

The main goal of this task is to align on how we want to implement this new evaluation: either as a new workflow that calls inference, then LLM judge inference, then evaluation, or by having evaluation itself call LLM judge inference.
The deliverable should be a diagram explaining how we will run this new feature in Lumigator.
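
To make the two candidate designs concrete, here is a minimal sketch of how they differ in where the judge call lives. Every name in it (`Model`, `Judge`, `score`, `workflow_option`, `evaluation_option`) is a hypothetical placeholder, not an existing Lumigator API, and the mean-score aggregation is only an assumed stand-in for whatever the real evaluation would compute.

```python
from statistics import mean
from typing import Callable, List

# All names below are hypothetical placeholders, not Lumigator APIs.
Model = Callable[[str], str]         # prompt -> model answer
Judge = Callable[[str, str], float]  # (prompt, answer) -> judge score


def score(judgments: List[float]) -> float:
    """Evaluation step that only aggregates precomputed judge scores."""
    return mean(judgments)


def workflow_option(prompts: List[str], model: Model, judge: Judge) -> float:
    """Option A: a workflow orchestrates three separate steps."""
    answers = [model(p) for p in prompts]                        # inference
    judgments = [judge(p, a) for p, a in zip(prompts, answers)]  # LLM judge inference
    return score(judgments)                                      # evaluation


def evaluation_option(prompts: List[str], answers: List[str], judge: Judge) -> float:
    """Option B: the evaluation step itself calls the LLM judge."""
    return mean(judge(p, a) for p, a in zip(prompts, answers))
```

In this sketch, option A keeps evaluation free of any model calls, so judge outputs could be stored and reused as an ordinary intermediate result, while option B hides the judge behind the evaluation step at the cost of making evaluation depend on an inference backend. Which trade-off fits Lumigator is exactly what the diagram should settle.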

github-project-automation bot added this to Lumigator Public Roadmap H1 2025 Feb 13, 2025

aittalam changed the title ~~align on architecture for running both the evaluation with LLM as judge and the evaluation of judges themselves~~ Align on architecture for running both the evaluation with LLM as judge and the evaluation of judges themselves Feb 13, 2025

aittalam changed the title ~~Align on architecture for running both the evaluation with LLM as judge and the evaluation of judges themselves~~ Align on architecture for running LLM-as-judge evaluation Feb 13, 2025

aittalam self-assigned this Feb 13, 2025

ividal added the backend and api (Changes which impact API/presentation layer) labels Feb 13, 2025
