Use valid JSON booleans in DAG template #1349

Spectavi · 2025-02-08T06:47:40Z

This capitalized True was confusing my model, causing it to sometimes send invalid JSON.
Changing this fixed the problem.

penguine-ip · 2025-02-08T06:55:55Z

Hey @Spectavi, thanks for the PR! I was wondering: will this affect the unpacking of true and false into the Verdict? Python does capitalize its booleans...

Spectavi · 2025-02-08T07:00:47Z

Yes, it still works: json.loads() parses the lowercase JSON true and false into Python bool as expected. I saw no ill effects from this change, and it fully resolved the occasional false positive.

Spectavi · 2025-02-08T07:16:16Z

To clarify in more detail: if a model obeys that template as-is, it sends back invalid JSON, which causes json.loads() in trimAndLoadJson to fail. Most models are smart enough to “correct” the template and send back valid JSON, but once in a while, for whatever reason, a model follows the template exactly and sends back invalid JSON. I verified this by printing out jsonStr right before json.loads() is called.
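For anyone who wants to reproduce the failure outside the library, here is a minimal sketch using only the standard json module. The payload strings and the try_parse helper are made up for illustration; they are not the actual template or the trimAndLoadJson implementation.

```python
import json

# A reply that copies a Python-style boolean from the template.
# `True` is not a JSON literal, so this string is invalid JSON.
invalid_reply = '{"verdict": True, "reason": "matches the rubric"}'

# The same reply with a lowercase JSON boolean.
valid_reply = '{"verdict": true, "reason": "matches the rubric"}'


def try_parse(raw: str):
    """Hypothetical helper: return the parsed dict, or None if raw is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


parsed = try_parse(valid_reply)    # dict whose "verdict" is the Python bool True
failed = try_parse(invalid_reply)  # None, because json.loads() rejects `True`
```

The lowercase true comes back as a regular Python bool, so nothing downstream has to change.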

penguine-ip · 2025-02-08T07:25:21Z

@Spectavi got it, appreciate it!

penguine-ip · 2025-02-08T07:26:06Z

Also @Spectavi, since DAG is a pretty new thing we added, we would love to get some feedback!

Spectavi · 2025-02-08T08:13:13Z

Sure, yeah I saw it just went in so kind of expected a few bumps. Now that I have the kinks smoothed out it does seem to be more controllable than GEval, which is really nice!

I did hit a few bumps and head-scratchers; here are my notes:

- The code on the Docs page imports DAG instead of DeepAcyclicGraph, and sets dag=DAG at the bottom.
- Ability to make BinaryJudgementNode a root node, maybe? Seems like in simpler classification scenarios that is all that’s needed.
- It wasn’t clear to me why in VerdictNode the score is being divided by 10. I’m assuming that means I should be setting the positive case to be 10 instead of 1? A section that calls that out in the docs would probably be a good idea.

penguine-ip · 2025-02-09T02:11:03Z

@Spectavi right just saw the bug too on importing DAG, thanks!

Yes it makes sense for binary node to be able to be a root node, next release!

For the verdict score, we just didn't want users to have to type floats instead of ints. I'd imagine it's more error-prone if they had to type 0.1 instead of 1 (what if they type 01, 0.01, etc.?). Will include it in the docs!
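To make the arithmetic concrete, here is a rough sketch of the divide-by-10 idea. normalize_score is a hypothetical helper, not the library's code, and the 0 to 10 integer range is an assumption based on the discussion above.

```python
def normalize_score(raw_score: int) -> float:
    """Hypothetical helper: map an integer verdict score to a 0.0-1.0 float.

    Assumes users enter whole numbers from 0 to 10 (an assumption, not
    confirmed library behavior), so the best case is 10 rather than 1.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw_score, bool) or not isinstance(raw_score, int):
        raise TypeError("score must be an int")
    if not 0 <= raw_score <= 10:
        raise ValueError("score must be between 0 and 10")
    return raw_score / 10


best = normalize_score(10)  # the positive case entered as 10
low = normalize_score(1)    # entered as 1, not 0.1
```

Under that assumption, a score entered as 1 would end up as 0.1 rather than a full pass, which is the head-scratcher described above.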

Use valid JSON booleans in templates.py

5fb1817

vercel bot deployed to Preview February 8, 2025 06:48 View deployment

penguine-ip merged commit e364119 into confident-ai:main Feb 8, 2025
4 of 5 checks passed
