Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use valid JSON booleans in DAG template #1349

Merged
merged 1 commit into from
Feb 8, 2025

Conversation

Spectavi
Copy link
Contributor

@Spectavi Spectavi commented Feb 8, 2025

This capitalized True was confusing my model, causing it to sometimes send invalid JSON.
Changing this fixed the problem.

Copy link

vercel bot commented Feb 8, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
evals-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 8, 2025 6:48am

@penguine-ip
Copy link
Contributor

Hey @Spectavi thanks for the PR! I was wondering will this affect the unpacking of true and false into the Verdict ? Since python does have capitlization...

@Spectavi
Copy link
Contributor Author

Spectavi commented Feb 8, 2025

Yes, the json.loads() method parses into the Python bool as expected. I saw no ill-effects from this change and it fully resolved the occasional false-positive.

@Spectavi
Copy link
Contributor Author

Spectavi commented Feb 8, 2025

To clarify in more detail, if a model obeys that template as-is they will be sending back invalid JSON that causes json.loads() in trimAndLoadJson to fail. Most models are smart enough to “correct” the template and send back valid JSON, but once in a while for whatever reason, it will follow the template exactly and send back invalid JSON. I verified this by printing out jsonStr right before json.loads() is called.

@penguine-ip
Copy link
Contributor

@Spectavi got it, appreciate it!

@penguine-ip penguine-ip merged commit e364119 into confident-ai:main Feb 8, 2025
4 of 5 checks passed
@penguine-ip
Copy link
Contributor

Also @Spectavi since dag is pretty new thing we added, would love to get some feedback!

@Spectavi
Copy link
Contributor Author

Spectavi commented Feb 8, 2025

Sure, yeah I saw it just went in so kind of expected a few bumps. Now that I have the kinks smoothed out it does seem to be more controllable than GEval, which is really nice!

I did hit a few bumps and head-scratchers, here’s the notes:

  • The code on the Docs page imports DAG instead of DeepAcyclicGraph, and sets dag=DAG at the bottom.
  • Ability to make BinaryJudgementNode a root node, maybe? Seems like in simpler classification scenarios that is all that’s needed.
  • It wasn’t clear to me why in VerdictNode the score is being divided by 10. I’m assuming that means I should be setting the positive case to be 10 instead of 1? A section that calls that out in the docs would probably be a good idea.

@penguine-ip
Copy link
Contributor

@Spectavi right just saw the bug too on importing DAG, thanks!

Yes it makes sense for binary node to be able to be a root node, next release!

For the verdict score we just didn't want users to have to type floats instead of ints. I'd imagine its more error prone if they had to do 0.1 instead of 1 (like what if they do 01, 0.01, etc). Will include it in the docs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants