Fix tests/test_examples_run #436
Comments
I ran most of the valid examples in For example:
Replicate (skipping over some details because the output is very long)
Now computing fibonacci(17)
Ollama (skipping over some details because the output is very long)
Find a random number between 1 and 20
def fibonacci(n):
Note: the granite model used on Replicate produced incorrect code (the correct value is fib(17) = 1597, not 987). The model on Ollama produced correct results. I wonder if the difference is because
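The fib(17) value quoted in the note above can be checked independently of either model. The following is a minimal reference sketch, not the code either model generated; it assumes the common convention fibonacci(0) = 0 and fibonacci(1) = 1.

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, assuming fibonacci(0) == 0 and fibonacci(1) == 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    # After k iterations, a holds fibonacci(k).
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(fibonacci(16))  # 987
    print(fibonacci(17))  # 1597
```

Under this convention 987 is fibonacci(16), so the incorrect result may come from an off-by-one in the generated code, though that cannot be confirmed from the truncated output shown here.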
Note: Pretty similar in terms of output.
Note: Again, the outputs are similar. Next, I looked at some of the files that were labeled as Non-Deterministic.
Note: Replicate produced an elaborate explanation while Ollama was very concise.
Note: Replicate and Ollama yielded different similarity scores/metrics. I was able to test this far before reaching the Replicate API limit. Do we care about correctness for |
Describe the bug
The test test_examples_run is currently failing due to non-determinism (litellm is not passing temperature:0 to Replicate). We need to figure out whether moving away from Replicate can help (e.g. to Ollama), and how to run the nightly test as a GitHub Action (perhaps with watsonx instead).
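One way to compare backends on this point is to call each one several times with the same prompt and count how many distinct outputs come back. The sketch below is an illustration only: `generate` is a hypothetical callable that wraps whatever backend is under test (Replicate, Ollama, watsonx), and `distinct_outputs` and `is_deterministic` are names invented here, not part of this project or of litellm.

```python
from collections import Counter
from typing import Callable


def distinct_outputs(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Counter:
    """Call `generate` on the same prompt `runs` times and tally each distinct output."""
    return Counter(generate(prompt) for _ in range(runs))


def is_deterministic(generate: Callable[[str], str], prompt: str, runs: int = 5) -> bool:
    """Return True if every one of the `runs` calls produced the same output."""
    return len(distinct_outputs(generate, prompt, runs)) == 1


if __name__ == "__main__":
    import itertools

    # Stand-in backends for illustration; a real check would wrap a model call.
    def stable(prompt: str) -> str:
        return prompt.upper()

    calls = itertools.count()

    def flaky(prompt: str) -> str:
        return f"{prompt} #{next(calls)}"

    print(is_deterministic(stable, "hello"))  # True
    print(is_deterministic(flaky, "hello"))   # False
```

A backend that passes this check for a handful of prompts is not guaranteed to be deterministic, but one that fails it is a poor fit for exact-match expected outputs.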
To Reproduce
Run that test.