Various suggestions from @craigwalton-dsit:
When running evals against models with flaky APIs, errors that cause the provider's `is_rate_limit()` override to return True can leave users confused as to why their eval seems to be "stuck" indefinitely. A few observations:
1. Some provider implementations, such as google, treat several errors other than rate limits (e.g. HTTP 500, 503 and 504) as `is_rate_limit` (source). Maybe this override would be better named `is_retryable_error`.
2. AFAIK it is not obvious to users that Inspect is busy retrying, or waiting to retry, failed HTTP requests. The `inspect trace anomalies` command is helpful. It might be even more user-friendly if the "Running samples" UI somehow indicated that a sample was in a retry loop (a bit like how we show "Generating ...").
3. The "rate limits" counter in the UI only updates for actual HTTP 429 errors (not for the other errors we treat as rate limits/retryable). This might well be the right behaviour, but I wanted to highlight it as one source of confusion. May be resolved by tackling 1.
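
To make the distinction in the observations above concrete, here is a minimal sketch of separating true rate limits from other retryable errors and counting them separately. It is not Inspect's actual implementation: `ErrorKind`, `classify_status`, `RetryStats` and the status sets are hypothetical names chosen for illustration, and only the status codes (429, 500, 503, 504) come from the discussion above.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical classification of a failed HTTP request."""
    RATE_LIMIT = "rate_limit"  # the provider is throttling us
    RETRYABLE = "retryable"    # transient server-side failure
    FATAL = "fatal"            # retrying will not help


# Assumed status sets: 429 is the standard "Too Many Requests" code;
# 500, 503 and 504 are the codes mentioned in the observations above.
RATE_LIMIT_STATUSES = frozenset({429})
TRANSIENT_STATUSES = frozenset({500, 503, 504})


def classify_status(status: int) -> ErrorKind:
    """Map an HTTP status code to an ErrorKind."""
    if status in RATE_LIMIT_STATUSES:
        return ErrorKind.RATE_LIMIT
    if status in TRANSIENT_STATUSES:
        return ErrorKind.RETRYABLE
    return ErrorKind.FATAL


def is_retryable_error(status: int) -> bool:
    """True for anything worth retrying, whether or not it is a rate limit."""
    return classify_status(status) is not ErrorKind.FATAL


@dataclass
class RetryStats:
    """Separate counters so a UI could report both kinds of retry."""
    rate_limits: int = 0
    other_retries: int = 0

    def record(self, status: int) -> bool:
        """Count the error and return whether the request should be retried."""
        kind = classify_status(status)
        if kind is ErrorKind.RATE_LIMIT:
            self.rate_limits += 1
        elif kind is ErrorKind.RETRYABLE:
            self.other_retries += 1
        return kind is not ErrorKind.FATAL

    def summary(self) -> str:
        return f"rate limits: {self.rate_limits}, other retries: {self.other_retries}"
```

With something along these lines, the "rate limits" counter could keep its current meaning (HTTP 429 only), while a second counter or status indicator makes it visible that a sample is retrying for other reasons.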