I'm having an issue where our curator data generation runs are failing to properly resume from cache. We are using the LiteLLM backend to contact a locally hosted LLM. The issue occurs on both v0.1.20 and v0.1.19 (the version we were using yesterday). Resumes have worked for us in the past, so this seems to be new behavior.
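Our backend generation parameters are as follows. (The snippet below is an illustrative sketch reconstructed from the limits visible in the log further down; the exact keys and structure in bespoke_shisa_v1_reannotator.py may differ.)

```python
# Illustrative sketch only -- the limits match what curator reports in the log below.
# The backend_params key names here are an assumption, not copied from our script.
from bespokelabs import curator

llm = curator.LLM(
    model_name="hosted_vllm/deepseek-ai/DeepSeek-V3",
    backend="litellm",
    backend_params={
        "base_url": "http://ip-10-1-1-135:8000/v1",  # passed on the CLI via --api-base
        "max_concurrent_requests": 80,
        "max_requests_per_minute": 10_000,
        "max_tokens_per_minute": 10_000_000,
    },
)
```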
On the first restart, the number of concurrent requests is far below what it should be (we have it set to 80, but it starts at around 18). It then slowly counts down to 1 concurrent request and sits there far longer than a single request should take (10+ minutes). If restarted after that, it stays on "Preparing to generate responses..." with no movement.
This behavior appears to be new, as we've been able to resume from cache with no issues previously.
The example below is from DeepSeek-V3, but I have verified that it also occurs when contacting Tulu405B.
❯ python bespoke_shisa_v1_reannotator.py -m "deepseek-ai/DeepSeek-V3" --api-base http://ip-10-1-1-135:8000/v1
[03/01/25 06:48:05] INFO     Manually set max_concurrent_requests to 80                                          base_online_request_processor.py:173
                    INFO     Manually set max_concurrent_requests to 80                                          base_online_request_processor.py:173
[03/01/25 06:48:06] INFO     Getting rate limits for model: hosted_vllm/deepseek-ai/DeepSeek-V3                  litellm_online_request_processor.py:243
                    WARNING  LiteLLM does not support cost estimation for model: This model isn't mapped yet. model=deepseek-ai/DeepSeek-V3, custom_llm_provider=hosted_vllm. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.   litellm_online_request_processor.py:226
                    INFO     Test call headers: {'llm_provider-date': 'Sat, 01 Mar 2025 06:48:05 GMT', 'llm_provider-server': 'uvicorn', 'llm_provider-content-length': '402', 'llm_provider-content-type': 'application/json'}   litellm_online_request_processor.py:229
                    INFO     Running LiteLLMOnlineRequestProcessor completions with model: hosted_vllm/deepseek-ai/DeepSeek-V3   base_request_processor.py:132
[03/01/25 06:48:10] INFO     Using cached requests. If you want to regenerate the dataset, disable or delete the cache.   base_request_processor.py:213
[03/01/25 06:48:11] INFO     Manually set max_requests_per_minute to 10000                                       base_online_request_processor.py:190
                    INFO     Manually set max_tokens_per_minute to 10000000                                      base_online_request_processor.py:209
                    INFO     Resuming progress by reading existing file: /fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl   base_request_processor.py:511
                    INFO     Found 132 successful requests and 0 previously failed requests and 0 parsing errors in /fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl   base_request_processor.py:537
Preparing to generate 187943 responses using hosted_vllm/deepseek-ai/DeepSeek-V3 with combined input/output token limiting strategy
I have verified that there is nothing unusual in the debug log in the cache folder. I will happily provide any other logs you need.
Thank you for reporting, @cryptowooser, and sorry for this issue. Do you mind attaching the debug logs from the cache folder?
It seems like the problem isn't really that curator failed to resume from cache, but rather that it is unable to contact the server. I assume you're able to make requests to your server outside of curator with raw HTTP requests (e.g., something like the check below)? It could potentially be a LiteLLM issue, so one possible fix here is to use the OpenAI backend instead.
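Something along these lines should confirm whether the vLLM server's OpenAI-compatible endpoint is reachable at all (base URL and model name are taken from your log; the dummy API key is an assumption for a local vLLM deployment, which typically ignores it):

```python
# Minimal sanity check outside of curator: hit the vLLM server's
# OpenAI-compatible chat completions endpoint directly.
from openai import OpenAI

client = OpenAI(base_url="http://ip-10-1-1-135:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=8,
)
print(resp.choices[0].message.content)
```

If that works but curator still hangs on resume, switching the curator backend from LiteLLM to OpenAI (pointed at the same base URL) would help narrow down whether LiteLLM is the culprit.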