
Failing to resume from cache with locally hosted LLM? #564

Open · cryptowooser opened this issue Mar 1, 2025 · 2 comments

Labels: curator-cache (Related to caching), P1 (Important)

Comments

cryptowooser commented Mar 1, 2025

I'm having an issue where our curator data generation runs are failing to properly resume from cache. We are using the LiteLLM backend to contact a locally hosted LLM. The issue occurs on both v0.1.20 and v0.1.19 (the version we were using yesterday). Resumes have worked for us in the past, so this appears to be new behavior.

Our backend generation parameters are as follows:

backend_params = {
    "base_url": api_base,
    "generation_params": {"max_tokens": 32768, "min_p": 0.1},
    "max_concurrent_requests": 80,
    "max_requests_per_minute": 10000,
    "max_tokens_per_minute": 10_000_000,
    "request_timeout": 120,
}
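
For context, these params are handed to curator roughly like this (a simplified sketch: our real script builds prompts from a dataset, and I'm assuming the standard backend="litellm" constructor arguments here):

from bespokelabs import curator

# Sketch only: api_base is the same URL passed via --api-base below,
# e.g. "http://ip-10-1-1-135:8000/v1"; backend_params is the dict above.
llm = curator.LLM(
    model_name="deepseek-ai/DeepSeek-V3",  # resolved to hosted_vllm/... in the logs
    backend="litellm",
    backend_params=backend_params,
)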

On the first restart, the number of concurrent requests is far below what it should be (we have it set to 80, but it starts at around 18). It then slowly counts down to 1 concurrent request and sits there far longer than a single request should take (10+ minutes). If restarted after that, it stays on "Preparing to generate responses..." with no movement.

This behavior appears to be new, as we've been able to resume from cache with no issues previously.

The example below is from DeepSeek-V3, but I have verified it also occurs when contacting Tulu405B.

❯ python bespoke_shisa_v1_reannotator.py -m "deepseek-ai/DeepSeek-V3" --api-base http://ip-10-1-1-135:8000/v1
[03/01/25 06:48:05] INFO     Manually set max_concurrent_requests to 80   base_online_request_processor.py:173
[03/01/25 06:48:06] INFO     Getting rate limits for model: hosted_vllm/deepseek-ai/DeepSeek-V3   litellm_online_request_processor.py:243
                    WARNING  LiteLLM does not support cost estimation for model: This model isn't mapped yet. model=deepseek-ai/DeepSeek-V3, custom_llm_provider=hosted_vllm. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.   litellm_online_request_processor.py:226
                    INFO     Test call headers: {'llm_provider-date': 'Sat, 01 Mar 2025 06:48:05 GMT', 'llm_provider-server': 'uvicorn', 'llm_provider-content-length': '402', 'llm_provider-content-type': 'application/json'}   litellm_online_request_processor.py:229
                    INFO     Running LiteLLMOnlineRequestProcessor completions with model: hosted_vllm/deepseek-ai/DeepSeek-V3   base_request_processor.py:132
[03/01/25 06:48:10] INFO     Using cached requests. If you want to regenerate the dataset, disable or delete the cache.   base_request_processor.py:213
[03/01/25 06:48:11] INFO     Manually set max_requests_per_minute to 10000   base_online_request_processor.py:190
                    INFO     Manually set max_tokens_per_minute to 10000000   base_online_request_processor.py:209
                    INFO     Resuming progress by reading existing file: /fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl   base_request_processor.py:511
                    INFO     Found 132 successful requests and 0 previously failed requests and 0 parsing errors in /fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl   base_request_processor.py:537
Preparing to generate 187943 responses using hosted_vllm/deepseek-ai/DeepSeek-V3 with combined input/output token limiting strategy

I have verified that there is nothing unusual in the debug log in the cache folder. I will happily provide any other logs you need.
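
In case it's useful, this is roughly how I sanity-checked the cached responses file from the log above (it only counts parseable JSON lines and makes no assumptions about curator's response schema):

import json

# Path taken from the "Resuming progress by reading existing file" log line above.
path = "/fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl"

ok = bad = 0
with open(path) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
            ok += 1
        except json.JSONDecodeError:
            bad += 1

print(f"{ok} parseable response lines, {bad} malformed lines")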

vutrung96 (Contributor) commented

Thank you for reporting, @cryptowooser, and sorry for the issue. Do you mind attaching the debug logs in the cache folder?

It seems like curator isn't really failing to resume from cache; rather, it's unable to contact the server. I assume you're able to make requests to your server outside of curator with raw HTTP requests? It could potentially be a LiteLLM issue, so maybe one fix here is to use the OpenAI backend.
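
To rule out the server itself, something like this (completely outside curator/LiteLLM) should get a response; the URL and model name here are just the ones from your logs:

import requests

# Raw request to the OpenAI-compatible chat completions endpoint, bypassing
# curator and LiteLLM entirely.
resp = requests.post(
    "http://ip-10-1-1-135:8000/v1/chat/completions",
    json={
        "model": "deepseek-ai/DeepSeek-V3",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
    timeout=30,
)
print(resp.status_code, resp.json())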

vutrung96 (Contributor) commented

e.g. something like this

from bespokelabs import curator

llm = curator.LLM(
    model_name="deepseek-ai/DeepSeek-V3",
    backend="openai",
    backend_params={
        "base_url": "https://your-openai-compatible-api-url",
        "api_key": "<YOUR_OPENAI_COMPATIBLE_SERVICE_API_KEY>",
    },
)
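
(With a vLLM-style OpenAI-compatible server, base_url would typically be the same http://ip-10-1-1-135:8000/v1 address you're already passing via --api-base.)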

RyanMarten added the P1 (Important) and curator-cache (Related to caching) labels on Mar 2, 2025