
Failing to resume from cache with locally hosted LLM? #564

Open · cryptowooser opened this issue Mar 1, 2025 · 2 comments

Labels: curator-cache (Related to caching), P1 (Important)

Comments

cryptowooser commented Mar 1, 2025

I'm having an issue where our curator data generation runs are failing to properly resume from cache. We are using the LiteLLM backend to contact a locally hosted LLM. The issue occurs on both v0.1.20 and v0.1.19 (the version we were using yesterday). Resumes have worked for us in the past, so this appears to be new behavior.

Our backend generation parameters are as follows:

backend_params = {
    "base_url": api_base,
    "generation_params": {"max_tokens": 32768, "min_p": 0.1},
    "max_concurrent_requests": 80,
    "max_requests_per_minute": 10000,
    "max_tokens_per_minute": 10_000_000,
    "request_timeout": 120,
}
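
For context, these params are handed to curator roughly like this (a simplified sketch: our real script builds prompts from a dataset, and I'm assuming the standard backend="litellm" constructor arguments here):

from bespokelabs import curator

# Sketch only: api_base is the same URL passed via --api-base below,
# e.g. "http://ip-10-1-1-135:8000/v1"; backend_params is the dict above.
llm = curator.LLM(
    model_name="deepseek-ai/DeepSeek-V3",  # resolved to hosted_vllm/... in the logs
    backend="litellm",
    backend_params=backend_params,
)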

On the first restart, the number of concurrent requests is far below what it should be (we have it set to 80, but it starts at around 18). It then slowly counts down to 1 concurrent request and sits there far longer than a single request should take (10+ minutes). If restarted after that, it stays on "Preparing to generate responses..." with no movement.

This behavior appears to be new, as we've been able to resume from cache with no issues previously.

The example below is from DeepSeek-V3, but I have verified it also occurs when contacting Tulu405B.

❯ python bespoke_shisa_v1_reannotator.py -m "deepseek-ai/DeepSeek-V3" --api-base http://ip-10-1-1-135:8000/v1
[03/01/25 06:48:05] INFO     Manually set max_concurrent_requests to 80   base_online_request_processor.py:173
[03/01/25 06:48:06] INFO     Getting rate limits for model: hosted_vllm/deepseek-ai/DeepSeek-V3   litellm_online_request_processor.py:243
                    WARNING  LiteLLM does not support cost estimation for model: This model isn't mapped yet. model=deepseek-ai/DeepSeek-V3, custom_llm_provider=hosted_vllm. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.   litellm_online_request_processor.py:226
                    INFO     Test call headers: {'llm_provider-date': 'Sat, 01 Mar 2025 06:48:05 GMT', 'llm_provider-server': 'uvicorn', 'llm_provider-content-length': '402', 'llm_provider-content-type': 'application/json'}   litellm_online_request_processor.py:229
                    INFO     Running LiteLLMOnlineRequestProcessor completions with model: hosted_vllm/deepseek-ai/DeepSeek-V3   base_request_processor.py:132
[03/01/25 06:48:10] INFO     Using cached requests. If you want to regenerate the dataset, disable or delete the cache.   base_request_processor.py:213
[03/01/25 06:48:11] INFO     Manually set max_requests_per_minute to 10000   base_online_request_processor.py:190
                    INFO     Manually set max_tokens_per_minute to 10000000   base_online_request_processor.py:209
                    INFO     Resuming progress by reading existing file: /fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl   base_request_processor.py:511
                    INFO     Found 132 successful requests and 0 previously failed requests and 0 parsing errors in /fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl   base_request_processor.py:537
Preparing to generate 187943 responses using hosted_vllm/deepseek-ai/DeepSeek-V3 with combined input/output token limiting strategy

I have verified that there is nothing unusual in the debug log in the cache folder. I will happily provide any other logs you need.
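
In case it's useful, this is roughly how I sanity-checked the cached responses file from the log above (it only counts parseable JSON lines and makes no assumptions about curator's response schema):

import json

# Path taken from the "Resuming progress by reading existing file" log line above.
path = "/fsx/ubuntu/.cache/curator/1f96e0acfca6031c/responses_0.jsonl"

ok = bad = 0
with open(path) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
            ok += 1
        except json.JSONDecodeError:
            bad += 1

print(f"{ok} parseable response lines, {bad} malformed lines")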

vutrung96 (Contributor) commented

Thank you for reporting, @cryptowooser, and sorry for the issue. Do you mind attaching the debug logs in the cache folder?

It seems like curator isn't really failing to resume from cache; rather, it's unable to contact the server. I assume you're able to make requests to your server outside of curator with raw HTTP requests? It could potentially be a LiteLLM issue, so maybe one fix here is to use the OpenAI backend.
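
To rule out the server itself, something like this (completely outside curator/LiteLLM) should get a response; the URL and model name here are just the ones from your logs:

import requests

# Raw request to the OpenAI-compatible chat completions endpoint, bypassing
# curator and LiteLLM entirely.
resp = requests.post(
    "http://ip-10-1-1-135:8000/v1/chat/completions",
    json={
        "model": "deepseek-ai/DeepSeek-V3",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
    timeout=30,
)
print(resp.status_code, resp.json())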

vutrung96 (Contributor) commented

e.g. something like this

from bespokelabs import curator

llm = curator.LLM(
    model_name="deepseek-ai/DeepSeek-V3",
    backend="openai",
    backend_params={
        "base_url": "https://your-openai-compatible-api-url",
        "api_key": "<YOUR_OPENAI_COMPATIBLE_SERVICE_API_KEY>",
    },
)
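
(With a vLLM-style OpenAI-compatible server, base_url would typically be the same http://ip-10-1-1-135:8000/v1 address you're already passing via --api-base.)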

RyanMarten added the P1 (Important) and curator-cache (Related to caching) labels on Mar 2, 2025