Hi, I'm playing around with LiteLLM Proxy for both cloud models (OpenAI) and self-hosted models (vLLM), and it's great.
I was trying to set up rate and token limits, but I'm not sure whether what I want is actually achievable: it doesn't seem to work, though I'm probably doing something wrong.
What I would like to do is limit invocations of the model. For example, with a proxy in front of a single OpenAI model, I'd like to set a maximum number of tokens (e.g. 1000) so that once that limit is reached the proxy returns an error, and the same for RPM.
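To make the intent concrete, here is a minimal sketch of the kind of `config.yaml` I have in mind (the `rpm`/`tpm` fields under `litellm_params` are my guess at where such per-model limits belong, so the exact parameter names may well be off):

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
      # Intended per-model limits: at most 10 requests and 1000 tokens per minute.
      rpm: 10
      tpm: 1000
```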
The problem is that, with a configuration along those lines, it doesn't work as I expect: requests are never blocked, even with a much higher number of tokens or a much higher number of requests.
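To be clear about what I mean by "blocked": I expect a loop of requests like the one below to eventually fail with a rate-limit error from the proxy, but it never does (the URL, key, and model name here are just placeholders for this sketch):

```python
import openai
from openai import OpenAI

# Standard OpenAI client pointed at the LiteLLM Proxy
# (base URL, port, and key are placeholders).
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

for i in range(50):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "ping"}],
        )
        print(i, response.choices[0].message.content)
    except openai.RateLimitError as err:
        # This is what I expect once the RPM/TPM limit is exceeded,
        # but the error never shows up.
        print(i, "blocked by the proxy:", err)
        break
```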
Did I miss something?
Thanks