[Usage]: MLA disable prefix cache #13822

Open
zeroorhero opened this issue Feb 25, 2025 · 0 comments
Labels
usage How to use vllm

Comments


zeroorhero commented Feb 25, 2025

Your current environment

When using MLA, the prefix cache is automatically disabled (https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/config.py#L3328). However, if that check is commented out, the prefix cache appears to work normally. What is the reason for disabling the prefix cache here? My startup command is as follows:
python3 -m vllm.entrypoints.openai.api_server --model /data00/models/DeepSeek-R1 --port 8000 --enable-prefix-caching --gpu-memory-utilization 0.98 --max-model-len 1024 -tp 8
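For context, the guard I am referring to behaves roughly like the simplified sketch below. This is not the actual vLLM source; the class and field names are approximations meant only to illustrate how the CLI flag gets silently overridden when the model uses MLA:

```python
# Illustrative sketch only -- not the exact vLLM code. It mimics the kind of
# guard referenced above: if the model uses MLA, the engine config forcibly
# turns off prefix caching even when --enable-prefix-caching was passed.
from dataclasses import dataclass


@dataclass
class ModelConfig:
    use_mla: bool = False  # True for DeepSeek-style MLA attention


@dataclass
class CacheConfig:
    enable_prefix_caching: bool = False  # set by --enable-prefix-caching


@dataclass
class VllmConfig:
    model_config: ModelConfig
    cache_config: CacheConfig

    def __post_init__(self):
        # The check in question: the user's flag is overridden for MLA models.
        if self.model_config.use_mla and self.cache_config.enable_prefix_caching:
            print("MLA detected: disabling prefix caching")
            self.cache_config.enable_prefix_caching = False


cfg = VllmConfig(ModelConfig(use_mla=True), CacheConfig(enable_prefix_caching=True))
assert cfg.cache_config.enable_prefix_caching is False
```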

zeroorhero added the usage (How to use vllm) label Feb 25, 2025