Your current environment
When using MLA, prefix caching is automatically disabled (https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/config.py#L3328). However, if that check is commented out, prefix caching appears to work normally. Why is prefix caching disabled here? My startup command is as follows:
python3 -m vllm.entrypoints.openai.api_server --model /data00/models/DeepSeek-R1 --port 8000 --enable-prefix-caching --gpu-memory-utilization 0.98 --max-model-len 1024 -tp 8
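For reference, the guard at the linked line behaves roughly like the sketch below. This is a paraphrase, not the actual vLLM source; the field names `use_mla` and `enable_prefix_caching` are assumptions based on the linked config check.

```python
from dataclasses import dataclass

# Hedged sketch (not vLLM source) of the kind of guard at
# vllm/config.py#L3328 in v0.7.3: when the model uses MLA
# (e.g. DeepSeek-R1), prefix caching is force-disabled even
# if --enable-prefix-caching was passed on the command line.

@dataclass
class ModelConfig:
    use_mla: bool = False  # assumed flag: True for MLA-based models

@dataclass
class CacheConfig:
    enable_prefix_caching: bool = False  # set by --enable-prefix-caching

def maybe_disable_prefix_caching(model: ModelConfig, cache: CacheConfig) -> None:
    """If the model uses MLA, turn prefix caching off instead of raising."""
    if model.use_mla and cache.enable_prefix_caching:
        print("Warning: MLA is enabled; disabling prefix caching.")
        cache.enable_prefix_caching = False

# Usage: with an MLA model and prefix caching requested, the flag ends up False.
model, cache = ModelConfig(use_mla=True), CacheConfig(enable_prefix_caching=True)
maybe_disable_prefix_caching(model, cache)
print(cache.enable_prefix_caching)  # False
```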