Your current environment
When using MLA, prefix caching is automatically disabled (https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/config.py#L3328). However, if that check is commented out, prefix caching appears to work normally. Why is prefix caching disabled here? My startup command is as follows:
python3 -m vllm.entrypoints.openai.api_server --model /data00/models/DeepSeek-R1 --port 8000 --enable-prefix-caching --gpu-memory-utilization 0.98 --max-model-len 1024 -tp 8
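For reference, the guard at the linked line behaves roughly like the sketch below. This is a paraphrase, not the actual vLLM source; the field names `use_mla` and `enable_prefix_caching` are assumptions based on the linked config check.

```python
from dataclasses import dataclass

# Hedged sketch (not vLLM source) of the kind of guard at
# vllm/config.py#L3328 in v0.7.3: when the model uses MLA
# (e.g. DeepSeek-R1), prefix caching is force-disabled even
# if --enable-prefix-caching was passed on the command line.

@dataclass
class ModelConfig:
    use_mla: bool = False  # assumed flag: True for MLA-based models

@dataclass
class CacheConfig:
    enable_prefix_caching: bool = False  # set by --enable-prefix-caching

def maybe_disable_prefix_caching(model: ModelConfig, cache: CacheConfig) -> None:
    """If the model uses MLA, turn prefix caching off instead of raising."""
    if model.use_mla and cache.enable_prefix_caching:
        print("Warning: MLA is enabled; disabling prefix caching.")
        cache.enable_prefix_caching = False

# Usage: with an MLA model and prefix caching requested, the flag ends up False.
model, cache = ModelConfig(use_mla=True), CacheConfig(enable_prefix_caching=True)
maybe_disable_prefix_caching(model, cache)
print(cache.enable_prefix_caching)  # False
```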