I am trying to load a very large model (70B in 4-bit) onto two 24 GB GPUs. The model itself loads fine (it takes about 20 GB on each GPU), but then the KV cache fails to allocate enough memory (it tries to grab about 5 GB).
How do I limit its size?
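The KV cache size is set by the maximum context length (and cache precision), not by the model weights, so the usual fix is to lower the context window (for example `--ctx-size` / `n_ctx` if the backend is llama.cpp) or, where the backend supports it, use a quantized KV cache. As a rough sizing sketch, assuming a Llama-style 70B with grouped-query attention (80 layers, 8 KV heads, head dim 128) and an fp16 cache, the cache grows linearly with context length; check your model's config for the real numbers:

```python
# Rough KV-cache sizing sketch. The defaults below are assumptions for a
# Llama-style 70B with GQA (80 layers, 8 KV heads, head dim 128, fp16 cache);
# substitute the values from your model's config.
def kv_cache_bytes(ctx_len, n_layers=80, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    # factor of 2 for the K and V tensors stored per layer
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len * batch_size

for ctx in (4096, 8192, 16384):
    print(f"ctx={ctx:6d}: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
# ctx=  4096: 1.25 GiB
# ctx=  8192: 2.50 GiB
# ctx= 16384: 5.00 GiB
```

Under those assumptions, a 5 GiB cache corresponds to roughly a 16k-token context, so capping the context at 4k-8k (or splitting/offloading the cache across the two GPUs, depending on the backend) should bring it within the ~4 GB left on each card.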