I am trying to load a very large model (70B in 4-bit) onto two 24 GB GPUs. The model itself loads fine (it takes about 20 GB on each GPU), but then the KV cache fails to allocate enough memory (it tries to grab about 5 GB).
How do I limit its size?
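The KV cache size is set by the maximum context length (and cache precision), not by the model weights, so the usual fix is to lower the context window (for example `--ctx-size` / `n_ctx` if the backend is llama.cpp) or, where the backend supports it, use a quantized KV cache. As a rough sizing sketch, assuming a Llama-style 70B with grouped-query attention (80 layers, 8 KV heads, head dim 128) and an fp16 cache, the cache grows linearly with context length; check your model's config for the real numbers:

```python
# Rough KV-cache sizing sketch. The defaults below are assumptions for a
# Llama-style 70B with GQA (80 layers, 8 KV heads, head dim 128, fp16 cache);
# substitute the values from your model's config.
def kv_cache_bytes(ctx_len, n_layers=80, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    # factor of 2 for the K and V tensors stored per layer
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len * batch_size

for ctx in (4096, 8192, 16384):
    print(f"ctx={ctx:6d}: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
# ctx=  4096: 1.25 GiB
# ctx=  8192: 2.50 GiB
# ctx= 16384: 5.00 GiB
```

Under those assumptions, a 5 GiB cache corresponds to roughly a 16k-token context, so capping the context at 4k-8k (or splitting/offloading the cache across the two GPUs, depending on the backend) should bring it within the ~4 GB left on each card.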