Llama 3 8B does not fit in a 4090 GPU #27168
Replies: 2 comments 1 reply
-
Hey, @aurotripathy! I'm Dosu, a bot here to help you with bug fixes, questions, and contributions. Let's tackle this issue together! To resolve the CUDA out-of-memory error, you can try freeing GPU memory before loading the model.
Here's a code snippet to help manage memory:

```python
import gc
import torch

torch.cuda.empty_cache()   # release cached blocks PyTorch no longer needs
gc.collect()               # run Python garbage collection
!nvidia-smi                # notebook shell escape: check overall GPU usage
```

These steps can help manage memory usage and potentially resolve the OutOfMemoryError.
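One caveat worth noting: `torch.cuda.empty_cache()` only returns memory that PyTorch has cached and is no longer referencing, so any model or tensor still bound to a Python name must be dropped first. A minimal sketch, where `model` is a placeholder for whatever object is holding GPU memory:

```python
import gc
import torch

del model                    # placeholder: drop references to the GPU-resident objects
gc.collect()                 # reclaim the Python objects
torch.cuda.empty_cache()     # hand the now-unreferenced cached blocks back to the driver

# Confirm what PyTorch still holds on GPU 0.
print(torch.cuda.memory_allocated(0), "bytes allocated")
print(torch.cuda.memory_reserved(0), "bytes reserved")
```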
-
First of all, I'd suggest checking the other processes using the GPU with nvidia-smi.
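A quick way to do that check from the same notebook is sketched below (assumes a recent PyTorch that provides `torch.cuda.mem_get_info` and that `nvidia-smi` is on the PATH):

```python
import subprocess
import torch

# Free vs. total device memory as reported by the CUDA driver (in bytes).
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")

# List every process currently holding memory on the GPUs.
print(subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory", "--format=csv"],
    capture_output=True, text=True,
).stdout)
```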
-
Example Code
The Llama 3.1 8B FP16 model should fit in a 24 GB 4090 (8B params × 2 bytes ≈ 16 GB for the weights, plus activations and KV cache), but it does not; kindly see the error message below.
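For reference, a back-of-the-envelope version of that estimate (a sketch; the 8.03 B parameter count and the 32-layer / 8 KV-head / 128 head-dim figures are the published Llama 3.1 8B config, and the 8192-token context is just an example):

```python
# Rough fp16 memory estimate for Llama 3.1 8B.
params = 8.03e9
bytes_per_param = 2                       # fp16 / bf16
weights_gib = params * bytes_per_param / 2**30

layers, kv_heads, head_dim = 32, 8, 128   # GQA config of Llama 3.1 8B
ctx = 8192                                # example context length
kv_cache_gib = 2 * layers * kv_heads * head_dim * bytes_per_param * ctx / 2**30

print(f"weights : {weights_gib:.1f} GiB")   # ~15.0 GiB
print(f"KV cache: {kv_cache_gib:.1f} GiB")  # ~1.0 GiB at 8k context
```

So fp16 weights plus a modest KV cache should indeed leave headroom on a 24 GB card, which is why the error below is surprising.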
Description
OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacity of 23.55 GiB of which 16.69 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 23.15 GiB is allocated by PyTorch, and 1.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
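Two things stand out in that trace. The 23.15 GiB allocated by PyTorch is roughly what fp32 weights would reach partway through loading (8 B params × 4 bytes ≈ 32 GB), so one possible cause is that the model is being loaded in fp32 rather than fp16; and the message itself suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. A sketch combining both, assuming the model is loaded through Hugging Face `transformers` (with `accelerate` installed) and that `meta-llama/Meta-Llama-3.1-8B-Instruct` is the checkpoint in use:

```python
import os

# Set before the first CUDA allocation so the caching allocator picks it up
# (this is the setting the error message recommends to reduce fragmentation).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # explicit half precision; fp32 needs ~32 GB for the weights alone
    device_map="cuda:0",        # requires accelerate
)
```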
System Info
langchain==0.3.2
langchain-community==0.3.1
langchain-core==0.3.9
langchain-text-splitters==0.3.0
Python 3.10.12
container: nvcr.io/nvidia/pytorch_24.07-py3/jupyter