GPU memory consumption increases every epoch during training #55

Open
jhkwag970 opened this issue Jan 23, 2025 · 2 comments

@jhkwag970

Hello,

Thank you for sharing your work!

As I am training on ImageNet1K, I noticed that GPU memory consumption increases by approximately 254 MB with each epoch. If this trend continues over the 300 training epochs, total usage would reach roughly 254 MB × 300 ≈ 76.2 GB.

Is this the intended behavior?

Thank you!
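
A minimal sketch of how this per-epoch growth can be tracked, assuming the standard torch.cuda memory counters (the helper below is illustrative, not part of the MambaVision training script):

```python
import torch

def log_gpu_memory(epoch: int) -> None:
    # Report allocated vs. reserved memory (in MB) on the current CUDA device.
    # A steady rise in "allocated" across epochs suggests tensors are being kept alive.
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"epoch {epoch}: allocated={allocated:.1f} MB, reserved={reserved:.1f} MB")
```

Calling this at the end of each epoch makes it easy to see whether the growth comes from allocated tensors or only from the caching allocator's reserved pool.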

@jhkwag970 jhkwag970 changed the title from GPU memory consumption increases every epoch to GPU memory consumption increases every epoch during training on Jan 23, 2025
@ahatamiz
Collaborator

ahatamiz commented Feb 9, 2025

Hi @jhkwag970

Could you please provide more details? Specifically, whether you are using this version of the codebase or another.

I have not faced this issue before in any of the training runs.

Best

@jhkwag970
Author

@ahatamiz
Hello, thank you for your response. I am using the current MambaVision repo. After validation, when memory is allocated again for training, consumption is higher than in the previous epoch. I am calling torch.cuda.empty_cache() as a workaround for now; I just wanted to make sure this will not cause any problems during training.
