
GPU memory profiling #162

Open · QiJune opened this issue Aug 21, 2020 · 2 comments

QiJune (Collaborator) commented Aug 21, 2020

We are trying to compare the GPU memory consumption of GoTorch and PyTorch on the ResNet50 model. The scripts are located at https://github.com/wangkuiyi/gotorch/tree/develop/example/resnet.

The GPU card is a P100 with 16 GB of memory.

Experiment 1:

Below are the results, measured with the nvidia-smi command.

|         | Only Forward | Forward and Backward |
|---------|--------------|----------------------|
| PyTorch | 3719 MiB     | 2545 MiB             |
| GoTorch | 2447 MiB     | 2767 MiB             |

In the Only Forward scenario, we comment out the following three lines:

# optimizer.zero_grad()
# loss.backward()
# optimizer.step()
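
For context, here is a minimal sketch of the kind of training step being measured; the names `model`, `criterion`, `optimizer`, and the ResNet50 setup are illustrative assumptions, not the exact code in the linked scripts:

```python
import torch
import torchvision

device = torch.device("cuda")
model = torchvision.models.resnet50().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(images, labels, forward_only=False):
    images, labels = images.to(device), labels.to(device)
    output = model(images)
    loss = criterion(output, labels)
    if not forward_only:
        # The three lines removed in the "Only Forward" scenario.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss
```

One hedged observation: even with the backward lines removed, the forward pass still builds an autograd graph, and it is `loss.backward()` that normally frees the intermediate buffers as it runs; that may be why forward-only runs are not guaranteed to use less memory. Wrapping the forward pass in `torch.no_grad()` avoids building the graph at all.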

Experiment 2:

GPU memory consumption with different batch sizes:

| Batch Size | 16       | 128       | 160       |
|------------|----------|-----------|-----------|
| PyTorch    | 2545 MiB | 13161 MiB | 15295 MiB |
| GoTorch    | 2767 MiB | 14755 MiB | OOM       |
QiJune (Collaborator, Author) commented Aug 21, 2020

According to this answer, https://discuss.pytorch.org/t/how-to-delete-pytorch-objects-correctly-from-memory/947, the GPU memory consumption reported by nvidia-smi is not accurate for PyTorch: the caching allocator keeps freed blocks around for reuse, so nvidia-smi reports more memory than tensors actually occupy.
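
A minimal sketch of the effect, assuming a CUDA-capable machine; the tensor size is arbitrary:

```python
import torch

device = torch.device("cuda")

x = torch.randn(1024, 1024, 256, device=device)  # ~1 GiB of float32
print(torch.cuda.memory_allocated(device))  # bytes occupied by live tensors
print(torch.cuda.memory_reserved(device))   # bytes held by the caching allocator

del x
# The tensor is gone and memory_allocated() drops, but the allocator keeps
# the block cached for reuse, so memory_reserved() stays high and nvidia-smi
# still counts it as used by the process.
print(torch.cuda.memory_allocated(device))
print(torch.cuda.memory_reserved(device))
```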

sneaxiy (Collaborator) commented Sep 2, 2020

  • We can use torch.cuda.max_memory_allocated() to get the peak GPU memory actually occupied by tensors.
  • We can use torch.cuda.empty_cache() to release the memory that is held by the caching allocator but not occupied by tensors. After that, the value reported by nvidia-smi is accurate. A sketch combining both calls follows this list.
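
A minimal sketch of a measurement helper using both calls; the function name `report` and the reset via `torch.cuda.reset_peak_memory_stats()` are assumptions about how one might wire it up:

```python
import torch

device = torch.device("cuda")

def report(tag):
    # Peak bytes ever occupied by tensors since the last reset.
    peak = torch.cuda.max_memory_allocated(device)
    # Return cached-but-unused blocks to the driver, so that nvidia-smi
    # reflects tensor usage instead of the allocator's cache.
    torch.cuda.empty_cache()
    print(f"{tag}: peak allocated = {peak / 2**20:.0f} MiB")

torch.cuda.reset_peak_memory_stats(device)
# ... run forward/backward steps here ...
report("after training")
```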
