
GPU memory profiling #162

Open · QiJune opened this issue Aug 21, 2020 · 2 comments

QiJune (Collaborator) commented Aug 21, 2020

We are trying to compare the GPU memory consumption of GoTorch and PyTorch on the ResNet50 model. The scripts are located at https://github.com/wangkuiyi/gotorch/tree/develop/example/resnet.

The GPU card is a P100 with 16 GB of memory.

Experiment 1:

Below are the results, measured with the nvidia-smi command.

|         | Only Forward | Forward and Backward |
|---------|--------------|----------------------|
| PyTorch | 3719 MiB     | 2545 MiB             |
| GoTorch | 2447 MiB     | 2767 MiB             |

In the Only Forward scenario, we comment out the following three lines:

# optimizer.zero_grad()
# loss.backward()
# optimizer.step()
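
For context, here is a minimal sketch of the kind of training step being measured; the names `model`, `criterion`, `optimizer`, and the ResNet50 setup are illustrative assumptions, not the exact code in the linked scripts:

```python
import torch
import torchvision

device = torch.device("cuda")
model = torchvision.models.resnet50().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(images, labels, forward_only=False):
    images, labels = images.to(device), labels.to(device)
    output = model(images)
    loss = criterion(output, labels)
    if not forward_only:
        # The three lines removed in the "Only Forward" scenario.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss
```

One hedged observation: even with the backward lines removed, the forward pass still builds an autograd graph, and it is `loss.backward()` that normally frees the intermediate buffers as it runs; that may be why forward-only runs are not guaranteed to use less memory. Wrapping the forward pass in `torch.no_grad()` avoids building the graph at all.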

Experiment 2:

GPU memory consumption with different batch sizes:

| Batch Size | 16       | 128       | 160       |
|------------|----------|-----------|-----------|
| PyTorch    | 2545 MiB | 13161 MiB | 15295 MiB |
| GoTorch    | 2767 MiB | 14755 MiB | OOM       |
QiJune (Collaborator, Author) commented Aug 21, 2020

According to this answer, https://discuss.pytorch.org/t/how-to-delete-pytorch-objects-correctly-from-memory/947, the GPU memory consumption reported by nvidia-smi is not accurate for PyTorch: the caching allocator keeps freed blocks around for reuse, so nvidia-smi reports more memory than tensors actually occupy.
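
A minimal sketch of the effect, assuming a CUDA-capable machine; the tensor size is arbitrary:

```python
import torch

device = torch.device("cuda")

x = torch.randn(1024, 1024, 256, device=device)  # ~1 GiB of float32
print(torch.cuda.memory_allocated(device))  # bytes occupied by live tensors
print(torch.cuda.memory_reserved(device))   # bytes held by the caching allocator

del x
# The tensor is gone and memory_allocated() drops, but the allocator keeps
# the block cached for reuse, so memory_reserved() stays high and nvidia-smi
# still counts it as used by the process.
print(torch.cuda.memory_allocated(device))
print(torch.cuda.memory_reserved(device))
```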

sneaxiy (Collaborator) commented Sep 2, 2020

  • We can use torch.cuda.max_memory_allocated() to get the peak GPU memory actually occupied by tensors.
  • We can use torch.cuda.empty_cache() to release the memory that is held by the caching allocator but not occupied by tensors. After that, the value reported by nvidia-smi is accurate. A sketch combining both calls follows this list.
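
A minimal sketch of a measurement helper using both calls; the function name `report` and the reset via `torch.cuda.reset_peak_memory_stats()` are assumptions about how one might wire it up:

```python
import torch

device = torch.device("cuda")

def report(tag):
    # Peak bytes ever occupied by tensors since the last reset.
    peak = torch.cuda.max_memory_allocated(device)
    # Return cached-but-unused blocks to the driver, so that nvidia-smi
    # reflects tensor usage instead of the allocator's cache.
    torch.cuda.empty_cache()
    print(f"{tag}: peak allocated = {peak / 2**20:.0f} MiB")

torch.cuda.reset_peak_memory_stats(device)
# ... run forward/backward steps here ...
report("after training")
```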
