-
Hello, have you ever faced the cuDNN limitation on the number of elements in a tensor? The maximum size is bounded by 2**31, so tensor shapes like [70, 32, 100, 100, 100] raise an error. It can happen with a batch of 70 on 100^3 cubes with a 32-filter conv... I'm currently facing this limitation in a 3D conv net for a DRP (digital rock physics) application. By the way, I'm pretty sure that using MONAI for my project would be a good option. Thank you for any input you might have 😉
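For reference, a quick sketch of the arithmetic behind the error: the element count of that shape does exceed the signed 32-bit ceiling (the shape and limit are from this post; the snippet itself is just illustrative).

```python
# Element count of the failing tensor shape vs. the signed 32-bit ceiling
shape = (70, 32, 100, 100, 100)

numel = 1
for dim in shape:
    numel *= dim

int32_max = 2**31 - 1  # 2_147_483_647

print(numel)              # 2_240_000_000
print(numel > int32_max)  # True: the shape overflows a 32-bit index
```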
Replies: 4 comments 1 reply
-
Hi @sdesrozis, I haven't faced this issue before. May I ask what your GPU hardware is? Thanks.
-
I haven't encountered this before. As far as I can tell from the source, sizes are stored as 64-bit signed ints, so I wouldn't think we'd have a 32-bit limitation here. With PyTorch 1.7.1 I was able to allocate a tensor of this size with 32-bit floats in both CPU and GPU memory:

t = torch.rand(70, 32, 100, 100, 100, dtype=torch.float32)

Perhaps you have a memory constraint issue, or your version of PyTorch doesn't allow this?
-
OK, sorry, my initial post missed some crucial information... Regarding the hardware: I use a Tesla V100-PCIE-16GB, but I'm facing the same issue on the 32GB model. I was thinking that maybe you had faced this limitation too.
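As a rough sanity check (counting only the float32 tensor itself, ignoring gradients, other activations, and conv workspace), the tensor from the original post should fit comfortably in 16 GB, which points at the indexing limit rather than a plain out-of-memory:

```python
# Rough memory footprint of the single activation tensor (float32, 4 bytes/elem)
numel = 70 * 32 * 100**3           # 2_240_000_000 elements
bytes_fp32 = numel * 4             # 8_960_000_000 bytes
gib = bytes_fp32 / 2**30
print(f"{gib:.2f} GiB")            # ~8.34 GiB: fits in a 16 GB card, so the
                                   # failure is the int32 element count, not OOM
```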
-
I definitely haven't encountered an issue like this. It does look like a deep-rooted cuDNN problem from the issue you raised; unless they fix it as described, we can't do much at the MONAI level. Would a smaller batch size be a feasible solution? The workaround they discussed is to run the batch-norm calculations on the CPU, but I would expect that to be rather slow.
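To make the smaller-batch suggestion concrete (assuming 32 output channels on 100^3 volumes, the shapes from this thread), the largest batch whose conv output stays under the int32 element limit can be computed directly:

```python
# Largest batch size whose [N, 32, 100, 100, 100] output stays under the
# cuDNN signed 32-bit element limit (shapes taken from this thread)
int32_max = 2**31 - 1
per_sample = 32 * 100**3            # elements per sample: 32_000_000
max_batch = int32_max // per_sample
print(max_batch)                    # 67
```

So dropping the batch size from 70 to 67 (or splitting 70 into two chunks) would keep each forward pass under the limit, at the cost of a slightly different effective batch size.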