-
Hello, have you ever faced the cuDNN limitation on the number of elements in a tensor? The maximum size is bounded by 2**31, so tensor shapes like [70, 32, 100, 100, 100] raise an error. It can happen with a batch of 70 on 100^3 cubes with a 32-filter conv... I'm currently facing this limitation in a 3D conv net for a DRP (digital rock physics) application. By the way, I'm pretty sure that using MONAI for my project would be a good option. Thank you for any input you might have 😉
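For reference, a quick sketch of the arithmetic behind the error: the element count of that shape does exceed the signed 32-bit ceiling (the shape and limit are from this post; the snippet itself is just illustrative).

```python
# Element count of the failing tensor shape vs. the signed 32-bit ceiling
shape = (70, 32, 100, 100, 100)

numel = 1
for dim in shape:
    numel *= dim

int32_max = 2**31 - 1  # 2_147_483_647

print(numel)              # 2_240_000_000
print(numel > int32_max)  # True: the shape overflows a 32-bit index
```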
Replies: 4 comments 1 reply
-
Hi @sdesrozis, I haven't faced this issue before. May I ask what your GPU hardware is? Thanks.
-
I haven't encountered this before. As far as I can tell from the source, sizes are stored as 64-bit signed ints, so I wouldn't think we'd have a 32-bit limitation here. With PyTorch 1.7.1 I was able to allocate a tensor of this size with 32-bit floats in both CPU and GPU memory:

t = torch.rand(70, 32, 100, 100, 100, dtype=torch.float32)

Perhaps you have a memory constraint issue, or your version of PyTorch doesn't allow this?
-
OK, sorry, my initial post missed some crucial information... Regarding the hardware: I use a Tesla V100-PCIE-16GB, but I'm facing the same issue on the 32GB model. I was thinking that maybe you had faced this limitation too.
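As a rough sanity check (counting only the float32 tensor itself, ignoring gradients, other activations, and conv workspace), the tensor from the original post should fit comfortably in 16 GB, which points at the indexing limit rather than a plain out-of-memory:

```python
# Rough memory footprint of the single activation tensor (float32, 4 bytes/elem)
numel = 70 * 32 * 100**3           # 2_240_000_000 elements
bytes_fp32 = numel * 4             # 8_960_000_000 bytes
gib = bytes_fp32 / 2**30
print(f"{gib:.2f} GiB")            # ~8.34 GiB: fits in a 16 GB card, so the
                                   # failure is the int32 element count, not OOM
```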
-
I definitely haven't encountered an issue like this. It does look like a deep-rooted cuDNN problem from the issue you raised; unless they fix it as described, we can't do much at the MONAI level. Would a smaller batch size be a feasible solution? The workaround they discussed is to run the batch-norm calculations on the CPU, but I would expect that to be rather slow.
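To make the smaller-batch suggestion concrete (assuming 32 output channels on 100^3 volumes, the shapes from this thread), the largest batch whose conv output stays under the int32 element limit can be computed directly:

```python
# Largest batch size whose [N, 32, 100, 100, 100] output stays under the
# cuDNN signed 32-bit element limit (shapes taken from this thread)
int32_max = 2**31 - 1
per_sample = 32 * 100**3            # elements per sample: 32_000_000
max_batch = int32_max // per_sample
print(max_batch)                    # 67
```

So dropping the batch size from 70 to 67 (or splitting 70 into two chunks) would keep each forward pass under the limit, at the cost of a slightly different effective batch size.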