[CUDA] ConvTranspose2D Error with PreconditionNotMet #70837

Open
jwnhy opened this issue Jan 15, 2025 · 1 comment
jwnhy commented Jan 15, 2025

Describe the Bug

Running the following code raises a PreconditionNotMet error.

import paddle as pdl

model = pdl.nn.Conv2DTranspose(1, 1, kernel_size=[1, 1], stride=[46889, 4], padding=[0, 1])
tensor = pdl.rand([1, 1, 46889, 2])

model(tensor)

The error message is as follows:

W0115 10:23:44.565328 2009009 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 9.0, Driver API Version: 12.6, Runtime API Version: 12.3
W0115 10:23:44.565850 2009009 gpu_resources.cc:164] device: 0, cuDNN Version: 9.0.
Traceback (most recent call last):
  File "/home/jwnhy/gpu_fuzz/gen/poc3.py", line 6, in <module>
    model(tensor)
  File "/home/jwnhy/miniconda3/lib/python3.12/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jwnhy/miniconda3/lib/python3.12/site-packages/paddle/nn/layer/conv.py", line 883, in forward
    out = F.conv2d_transpose(
          ^^^^^^^^^^^^^^^^^^^
  File "/home/jwnhy/miniconda3/lib/python3.12/site-packages/paddle/nn/functional/conv.py", line 1262, in conv2d_transpose
    pre_bias = op(
               ^^^
RuntimeError: (PreconditionNotMet) Tensor's dimension is out of bound.Tensor's dimension must be equal or less than the size of its memory.But received Tensor's dimension is 18446744048552321260, memory's size is 26382377216.
  [Hint: Expected numel() * SizeOf(dtype()) <= memory_size(), but received numel() * SizeOf(dtype()):18446744048552321260 > memory_size():26382377216.] (at ../paddle/phi/core/dense_tensor_impl.cc:46)

My tensor is clearly nowhere near 18446744048552321260 in size, so I suspect an integer overflow bug in the library.

My environment is as follows:
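The overflow suspicion can be checked with plain arithmetic, without a GPU. The sketch below applies the standard transposed-convolution output-size formula to the shapes in the reproducer, then wraps the output height to a signed 32-bit integer. The 32-bit wrap is an assumption about where the bug might be, not something confirmed from the Paddle source, and the helper names are hypothetical. Under that assumption the numbers reproduce both values in the error message.

```python
# Hypothetical helpers for illustration; none of these are Paddle APIs.

def conv_transpose_out_size(in_size, kernel, stride, padding,
                            dilation=1, output_padding=0):
    """Standard output-size formula for a transposed convolution."""
    return ((in_size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + 1 + output_padding)

def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

def as_uint64(value):
    """Reinterpret an integer as an unsigned 64-bit value."""
    return value & 0xFFFFFFFFFFFFFFFF

FLOAT32_BYTES = 4
INT32_MAX = 2**31 - 1

# Shapes from the reproducer: input [1, 1, 46889, 2], kernel [1, 1],
# stride [46889, 4], padding [0, 1].
h_out = conv_transpose_out_size(46889, kernel=1, stride=46889, padding=0)
w_out = conv_transpose_out_size(2, kernel=1, stride=4, padding=1)

# Byte count of the output with exact (unbounded) integers.
true_bytes = 1 * 1 * h_out * w_out * FLOAT32_BYTES
# Rounded up to a multiple of 256 (assumed allocator alignment).
aligned_bytes = -(-true_bytes // 256) * 256

# Byte count if the height is first truncated to int32 (assumption),
# then the negative result is read back as an unsigned 64-bit size.
wrapped_bytes = as_uint64(wrap_int32(h_out) * w_out * FLOAT32_BYTES)

print("h_out:", h_out, "exceeds INT32_MAX:", h_out > INT32_MAX)
print("w_out:", w_out)
print("aligned true size:", aligned_bytes)
print("wrapped size:", wrapped_bytes)
```

The output height is 2198531433, which is above the int32 maximum of 2147483647. The aligned true size comes out to 26382377216, matching the reported `memory_size()`, and the wrapped size comes out to 18446744048552321260, matching the reported `numel() * SizeOf(dtype())`. This is consistent with a dimension being narrowed to 32 bits somewhere on the way to the size check, though it does not identify the exact location.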

[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.3.4.1
[pip3] nvidia-cuda-cupti-cu12==12.3.101
[pip3] nvidia-cuda-nvrtc-cu12==12.3.107
[pip3] nvidia-cuda-runtime-cu12==12.3.101
[pip3] nvidia-cudnn-cu12==9.0.0.312
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-nccl-cu12==2.19.3
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.4.127
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu12        12.3.4.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.3.101                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.3.107                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.3.101                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.0.0.312                pypi_0    pypi
[conda] nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
[conda] nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.6.85                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi

Additional Supplementary Information

No response

liym27 (Contributor) commented Jan 15, 2025

Thanks for the feedback. We have reproduced the issue and will fix it. @jwnhy
