CUDA ERROR #44
Comments
Hello, I have encountered the same issue and got the following traceback:

```
  File "train.py", line 221, in <module>
    train(net, loader_train, loader_test, optimizer, criterion)
  File "train.py", line 66, in train
    out = net(x)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/DEA-RWKV/code/model/backbone_train.py", line 113, in forward
    x8 = self.level3_VRWKV8(x8, patch_resolution)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/DEA-RWKV/code/model/vrwkv6.py", line 356, in forward
    x = _inner_forward(x)
  File "/root/DEA-RWKV/code/model/vrwkv6.py", line 349, in _inner_forward
    x = x + self.drop_path(self.att(self.ln1(x), patch_resolution))
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/DEA-RWKV/code/model/vrwkv6.py", line 239, in forward
    x = _inner_forward(x)
  File "/root/DEA-RWKV/code/model/vrwkv6.py", line 232, in _inner_forward
    x = RUN_CUDA_RWKV6(B, T, C, self.n_head, r, k, v, w, u=self.time_faaaa)
  File "/root/DEA-RWKV/code/model/vrwkv6.py", line 66, in RUN_CUDA_RWKV6
    return WKV_6.apply(B, T, C, H, r, k, v, w, u)
 (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "train.py", line 221, in <module>
    train(net, loader_train, loader_test, optimizer, criterion)
  File "train.py", line 72, in train
    loss.backward()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/root/DEA-RWKV/code/model/vrwkv6.py", line 62, in backward
    gu = torch.sum(gu, 0).view(H, C//H)
RuntimeError: CUDA error: an illegal memory access was encountered
```

Have you found a solution for this issue? Any insight would be greatly appreciated! Thank you.
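One hedged observation about the crash site: the traceback dies inside `WKV_6.backward` at `gu = torch.sum(gu, 0).view(H, C//H)`, and RWKV6 WKV kernels are typically compiled for a fixed head size, so an illegal memory access here often means the runtime `n_head` / channel configuration disagrees with that compile-time constant. Below is a minimal standalone sketch of the shape contract, not code from this repo; the function name `check_wkv6_shapes` and the example `head_size=64` are my own assumptions.

```python
# Hedged sketch: validate the WKV6 shape contract on the CPU before
# launching the CUDA kernel, so a mismatch raises a readable ValueError
# instead of an illegal memory access deep in backward().
# `head_size` stands in for the constant the kernel was compiled with
# (an assumed example value; check your repo's kernel build flags).

def check_wkv6_shapes(B, T, C, H, head_size):
    """Check that (B, T, C) with n_head=H matches the compiled head size."""
    if C % H != 0:
        raise ValueError(f"C={C} must be divisible by n_head={H}")
    if C // H != head_size:
        raise ValueError(
            f"C//H = {C // H}, but the kernel expects head_size={head_size}; "
            f"recompile the kernel or adjust n_head/embedding dim"
        )
    return True


# Example: C=512 split over 8 heads matches an assumed head size of 64.
check_wkv6_shapes(B=2, T=196, C=512, H=8, head_size=64)
```

If the check passes but the crash persists, the fault is more likely inside the kernel itself, and a synchronous re-run (see the `CUDA_LAUNCH_BLOCKING=1` hint in the error message below) will localize it.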
Does anyone have the same issue?

```
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
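As the error message suggests, CUDA failures are reported asynchronously, so the Python stack trace can point at the wrong call. A minimal way to act on that hint, assuming `train.py` is the failing entry point from the traceback, is to force synchronous kernel launches before re-running:

```shell
# Hedged sketch: make every CUDA kernel launch synchronous so the
# reported stack trace points at the launch that actually faults.
# (This slows training noticeably; use it only while debugging.)
export CUDA_LAUNCH_BLOCKING=1

# Then re-run the failing command from the traceback, e.g.:
#   python train.py
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

With blocking launches, the `RuntimeError` should surface at the exact op that triggers the illegal access, which makes it much easier to tell a kernel bug from a shape mismatch.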