Hey guys,
Not sure if this is a bug or not, but just in case, I am posting it here so we can discuss.
When I define a variable that is a view of a CuArray outside the kernel, I get the expected result (considering column-major storage).
However, inside the CUDA kernel, I apply exactly the same command and get a different result.
Here is sample code to reproduce the behavior.
Can you please enlighten me as to whether this is expected or whether it is indeed a problem?
Thanks in advance
#-------------------------------------------
using CUDA

function my_kernel(matrix)
    # Global 1-based thread index
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i == 1
        # Take the same views inside the kernel and print their types
        row_view = @view matrix[1, :]
        col_view = @view matrix[:, 1]
        @cuprintln("Type of row_view inside the kernel: ", typeof(row_view))
        @cuprintln("Type of col_view inside the kernel: ", typeof(col_view))
    end
    matrix[i, i] += 1
    return
end

matrix = CUDA.zeros(Float32, 1024, 1024)

# Take the views on the host and print their types
row_view = @view matrix[1, :]
col_view = @view matrix[:, 1]
println("Type of row_view outside the kernel: ", typeof(row_view))
println("Type of col_view outside the kernel: ", typeof(col_view))
println("")

# 256 threads x 4 blocks = 1024 threads, one per diagonal element
@cuda threads=256 blocks=4 my_kernel(matrix)
result = Array(matrix)
This is expected.
There's "Discussions" for that :-)