Can I use inline assembly in SYCL like in CUDA? #13203
-
As you know, developers can inline PTX in a CUDA kernel, which gives expert developers much finer control over the generated code. That matters a lot for extreme optimization. Is the same possible in SYCL?
Answered by
AlexeySachkov
Apr 2, 2024
Replies: 1 comment 6 replies
-
Yes, it works exactly as with the nvcc compiler (see https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html), except that you have to wrap the PTX code in the following macro:
inside a parallel_for.
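The guard macro itself did not survive in the comment above, so the following is only a minimal sketch of the pattern under stated assumptions, not the commenter's exact code. It assumes the guard is `__SYCL_DEVICE_ONLY__` combined with `__NVPTX__` (both are predefined by DPC++/clang during the device pass for the CUDA backend), and that `SYCL_LANGUAGE_VERSION` is predefined when compiling with `-fsycl`. The helpers `add_one` and `add_one_all` are hypothetical names introduced for illustration. The `asm` statement uses the same GCC-style syntax that nvcc accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: SYCL_LANGUAGE_VERSION is predefined when compiling with -fsycl.
#if defined(SYCL_LANGUAGE_VERSION)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper: returns x + 1.
// Assumption: __SYCL_DEVICE_ONLY__ && __NVPTX__ is the guard for PTX code.
// During the NVPTX device pass the addition is done with inline PTX;
// on the host pass (or with a non-SYCL compiler) plain C++ is used.
inline std::uint32_t add_one(std::uint32_t x) {
#if defined(__SYCL_DEVICE_ONLY__) && defined(__NVPTX__)
  std::uint32_t result;
  asm volatile("add.u32 %0, %1, 1;" : "=r"(result) : "r"(x));
  return result;
#else
  return x + 1;
#endif
}

// Hypothetical driver: applies add_one to every element of data.
inline void add_one_all(std::vector<std::uint32_t>& data) {
#if defined(SYCL_LANGUAGE_VERSION)
  sycl::queue q;
  {
    sycl::buffer<std::uint32_t, 1> buf(data.data(),
                                       sycl::range<1>(data.size()));
    q.submit([&](sycl::handler& h) {
      sycl::accessor acc(buf, h, sycl::read_write);
      // The inline PTX is reached from inside the parallel_for kernel.
      h.parallel_for(sycl::range<1>(data.size()),
                     [=](sycl::id<1> i) { acc[i] = add_one(acc[i]); });
    });
  }  // The buffer destructor waits for the kernel and copies results back.
#else
  // Fallback when not compiled as SYCL: same result computed on the host.
  for (std::size_t i = 0; i < data.size(); ++i) data[i] = add_one(data[i]);
#endif
}
```

With DPC++ this would be built for an NVIDIA target with something like `clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda`. The `#else` branches keep the file compilable for the host pass and for other backends, where PTX would not assemble.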
@alanzhai219, yes, this functionality is already supported. You can find some examples (written with Intel GPU assembly rather than PTX) here: https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/InlineAsm