Replies: 4 comments 3 replies
-
> We should do our best to avoid this, as maintenance of the project quickly becomes very costly.
Can you clarify why? I see bfloat16 is customized for NVPTX already, so it's not clear to me why changing the storage type is unwanted. Is it just code divergence for different targets? If so, I suppose the proposed solution with inline asm should be easier to maintain. Am I right?
-
Yeah exactly. We could support it if we diverge for different targets. I think the fact that there are a lot of generic implementations of bfloat16 math functions using shorts might remain problematic if we do this.
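For concreteness, the kind of generic, short-based implementation meant here looks roughly like the sketch below (the names are illustrative, not the actual DPC++ code); it only works because the storage type is a plain 16-bit integer whose bits can be manipulated directly:

```cpp
#include <cstdint>

// Illustrative sketch only: generic bfloat16 helpers written directly against
// the uint16_t storage representation. bfloat16 reuses the top 16 bits of the
// float32 layout, so the sign bit is bit 15 of the stored short.
inline uint16_t bf16_fabs_bits(uint16_t x) {
  return static_cast<uint16_t>(x & 0x7FFFu); // clear the sign bit
}

inline uint16_t bf16_neg_bits(uint16_t x) {
  return static_cast<uint16_t>(x ^ 0x8000u); // flip the sign bit
}
```

Implementations in this style would need rewriting (or bit-cast shims) if the storage member were changed to __bf16.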
Yeah, I thought the simplest solution would be to operate with inline asm directly on the shorts, which avoids us having to make the switch over to __bf16 in NVPTX right now. However, I can see why there might be an argument for using __bf16 in backends where it is supported, so I wanted to get some other opinions on this. It would be useful to know what the plans of other backends are regarding __bf16 in order to make the best decision.
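To sketch what that could look like (assuming an sm_80+ target, since cvt.rn.bf16.f32 needs PTX ISA 7.0; the function names and the fallback path are mine, not the actual header code):

```cpp
#include <cstdint>
#include <cstring>

// Rough sketch of operating on the uint16_t storage with inline PTX. Only the
// "h" (16-bit register) asm constraint knows the short holds a bf16 payload.
inline uint16_t float_to_bf16_bits(float f) {
#if defined(__SYCL_DEVICE_ONLY__) && defined(__NVPTX__)
  uint16_t r;
  // Round-to-nearest-even conversion; PTX ISA 7.0 / sm_80 and newer.
  asm("cvt.rn.bf16.f32 %0, %1;" : "=h"(r) : "f"(f));
  return r;
#else
  // Host / non-NVPTX placeholder: truncation, for illustration only.
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
#endif
}

inline float bf16_bits_to_float(uint16_t h) {
  // bf16 is the high half of the corresponding float32, so widening is exact
  // and needs no asm at all.
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```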
-
Hi @JackAKirk, I have started looking at this issue. I wanted to quickly clarify whether I am looking at the relevant portions of bfloat16 support. I suppose we are specifically referring to the __bf16 added in commit ecd682b. Is that correct? Thanks.
-
Yes, we do plan to add this type in SPIR-V and map it to the appropriate LLVM IR type and/or a SPIR-V friendly IR construct. Our main motivation here is to extend cooperative/joint matrix extension capabilities. We don't have a timeline for this yet, though.
-
DPC++ does not currently support __bf16 in device code, see https://reviews.llvm.org/D141375.
However, it is supported for the NVPTX llvm backend: https://reviews.llvm.org/D136311, and it is supported on amdgpu as a storage type: https://reviews.llvm.org/D139398 (I believe it is currently only used for a single instruction on amdgpu: `mfma`).

Based on my current understanding, I don't think it is a good idea to switch the storage type of bfloat16 here https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/oneapi/bfloat16.hpp#L27 from `uint16_t` to `__bf16`, even conditionally for certain backends that could support it (e.g. NVPTX), until full support is available for all backends. Note that a recent llvm-project pulldown had to be partially reverted due to a switch of some NVPTX builtins from using `uint16_t` to `__bf16`. However, in the DPC++ cuda backend we can easily work around this by using inline PTX (that uses the `uint16_t` storage type) as a replacement for calling these builtins, at least as a temporary solution. This would avoid any problems with past/future pulldowns that use `__bf16` in NVPTX builtins, without breaking DPC++ code.

However, I thought it would be a good idea to raise a discussion item here in case people have plans/thoughts on the issue of future `__bf16` support in DPC++. Is there a plan to support the `__bf16` type for SPIR-V backends, for example?
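To make the suggested workaround concrete, here is a rough sketch (not an actual patch) of how a bf16 operation whose builtin signature moved to `__bf16` could instead be expressed as inline PTX over the `uint16_t` storage type. It assumes an sm_80+ target (`max.bf16` requires PTX ISA 7.0), and the function name and fallback are illustrative only:

```cpp
#include <cstdint>

// Hypothetical sketch: issue the bf16 PTX instruction directly on the
// uint16_t storage representation instead of calling an NVPTX builtin whose
// operands are now __bf16. Assumes sm_80+ (max.bf16 is PTX ISA 7.0).
inline uint16_t bf16_max_bits(uint16_t a, uint16_t b) {
#if defined(__SYCL_DEVICE_ONLY__) && defined(__NVPTX__)
  uint16_t r;
  asm("max.bf16 %0, %1, %2;" : "=h"(r) : "h"(a), "h"(b));
  return r;
#else
  // Placeholder so the sketch compiles elsewhere; a real header would
  // dispatch to the existing generic implementation here.
  return a;
#endif
}
```

Since the operands never leave `uint16_t`, a future pulldown that changes the builtin signatures again would not affect this path.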