mxp: Mixed precision extension #294
Conversation
@germasch what do you think?
Thanks for the recommendations; they have been added now. I think content-wise this PR is complete, now also including a number of tests. Any input is more than welcome!
I'll see if I can reproduce the SYCL CI error with the latest oneAPI toolkit. The non-debug build does pass. Going forward, it is possible to run small GPU code on a computer with a Gen9 integrated GPU, and, with the right env var for double (FP64) emulation, on Gen11 as well. This covers most Intel processors, including laptops.
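For reference (my addition, not from the thread): on Intel GPUs without native double support, FP64 emulation is usually switched on via the compute runtime's documented environment variables; I assume these are the ones meant above.

```sh
# Assumed env vars for FP64 emulation on Intel GPUs lacking native
# double support (e.g. Gen11); documented by the Intel compute runtime.
export OverrideDefaultFP64Settings=1
export IGC_EnableDPEmulation=1
```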
@bd4 did you manage to reproduce the error? I don't manage to build even vanilla gtensor with SYCL, as the […] in […].
Sorry for the delay, I am able to reproduce with the latest 2025 oneAPI. It looks like it's related to the kernel name in SYCL: somehow an extra '"' ends up in the name, and that fails to compile. I'll see if I can wrap my head around what to change to fix it.
Once Level Zero and oneAPI are installed, you need to do the following:
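The concrete steps did not survive the export; below is a plausible sketch, assuming a default oneAPI install under /opt/intel/oneapi and gtensor's `GTENSOR_DEVICE` CMake option (the exact option names are assumptions, not the author's original commands).

```sh
# Sketch, not the author's exact steps: set up the oneAPI environment and
# configure a SYCL debug build of gtensor with tests enabled.
source /opt/intel/oneapi/setvars.sh   # puts icpx and the SYCL runtime on PATH

cmake -S . -B build-sycl-debug \
  -DCMAKE_CXX_COMPILER=icpx \
  -DCMAKE_BUILD_TYPE=Debug \
  -DGTENSOR_DEVICE=sycl \
  -DBUILD_TESTING=ON
cmake --build build-sycl-debug
ctest --test-dir build-sycl-debug
```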
Ok, so I found the problem with the SYCL debug build: […]

Then, I guess, the value of 34 corresponds to the character `"` (ASCII 34). To check this assumption, and to let the build pass for now, the last "debugging" commit 1da0506 just avoids the tests for template parameter values higher than 30. Anyhow, to properly avoid it, it should be sufficient to use, e.g., […].

Note that the pipeline still fails at the moment, but now this is due to some actual tests not passing with the SYCL build.
Good find! I saw the quote there in the error with the template name, but it was not immediately obvious where that came from.
I agree it seems like a bug, but I have been wrong before on this front. I'll work on a minimal reproducer; it should be pretty straightforward.
It looks like a lot of those test failures are a `1 != 1` issue, where the type conversion is not happening in the same way as on other platforms.
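To illustrate the kind of failure meant here (my illustration, not from the PR): two values can both print as 1 yet compare unequal, when one side went through a float conversion and the other was computed in double.

```cpp
// Sketch of how a "1 != 1" failure can look: both sides print as 1, but
// the float-converted value has lost the low bits of the double.
#include <cstdio>

int main() {
  double x = 1.0 + 1e-12;           // prints as "1" with %g
  float y = static_cast<float>(x);  // rounds to exactly 1.0f
  std::printf("x = %g, y = %g, equal = %d\n", x, y,
              x == static_cast<double>(y));  // equal = 0
  return 0;
}
```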
Thanks - you gave me the necessary hint in pointing out the extra `"`.
With your instructions above I finally managed to "locally" reproduce the SYCL debug build error. I didn't dive any deeper into why it only happens in the debug build, or why the error apparently only shows up for the device tests. But it is indeed very easy to reproduce outside of gtensor: using the example code from SYCL Basic Code: parallel_for and replacing […] gives the corresponding compilation error via […].

Gonna have a look into this next.
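Since the exact replacement is missing above, here is a minimal sketch of what such a standalone reproducer could look like (the `kernel_name` tag and the value 34 are my assumptions, based on the ASCII-34 / `"` diagnosis earlier in the thread):

```cpp
// Minimal sketch (assumption, not the author's exact reproducer): name the
// kernel with a non-type template parameter whose value 34 is ASCII '"'.
#include <sycl/sycl.hpp>

template <char C>
class kernel_name;  // kernel tag carrying a character value

int main() {
  sycl::queue q;
  int result = 0;
  {
    sycl::buffer<int, 1> buf(&result, sycl::range<1>(1));
    q.submit([&](sycl::handler& h) {
      auto acc = buf.get_access<sycl::access::mode::write>(h);
      // With C = 34 ('"'), the debug build reportedly embeds a raw quote
      // into the emitted kernel name, and compilation fails.
      h.parallel_for<kernel_name<34>>(sycl::range<1>(1),
                                      [=](sycl::id<1> i) { acc[i] = 1; });
    });
  }  // buffer destructor waits and copies back
  return 0;
}
```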
Turns out all of the failing tests were due to normal (non-mxp) gtensor kernels achieving more accuracy than expected with the SYCL build. The reason is that the default compiler settings for the SYCL build allow value-unsafe floating-point optimizations.

So, obviously, testing the normal gtensor kernels for value-safety was a bit over-ambitious (I didn't take into account that value-unsafe optimizations might of course also improve the result). Hence, I will just remove those non-mxp "reference tests" and keep only the tests for the mxp kernels. Then the pipeline should pass (I hope).
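As an illustration of how a value-unsafe optimization can improve accuracy (my example, not from the thread): contracting a multiply-add into a single fused operation removes one rounding step.

```cpp
// Sketch: FMA contraction evaluates a*b + c with a single rounding, so the
// "optimized" result can be more accurate than the strictly evaluated one.
#include <cmath>
#include <cstdio>

int main() {
  double a = 1.0 + std::ldexp(1.0, -30);  // exactly 1 + 2^-30
  // Exact a*a - 1 is 2^-29 + 2^-60; plain '*' rounds the 2^-60 term away.
  double strict = a * a - 1.0;            // 2^-29 (low bit lost)
  double fused = std::fma(a, a, -1.0);    // 2^-29 + 2^-60 (exact here)
  std::printf("strict = %.17e\nfused  = %.17e\n", strict, fused);
  return 0;
}
```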
I think content-wise this PR is complete, and now it also passes the pipeline.
I reported it in an intel/llvm issue. Let's see what comes out of it.
Did this to pass the pipeline.
Mixed-precision extension to gtensor

Provides template functions `mxp::adapt` (and `mxp::adapt_device`), inspired by and extending `gt::adapt`. An `mxp::mxp_span` (derived from `gt::gtensor_span`) is returned, enabling mixed-precision computations in gtensor kernels, i.e., computations where the compute precision differs from the data / storage precision.

From the user's perspective, simply replacing `gt::adapt` by `mxp::adapt` is sufficient.

Example
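The original example did not survive the export; the sketch below is my guess at typical usage, assuming `mxp::adapt<ComputeType>(pointer, shape)` mirrors `gt::adapt`'s pointer-plus-shape form (the template-parameter meaning and the header are assumptions, not the PR's confirmed API).

```cpp
// Hypothetical usage sketch; the actual signature and header may differ.
#include <gtensor/gtensor.h>
// #include <gtensor/mxp.h>  // assumed separate header, see Questions below

int main() {
  float a[4] = {1.f, 2.f, 3.f, 4.f};
  float b[4] = {5.f, 6.f, 7.f, 8.f};
  float c[4] = {0.f, 0.f, 0.f, 0.f};

  // Storage stays float, but expressions are evaluated in double:
  auto ga = mxp::adapt<double>(a, gt::shape(4));
  auto gb = mxp::adapt<double>(b, gt::shape(4));
  auto gc = mxp::adapt<double>(c, gt::shape(4));

  gc = ga + gb;  // compute in double, store the result back as float
  return 0;
}
```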
Features for this PR

- `.view( ...placeholders... )`

Right now only one test and minimal source code are added, mainly with the purpose of opening the PR and clarifying the questions below.
Questions

- `mxp::adapt<...>(...)` or `gt::mxp::adapt<...>(...)` or `gt::mxp_adapt<...>(...)`?
- Include via `#include <gtensor/gtensor.h>`, but only if the flag `GTENSOR_ENABLE_MXP` is set (like for FP16)? Or a separate explicit include via `#include <gtensor/mxp.h>`?
- Source files in `include/gtensor`, all starting with `mxp_`?