Kokkos Support #1650
Re visibility of symbols for ClangEnzyme: did you try LLDEnzyme or one of the other paths described here? https://enzyme.mit.edu/getting_started/UsingEnzyme/#differentiating-cc
In the case of several of the above functions, I think the right solution is to mark them as allocation-like (we have an attribute for this).
@ZuseZ4 Thanks for the tip. I started with that example CMakeLists.txt, and made Kokkos a subdirectory. I do get through everything until the final executable link:
I assume the primal argument 1 is talking about
You probably want to declare `enzyme_dup` as `extern`, per that error. I will forewarn that LTO comes with various compile-time implications. If it works now, that's a great starting point, but we'll probably want to add some attributes directly for the ease of Kokkos users.
Thanks @wsmoses, that fixed it for the for-loop version but not the `parallel_reduce` version. The error messages look very similar to those with ClangEnzyme, so maybe I misdiagnosed the original problem:
Yeah, my same comment about marking the function as allocation-like (assuming I'm reading this correctly as an allocation function) applies here.
But this is also still odd, because it implies that you didn't do full LTO with wherever these functions were implemented, and thus Enzyme couldn't find the definition to differentiate and complained. I do think this is probably a place where we should mark a custom derivative at a higher level, but it would still be good to confirm that it works when given the definitions in LLVM.
@wsmoses Made some progress - I wasn't very familiar with the usage of LTO before. I still had to add Now, it's "cannot deduce type of memset" and "cannot deduce type of copy"
Is this the kind of thing that would be fixed by adding the allocation-like attribute to the Kokkos functions in question?
Not recommended for production use, but try https://enzyme.mit.edu/getting_started/UsingEnzyme/#loose-type-analysis to get started. I think there might also be an example in the CMake setup of how to add it (but I'm not sure). Also, LLVMgold.so looks suspicious; I think it should use LLD and not gold(?), but maybe Billy knows more.
@ZuseZ4 No, gold is fine here. @brian-kelley, can you add `-g` so we can look at a backtrace of where the error comes from? It probably makes sense to mark some mutex or something as inactive there. Also, FYI, Enzyme does differentiate through AMD GPUs as well, per your earlier comment.
For those not familiar with it, Kokkos (https://github.com/kokkos/kokkos) is a C++ library and programming model for portable, shared-memory parallelism. A program written once using Kokkos can be compiled for a variety of CPU and GPU backends using the common vendor toolchains. OpenMP, Cuda, HIP, and SYCL are some of the most popular backends.
Kokkos is used heavily by many codes to come out of the Exascale Computing Project, as well as Trilinos and apps built with it. Enzyme has support for OpenMP and Cuda code, but supporting Kokkos could be an efficient route to autodiff on other parallel backends (in particular, AMD and Intel GPUs).
Here is a simple Kokkos program that attempts to use `__enzyme_autodiff` to compute the gradient of the 2-norm. `nrm2` uses `Kokkos::parallel_reduce` to sum up the squared elements of a vector, and then does `Kokkos::sqrt` (one of the portable wrappers for basic math functions). The hand-written gradient `gradNrm2` computes what the answer should be.

Uncommenting the first version of `nrm2` (where the reduction is a normal for loop) works, so Enzyme is smart enough to understand the `View<double*>` as a data structure, and accesses using `operator()`.

While most Kokkos constructs are templated on execution/memory spaces, View type, functor type, etc., there are still functions that are compiled into `libkokkos`, and so their definitions are not available in the translation unit that ClangEnzyme can see. Here is some of the output that happens when I try to build this program with Enzyme:

Most of these functions are called internally by the View allocation and the `parallel_reduce`.

The high-level path to supporting Kokkos probably looks something like:

`CallDerivatives.cpp`?)

`parallel_for` and `parallel_reduce`: for the reverse mode, the gradient code should be a `parallel_reduce` and a `parallel_for`, respectively, with differentiated versions of the functor body. As we discussed, this must also avoid data races by using atomics if View elements are updated in parallel.

Some other open questions:
Enzyme community members who might be interested: @wsmoses @michel2323 @ftynse @vchuravy @ivanradanov @albertcohen
And others from the Sandia/Kokkos group: @kliegeois @srajama1