New recipe: High Performance Tensor Transposition library (v1.0.5) #4765
This adds the HPTT library (High Performance Tensor Transposition), original version here:
https://github.com/springer13/hptt
I plan to use this library in forthcoming versions of TensorOperations.jl, and it can also be useful to other tensor libraries.
The patch fixes compatibility with clang and corresponds to the changes in PR springer13/hptt#16.
While the library promises dedicated optimisations for the ARM platform, these lead to compiler issues (I believe the original source file contains bugs or is otherwise broken), so I decided to fall back to the general build flags on this platform and to disable the ARM-specific optimisations.
On Intel platforms, I enable AVX optimisations, which yields a warning that I cannot assume these to be present. The library does not support switching AVX on or off at runtime, so the only way around this would be to have separate builds depending on the availability of AVX, which I do not know how to realise using BinaryBuilder. I assume most users actually have AVX available.
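For reference, here is a minimal sketch of what a runtime AVX check could look like in a C++ library that did support switching at runtime. This is not part of HPTT or of this recipe. The function names `cpu_has_avx` and `kernel_name` are hypothetical, and the check relies on the GCC/clang builtin `__builtin_cpu_supports`, so on other compilers or non-x86 targets the sketch simply reports that AVX is unavailable.

```cpp
#include <string>

// Hypothetical helper (not HPTT API): reports whether the running CPU has AVX.
// Uses the GCC/clang x86 builtins; on any other compiler or architecture it
// conservatively returns false.
bool cpu_has_avx() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();                    // initialise the CPU feature data
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// Hypothetical dispatcher: picks a kernel label from the detection result.
// A real library would select between function pointers here instead.
std::string kernel_name(bool has_avx) {
    return has_avx ? "avx" : "generic";
}
```

A caller would evaluate `kernel_name(cpu_has_avx())` once at start-up and reuse the result. Since HPTT selects its code path at compile time, using something like this would require changes to the library itself, not just to the build recipe.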