Hello, I think I will merge the current state of the develop branch this month (with some changes to the test suite, not the library), as all the other features require more time to implement. Have you tested whether this optimization is useful for GROMACS? It only affects big systems, and I am not sure it has much impact on MI300, since that has an L3 cache.
-
Great!
A quick benchmark on MI250X showed a speed-up of up to ~7% in FFT time (primarily for small systems 🤔; I will need to look into that more). I don't have an MI300 to try.
-
The improvement may also be due to another commit, daf09d3, in which the radix kernels were slightly optimized. The padding mentioned in the AMD commit is enabled by default for systems where the product of all dimension sizes is more than 2097152 (2^21).
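To make the threshold concrete, here is a minimal Python sketch of the size check described above. The function name and constant are hypothetical and are not part of the VkFFT API; the sketch only assumes the rule as stated (product of all dimension sizes strictly greater than 2097152).

```python
import math

# Hypothetical constant for illustration: the default threshold quoted above (2**21).
PADDING_THRESHOLD = 2_097_152


def padding_enabled_by_default(dims):
    """Return True if the product of all dimension sizes exceeds the threshold.

    This only mirrors the rule as described in this thread; it is not
    taken from the library's source.
    """
    return math.prod(dims) > PADDING_THRESHOLD


# 128^3 is exactly 2097152, so it is not "more than" the threshold.
small = padding_enabled_by_default((128, 128, 128))
# 256 * 128 * 128 = 4194304, which exceeds the threshold.
large = padding_enabled_by_default((256, 128, 128))
```

Under this reading, a 128x128x128 grid sits exactly at the boundary and would not trigger the padding, which is worth keeping in mind when interpreting benchmark results for small systems.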
-
Hello, I have implemented a new register assignment logic and a set of optimizations to the generated kernels that improved performance quite a bit (especially on Nvidia). I think it may be interesting to run GROMACS with VkFFT on Nvidia hardware to compare these optimizations to cuFFT. Sorry for the delay with the release; I wanted these improvements to be in it. Best regards,
-
Hi! Do you have a timeline for the next VkFFT release? Would like to see how it matches up with GROMACS release timings to decide which version to bundle. 1377057 would be nice to have :)