Tweak type 3 setpts #609
…atk arrays; reduce large array reads/writes
This is a good idea. I thought of doing it at some point. Do you have an estimate of the improvement? I don't remember this taking too long in my profiling.
I didn't notice any considerable improvement, but memory reduction is always nice, and I expect that scaling behavior should be better with fewer memory accesses. The overhead for this part of the code can be large, but of course it is only part of the planning phase and therefore not a major concern. It is possible to speed up the correction factor considerably by allowing the compiler to vectorize the many …
I like it as is, and it can be merged. For vectorization we could open an issue where I vectorize it with xsimd, since it has a cos implementation. We can use pragmas/attributes around the function with fast-math to have the compiler do so for us. Otherwise, we can move everything that cannot rely on fast-math to another file and enable it on this file.
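To make the pragma/attribute option concrete, here is a minimal sketch, not code from this PR. It assumes GCC's per-function `optimize` attribute, and the names `SKETCH_FAST_MATH` and `cos_sum` are hypothetical. On other compilers the macro expands to nothing, so the function is built with the file's normal flags. Whether the loop is actually vectorized depends on the compiler, its flags, and the math library in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: GCC's per-function optimize attribute enables fast-math for
// this one function only. Other compilers get an empty macro and compile
// the function with the translation unit's normal flags.
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_FAST_MATH __attribute__((optimize("fast-math")))
#else
#define SKETCH_FAST_MATH
#endif

// Hypothetical cosine sum: sum_j w[j] * cos(k * z[j]).
// This is a stand-in, not FINUFFT's actual correction-factor routine.
SKETCH_FAST_MATH
double cos_sum(double k, const std::vector<double>& z,
               const std::vector<double>& w) {
  assert(z.size() == w.size());
  double sum = 0.0;
  // With fast-math the compiler is allowed to reorder this reduction,
  // which is what makes vectorizing it legal.
  for (std::size_t j = 0; j < z.size(); ++j) {
    sum += w[j] * std::cos(k * z[j]);
  }
  return sum;
}
```

Because the reduction may be reordered, results can differ from the strict version in the last few bits, so any comparison against a reference should use a tolerance.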
Thanks for the effort, but I perhaps would have liked to approve this before merging, since it rewrites some of my code, and I am back from vacation. Next time :) The "operator () (T k)" stuff I guess is just a way to get on-the-fly phihat[k] for each dimension without storing to a previous array. But, surely all of this abstraction into this mini-class will have negligible performance change? (computing the cosines dominates the DRAM movement, right? Or am I wrong?) A speed comparison would be good.
Apologies.
Classes go away when compiling (in theory), so this is likely to have no impact on performance. The opposite would surprise me. DRAM movement is the same as in the previous version. The only difference is that now things are allocated on the heap instead of the stack. Heap can be slower than stack, and we pay for the allocation, but I do not think it makes a measurable difference. On the other hand, it reduces the memory consumption.
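For readers following the "operator () (T k)" discussion, here is a minimal sketch of the pattern, not the code merged in this PR. The class `CosineSum` and both helper functions are hypothetical stand-ins. A small callable evaluates the factor on demand, so the caller can drop the temporary array that would otherwise hold one value per point.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-dimension factor: a short cosine sum with fixed nodes
// and weights. This is an illustration, not FINUFFT's actual kernel.
template <typename T>
class CosineSum {
public:
  CosineSum(std::vector<T> nodes, std::vector<T> weights)
      : z_(std::move(nodes)), w_(std::move(weights)) {
    assert(z_.size() == w_.size());
  }

  // Evaluate the factor at frequency k on demand.
  T operator()(T k) const {
    T sum = 0;
    for (std::size_t j = 0; j < z_.size(); ++j) {
      sum += w_[j] * std::cos(k * z_[j]);
    }
    return sum;
  }

private:
  std::vector<T> z_, w_;
};

// Variant A: fill a temporary array first, then use it in a second pass.
template <typename T>
std::vector<T> scale_with_temp(const CosineSum<T>& f, const std::vector<T>& k,
                               const std::vector<T>& x) {
  assert(k.size() == x.size());
  std::vector<T> tmp(k.size());  // extra array of size k.size()
  for (std::size_t i = 0; i < k.size(); ++i) tmp[i] = f(k[i]);
  std::vector<T> out(k.size());
  for (std::size_t i = 0; i < k.size(); ++i) out[i] = x[i] * tmp[i];
  return out;
}

// Variant B: evaluate the factor where it is needed, with no temporary.
template <typename T>
std::vector<T> scale_on_the_fly(const CosineSum<T>& f, const std::vector<T>& k,
                                const std::vector<T>& x) {
  assert(k.size() == x.size());
  std::vector<T> out(k.size());
  for (std::size_t i = 0; i < k.size(); ++i) out[i] = x[i] * f(k[i]);
  return out;
}
```

Both variants evaluate the same number of cosines, so the arithmetic cost is the same. Variant B only avoids one array of writes and one of reads, which fits the expectation above that any speed difference is small while the memory footprint drops.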
This removes the temporary `phihatk` arrays in the `setpts` method for type 3 transforms. It also reduces the number of large array accesses happening in this function.