Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak type 3 setpts #609

Merged
merged 4 commits into from
Jan 28, 2025
Merged

Tweak type 3 setpts #609

merged 4 commits into from
Jan 28, 2025

Conversation

mreineck
Copy link
Collaborator

This removes the temporary phihatk arrays in the setpts method for type 3 transforms. It also reduces the amount of large array accesses happening in this function.

@mreineck mreineck requested a review from ahbarnett January 27, 2025 11:02
src/finufft_core.cpp Outdated Show resolved Hide resolved
@DiamonDinoia
Copy link
Collaborator

This is a good idea. I thought of doing it at some point. Do you have an estimate of the improvement? I don't remember this taking too long in my profiling.

@mreineck
Copy link
Collaborator Author

This is a good idea. I thought of doing it at some point. Do you have an estimate of the improvement? I don't remember this taking too long in my profiling.

I didn't notice any considerable improvement, but memory reduction is always nice, and I expect that scaling behavior should be better with fewer memory accesses.

The overhead for this part of the code can be large, but of course it is only part of the planning phase and therefore not a major concern. It is possible to speed up the correction factor considerably by allowing the compiler to vectorize the many cos calls, but that requires -ffast-math or similar, which could be problematic.

@DiamonDinoia
Copy link
Collaborator

I like it as is and can be merged. For vectorization we could open an issue where I vectorize it with xsimd since it has a cos implementation. We can use pragmas/attributes around the function with fast-math to have the compiler doing so for us. Othewise, we can move everything that cannot rely on fast-math to another file and enable it on this file.

@DiamonDinoia DiamonDinoia merged commit 8baf39d into master Jan 28, 2025
332 checks passed
@ahbarnett
Copy link
Collaborator

Thanks for the effort, but I perhaps would have liked to approve this before merging, since it rewrites some of my code, and I am back from vacation. Next time :)

The "operator () (T k)" stuff I guess is just a way to get on-the-fly phihat[k] for each dimension without storing to a previous array. But, surely all of this abstraction into this mini-class will have negligible performance change? (computing the cosines dominates the DRAM movement, right? Or am I wrong?) A speed comparison would be good.

@DiamonDinoia
Copy link
Collaborator

Thanks for the effort, but I perhaps would have liked to approve this before merging, since it rewrites some of my code, and I am back from vacation. Next time :)

Apologies.

The "operator () (T k)" stuff I guess is just a way to get on-the-fly phihat[k] for each dimension without storing to a previous array. But, surely all of this abstraction into this mini-class will have negligible performance change? (computing the cosines dominates the DRAM movement, right? Or am I wrong?) A speed comparison would be good.

Class go away when compiling (in theory) so this is likely to have no impact on performance. The opposite will surprise me.

DRAM movement is the same as the previous version. The only difference is that now things are allocated on the heap instead of the stack. Heap can be slower than stack and we pay the allocation. But I do not think it makes a measurable difference. On the other hand, it reduces the memory consumption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants