fix after cherry-pick c7d8ac4
ahbarnett committed Nov 25, 2024
1 parent d1034a8 commit 9a6cf20
Showing 3 changed files with 587 additions and 162 deletions.
2 changes: 1 addition & 1 deletion docs/c_gpu.rst
@@ -313,7 +313,7 @@ while ``modeord=1`` selects FFT-style ordering starting at zero and wrapping ove

**gpu_device_id**: Sets the GPU device ID. Leave at default unless you know what you're doing. [To be documented]

**gpu_spreadinterponly**: if ``0`` do the NUFFT as intended. If ``1``, omit the FFT and deconvolution (diagonal division by kernel Fourier transform) steps, which returns *garbage answers as a NUFFT*, but allows advanced users to perform an isolated spreading or interpolation using the usual type 1 or type 2 ``cufinufft`` interface. To do this, the nonzero flag value must be used *only* with ``upsampfac=1.0`` (since the input and output grids are the same size, and neither represents Fourier coefficients), and ``kerevalmeth=1``. The known use-case here is estimating so-called density compensation, conventionally used in MRI. (See [MRI-NUFFT](https://mind-inria.github.io/mri-nufft/nufft.html)) Please note that this flag is also internally used by type 3 transforms (which was its original use case).
**gpu_spreadinterponly**: if ``0`` do the NUFFT as intended. If ``1``, omit the FFT and deconvolution (diagonal division by kernel Fourier transform) steps, which returns *garbage answers as a NUFFT*, but allows advanced users to perform an isolated spreading or interpolation using the usual type 1 or type 2 ``cufinufft`` interface. To do this, the nonzero flag value must be used *only* with ``upsampfac=1.0`` (since the input and output grids are the same size, and neither represents Fourier coefficients), and ``kerevalmeth=1``. The known use-case here is estimating so-called density compensation, conventionally used in MRI (see `MRI-NUFFT <https://mind-inria.github.io/mri-nufft/nufft.html>`_). Please note that this flag is also internally used by type 3 transforms (being its original use case).



7 changes: 4 additions & 3 deletions docs/opts.rst
@@ -54,10 +54,11 @@ Here are their default settings (from ``src/finufft.cpp:finufft_default_opts``):

As for quick advice, the main options you'll want to play with are:

- ``upsampfac`` to trade-off between spread/interpolate vs FFT speed and RAM
- ``modeord`` to flip ("fftshift") the Fourier mode ordering
- ``debug`` to look at timing output (to determine if your problem is spread/interpolation dominated, vs FFT dominated)
- ``nthreads`` to run with a different number of threads than the current maximum available through OpenMP (a large number can sometimes be detrimental, and very small problems can sometimes run faster on 1 thread)
- ``fftw`` to try slower plan modes which give faster transforms. The next natural one to try is ``FFTW_MEASURE`` (look at the FFTW3 docs)
- ``fftw`` to try slower FFTW plan modes which give faster transforms. The next natural one to try is ``FFTW_MEASURE`` (look at the FFTW3 docs)

See :ref:`Troubleshooting <trouble>` for good advice on trying options, and read the full options descriptions below.
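
Two of the options in the quick-advice list can be illustrated with plain arithmetic. The sketch below is not part of the library: ``grid_growth`` and ``mode_indices`` are hypothetical helper names, and the values ``2.0`` and ``1.25`` are only example upsampling factors. It assumes the fine grid grows by ``upsampfac`` per dimension (squared per dimension for type 3, which is one reading of the ``upsampfac`` description), and it ignores any rounding of grid sizes up to FFT-friendly lengths. The ``modeord=0`` branch assumes increasing ordering from ``-N/2`` (integer division).

```python
def grid_growth(upsampfac, dim, transform_type=1):
    """Rough ratio of fine-grid points to requested modes (hypothetical helper).

    Assumption: each dimension grows by upsampfac (types 1 and 2) or by
    upsampfac squared (type 3); rounding to FFT-friendly sizes is ignored,
    so real fine grids are somewhat larger than this estimate.
    """
    per_dim = upsampfac ** 2 if transform_type == 3 else upsampfac
    return per_dim ** dim


def mode_indices(n_modes, modeord=0):
    """Integer frequency indices in storage order (hypothetical helper).

    modeord=0: increasing order from the most negative frequency (assumed).
    modeord=1: FFT-style order, starting at zero and wrapping over to the
               negative frequencies.
    """
    kmin = -(n_modes // 2)      # most negative frequency index
    kmax = (n_modes - 1) // 2   # most positive frequency index
    if modeord == 0:
        return list(range(kmin, kmax + 1))
    if modeord == 1:
        return list(range(0, kmax + 1)) + list(range(kmin, 0))
    raise ValueError("modeord must be 0 or 1")
```

Under these assumptions, a 3D type 1 or 2 transform uses about 8 times as many fine-grid points as modes at ``upsampfac=2.0`` but only about 1.95 times at ``upsampfac=1.25``, which is the RAM and FFT-time side of the trade-off; the smaller factor pays for this with more spreading or interpolation work. For 8 modes, ``mode_indices(8, 1)`` starts at 0, runs up to 3, then continues from -4 to -1.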

@@ -158,9 +159,9 @@ There is thus little reason for the nonexpert to mess with this option.
* ``spread_kerpad=1`` : pad to next multiple of four


**upsampfac**: This is the internal real factor by which the FFT (fine grid)
**upsampfac**: This is the internal factor, greater than 1, by which the FFT (fine grid)
is chosen larger than
the number of requested modes in each dimension, for type 1 and 2 transforms. It must be greater than 1.
the number of requested modes in each dimension, for type 1 and 2 transforms. For type 3 transforms this factor gets squared, due to type 2 nested in a type-1-style spreading operation, so has even more effect.
We have built efficient kernels
for only two settings that are greater than 1, as follows. Otherwise, setting it to zero chooses a good heuristic:
