
Python implementation refactor, numba kernel optimizations, some GPU implementations #8

Open · wants to merge 60 commits into base: master

Conversation

@din14970 din14970 commented Jan 2, 2023

After more than a year I finally found a bit of spare time to work on this project some more. I have taken your implementation, @berkels, and made improvements in terms of:

  • general Pythonic code quality and formatting: descriptive naming, separation into modules/classes, use of type hints, ...
  • improved CPU performance by optimizing the numba JIT functions, using multiple cores in some of the kernels, and caching some pre-computed results so they don't get recomputed.
  • support for running the calculation on the GPU, using cupy to replace numpy and numba.cuda kernels to replace numba.jit (see the cuda_kernels module). Unfortunately, at this point the GPU performs worse in most cases due to data transfers and allocations.
  • unit tests that show the code still works the same way as before and as expected.

What still needs to happen:

  • further work out the user-facing API that currently resides in the JNNR class in implementation/implementation.py; support writing intermediate results to a file.
  • human-readable docs explaining how the code actually works and how it implements what is written in the papers would be nice. This will not be in the scope of this PR, though.
  • I don't think the multistage optimization works as intended yet, as described in https://arxiv.org/pdf/1901.01709.pdf. In particular, I'm not sure how the bias correction step is supposed to be implemented. This is in JNNR.run. Some help here would be appreciated.
  • benchmarking against matchseries.

@berkels I would really appreciate it if you could skim over this if/when you find time and comment on:

  • the bias reduction step issue
  • naming of functions and variables, descriptions in docstrings, ... (is it accurate/correct?)
  • tips on further improving performance
  • anything else you notice

din14970 and others added 30 commits April 28, 2021 20:45
…_integrate_pd_over_cells_single and is how one typically iterates in FE
… hand that does "constant in normal direction" extension
…ion problem as non-linear least squares problem
…mization problem as non-linear least squares problem
…rix vector multiplication with a constant matrix and a constant shift
…1 (will make the multi level descent easier)
…rmalized integration domain instead of in pixels (will make the multi level descent easier)
@din14970 din14970 requested a review from berkels January 2, 2023 21:17
@din14970 din14970 self-assigned this Jan 2, 2023
)
self.state.deformations = ComplexSignal2D(dp.stack(corrected_images))
self.state.completed_stages = stage + 1
L *= self.config.regularization.factor_stage
Collaborator


This seems slightly different from what my C++ implementation is doing. There, the first stage uses lambda and all other stages use lambda*extraStagesLambdaFactor, i.e. the regularization parameter is only changed when moving from stage 1 to stage 2. It is not changed further when going from stage 2 to stage 3.
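To make the difference concrete (a minimal sketch with hypothetical function names, not code from either implementation): multiplying `L` by the factor at the end of every stage compounds the factor, whereas the C++ behavior applies it exactly once:

```python
def stage_lambda_cpp(lam, factor, stage):
    """Regularization parameter as the C++ code uses it:
    stage 1 uses lam, every later stage uses lam * factor."""
    return lam if stage == 1 else lam * factor

def stage_lambda_compounding(lam, factor, stage):
    """What repeating `L *= factor` at the end of each stage yields:
    the factor compounds with every additional stage."""
    return lam * factor ** (stage - 1)

# With lam = 1.0 and factor = 0.1:
#   C++ behavior: stage 1 -> 1.0, stage 2 -> 0.1, stage 3 -> 0.1
#   compounding:  stage 1 -> 1.0, stage 2 -> 0.1, stage 3 -> 0.01
```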

@berkels (Collaborator) commented Jan 3, 2023

The bias correction step indeed still seems to be missing. Let $$N[\psi]:=\frac12\sum_{i=1}^{n}\int_\Omega\lVert\phi_i(\psi(x))-x\rVert^2\mathrm{d}x.$$ To compute the minimizing $\phi$, we do a gradient flow of $N$ with respect to $\psi$. This needs the first variation
$$\langle N'[\psi,\phi_1,\ldots,\phi_n],\zeta\rangle=\sum_{i=1}^{n}\int_\Omega (D\phi_i)(\psi(x))^T(\phi_i(\psi(x))-x)\cdot\zeta\mathrm{d}x.$$ This is very similar to how the registration itself is done. This just doesn't have a regularizer and includes all deformations, not just one.

@berkels (Collaborator) commented Jan 3, 2023

Note $N$ can be seen as sum of data terms typical for registration. Let $x_1$ and $x_2$ denote the components of $x$, i.e. $x=(x_1,x_2)$. Furthermore, denote the components of $\phi_i(x)$ with $\phi_i(x)=(\phi_{i,1}(x),\phi_{i,2}(x))$. Then,
$$N[\psi]=\frac12\sum_{i=1}^{n}\sum_{j=1}^2\int_\Omega(\phi_{i,j}(\psi(x))-x_j)^2\mathrm{d}x.$$
and
$$\langle N'[\psi,\phi_1,\ldots,\phi_n],\zeta\rangle=\sum_{i=1}^{n}\sum_{j=1}^2\int_\Omega(\phi_{i,j}(\psi(x))-x_j)\nabla\phi_{i,j}(\psi(x))\cdot\zeta\,\mathrm{d}x.$$
Using this, you should be able to implement this using the data term and its gradient you implemented for the registration. There the images would be $\phi_{i,j}$ and $x_j$. The C++ code uses the interpretation above, but reusing your existing registration should work as well and take less effort.
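The reuse described above could be sketched like this (a hypothetical numpy/scipy sketch with invented names and conventions, not code from the PR or the C++ project): treat each component $\phi_{i,j}$ as a "template image", $x_j$ as the "reference image", and accumulate the data-term gradient over all deformations.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def bias_gradient(psi, phis, spacing=1.0):
    """First variation of N[psi] = 1/2 * sum_i ||phi_i(psi(x)) - x||^2
    on a pixel grid.  Hypothetical sketch, not the PR's actual API.

    psi  : (2, H, W) map; psi[k] holds the k-th coordinate of psi(x).
    phis : list of (2, H, W) deformation fields phi_i.
    Returns the gradient with respect to psi, shape (2, H, W)."""
    H, W = psi.shape[1:]
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    identity = np.stack([ys, xs]).astype(float)
    grad = np.zeros_like(psi)
    for phi in phis:
        for j in range(2):  # each component phi_{i,j} acts as an "image"
            # phi_{i,j}(psi(x)), sampled with bilinear interpolation
            warped = map_coordinates(phi[j], psi, order=1, mode="nearest")
            residual = warped - identity[j]        # phi_{i,j}(psi(x)) - x_j
            gy, gx = np.gradient(phi[j], spacing)  # nabla phi_{i,j}
            # evaluate the gradient of phi_{i,j} at psi(x) as well
            grad[0] += residual * map_coordinates(gy, psi, order=1, mode="nearest")
            grad[1] += residual * map_coordinates(gx, psi, order=1, mode="nearest")
    return grad
```

A gradient-flow step for the bias correction would then update `psi` against this gradient; for `phis` that are all the identity, the gradient vanishes, as expected from the formula.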

@berkels (Collaborator) commented Jan 3, 2023

BTW: I'm very happy to see that you made a lot of progress on this!

@berkels (Collaborator) commented Jan 3, 2023

FYI, the C++ code handles the bias correction in SeriesMatching::reduceDeformations in projects/electronMicroscopy/matchSeries.h. I don't think this is very helpful as a reference, though.
