
[Streamline] Introduce StreamlinePlus: Exhaustive streamlining #39

Merged: 22 commits into dev on Feb 6, 2025

Conversation

iksnagreb

Proposes an alternative, reworked streamlining with proper composition of transformations, which exhaustively re-applies transformations until the graph no longer changes. Also enables, by default, the steps required for residual topologies, i.e., forking and joining operations, as well as transpose, squeeze and im2col operations.
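
A minimal sketch of the exhaustive-composition idea, assuming the standard QONNX `Transformation` interface where `apply()` returns a `(model, graph_modified)` tuple; the class name and the change detection via serialization are illustrative, not the exact API this PR adds on the QONNX side:

```python
from qonnx.transformation.base import Transformation

class ExhaustiveComposition(Transformation):
    """Re-applies a list of transformations until the graph is stable."""

    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms

    def apply(self, model):
        # Serialize the graph before the pass to detect whether any
        # transformation changed it
        before = model.model.SerializeToString()
        for transform in self.transforms:
            # ModelWrapper.transform() already iterates each single
            # transformation to its own fixed point before moving on
            model = model.transform(transform)
        # Returning True makes the caller run the whole pass again, so
        # the composition as a whole reaches a fixed point
        return model, model.model.SerializeToString() != before
```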

Includes various extensions and bug-fixes to individual transformations in the streamlining sub-package (reorder, absorbs, ...) collected from multiple feature and bug-fix branches... Most of these are neither properly merged nor cherry-picked but directly "transplanted" from the ongoing transformer merge branch... At least this means they have been tested in integration for one non-trivial model topology...

Requires at least this addition to QONNX for the new core functionality, and also some other QONNX fixes and functionality:

Once #9 is merged, we should have everything required from the QONNX side. In terms of other FINN additions and bug-fixes, I tried to make this PR self-contained by transplanting stuff from the ongoing Transformer merge effort...

iksnagreb added 20 commits April 3, 2024 15:12
Flips the order of AbsorbSignBiasIntoMultiThreshold and
MoveScalarLinearPastInvariants streamlining transforms to prefer
absorbing adds into multi-thresholds instead of propagating them
downwards. This should prevent accumulation of scalar adds in front of
two-input matmuls in scaled dot-product attention operators (they cannot
be moved past the matmul operation in that case).
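
Illustratively, the flipped ordering looks like this (the import paths follow FINN's streamline sub-package; the two-step list is a sketch, not the full streamlining sequence):

```python
from finn.transformation.streamline.absorb import AbsorbSignBiasIntoMultiThreshold
from finn.transformation.streamline.reorder import MoveScalarLinearPastInvariants

# Absorb scalar adds into multi-thresholds first, so they cannot pile
# up in front of two-input MatMuls where they can no longer be moved
model = model.transform(AbsorbSignBiasIntoMultiThreshold())
model = model.transform(MoveScalarLinearPastInvariants())
```
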
The MoveScalarMulPastMatMul transformation can now handle matmul
operations with both inputs preceded by a scalar multiplication.

This change is required for streamlining scaled dot-product attention
operations, which are essentially two-input matmuls.
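
This rewrite is sound because scalar factors commute with matrix multiplication: (a·A)(b·B) = ab·(AB) for scalars a, b. A quick numeric check of the identity (illustrative only):

```python
import numpy as np

a, b = 0.5, 2.0
A = np.random.rand(4, 8).astype(np.float32)
B = np.random.rand(8, 3).astype(np.float32)

# Mul(a) -> MatMul <- Mul(b)  is equivalent to  MatMul -> Mul(a * b)
assert np.allclose((a * A) @ (b * B), (a * b) * (A @ B))
```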
Assertions are too restrictive, causing the program to terminate in cases where the streamlining simply encounters nodes to which the transforms are not applicable: just skip those nodes.
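
The pattern, in a hedged sketch (`is_applicable` stands in for whatever condition the respective transform checks; it is not a real FINN helper):

```python
def is_applicable(node):
    ...  # stand-in for the transform's own applicability condition

for node in model.graph.node:
    # before: assert is_applicable(node), "..."  -> aborted the whole run
    if not is_applicable(node):
        continue  # after: skip nodes the transform does not apply to
    ...  # perform the actual rewrite
```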

Only the two transforms currently affecting the streamlining of scaled
dot-product attention have been changed.
This is pretty much a copy-and-paste of the existing test case, just replacing the MatMul initializer with a second top input followed by a scalar Mul.
Folding quantized initializers into add-like nodes did not respect the order of inputs to the Add node correctly. This is fixed by testing for one of the two possible orders and selecting the subsequent indices accordingly.

Shape inference following the transformation is fixed by deleting the annotations instead of propagating them incorrectly. Deleting the shape annotations should not hurt, as these are redone by running shape inference after each transformation anyway.
Add is commutative, and thus the export does not always generate the initializer as the second input. However, this transformation always assumed that ordering, failing via assertion if the inputs were simply ordered differently. The transformation now handles both possible input orderings, as sketched below.
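
The last two fixes hinge on the same order check. A minimal sketch, assuming a qonnx-style ModelWrapper where `get_initializer()` returns None for dynamic inputs; the `set_tensor_shape(..., None)` call is an assumption about how the shape-annotation deletion is done:

```python
for node in model.graph.node:
    if node.op_type != "Add":
        continue
    # Figure out which input carries the initializer instead of
    # assuming it is always input[1]
    if model.get_initializer(node.input[1]) is not None:
        dyn_idx, init_idx = 0, 1
    elif model.get_initializer(node.input[0]) is not None:
        dyn_idx, init_idx = 1, 0
    else:
        continue  # no initializer input at all: nothing to fold here
    value = model.get_initializer(node.input[init_idx])
    # ... fold `value` and rewire the graph using dyn_idx / init_idx ...
    # Drop the now-stale shape annotation; shape inference re-creates it
    model.set_tensor_shape(node.output[0], None)
```
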
This is required for streamlining packed input projections of multi-head scaled dot-product attention. Adds support for Squeeze and Unsqueeze as well. Skips moving producers of fork-nodes, as this is not handled correctly; however, the same effect can be attained by applying the MoveLinearPastFork transformation first, as sketched below.
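
Sketch of that workaround ordering (MoveLinearPastFork lives in FINN's streamline.reorder module; the second step stands in for whichever reorder transform skips fork-node producers):

```python
from finn.transformation.streamline.reorder import MoveLinearPastFork

# First duplicate linear (Add/Mul) operations past forks, one copy per
# branch, so the subsequent reorder step never sees a fork producer
model = model.transform(MoveLinearPastFork())
# ... then apply the reorder transform that skips fork-node producers
```
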
Explicitly rejects absorbing into fork-nodes. Previously, this would probably have failed silently, resulting in a wrong model. Not sure whether this happened in any practically relevant models?
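
qonnx's ModelWrapper already provides the needed predicate; a minimal guard sketch (the surrounding loop is illustrative):

```python
for node in model.graph.node:
    if model.is_fork_node(node):
        # Absorbing into a fork would rewrite only one of its consumer
        # branches, silently corrupting the others: reject explicitly
        continue
    ...  # proceed with the absorb rewrite
```
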
This is probably still rather sketchy, but at least it tries to check the data layout annotation. For now this seems to be enough to get the thresholds of multi-head attention right, IF QONNX properly annotates the 3D layouts.
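
A hedged sketch of such a layout guard, assuming qonnx's layout annotations, where `get_tensor_layout()` returns a list of axis labels such as ["N", "H", "W", "C"], or None when no annotation is present:

```python
for node in model.graph.node:
    layout = model.get_tensor_layout(node.input[0])
    if layout is None or "C" not in layout:
        continue  # no trustworthy annotation: skip rather than guess
    # Pick the channel axis from the annotation; this also covers 3D
    # layouts like ["N", "W", "C"] if QONNX annotates them properly
    channel_axis = layout.index("C")
    ...  # broadcast/reshape thresholds along channel_axis
```
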
@iksnagreb iksnagreb self-assigned this Feb 4, 2025
@iksnagreb iksnagreb requested a review from fpjentzsch February 4, 2025 15:11
@iksnagreb iksnagreb merged commit a5d32f2 into dev Feb 6, 2025
0 of 2 checks passed
Status: Merged into FINN+