
[Streamline] Introduce StreamlinePlus: Exhaustive streamlining #39

Merged: 22 commits into dev on Feb 6, 2025

Conversation

iksnagreb

Proposes an alternative, reworked streamlining with proper composition of transformations, which exhaustively re-applies transformations until the graph no longer changes. Also enables, by default, the steps required for residual topologies, i.e., forking and joining operations, as well as transpose, squeeze and im2col operations.
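
A minimal sketch of the exhaustive-composition idea, assuming the standard QONNX `Transformation` interface where `apply()` returns a `(model, graph_modified)` tuple; the class name and the change detection via serialization are illustrative, not the exact API this PR adds on the QONNX side:

```python
from qonnx.transformation.base import Transformation

class ExhaustiveComposition(Transformation):
    """Re-applies a list of transformations until the graph is stable."""

    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms

    def apply(self, model):
        # Serialize the graph before the pass to detect whether any
        # transformation changed it
        before = model.model.SerializeToString()
        for transform in self.transforms:
            # ModelWrapper.transform() already iterates each single
            # transformation to its own fixed point before moving on
            model = model.transform(transform)
        # Returning True makes the caller run the whole pass again, so
        # the composition as a whole reaches a fixed point
        return model, model.model.SerializeToString() != before
```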

Includes various extensions and bug-fixes to individual transformations in the streamlining sub-package (reorder, absorbs, ...) collected from multiple feature and bug-fix branches... Most of these are neither properly merged nor cherry-picked but directly "transplanted" from the ongoing transformer merge branch... At least this means they have been tested in integration for one non-trivial model topology...

Requires at least this addition to QONNX for the new core functionality, and also some other QONNX fixes and functionality:

Once #9 is merged, we should have everything required from the QONNX side. In terms of other FINN additions and bug-fixes, I tried to make this PR self-contained by transplanting stuff from the ongoing Transformer merge effort...

iksnagreb added 20 commits April 3, 2024 15:12
Flips the order of AbsorbSignBiasIntoMultiThreshold and
MoveScalarLinearPastInvariants streamlining transforms to prefer
absorbing adds into multi-thresholds instead of propagating them
downwards. This should prevent accumulation of scalar adds in front of
two-input matmuls in scaled dot-product attention operators (they cannot
be moved past the matmul operation in that case).
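
Illustratively, the flipped ordering looks like this (the import paths follow FINN's streamline sub-package; the two-step list is a sketch, not the full streamlining sequence):

```python
from finn.transformation.streamline.absorb import AbsorbSignBiasIntoMultiThreshold
from finn.transformation.streamline.reorder import MoveScalarLinearPastInvariants

# Absorb scalar adds into multi-thresholds first, so they cannot pile
# up in front of two-input MatMuls where they can no longer be moved
model = model.transform(AbsorbSignBiasIntoMultiThreshold())
model = model.transform(MoveScalarLinearPastInvariants())
```
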
The MoveScalarMulPastMatMul transformation can now handle matmul
operations with both inputs preceded by a scalar multiplication.

This change is required for streamlining scaled dot-product attention
operations, which are essentially two-input matmuls.
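
This rewrite is sound because scalar factors commute with matrix multiplication: (a·A)(b·B) = ab·(AB) for scalars a, b. A quick numeric check of the identity (illustrative only):

```python
import numpy as np

a, b = 0.5, 2.0
A = np.random.rand(4, 8).astype(np.float32)
B = np.random.rand(8, 3).astype(np.float32)

# Mul(a) -> MatMul <- Mul(b)  is equivalent to  MatMul -> Mul(a * b)
assert np.allclose((a * A) @ (b * B), (a * b) * (A @ B))
```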
Assertions are too restrictive, causing the program to terminate in cases where the streamlining simply encounters nodes to which the transforms are not applicable: just skip those nodes.
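
The pattern, in a hedged sketch (`is_applicable` stands in for whatever condition the respective transform checks; it is not a real FINN helper):

```python
def is_applicable(node):
    ...  # stand-in for the transform's own applicability condition

for node in model.graph.node:
    # before: assert is_applicable(node), "..."  -> aborted the whole run
    if not is_applicable(node):
        continue  # after: skip nodes the transform does not apply to
    ...  # perform the actual rewrite
```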

Only the two transforms currently affecting the streamlining of scaled
dot-product attention have been changed.
This is pretty much a copy-and-paste of the existing test case, just replacing the MatMul initializer with a second top input followed by a scalar Mul.
Folding quantized initializers into add-like nodes did not respect the order of inputs to the Add node correctly. This is fixed by testing for one of the two possible orders and selecting the subsequent indices accordingly.

Shape inference following the transformation is fixed by deleting the annotations instead of propagating them incorrectly. Deleting the shape annotations should not hurt, as these are redone by running shape inference after each transformation anyway.
Add is commutative, and thus the export does not always generate the initializer as the second input. However, this transformation always assumed that ordering, failing via assertion if the inputs were simply ordered differently. The transformation now handles both possible input orderings, as sketched below.
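
The last two fixes hinge on the same order check. A minimal sketch, assuming a qonnx-style ModelWrapper where `get_initializer()` returns None for dynamic inputs; the `set_tensor_shape(..., None)` call is an assumption about how the shape-annotation deletion is done:

```python
for node in model.graph.node:
    if node.op_type != "Add":
        continue
    # Figure out which input carries the initializer instead of
    # assuming it is always input[1]
    if model.get_initializer(node.input[1]) is not None:
        dyn_idx, init_idx = 0, 1
    elif model.get_initializer(node.input[0]) is not None:
        dyn_idx, init_idx = 1, 0
    else:
        continue  # no initializer input at all: nothing to fold here
    value = model.get_initializer(node.input[init_idx])
    # ... fold `value` and rewire the graph using dyn_idx / init_idx ...
    # Drop the now-stale shape annotation; shape inference re-creates it
    model.set_tensor_shape(node.output[0], None)
```
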
This is required for streamlining packed input projections of multi-head scaled dot-product attention. Adds support for Squeeze and Unsqueeze as well. Skips moving producers of fork-nodes, as this is not handled correctly; however, the same effect can be attained by applying the MoveLinearPastFork transformation first, as sketched below.
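
Sketch of that workaround ordering (MoveLinearPastFork lives in FINN's streamline.reorder module; the second step stands in for whichever reorder transform skips fork-node producers):

```python
from finn.transformation.streamline.reorder import MoveLinearPastFork

# First duplicate linear (Add/Mul) operations past forks, one copy per
# branch, so the subsequent reorder step never sees a fork producer
model = model.transform(MoveLinearPastFork())
# ... then apply the reorder transform that skips fork-node producers
```
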
Explicitly rejects absorbing into fork-nodes. Previously, this would probably have failed silently, resulting in a wrong model. Not sure whether this happened in any practically relevant models?
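
qonnx's ModelWrapper already provides the needed predicate; a minimal guard sketch (the surrounding loop is illustrative):

```python
for node in model.graph.node:
    if model.is_fork_node(node):
        # Absorbing into a fork would rewrite only one of its consumer
        # branches, silently corrupting the others: reject explicitly
        continue
    ...  # proceed with the absorb rewrite
```
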
This is probably still rather sketchy, but at least it tries to check the data layout annotation. For now this seems to be enough to get the thresholds of multi-head attention right, IF QONNX properly annotates the 3D layouts.
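
A hedged sketch of such a layout guard, assuming qonnx's layout annotations, where `get_tensor_layout()` returns a list of axis labels such as ["N", "H", "W", "C"], or None when no annotation is present:

```python
for node in model.graph.node:
    layout = model.get_tensor_layout(node.input[0])
    if layout is None or "C" not in layout:
        continue  # no trustworthy annotation: skip rather than guess
    # Pick the channel axis from the annotation; this also covers 3D
    # layouts like ["N", "W", "C"] if QONNX annotates them properly
    channel_axis = layout.index("C")
    ...  # broadcast/reshape thresholds along channel_axis
```
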
@iksnagreb iksnagreb self-assigned this Feb 4, 2025
@iksnagreb iksnagreb requested a review from fpjentzsch February 4, 2025 15:11
@iksnagreb iksnagreb merged commit a5d32f2 into dev Feb 6, 2025
0 of 2 checks passed
Status: Merged into FINN+