Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform for L2 #35

Open: wants to merge 3 commits into base branch devel

runwangdl (Contributor) commented Feb 10, 2025

Description

This update improves kernel tiling support for CCT and resolves multiple operator issues on the Siracusa platform. The new convolution and max-pooling kernel templates handle padding inside the kernel and use an HWC layout. Tiling constraints have been added or fixed, resolving several execution issues in GEMM, MatMul, and other float computations. The Add layer has also been refined to handle bias broadcasting correctly, so that output shapes are inferred accurately.

Added

  1. Float Bindings and Tilers for the PULP Target

    • Introduced tiling support for all float operators on the PULP target (Add, GEMM, MatMul, Conv, MaxPool, LayerNorm, GELU, Gather).
  2. Float Convolution and MaxPool Parsers, Templates, and Kernels

    • Implemented kernels and templates separate from the generic versions.
    • Designed to handle padding inside the kernel.
    • C kernels use the HWC layout.
  3. Tiling Constraints

    • Added tiling constraints for Conv, Gather, and LayerNorm, and applied the existing constraints to the other kernels.
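
To illustrate what handling padding inside an HWC kernel means, here is a minimal Python sketch. It is not the PR's C kernel in MaxPool_fp32.c, and the name `maxpool2d_hwc` and its argument layout are hypothetical. Out-of-bounds positions are skipped instead of being materialized in a padded copy of the input (equivalent to padding with -inf), and the C channels of each pixel are adjacent, as in an HWC buffer.

```python
import math
from typing import List, Tuple

Tensor3D = List[List[List[float]]]


def maxpool2d_hwc(
    x: Tensor3D,
    kernel: Tuple[int, int],
    stride: Tuple[int, int],
    pad: Tuple[int, int, int, int],
) -> Tensor3D:
    """Max-pool an H x W x C nested list stored in HWC order (hypothetical sketch).

    kernel = (kh, kw), stride = (sh, sw), pad = (top, left, bottom, right).
    Padding is never materialized: out-of-bounds positions are skipped,
    which is equivalent to padding with -inf.
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    kh, kw = kernel
    sh, sw = stride
    pt, pl, pb, pr = pad
    H_out = (H + pt + pb - kh) // sh + 1
    W_out = (W + pl + pr - kw) // sw + 1

    out = [[[-math.inf] * C for _ in range(W_out)] for _ in range(H_out)]
    for ho in range(H_out):
        for wo in range(W_out):
            acc = out[ho][wo]  # running max for all C channels of this output pixel
            for i in range(kh):
                h = ho * sh + i - pt
                if not 0 <= h < H:
                    continue  # row falls in the top/bottom padding
                for j in range(kw):
                    w = wo * sw + j - pl
                    if not 0 <= w < W:
                        continue  # column falls in the left/right padding
                    pixel = x[h][w]  # in HWC, the C channels of a pixel are adjacent
                    for c in range(C):
                        if pixel[c] > acc[c]:
                            acc[c] = pixel[c]
    return out
```

Skipping padded rows and columns avoids allocating a padded copy of the input. As long as each pad is smaller than the corresponding kernel size, every window overlaps the input, so no output stays at -inf.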

Fixed

  1. CycleMeasure Pass for Siracusa Untiled Profiling

  2. GEMM Tiling Constraints Issue

    • Fixed missing support for `transA` and `transB`.
  3. MatMul Multi-Dimensional Input Issue

    • Improved MatMul handling of multi-dimensional inputs.
  4. Add Layer for Broadcasted Bias

    • Fixed the Add layer's shape computation so that it is compatible with a broadcasted bias.
  5. Code Generation Float32 Issue
    • Resolved an issue where concatenating the `f` suffix onto float32 values caused errors with `inf`.
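
One plausible way a float32 literal can break generated C code is naive string concatenation: `f"{value}f"` turns `float('-inf')` into `-inff`, which is not a valid C token. The sketch below is a hypothetical helper (`c_float_literal`), not the fix applied in this PR. It emits valid C `float` literals and falls back to the `INFINITY` and `NAN` macros from `<math.h>` for non-finite values.

```python
import math


def c_float_literal(value: float) -> str:
    """Format a number as a valid C `float` literal (hypothetical helper)."""
    if math.isnan(value):
        return "NAN"  # macro from <math.h>
    if math.isinf(value):
        # INFINITY is a <math.h> macro; negating it is a valid constant expression.
        return "INFINITY" if value > 0 else "-INFINITY"
    # repr() always yields a decimal point or an exponent (e.g. '3.0', '1e-05'),
    # so appending 'f' gives a valid floating constant. Note that '3f' would not be.
    return repr(float(value)) + "f"
```

The generated C file then needs `#include <math.h>` whenever a non-finite constant, such as the `-INFINITY` initial value of a max-pool accumulator, is emitted.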

Changed

  1. Regenerated the ONNX Model for CCT
    • Adjusted bias handling to prevent incorrect output shape inference when broadcasting is required.
    • Modified the bias dimensions for Add and GEMM to avoid unnecessary broadcasting.
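
The output shape of an Add with a broadcasted bias follows the multidirectional (NumPy-style) broadcasting that ONNX specifies for element-wise operators such as Add. The minimal sketch below shows that rule on its own, independent of Deeploy's layer classes. The function name `broadcast_shape` is hypothetical.

```python
from itertools import zip_longest
from typing import Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Output shape of an element-wise op under NumPy/ONNX broadcasting."""
    result = []
    # Align shapes from the trailing dimension; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} are not broadcastable")
    return tuple(reversed(result))
```

A shape function that simply copies one input's shape gives the wrong answer whenever the other input is the larger one. Storing the bias with its full dimensions avoids the broadcast entirely.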

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.

runwangdl changed the title from "[Draft] Add Tiling Support to All CCT Kernels and Fix CCT Operators Derivation on Siracusa Platform" to "[Draft] Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform", Feb 10, 2025
runwangdl force-pushed the PULPCCT branch 11 times, most recently from cd2ee51 to 8afb9f3, February 12, 2025 23:26
runwangdl marked this pull request as ready for review February 12, 2025 23:56
Victor-Jung (Member) left a comment

Hi Run, great PR addressing lots of issues and building a strong foundation for all FP execution on PULPOpen! There are a few comments to address, but no critical ones.

@@ -120,6 +124,7 @@
MemoryManagementGeneration("L3.*"),
MemoryManagementGeneration("L2"),
MemoryManagementGeneration(),
ProfilingCodeGeneration()
Victor-Jung (Member):

Let's not enable that by default. This is only useful for untiled execution (with testRunner_Siracusa.py), so let's add this pass only in that situation. I recommend adding an argument to the untiled test runner and adding this pass from networkGenerate.py.

runwangdl (Contributor Author):

OK, I will fix it.

runwangdl (Contributor Author), Feb 13, 2025:

Fixed #35

Deeploy/Targets/PULPOpen/Parsers.py (outdated, resolved)
TargetLibraries/PULPOpen/src/MaxPool_fp32.c (resolved)
runwangdl changed the title from "[Draft] Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform" to "Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform", Feb 13, 2025
Victor-Jung added the enhancement (New feature or request) label, Feb 13, 2025
runwangdl changed the title from "Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform" to "Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform for L2", Feb 13, 2025
Victor-Jung (Member) left a comment

I agree with the interface that you used (CodeGenVerbosity), but I don't like that the pass is inside the PULPTiling pass. A small change and this will roll.

@@ -303,6 +305,7 @@ def generate_test(self):

command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
command += self._argument_parser.generate_cmd_args()
print(command)
Victor-Jung (Member):

Remove this

runwangdl (Contributor Author):

Fixed in commit 206bdd3.

Comment on lines 57 to 59
if verbose.untilingProfiling:
ctxt, executionBlock = self.profiluntiling.apply(ctxt, executionBlock, name)

Victor-Jung (Member):

This pass is unrelated to tiling and should not be there. You should make an independent pass that does something only when the given flag is passed (through CodeGenVerbosity). Also, "untiling" does not mean anything in this context; let's call it profileUntiled.

runwangdl (Contributor Author), Feb 14, 2025:

Fixed in commit 206bdd3 by adding a new PULPProfileUntiled pass.
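
The design the reviewer asked for can be sketched as a standalone pass that acts only when a `CodeGenVerbosity` flag is set. This is a hypothetical Python model, not Deeploy's actual pass API. `CodeGenVerbosity` here is a minimal stand-in dataclass, `ProfileUntiledPass.apply` operates on a plain list of C source lines, and `getCycles()` is a placeholder for the platform's cycle counter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodeGenVerbosity:
    """Hypothetical stand-in for the verbosity object passed to code generation."""
    profileUntiled: bool = False


class ProfileUntiledPass:
    """Hypothetical pass that wraps a layer's generated C code in cycle measurements.

    It is a no-op unless `profileUntiled` is set, so builds that do not
    request untiled profiling are left untouched.
    """

    def apply(self, body: List[str], name: str, verbose: CodeGenVerbosity) -> List[str]:
        if not verbose.profileUntiled:
            return body
        # getCycles() is a placeholder for the platform's cycle counter.
        return [
            f"uint32_t {name}_start = getCycles();",
            *body,
            f"uint32_t {name}_end = getCycles();",
            f'printf("{name}: %u cycles\\n", {name}_end - {name}_start);',
        ]
```

Because the flag check lives inside the pass, the untiled test runner only needs to set the flag. Tiled builds and runs without profiling generate unchanged code.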
