
[codegen][gpu] Adding conv filter layout fhwc to preprocessing pipeline #19974

Merged

merged 2 commits into main from users/zyin/filter-fhwc-preprocessing on Feb 26, 2025

Conversation

jerryyin
Member

@jerryyin jerryyin commented Feb 12, 2025

This PR is a follow-up to #19739 and #19798. In the previous PRs, we allowed the convolution filter to be converted according to pipeline options and lowered through the igemm lowering pipeline. As @Max191's tuning on convolution demonstrated the performance benefit of this pass, we decided to plumb the pass in by default in this follow-up PR.

In this PR, we turn this pass on by default in the SDXL inference pipeline iree-preprocessing-transpose-convolution-pipeline. This allows the pass to run in convolution-related inference workloads. It will also pick the fhwc layout by default if no filter-layout option is explicitly set.

Credit to discussion with @Max191 and @nirvedhmeshram: we don't always want to turn this pass on, because the transpose on the filter can still be an overhead in training workloads. This pass is placed at the same location as ConvertConvToChannelsLast.
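For readers unfamiliar with the layout terminology, here is a minimal NumPy sketch (not IREE code; all shapes and names are illustrative) of what the filter-layout change means: the filter tensor is transposed from hwcf to fhwc, and the convolution's reduction still contracts over the same (kh, kw, c) dims, so the result is unchanged.

```python
import numpy as np

# A conv2d in nhwc_hwcf form contracts the filter over (kh, kw, c),
# producing an f-channel output. The preprocessing pass transposes the
# filter from hwcf (KH, KW, C, F) to fhwc (F, KH, KW, C) so the
# reduction dims are contiguous for each output channel.
kh, kw, c, f = 3, 3, 8, 16
rng = np.random.default_rng(0)
filter_hwcf = rng.random((kh, kw, c, f), dtype=np.float32)
filter_fhwc = np.transpose(filter_hwcf, (3, 0, 1, 2))  # hwcf -> fhwc

# One output pixel: contract an input patch (KH, KW, C) against the filter.
patch = rng.random((kh, kw, c), dtype=np.float32)
out_hwcf = np.einsum("hwc,hwcf->f", patch, filter_hwcf)
out_fhwc = np.einsum("hwc,fhwc->f", patch, filter_fhwc)
assert np.allclose(out_hwcf, out_fhwc)  # same math, different filter layout
```

The layouts are numerically equivalent; the point of the pass is purely about memory access patterns when the conv is lowered through igemm.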

@jerryyin jerryyin changed the title Adding conv filter layout fhwc to preprocessing pipeline [codegen][gpu] Adding conv filter layout fhwc to preprocessing pipeline Feb 12, 2025
@Max191
Contributor

Max191 commented Feb 12, 2025

Hmm looks like the regression tests for SDXL had a couple of runtime failures. That's a bit surprising to me. I'll rerun the jobs and see if it reproduces.

@nirvedhmeshram
Contributor

> Hmm looks like the regression tests for SDXL had a couple of runtime failures. That's a bit surprising to me. I'll rerun the jobs and see if it reproduces.

I think there is a numeric change:

@main

```
[FAILED] result[0]: element at index 0 (-0.290527) does not match the expected (-0.425781); ...
```

@jerryyin
Member Author

Let me rebase; the base branch is slightly outdated now.

@jerryyin jerryyin force-pushed the users/zyin/filter-fhwc-preprocessing branch from 0cc26c5 to 0be3371 Compare February 12, 2025 17:47
@jerryyin
Member Author

I've gotten some good suggestions offline from @Max191 and @nirvedhmeshram about how to triage those model failures; WIP on this.

A summary of the failures indicates:

  • The failing tests are only the mixed-precision ones (test_run_punet_int8_fp16 and test_run_punet_int8_fp8).
  • The more conventional test setups are okay (test_run_punet_fp16 passes), which is why we didn't discover this when running the model manually.
  • MI250 is passing, but that's irrelevant because those tests are marked with xfail and skipped.

jerryyin added a commit that referenced this pull request Feb 25, 2025
…d_to_intrinsics on convolution (#20073)

The `pad_to_intrinsics` pass only supports `linalg.conv2d` ops with the `nhwc_hwcf` convolution layout. This has made it inconvenient to take advantage of other convolution variants for their performance potential. One such scenario: the IR from `conv_filter_to_channels_last` will produce `conv2d_nhwc_fhwc` convolutions represented by `linalg.generic`.

This PR extends the `pad_to_intrinsics` pass to support other convolution variants, including:
 - those represented with `linalg.generic`
 - other layouts, such as the `fhwc` and `fchw` filter layouts

This PR will unblock #19974 and allow us to continue using `pad_to_intrinsics` while the igemm padding kernel catches up in performance.

---------

Signed-off-by: jerryyin <[email protected]>
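Not from the PR itself, but the padding idea behind `pad_to_intrinsics` can be sketched as a round-up-to-multiple rule on the igemm dims of the convolution. The intrinsic tile sizes (16x16x16) and conv shapes below are purely illustrative assumptions, not the actual hardware values:

```python
def pad_to_multiple(size: int, multiple: int) -> int:
    # Round size up to the next multiple of `multiple`, as
    # padding-to-intrinsics does for each matmul dim.
    return ((size + multiple - 1) // multiple) * multiple

# Illustrative only: an igemm view of a conv2d_nhwc_fhwc has
#   M = N * OH * OW,  N = F,  K = KH * KW * C.
# Suppose the matmul intrinsic needs 16x16x16 tiles:
m, n, k = 1 * 30 * 30, 20, 3 * 3 * 6  # M=900, N=20, K=54
padded = (pad_to_multiple(m, 16), pad_to_multiple(n, 16), pad_to_multiple(k, 16))
print(padded)  # (912, 32, 64)
```

Once every dim is a multiple of the intrinsic size, the padded convolution can be tiled cleanly onto the matrix-core intrinsics with no remainder handling.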
@jerryyin jerryyin force-pushed the users/zyin/filter-fhwc-preprocessing branch from 0be3371 to bcc6d69 Compare February 25, 2025 14:28
@jerryyin
Member Author

jerryyin commented Feb 25, 2025

The original mixed-precision tests are passing after the rebase. Now test_vae_rocm is failing; WIP on reproducing it locally. This is now fixed in the latest commit.

@jerryyin jerryyin force-pushed the users/zyin/filter-fhwc-preprocessing branch from 3ef18c3 to b281af1 Compare February 25, 2025 18:31
Contributor

@nirvedhmeshram nirvedhmeshram left a comment


LGTM!

@jerryyin jerryyin merged commit 1aff06d into main Feb 26, 2025
46 checks passed
@jerryyin jerryyin deleted the users/zyin/filter-fhwc-preprocessing branch February 26, 2025 14:17