
[codegen][gpu] Adding conv filter layout fhwc to preprocessing pipeline #19974

Merged

merged 2 commits into main from users/zyin/filter-fhwc-preprocessing on Feb 26, 2025

Conversation

jerryyin
Member

@jerryyin jerryyin commented Feb 12, 2025

This PR is a follow-up to #19739 and #19798. In the previous PRs, we allowed the convolution filter to be converted according to pipeline options and lowered through the igemm lowering pipeline. As @Max191's tuning on convolution demonstrated the performance benefit of this pass, we decided to plumb the pass in by default in this follow-up PR.

In this PR, we turn this pass on by default in the SDXL inference pipeline iree-preprocessing-transpose-convolution-pipeline. This allows the pass to run in convolution-related inference workloads. It will also pick the fhwc layout by default if no filter-layout option is explicitly set.

Credit to discussion with @Max191 and @nirvedhmeshram: we don't always want to turn this pass on, because the transpose on the filter can still be an overhead in training workloads. This pass is placed at the same location as ConvertConvToChannelsLast.
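For readers unfamiliar with the layout terminology, here is a minimal NumPy sketch (not IREE code; all shapes and names are illustrative) of what the filter-layout change means: the filter tensor is transposed from hwcf to fhwc, and the convolution's reduction still contracts over the same (kh, kw, c) dims, so the result is unchanged.

```python
import numpy as np

# A conv2d in nhwc_hwcf form contracts the filter over (kh, kw, c),
# producing an f-channel output. The preprocessing pass transposes the
# filter from hwcf (KH, KW, C, F) to fhwc (F, KH, KW, C) so the
# reduction dims are contiguous for each output channel.
kh, kw, c, f = 3, 3, 8, 16
rng = np.random.default_rng(0)
filter_hwcf = rng.random((kh, kw, c, f), dtype=np.float32)
filter_fhwc = np.transpose(filter_hwcf, (3, 0, 1, 2))  # hwcf -> fhwc

# One output pixel: contract an input patch (KH, KW, C) against the filter.
patch = rng.random((kh, kw, c), dtype=np.float32)
out_hwcf = np.einsum("hwc,hwcf->f", patch, filter_hwcf)
out_fhwc = np.einsum("hwc,fhwc->f", patch, filter_fhwc)
assert np.allclose(out_hwcf, out_fhwc)  # same math, different filter layout
```

The layouts are numerically equivalent; the point of the pass is purely about memory access patterns when the conv is lowered through igemm.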

@jerryyin jerryyin changed the title Adding conv filter layout fhwc to preprocessing pipeline [codegen][gpu] Adding conv filter layout fhwc to preprocessing pipeline Feb 12, 2025
@Max191
Contributor

Max191 commented Feb 12, 2025

Hmm looks like the regression tests for SDXL had a couple of runtime failures. That's a bit surprising to me. I'll rerun the jobs and see if it reproduces.

@nirvedhmeshram
Contributor

> Hmm looks like the regression tests for SDXL had a couple of runtime failures. That's a bit surprising to me. I'll rerun the jobs and see if it reproduces.

I think there is a numeric change:

@main

```
[FAILED] result[0]: element at index 0 (-0.290527) does not match the expected (-0.425781); ...
```

@jerryyin
Member Author

Let me rebase; the base branch is slightly outdated now.

@jerryyin jerryyin force-pushed the users/zyin/filter-fhwc-preprocessing branch from 0cc26c5 to 0be3371 Compare February 12, 2025 17:47
@jerryyin
Member Author

I've gotten some good suggestions offline from @Max191 and @nirvedhmeshram about how to triage those model failures; WIP on this.

A summary of the failures indicates:

  • The failing tests are only the mixed-precision ones (test_run_punet_int8_fp16 and test_run_punet_int8_fp8).
  • The more conventional test setups are okay (test_run_punet_fp16 passes), which is why we didn't discover this when running the model manually.
  • MI250 is passing, but that's irrelevant because those tests are marked with xfail and skipped.

jerryyin added a commit that referenced this pull request Feb 25, 2025
…d_to_intrinsics on convolution (#20073)

The `pad_to_intrinsics` pass only supports `linalg.conv2d` ops with the `nhwc_hwcf` convolution layout. This has made it inconvenient to take advantage of other convolution variants for their performance potential. One such scenario: the IR from `conv_filter_to_channels_last` will produce `conv2d_nhwc_fhwc` convolutions represented by `linalg.generic`.

This PR extends the `pad_to_intrinsics` pass to support other convolution variants, including:
 - those represented with `linalg.generic`
 - other layouts, such as the `fhwc` and `fchw` filter layouts

This PR will unblock #19974 and allow us to continue using `pad_to_intrinsics` while the igemm padding kernel catches up in performance.

---------

Signed-off-by: jerryyin <[email protected]>
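Not from the PR itself, but the padding idea behind `pad_to_intrinsics` can be sketched as a round-up-to-multiple rule on the igemm dims of the convolution. The intrinsic tile sizes (16x16x16) and conv shapes below are purely illustrative assumptions, not the actual hardware values:

```python
def pad_to_multiple(size: int, multiple: int) -> int:
    # Round size up to the next multiple of `multiple`, as
    # padding-to-intrinsics does for each matmul dim.
    return ((size + multiple - 1) // multiple) * multiple

# Illustrative only: an igemm view of a conv2d_nhwc_fhwc has
#   M = N * OH * OW,  N = F,  K = KH * KW * C.
# Suppose the matmul intrinsic needs 16x16x16 tiles:
m, n, k = 1 * 30 * 30, 20, 3 * 3 * 6  # M=900, N=20, K=54
padded = (pad_to_multiple(m, 16), pad_to_multiple(n, 16), pad_to_multiple(k, 16))
print(padded)  # (912, 32, 64)
```

Once every dim is a multiple of the intrinsic size, the padded convolution can be tiled cleanly onto the matrix-core intrinsics with no remainder handling.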
@jerryyin jerryyin force-pushed the users/zyin/filter-fhwc-preprocessing branch from 0be3371 to bcc6d69 Compare February 25, 2025 14:28
@jerryyin
Member Author

jerryyin commented Feb 25, 2025

The original mixed-precision tests are passing after the rebase. Now test_vae_rocm is failing; WIP on reproducing it locally. This is now fixed in the latest commit.

@jerryyin jerryyin force-pushed the users/zyin/filter-fhwc-preprocessing branch from 3ef18c3 to b281af1 Compare February 25, 2025 18:31
Contributor

@nirvedhmeshram nirvedhmeshram left a comment


LGTM!

@jerryyin jerryyin merged commit 1aff06d into main Feb 26, 2025
46 checks passed
@jerryyin jerryyin deleted the users/zyin/filter-fhwc-preprocessing branch February 26, 2025 14:17