[codegen][gpu] Adding support to generic op and flexible layout to pad_to_intrinsics on convolution #20073
Conversation
Signed-off-by: jerryyin <[email protected]>
LGTM!
Signed-off-by: jerryyin <[email protected]>
Interesting regression: https://github.com/iree-org/iree/actions/runs/13503269777/job/37727576624?pr=20073#step:9:209 I guess we could land this and #19974 together, but I am not able to come up with a hypothesis for why this by itself is so bad. Any thoughts? EDIT: Just thought of a hypothesis: if it's padding more convs now and that's causing a slowdown, I don't think merging with #19974 will help either.
Actually, I see this on main too, so it might not be anything in this PR.
Yes, I agree with you that the culprit must be that I've made this path too flexible, and now it can handle any type of convolution (whereas in the past it only dealt with the `linalg.conv2d` hwcf variant). I'll try to reproduce locally and see what's going on.
Oh wow, thanks for pointing that out. Let me take a second look at main's CI record too.
Per Discord discussion, the perf degradation is caused by MI300 switching to CPX mode and is unrelated to this PR. I'll leave this PR open until tomorrow before merging, in case there is other feedback.
@jerryyin Since this is an optional (and experimental) pass, it is okay the way it is, but one thing to consider is whether we should have these two cases where we don't do the padding.
On a second look, those scenarios are already blocked by the check below. Since I preserved this conditional, I don't have to do anything here. iree/compiler/src/iree/compiler/Preprocessing/Common/PadToIntrinsics.cpp Lines 192 to 198 in e87dd2e
I was imprecise when I mentioned that this PR would allow any type of convolution.
The `pad_to_intrinsics` pass only supports the `linalg.conv2d` op with the `nhwc_hwcf` convolution layout. This has created inconvenience around taking advantage of other convolution variants for their performance potential. One such scenario is the IR from `conv_filter_to_channels_last`, which will produce `conv2d_nhwc_fhwc` represented by `linalg.generic`.

This PR extends the `pad_to_intrinsics` pass to support other convolution variants, including:
- `linalg.generic` convolution ops
- the `fhwc` filter layout
- the `fchw` filter layout

This PR will unblock #19974 and allow us to continue using `pad_to_intrinsics` while igemm padding kernels catch up in performance.
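To illustrate the idea behind the pass, here is a minimal NumPy sketch of padding a convolution's channel dimensions up to a multiple of an intrinsic size so the reduction and output dims tile cleanly onto matrix intrinsics. The naive `conv2d_nhwc_hwcf` helper, the intrinsic size of 16, and all shapes are illustrative assumptions for this sketch, not IREE's actual implementation:

```python
import numpy as np

def conv2d_nhwc_hwcf(x, w):
    """Naive stride-1, no-padding conv; x: (N,H,W,C), w: (KH,KW,C,F)."""
    n, h, wd, c = x.shape
    kh, kw, _, f = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((n, oh, ow, f))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i:i + kh, j:j + kw, :]  # (N,KH,KW,C)
            # Contract KH, KW, C against the filter, leaving (N, F).
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

def pad_amount(dim, multiple):
    """How much to pad `dim` up to the next multiple of `multiple`."""
    return (multiple - dim % multiple) % multiple

INTRINSIC = 16  # assumed intrinsic tile size for illustration
rng = np.random.default_rng(0)
x = rng.random((1, 8, 8, 10))    # C = 10, not a multiple of 16
w = rng.random((3, 3, 10, 12))   # F = 12, not a multiple of 16

# Zero-pad the reduction dim C (in both operands) and the output dim F.
pc = pad_amount(x.shape[3], INTRINSIC)
pf = pad_amount(w.shape[3], INTRINSIC)
xp = np.pad(x, ((0, 0), (0, 0), (0, 0), (0, pc)))
wp = np.pad(w, ((0, 0), (0, 0), (0, pc), (0, pf)))

out = conv2d_nhwc_hwcf(x, w)
out_padded = conv2d_nhwc_hwcf(xp, wp)
# Zero channels contribute nothing to the reduction, so slicing off the
# padded output channels recovers the original result exactly.
assert np.allclose(out_padded[..., :w.shape[3]], out)

# The fhwc layout in this PR is just a permutation of the same filter:
w_fhwc = w.transpose(3, 0, 1, 2)  # (F,KH,KW,C)
```

Padding the reduction dimension with zeros is semantics-preserving because the extra products are all zero, and padding the output-channel dimension only adds extra outputs that a final slice discards.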