
Broadcast support for elementwise ops #148

Merged 6 commits into mlir on Jul 25, 2024
Conversation

slyalin (Owner) commented Jul 24, 2024

TODO:

  • Take dynamic dims as values on demand instead of pre-initializing all dynamic dims. (Will be covered in a follow-up PR; it should have no negative impact while we are lowering to linalg, which always requires all dynamic dimensions to construct the outputs, and dead code elimination in the MLIR pipeline should remove any unused ones.)
  • Restrict element types for the new binary eltwise convertor.
  • Fix incorrect shape collapsing in case of broadcast with unaligned ranks.
  • Reuse the dynamic-dimensions map in the MatMul conversion.

Now the entire graph (except tensor constants) from #146 (comment) is converted as a single MLIR function:

module @fragment_name {
  func.func @entry(%arg0: memref<?x?xf32>, %arg1: memref<1x1xf32>, %arg2: memref<128x1024xf32>, %arg3: memref<1x128xf32>, %arg4: memref<?x128xf32>) {
    %0 = bufferization.to_tensor %arg0 restrict : memref<?x?xf32>
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %0, %c0 : tensor<?x?xf32>
    %c1 = arith.constant 1 : index
    %dim_0 = tensor.dim %0, %c1 : tensor<?x?xf32>
    %1 = bufferization.to_tensor %arg1 restrict : memref<1x1xf32>
    %2 = bufferization.to_tensor %arg2 restrict : memref<128x1024xf32>
    %3 = bufferization.to_tensor %arg3 restrict : memref<1x128xf32>
    %4 = tensor.empty(%dim, %dim_0) : tensor<?x?xf32>
    %5 = linalg.mul ins(%0, %0 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%4 : tensor<?x?xf32>) -> tensor<?x?xf32>
    %collapsed = tensor.collapse_shape %1 [] : tensor<1x1xf32> into tensor<f32>
    %6 = tensor.empty(%dim, %dim_0) : tensor<?x?xf32>
    %broadcasted = linalg.broadcast ins(%collapsed : tensor<f32>) outs(%6 : tensor<?x?xf32>) dimensions = [0, 1] 
    %7 = tensor.empty(%dim, %dim_0) : tensor<?x?xf32>
    %8 = linalg.add ins(%5, %broadcasted : tensor<?x?xf32>, tensor<?x?xf32>) outs(%7 : tensor<?x?xf32>) -> tensor<?x?xf32>
    %9 = tensor.empty(%dim, %dim_0) : tensor<?x?xf32>
    %10 = linalg.sub ins(%0, %8 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%9 : tensor<?x?xf32>) -> tensor<?x?xf32>
    %11 = tensor.empty(%dim, %dim_0) : tensor<?x?xf32>
    %12 = linalg.add ins(%0, %0 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%11 : tensor<?x?xf32>) -> tensor<?x?xf32>
    %13 = tensor.empty(%dim, %dim_0) : tensor<?x?xf32>
    %14 = linalg.mul ins(%12, %10 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%13 : tensor<?x?xf32>) -> tensor<?x?xf32>
    %15 = tensor.empty(%dim, %dim_0) : tensor<?x?xf32>
    %16 = linalg.div ins(%14, %0 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%15 : tensor<?x?xf32>) -> tensor<?x?xf32>
    %17 = tensor.empty(%dim) : tensor<?x128xf32>
    %cst = arith.constant 0.000000e+00 : f32
    %18 = linalg.fill ins(%cst : f32) outs(%17 : tensor<?x128xf32>) -> tensor<?x128xf32>
    %19 = linalg.matmul_transpose_b ins(%16, %2 : tensor<?x?xf32>, tensor<128x1024xf32>) outs(%18 : tensor<?x128xf32>) -> tensor<?x128xf32>
    %collapsed_1 = tensor.collapse_shape %3 [[0, 1]] : tensor<1x128xf32> into tensor<128xf32>
    %20 = tensor.empty(%dim) : tensor<?x128xf32>
    %broadcasted_2 = linalg.broadcast ins(%collapsed_1 : tensor<128xf32>) outs(%20 : tensor<?x128xf32>) dimensions = [0] 
    %21 = tensor.empty(%dim) : tensor<?x128xf32>
    %22 = linalg.add ins(%19, %broadcasted_2 : tensor<?x128xf32>, tensor<?x128xf32>) outs(%21 : tensor<?x128xf32>) -> tensor<?x128xf32>
    bufferization.materialize_in_destination %22 in restrict writable %arg4 : (tensor<?x128xf32>, memref<?x128xf32>) -> ()
    return
  }
}
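The collapse/broadcast pairs in the listing above follow a simple plan: right-align the input shape against the output rank, collapse away every size-1 (or missing) dim, then let linalg.broadcast re-introduce exactly those dims. A minimal Python sketch of that planning step (the helper name and the NumPy-style rank alignment are my assumptions for illustration, not code from this PR):

```python
def broadcast_lowering(in_shape, out_rank):
    """Plan a collapse+broadcast lowering for NumPy-style broadcasting.

    Right-align in_shape against an output of rank out_rank, then:
      - keep dims where the input contributes its own size,
      - mark dims where the input is missing or has size 1 for broadcast.
    Returns (kept_shape, broadcast_dims): the shape left after collapsing
    all size-1/missing dims, and the `dimensions` list linalg.broadcast
    would fill back in.
    """
    pad = out_rank - len(in_shape)
    aligned = [1] * pad + list(in_shape)  # implicit leading 1s for lower rank
    kept, bcast = [], []
    for i, d in enumerate(aligned):
        if d == 1:
            bcast.append(i)  # dropped by collapse_shape, re-added by broadcast
        else:
            kept.append(d)
    return kept, bcast
```

This reproduces both pairs in the listing: a 1x1 operand collapses to a scalar and broadcasts over dimensions [0, 1], while a 1x128 operand collapses to 128 and broadcasts over dimension [0].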


auto squeezed = builder.create<tensor::CollapseShapeOp>(loc, inputs[i], squeeze_map);
slyalin (Owner, Author)

Discussed this way of "preparation for broadcast" with @adam-smnk.
I think this approach is inconvenient: the whole reason for using named ops is simplicity. If widely used broadcast semantics requires tricks like this, the dialect is not simple to use and is too low-level to serve as an entry-point dialect in the OV case.

@adam-smnk, @rengolin, please propose a better way of doing this.

Collaborator

Digging a bit deeper, this is the intended approach, in line with the original op design:
https://discourse.llvm.org/t/rfc-primitive-ops-add-broadcastop-to-linalg/66313

Size-1 expansion is deliberately forbidden.
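The forbidden expansion is why the collapse step is needed at all. A sketch of the rule described in that RFC (my own modeling, not the actual MLIR verifier): the input shape must equal the output shape with the broadcast `dimensions` deleted, so a size-1 dim can never be stretched in place.

```python
def valid_linalg_broadcast(in_shape, out_shape, dimensions):
    """Model linalg.broadcast's shape rule (a sketch, not the real verifier).

    The input must match the output with the broadcast `dimensions`
    removed; implicit size-1 expansion is rejected.
    """
    drop = set(dimensions)
    reduced = [d for i, d in enumerate(out_shape) if i not in drop]
    return list(in_shape) == reduced
```

Under this rule, broadcasting a 128-vector into ?x128 with dimensions = [0] is legal, but feeding a 1x128 tensor directly and hoping the leading 1 stretches is not, which is exactly what forces the preceding tensor.collapse_shape.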

Collaborator

You can use generics for now, but we're discussing adding broadcast and transpose to named ops:
https://discourse.llvm.org/t/rfc-transpose-attribute-for-linalg-matmul-operations/80092/
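For reference, a linalg.generic can express the same broadcast without any collapse by giving the broadcast operand an indexing map that pins its size-1 dims to constant 0 (e.g. affine_map<(i, j) -> (0, j)>). A pure-Python model of that indexing behavior, restricted to 2-D nested lists for brevity (illustrative only, not this PR's code):

```python
def generic_add_2d(a, b):
    """Model a linalg.generic elementwise add where b's indexing map sends
    each size-1 dim to constant 0, so b is read in place without any
    collapse_shape/broadcast ops.
    """
    rows, cols = len(a), len(a[0])
    return [
        [
            a[i][j] + b[i if len(b) > 1 else 0][j if len(b[0]) > 1 else 0]
            for j in range(cols)
        ]
        for i in range(rows)
    ]
```

A 1x1 or 1xN operand is consumed directly, which is the simplicity argument for generics in this thread, at the cost of writing the indexing maps by hand in C++.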

slyalin (Owner, Author)

I don't have experience using linalg generics in C++, so I cannot quickly produce this code. I will merge it as-is with collapse/broadcast.

@slyalin slyalin marked this pull request as ready for review July 25, 2024 15:53
@slyalin slyalin merged commit f0f76b9 into mlir Jul 25, 2024
14 of 30 checks passed
3 participants