batch_normalization: Introduce vectorization optimization in the batch norm elementwise kernel. #933

xytintel · 2024-09-25T14:37:24Z

Because the group-stride-loop implementation performs poorly with low-precision data types on PVC (Jira: PYTORCHDGQ-5162), this PR applies only a partial vectorization optimization.

fengyuan14 · 2024-10-15T11:36:17Z

src/ATen/native/xpu/sycl/BatchNormKernels.cpp

+          for (int vt = 0; vt < VEC_SIZE; ++vt) {
+            index_t feature = feature_vec_begin + vt;
+            vec[vt] = static_cast<input_scalar_t>(
+                gamma * (i[feature] - mean) * invstd + beta);


Why not use vectorization for the inputs as well? I see that vt is contiguous across the iteration.

In the CNL case, consecutive work-items cannot access consecutive data, because the OffsetCalculator is used to compute the load stride.

If we know the vt size, can we calculate the offsets from linear ids 0, 3, 7, 11, assuming the vt size is 4?
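The trade-off discussed in this thread (scalar loads for inputs that may be strided, one contiguous write for the outputs) can be sketched in plain host-side C++. This is only an illustration, not the SYCL kernel from the PR: `StridedOffset` is a hypothetical stand-in for the OffsetCalculator, `transform_vec` and `transform_all` are made-up names, and `VEC_SIZE = 4` is an assumed width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr int VEC_SIZE = 4;  // assumed vector width, for illustration only

// Hypothetical stand-in for an offset calculator: maps a linear
// element index to a (possibly non-contiguous) input offset.
struct StridedOffset {
  std::size_t stride;
  std::size_t operator()(std::size_t linear) const { return linear * stride; }
};

// Normalizes VEC_SIZE consecutive outputs starting at `begin`.
// Inputs are read one scalar at a time because their offsets may be
// strided; the results are staged in `vec` and written with a single
// contiguous block copy, mimicking a vectorized store.
void transform_vec(const float* in, float* out, std::size_t begin,
                   StridedOffset in_offset, float mean, float invstd,
                   float gamma, float beta) {
  assert(in != nullptr && out != nullptr);
  std::array<float, VEC_SIZE> vec;
  for (int vt = 0; vt < VEC_SIZE; ++vt) {
    const std::size_t feature = begin + static_cast<std::size_t>(vt);
    vec[vt] = gamma * (in[in_offset(feature)] - mean) * invstd + beta;
  }
  std::memcpy(out + begin, vec.data(), sizeof(float) * VEC_SIZE);
}

// Applies transform_vec to every full VEC_SIZE chunk of `n` outputs.
std::vector<float> transform_all(const std::vector<float>& in, std::size_t n,
                                 StridedOffset in_offset, float mean,
                                 float invstd, float gamma, float beta) {
  assert(n % VEC_SIZE == 0);                       // no tail handling here
  assert(n == 0 || in_offset(n - 1) < in.size());  // last read is in bounds
  std::vector<float> out(n);
  for (std::size_t begin = 0; begin < n; begin += VEC_SIZE) {
    transform_vec(in.data(), out.data(), begin, in_offset, mean, invstd,
                  gamma, beta);
  }
  return out;
}
```

With a stride of 1 the scalar loads would also be contiguous and could in principle be replaced by a vector load; with any other stride only the write side stays contiguous, which is the situation described in the reply above.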

src/ATen/native/xpu/sycl/BatchNormKernels.cpp

using vectorized-write for batch_norm_transform_inputs kernel

de7880f

xytintel added the kernel_optimization label Sep 25, 2024

xytintel requested review from EikanWang and fengyuan14 September 25, 2024 14:45

Merge branch 'main' into xyt/batch_norm_vec_opt

e50f408

fengyuan14 changed the title ~~Perform vectorization optimization on the batch normalization forward pass~~ batch_normalization: Introduce vectorization optimization in the forward pass Sep 26, 2024

fengyuan14 changed the title ~~batch_normalization: Introduce vectorization optimization in the forward pass~~ batch_normalization: Introduce vectorization optimization in the batch norm elementwise kernel. Sep 26, 2024

fengyuan14 mentioned this pull request Oct 15, 2024

Performance: Improve BatchNormalization forward/backward to align with oneDNN implementation. #937

Open

fengyuan14 reviewed Oct 15, 2024

xytintel requested a review from fengyuan14 October 15, 2024 14:10

Merge branch 'main' into xyt/batch_norm_vec_opt

330c9d6
