Refactor matmul test suite. #22
Conversation
Requested reviews from a few people that might have suggestions. Sorry, the code is kind of gross before and after... such is the nature of test suites x_x
Thanks for looking after these tests!
# Shapes involving vectors (i.e. most rectangular cases).
TestShape(m=1, k=1000, n=1000, accumulate=True),   # large vector*matrix
TestShape(m=1000, k=1000, n=1, accumulate=True),   # large matrix*vector
TestShape(m=1000, k=1000, n=1, accumulate=False),  # large matrix*vector
I could see these matvec / vecmat test cases getting their own "shapes" group.
Other ideas:
- small
- large
- vectors
- square
- aligned (all dimensions multiples of 4/8)
- unaligned (odd number dimensions)
- model_unet
- model_resnet
- model_llama
What other groupings would make sense to test and label?
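To make the idea concrete, here is a rough sketch of how extra named groups could be selected per backend via the existing `_SIZES` list; the `vectors` and `unaligned` names are proposals from the list above, not options the generator supports today.

```cmake
# Hypothetical sketch: per-backend shape-group selection. Only groups like
# "small", "large", and the GPU-aligned variant exist today; "vectors" and
# "unaligned" are the proposed additions discussed above.
set(_SIZES)
list(APPEND _SIZES "small")      # low-latency smoke tests, always run
list(APPEND _SIZES "large")      # heavier regression coverage
list(APPEND _SIZES "vectors")    # m=1 / n=1 matvec and vecmat cases
list(APPEND _SIZES "unaligned")  # odd dimensions to exercise corner cases
```

Each new group would also need matching generated files so the `generated/${_DTYPE}/matmul_${_DTYPE}_${_SIZE}.mlir` paths resolve.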
Let's think about why exactly we sometimes want to split test cases into separate "shapes" groups.
I say it's:
- To get low-latency small-shapes tests that we can always run, regardless of high latencies and other problems with bigger shapes.
- To get shapes that are deemed "suitable" for a particular target. Originally I had created only "small" and "large", both containing odd (what people call "unaligned") shapes to exercise the most corner cases. Then early GPU codegen folks came in and added GPU tests but said "hold on, we can't actually support odd shapes on GPU". That was the reason why "gpu_aligned" was introduced. I don't suppose that's current anymore?
I don't know a reason to split vectors into their own separate group.
I was leaning more towards a single file per test case, like how https://github.com/nod-ai/rocm-gemm-benchmark/tree/main/kernels/mlir is set up. That has "problem sizes" coming from shapes found in production models: https://github.com/nod-ai/rocm-gemm-benchmark/blob/main/gemmbench/problems.py . The focus there is benchmarking though, not correctness testing.
@benvanik pointed out here on Discord that grouping was useful for test suites. A few of his points:
- that's painful and wasteful when there's a logical grouping
- the cost of launching the processes, creating thread pools/gpu devices, etc to run a single 4x4 matmul is astronomical
- small (smoketests that should run frequently) and large (bigger regression tests that run less frequently) and a division by data type is how this stuff naturally falls out
- most exclusions per target will be cut on data type - cpu doesn't run bf16, etc
- if anything, I'd want a test runner to batch more - especially with how unreliable the AMD devices are
set(_DTYPES)
# list(APPEND _DTYPES "i8_into_i32")  # Currently failing.
list(APPEND _DTYPES "f32_into_f32")
# list(APPEND _DTYPES "f16_into_f16")  # Failing to compile.
# list(APPEND _DTYPES "f16_into_f32")  # Failing to compile.
# list(APPEND _DTYPES "bf16_into_bf16")  # Failing to compile.
# list(APPEND _DTYPES "bf16_into_f32")  # Failing to compile.
# list(APPEND _DTYPES "f8E4M3FNUZ_into_f32")  # Unsupported data type.
foreach(_DTYPE IN LISTS _DTYPES)
  foreach(_SIZE IN LISTS _SIZES)
    iree_test_suites_runner_test(
      NAME
        matmul_vulkan_${_DTYPE}_${_SIZE}
      TESTS_SRC
        "generated/${_DTYPE}/matmul_${_DTYPE}_${_SIZE}.mlir"
Could pass list of xpass, xfail, skip here... maybe. Would then want to move compilation from build time to test time.
Vulkan should support f16 and i8, but I think that is target/extension dependent (see the `--iree-vulkan-target=valhall` flag in the iree-org/iree Vulkan tests and the `vulkan_uses_vk_khr_shader_float16_int8` label).
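Sketching the "pass a list of xfails" idea with plain CTest, assuming the registered test names follow the `matmul_vulkan_${_DTYPE}_${_SIZE}` pattern above (the real names may carry a prefix): the known-bad dtypes could stay enabled and be marked with the standard `WILL_FAIL` property instead of being commented out. This only covers runtime failures; expected compile failures would still need compilation moved from build time to test time, as noted.

```cmake
# Sketch only: _XFAIL_DTYPES and the exact test names are assumptions.
set(_XFAIL_DTYPES "f16_into_f16" "f16_into_f32" "bf16_into_bf16" "bf16_into_f32")
foreach(_DTYPE IN LISTS _DTYPES)
  foreach(_SIZE IN LISTS _SIZES)
    if(_DTYPE IN_LIST _XFAIL_DTYPES)
      # WILL_FAIL inverts pass/fail, so an unexpected pass is flagged in CI
      # and the entry can then be dropped from the xfail list.
      set_tests_properties(
        "matmul_vulkan_${_DTYPE}_${_SIZE}"
        PROPERTIES WILL_FAIL TRUE)
    endif()
  endforeach()
endforeach()
```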
list(APPEND _DTYPES "f16_into_f32") | ||
list(APPEND _DTYPES "bf16_into_bf16") | ||
list(APPEND _DTYPES "bf16_into_f32") | ||
# list(APPEND _DTYPES "f8E4M3FNUZ_into_f32") # Failing to compile. |
The iree-org/iree tests ran on gfx94x/CDNA3 with the `LLVMGPUVectorDistributeMFMA` compilation info. Need to bake some target info into the test xpass/xfails.
Ping @erman-gurses / others that might have opinions?
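One way that could look, assuming the HIP/ROCm CMakeLists knows which chip it targets (the `_ROCM_CHIP` cache variable below is made up for illustration, not an existing option):

```cmake
# Hypothetical: gate MFMA-dependent dtypes on the target chip, since the
# LLVMGPUVectorDistributeMFMA path discussed above applies to gfx94x/CDNA3.
set(_ROCM_CHIP "gfx942" CACHE STRING "Target chip for the HIP matmul tests")
if(_ROCM_CHIP MATCHES "^gfx94")
  list(APPEND _DTYPES "f8E4M3FNUZ_into_f32")
endif()
```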
lgtm!
I've discussed this on and off with @erman-gurses. We'd like to merge and then iterate. Probably worth diffing the changes in this repo with iree-org/iree#18725 to make sure we pick up all the improvements.
Whoops, I should have known better than to merge without syncing and re-running the tests ;P https://github.com/iree-org/iree-test-suites/actions/runs/11353574142/job/31578892613#step:8:445 This repo is low traffic and this PR touched a good chunk of large files, so I'll fix-forward instead of revert.
This fixes the build error reported here: #22 (comment)
Progress on #2. See also the long Discord thread here.

Summaries of changes

- Further decoupled test suites from the core CMake project
  - Renamed `iree_native_test.cmake` to `iree_test_suites_native_test.cmake`
  - Renamed `iree_e2e_generated_runner_test.cmake` to `iree_test_suites_runner_test.cmake`
  - This lets us build with `-DIREE_BUILD_TESTS=OFF` and avoid pulling in IREE's other tests
  - Added a `linalg_ops/matmul/CMakeLists.txt` that runs tests on each backend using default flags
- Simplified the test generator
  - Ran the `generate_e2e_matmul_tests.py` script offline and checked in the generated files
  - The generated files are materialized at `git checkout` time, but I could see a case for more tightly coupling the generator with the test runner

What is left to do?

- Decide how to handle expected failures: encode them in the `linalg_ops/matmul/CMakeLists.txt` file or move to a different test runner somehow. I mainly want to support XFAIL in some way for both compiling and running.
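For the compile half of that, one direction (a sketch under assumptions, not a decided plan) is to register the `iree-compile` step as its own CTest test so a known compile failure can carry `WILL_FAIL` independently of the runner test; the flags and test name below are illustrative.

```cmake
# Sketch: compile at test time so compile failures can be XFAILed per test.
find_program(_IREE_COMPILE iree-compile)
add_test(
  NAME matmul_vulkan_bf16_into_bf16_small_compile
  COMMAND ${_IREE_COMPILE}
          --iree-hal-target-backends=vulkan-spirv
          ${CMAKE_CURRENT_SOURCE_DIR}/generated/bf16_into_bf16/matmul_bf16_into_bf16_small.mlir
          -o matmul_bf16_into_bf16_small.vmfb)
# bf16 is commented out above as "Failing to compile", so expect failure here.
set_tests_properties(matmul_vulkan_bf16_into_bf16_small_compile
                     PROPERTIES WILL_FAIL TRUE)
```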