Refactor matmul test suite. #22
Conversation
Requested reviews from a few people that might have suggestions. Sorry, the code is kind of gross before and after... such is the nature of test suites x_x
Thanks for looking after these tests!
# Shapes involving vectors (i.e. most rectangular cases).
TestShape(m=1, k=1000, n=1000, accumulate=True),   # large vector*matrix
TestShape(m=1000, k=1000, n=1, accumulate=True),   # large matrix*vector
TestShape(m=1000, k=1000, n=1, accumulate=False),  # large matrix*vector
I could see these matvec / vecmat test cases getting their own "shapes" group.
Other ideas:
- small
- large
- vectors
- square
- aligned (all dimensions multiples of 4/8)
- unaligned (odd number dimensions)
- model_unet
- model_resnet
- model_llama
What other groupings would make sense to test and label?
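To make the idea concrete, here is a rough sketch of how extra named groups could be selected per backend via the existing `_SIZES` list; the `vectors` and `unaligned` names are proposals from the list above, not options the generator supports today.

```cmake
# Hypothetical sketch: per-backend shape-group selection. Only groups like
# "small", "large", and the GPU-aligned variant exist today; "vectors" and
# "unaligned" are the proposed additions discussed above.
set(_SIZES)
list(APPEND _SIZES "small")      # low-latency smoke tests, always run
list(APPEND _SIZES "large")      # heavier regression coverage
list(APPEND _SIZES "vectors")    # m=1 / n=1 matvec and vecmat cases
list(APPEND _SIZES "unaligned")  # odd dimensions to exercise corner cases
```

Each new group would also need matching generated files so the `generated/${_DTYPE}/matmul_${_DTYPE}_${_SIZE}.mlir` paths resolve.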
Let's think about why exactly we sometimes want to split test cases into separate "shapes" groups.
I say it's:
- To get low-latency small-shapes tests that we can always run, regardless of high latencies and other problems with bigger shapes.
- To get shapes that are deemed "suitable" for a particular target. Originally I had created only "small" and "large", both containing odd (what people call "unaligned") shapes to exercise the most corner cases. Then early GPU codegen folks came in and added GPU tests but said "hold on, we can't actually support odd shapes on GPU". That was the reason why "gpu_aligned" was introduced. I don't suppose that's current anymore?
I don't know a reason to split vectors into their own separate group.
I was leaning more towards a single file per test case, like how https://github.com/nod-ai/rocm-gemm-benchmark/tree/main/kernels/mlir is set up. That has "problem sizes" coming from shapes found in production models: https://github.com/nod-ai/rocm-gemm-benchmark/blob/main/gemmbench/problems.py . The focus there is benchmarking though, not correctness testing.
@benvanik pointed out here on Discord that grouping was useful for test suites. A few of his points:
- that's painful and wasteful when there's a logical grouping
- the cost of launching the processes, creating thread pools/gpu devices, etc to run a single 4x4 matmul is astronomical
- small (smoketests that should run frequently) and large (bigger regression tests that run less frequently) and a division by data type is how this stuff naturally falls out
- most exclusions per target will be cut on data type - cpu doesn't run bf16, etc
- if anything, I'd want a test runner to batch more - especially with how unreliable the AMD devices are
set(_DTYPES)
# list(APPEND _DTYPES "i8_into_i32")  # Currently failing.
list(APPEND _DTYPES "f32_into_f32")
# list(APPEND _DTYPES "f16_into_f16")  # Failing to compile.
# list(APPEND _DTYPES "f16_into_f32")  # Failing to compile.
# list(APPEND _DTYPES "bf16_into_bf16")  # Failing to compile.
# list(APPEND _DTYPES "bf16_into_f32")  # Failing to compile.
# list(APPEND _DTYPES "f8E4M3FNUZ_into_f32")  # Unsupported data type.
foreach(_DTYPE IN LISTS _DTYPES)
  foreach(_SIZE IN LISTS _SIZES)
    iree_test_suites_runner_test(
      NAME
        matmul_vulkan_${_DTYPE}_${_SIZE}
      TESTS_SRC
        "generated/${_DTYPE}/matmul_${_DTYPE}_${_SIZE}.mlir"
Could pass list of xpass, xfail, skip here... maybe. Would then want to move compilation from build time to test time.
Vulkan should support f16 and i8, but I think that is target/extension dependent (see the `--iree-vulkan-target=valhall` flag in the iree-org/iree Vulkan tests and the `vulkan_uses_vk_khr_shader_float16_int8` label).
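Sketching the "pass a list of xfails" idea with plain CTest, assuming the registered test names follow the `matmul_vulkan_${_DTYPE}_${_SIZE}` pattern above (the real names may carry a prefix): the known-bad dtypes could stay enabled and be marked with the standard `WILL_FAIL` property instead of being commented out. This only covers runtime failures; expected compile failures would still need compilation moved from build time to test time, as noted.

```cmake
# Sketch only: _XFAIL_DTYPES and the exact test names are assumptions.
set(_XFAIL_DTYPES "f16_into_f16" "f16_into_f32" "bf16_into_bf16" "bf16_into_f32")
foreach(_DTYPE IN LISTS _DTYPES)
  foreach(_SIZE IN LISTS _SIZES)
    if(_DTYPE IN_LIST _XFAIL_DTYPES)
      # WILL_FAIL inverts pass/fail, so an unexpected pass is flagged in CI
      # and the entry can then be dropped from the xfail list.
      set_tests_properties(
        "matmul_vulkan_${_DTYPE}_${_SIZE}"
        PROPERTIES WILL_FAIL TRUE)
    endif()
  endforeach()
endforeach()
```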
list(APPEND _DTYPES "f16_into_f32") | ||
list(APPEND _DTYPES "bf16_into_bf16") | ||
list(APPEND _DTYPES "bf16_into_f32") | ||
# list(APPEND _DTYPES "f8E4M3FNUZ_into_f32") # Failing to compile. |
The iree-org/iree tests ran on gfx94x/CDNA3 with the `LLVMGPUVectorDistributeMFMA` compilation info. Need to bake some target info into the test xpass/xfails.
Ping @erman-gurses / others that might have opinions?
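One way that could look, assuming the HIP/ROCm CMakeLists knows which chip it targets (the `_ROCM_CHIP` cache variable below is made up for illustration, not an existing option):

```cmake
# Hypothetical: gate MFMA-dependent dtypes on the target chip, since the
# LLVMGPUVectorDistributeMFMA path discussed above applies to gfx94x/CDNA3.
set(_ROCM_CHIP "gfx942" CACHE STRING "Target chip for the HIP matmul tests")
if(_ROCM_CHIP MATCHES "^gfx94")
  list(APPEND _DTYPES "f8E4M3FNUZ_into_f32")
endif()
```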
lgtm!
I've discussed this on and off with @erman-gurses. We'd like to merge and then iterate. Probably worth diffing the changes in this repo with iree-org/iree#18725 to make sure we pick up all the improvements.
Whoops, I should have known better than to merge without syncing and re-running the tests ;P https://github.com/iree-org/iree-test-suites/actions/runs/11353574142/job/31578892613#step:8:445 This repo is low traffic and this PR touched a good chunk of large files, so I'll fix-forward instead of revert.
This fixes the build error reported here: #22 (comment)
Progress on #2. See also the long Discord thread here.

Summaries of changes

- Further decoupled test suites from the core CMake project
  - Renamed `iree_native_test.cmake` to `iree_test_suites_native_test.cmake`
  - Renamed `iree_e2e_generated_runner_test.cmake` to `iree_test_suites_runner_test.cmake`
  - This lets us build with `-DIREE_BUILD_TESTS=OFF` and avoid pulling in IREE's other tests
  - Added a `linalg_ops/matmul/CMakeLists.txt` that runs tests on each backend using default flags
- Simplified the test generator
  - Ran the `generate_e2e_matmul_tests.py` script offline and checked in the generated files
  - The generated files are materialized at `git checkout` time, but I could see a case for more tightly coupling the generator with the test runner

What is left to do?

- Decide how to handle expected failures: encode them in the `linalg_ops/matmul/CMakeLists.txt` file or move to a different test runner somehow. I mainly want to support XFAIL in some way for both compiling and running.
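For the compile half of that, one direction (a sketch under assumptions, not a decided plan) is to register the `iree-compile` step as its own CTest test so a known compile failure can carry `WILL_FAIL` independently of the runner test; the flags and test name below are illustrative.

```cmake
# Sketch: compile at test time so compile failures can be XFAILed per test.
find_program(_IREE_COMPILE iree-compile)
add_test(
  NAME matmul_vulkan_bf16_into_bf16_small_compile
  COMMAND ${_IREE_COMPILE}
          --iree-hal-target-backends=vulkan-spirv
          ${CMAKE_CURRENT_SOURCE_DIR}/generated/bf16_into_bf16/matmul_bf16_into_bf16_small.mlir
          -o matmul_bf16_into_bf16_small.vmfb)
# bf16 is commented out above as "Failing to compile", so expect failure here.
set_tests_properties(matmul_vulkan_bf16_into_bf16_small_compile
                     PROPERTIES WILL_FAIL TRUE)
```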