Flaky hip command_buffer_dispatch tests returning all 0s #20061

Open
ScottTodd opened this issue Feb 21, 2025 · 0 comments
Labels
bug 🐞 Something isn't working hal/hip Runtime HIP HAL backend

What happened?

Some tests have started failing on and off, with logs like these:

5/186 Test   #60: iree/hal/drivers/hip/cts/hip_stream_command_buffer_dispatch_constants_test .......................***Failed    1.12 sec
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from CommandBufferDispatchConstantsTest
[ RUN      ] CommandBufferDispatchConstantsTest.DispatchWithDispatchConstants
iree/runtime/src/iree/hal/cts/command_buffer_dispatch_constants_test.h:124: Failure
Value of: output_data
Expected: equals { 11, 22, 33, 44 }
  Actual: { 0, 0, 0, 0 }, which has these unexpected elements: 0, 0, 0, 0,
and doesn't have these expected elements: 11, 22, 33, 44

[  FAILED  ] CommandBufferDispatchConstantsTest.DispatchWithDispatchConstants (10 ms)

These are all the tests that failed, each in a similar way:

The following tests FAILED:
Errors while running CTest
	 60 - iree/hal/drivers/hip/cts/hip_stream_command_buffer_dispatch_constants_test (Failed)
	 69 - iree/hal/drivers/hip/cts/hip_graph_command_buffer_dispatch_constants_test (Failed)
	 78 - iree/hal/drivers/hip/cts/hip_multi_queue_stream_command_buffer_dispatch_constants_test (Failed)
	 87 - iree/hal/drivers/hip/cts/hip_multi_queue_graph_command_buffer_dispatch_constants_test (Failed)
	 96 - iree/hal/drivers/hip/cts/hip_multi_queue_stream_queue_1_command_buffer_dispatch_constants_test (Failed)
	105 - iree/hal/drivers/hip/cts/hip_multi_queue_graph_queue_1_command_buffer_dispatch_constants_test (Failed)

Steps to reproduce your issue

  1. Build with ROCm/HIP enabled
  2. Run ctest
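
Since the failures are intermittent, a single ctest run may pass. Below is a minimal sketch of a helper that reruns only the affected tests until one fails. It assumes CMake/CTest 3.17 or newer (for `--repeat until-fail:<n>`) and an already-configured build directory. The function names, the default of 50 repetitions, and the test-name regex are illustrative choices, not part of the IREE tooling.

```python
import subprocess

# Hypothetical regex that matches the six failing test names listed above.
DISPATCH_CONSTANTS_PATTERN = "hip/cts/.*command_buffer_dispatch_constants_test"


def flaky_ctest_command(pattern: str, repeats: int) -> list[str]:
    """Build a ctest invocation that reruns matching tests until one fails."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return [
        "ctest",
        "-R", pattern,                        # only run tests matching the regex
        "--repeat", f"until-fail:{repeats}",  # rerun each test up to N times
        "--output-on-failure",                # print the log of a failing run
    ]


def run_until_fail(build_dir: str, repeats: int = 50) -> int:
    """Run the command inside build_dir and return ctest's exit code."""
    cmd = flaky_ctest_command(DISPATCH_CONSTANTS_PATTERN, repeats)
    return subprocess.run(cmd, cwd=build_dir).returncode
```

Calling `run_until_fail("path/to/build")` (the path is a placeholder) returns a nonzero exit code as soon as any repetition fails, which is the case of interest here.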

What component(s) does this issue relate to?

Runtime

Version information

Observed on fb3523b and 308d176.

There were recent changes to https://github.com/iree-org/iree/commits/main/runtime/src/iree/hal/drivers/hip that look relevant:

Additional context

So far I have only seen this on the MI300 CI bots, in the test_mi300 job. Those bots have other ongoing infrastructure issues (#19955) that make these flakes harder to spot.

@ScottTodd ScottTodd added bug 🐞 Something isn't working hal/hip Runtime HIP HAL backend labels Feb 21, 2025