Replies: 2 comments 4 replies
-
You might be able to work something out using more than one `Adapter`, copying the result back to the CPU and sending it to the other adapter to render. Not sure, though; I've never tried more than one adapter. But in general what you want is "multi queue" or "async compute", which wgpu does not support, and which is unlikely to be supported for a long while given the complexity involved.
-
My understanding is that, even if submissions could overlap, it's up to you to divide your work into small enough chunks that no chunk takes too long (while still keeping each chunk large enough to fully occupy the SIMD-parallel hardware). Unfortunately, I don't know what type of subdivision is necessary (I haven't studied this problem in any detail beyond entire frames taking too long), but you might need just separate …
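As a sketch of the kind of subdivision I mean (plain Rust, no wgpu; the 256-pixel tile size is an arbitrary example, not a recommendation):

```rust
/// Axis-aligned tile of the screen, in pixels.
#[derive(Debug, PartialEq)]
struct Tile {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
}

/// Split a `width` x `height` image into tiles of at most `tile` x `tile`
/// pixels, so each tile can be recorded and submitted as its own short
/// piece of GPU work instead of one enormous dispatch.
fn tiles(width: u32, height: u32, tile: u32) -> Vec<Tile> {
    let mut out = Vec::new();
    let mut y = 0;
    while y < height {
        let h = tile.min(height - y);
        let mut x = 0;
        while x < width {
            let w = tile.min(width - x);
            out.push(Tile { x, y, w, h });
            x += tile;
        }
        y += tile;
    }
    out
}

fn main() {
    // A 1920x1080 screen in 256x256 tiles: 8 columns x 5 rows = 40 chunks.
    let chunks = tiles(1920, 1080, 256);
    println!("{} chunks", chunks.len());
}
```

Shrinking the tile size shortens each chunk, but too-small tiles stop filling the GPU, so the right size is something you'd have to measure.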
-
I'm working on an interactive Mandelbrot fractal explorer / visualizer. I use `eframe` to render GUI frames using its wgpu backend. Computing the fractal for each pixel is very compute-intensive. I'm interested in offloading that computation to the GPU.
The computation needs to happen "in the background," i.e., not block the main UI loop. When the fractal is zoomed in very deeply, these computations can take multiple seconds to complete.
Having read up on the documentation for `Queue`, some things are not clear to me about how submitted work is executed:

1. Is each distinct `submit()` call to a `Queue` executed serially on the GPU? In other words, does one long submission block subsequent submissions from executing?
Let's say my GUI thread submits work to render a frame, and then my compute thread submits work to compute the fractal. If the compute submission takes a long time, does that necessarily delay the next GUI frame? Or can the work for the next GUI frame execute and finish while the compute submission is still going?
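To make the threading pattern I have in mind concrete, here is a sketch with plain std threads and a channel (no wgpu; `ChunkResult` and the loop shape are illustrative, not my actual code):

```rust
use std::sync::mpsc;
use std::thread;

/// Result of computing one chunk of the fractal (stand-in payload).
struct ChunkResult {
    chunk_id: usize,
    iterations: Vec<u32>,
}

/// Spawn a worker that computes `chunks` chunks in the background, and
/// drain its results from a "UI loop" without ever blocking on the worker.
fn run_background_compute(chunks: usize) -> usize {
    let (tx, rx) = mpsc::channel::<ChunkResult>();

    let worker = thread::spawn(move || {
        for chunk_id in 0..chunks {
            // Stand-in for the expensive per-chunk fractal computation.
            let iterations = vec![0u32; 16];
            tx.send(ChunkResult { chunk_id, iterations }).unwrap();
        }
    });

    // "UI loop": each pass drains whatever is ready and moves on, the way
    // one eframe `update()` call would before requesting a repaint.
    let mut received = 0;
    while received < chunks {
        while let Ok(result) = rx.try_recv() {
            received += 1;
            // ...the real app would upload `result.iterations` to a texture here...
            let _ = (result.chunk_id, result.iterations.len());
        }
        thread::yield_now();
    }

    worker.join().unwrap();
    received
}

fn main() {
    println!("received {} chunk results", run_background_compute(4));
}
```

The open question above is whether the GPU side behaves the same way, i.e. whether the worker's long submissions stall the frames the UI thread submits.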
2. Is each distinct `CommandBuffer` in a `submit([buffers])` call executed serially on the GPU? In other words, can I expect the GPU to parallelize work if I submit that work in multiple command buffers? And if I submit the work in fewer command buffers, should I expect less parallelism?

Reason I ask is that I am breaking the screen into chunks and computing the fractal data separately per chunk. I want to know that the GPU will process as many chunks in parallel as it can, even if they're split into separate `ComputePass`es.

I see some discussion of this in #5074 (comment), but it seems the original poster was left hanging without an answer.
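For reference, this is how I size each chunk's dispatch, assuming `@workgroup_size(8, 8)` in the WGSL shader (the helper name is mine, not from any API):

```rust
/// Workgroups needed along one axis: ceil(size / workgroup_size).
fn workgroup_count(size: u32, workgroup_size: u32) -> u32 {
    (size + workgroup_size - 1) / workgroup_size
}

fn main() {
    // A 300x200-pixel chunk with @workgroup_size(8, 8) in the shader:
    let x = workgroup_count(300, 8);
    let y = workgroup_count(200, 8);
    // These counts would feed wgpu's ComputePass::dispatch_workgroups(x, y, 1),
    // one dispatch per chunk, each recorded in its own pass.
    println!("dispatch_workgroups({x}, {y}, 1)");
}
```

Each chunk gets one such dispatch in its own `ComputePass`; my question is whether those passes can actually overlap on the GPU.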