Deadlock in Device #5279
Comments
I'm sporadically experiencing a similar deadlock in my app, and I think it might be the same issue. In my case one thread is calling device.create_buffer_init and the other is calling queue.write_texture. For me it started happening after the update to wgpu 0.19; at least, that's when I first experienced it.
I'm also seeing this very sporadically (twice in the past ~2 weeks), and in my case it seems to only involve Presentation thread (main):
Background/simulation thread:
Another data point: This happens significantly more frequently when resizing the window while the background thread is running. This is on Windows, where resize happens in a modal loop, so this issue may interact with rust-windowing/winit#3272. When the background thread is paused, deadlocks do not seem to occur.
One more data point: This happens frequently on macOS with MoltenVK (without any resizing etc. going on). Manually protecting all calls to
I'm hitting this bug basically instantly in one of my applications. Some instrumentation of the snatch lock has revealed this (on current trunk, but line numbers may be off by one because I added some single-line comments):
Generally, a thread must only ever hold a single read lock at a time; otherwise you're immediately open to deadlocks when another thread attempts to acquire a write lock in between. The
(I'm honestly somewhat surprised that neither the stdlib's nor parking_lot's
I haven't worked through this in complete detail, but just on general principles:
@SludgePhD Thank you for this very clear illustration of where the code goes wrong!
#5426 fixed this.
I am using wgpu in a multi-threaded scenario where I have multiple render threads that each manage a separate window/view. I recently upgraded to 0.19 for the improved multi-threading support and started to run into these deadlock scenarios (to be fair I haven't tried to repro with 0.18 so it might be an issue with 0.18 as well).
Here are stack traces from a repro.
Thread 1 and Thread 2 are both executing queue_write_texture. queue_write_texture first locks device.pending_writes.lock() (line 743 in queue.rs), then later acquires a read lock on device.snatchable_lock.read() (line 788 in queue.rs).

Thread 3 is executing a queue_submit. queue_submit first acquires a read lock on device.snatchable_lock.read() (line 1120 in queue.rs), then later acquires device.pending_writes.lock() (line 1401 in queue.rs).

If all these locks were exclusive it would be an obvious deadlock, since queue_write_texture and queue_submit take the locks in different orders, but device.snatchable_lock is an RwLock and only read access is requested. This last bit tripped me up a bit. I think the deadlock comes when we introduce Thread 4.

Thread 4 is doing a SurfaceTexture::present(). On line 326 in present.rs a write lock is acquired on device.snatchable_lock.write(). This attempt will block since Thread 3 already holds a read lock on device.snatchable_lock.

As per the documentation of parking_lot::RwLock:

"This lock uses a task-fair locking policy which avoids both reader and writer starvation. This means that readers trying to acquire the lock will block even if the lock is unlocked when there are writers waiting to acquire the lock. Because of this, attempts to recursively acquire a read lock within a single thread may result in a deadlock." (emphasis is mine)
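To make the quoted policy concrete, here is a toy, single-threaded model of the admission rule. It is only a sketch: FairRwLockModel and its methods are hypothetical names made up for illustration, it does no real blocking, and it is not how parking_lot implements RwLock. It only shows that a new reader is turned away once a writer is queued, even though the lock is currently held for reading.

```rust
/// Toy state model of a task-fair reader-writer lock (hypothetical,
/// for illustration only; not parking_lot's implementation).
#[derive(Debug, Default)]
struct FairRwLockModel {
    active_readers: usize,
    writer_active: bool,
    writers_waiting: usize,
}

impl FairRwLockModel {
    /// Returns true if a read lock would be granted right now,
    /// false if the caller would have to block.
    fn try_read(&mut self) -> bool {
        // Fair policy: a queued writer blocks new readers.
        if self.writer_active || self.writers_waiting > 0 {
            return false;
        }
        self.active_readers += 1;
        true
    }

    /// Returns true if the write lock would be granted right now;
    /// otherwise the writer is recorded as waiting.
    fn request_write(&mut self) -> bool {
        if self.active_readers == 0 && !self.writer_active {
            self.writer_active = true;
            true
        } else {
            self.writers_waiting += 1;
            false
        }
    }
}

fn main() {
    let mut snatchable_lock = FairRwLockModel::default();

    // Thread 3 (queue_submit) takes its read lock first.
    assert!(snatchable_lock.try_read());
    // Thread 4 (present) asks for the write lock and has to wait.
    assert!(!snatchable_lock.request_write());
    // Thread 1 (queue_write_texture) now asks for a read lock and is
    // refused, because a writer is already waiting.
    assert!(!snatchable_lock.try_read());

    println!("{:?}", snatchable_lock);
}
```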
This means that we have a deadlock:
Thread 1 is waiting for Thread 4 (the RwLock is fair and will block new readers since there is a writer waiting)
Thread 4 is waiting for Thread 3 (Thread 3 holds read access to the snatchable_lock)
Thread 3 is waiting for Thread 1 (Thread 1 holds the device.pending_writes)

I don't know the codebase at all, so I can't suggest how to fix this, but hopefully this investigation can help pinpoint the problem.
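The three "is waiting for" relations listed above form a cycle, which can be checked mechanically. The following is a minimal sketch, assuming each thread waits on at most one other thread; find_wait_cycle is a hypothetical helper written for this illustration and has nothing to do with wgpu's code.

```rust
use std::collections::HashMap;

/// Follows "is waiting for" edges starting at `start` and returns the
/// cycle if the walk reaches a thread it has already visited.
/// Assumes each thread waits on at most one other thread.
fn find_wait_cycle(waits_for: &HashMap<u32, u32>, start: u32) -> Option<Vec<u32>> {
    let mut path = vec![start];
    let mut current = start;
    while let Some(&next) = waits_for.get(&current) {
        if let Some(pos) = path.iter().position(|&t| t == next) {
            // The walk came back to a visited thread: that is a cycle.
            return Some(path[pos..].to_vec());
        }
        path.push(next);
        current = next;
    }
    // The chain ended at a thread that is not waiting on anyone.
    None
}

fn main() {
    let mut waits_for = HashMap::new();
    waits_for.insert(1, 4); // Thread 1 waits for Thread 4 (queued writer)
    waits_for.insert(4, 3); // Thread 4 waits for Thread 3 (read lock held)
    waits_for.insert(3, 1); // Thread 3 waits for Thread 1 (pending_writes held)

    match find_wait_cycle(&waits_for, 1) {
        Some(cycle) => println!("deadlock cycle: {:?}", cycle),
        None => println!("no cycle"),
    }
}
```

Without Thread 4 the first two edges disappear, so only "Thread 3 waits for Thread 1" remains and the helper reports no cycle, which matches the observation that the read locks alone do not deadlock.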
Platform:
wgpu 0.19.1
OS Win 11