SQL code used by some containers can cause PG deadlocks, and the containers crash in response #357

simon-20 · 2024-11-20T16:59:49Z

Brief Description
This issue recurs at a low but steady rate, and has done so for as far back as our logs go. Some of the SQL UPDATE commands that reset the *_start flags for each pipeline stage can deadlock with one another. Here's an example from the Postgres logs:

2024-11-20 00:20:47 UTC-673d2b5e.880d0-ERROR:  deadlock detected
2024-11-20 00:20:47 UTC-673d2b5e.880d0-DETAIL:  Process 557264 waits for ShareLock on transaction 868302883; blocked by process 557268.
	Process 557268 waits for ShareLock on transaction 868302881; blocked by process 557264.
	Process 557264: 
	        UPDATE document
	        SET clean_start = null
	        WHERE clean_end is null
	    
	Process 557268: 
	        UPDATE document
	        SET flatten_start=null
	        WHERE flatten_end is null

This can happen even though the two statements set different columns: Postgres takes a row-level lock on every row it updates, the two WHERE clauses can match the same rows, and the order in which an UPDATE processes rows is not guaranteed. Each transaction can therefore end up holding a lock on a row that the other is waiting for.
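The lock-ordering argument above can be checked with a toy model. This is only an illustrative sketch, not a simulation of Postgres itself: `can_deadlock` is a hypothetical helper, and it assumes each transaction locks its rows one at a time in the given order, with no row repeated.

```python
from itertools import combinations


def can_deadlock(order_a, order_b):
    """Return True if two transactions locking rows in these orders can deadlock.

    A two-way deadlock needs a pair of shared rows that the transactions
    lock in opposite relative order: A holds x and waits for y while
    B holds y and waits for x.
    """
    shared = set(order_a) & set(order_b)
    pos_a = {row: i for i, row in enumerate(order_a)}
    pos_b = {row: i for i, row in enumerate(order_b)}
    return any(
        (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
        for x, y in combinations(shared, 2)
    )


# Same rows visited in opposite order: a wait cycle is possible.
print(can_deadlock([1, 2, 3], [3, 2, 1]))  # True
# Same rows visited in the same order: one transaction simply waits.
print(can_deadlock([1, 2, 3], [1, 2, 3]))  # False
```

In this model, a consistent locking order is enough to rule out the cycle, which is the property the fix needs to provide.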

Severity
High

Issue Location
One of the database statements shown above.

Expected Results/Behaviour
Minimum: the containers should not crash when deadlocks occur.
Ideal: use SELECT ... ORDER BY ... FOR UPDATE so that every UPDATE locks rows in the same order; one statement should then wait for the other instead of deadlocking.
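A rough sketch of both outcomes above, under several assumptions the issue does not confirm: that the containers talk to Postgres from Python, that `document` has a unique key column (called `id` here), and that the driver exposes the SQLSTATE on the exception (psycopg2 uses `pgcode`, psycopg 3 uses `sqlstate`). The function names are hypothetical. `40P01` is the SQLSTATE Postgres reports for `deadlock_detected`.

```python
import time

DEADLOCK_SQLSTATE = "40P01"  # PostgreSQL SQLSTATE for deadlock_detected


def build_reset_statement(stage, key_column="id"):
    """Build an UPDATE that locks matching rows in key order before resetting them.

    `key_column` is an assumed unique key on `document`; the issue does not name one.
    """
    for name in (stage, key_column):
        # Only plain identifiers are interpolated into the SQL text.
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    return (
        f"WITH locked AS (\n"
        f"    SELECT {key_column} FROM document\n"
        f"    WHERE {stage}_end IS NULL\n"
        f"    ORDER BY {key_column}\n"
        f"    FOR UPDATE\n"
        f")\n"
        f"UPDATE document AS d\n"
        f"SET {stage}_start = NULL\n"
        f"FROM locked\n"
        f"WHERE d.{key_column} = locked.{key_column}"
    )


def run_with_deadlock_retry(run_transaction, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `run_transaction`, retrying only when it fails with a deadlock.

    `run_transaction` is expected to run one whole transaction and roll it
    back on error, so that each retry starts from a clean state.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_transaction()
        except Exception as exc:
            code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
            if code != DEADLOCK_SQLSTATE or attempt == attempts:
                raise  # not a deadlock, or out of attempts
            sleep(base_delay * attempt)  # short linear backoff before retrying
```

The retry wrapper covers the minimum requirement on its own, since Postgres aborts only one of the deadlocked transactions and rerunning it normally succeeds. The ordered-lock statement addresses the cause, but it only helps if every statement that updates these rows takes its locks in the same key order.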
