Scheduled tasks thread permanently crashes due to database deadlocks (causes "hot" to stop updating) #3076
Comments
Any interest in incorporating something to provide general consensus (etcd or ZooKeeper)? It would provide a solution here, and could help wherever else it might be needed in the future.
After further investigation, I am unfortunately starting to think that this issue is not caused by multiple concurrent scheduled task executions (at least not exclusively). I have been running the code from #3077 on https://lemm.ee since last night, with only one node having scheduled tasks enabled, and I have still seen deadlocks crash the scheduled tasks thread several times on that sole node. I will investigate this further.
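One reason a single deadlock takes the whole scheduled tasks thread down is that the error is never retried. As a hedged sketch (the `DbError` type and the closure signature are hypothetical stand-ins, not Lemmy's actual error handling), a bounded retry wrapper could make such transient failures non-fatal:

```rust
// Hypothetical sketch: retry a transactional closure a bounded number of
// times when the database reports a deadlock. Postgres aborts one victim
// transaction on deadlock (SQLSTATE 40P01); re-running it is safe.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum DbError {
    Deadlock,
    Other(String),
}

fn with_deadlock_retry<T>(
    max_attempts: u32,
    mut run_tx: impl FnMut() -> Result<T, DbError>,
) -> Result<T, DbError> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match run_tx() {
            // Deadlocks are transient: retry until the attempt budget runs out.
            Err(DbError::Deadlock) if attempt < max_attempts => continue,
            // Success, a non-deadlock error, or the final failed attempt.
            other => return other,
        }
    }
}
```

With such a wrapper around each scheduled task's transaction, a deadlock would cost one retry instead of killing the thread.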
I just received confirmation that I am not the only one experiencing this, and it's also happening on an instance with a single lemmy_server process. So far, I have only seen deadlocks in the
There is a long gap between the 3rd and 4th deadlock because the scheduled tasks thread was stopped for that whole period while I was asleep. Generally, the deadlock seems to occur within an hour of restarting lemmy_server on lemm.ee.
My current theory is that this deadlock is caused by
This function also updates the |
These tasks are run from |
I'm already creating exactly that PR; I will submit it for review soon.
Deadlocks in PG aren't really critical at all - on deadlock, the code should simply retry the transaction until it goes through. But they can also be prevented. Deadlocks only happen when multiple transactions update overlapping rows in a different order, so to prevent them, all operations that lock multiple rows (updates) just need to use a consistent order. It looks like the comment_aggregates table is updated in three places:
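The same ordering rule applies to any lock-based system, not just Postgres row locks. A minimal toy sketch in Rust (mutex-guarded "rows", not Lemmy code): if every worker acquires its two locks in a globally consistent order, a wait cycle - and therefore a deadlock - can never form.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Toy illustration of consistent lock ordering. Each "row" is a mutex; a
// worker that must lock two rows always takes the lower index first, so two
// workers can never each hold one lock while waiting for the other's.
fn transfer(rows: &[Arc<Mutex<i64>>], from: usize, to: usize, amount: i64) {
    let (lo, hi) = if from < to { (from, to) } else { (to, from) };
    let mut lo_guard = rows[lo].lock().unwrap();
    let mut hi_guard = rows[hi].lock().unwrap();
    // Map the ordered guards back to source/destination.
    let (src, dst) = if from == lo {
        (&mut *lo_guard, &mut *hi_guard)
    } else {
        (&mut *hi_guard, &mut *lo_guard)
    };
    *src -= amount;
    *dst += amount;
}
```

Running many threads that transfer in both directions at once completes without deadlock; removing the index ordering would reintroduce the classic lock cycle.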
So to fix this deadlock, something like this should work (untested):

```diff
diff --git a/crates/db_schema/src/impls/comment.rs b/crates/db_schema/src/impls/comment.rs
index 46045cd1..49ac2407 100644
--- a/crates/db_schema/src/impls/comment.rs
+++ b/crates/db_schema/src/impls/comment.rs
@@ -111,6 +111,7 @@ from (
 join comment c2 on c2.path <@ c.path and c2.path != c.path
 and c.path <@ '{top_parent}'
 group by c.id
+order by c.id
 ) as c
 where ca.comment_id = c.id"
 );
diff --git a/src/scheduled_tasks.rs b/src/scheduled_tasks.rs
index 0f75fdba..85518dbb 100644
--- a/src/scheduled_tasks.rs
+++ b/src/scheduled_tasks.rs
@@ -72,7 +72,7 @@ pub fn setup(db_url: String, user_agent: String) -> Result<(), LemmyError> {
 /// Update the hot_rank columns for the aggregates tables
 fn update_hot_ranks(conn: &mut PgConnection, last_week_only: bool) {
 let mut post_update = diesel::update(post_aggregates::table).into_boxed();
-let mut comment_update = diesel::update(comment_aggregates::table).into_boxed();
+let mut comment_select = comment_aggregates::table.select(comment_aggregates::comment_id).order(comment_aggregates::comment_id.asc()).into_boxed();
 let mut community_update = diesel::update(community_aggregates::table).into_boxed();
 // Only update for the last week of content
@@ -81,7 +81,7 @@ fn update_hot_ranks(conn: &mut PgConnection, last_week_only: bool) {
 let last_week = now - diesel::dsl::IntervalDsl::weeks(1);
 post_update = post_update.filter(post_aggregates::published.gt(last_week));
-comment_update = comment_update.filter(comment_aggregates::published.gt(last_week));
+comment_select = comment_select.filter(comment_aggregates::published.gt(last_week));
 community_update = community_update.filter(community_aggregates::published.gt(last_week));
 } else {
 info!("Updating hot ranks for all history...");
@@ -103,11 +103,12 @@ fn update_hot_ranks(conn: &mut PgConnection, last_week_only: bool) {
 }
 }
-match comment_update
+match diesel::update(comment_aggregates::table)
 .set(comment_aggregates::hot_rank.eq(hot_rank(
 comment_aggregates::score,
 comment_aggregates::published,
 )))
+.filter(comment_aggregates::comment_id.eq_any(comment_select))
 .execute(conn)
 {
 Ok(_) => {}
```

In addition, it would be better for scalability if this update was done in batches (e.g. 100 comments at a time), because otherwise all comments are still locked simultaneously and every other operation on comment_aggregates has to wait. That's unrelated to the deadlocks, though.
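To sketch what the suggested batching could look like (illustrative only: `batched_ids` is a hypothetical helper, and each batch would still drive a real Diesel update), splitting the sorted id list into fixed-size chunks keeps each UPDATE's lock footprint small while preserving the deadlock-safe ordering:

```rust
// Hypothetical helper (not Lemmy code): sort the ids once, then yield them
// in fixed-size batches. Sorting keeps the row-lock order consistent across
// writers; small batches keep one UPDATE from locking the whole table.
fn batched_ids(mut ids: Vec<i64>, batch_size: usize) -> Vec<Vec<i64>> {
    ids.sort_unstable();
    ids.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}
```

Each returned batch would then back one `UPDATE ... WHERE comment_id = ANY(batch)` statement, so other writers to comment_aggregates only ever wait for one small batch at a time.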
* Move connection creation into scheduler. - #3076
* Fix clippy.
Issue Summary
In 0.17.4, scheduled tasks can cause deadlocks like this:
There are two key issues here:
One major result of this issue is that the "hot" tab stops updating, because the hot_rank calculation just stops completely.

Steps to Reproduce
Start two lemmy_server processes (UPDATE: this also happens with a single lemmy_server process).

The only way for instance admins to mitigate this issue on 0.17.4 right now is to regularly restart lemmy_server.
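For context on why a stalled scheduler matters: hot ranking is a time-decayed score, so it must be recomputed continuously. The function below is only an illustration of that shape - the constants and exact formula are assumptions, not Lemmy's actual hot_rank implementation:

```rust
// Illustrative time-decayed ranking, similar in shape to a hot_rank: the
// logarithm damps very large scores, and the growing denominator decays the
// rank as the item ages. Constants here are made up for illustration.
fn hot_rank(score: i64, age_hours: f64) -> f64 {
    let damped_score = ((score.max(1) + 1) as f64).ln();
    damped_score / (age_hours + 2.0).powf(1.8)
}
```

If this recomputation stops, every item's stored rank stays frozen at its last value, so new posts never overtake old ones - exactly the "hot tab stops updating" symptom described above.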