Scheduled tasks thread permanently crashes due to database deadlocks (causes "hot" to stop updating) #3076
Comments
Any interest in incorporating something to provide general consensus (etcd or ZooKeeper)? It would provide a solution here, and could help wherever else it might be needed in the future.
After further investigation, I am unfortunately starting to think that this issue is not caused by multiple concurrent scheduled task executions (at least not exclusively). I have been running the code from #3077 on https://lemm.ee since last night, with only one node having scheduled tasks enabled, and I have still seen deadlocks crash the scheduled tasks thread several times on that sole node. I will investigate this further.
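One reason a single deadlock takes the whole scheduled tasks thread down is that the error is never retried. As a hedged sketch (the `DbError` type and the closure signature are hypothetical stand-ins, not Lemmy's actual error handling), a bounded retry wrapper could make such transient failures non-fatal:

```rust
// Hypothetical sketch: retry a transactional closure a bounded number of
// times when the database reports a deadlock. Postgres aborts one victim
// transaction on deadlock (SQLSTATE 40P01); re-running it is safe.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum DbError {
    Deadlock,
    Other(String),
}

fn with_deadlock_retry<T>(
    max_attempts: u32,
    mut run_tx: impl FnMut() -> Result<T, DbError>,
) -> Result<T, DbError> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match run_tx() {
            // Deadlocks are transient: retry until the attempt budget runs out.
            Err(DbError::Deadlock) if attempt < max_attempts => continue,
            // Success, a non-deadlock error, or the final failed attempt.
            other => return other,
        }
    }
}
```

With such a wrapper around each scheduled task's transaction, a deadlock would cost one retry instead of killing the thread.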
I just received confirmation that I am not the only one experiencing this, and it's also happening on an instance with a single lemmy_server process. So far, I have only seen deadlocks in the
There is a long gap between the 3rd and 4th deadlock because the scheduled tasks thread was stopped for that whole period while I was asleep. Generally, the deadlock seems to occur within an hour of restarting lemmy_server on lemm.ee.
My current theory is that this deadlock is caused by
This function also updates the |
These tasks are run from |
I'm already creating exactly that PR; I will submit it for review soon.
Deadlocks in PG aren't really critical at all - on deadlock, the code should simply retry the transaction until it goes through. But they can also be prevented. Deadlocks only happen when multiple transactions update overlapping rows in a different order, so to prevent them, all operations that lock multiple rows (updates) just need to use a consistent order. It looks like the comment_aggregates table is updated in three places:
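The same ordering rule applies to any lock-based system, not just Postgres row locks. A minimal toy sketch in Rust (mutex-guarded "rows", not Lemmy code): if every worker acquires its two locks in a globally consistent order, a wait cycle - and therefore a deadlock - can never form.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Toy illustration of consistent lock ordering. Each "row" is a mutex; a
// worker that must lock two rows always takes the lower index first, so two
// workers can never each hold one lock while waiting for the other's.
fn transfer(rows: &[Arc<Mutex<i64>>], from: usize, to: usize, amount: i64) {
    let (lo, hi) = if from < to { (from, to) } else { (to, from) };
    let mut lo_guard = rows[lo].lock().unwrap();
    let mut hi_guard = rows[hi].lock().unwrap();
    // Map the ordered guards back to source/destination.
    let (src, dst) = if from == lo {
        (&mut *lo_guard, &mut *hi_guard)
    } else {
        (&mut *hi_guard, &mut *lo_guard)
    };
    *src -= amount;
    *dst += amount;
}
```

Running many threads that transfer in both directions at once completes without deadlock; removing the index ordering would reintroduce the classic lock cycle.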
So to fix this deadlock, something like this should work (untested):

```diff
diff --git a/crates/db_schema/src/impls/comment.rs b/crates/db_schema/src/impls/comment.rs
index 46045cd1..49ac2407 100644
--- a/crates/db_schema/src/impls/comment.rs
+++ b/crates/db_schema/src/impls/comment.rs
@@ -111,6 +111,7 @@ from (
 join comment c2 on c2.path <@ c.path and c2.path != c.path
 and c.path <@ '{top_parent}'
 group by c.id
+order by c.id
 ) as c
 where ca.comment_id = c.id"
 );
diff --git a/src/scheduled_tasks.rs b/src/scheduled_tasks.rs
index 0f75fdba..85518dbb 100644
--- a/src/scheduled_tasks.rs
+++ b/src/scheduled_tasks.rs
@@ -72,7 +72,7 @@ pub fn setup(db_url: String, user_agent: String) -> Result<(), LemmyError> {
 /// Update the hot_rank columns for the aggregates tables
 fn update_hot_ranks(conn: &mut PgConnection, last_week_only: bool) {
 let mut post_update = diesel::update(post_aggregates::table).into_boxed();
-let mut comment_update = diesel::update(comment_aggregates::table).into_boxed();
+let mut comment_select = comment_aggregates::table.select(comment_aggregates::comment_id).order(comment_aggregates::comment_id.asc()).into_boxed();
 let mut community_update = diesel::update(community_aggregates::table).into_boxed();
 // Only update for the last week of content
@@ -81,7 +81,7 @@ fn update_hot_ranks(conn: &mut PgConnection, last_week_only: bool) {
 let last_week = now - diesel::dsl::IntervalDsl::weeks(1);
 post_update = post_update.filter(post_aggregates::published.gt(last_week));
-comment_update = comment_update.filter(comment_aggregates::published.gt(last_week));
+comment_select = comment_select.filter(comment_aggregates::published.gt(last_week));
 community_update = community_update.filter(community_aggregates::published.gt(last_week));
 } else {
 info!("Updating hot ranks for all history...");
@@ -103,11 +103,12 @@ fn update_hot_ranks(conn: &mut PgConnection, last_week_only: bool) {
 }
 }
-match comment_update
+match diesel::update(comment_aggregates::table)
 .set(comment_aggregates::hot_rank.eq(hot_rank(
 comment_aggregates::score,
 comment_aggregates::published,
 )))
+.filter(comment_aggregates::comment_id.eq_any(comment_select))
 .execute(conn)
 {
 Ok(_) => {}
```

In addition, it would be better for scalability if this update was done in batches (e.g. 100 comments at a time), because otherwise all comments are still locked simultaneously and every other operation on comment_aggregates has to wait. That's unrelated to the deadlocks, though.
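To sketch what the suggested batching could look like (illustrative only: `batched_ids` is a hypothetical helper, and each batch would still drive a real Diesel update), splitting the sorted id list into fixed-size chunks keeps each UPDATE's lock footprint small while preserving the deadlock-safe ordering:

```rust
// Hypothetical helper (not Lemmy code): sort the ids once, then yield them
// in fixed-size batches. Sorting keeps the row-lock order consistent across
// writers; small batches keep one UPDATE from locking the whole table.
fn batched_ids(mut ids: Vec<i64>, batch_size: usize) -> Vec<Vec<i64>> {
    ids.sort_unstable();
    ids.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}
```

Each returned batch would then back one `UPDATE ... WHERE comment_id = ANY(batch)` statement, so other writers to comment_aggregates only ever wait for one small batch at a time.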
* Move connection creation into scheduler. - #3076
* Fix clippy.
Issue Summary
In 0.17.4, scheduled tasks can cause deadlocks like this:
There are two key issues here:
One major result of this issue is that the "hot" tab stops updating, because the hot_rank calculation just stops completely.

Steps to Reproduce
Start two lemmy_server processes (UPDATE: this also happens with a single lemmy_server process).

The only way for instance admins to mitigate this issue on 0.17.4 right now is to regularly restart lemmy_server.
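For context on why a stalled scheduler matters: hot ranking is a time-decayed score, so it must be recomputed continuously. The function below is only an illustration of that shape - the constants and exact formula are assumptions, not Lemmy's actual hot_rank implementation:

```rust
// Illustrative time-decayed ranking, similar in shape to a hot_rank: the
// logarithm damps very large scores, and the growing denominator decays the
// rank as the item ages. Constants here are made up for illustration.
fn hot_rank(score: i64, age_hours: f64) -> f64 {
    let damped_score = ((score.max(1) + 1) as f64).ln();
    damped_score / (age_hours + 2.0).powf(1.8)
}
```

If this recomputation stops, every item's stored rank stays frozen at its last value, so new posts never overtake old ones - exactly the "hot tab stops updating" symptom described above.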