
[16.0][IMP] queue_job: HA job runner using session level advisory lock #668

Open
sbidoul wants to merge 4 commits into base 16.0 from 16.0-ha-runner-sbi
Conversation

@sbidoul (Member) commented Jul 2, 2024

Another attempt.

closes #422

@OCA-git-bot (Contributor)

Hi @guewen,
some modules you are maintaining are being modified, check this out!

@sbidoul force-pushed the 16.0-ha-runner-sbi branch 3 times, most recently from 02ef89b to deecd27, on July 2, 2024 at 12:00
@sbidoul (Member, Author) commented Jul 2, 2024

Yep, this should work.

@guewen (Member) left a comment


Yes!

@sbidoul (Member, Author) commented Jul 4, 2024

@PCatinean do you know who we should ping on the odoo.sh team to get an opinion on this approach?

@PCatinean (Contributor)

Hi @sbidoul, the only two people I know of around this topic are @amigrave, who gave the initial feedback on the advisory lock MR in #256, and @sts-odoo, who also provided some feedback on the pg_application_name approach.

@sbidoul (Member, Author) commented Jul 4, 2024

@amigrave @sts-odoo so the TL;DR here is that we have one long-lived connection to the database on which we take a session-level advisory lock and do a LISTEN. There is no long-lived transaction, so this should not impact replication.
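
For readers less familiar with the PostgreSQL side, here is a minimal sketch of that mechanism, assuming psycopg2; the lock key, channel name and connection details are illustrative assumptions, not this PR's actual code:

```python
import select

import psycopg2

ADVISORY_LOCK_KEY = 957483122  # hypothetical key shared by all Odoo instances

conn = psycopg2.connect("dbname=odoo")
conn.set_session(autocommit=True)  # no long-lived transaction is ever opened
cr = conn.cursor()

# Session-level advisory lock: it is held for the lifetime of this connection,
# so exactly one runner instance per database becomes the active leader.
cr.execute("SELECT pg_try_advisory_lock(%s)", (ADVISORY_LOCK_KEY,))
is_leader = cr.fetchone()[0]

if is_leader:
    # LISTEN on the same long-lived connection to be woken up when jobs change.
    cr.execute("LISTEN queue_job")
    while True:
        # Wait up to 60s for a notification, then drain pending notifications.
        if select.select([conn], [], [], 60.0)[0]:
            conn.poll()
            while conn.notifies:
                conn.notifies.pop(0)
                # ...dispatch ready jobs here...
```

Because the lock is session-level rather than transaction-level, it is released automatically if the connection drops, at which point another instance can acquire it and take over as runner.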

I plan to deploy this on an odoo.sh dev env soon to see how it goes. I can PM you the details if you wish to monitor something.


github-actions bot commented Nov 3, 2024

There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this PR to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions bot added the "stale" label (PR/Issue without recent activity, it'll be soon closed automatically) on Nov 3, 2024
@sbidoul removed the "stale" label on Nov 3, 2024
@sbidoul changed the title from "[IMP] queue_job: HA job runner using session level advisory lock" to "[16.0][IMP] queue_job: HA job runner using session level advisory lock" on Dec 4, 2024
@simahawk (Contributor) commented Dec 6, 2024

I plan to deploy this on an odoo.sh dev env soon to see how it goes. I can PM you the details if you wish to monitor something.

@sbidoul any feedback?

@sbidoul force-pushed the 16.0-ha-runner-sbi branch from b65bbc6 to ffb27a4 on December 6, 2024 at 08:40
@sbidoul (Member, Author) commented Dec 6, 2024

Feedback given in #673 (comment).

And rebased.

@AnizR (Contributor) left a comment


Code LGTM.
I'm going to install it on one of my projects and battle test it.

@0yik commented Jan 9, 2025

Sorry, why is this not merged yet?

@luke-stdev001 commented Jan 20, 2025

@simahawk @sbidoul ,

I plan to deploy this on an odoo.sh dev env soon to see how it goes. I can PM you the details if you wish to monitor something.

@sbidoul any feedback?

I'd like to run this on my staging and production GKE cluster. I would especially like to test the scaling capabilities of this in my staging environment. If I deploy this to my staging env, would either of you like the keys to my staging env and the GKE staging cluster to kick the tires and load test this with K6 or similar tools?

I would love to see this merged, and would be happy to run this in production after some load testing in staging and report back on results or allow you to monitor.

I can reach out to you via email to get this going through your company's official channels if this is something you'd like to explore.

@sbidoul (Member, Author) commented Jan 20, 2025

Hi everyone. This is not merged precisely because we would like more feedback from actual deployments. Tests are ongoing at Acsone, and I would encourage others to do the same.

@luke-stdev001

Hi everyone. This is not merged precisely because we would like more feedback from actual deployments. Tests are ongoing at Acsone, and I would encourage others to do the same.

Thanks. I'll get this into staging and then production and report back with findings.

@luke-stdev001

@sbidoul, extremely silly question from my end, but is it safe to just pull the changes in this branch and upgrade if we're just running on the vanilla OCA/queue modules in 16.0?

I'm pulling this into our staging environment now, but if all goes well I do plan to run this in production over a few weeks and report back if I encounter issues.

@AnizR (Contributor) commented Feb 28, 2025

Code LGTM. I'm going to install it on one of my projects and battle test it.

It has been deployed in production for almost 3 weeks and I haven't had any issues to report.

Without this, we leak connections to databases that don't have queue_job installed.
Without this, we risk connection leaks in case of exceptions in the constructor.
@sbidoul (Member, Author) commented Feb 28, 2025

is it safe to just pull the changes in this branch and upgrade if we're just running on the vanilla OCA/queue modules in 16.0?

@luke-stdev001 yes it should be safe. I just rebased.

@luke-stdev001 commented Mar 1, 2025

is it safe to just pull the changes in this branch and upgrade if we're just running on the vanilla OCA/queue modules in 16.0?

@luke-stdev001 yes it should be safe. I just rebased.

Thank you.

#422 (comment)

Yes, same config on all instances will work.

@sbidoul ,

Seems to be working well from initial load testing in staging, thank you.

I'd like to rearchitect our GKE-based HA deployment of Odoo:

  • User-facing app instances: server-wide modules without queue_job, with HTTP workers
  • A larger, vertically scaled instance for cron workers, with no HTTP workers (in our current architecture, queue jobs run on this same server for the same reasons)
  • Queue-job-only instances: server-wide modules with queue_job, no HTTP workers, in a dedicated node pool so queue instances scale out separately from the user-facing instances

My understanding is that, with the DB managing leader election, it should be perfectly acceptable to have a dedicated auto-scaling pool of queue_job-only instances to distribute jobs to, scaling up and down with demand, while leaving user-facing instances unaffected performance-wise.

If you wouldn't mind, could you confirm whether that assumption is correct? My apologies if there are any fundamental misunderstandings on my side about how this works.

Once I've had a week to toy with the concept in staging I'll deploy to production and advise on progress.

@sbidoul (Member, Author) commented Mar 2, 2025

My understanding is that, with the DB managing leader election, it should be perfectly acceptable to have a dedicated auto-scaling pool of queue_job-only instances to distribute jobs to, scaling up and down with demand, while leaving user-facing instances unaffected performance-wise.

You can do that yes. I'm curious about the metrics you plan to use for auto scaling.

Note this was already feasible without this PR, with a single dedicated pod with --load=queue_job sending the requests to run jobs to the pool. This PR simplifies configuration, helps avoid configuration mistakes, and also helps in situations where you cannot have instances with different configs, such as on odoo.sh.
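
For context, here is a rough sketch of the pre-PR split setup described above, as two separate odoo.conf files; the module lists and worker counts are illustrative assumptions, not values from this PR:

```ini
# odoo.conf on the single dedicated job-runner pod
# (equivalent to starting Odoo with --load including queue_job)
[options]
server_wide_modules = base,web,queue_job
workers = 2
```

```ini
# odoo.conf on the user-facing pods: queue_job is not loaded server-wide here,
# so these instances never start a job runner and only execute the job
# requests the runner sends to the pool
[options]
server_wide_modules = base,web
workers = 8
```

With this PR, the same configuration (queue_job in server_wide_modules everywhere) can be deployed on every instance, and the session-level advisory lock decides which instance actually runs the job runner.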

@luke-stdev001 commented Mar 3, 2025

My understanding is that, with the DB managing leader election, it should be perfectly acceptable to have a dedicated auto-scaling pool of queue_job-only instances to distribute jobs to, scaling up and down with demand, while leaving user-facing instances unaffected performance-wise.

You can do that yes. I'm curious about the metrics you plan to use for auto scaling.

Note this was already feasible without this PR, with a single dedicated pod with --load=queue_job sending the requests to run jobs to the pool. This PR simplifies configuration, helps avoid configuration mistakes, and also helps in situations where you cannot have instances with different configs, such as on odoo.sh.

Thanks, I wasn't aware of that ability previously; I'll take a look.

To be perfectly honest, when it comes to auto-scaling metrics we will be figuring it out as we go along, playing with what works and learning what doesn't. I'm happy to report back here with our own internal notes and would love to hear from anyone else who has any advice.

At present I am thinking of WSGI request queue length, request rate, request duration/latency, the ratio of busy workers to total workers, and DB connection pool saturation.

I'm happy to report back that this PR is working fine in production, and has been for a few days. I will monitor closely for issues, but so far I have not encountered any hiccups.

Development

Successfully merging this pull request may close these issues.

[Question] queue_job when multiple odoo servers are used with load balancing (and single postgres)
8 participants