[16.0][IMP] queue_job: HA job runner using session level advisory lock #668
base: 16.0
Conversation
Hi @guewen,
*force-pushed from 02ef89b to deecd27*
Yep, this should work.
Yes!
@PCatinean do you know who we should ping in the odoo.sh team to have an opinion on this approach?
@amigrave @sts-odoo so the TL;DR here is that we have one long-lived connection to the database on which we take a session-level advisory lock and do a …

I plan to deploy this on an odoo.sh dev env soon to see how it goes. I can PM you the details if you wish to monitor something.
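To make the mechanism concrete, here is a minimal sketch of session-level advisory-lock leader election of the kind described above. It assumes a psycopg2-style cursor on a dedicated, long-lived connection; the lock key below is an arbitrary illustrative constant, not the one used by queue_job.

```python
# Hypothetical 64-bit advisory lock key; the real runner would use a
# fixed, project-specific constant.
QUEUE_JOB_LOCK_KEY = 5629500732991862


def try_become_leader(cursor, lock_key=QUEUE_JOB_LOCK_KEY):
    """Try to take the session-level advisory lock on this connection.

    Returns True if this session now holds the lock, i.e. it is the
    runner leader. pg_try_advisory_lock is non-blocking, and a
    session-level lock is only released when the session ends (or on an
    explicit pg_advisory_unlock), which is why the connection must be
    kept open for as long as leadership is held.
    """
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (lock_key,))
    return cursor.fetchone()[0]
```

Every candidate runner calls this on startup; exactly one session wins, and if the leader's connection drops, PostgreSQL releases the lock so another candidate can take over.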
There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
@sbidoul any feedback?
*force-pushed from b65bbc6 to ffb27a4*
Feedback given in #673 (comment), and rebased.
Code LGTM.
I'm going to install it on one of my projects and battle test it.
Sorry, why is this not merged yet?
I'd like to run this on my staging and production GKE clusters, and I would especially like to test its scaling capabilities in staging. If I deploy this to my staging env, would either of you like the keys to my staging env and the GKE staging cluster to kick the tires and load test this with K6 or similar tools? I would love to see this merged, and would be happy to run this in production after some load testing in staging, report back on results, or allow you to monitor. I can reach out to you via email to get this going through your company's official channels if this is something you'd like to explore.
Hi everyone. This is not merged precisely because we would like more feedback from actual deployments. Tests are ongoing at Acsone, and I would encourage others to do the same. |
Thanks. I'll get this into staging and then production and report back with findings. |
@sbidoul I'm pulling this into our staging environment now, but if all goes well I do plan to run this in production over a few weeks and report back if I encounter issues.
It has been deployed in production for almost 3 weeks and I haven't had any issues to report.
Without this, we leak connections to databases that don't have queue_job installed.
Without this, we risk connection leaks in case of exceptions in the constructor.
*force-pushed from ffb27a4 to 2631808*
@luke-stdev001 yes, it should be safe. I just rebased.
Thank you.
@sbidoul, seems to be working well from initial load testing in staging, thank you. I'd like to rearchitect our GKE-based HA deployment of Odoo:
My understanding is that with the DB managing leader election, it should be perfectly acceptable to have a dedicated auto-scaling pool of queue-job-only instances for distributing jobs to, which can scale up/down with demand while leaving user-facing instances unaffected performance-wise. If you wouldn't mind, could you confirm whether that assumption is correct? My apologies if there are any fundamental misunderstandings on my side about how this works. Once I've had a week to toy with the concept in staging, I'll deploy to production and advise on progress.
You can do that, yes. I'm curious about the metrics you plan to use for auto-scaling. Note this was already feasible without this PR, with a single dedicated pod with …
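For readers setting up such a dedicated job-runner instance, a sketch of what its Odoo configuration might look like follows. The option names are taken from the queue_job README conventions and should be checked against your installed version; the values are illustrative.

```ini
; odoo.conf for a dedicated queue_job pod (illustrative values)
[options]
; load queue_job server-wide so the job runner starts with the server
server_wide_modules = base,web,queue_job
workers = 2

[queue_job]
; allow up to two jobs to run concurrently on the root channel
channels = root:2
```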
Thanks, I wasn't aware of that ability previously; I'll take a look. To be perfectly honest, when it comes to auto-scaling metrics we will be figuring it out as we go, playing with what works and learning what doesn't. I'm happy to report back here with our own internal notes and would love to hear from anyone else who has advice. At present I am considering WSGI request queue length, request rate, request duration latency, the ratio of busy workers to total workers, and DB connection pool saturation.

I'm happy to report that this PR is working fine in production, and has been for a few days. I will monitor closely for issues, but so far I have not encountered any hiccups.
Another attempt.
closes #422