Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[14.0] queue_job: add mixin to link jobs #573

Closed
wants to merge 1 commit into from

Conversation

simahawk
Copy link
Contributor

@simahawk simahawk commented Nov 7, 2023

You can easily link records to their related jobs

TODO

  • write tests
  • decide if we keep it in queue_job or should be moved to an extension

You can easily link jobs to their related records.
@OCA-git-bot
Copy link
Contributor

Hi @guewen,
some modules you are maintaining are being modified, check this out!

@simahawk
Copy link
Contributor Author

simahawk commented Nov 7, 2023

decide if we keep it in queue_job or should be moved to an extension

@guewen WDYT?

@simahawk simahawk mentioned this pull request Nov 7, 2023
2 tasks
@guewen
Copy link
Member

guewen commented Nov 7, 2023

Hey @simahawk,

You can easily link jobs to their related records.

This is actually doing the reverse: it links records to jobs. This is something I'd heavily discourage to do (and already did in addons reviews). Adding foreign keys on tables pointing to queue_job can quite hinder performance, especially if FK relations have no index.

A job may contain metadata pointing back to records, but I'd prefer to think the jobs being only "messages" like you'd have if jobs where in a Redis queue or else.

Now, if the need is about "recording" if a job has already been generated / cancelling it on actions etc., it can be handled using idempotent jobs (e.g. the job cancels itself if the action is no longer required).
If the need is about UX, and you want to show jobs to users, maybe we should rather find a way to provide feedbacks (send job status on UI?) to users?

@simahawk
Copy link
Contributor Author

@guewen thanks for your feedback.
The main goal for me is improve UI. When you have tons of jobs for the same kind of records is really handy to be able to jump from those records to their jobs, no matter the state. Is nothing functional. Is really: "Show me all jobs that are directly linked to this record".
We could also find them on the fly but since the metadata stored is not easily "selectable" I'm not sure we gain anything in terms of perf. Regarding this, I'd rather add indexes.
However, the goal is not to display related jobs inside a view. Is to have an action to open a tree view on related jobs.
Having a m2m that is loaded via action does not sound like something that kills perf. No?

@guewen
Copy link
Member

guewen commented Nov 15, 2023

I'm speaking of inserting and deleting rows in the queue_job table, which is the requirement number one and has to be as fast as possible. Operations on the table must be as fast as possible. Each index slows down operations, and conversely, FK from other tables referencing queue_job have checks to do (e.g. when you delete a job, it has to do a select on every referencing table to check cascading constraints).

@simahawk
Copy link
Contributor Author

Oh, I see, you are right...
Then I'll check how is possible to find them w/o touching their creation.
Do you see any better option other than searching in the serialized field?

@guewen
Copy link
Member

guewen commented Nov 15, 2023

Another concern: it (m2o or m2m) is meant to be a link from other records to show jobs (other queues using redis or whatever do not have this and are fine without btw), but then people will start to use some logic such as checking if a job exists, cancel it etc. And then you start having jobs locked by other "long" transactions that took locks / updates them when a user clicks on a button. It's really an anti-pattern when dealing with jobs, and oh boy have I seen problems with that. They must be idempotent and fire and forget.
This change would probably encourage bad usage of jobs.

@simahawk
Copy link
Contributor Author

@guewen maybe it's time to convert the records field to jsonb so that we can query it easily?

They must be idempotent and fire and forget.

Yes... and no. 1st of all is nice to be able to jump to specific records' jobs from their form view. Really a time saver.
2nd but most important: sometimes you want to be able to pick the current job to check its state and do something based on this. Eg: do not schedule another job or chain a new job to the existing ones (eg: to avoid sending out new info before the previous one was sent). You could use a channel w/ capacity one but is not the same thing because it's also hard to scale. I hope is clear what I'm trying to address :)

@guewen
Copy link
Member

guewen commented Nov 29, 2023

2nd but most important: sometimes you want to be able to pick the current job to check its state and do something based on this. Eg: do not schedule another job or chain a new job to the existing ones (eg: to avoid sending out new info before the previous one was sent). You could use a channel w/ capacity one but is not the same thing because it's also hard to scale. I hope is clear what I'm trying to address

Well, the thing is that is generally considered a bad pattern in queues as you shouldn't rely on the state of others jobs. You should not need to care about 'not schedule another job' : you should enqueue it anyway and the job itself has to verify if it has to actually do something or not.

I get what you are trying to address but IMO (at least on the 2nd point), that should never be necessary if the jobs are well designed and themselves check the outside state (sometimes it may require adding a state/flag on the model on which work is done).

Same for this example:

eg: to avoid sending out new info before the previous one was sent

Store the fact that the info was sent (using steps, state machine, you name it) on the concerned record, but do not use the job's state / existence to track the current "state of the world".

A job is an asynchronous execution of a function, that's all there should be about it. If you were to use actual asynchronous python functions, you would make them store the state needed to be idempotent and be able to recover in case of reboot.

I'm convinced designing jobs this way makes their usage more robust and resilient. Also more understandable (idempotency handling always in the job, not in the callers).

@guewen
Copy link
Member

guewen commented Nov 29, 2023

As for using jsonb, yeah it makes sense in any case, but I would still discourage messing with the internals 😉

Copy link

There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this PR to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions github-actions bot added the stale PR/Issue without recent activity, it'll be soon closed automatically. label Mar 31, 2024
@github-actions github-actions bot closed this May 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale PR/Issue without recent activity, it'll be soon closed automatically.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants