Combine all schedulers for all organisations #3838

jpbruinsslot · 2024-11-13T16:37:18Z

Proposal

Currently we create individual schedulers for every organisation, which amounts to three schedulers per organisation.

Now that the scheduler's queues are persisted as a database table, we can explore running a single dedicated set of schedulers for all organisations.

Create one BoefjeScheduler, one NormalizerScheduler and one ReportScheduler for all organisations, instead of individual schedulers for every organisation. For the BoefjeScheduler there will be one message queue to which the scan profile mutations of all organisations are posted. For the NormalizerScheduler there will be one message queue for raw file creation of all organisations. The ReportScheduler will reference its internal database table.

Advantages

Less overhead of creating and removing schedulers for every organisation
Stops continuous checking of available organisations
Allows replicating scheduler application instances when used with a blocking message queue
In addition, this requires support for popping multiple tasks from the endpoint

Disadvantages

the allow_updates, allow_replace and allow_priority_updates settings can no longer be set for individual organisations; however, this functionality doesn't seem to be used, and the values are the same for every scheduler type
we should investigate how to handle organisations that want to disable a scheduler, and whether this even has a use case

Impact

currently we have several different RabbitMQ message queues for every organisation; this needs to be reduced to one
organisation ids need to be passed in the messages on the queue
the task runner needs changes to pop tasks off the scheduler
rocky likely needs to be updated to interface with the changes in the scheduler API
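As an illustration of passing the organisation id in the message itself, the following sketch encodes and decodes a scan profile mutation for a shared queue. The field names (`organisation_id`, `operation`, `primary_key`) and the helper functions are assumptions made for this example, not the actual message schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MutationMessage:
    # Hypothetical message shape; the real schema may differ.
    organisation_id: str
    operation: str
    primary_key: str


def encode(message: MutationMessage) -> bytes:
    """Serialise a message for publishing on the single shared queue."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode(body: bytes) -> MutationMessage:
    """Parse a message body and reject messages without an organisation id."""
    data = json.loads(body.decode("utf-8"))
    if not data.get("organisation_id"):
        # On a shared queue the consumer cannot infer the organisation from
        # the queue name any more, so a missing id is treated as an error.
        raise ValueError("message is missing organisation_id")
    return MutationMessage(
        organisation_id=data["organisation_id"],
        operation=data["operation"],
        primary_key=data["primary_key"],
    )
```

The point of the design is that the queue name no longer identifies the organisation, so every consumer has to read it from the payload.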

Next steps and impact

The pop endpoint has changed from /queues to /schedulers/{id}/pop. It now returns a paginated result instead of a single Task, because the pop endpoint supports filtering and can return multiple tasks. Services that rely on the scheduler pop endpoint need to update their interfaces (Update services that rely on /pop endpoint of scheduler #3961)
The push endpoint has changed from /queues to /schedulers/{id}/push. Services that interface with the push endpoint (rocky) need to update their interfaces (Update services that rely on /push endpoint of scheduler #3962)
Scan profile mutation message queue: currently a message queue for scan profile mutations is created for every organisation. This needs to change so that the scan profile mutations of all organisations are relayed on a single scan profile mutations message queue (Combine organisational scan profile mutation message queues #3963)
Raw data file received message queue: the same as with scan profile mutations (Combine raw data recieved message queues #3964)
Model definition updates: an organisation field is added to Task and Schedule; services using these models need to update their specifications. (Model definition updates: addition of organisation field to Task and Schedule models #3965)
Batch status updates: in several places in the scheduler we can consider batch updating the status field of tasks, potentially exposing this as an endpoint. Because the task runner will be able to pop multiple tasks from a scheduler, batch updating those tasks might be beneficial. (
Deletion of organisations needs to be addressed, including what the protocol should be for queued tasks and schedules. (Discussion: how do we handle the deletion of organisations? #3966)
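A rough sketch of what popping multiple tasks with a filter, followed by a batch status update, could look like. It is an in-memory illustration under stated assumptions, not the scheduler API: `pop_tasks`, `batch_update_status`, the `count`/`results` page shape and the status names `queued`, `dispatched` and `completed` are all hypothetical, and a lower priority value is assumed to be served first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Hypothetical model; the organisation field mirrors the model change above.
    id: int
    organisation: str
    priority: int
    status: str = "queued"


def pop_tasks(queue: list[Task], limit: int = 10, organisation: Optional[str] = None) -> dict:
    """Remove up to `limit` queued tasks matching the filter and return them as a page."""
    matching = sorted(
        (
            t
            for t in queue
            if t.status == "queued" and (organisation is None or t.organisation == organisation)
        ),
        key=lambda t: (t.priority, t.id),
    )
    results = matching[:limit]
    for task in results:
        queue.remove(task)
        task.status = "dispatched"
    # `count` is the number of tasks that matched before popping.
    return {"count": len(matching), "results": results}


def batch_update_status(tasks: list[Task], status: str) -> int:
    """Set the same status on several tasks in one call; returns how many were updated."""
    for task in tasks:
        task.status = status
    return len(tasks)
```

A task runner could then pop a page for one organisation and report the outcome once for the whole page:

```python
queue = [Task(1, "org-a", 3), Task(2, "org-b", 1), Task(3, "org-a", 1)]
page = pop_tasks(queue, limit=2, organisation="org-a")
batch_update_status(page["results"], "completed")
```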

PR

#3839


jpbruinsslot added the mula (Issues related to the scheduler), discussion and scalability labels Nov 13, 2024

jpbruinsslot assigned jpbruinsslot and Donnype Nov 13, 2024

jpbruinsslot added this to KAT Nov 13, 2024

github-project-automation bot moved this to Incoming features / Need assessment in KAT Nov 13, 2024

jpbruinsslot moved this from Incoming features / Need assessment to To be discussed in KAT Nov 13, 2024

jpbruinsslot mentioned this issue Nov 18, 2024

Refactor in-memory schedulers to postgresql table #3358

Closed

jpbruinsslot linked a pull request Nov 21, 2024 that will close this issue

Combined schedulers #3839

Open

9 tasks

jpbruinsslot changed the title ~~One scheduler type for all organisations~~ Combine all schedulers for all organisations Nov 21, 2024

Donnype moved this from To be discussed to Backlog / To do in KAT Dec 2, 2024

jpbruinsslot moved this from Backlog / To do to In Progress in KAT Dec 3, 2024

Donnype removed their assignment Dec 3, 2024

jpbruinsslot moved this from In Progress to Review in KAT Dec 12, 2024

madelondohmen assigned ammar92 and unassigned jpbruinsslot Dec 17, 2024

ammar92 assigned jpbruinsslot and unassigned ammar92 Dec 18, 2024

madelondohmen assigned ammar92 Dec 19, 2024

jpbruinsslot moved this from Review to In Progress in KAT Dec 30, 2024

jpbruinsslot moved this from In Progress to Review in KAT Jan 6, 2025

jpbruinsslot moved this from Review to Blocked in KAT Jan 6, 2025

This was referenced Jan 8, 2025

Boefjes combined schedulers integration #4015

Open

Octopoes combined schedulers integration #4016

Merged
