
Implement centralized scheduling for recurring tasks #553

Closed
nscuro opened this issue May 15, 2023 · 4 comments
Labels
architecture, component/api-server, p1 (Critical bugs that prevent DT from being used, or features that must be implemented ASAP), size/L (High effort)

Comments

@nscuro
Member

nscuro commented May 15, 2023

Recurring tasks are currently scheduled in every instance of the API server by TaskScheduler, which uses simple Timers behind the scenes.

We need to scale the API server to multiple instances (#375), but this is problematic as long as every instance schedules the same tasks, because each task would then run redundantly on every instance. As per #375 (comment), tasks should, as a short- to mid-term solution, be triggerable via a REST API endpoint.

Once the endpoints are available, tasks can be scheduled via Kubernetes CronJobs. The CronJobs merely curl an API server endpoint to trigger tasks. Requests are made against the Kubernetes Service of the API server, so each request is handled by a single instance.

CronJobs support settings for retries, backoff and timeouts (e.g. backoffLimit and activeDeadlineSeconds on the Job spec). To leverage them, the API endpoints must block for as long as the respective task is running.
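A minimal sketch of what "block as long as the task is running" could look like. It uses the JDK's built-in com.sun.net.httpserver purely for illustration; the API server is not assumed to use it, and the /tasks/{name} path, BlockingTaskEndpoint and TaskTriggerHandler are hypothetical names. The handler runs the task on the request thread and only responds once the task has finished: 200 on success, 500 on failure. A CronJob calling it with curl --fail therefore gets a non-zero exit code when the task fails, and the Job's retry and backoff settings take effect.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch: POST /tasks/{name} runs the named task on the request thread
// and responds only after it has finished (200 = success, 500 = failure).
final class BlockingTaskEndpoint {

    static final class TaskTriggerHandler implements HttpHandler {
        private final Map<String, Callable<Void>> tasks;

        TaskTriggerHandler(Map<String, Callable<Void>> tasks) {
            this.tasks = tasks;
        }

        @Override
        public void handle(HttpExchange exchange) throws IOException {
            String path = exchange.getRequestURI().getPath();
            Callable<Void> task = tasks.get(path.substring(path.lastIndexOf('/') + 1));
            int status;
            String body;
            if (!"POST".equals(exchange.getRequestMethod())) {
                status = 405;
                body = "method not allowed";
            } else if (task == null) {
                status = 404;
                body = "unknown task";
            } else {
                try {
                    task.call(); // blocks until the task is done; the client keeps waiting
                    status = 200;
                    body = "completed";
                } catch (Exception e) {
                    status = 500; // makes curl --fail exit non-zero so the Job can retry
                    body = "failed: " + e.getMessage();
                }
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        }
    }

    // Binds to localhost; port 0 picks a free ephemeral port.
    static HttpServer start(int port, Map<String, Callable<Void>> tasks) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/tasks/", new TaskTriggerHandler(tasks));
        server.start(); // no executor set: requests are handled one at a time
        return server;
    }
}
```

Because no executor is configured, this server handles requests one at a time on its dispatcher thread. A real service would use a thread pool, so that one long-running task does not hold up unrelated triggers.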

To prevent certain heavy tasks (e.g. the portfolio metrics update) from being executed concurrently, a locking mechanism is required. Using ShedLock (https://github.com/lukas-krecan/ShedLock) should be investigated.
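The locking requirement can be illustrated with a minimal in-memory sketch of the "lock until" model that ShedLock is built around. Each lock is a record holding a task name and a lockUntil timestamp. Acquisition succeeds only when lockUntil has passed, and an instance that fails to acquire the lock skips the run instead of waiting. The terms lockAtMostFor and lockAtLeastFor are borrowed from ShedLock. LockUntilRegistry, tryLock, runIfUnlocked and the rest are hypothetical and are not ShedLock's API. A real setup would keep the records in a database table shared by all API server instances, not in a ConcurrentHashMap.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical in-memory sketch of "lock until" semantics. In a real deployment the
// records would live in a database table shared by all API server instances.
final class LockUntilRegistry {

    // One record per task name, like a row in a shared lock table.
    record LockRecord(String name, Instant lockUntil, Instant lockedAt, String lockedBy) {}

    private final ConcurrentMap<String, LockRecord> locks = new ConcurrentHashMap<>();
    private final Supplier<Instant> clock;

    LockUntilRegistry(Supplier<Instant> clock) {
        this.clock = clock;
    }

    // Acquires the lock only if it is free or the previous holder's lockUntil has
    // passed (e.g. because that instance crashed mid-run).
    Optional<LockRecord> tryLock(String name, String instanceId, Duration lockAtMostFor) {
        Instant now = clock.get();
        LockRecord candidate = new LockRecord(name, now.plus(lockAtMostFor), now, instanceId);
        // compute() is atomic per key, standing in for a conditional
        // "UPDATE ... WHERE name = ? AND lock_until <= now" against the shared table.
        LockRecord current = locks.compute(name, (key, existing) ->
                (existing == null || !existing.lockUntil().isAfter(now)) ? candidate : existing);
        return current == candidate ? Optional.of(candidate) : Optional.empty();
    }

    // Releases the lock, but not before lockedAt + lockAtLeastFor.
    void unlock(LockRecord lock, Duration lockAtLeastFor) {
        Instant now = clock.get();
        Instant minUntil = lock.lockedAt().plus(lockAtLeastFor);
        Instant until = now.isAfter(minUntil) ? now : minUntil;
        locks.computeIfPresent(lock.name(), (key, existing) -> existing.equals(lock)
                ? new LockRecord(key, until, existing.lockedAt(), existing.lockedBy())
                : existing); // another instance took over after our lock expired: leave it alone
    }

    // Runs the task if the lock can be taken; otherwise skips it (no waiting).
    boolean runIfUnlocked(String name, String instanceId, Duration lockAtMostFor,
                          Duration lockAtLeastFor, Runnable task) {
        Optional<LockRecord> lock = tryLock(name, instanceId, lockAtMostFor);
        if (lock.isEmpty()) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            unlock(lock.get(), lockAtLeastFor);
        }
    }
}
```

In this model, lockAtMostFor bounds how long a crashed instance can block everyone else. lockAtLeastFor keeps the lock held briefly after a very short run, so an instance that triggers slightly later does not run the same task a second time.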

@nscuro nscuro added this to Hyades Jun 29, 2023
@nscuro nscuro moved this to In Progress in Hyades Jun 29, 2023
@nscuro nscuro added the p1 Critical bugs that prevent DT from being used, or features that must be implemented ASAP label Jun 29, 2023
@nscuro
Member Author

nscuro commented Jun 30, 2023

There's a working implementation for locking the portfolio metrics update task. We'll get that merged and then create new issues for the remaining work. In the meantime the new implementation can be thoroughly tested on test environments.

@mehab
Collaborator

mehab commented Jul 7, 2023

Remaining tasks

  • Use cron expressions instead of fixed-delay scheduling until Kubernetes CronJobs are used.
  • Make sure end-to-end tests are unaffected, even with multiple instances of the API server.
  • Make sure the change does not degrade overall performance.
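The issue does not state why cron expressions are preferred over fixed delays, so the following is an assumed motivation. With fixed-delay scheduling, as with java.util.Timer, each instance's schedule is anchored to its own start-up time. Instances therefore fire at different moments, and a lock only prevents overlapping runs, not repeated ones. A cron expression yields the same wall-clock fire times on every instance, so all instances contend for the lock at the same moment and only one of them runs the task. MiniCron is a hypothetical, heavily reduced cron evaluator written only to show this effect. It supports no ranges or names and skips the day-of-month/day-of-week OR rule. A real implementation would use an existing cron library.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily reduced cron evaluator for illustration only.
// Fields: minute hour day-of-month month day-of-week (0 = Sunday).
// Each field may be "*", "*/n", a number, or a comma-separated list of numbers.
// Simplification: all five fields are AND-ed, whereas real cron OR-s
// day-of-month and day-of-week when both are restricted.
final class MiniCron {
    private static final int[] FIELD_MIN = {0, 0, 1, 1, 0};
    private final String[] fields;

    MiniCron(String expression) {
        fields = expression.trim().split("\\s+");
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + expression);
        }
    }

    private boolean fieldMatches(int index, int value) {
        String field = fields[index];
        if (field.equals("*")) {
            return true;
        }
        if (field.startsWith("*/")) {
            // Steps count from the field's minimum, e.g. minutes 0, 15, 30, 45 for "*/15".
            return (value - FIELD_MIN[index]) % Integer.parseInt(field.substring(2)) == 0;
        }
        return Arrays.stream(field.split(",")).mapToInt(Integer::parseInt).anyMatch(v -> v == value);
    }

    boolean matches(LocalDateTime t) {
        int dayOfWeek = t.getDayOfWeek().getValue() % 7; // java.time uses Monday=1..Sunday=7
        return fieldMatches(0, t.getMinute())
                && fieldMatches(1, t.getHour())
                && fieldMatches(2, t.getDayOfMonth())
                && fieldMatches(3, t.getMonthValue())
                && fieldMatches(4, dayOfWeek);
    }

    // First matching minute strictly after 'after' (brute force, minute by minute).
    LocalDateTime next(LocalDateTime after) {
        LocalDateTime limit = after.plusYears(4);
        for (LocalDateTime t = after.truncatedTo(ChronoUnit.MINUTES).plusMinutes(1);
             t.isBefore(limit); t = t.plusMinutes(1)) {
            if (matches(t)) {
                return t;
            }
        }
        throw new IllegalStateException("no match within four years");
    }

    // The next 'count' cron fire times after an instance started.
    List<LocalDateTime> cronRuns(LocalDateTime startedAt, int count) {
        List<LocalDateTime> runs = new ArrayList<>();
        LocalDateTime t = startedAt;
        for (int i = 0; i < count; i++) {
            t = next(t);
            runs.add(t);
        }
        return runs;
    }

    // Fixed-delay runs anchored at start-up (task run time ignored for simplicity).
    static List<LocalDateTime> fixedDelayRuns(LocalDateTime startedAt, Duration delay, int count) {
        List<LocalDateTime> runs = new ArrayList<>();
        LocalDateTime t = startedAt;
        for (int i = 0; i < count; i++) {
            t = t.plus(delay);
            runs.add(t);
        }
        return runs;
    }
}
```

Consider two instances started at 10:07 and 10:42. With "0 * * * *", both compute the same fire times: 11:00, 12:00 and 13:00. With a one-hour fixed delay, they fire at 11:07 and 11:42, so each of them runs the task once per hour.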

@mehab
Collaborator

mehab commented Jul 21, 2023

Currently under test; it needs to be observed for a few days before confirming that it works and closing the issue.

@nscuro
Member Author

nscuro commented Jul 28, 2023

Implemented in:

And tested for multiple days in a test environment. Singleton tasks are only ever executed by one instance.

Thanks @VithikaS! 🙌

@nscuro nscuro closed this as completed Jul 28, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Hyades Jul 28, 2023
Projects
Archived in project
Development

No branches or pull requests

3 participants