Add support for adopting orphaned task instances #15

Open: mattellis wants to merge 4 commits into base: master

mattellis commented May 12, 2021

Addresses this issue: #14

Adds optional support for "adopting orphaned task instances" in the Batch and Fargate executors. With adoption enabled, the executors no longer terminate Batch jobs / Fargate tasks when the scheduler / executor shuts down; the tasks are left running instead. When a new scheduler / executor boots up, it tries to "adopt" the orphaned tasks, using the external_executor_id of each orphaned task instance to resume synchronising task statuses from Batch / Fargate.

Airflow documentation on this behaviour is limited, though there is some basic context in the scheduler "tunables" doc:
https://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#scheduler-tuneables

This feature is disabled by default, but can be enabled by setting the following conf option in the section for the relevant executor:

[batch | ecs_fargate]
...
adopt_task_instances = True

or by env var:

AIRFLOW__BATCH__ADOPT_TASK_INSTANCES=True
AIRFLOW__ECS_FARGATE__ADOPT_TASK_INSTANCES=True
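
As a rough illustration of how the two settings relate, the sketch below resolves the flag from an environment variable first and a config section second, following Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention. It uses only the standard library; the function name `adopt_enabled` and its parameters are hypothetical and not part of this PR, and the real executors would presumably read the option through Airflow's own configuration object instead.

```python
import configparser
import os


def adopt_enabled(section, cfg_text="", environ=None):
    """Return True if adopt_task_instances is switched on for `section`.

    Hypothetical helper: the env var takes precedence over the config text,
    and the option defaults to False when it is set in neither place.
    """
    environ = os.environ if environ is None else environ
    env_key = f"AIRFLOW__{section.upper()}__ADOPT_TASK_INSTANCES"
    if env_key in environ:
        raw = environ[env_key]
    else:
        parser = configparser.ConfigParser()
        parser.read_string(cfg_text)
        raw = parser.get(section, "adopt_task_instances", fallback="False")
    # Assumed truthy spellings for this sketch only.
    return raw.strip().lower() in ("t", "true", "1")
```

For example, `adopt_enabled("batch", "[batch]\nadopt_task_instances = True\n", {})` evaluates to `True`, while a section that never mentions the option evaluates to `False`.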

To support adoption of orphaned tasks, the BatchExecutor only needs to store the AWS Batch job_id in the TaskInstance.external_executor_id field when it submits a job, and implement the BaseExecutor.try_adopt_task_instances method. This method passes each orphaned task instance's key and external_executor_id to the active_workers.add_job method of the newly booted executor.
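
The Batch flow described above can be sketched without Airflow or AWS installed. Everything here is a stand-in: `OrphanedTask`, `ActiveWorkers`, `SketchBatchExecutor` and the `add_job(job_id, key)` argument order are assumptions made for illustration, not the classes or signatures in this PR. The one detail taken from Airflow itself is that `try_adopt_task_instances` returns the task instances it could not adopt.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Stand-in for Airflow's TaskInstanceKey (assumption for this sketch).
TaskKey = Tuple[str, str, str, int]


@dataclass
class OrphanedTask:
    """Minimal stand-in for an orphaned TaskInstance."""
    key: TaskKey
    external_executor_id: Optional[str]  # AWS Batch job_id stored at submit time


@dataclass
class ActiveWorkers:
    """Stand-in for the executor's collection of tracked Batch jobs."""
    jobs: Dict[str, TaskKey] = field(default_factory=dict)

    def add_job(self, job_id: str, key: TaskKey) -> None:
        self.jobs[job_id] = key


class SketchBatchExecutor:
    def __init__(self, adopt_task_instances: bool = False) -> None:
        self.adopt_task_instances = adopt_task_instances
        self.active_workers = ActiveWorkers()

    def try_adopt_task_instances(self, tis: List[OrphanedTask]) -> List[OrphanedTask]:
        """Track orphans that carry a job id; return the ones not adopted."""
        if not self.adopt_task_instances:
            return list(tis)
        not_adopted = []
        for ti in tis:
            if ti.external_executor_id:
                # Resume status syncing for the still-running Batch job.
                self.active_workers.add_job(ti.external_executor_id, ti.key)
            else:
                not_adopted.append(ti)
        return not_adopted
```

A task instance with no external_executor_id cannot be matched to a Batch job, so the sketch hands it back to the caller rather than guessing.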

The Fargate executor supports task adoption with the same flow, storing the Fargate task_arn in external_executor_id. Its try_adopt_task_instances method additionally calls describe_tasks() with the orphaned task ARNs to fetch the full Fargate task attributes required by its active_workers collection.
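
A minimal sketch of the Fargate variant follows, assuming `ecs` is a boto3 ECS client (or anything exposing the same `describe_tasks(cluster=..., tasks=[...])` call). The function name `adopt_fargate_tasks`, the dict-based `active_workers` and the shape of `orphans` are hypothetical simplifications, not the PR's actual code. ARNs are sent in chunks because ECS accepts at most 100 tasks per `describe_tasks` request.

```python
from typing import Any, Dict, List


def adopt_fargate_tasks(
    ecs: Any,
    cluster: str,
    orphans: Dict[str, Any],
    active_workers: Dict[str, Any],
    chunk_size: int = 100,
) -> List[Any]:
    """Adopt orphaned Fargate tasks; return the keys that were not adopted.

    `orphans` maps task ARN (the stored external_executor_id) to task key.
    `active_workers` is a plain dict standing in for the executor's collection.
    """
    arns = list(orphans)
    found = set()
    for start in range(0, len(arns), chunk_size):
        chunk = arns[start:start + chunk_size]
        response = ecs.describe_tasks(cluster=cluster, tasks=chunk)
        for task in response.get("tasks", []):
            arn = task["taskArn"]
            if arn in orphans:
                # Keep the full task description alongside the Airflow key.
                active_workers[arn] = (orphans[arn], task)
                found.add(arn)
    # ARNs that ECS did not describe (e.g. long-expired tasks) stay orphaned.
    return [orphans[arn] for arn in arns if arn not in found]
```

Returning the unmatched keys mirrors the Batch case: anything ECS no longer knows about is left for the scheduler to handle.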
