Add support for adopting orphaned task instances #15
Addresses this issue: #14
Adds optional support for "adopting orphaned task instances" within the Batch and Fargate executors. With this enabled, the executors leave Batch jobs / Fargate tasks running when the scheduler / executor shuts down, instead of terminating them. When a new scheduler / executor boots up, it tries to "adopt" the orphaned tasks by using the `external_executor_id` of each orphaned task instance to resume synchronising task statuses from Batch / Fargate.

Airflow documentation on this behaviour is limited, though there is some basic context in the scheduler "tunables" doc:
https://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#scheduler-tuneables
This feature is disabled by default, but can be enabled by setting the following conf option in either executor:
or by env var:
In order to support adoption of orphaned tasks, the BatchExecutor just needs to store the AWS Batch `job_id` in the `TaskInstance.external_executor_id` field when it submits a job, and then implement the `BaseExecutor.try_adopt_task_instances` method. This method simply needs to pass each orphaned task instance's key and `external_executor_id` to the `active_workers.add_job` method of the newly booted executor.

The Fargate executor can support task adoption with the same flow, storing the Fargate `task_arn` in the `external_executor_id` field. It also needs to call `describe_tasks()` in the `try_adopt_task_instances` method (using the orphaned task ARNs) to get the full Fargate task attributes required in its `active_workers` collection.
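
The adoption flow described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `FakeTaskInstance`, `ActiveWorkers`, `SketchBatchExecutor`, `FakeEcsClient`, `adopt_fargate_tasks`, and the `adopt_task_instances` flag are hypothetical stand-ins so the example runs without Airflow or boto3. It assumes, as in Airflow's `BaseExecutor`, that `try_adopt_task_instances` returns the task instances that could not be adopted, and that the ECS `describe_tasks` response carries `tasks` and `failures` lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Stand-in for Airflow's task instance key (dag_id, task_id, run_id, try_number).
TaskKey = Tuple[str, str, str, int]


@dataclass
class FakeTaskInstance:
    """Hypothetical stand-in for airflow.models.TaskInstance."""
    key: TaskKey
    external_executor_id: Optional[str] = None


class ActiveWorkers:
    """Hypothetical collection mapping a Batch job_id to a task instance key."""

    def __init__(self) -> None:
        self.jobs: Dict[str, TaskKey] = {}

    def add_job(self, job_id: str, airflow_task_key: TaskKey) -> None:
        self.jobs[job_id] = airflow_task_key


class SketchBatchExecutor:
    """Minimal Batch-style executor showing only the adoption path."""

    def __init__(self, adopt_task_instances: bool = False) -> None:
        # Assumed name for the opt-in flag; adoption is off by default.
        self.adopt_task_instances = adopt_task_instances
        self.active_workers = ActiveWorkers()

    def try_adopt_task_instances(
        self, tis: Sequence[FakeTaskInstance]
    ) -> List[FakeTaskInstance]:
        """Adopt orphans that carry a job_id; return the ones left unadopted."""
        if not self.adopt_task_instances:
            return list(tis)
        not_adopted = []
        for ti in tis:
            if ti.external_executor_id:
                # Resume tracking the still-running Batch job under its job_id.
                self.active_workers.add_job(ti.external_executor_id, ti.key)
            else:
                not_adopted.append(ti)
        return not_adopted


class FakeEcsClient:
    """Hypothetical stub mimicking the shape of an ECS describe_tasks reply."""

    def __init__(self, known_arns: Sequence[str]) -> None:
        self.known = set(known_arns)

    def describe_tasks(self, cluster: str, tasks: List[str]) -> dict:
        found = [{"taskArn": a, "lastStatus": "RUNNING"} for a in tasks if a in self.known]
        missing = [{"arn": a, "reason": "MISSING"} for a in tasks if a not in self.known]
        return {"tasks": found, "failures": missing}


def adopt_fargate_tasks(
    ecs: FakeEcsClient,
    cluster: str,
    tis: Sequence[FakeTaskInstance],
    active_tasks: Dict[str, Tuple[TaskKey, dict]],
) -> List[FakeTaskInstance]:
    """Fargate variant: look up full task attributes before adopting."""
    by_arn = {ti.external_executor_id: ti for ti in tis if ti.external_executor_id}
    not_adopted = [ti for ti in tis if not ti.external_executor_id]
    if by_arn:
        response = ecs.describe_tasks(cluster=cluster, tasks=list(by_arn))
        for task in response["tasks"]:
            ti = by_arn.pop(task["taskArn"])
            active_tasks[task["taskArn"]] = (ti.key, task)
    # Any ARN the service did not return stays unadopted.
    not_adopted.extend(by_arn.values())
    return not_adopted
```

In this sketch, a task instance without an `external_executor_id`, or one whose ARN is no longer known to the service, is handed back to the caller, so the scheduler can reset it rather than wait on a job that is not being tracked.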