support loading DAG definitions from S3 buckets #249
Comments
Hey guys, until we have this as a native solution, I created a sidecar container for syncing DAGs from AWS S3.
Hi @thesuperzapper, I have started working on this, with an implementation similar to the git-based DAG syncing you mentioned. My approach is to run rclone sync as a Kubernetes Job that fetches the DAGs from the S3 bucket and stores them in a mounted volume; the same volume is also mounted into the Airflow scheduler pod. Should I continue implementing this? Best regards,
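As a rough illustration of the rclone-based approach described above, the command such a Job would run could be wrapped as in the sketch below. It relies only on the `rclone sync SOURCE DEST` form of the CLI; the function names, the remote name `s3`, the bucket `my-dags-bucket`, and the destination path are hypothetical, and it assumes an rclone remote has already been configured in the container.

```python
import subprocess


def build_rclone_command(remote: str, bucket: str, prefix: str, dest: str) -> list[str]:
    """Build an `rclone sync` command that mirrors bucket/prefix into dest."""
    # rclone addresses a bucket path as "<remote>:<bucket>/<path>".
    source = f"{remote}:{bucket}/{prefix.strip('/')}".rstrip("/")
    return ["rclone", "sync", source, dest]


def run_sync(remote: str, bucket: str, prefix: str, dest: str) -> None:
    """Run the sync once; raises CalledProcessError if rclone exits non-zero."""
    subprocess.run(build_rclone_command(remote, bucket, prefix, dest), check=True)


# Hypothetical usage inside the Job's container (values are assumptions):
# run_sync("s3", "my-dags-bucket", "dags", "/opt/airflow/dags")
```

Keeping the command construction separate from the call makes it easy to check the exact command line before running it against a real bucket.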
I have a better solution, but you have to configure a PVC for the DAG bag folder /opt/airflow/dags. Once the PVC is ready, you just need to create a CronJob that runs every X minutes and syncs two ways with S3.
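To make the CronJob idea more concrete, here is a minimal sketch of what a single sync run could do. It is a simplification, not the commenter's actual job: it mirrors one way only (S3 into the PVC-backed folder) rather than two ways, and every function name is hypothetical. The download step assumes boto3 is installed and AWS credentials are available in the pod.

```python
from pathlib import Path


def plan_sync(remote: dict[str, tuple[int, float]],
              local: dict[str, tuple[int, float]]) -> tuple[list[str], list[str]]:
    """Return (paths to download, local paths to delete) for a one-way mirror.

    Both arguments map a relative path to (size in bytes, modification timestamp).
    """
    to_download = sorted(
        path for path, (size, mtime) in remote.items()
        if path not in local or local[path][0] != size or local[path][1] < mtime
    )
    to_delete = sorted(path for path in local if path not in remote)
    return to_download, to_delete


def scan_local(dest: Path) -> dict[str, tuple[int, float]]:
    """Describe every file under dest as relative path -> (size, mtime)."""
    result = {}
    for p in dest.rglob("*"):
        if p.is_file():
            st = p.stat()
            result[p.relative_to(dest).as_posix()] = (st.st_size, st.st_mtime)
    return result


def sync_once(bucket: str, prefix: str, dest: Path) -> None:
    """Mirror s3://bucket/prefix into dest (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the planning logic has no dependencies

    s3 = boto3.client("s3")
    remote, keys = {}, {}
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)
    for page in pages:
        for obj in page.get("Contents", []):
            rel = obj["Key"][len(prefix):].lstrip("/")
            if rel and not rel.endswith("/"):  # skip the prefix itself and folder markers
                remote[rel] = (obj["Size"], obj["LastModified"].timestamp())
                keys[rel] = obj["Key"]
    to_download, to_delete = plan_sync(remote, scan_local(dest))
    for rel in to_download:
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(bucket, keys[rel], str(target))
    for rel in to_delete:
        (dest / rel).unlink()
```

A CronJob container could call something like `sync_once("my-dags-bucket", "dags/", Path("/opt/airflow/dags"))` on each run, where the bucket name and prefix are placeholders.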
Not a bad idea. I'd also add that if you want the GitOps approach, you can disable the schedule via
I just want to say that while baked-in support for Now you can effectively do what was proposed in #828, by using the following values:
If someone wants to share their values and report how well it works, I am sure that would help others. PS: You can still use a PVC-based approach, where you have a Deployment (or CronJob) that syncs your S3 bucket into that PVC, as described in #249 (comment).
Hi, I'm using KubernetesExecutor and my extra container gets stuck and doesn't let the executor pod finish. Any tips on what to do?
https://janetvn.medium.com/s3-sync-sidecar-to-continuously-deploy-dags-for-airflow-running-on-kubernetes-ab4d417dd8e6 I'm trying to do the same thing by following this link. I'd like to see Airflow support this officially.
Currently we support git-sync with the dags.gitSync.* values, but we can probably do something similar for S3 buckets. That is, let people store their dags in a folder on an S3 bucket.

Possibly we should generalise this to include GCS and ABS, but these probably have different libraries needed to do the sync (so might need to be separate features/containers). However, clearly S3 is the best place to start, as it's the most popular.