Deploy Airflow on Cloud Foundry using pip #4434

Closed
3 of 7 tasks
robert-bryson opened this issue Aug 25, 2023 · 5 comments


robert-bryson commented Aug 25, 2023

User Story

In order to use Airflow, the datagov team wants to ensure that it can be deployed on the existing Cloud Foundry infrastructure.

Acceptance Criteria


  • GIVEN an Airflow repo
    WHEN a cf push happens
    THEN a successful deployment is made

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • Attempt a simple deployment to cloud.gov, using a PostgreSQL instance
  • Attempt to configure three apps in cloud.gov: a UI, a timer, and a worker (which may have 2 instances)
  • Move code to the new https://github.com/GSA/datagov-harvester repo
  • Configure the Celery queue backend
  • Confirm a DAG is able to complete using the above infrastructure
  • Confirm various features (debug, logs, workflow view, etc.) are working and useful

If any of the above takes longer than 3 days to complete, document blockers and status and move on where we can.

Potential additional work:

  • Wire up GitHub Actions
  • Experiment with Dockerized deploys
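The Sketch above calls for a PostgreSQL instance and a Celery queue backend. On Cloud Foundry, bound services are exposed to each app as JSON in the VCAP_SERVICES environment variable, and Airflow reads any config option from an environment variable named AIRFLOW__{SECTION}__{KEY}. Below is a minimal sketch of bridging the two, assuming Airflow 2.3+ (where sql_alchemy_conn lives in the [database] section) and assuming hypothetical service labels aws-rds and aws-elasticache-redis whose credentials include a uri field. The labels, keys, and the airflow_env.py file name are illustrative only; check them against the actual bindings before relying on this.

```python
import json
import os
import sys


def _first_credentials(vcap, label):
    """Return the credentials of the first bound service with this label, or None."""
    instances = vcap.get(label) or []
    return instances[0].get("credentials") if instances else None


def airflow_env_from_vcap(vcap_json):
    """Map bound cloud.gov services to Airflow config environment variables.

    ASSUMPTION: a PostgreSQL service labelled "aws-rds" and a Redis service
    labelled "aws-elasticache-redis", each exposing a "uri" credential.
    cloud.gov Redis may require TLS (rediss://) broker options; not handled here.
    """
    vcap = json.loads(vcap_json)
    env = {}

    db = _first_credentials(vcap, "aws-rds")
    if db:
        uri = db["uri"]
        # SQLAlchemy 1.4+ no longer accepts the "postgres://" scheme alias.
        if uri.startswith("postgres://"):
            uri = "postgresql://" + uri[len("postgres://"):]
        env["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"] = uri

    broker = _first_credentials(vcap, "aws-elasticache-redis")
    if broker:
        env["AIRFLOW__CORE__EXECUTOR"] = "CeleryExecutor"
        env["AIRFLOW__CELERY__BROKER_URL"] = broker["uri"]
        if db:
            # Store Celery task results in the metadata DB ("db+" = SQLAlchemy backend).
            env["AIRFLOW__CELERY__RESULT_BACKEND"] = (
                "db+" + env["AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"]
            )
    return env


if __name__ == "__main__":
    # Usage: python airflow_env.py <command...>, e.g. "airflow scheduler".
    if len(sys.argv) < 2:
        sys.exit("usage: python airflow_env.py <command> [args...]")
    os.environ.update(airflow_env_from_vcap(os.environ.get("VCAP_SERVICES", "{}")))
    # Replace this process with the Airflow command, inheriting the new environment.
    os.execvp(sys.argv[1], sys.argv[1:])
```

Each of the three apps could then use the same wrapper in its start command, for example python airflow_env.py airflow celery worker for the worker. The credentials are placed in the child process's environment rather than printed, so they stay out of the app logs.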
robert-bryson commented

[Screenshot]

!!!

robert-bryson commented

Where we are on this:

  • Docker and pip deploys both work if needed, but the Python buildpack is probably the best option.
  • Airflow standalone can be deployed with manifest-standalone.yml and is suitable for development/testing.
  • The Airflow multi deployment is mostly there, but a number of issues remain: setting up Celery/Flower, a logging issue, figuring out auth, etc.

robert-bryson commented

I think this is the first time I can get everything up via a cf push:

[Screenshot]

And it's even mostly healthy:

[Screenshot]

However, the CeleryExecutor isn't wired up correctly yet:

[Screenshot]

robert-bryson commented

Well, it ain't much, but I did finally get the dag-processor to show healthy:

[Screenshot]
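Checking which components are healthy, as in the comments above, can also be scripted instead of read off the UI. A minimal sketch, assuming the JSON shape returned by the webserver's /health endpoint in Airflow 2.x: one object per component with a status field. Which components appear (for example triggerer or dag_processor) depends on the Airflow version and configuration, and the webserver URL is a placeholder passed on the command line.

```python
import json
import sys
import urllib.request


def unhealthy_components(health):
    """Return sorted names of components whose status is not "healthy".

    `health` is the parsed /health JSON (assumed shape:
    {"scheduler": {"status": "healthy", ...}, ...}).
    A missing or null status counts as unhealthy.
    """
    return sorted(
        name
        for name, info in health.items()
        if not isinstance(info, dict) or info.get("status") != "healthy"
    )


def fetch_health(base_url, timeout=10):
    """GET <base_url>/health and parse the JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Usage: python check_health.py https://<webserver-route>
    bad = unhealthy_components(fetch_health(sys.argv[1]))
    print(("unhealthy: " + ", ".join(bad)) if bad else "all components healthy")
    sys.exit(1 if bad else 0)
```

Exiting non-zero when any component is unhealthy would let the same check run from a CI job or a cf run-task, though that wiring is not shown here.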


robert-bryson commented Sep 26, 2023

This spike is wildly overdue, so I will close it in its current state and write up additional tickets for the remaining issues with the deployment. Overall, the proof of concept for deploying Airflow on cloud.gov was successful, though not without challenges. Since we may explore other deployment options (k8s, RabbitMQ, etc.), I think this work should be paused.

Done:

  • Airflow standalone on cf
  • Airflow standalone docker on cf
  • Airflow multi-tenant deployment on cf
    • Working webserver/metastore/scheduler/dag processor/etc

Remains:

github-project-automation bot moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Sep 26, 2023
hkdctol added the H2.0/Harvest-General (General Harvesting 2.0 Issues) label Sep 29, 2023
hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Sep 29, 2023
btylerburton added the H2.0/orchestrator label and removed the H2.0/Harvest-General (General Harvesting 2.0 Issues) label Dec 13, 2023