
Compare Various Workflow Automation Solutions for potential integration into Nebari #1098

Closed
Adam-D-Lewis opened this issue Feb 22, 2022 · 13 comments
Labels
type: enhancement 💅🏼 New feature or request

Comments

@Adam-D-Lewis
Member

Feature description

Currently, long-running computations require a browser window to be kept open for the duration of the computation. Some prototype work has been done to enable 'background' and 'batch' processes to be run on QHub.

Value and/or benefit

This feature would build on that prototype work and make background/batch execution easily accessible to scientists and engineers using the platform.

Anything else?

No response

@Adam-D-Lewis Adam-D-Lewis added the type: enhancement 💅🏼 New feature or request label Feb 22, 2022
@trallard trallard moved this to Needs Triage 🔍 in QHub Project Mangement 🚀 Feb 25, 2022
@trallard trallard moved this from Needs Triage 🔍 to Backlog 🏁 in QHub Project Mangement 🚀 Feb 25, 2022
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Mar 22, 2022

There are a few possible options I've considered: jupyterhub-ssh, kbatch, or just letting whatever solution we come up with in #1100 and/or #1099 handle this as well.

Kbatch and jupyterhub-ssh could both potentially solve this. Kbatch has the advantage that it can run on custom Docker images, giving users access to dependencies that aren't available through conda, but the user does not have access to all of the Jupyter user's files. Instead, kbatch only allows you to pass in a single file or directory, up to 1 MiB in size (it uses a ConfigMap under the hood).

Jupyterhub-ssh is simpler, works with no additional dependencies, and allows access to all of the Jupyter user's files by default, so it seems preferable to me. Both kbatch and jupyterhub-ssh can give users access to the conda environments in conda-store, and both could run notebooks via papermill. Neither option currently lets the user choose which instance size they run on, and it's not clear to me whether users could still use dask-gateway with kbatch (maybe a permissions issue?) since kbatch runs the job as a separate pod. It's also not clear to me whether the jupyterhub-ssh sessions would be closed after the job finished, so that may be something to look into still.

How jupyterhub-ssh would work currently

  • Create a JupyterHub API token
  • Open a terminal (even possible from within QHub's JupyterLab session)
  • Run ssh -o User=<username> <qhub-url> -p 8022 and enter the token as the password
  • Run nohup <my-command> &
    • This logs stdout to nohup.out, and the job will continue to run even if the ssh terminal is closed (a full session sketch follows below).
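
Putting those steps together, a minimal session might look like the following (the username, URL, environment name, and script are placeholders):

    # from a local terminal, or a terminal inside QHub's JupyterLab
    ssh -o User=jdoe qhub.example.com -p 8022   # paste the JupyterHub API token when prompted for a password
    nohup conda run -n my-env python train.py > train.log 2>&1 &
    exit                                        # the job keeps running after the ssh session is closed

Redirecting stdout/stderr explicitly, as shown, avoids relying on nohup's default nohup.out file.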

The above isn't too complex, but it might be nice to wrap it in a thin CLI tool similar to kbatch's CLI. I'm not particular about the name, but let's call it "qrunner" for the sake of this example. The user could pip install qrunner, then do something like

  • qrunner configure --url="<qhub-url>" --token="<JUPYTERHUB_TOKEN>"
  • qrunner run <my-command> --output my-command-output.log --conda-env my-env
    which simply runs the command prefaced by nohup conda run -n my-env and directs the stdout to my-command-output.log (see the sketch below).
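
Under the hood, the qrunner run step would boil down to something like this over the jupyterhub-ssh connection (qrunner itself is hypothetical, so this is only a sketch of the intended behavior):

    # run the user's command remotely, detached from the ssh session
    ssh -o User=<username> <qhub-url> -p 8022 \
      "nohup conda run -n my-env <my-command> > my-command-output.log 2>&1 &"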

How kbatch would work currently

  • Create a JupyterHub API token
  • Configure kbatch
    • kbatch configure --kbatch-url="https://url-to-kbatch-server" --token="<JUPYTERHUB_TOKEN>"
  • Do either step 1 or 2 below:
    1. go to CLI and write something like
    kbatch job submit --name=test \
      --image="<conda-store-docker-image-url>" \
      --command='["papermill", "notebook.ipynb"]' \
      --file=notebook.ipynb
    2a. write a yaml job file
     # my-job.yaml
     name: "my-job"
     command:
       - sh
       - script.sh
     image: <conda-store-docker-image-url>
     code: "script.sh"
    
    2b. then submit it
    kbatch job submit -f my-job.yaml
  • View status/logs as needed
    • kbatch job show "<job-id>"
    • kbatch job logs "<pod-id>" (you have to get the pod id first)

Ideal Solution Attributes

Regardless of what solution we use, I believe the ideal solution would have the following attributes:

  • Can be used to run long-running notebooks or .py files without a window needing to stay open
  • Integrates with Keycloak permissioning
  • Integrates with conda-store
  • Allows picking pod sizes from those specified in the options form page
  • User has access to logs/results
  • Very simple to use
  • Easy to implement/maintain

@Adam-D-Lewis
Member Author

Yason

  • Last non-trivial commit was May 31, 2019
  • Looks similar to Jupyterflow, but Yason only runs notebooks, so I think we'd prefer Jupyterflow over Yason.

@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Mar 29, 2022

Kedro

  • Very popular, 6,800 stars on GitHub
  • Active Discord community
  • Wide integration
    • Integration with Grafana
    • Can deploy with Argo, Prefect, Kubeflow, AWS Batch, Databricks, Airflow, and Spark
  • Came out of McKinsey's QuantumBlack Labs
  • Used for data pipelines
    • Focused on data science, not necessarily general-purpose workflows

Kedro Features

  • Manages data sources (somewhat similar to Intake)
  • Workflow definitions are overly complex for simple DAGs
  • Wrapper over an environment manager
    • kedro install
  • Kind of like MLflow
  • Can run pipeline nodes in containers (kedro-docker)

It seems like Kedro could technically act as a workflow manager, but it's very focused on data science use cases, and using it as a general-purpose workflow engine would likely require us to shoehorn our needs into its existing structure, leading to a bad user experience. I'd see Kedro as useful during data science projects, but not as a general workflow manager.

@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Mar 29, 2022

Jupyterflow

  • CLI tool that will launch an environment similar to the jupyter user pod via an Argo workflow
  • Can specify simple dependencies (a bit clunky, but works)
  • Seems stagnant for a year (last commit Mar 1, 2021)
  • Can schedule workflows via cron syntax in the workflow file
  • Has options to override cpu, memory, nodeSelector, etc.
  • Uses the same image as the jupyter user by default, so we'd need to override either:
    1. the volume mount (also include the conda envs mount) (actually this might be okay by default, I'd need to check)
    2. the container image (use the conda-store image)
  • Some work required to expose the Argo Workflows web UI?
  • No reporting except what you get by default with Argo Workflows

Jupyterflow Example Usage

# workflow.yaml
jobs:
- conda run -n myenv papermill input.ipynb              # 1
- conda run -n myenv python train.py softmax 0.5        # 2
- conda run -n myenv python train.py softmax 0.9        # 3
- conda run -n myenv python train.py relu 0.5           # 4
- conda run -n myenv python train.py relu 0.9           # 5
- conda run -n myenv python output.py                   # 6

# Job index starts at 1.
dags:
- 1 >> 2
- 1 >> 3
- 1 >> 4
- 1 >> 5
- 2 >> 6
- 3 >> 6
- 4 >> 6
- 5 >> 6

then jupyterflow run -f workflow.yaml
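
Since the feature list above mentions cron scheduling and resource overrides, the workflow file can presumably also carry keys along these lines; the exact key names here are my assumption and would need to be checked against Jupyterflow's docs:

# workflow.yaml (assumed additional keys)
schedule: "0 2 * * *"    # cron syntax; run the workflow daily at 02:00
resources:               # per-workflow cpu/memory/nodeSelector overrides
  cpu: 2
  memory: 4Gi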

I like Jupyterflow for its simplicity. It seems to make some reasonable assumptions (image to use, volumes to mount) which make it easy for users not familiar with Kubernetes to define and run workflows. We could likely add functionality to either launch in the conda-store image by default or use conda run -n without the user needing to specify it. We could also add the ability to transfer env vars over to the workflow by default. It also supports scheduling of workflows (cron). However, more complex workflows may require a different tool. I'm also not familiar with what the reporting capabilities of Argo Workflows look like, which is the only reporting/observability solution (by default) for this.

Perhaps creating some way to make similar assumptions, but using a more fully featured tool, could also be an option if preferred over Jupyterflow.

@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Mar 29, 2022

Hera / Couler

  • These are unrelated but similar projects that each act as a Python SDK for Argo Workflows.
  • They both seem a bit young, without great documentation, so less experienced users may have a tough time with them.
  • I'd like to look more into these, though, as I think they could potentially fit our use case.

@iameskild
Member

Argo Workflows

I played around with Argo Workflows today and got a few sample workflows to run using the argo CLI. This was fairly trivial once you have a Kubernetes cluster up and running (I was doing so on QHub deployed on Minikube).

Working with Argo Workflows requires an argo-server running on the cluster (installed via a kubectl apply command), and then to interact with it, you'll need the aforementioned argo CLI. Argo does seem to have an argo-helm repo which might be useful if/when we want to integrate it into QHub.

From skimming the docs for many of the tools listed above, it seems like many of them either require or will play nicely with Argo Workflows.

The gap that exists with Argo Workflows is how to enable users to launch these workflows from JupyterLab. Yason or Jupyterflow might be possible solutions. My main concern with these two tools is that they both seem to be maintained by individuals.

In the same vein as Hera, Argo Workflows seems to have an existing Python SDK.
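
For reference, a minimal end-to-end run, adapted from Argo's own hello-world sample, looks roughly like this (the release version in the install URL is a placeholder):

    # install the workflow controller and argo-server into the cluster
    kubectl create namespace argo
    kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/<version>/install.yaml

    # hello-world.yaml -- Argo's canonical sample workflow
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: hello-world-
    spec:
      entrypoint: whalesay
      templates:
        - name: whalesay
          container:
            image: docker/whalesay
            command: [cowsay]
            args: ["hello world"]

    # submit and inspect with the argo CLI
    argo submit -n argo --watch hello-world.yaml
    argo list -n argo
    argo logs -n argo @latest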

@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Mar 31, 2022

I'm curious to learn more about the visualizations/reporting in Argo Workflows. I'm also not clear on how authentication/authorization would work. Maybe we don't need to worry about authentication/authorization just yet though.

@dharhas
Member

dharhas commented Mar 31, 2022

So it sounds like Argo is a strong contender for the base layer of our integrated workflow solution, and then on top of it we could potentially have multiple client-side tools leveraging it.

  • something simple like reviving/forking Jupyterflow and connecting it to conda-store for environments
  • papermill
  • other tools like those mentioned above

@trallard

@trallard
Member

trallard commented Apr 2, 2022

Argo is a really versatile orchestration engine - not only does it integrate well with other pipeline/ML tools, but it also opens up loads of possibilities for CI-driven ML workflows. I think it is a good bet in terms of flexibility and extensibility for QHub and its users.

@Adam-D-Lewis Adam-D-Lewis changed the title [ENH] - Enable long running computations without requiring a browser window to be kept open Compare Various Workflow Automation Solutions for potential integration into QHub Apr 6, 2022
@trallard
Member

@dharhas @Adam-D-Lewis are we planning to explore more options?

@dharhas
Member

dharhas commented Sep 22, 2022

Well, I don't think we need to explore more options per se, but the current integrations are not fully complete, i.e.:

  1. kbatch - this is integrated and works, but it requires specifying a Docker image and does not have the user volumes mounted, so it isn't very straightforward to use
  2. argo workflows - this is integrated on the backend, but we do not yet understand the best way to use it from user space (i.e. Python or Jupyter)

The above should probably be opened as new issues and this one can be closed.

@iameskild iameskild changed the title Compare Various Workflow Automation Solutions for potential integration into QHub Compare Various Workflow Automation Solutions for potential integration into Nebari May 29, 2023
@iameskild
Member

Argo-Workflows has been integrated. This can be closed 🎉

@github-project-automation github-project-automation bot moved this from Backlog 🏁 to Done 💪🏾 in QHub Project Mangement 🚀 May 29, 2023
@github-project-automation github-project-automation bot moved this from New 📬 to Done 💪🏾 in 🪴 Nebari Project Management May 29, 2023