Data Science Pipelines is the Open Data Hub's pipeline solution for data scientists. It is built on top of the upstream Kubeflow Pipelines and kfp-tekton projects. The Open Data Hub community maintains a fork of this upstream under the Open Data Hub org.
- The cluster needs to be OpenShift 4.9 or higher
- OpenShift Pipelines 1.7.2 or higher needs to be installed on the cluster
- The Open Data Hub operator needs to be installed
- The default installation namespace for Data Science Pipelines is `odh-applications`. This namespace will need to be created (see the example command below). If you wish to install in a custom location, create that namespace instead and update the kfdef as documented below.
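For example, assuming you are installing into the default location and have cluster access via `oc`, the namespace can be created ahead of time:

```bash
# Create the default installation namespace for Data Science Pipelines.
# Skip this step if the namespace already exists, or substitute your own
# namespace name if you plan to install into a custom location.
oc create namespace odh-applications
```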
- Ensure that the prerequisites are met.
- Apply the kfdef at `kfctl_openshift_ds-pipelines.yaml`. You may need to update the `namespace` field under `metadata` if you want to deploy in a namespace other than `odh-applications` (see the sketch after this list).
- To find the url for Data Science Pipelines, you can run `oc get route -n <kfdef_namespace> ds-pipeline-ui -o jsonpath='{.spec.host}'`. The value of `<kfdef_namespace>` should match the `namespace` field of the kfdef that you applied.
- Alternatively, you can access the route via the console. To do so:
  - Go to `<kfdef_namespace>` in the OpenShift console.
  - Click on `Networking` in the sidebar on the left side.
  - Click on `Routes`. It will take you to a new page in the console.
  - Click the url under the `Location` column for the row item matching `ds-pipeline-ui`.
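As a rough sketch of the installation flow, assuming a custom namespace named `my-dsp` (the name is only an example; substitute your own), the steps might look like this:

```bash
# Create the target namespace if it does not already exist
oc create namespace my-dsp

# Edit kfctl_openshift_ds-pipelines.yaml so that metadata.namespace is "my-dsp",
# then apply the kfdef
oc apply -f kfctl_openshift_ds-pipelines.yaml

# Once the deployment settles, look up the Data Science Pipelines UI route
oc get route -n my-dsp ds-pipeline-ui -o jsonpath='{.spec.host}'
```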
The `base` directory contains artifacts for deploying all backend components of Data Science Pipelines. This deployment currently includes the kfp-tekton backend as well as a Minio deployment to act as an object store. The Minio deployment will be moved to an overlay at some point in the near future. The following overlays are available:
- metadata-store-mariadb: This overlay contains artifacts for deploying a MariaDB database. MySQL-based databases are currently the only supported backend for Data Science Pipelines, so if you don't have an existing MySQL database deployed, this overlay can be applied to satisfy the requirement.
- metadata-store-mysql: This overlay contains artifacts for deploying a MySQL database. MySQL-based databases are currently the only supported backend for Data Science Pipelines, so if you don't have an existing MySQL database deployed, this overlay can be applied to satisfy the requirement.
- metadata-store-postgresql: This overlay contains artifacts for deploying a PostgreSQL database. Data Science Pipelines does not currently support PostgreSQL as a backend, so deploying this overlay will not actually modify Data Science Pipelines behaviour.
- ds-pipeline-ui: This overlay contains deployment artifacts for the Data Science Pipelines UI. Deploying Data Science Pipelines without this overlay will result in only the backend artifacts being created.
- object-store-minio: This overlay contains artifacts for deploying Minio as the Object Store to store Pipelines artifacts.
- default-configs: This overlay creates ConfigMaps and Secrets with default values for a deployment with both a local MySQL database and Minio object store. Note: Using this overlay allows for a simple and quick setup, but when used with the ODH Operator it also marks the configs as managed objects; the operator will reconcile (revert) any post-deployment changes made to them, and this cannot be overridden.
- integration-odhdashboard: Adds resources required to integrate the Data Science Pipelines application into the ODH Dashboard UI, such as documentation and application launcher tiles.
- component-mlmd: Adds the ML-Metadata component which provides artifact lineage tracking in the UI.
A separate directory contains the service monitor definition for Data Science Pipelines. It is always deployed by `base`, so it will eventually be moved into the `base` directory itself.
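If the Prometheus Operator CRDs are present on the cluster, a quick (hedged) way to confirm the monitor exists is:

```bash
# List ServiceMonitor objects in the install namespace; the Data Science
# Pipelines monitor should appear here. Adjust the namespace for custom installs.
oc get servicemonitors -n odh-applications
```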
You can customize the Data Science Pipelines deployment by injecting custom parameters to change the default deployment. The following parameters can be used (a quick way to inspect the resulting objects is sketched after this list):
- pipeline_install_configuration: The ConfigMap name that contains the values to install the Data Science Pipelines environment. This parameter defaults to `pipeline-install-config` and you can find an example in the repository.
- ds_pipelines_configuration: The ConfigMap name that contains the values to integrate Data Science Pipelines with the underlying components (Database and Object Store). This parameter defaults to `kfp-tekton-config` and you can find an example in the repository.
- database_secret: The Secret that contains the credentials for the Data Science Pipelines database. It defaults to `mysql-secret` if using the `metadata-store-mysql` overlay, or `postgresql-secret` if using the `metadata-store-postgresql` overlay.
- ds_pipelines_ui_configuration: The ConfigMap that contains the values to customize the UI. It defaults to `ds-pipeline-ui-configmap`.
- It is possible to configure which S3 storage is used by Pipeline Runs. Detailed instructions on how to configure this will be added once Minio is moved to an overlay.
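The parameters above only point at object names; the actual values live in ConfigMaps and Secrets in the install namespace. As a hedged check, assuming the default names and the `odh-applications` namespace, you can inspect the objects the defaults refer to:

```bash
# Inspect the default configuration objects referenced by the parameters above.
# Adjust the namespace (and names) if you customized the deployment.
oc get configmap pipeline-install-config kfp-tekton-config ds-pipeline-ui-configmap -n odh-applications
oc get secret mysql-secret -n odh-applications
```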
Note: These instructions will be updated once Data Science Pipelines has a tile available in odh-dashboard.
- Go to the ds-pipeline-ui route.
- Click on `Pipelines` on the left side.
- There will be a `[Demo] flip-coin` Pipeline already available. Click on it.
- Click on the blue `Create run` button towards the top of the screen.
- You can leave all the fields untouched. If desired, you can create a new experiment to link the pipeline run to, or rename the run itself.
- Click on the blue `Start` button.
- You will be taken to the `Runs` page. You will see a row matching the `Run name` you previously picked. Click on the `Run name` in that row.
- Once the Pipeline is done running, you can see a graph of all the pods that were created as well as the paths that were followed.
- For further verification, you can view all the pods that were created as part of the Pipeline Run in the `<kfdef_namespace>` namespace (see the command below). They will all show up as `Completed`.
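A minimal way to check this from the command line, using the same kfdef namespace as earlier:

```bash
# List the pods created by the pipeline run; after the run finishes they should
# all report a Completed status.
oc get pods -n <kfdef_namespace>
```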
A complete architecture can be found at ODH Data Science Pipelines Architecture and Design. This document will be moved to GitHub once the corresponding ML Ops SIG repos are created.