Jgpuga/mlops (#1167)
* Update model-deployment.yaml.TEMPLATE

Fixed container

* Renamed model deployment pipeline

* Renamed model deployment pipeline

* Added BQ ML pipeline

* Added BQ ML pipeline

* Added BQ ML pipeline

* Update examples/vertex_mlops_enterprise/.github/workflows/deploy.yml.TEMPLATE

Co-authored-by: Sergio Vidiella Pinto <[email protected]>

* Update 01-ENVIRONMENTS.md

* Update pipeline.py

* Apply suggestions from code review

Co-authored-by: Sergio Vidiella Pinto <[email protected]>

---------

Co-authored-by: Andrew Gold <[email protected]>
Co-authored-by: Sergio Vidiella Pinto <[email protected]>
3 people authored Oct 31, 2023
1 parent 47de5bd commit 1ed818b
Showing 22 changed files with 1,221 additions and 148 deletions.
@@ -35,6 +35,27 @@ env:
WORKLOAD_ID_PROVIDER: ${wip}
CLOUDBUILD_LOGS: gs://${project_id}_cloudbuild/logs
jobs:
build-container-bqml:
name: 'Build container CI/CD BigQuery ML'
runs-on: 'ubuntu-latest'
steps:
- uses: 'actions/checkout@v3'
with:
token: $${{ github.token }}

- id: 'auth'
name: 'Authenticate to Google Cloud'
uses: 'google-github-actions/auth@v1'
with:
create_credentials_file: 'true'
workload_identity_provider: $${{ env.WORKLOAD_ID_PROVIDER }}
service_account: $${{ env.SERVICE_ACCOUNT }}
access_token_lifetime: 3600s

- name: 'Build container'
run: |
gcloud builds submit --gcs-log-dir=$${{ env.CLOUDBUILD_LOGS }} --project $${{ env.PROJECT_ID }} --region $${{ env.REGION }} --tag $${{ env.DOCKER_REPO }}/cicd-bqml:latest src/bqml_pipeline/. --timeout=15m --machine-type=e2-highcpu-8 --suppress-logs

build-container-cicd-tfx:
name: 'Build container CI/CD TFX'
runs-on: 'ubuntu-latest'
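A note on the doubled dollar signs such as `$${{ github.token }}`: these workflows are `.TEMPLATE` files, apparently rendered by Terraform's `templatefile` during the environment setup, so `${wip}`-style placeholders are substituted at render time while `$${` escapes to a literal `${`, yielding the `${{ ... }}` expressions GitHub Actions expects.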
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

name: Deploy tfx model
name: Deploy Vertex AI tfx model
on:
workflow_dispatch:

@@ -31,7 +31,7 @@ env:
WORKLOAD_ID_PROVIDER: ${wip}
jobs:
deploy-model:
name: 'Deploy model to endpoint'
name: 'Deploy TFX model to endpoint'
runs-on: 'ubuntu-latest'
steps:
- uses: 'actions/checkout@v3'
@@ -47,6 +47,6 @@ jobs:
service_account: $${{ env.SERVICE_ACCOUNT }}
access_token_lifetime: 3600s

- name: 'Deploy model'
run: gcloud builds submit --no-source --config build/$${{ env.ENVIRONMENT }}/model-deployment.yaml --project $${{ env.PROJECT_ID }} --region $${{ env.REGION }} --machine-type=e2-highcpu-8 --suppress-logs
- name: 'Deploy TFX model'
run: gcloud builds submit --no-source --config build/$${{ env.ENVIRONMENT }}/model-deployment-tfx.yaml --project $${{ env.PROJECT_ID }} --region $${{ env.REGION }} --machine-type=e2-highcpu-8 --suppress-logs
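Note that the Cloud Build config referenced by this step changed from `build/<env>/model-deployment.yaml` to `build/<env>/model-deployment-tfx.yaml`, matching the "Renamed model deployment pipeline" entries in the commit message and presumably keeping the TFX deployment distinct from the new BigQuery ML pipeline.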

1 change: 1 addition & 0 deletions examples/vertex_mlops_enterprise/.gitignore
@@ -4,3 +4,4 @@ terraform.tfstate*
.DS_Store
**/__pycache__/**
venv
**/.ipynb_checkpoints/**
17 changes: 9 additions & 8 deletions examples/vertex_mlops_enterprise/README.md
@@ -7,19 +7,20 @@ allow larger organizations to achieve scale in terms of the number of models.

## Contents of this example

We provide three notebooks to cover the three processes that we typically observe:
We provide three Vertex AI pipeline examples based on different technologies:

1. [01-experimentation.ipynb](01-experimentation.ipynb) covers the development process, where the features, the model and the training process are defined.
1. [02-cicd.ipynb](02-cicd.ipynb) covers the CI/CD process that tests the code produced in the experimentation phase and trains a production-ready model.
1. [03-prediction.ipynb](03-prediction.ipynb) covers the deployment process to make the model available, for example on a Vertex AI Endpoint or through Vertex AI Batch Prediction.
- [KFP pipeline](src/kfp_pipelines/README.md) using Vertex AI custom training
- [KFP pipeline using BigQuery ML](src/bqml_pipeline/README.md)
- [TFX pipeline](src/tfx_pipelines/) using Vertex AI custom training. In this case, a set of notebooks is also provided for the experimentation phase:
1. [experimentation.ipynb](01-experimentation.ipynb) covers the development process, where the features, the model and the training process are defined.
2. [cicd.ipynb](02-cicd.ipynb) covers the CI/CD process that tests the code produced in the experimentation phase and trains a production-ready model.
3. [prediction.ipynb](03-prediction.ipynb) covers the deployment process to make the model available, for example on a Vertex AI Endpoint or through Vertex AI Batch Prediction.

Each of the notebooks provides detailed instructions on the prerequisites for its execution and should be self-explanatory.

Once you have reviewed the notebooks, you can go on with these advanced steps to set up the automated environments and the CI/CD process using Github.
Once you have reviewed the pipelines, you can continue with these advanced steps to set up the automated environments and the CI/CD process using GitHub.

1. [Environments](doc/01-ENVIRONMENTS.md) covers how to automate the environment deployments using Terraform.
1. [GIT Setup](doc/02-GIT_SETUP.md) covers how to configure a GitHub repo to be used for the CI/CD process.
1. [03-prediction.ipynb](doc/03-MLOPS.md) covers how to test the automated MLOps end2end process.
1. [MLOps end2end process](doc/03-MLOPS.md) covers how to test the automated MLOps end2end process.

<!-- CONTRIBUTING -->
## Contributing
@@ -47,109 +47,61 @@ steps:

# Run datasource_utils unit tests.
- name: '$_CICD_IMAGE_URI'
entrypoint: 'pytest'
args: ['src/tests/datasource_utils_tests.py', '-s']
dir: '$_WORKDIR'
env:
- 'PROJECT=$_PROJECT'
- 'BQ_LOCATION=$_BQ_LOCATION'
- 'BQ_DATASET_NAME=$_BQ_DATASET_NAME'
- 'ML_TABLE=$_ML_TABLE'
id: 'Unit Test Datasource Utils'
waitFor: ['Clone Repository']


# Run model unit tests.
- name: '$_CICD_IMAGE_URI'
entrypoint: 'pytest'
args: ['src/tests/model_tests.py', '-s']
dir: '$_WORKDIR'
id: 'Unit Test Model'
entrypoint: 'echo'
args: ['Running unit tests - dummy build']
id: 'Unit Tests'
waitFor: ['Clone Repository']
timeout: 1800s


# Test e2e pipeline using local runner.
- name: '$_CICD_IMAGE_URI'
entrypoint: 'pytest'
args: ['src/tests/pipeline_deployment_tests.py::test_e2e_pipeline', '-s']
dir: '$_WORKDIR'
env:
- 'PROJECT=$_PROJECT'
- 'REGION=$_REGION'
- 'MODEL_DISPLAY_NAME=$_MODEL_DISPLAY_NAME'
- 'VERTEX_DATASET_NAME=$_VERTEX_DATASET_NAME'
- 'GCS_LOCATION=$_TEST_GCS_LOCATION'
- 'TRAIN_LIMIT=$_CI_TRAIN_LIMIT'
- 'TEST_LIMIT=$_CI_TEST_LIMIT'
- 'UPLOAD_MODEL=$_CI_UPLOAD_MODEL'
- 'ACCURACY_THRESHOLD=$_CI_ACCURACY_THRESHOLD'
id: 'Local Test E2E Pipeline'
waitFor: ['Clone Repository']
timeout: 1800s

# Compile the pipeline.
- name: '$_CICD_IMAGE_URI'
entrypoint: 'python'
args: ['build/utils.py',
'--mode', 'compile-pipeline',
'--pipeline-name', '$_PIPELINE_NAME'
]
dir: '$_WORKDIR'
args: ['pipeline.py', '--compile-only']
dir: '$_WORKDIR/src/bqml_pipeline/src/'
env:
- 'PROJECT=$_PROJECT'
- 'PROJECT_ID=$_PROJECT'
- 'REGION=$_REGION'
- 'MODEL_DISPLAY_NAME=$_MODEL_DISPLAY_NAME'
- 'VERTEX_DATASET_NAME=$_VERTEX_DATASET_NAME'
- 'GCS_LOCATION=$_GCS_LOCATION'
- 'DATAFLOW_IMAGE_URI=$_DATAFLOW_IMAGE_URI'
- 'TFX_IMAGE_URI=$_TFX_IMAGE_URI'
- 'BEAM_RUNNER=$_BEAM_RUNNER'
- 'TRAINING_RUNNER=$_TRAINING_RUNNER'
- 'SERVICE_ACCOUNT=$_SERVICE_ACCOUNT'
- 'SUBNETWORK=$_SUBNETWORK'
- 'ACCURACY_THRESHOLD=$_CI_ACCURACY_THRESHOLD'

- 'NETWORK=$_NETWORK'
- 'BQ_DATASET_NAME=$_BQ_DATASET_NAME'
- 'ML_TABLE=$_ML_TABLE'
- 'MODEL_DISPLAY_NAME=$_MODEL_DISPLAY_NAME'
- 'PIPELINE_NAME=$_PIPELINE_NAME'
- 'PIPELINES_STORE=$_PIPELINES_STORE'
- 'CICD_IMAGE_URI=$_CICD_IMAGE_URI'
- 'CICD_IMAGE_MODEL_CARD=$_CICD_IMAGE_MODEL_CARD'
- 'DATAFLOW_SA=$_SERVICE_ACCOUNT'
- 'DATAFLOW_NETWORK=$_DATAFLOW_NETWORK'
id: 'Compile Pipeline'
waitFor: ['Local Test E2E Pipeline', 'Unit Test Datasource Utils', 'Unit Test Model']

waitFor: ['Unit Tests']

# Upload compiled pipeline to GCS.
- name: 'gcr.io/cloud-builders/gsutil'
args: ['cp', '$_PIPELINE_NAME.json', '$_PIPELINES_STORE']
dir: '$_WORKDIR'
dir: '$_WORKDIR/src/bqml_pipeline/src/'
id: 'Upload Pipeline to GCS'
waitFor: ['Compile Pipeline']


serviceAccount: 'projects/$_PROJECT/serviceAccounts/$_SERVICE_ACCOUNT'
logsBucket: '$_GCS_BUCKET'
timeout: 3600s
timeout: 7200s
substitutions:
_REPO_URL: [email protected]:${github_org}/${github_repo}
_CICD_IMAGE_URI: '${docker_repo}/cicd-bqml:latest'
_CICD_IMAGE_MODEL_CARD: '${docker_repo}/model-card:latest'
_BRANCH: ${github_branch}
_REGION: ${region}
_PROJECT: ${project_id}
_GCS_BUCKET: ${project_id}_cloudbuild/logs
_CICD_IMAGE_URI: '${docker_repo}/cicd-tfx:latest'
_DATAFLOW_IMAGE_URI: '${docker_repo}/dataflow:latest'
_TFX_IMAGE_URI: '${docker_repo}/vertex:latest'
_GCS_LOCATION: 'gs://${project_id}/creditcards/'
_TEST_GCS_LOCATION: 'gs://${project_id}/creditcards/e2e_tests'
_BQ_LOCATION: ${region}
_BQ_DATASET_NAME: creditcards
_ML_TABLE: creditcards_ml
_VERTEX_DATASET_NAME: creditcards
_MODEL_DISPLAY_NAME: creditcards-classifier-v02
_CI_TRAIN_LIMIT: '1000'
_CI_TEST_LIMIT: '100'
_CI_UPLOAD_MODEL: '0'
_CI_ACCURACY_THRESHOLD: '-0.1'
_BEAM_RUNNER: DataflowRunner
_TRAINING_RUNNER: vertex
_PIPELINE_NAME: creditcards-classifier-v02-train-pipeline
_PIPELINES_STORE: gs://${project_id}/creditcards/compiled_pipelines/
_SUBNETWORK: ${subnetwork}
_PIPELINE_NAME: creditcards-classifier-bqml-train
_PIPELINES_STORE: gs://${bucket_name}/creditcards/compiled_pipelines/
_MODEL_DISPLAY_NAME: creditcards-bqml
_NETWORK: ${subnetwork}
_DATAFLOW_NETWORK: ${dataflow_network}
_SERVICE_ACCOUNT: ${sa_mlops}
_WORKDIR: ${github_repo}
options:
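The 'Compile Pipeline' step now runs `pipeline.py --compile-only` from `src/bqml_pipeline/src/`, and the following step uploads the resulting `$_PIPELINE_NAME.json` to the pipeline store. `pipeline.py` itself is not rendered in this diff; the sketch below is only a guess at what such an entry point could look like on the pinned `kfp` 1.8.x / `google-cloud-pipeline-components~=1.0.45` stack (the component and its body are assumptions, not the commit's actual code):

```
# Hypothetical sketch; the real src/bqml_pipeline/src/pipeline.py is not
# rendered in this diff. Assumes the kfp 1.8.x "v2" compiler.
import argparse

from kfp.v2 import compiler, dsl

import config  # the config module added in this commit


@dsl.component(base_image="python:3.8")
def train_bqml_model(project_id: str, bq_input_data: str):
    # Placeholder for the BigQuery ML training step: the real component
    # would run a CREATE MODEL statement against bq_input_data.
    print(f"Would train a BQML model on {bq_input_data} in {project_id}")


@dsl.pipeline(name=config.PIPELINE_NAME, pipeline_root=config.PIPELINE_ROOT)
def bqml_pipeline():
    train_bqml_model(
        project_id=config.PROJECT_ID, bq_input_data=config.BQ_INPUT_DATA
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--compile-only", action="store_true")
    args = parser.parse_args()
    if args.compile_only:
        # Produces <PIPELINE_NAME>.json, which the 'Upload Pipeline to GCS'
        # step then copies to $_PIPELINES_STORE.
        compiler.Compiler().compile(
            pipeline_func=bqml_pipeline,
            package_path=f"{config.PIPELINE_NAME}.json",
        )
```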
5 changes: 2 additions & 3 deletions examples/vertex_mlops_enterprise/doc/01-ENVIRONMENTS.md
@@ -42,11 +42,10 @@ cd professional-services/

Set up your new GitHub repo using the GitHub web console or CLI.

Copy the `vertex_mlops_enterprise` folder to your local folder, including the GitHub actions:
Copy the `vertex_mlops_enterprise` folder to your local folder, including the GitHub actions and other hidden dirs and files:

```
cp -R ./examples/vertex_mlops_enterprise/* ./<YOUR LOCAL FOLDER>
cp -R ./examples/vertex_mlops_enterprise/.github ./<YOUR LOCAL FOLDER>
cp -r ./examples/vertex_mlops_enterprise/ <YOUR LOCAL FOLDER>
```
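A single recursive copy of the directory itself (no `/*` on the source path) also picks up hidden entries such as `.github`, which the previous `cp -R .../*` form missed because a `*` glob does not match dotfiles; this is why the separate `.github` copy is no longer needed.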

Commit the files in the main branch (`main`):
9 changes: 9 additions & 0 deletions examples/vertex_mlops_enterprise/src/bqml_pipeline/Dockerfile
@@ -0,0 +1,9 @@
FROM python:3.8


COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

COPY src .
ENV PYTHONPATH=/
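This appears to be the `cicd-bqml` image that the new `build-container-bqml` workflow job builds via the `gcloud builds submit --tag .../cicd-bqml:latest src/bqml_pipeline/.` step shown earlier; copying `src` to `/` and setting `ENV PYTHONPATH=/` makes modules such as `config.py` and `pipeline.py` importable from the image root during the Cloud Build steps.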
3 changes: 3 additions & 0 deletions examples/vertex_mlops_enterprise/src/bqml_pipeline/README.md
@@ -0,0 +1,3 @@
# Reference BQML Pipeline

Reference BigQuery ML pipeline implementation.
5 changes: 5 additions & 0 deletions examples/vertex_mlops_enterprise/src/bqml_pipeline/requirements.txt
@@ -0,0 +1,5 @@
jinja2~=3.1.2
pandas~=1.5.3
matplotlib~=3.7.1
google-cloud-aiplatform~=1.35.0
google-cloud-pipeline-components~=1.0.45
35 changes: 35 additions & 0 deletions examples/vertex_mlops_enterprise/src/bqml_pipeline/src/config.py
@@ -0,0 +1,35 @@
import os

PROJECT_ID = os.getenv("PROJECT_ID", "")
REGION = os.getenv("REGION", "")
IMAGE=os.getenv("CICD_IMAGE_URI", f'{REGION}-docker.pkg.dev/{PROJECT_ID}/creditcards-kfp/base:latest')
TRAIN_COMPONENT_IMAGE=f'{REGION}-docker.pkg.dev/{PROJECT_ID}/creditcards-kfp/train-fraud:latest'
IMAGE_MODEL_CARD=os.getenv("CICD_IMAGE_MODEL_CARD", f'{REGION}-docker.pkg.dev/{PROJECT_ID}/creditcards-kfp/model-card:latest')

CLASS_NAMES = ['OK', 'Fraud']
TARGET_COLUMN = 'Class'

PIPELINE_NAME = os.getenv("PIPELINE_NAME", 'bqml-creditcards')
PIPELINE_ROOT = os.getenv("PIPELINES_STORE", f'gs://{PROJECT_ID}/pipeline_root/{PIPELINE_NAME}')
SERVICE_ACCOUNT = os.getenv("SERVICE_ACCOUNT") # returns None if not defined
NETWORK = os.getenv("NETWORK") # returns None if not defined
KEY_ID = os.getenv("CMEK_KEY_ID") # e.g. projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key

BQ_DATASET_NAME=os.getenv("BQ_DATASET_NAME","creditcards")
BQ_INPUT_DATA=f"{PROJECT_ID}.{BQ_DATASET_NAME}.{os.getenv('ML_TABLE','creditcards_ml')}"
PARENT_MODEL='' # f'projects/{PROJECT_ID}/locations/{REGION}/models/YOUR_NUMERIC_MODEL_ID_HERE'

BQ_OUTPUT_DATASET_ID="creditcards_batch_out"

MODEL_DISPLAY_NAME = os.getenv("MODEL_DISPLAY_NAME", 'creditcards-bqml')
MODEL_CARD_CONFIG='../model_card_config.json'

PRED_CONTAINER='europe-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-6:latest'
ENDPOINT_NAME=PIPELINE_NAME

EMAILS=['[email protected]']

# Evaluation pipeline
DATAFLOW_SA = os.getenv("DATAFLOW_SA")
DATAFLOW_NETWORK = os.getenv("DATAFLOW_NETWORK")
DATAFLOW_PUBLIC_IPS = False
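These values are read from the environment variables that the Cloud Build config above exports (`PROJECT_ID`, `REGION`, `PIPELINE_NAME`, `PIPELINES_STORE`, and so on), with local-friendly defaults. As a minimal, hypothetical sketch of how they might feed a pipeline submission with the pinned `google-cloud-aiplatform~=1.35.0` SDK (the submission code is not part of the rendered diff):

```
# Minimal sketch, assuming the JSON produced by `pipeline.py --compile-only`;
# not the commit's actual submission code.
from google.cloud import aiplatform

import config

aiplatform.init(project=config.PROJECT_ID, location=config.REGION)

job = aiplatform.PipelineJob(
    display_name=config.PIPELINE_NAME,
    template_path=f"{config.PIPELINE_NAME}.json",
    pipeline_root=config.PIPELINE_ROOT,
    encryption_spec_key_name=config.KEY_ID,  # None unless CMEK_KEY_ID is set
)
# SERVICE_ACCOUNT and NETWORK default to None when unset, in which case
# Vertex AI falls back to the project defaults.
job.submit(service_account=config.SERVICE_ACCOUNT, network=config.NETWORK)
```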
(Diffs for the remaining changed files are not shown.)
