Jgpuga/mlops #1167
Merged: 14 commits, Oct 31, 2023
@@ -35,6 +35,27 @@ env:
  WORKLOAD_ID_PROVIDER: ${wip}
  CLOUDBUILD_LOGS: gs://${project_id}_cloudbuild/logs
jobs:
  build-container-bqml:
    name: 'Build container CI/CD BigQuery ML'
    runs-on: 'ubuntu-latest'
    steps:
      - uses: 'actions/checkout@v3'
        with:
          token: $${{ github.token }}

      - id: 'auth'
        name: 'Authenticate to Google Cloud'
        uses: 'google-github-actions/auth@v1'
        with:
          create_credentials_file: 'true'
          workload_identity_provider: $${{ env.WORKLOAD_ID_PROVIDER }}
          service_account: $${{ env.SERVICE_ACCOUNT }}
          access_token_lifetime: 3600s

      - name: 'Build container'
        run: |
          gcloud builds submit --gcs-log-dir=$${{ env.CLOUDBUILD_LOGS }} --project $${{ env.PROJECT_ID }} --region $${{ env.REGION }} --tag $${{ env.DOCKER_REPO }}/cicd-bqml:latest src/bqml_pipeline/. --timeout=15m --machine-type=e2-highcpu-8 --suppress-logs

  build-container-cicd-tfx:
    name: 'Build container CI/CD TFX'
    runs-on: 'ubuntu-latest'
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

name: Deploy tfx model
name: Deploy Vertex AI tfx model
on:
  workflow_dispatch:

@@ -31,7 +31,7 @@ env:
  WORKLOAD_ID_PROVIDER: ${wip}
jobs:
  deploy-model:
    name: 'Deploy model to endpoint'
    name: 'Deploy TFX model to endpoint'
    runs-on: 'ubuntu-latest'
    steps:
      - uses: 'actions/checkout@v3'

@@ -48,5 +48,5 @@ jobs:
          access_token_lifetime: 3600s

      - name: 'Deploy model'
        run: gcloud builds submit --no-source --config build/$${{ env.ENVIRONMENT }}/model-deployment.yaml --project $${{ env.PROJECT_ID }} --region $${{ env.REGION }} --machine-type=e2-highcpu-8 --suppress-logs
        run: gcloud builds submit --no-source --config build/$${{ env.ENVIRONMENT }}/model-deployment-tfx.yaml --project $${{ env.PROJECT_ID }} --region $${{ env.REGION }} --machine-type=e2-highcpu-8 --suppress-logs

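The `model-deployment-tfx.yaml` config referenced above is not part of this diff. As a rough illustration of the kind of deployment step such a config drives, here is a hedged sketch using the `google-cloud-aiplatform` SDK; the project, region, and model resource names are placeholders, not values from this PR:

```python
# Illustrative sketch of deploying an already-uploaded Vertex AI model to
# an endpoint; the resource names below are placeholders, not values from
# model-deployment-tfx.yaml (which this diff does not include).
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west4")  # placeholders

model = aiplatform.Model("projects/my-project/locations/europe-west4/models/1234567890")
endpoint = model.deploy(
    deployed_model_display_name="creditcards-classifier",
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=1,
)
print(endpoint.resource_name)
```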
1 change: 1 addition & 0 deletions examples/vertex_mlops_enterprise/.gitignore
@@ -4,3 +4,4 @@ terraform.tfstate*
.DS_Store
**/__pycache__/**
venv
**/.ipynb_checkpoints/**
17 changes: 9 additions & 8 deletions examples/vertex_mlops_enterprise/README.md
@@ -7,19 +7,20 @@ allow larger organizations achieve scale in terms of number of models.

## Contents of this example

We provide three notebooks to cover the three processes that we typically observe:
We provide three Vertex AI pipeline examples based on different technologies:

1. [01-experimentation.ipynb](01-experimentation.ipynb) covers the development process, where the features, the model and the training process are defined.
1. [02-cicd.ipynb](02-cicd.ipynb) covers the the CI/CD process that tests the code produced in the experimentation phase, and trains a production-ready model.
1. [03-prediction.ipynb](03-prediction.ipynb) cover the deployment process to make the model available, for example on a Vertex AI Endpoint or through Vertex AI Batch Prediction.
- [KFP pipeline](src/kfp_pipelines/README.md) using Vertex AI custom training
- [KFP pipeline using BigQuery ML](src/bqml_pipeline/README.md)
- [TFX pipeline](src/tfx_pipelines/) using Vertex AI custom training. In this case, a set of notebooks is also provided for the experimentation phase:
  1. [experimentation.ipynb](01-experimentation.ipynb) covers the development process, where the features, the model and the training process are defined.
  2. [cicd.ipynb](02-cicd.ipynb) covers the CI/CD process that tests the code produced in the experimentation phase, and trains a production-ready model.
  3. [prediction.ipynb](03-prediction.ipynb) covers the deployment process to make the model available, for example on a Vertex AI Endpoint or through Vertex AI Batch Prediction.

Each of the notebooks provides detailed instructions on the prerequisites for its execution and should be self-explanatory.

Once you have reviewed the notebooks, you can go on with these advanced steps to set up the automated environments and the CI/CD process using Github.
Once you have reviewed the pipelines, you can proceed with these advanced steps to set up the automated environments and the CI/CD process using GitHub.

1. [Environments](doc/01-ENVIRONMENTS.md) covers how to automate the environment deployments using Terraform.
1. [GIT Setup](doc/02-GIT_SETUP.md) covers how to configure a Github repo to be used for the CI/CD process.
1. [03-prediction.ipynb](doc/03-MLOPS.md) cover test the automated MLOps end2end process.
1. [MLOps end2end process](doc/03-MLOPS.md) covers how to test the automated MLOps end-to-end process.

<!-- CONTRIBUTING -->
## Contributing
@@ -47,109 +47,61 @@ steps:

# Run datasource_utils unit tests.
- name: '$_CICD_IMAGE_URI'
  entrypoint: 'pytest'
  args: ['src/tests/datasource_utils_tests.py', '-s']
  dir: '$_WORKDIR'
  env:
  - 'PROJECT=$_PROJECT'
  - 'BQ_LOCATION=$_BQ_LOCATION'
  - 'BQ_DATASET_NAME=$_BQ_DATASET_NAME'
  - 'ML_TABLE=$_ML_TABLE'
  id: 'Unit Test Datasource Utils'
  waitFor: ['Clone Repository']


# Run model unit tests.
- name: '$_CICD_IMAGE_URI'
  entrypoint: 'pytest'
  args: ['src/tests/model_tests.py', '-s']
  dir: '$_WORKDIR'
  id: 'Unit Test Model'
  entrypoint: 'echo'
  args: ['Running unit tests - dummy build']
  id: 'Unit Tests'
  waitFor: ['Clone Repository']
  timeout: 1800s


# Test e2e pipeline using local runner.
- name: '$_CICD_IMAGE_URI'
  entrypoint: 'pytest'
  args: ['src/tests/pipeline_deployment_tests.py::test_e2e_pipeline', '-s']
  dir: '$_WORKDIR'
  env:
  - 'PROJECT=$_PROJECT'
  - 'REGION=$_REGION'
  - 'MODEL_DISPLAY_NAME=$_MODEL_DISPLAY_NAME'
  - 'VERTEX_DATASET_NAME=$_VERTEX_DATASET_NAME'
  - 'GCS_LOCATION=$_TEST_GCS_LOCATION'
  - 'TRAIN_LIMIT=$_CI_TRAIN_LIMIT'
  - 'TEST_LIMIT=$_CI_TEST_LIMIT'
  - 'UPLOAD_MODEL=$_CI_UPLOAD_MODEL'
  - 'ACCURACY_THRESHOLD=$_CI_ACCURACY_THRESHOLD'
  id: 'Local Test E2E Pipeline'
  waitFor: ['Clone Repository']
  timeout: 1800s

# Compile the pipeline.
- name: '$_CICD_IMAGE_URI'
  entrypoint: 'python'
  args: ['build/utils.py',
    '--mode', 'compile-pipeline',
    '--pipeline-name', '$_PIPELINE_NAME'
  ]
  dir: '$_WORKDIR'
  args: ['pipeline.py', '--compile-only']
  dir: '$_WORKDIR/src/bqml_pipeline/src/'
  env:
  - 'PROJECT=$_PROJECT'
  - 'PROJECT_ID=$_PROJECT'
  - 'REGION=$_REGION'
  - 'MODEL_DISPLAY_NAME=$_MODEL_DISPLAY_NAME'
  - 'VERTEX_DATASET_NAME=$_VERTEX_DATASET_NAME'
  - 'GCS_LOCATION=$_GCS_LOCATION'
  - 'DATAFLOW_IMAGE_URI=$_DATAFLOW_IMAGE_URI'
  - 'TFX_IMAGE_URI=$_TFX_IMAGE_URI'
  - 'BEAM_RUNNER=$_BEAM_RUNNER'
  - 'TRAINING_RUNNER=$_TRAINING_RUNNER'
  - 'SERVICE_ACCOUNT=$_SERVICE_ACCOUNT'
  - 'SUBNETWORK=$_SUBNETWORK'
  - 'ACCURACY_THRESHOLD=$_CI_ACCURACY_THRESHOLD'

  - 'NETWORK=$_NETWORK'
  - 'BQ_DATASET_NAME=$_BQ_DATASET_NAME'
  - 'ML_TABLE=$_ML_TABLE'
  - 'MODEL_DISPLAY_NAME=$_MODEL_DISPLAY_NAME'
  - 'PIPELINE_NAME=$_PIPELINE_NAME'
  - 'PIPELINES_STORE=$_PIPELINES_STORE'
  - 'CICD_IMAGE_URI=$_CICD_IMAGE_URI'
  - 'CICD_IMAGE_MODEL_CARD=$_CICD_IMAGE_MODEL_CARD'
  - 'DATAFLOW_SA=$_SERVICE_ACCOUNT'
  - 'DATAFLOW_NETWORK=$_DATAFLOW_NETWORK'
  id: 'Compile Pipeline'
  waitFor: ['Local Test E2E Pipeline', 'Unit Test Datasource Utils', 'Unit Test Model']

  waitFor: ['Unit Tests']

# Upload compiled pipeline to GCS.
- name: 'gcr.io/cloud-builders/gsutil'
  args: ['cp', '$_PIPELINE_NAME.json', '$_PIPELINES_STORE']
  dir: '$_WORKDIR'
  dir: '$_WORKDIR/src/bqml_pipeline/src/'
  id: 'Upload Pipeline to GCS'
  waitFor: ['Compile Pipeline']


serviceAccount: 'projects/$_PROJECT/serviceAccounts/$_SERVICE_ACCOUNT'
logsBucket: '$_GCS_BUCKET'
timeout: 3600s
timeout: 7200s
substitutions:
  _REPO_URL: git@github.com:${github_org}/${github_repo}
  _CICD_IMAGE_URI: '${docker_repo}/cicd-bqml:latest'
  _CICD_IMAGE_MODEL_CARD: '${docker_repo}/model-card:latest'
  _BRANCH: ${github_branch}
  _REGION: ${region}
  _PROJECT: ${project_id}
  _GCS_BUCKET: ${project_id}_cloudbuild/logs
  _CICD_IMAGE_URI: '${docker_repo}/cicd-tfx:latest'
  _DATAFLOW_IMAGE_URI: '${docker_repo}/dataflow:latest'
  _TFX_IMAGE_URI: '${docker_repo}/vertex:latest'
  _GCS_LOCATION: 'gs://${project_id}/creditcards/'
  _TEST_GCS_LOCATION: 'gs://${project_id}/creditcards/e2e_tests'
  _BQ_LOCATION: ${region}
  _BQ_DATASET_NAME: creditcards
  _ML_TABLE: creditcards_ml
  _VERTEX_DATASET_NAME: creditcards
  _MODEL_DISPLAY_NAME: creditcards-classifier-v02
  _CI_TRAIN_LIMIT: '1000'
  _CI_TEST_LIMIT: '100'
  _CI_UPLOAD_MODEL: '0'
  _CI_ACCURACY_THRESHOLD: '-0.1'
  _BEAM_RUNNER: DataflowRunner
  _TRAINING_RUNNER: vertex
  _PIPELINE_NAME: creditcards-classifier-v02-train-pipeline
  _PIPELINES_STORE: gs://${project_id}/creditcards/compiled_pipelines/
  _SUBNETWORK: ${subnetwork}
  _PIPELINE_NAME: creditcards-classifier-bqml-train
  _PIPELINES_STORE: gs://${bucket_name}/creditcards/compiled_pipelines/
  _MODEL_DISPLAY_NAME: creditcards-bqml
  _NETWORK: ${subnetwork}
  _DATAFLOW_NETWORK: ${dataflow_network}
  _SERVICE_ACCOUNT: ${sa_mlops}
  _WORKDIR: ${github_repo}
options:
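The 'Compile Pipeline' step now runs `pipeline.py --compile-only` instead of `build/utils.py`. The actual `pipeline.py` is not shown in this diff; here is a minimal sketch of what such an entry point could look like, assuming a KFP v2 pipeline function and the `config.py` module added below (the `placeholder_op` component is purely illustrative):

```python
# Hypothetical sketch of the pipeline.py entry point invoked by the
# 'Compile Pipeline' step; names other than the --compile-only flag
# are assumptions, not taken from this PR.
import argparse

from kfp.v2 import compiler, dsl
from kfp.v2.dsl import component

import config  # the config.py module added in this PR


@component
def placeholder_op() -> str:
    # Stand-in for the real BQML training and evaluation steps.
    return "ok"


@dsl.pipeline(name=config.PIPELINE_NAME, pipeline_root=config.PIPELINE_ROOT)
def create_pipeline():
    placeholder_op()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--compile-only", action="store_true")
    args = parser.parse_args()

    if args.compile_only:
        # Writes <PIPELINE_NAME>.json into the working directory, which is
        # what the 'Upload Pipeline to GCS' step copies to $_PIPELINES_STORE.
        compiler.Compiler().compile(
            pipeline_func=create_pipeline,
            package_path=f"{config.PIPELINE_NAME}.json",
        )
```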
4 changes: 2 additions & 2 deletions examples/vertex_mlops_enterprise/doc/01-ENVIRONMENTS.md
@@ -45,8 +45,8 @@ Setup your new Github repo using the Github web console or CLI.
Copy the `vertex_mlops_enterprise` folder to your local folder, including the Github actions:

```
cp -R ./examples/vertex_mlops_enterprise/* ./<YOUR LOCAL FOLDER>
cp -R ./examples/vertex_mlops_enterprise/.github ./<YOUR LOCAL FOLDER>
# Copy base directory from git repo, including hidden dirs and files
cp -r ./examples/vertex_mlops_enterprise/ <YOUR LOCAL FOLDER>
```

Commit the files in the main branch (`main`):
9 changes: 9 additions & 0 deletions examples/vertex_mlops_enterprise/src/bqml_pipeline/Dockerfile
@@ -0,0 +1,9 @@
FROM python:3.8


COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

COPY src .
ENV PYTHONPATH=/
3 changes: 3 additions & 0 deletions examples/vertex_mlops_enterprise/src/bqml_pipeline/README.md
@@ -0,0 +1,3 @@
# Reference BQML Pipeline

Reference BigQuery ML pipeline implementation.
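As a hedged sketch of what a BigQuery ML training pipeline can look like with the `google-cloud-pipeline-components` operators pinned in this pipeline's requirements (the project, dataset, and model names below are placeholders, not this example's actual definitions):

```python
# Illustrative only: a minimal KFP pipeline built from the BigQuery
# operators in google-cloud-pipeline-components; the project, dataset,
# and model names are placeholders, not this example's definitions.
from google_cloud_pipeline_components.v1.bigquery import (
    BigqueryCreateModelJobOp,
    BigqueryEvaluateModelJobOp,
)
from kfp.v2 import compiler, dsl

PROJECT_ID = "my-project"                              # placeholder
BQ_LOCATION = "EU"                                     # placeholder
TRAIN_TABLE = "my-project.creditcards.creditcards_ml"  # placeholder


@dsl.pipeline(name="bqml-creditcards-sketch")
def bqml_pipeline():
    # Train a BQML model with a CREATE MODEL statement.
    create_model = BigqueryCreateModelJobOp(
        project=PROJECT_ID,
        location=BQ_LOCATION,
        query=f"""
            CREATE OR REPLACE MODEL `{PROJECT_ID}.creditcards.fraud_model`
            OPTIONS (model_type='LOGISTIC_REG', input_label_cols=['Class'])
            AS SELECT * FROM `{TRAIN_TABLE}`
        """,
    )
    # Evaluate the trained model; a downstream step could gate
    # deployment on these metrics.
    BigqueryEvaluateModelJobOp(
        project=PROJECT_ID,
        location=BQ_LOCATION,
        model=create_model.outputs["model"],
    )


if __name__ == "__main__":
    compiler.Compiler().compile(
        pipeline_func=bqml_pipeline,
        package_path="bqml-creditcards-sketch.json",
    )
```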
@@ -0,0 +1,5 @@
jinja2~=3.1.2
pandas~=1.5.3
matplotlib~=3.7.1
google-cloud-aiplatform~=1.35.0
google-cloud-pipeline-components~=1.0.45
35 changes: 35 additions & 0 deletions examples/vertex_mlops_enterprise/src/bqml_pipeline/src/config.py
@@ -0,0 +1,35 @@
import os

PROJECT_ID = os.getenv("PROJECT_ID", "")
REGION = os.getenv("REGION", "")
IMAGE=os.getenv("CICD_IMAGE_URI", f'{REGION}-docker.pkg.dev/{PROJECT_ID}/creditcards-kfp/base:latest')
TRAIN_COMPONENT_IMAGE=f'{REGION}-docker.pkg.dev/{PROJECT_ID}/creditcards-kfp/train-fraud:latest'
IMAGE_MODEL_CARD=os.getenv("CICD_IMAGE_MODEL_CARD", f'{REGION}-docker.pkg.dev/{PROJECT_ID}/creditcards-kfp/model-card:latest')

CLASS_NAMES = ['OK', 'Fraud']
TARGET_COLUMN = 'Class'

PIPELINE_NAME = os.getenv("PIPELINE_NAME", 'bqml-creditcards')
PIPELINE_ROOT = os.getenv("PIPELINES_STORE", f'gs://{PROJECT_ID}/pipeline_root/{PIPELINE_NAME}')
SERVICE_ACCOUNT = os.getenv("SERVICE_ACCOUNT") # returns None if not defined
NETWORK = os.getenv("NETWORK") # returns None if not defined
KEY_ID = os.getenv("CMEK_KEY_ID") # e.g. projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key

BQ_DATASET_NAME=os.getenv("BQ_DATASET_NAME","creditcards")
BQ_INPUT_DATA=f"{PROJECT_ID}.{BQ_DATASET_NAME}.{os.getenv('ML_TABLE','creditcards_ml')}"
PARENT_MODEL='' # f'projects/{PROJECT_ID}/locations/{REGION}/models/YOUR_NUMERIC_MODEL_ID_HERE'

BQ_OUTPUT_DATASET_ID="creditcards_batch_out"

MODEL_DISPLAY_NAME = os.getenv("MODEL_DISPLAY_NAME", 'creditcards-bqml')
MODEL_CARD_CONFIG='../model_card_config.json'

PRED_CONTAINER='europe-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-6:latest'
ENDPOINT_NAME=PIPELINE_NAME

EMAILS=['[email protected]']

# Evaluation pipeline
DATAFLOW_SA = os.getenv("DATAFLOW_SA")
DATAFLOW_NETWORK = os.getenv("DATAFLOW_NETWORK")
DATAFLOW_PUBLIC_IPS = False
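A hedged sketch of how these settings might be consumed when submitting the compiled pipeline with `google-cloud-aiplatform`; the template filename and the submission code are assumptions, not part of this PR:

```python
# Assumed usage of config.py when submitting the compiled pipeline;
# the template filename is an assumption, not taken from this PR.
from google.cloud import aiplatform

import config

aiplatform.init(project=config.PROJECT_ID, location=config.REGION)

job = aiplatform.PipelineJob(
    display_name=config.PIPELINE_NAME,
    template_path=f"{config.PIPELINE_NAME}.json",
    pipeline_root=config.PIPELINE_ROOT,
    encryption_spec_key_name=config.KEY_ID,  # None unless CMEK_KEY_ID is set
)

# SERVICE_ACCOUNT and NETWORK may be None when the env vars are unset,
# in which case Vertex AI falls back to its defaults.
job.submit(service_account=config.SERVICE_ACCOUNT, network=config.NETWORK)
```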