To run these deployment options, you first need:
- an existing Azure ML workspace (see cookbook)
- an existing orchestrator (see tutorial)
- permissions to create resources, set permissions, and create identities in this subscription (or at least in one resource group).
- Note that to set permissions, you typically need the Owner role on the subscription or resource group; the Contributor role is not enough. This is key to securing the setup.
- Optional: install the Azure CLI.
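If the Azure CLI is installed, the Owner requirement can be checked before you start. The following is a minimal sketch, not part of the original instructions: `has_owner_role` is a hypothetical helper, `RESOURCE_GROUP` is a placeholder you set yourself, and it assumes you are signed in as a user (not a service principal). It only looks for the built-in Owner role assigned directly or inherited, so a custom role or a group-based assignment may not be detected.

```shell
# Hypothetical helper: succeeds if "Owner" appears among the role names
# read from stdin (one role name per line).
has_owner_role() {
  grep -qx 'Owner'
}

# RESOURCE_GROUP is a placeholder; nothing is queried if it is empty
# or if the Azure CLI is not installed.
if command -v az >/dev/null 2>&1 && [ -n "${RESOURCE_GROUP:-}" ]; then
  # Sign-in name of the current account (assumes a user login).
  me=$(az account show --query user.name --output tsv)
  # List role names on the resource group, including those inherited
  # from the subscription.
  if az role assignment list --assignee "$me" \
       --resource-group "$RESOURCE_GROUP" --include-inherited \
       --query "[].roleDefinitionName" --output tsv | has_owner_role; then
    echo "Owner role found on $RESOURCE_GROUP"
  else
    echo "No Owner role found on $RESOURCE_GROUP; setting permissions will likely fail"
  fi
fi
```

Run it as `RESOURCE_GROUP=<resource group name> sh check_owner.sh` (the script name is arbitrary).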
Note: this deployment can take 15 to 20 minutes.
Note: do not forget to create an InstanceType, which is needed to fully utilize the provisioned cluster; the tutorial is at the end of this document.
Adjust parameters, in particular:
- Region: this will be set by Azure to the region of your resource group.
- Machine Learning Name: must match the name of the AzureML workspace in the resource group.
- Machine Learning Region: the region in which the AzureML workspace was deployed (default: same as resource group).
- Pair Region: the region where the compute and storage will be deployed (default: same as resource group).
- Pair Base Name: a unique name for the silo, for example `silo1-westus`. This will be used to create all other resources (storage name, compute name, etc.).
In the resource group of your AzureML workspace, use the following command with parameters corresponding to your setup:
az deployment group create --template-file ./mlops/bicep/modules/fl_pairs/open_aks_with_confcomp_storage_pair.bicep --resource-group <resource group name> --parameters pairBaseName="silo1-westus" pairRegion="westus" machineLearningName="aml-fldemo" machineLearningRegion="eastus"
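When you deploy several silos, it can help to keep the parameters in variables and preview the deployment first. The sketch below is one possible way to do that: the `run` wrapper and the variable names are illustrative, not part of the repository, and `az deployment group what-if` is used to preview changes before `create`. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
# Placeholder values taken from the example above; replace with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-<resource group name>}"
PAIR_BASE_NAME="silo1-westus"
PAIR_REGION="westus"
ML_NAME="aml-fldemo"
ML_REGION="eastus"
TEMPLATE="./mlops/bicep/modules/fl_pairs/open_aks_with_confcomp_storage_pair.bicep"
DRY_RUN="${DRY_RUN:-1}"

# Illustrative wrapper: print the command when DRY_RUN=1, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Run one deployment sub-command ("what-if" or "create") with the shared parameters.
deploy_pair() {
  run az deployment group "$1" \
    --template-file "$TEMPLATE" \
    --resource-group "$RESOURCE_GROUP" \
    --parameters \
      pairBaseName="$PAIR_BASE_NAME" \
      pairRegion="$PAIR_REGION" \
      machineLearningName="$ML_NAME" \
      machineLearningRegion="$ML_REGION"
}

deploy_pair what-if   # preview the resources that would be created
deploy_pair create    # perform the deployment
```

Set `DRY_RUN=0` once the printed commands look right.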
Option 1 wraps up multiple provisioning steps from several sections of the Azure documentation. If you want to reproduce it manually, here are the guides you can follow:
- Quickstart: Deploy an AKS cluster with confidential computing Intel SGX agent nodes by using the Azure CLI,
- How to deploy Kubernetes extension,
- How to attach Kubernetes to Workspace.
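As a rough orientation, the three guides above correspond to CLI calls along the following lines. This is a sketch under assumptions, not the content of the bicep template: all names (`AKS_NAME`, `azureml-ext`, `confcompool1`, the VM size) are placeholders, the storage account and managed identity that the template also creates are not covered, and the exact flags should be checked against the linked guides. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
# Placeholders; replace with your own values.
SUBSCRIPTION_ID="<your-subscription-id>"
RESOURCE_GROUP="<resource group name>"
AKS_NAME="<aks-name>"
WORKSPACE="<AzureML workspace name>"
COMPUTE_NAME="<compute name in the workspace>"
DRY_RUN="${DRY_RUN:-1}"

# Illustrative wrapper: print the command when DRY_RUN=1, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

manual_provision() {
  # CLI extensions needed for the k8s-extension and ml command groups.
  run az extension add --name k8s-extension
  run az extension add --name ml

  # 1. AKS cluster with the confidential computing add-on, plus an SGX node pool.
  run az aks create --resource-group "$RESOURCE_GROUP" --name "$AKS_NAME" \
    --generate-ssh-keys --enable-addons confcom
  run az aks nodepool add --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$AKS_NAME" --name confcompool1 \
    --node-vm-size Standard_DC4s_v3 --node-count 2

  # 2. AzureML extension on the cluster.
  run az k8s-extension create --name azureml-ext \
    --extension-type Microsoft.AzureML.Kubernetes \
    --config enableTraining=True \
    --cluster-type managedClusters --cluster-name "$AKS_NAME" \
    --resource-group "$RESOURCE_GROUP" --scope cluster

  # 3. Attach the cluster to the workspace as a Kubernetes compute.
  run az ml compute attach --resource-group "$RESOURCE_GROUP" \
    --workspace-name "$WORKSPACE" --type Kubernetes --name "$COMPUTE_NAME" \
    --resource-id "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.ContainerService/managedClusters/$AKS_NAME"
}

manual_provision
```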
- Navigate the Azure portal to find your resource group.
- Look for a resource of type Managed Identity in the region of the silo, named like `uai-<pairBaseName>`. It should have been created by the instructions above.
- Open this identity and click on Azure role assignments. You should see the list of assignments for this identity. It should contain 3 roles towards the storage account of the silo itself:
  - Storage Blob Data Contributor
  - Reader and Data Access
  - Storage Account Key Operator Service Role
- Click on Add role assignment and add each of these same roles towards the storage account of your orchestrator.
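The same role assignments can also be made from the Azure CLI instead of the portal. The following is a minimal sketch, assuming you have looked up the two IDs beforehand; `PRINCIPAL_ID`, `ORCH_STORAGE_ID`, and the helper names are placeholders introduced for illustration. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
# Placeholders. One way to look them up (names are yours to fill in):
#   az identity show --name uai-<pairBaseName> --resource-group <rg> --query principalId --output tsv
#   az storage account show --name <orchestrator storage> --resource-group <rg> --query id --output tsv
PRINCIPAL_ID="<principal id of the silo identity>"
ORCH_STORAGE_ID="<resource id of the orchestrator storage account>"
DRY_RUN="${DRY_RUN:-1}"

# Illustrative wrapper: print the command when DRY_RUN=1, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# Assign the three roles listed above, scoped to the orchestrator storage account.
assign_silo_roles() {
  for role in \
    "Storage Blob Data Contributor" \
    "Reader and Data Access" \
    "Storage Account Key Operator Service Role"
  do
    run az role assignment create \
      --assignee-object-id "$PRINCIPAL_ID" \
      --assignee-principal-type ServicePrincipal \
      --role "$role" \
      --scope "$ORCH_STORAGE_ID"
  done
}

assign_silo_roles
```

Scoping each assignment to the storage account, rather than to the whole resource group, keeps the silo identity's access as narrow as the portal steps do.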
An InstanceType sets resource restrictions for each job running on the AKS cluster. You can create multiple InstanceTypes for different types of jobs. For example, a data pre-processing job is usually less demanding than a training job, so its InstanceType can provide fewer resources. You can find an example InstanceType definition in `mlops/k8_templates/instance-type.yaml`. To create an InstanceType, follow these steps:
Note: Make sure you have the `kubectl` tool installed: https://kubernetes.io/docs/tasks/tools/
- Update the `mlops/k8_templates/instance-type.yaml` file to reflect the minimum and limit resources for the job you intend to deploy (for simplicity, you can set the limit to the resources provided by a provisioned node in the AKS cluster).
- Update the `name` property under the `metadata` section in the `mlops/k8_templates/instance-type.yaml` file. Remember this name, as you will need it later.
- Run `az login`
- Run `az account set --subscription <your-subscription-id>`
- Run `az aks get-credentials --resource-group <rg-name> --name <aks-name>`
- Navigate to the `mlops/k8s_templates` folder and run: `kubectl apply -f instance-type.yaml`
- Add the `instance_type` property to your pipeline config for the AKS silo and set its value to the name you chose for the `name` property above.
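The steps above can be sketched as a single script. The InstanceType definition below follows the general schema from the Azure ML Kubernetes documentation (`requests` for the minimum, `limits` for the maximum); the file shipped in this repository may differ, and the resource values, the instance type name, and the sketch file location are assumptions chosen for illustration. The definition is written to a separate temporary file so that the repository file is not overwritten. With `DRY_RUN=1` (the default here) the `az` and `kubectl` commands are only printed.

```shell
# Placeholders; replace with your own values.
SUBSCRIPTION_ID="<your-subscription-id>"
RESOURCE_GROUP="<rg-name>"
AKS_NAME="<aks-name>"
INSTANCE_TYPE_NAME="${INSTANCE_TYPE_NAME:-my-instance-type}"
SKETCH_FILE="${TMPDIR:-/tmp}/instance-type-sketch.yaml"
DRY_RUN="${DRY_RUN:-1}"

# Illustrative wrapper: print the command when DRY_RUN=1, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# Example definition: requests are the minimum resources, limits the maximum.
# The values are illustrative; size them to your AKS node.
cat > "$SKETCH_FILE" <<EOF
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
  name: $INSTANCE_TYPE_NAME
spec:
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
EOF

# Authenticate, select the subscription, and fetch cluster credentials.
run az login
run az account set --subscription "$SUBSCRIPTION_ID"
run az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$AKS_NAME"

# Create the InstanceType, then list the instance types to confirm it exists.
run kubectl apply -f "$SKETCH_FILE"
run kubectl get instancetype
```

The value of `INSTANCE_TYPE_NAME` is the name to use for the `instance_type` property in the pipeline config.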