Configuring the infra automation for the first time takes about 4 hours, which is great compared to 500-1500 hours.
Once this is done, you can set up as many AIFactories as you want, with a configuration time of 2-15 minutes per AIFactory.
Note
See the new bootstrap template repository for an even more automated way to set up Enterprise Scale AIFactories. (This section is still valid and good to read.) Enterprise Scale AIFactory - Template repo, using the AI Factory as submodule
The pipelines automate the execution of BICEP/Terraform, PowerShell, and Azure CLI. The goal of this documentation page is to configure and run at least these two pipelines:
- infra-aifactory-common
- infra-project-genai and/or infra-project-esml
You can see all available pipelines below:
infra-aifactory-common
- Creates Azure infrastructure, including vNets & services, network rules, and RBAC. Optionally adds core team member access (add-coreteam-member)
infra-project-genai
- Creates Azure infrastructure, including subnets, network rules, and RBAC, and adds user access for a list of user IDs (add-project-member-esml or add-project-member-genai)
infra-project-esml
- Creates Azure infrastructure, including subnets, network rules, and RBAC, and adds user access for a list of user IDs (add-project-member-esml or add-project-member-genai)
add-coreteam-member
- For all users, sets the correct RBAC roles for the AI Factory Common part of Dev/Test/Prod
add-project-member-esml
- For all users, sets the correct RBAC roles for the project-specific resource group, with optional IP-whitelisting for the services: Azure Machine Learning, Keyvault, Storage
add-project-member-genai
- For all users, sets the correct RBAC roles for the project-specific resource group, with optional IP-whitelisting for the services: Azure AI Hub, AI Services, AI Search, Keyvault, Storage
Below is how it looks in the Pipelines/Releases view in Azure DevOps:
Note
Equivalent GitHub Actions workflows also exist.
The repo aifactory-infra-001 will be your repo, where your configuration overrides the AIFactory config-template files.
- Open a Git command prompt and go to your local root folder for the code (you should see the folders azure-enterprise-scale-ml and notebook_demos when you run dir in the Git CMD), then run the commands below:
  git config --system core.longpaths true
  git submodule add https://github.com/jostrm/azure-enterprise-scale-ml
- Note: If the submodule has already been added by another team member in your project, the above command, git submodule add, will not work. In that case, run the command below instead:
  git submodule update --init --recursive
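The choice between the two commands above can be sketched in Python. This is a minimal sketch and not part of the AIFactory tooling: the helper names `submodule_command` and `read_gitmodules` are hypothetical, and the sketch assumes that a submodule added by a team member shows up as its URL in the repo's `.gitmodules` file.

```python
from pathlib import Path

SUBMODULE_URL = "https://github.com/jostrm/azure-enterprise-scale-ml"


def submodule_command(gitmodules_text: str, url: str = SUBMODULE_URL) -> list[str]:
    """Return the git command to run, given the contents of .gitmodules."""
    if url in gitmodules_text:
        # Assumption: the URL in .gitmodules means a team member already added
        # the submodule, so we only fetch and initialize it.
        return ["git", "submodule", "update", "--init", "--recursive"]
    # First time in this repo: register the submodule.
    return ["git", "submodule", "add", url]


def read_gitmodules(repo_root: str = ".") -> str:
    """Read .gitmodules from the repo root; return empty text if it is missing."""
    path = Path(repo_root) / ".gitmodules"
    return path.read_text(encoding="utf-8") if path.exists() else ""


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(submodule_command(read_gitmodules())))
```

Run it from the local root folder of your repo; it only prints the suggested command and does not change anything.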
- Open the notebook 01_init_templates_ALL.ipynb
- Run all cells.
- Note: When you run the first cell, VS Code will ask you to choose a kernel; choose Python environment. The recommended Python version is 3.12.5, but any Python version above 3.7 should work.
- After you have executed all cells in the notebook, you will have a new folder called aifactory, with sub-folders that include the templates.
- Verify that it looks like the screenshot below, with an aifactory folder at the top.
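Besides the visual check, the result of the notebook run can be verified with a short script. This is a minimal sketch under assumptions: the helper `missing_template_paths` is hypothetical, and the expected paths are the template locations named in this guide (for GitHub Actions, the folder that contains readme.md), which may differ between AIFactory versions.

```python
from pathlib import Path

# Template locations named in this guide, relative to the local repo root.
# Assumption: the notebook copies the templates to exactly these folders.
EXPECTED_TEMPLATE_PATHS = (
    "aifactory/esml-infra/azure-devops/bicep/yaml",
    "aifactory/esml-infra/azure-devops/bicep/classic",
    "aifactory/esml-infra/github-actions/bicep",
)


def missing_template_paths(repo_root: str) -> list[str]:
    """Return the expected template paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_TEMPLATE_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_template_paths(".")
    print("All template folders found" if not missing else f"Missing: {missing}")
```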
In the central submodule, the pipeline templates exist here:
- Option A) Setup AIFactory - Infra Automation (AzureDevops+BICEP)
- Option B) Setup AIFactory - Infra Automation (GithubActions+BICEP)
In your local repo, after you have done STEP 3 and copied the files, you will see the templates here:
- Azure DevOps - YAML: aifactory/esml-infra/azure-devops/bicep/yaml
- Azure DevOps - Classic: aifactory/esml-infra/azure-devops/bicep/classic
- GitHub Actions: aifactory/esml-infra/github-actions/bicep/readme.md
Important
You need to use the local version as below, since you need to configure the variable file: either Azure DevOps variables or the GitHub Actions .env file.
Import all of them, but start by importing these two, and execute them in the following order:
- esml-infra-common-bicep.json
- esml-infra-project-bicep-adv.json
These two are the Azure DevOps Release pipelines we need for an AIFactory and its first project (and upcoming projects):
Start with the 1st file esml-infra-common-bicep.json
- Open the Azure DevOps portal, browse to your org and project, and click Pipelines/Releases in the main menu on the left
- Click the New button to find the Import release pipeline button
After import, it should look like this:
- Click on the red marking at Tasks, where there are three task stages: esml-common-dev, esml-common-test, esml-common-prod. You need to configure all of them; start with the task stage esml-common-dev
- Click on Agent job, where it says Some settings need attention
- Select Agent pool
- Option A) Choose Hosted/Azure Pipelines, with the Agent Specification windows-2022 (windows-latest usually works as well)
- Option B) You may also use your own self-hosted Windows server (Windows 2019 or Windows 2022)
- Click the Azure CLI task called 11-Common RG and RBAC, and then click the Manage link to get to the Azure Resource Manager connection page, where you can create connections. A new browser tab will open.
- Click the New service connection button, select the Azure Resource Manager radio button, and click NEXT.
- Select Service principal (manual) in the dialog, and click NEXT
- To configure it, use the service principal information for esml-common-bicep-sp, which you created in the seeding keyvault during the prerequisites setup.
- That service principal should have the privileged role OWNER on the subscription, and be able to assign other privileged roles, such as CONTRIBUTOR and OWNER at resource group scope, as in the image:
- Verify the ARM connection, and also check the box Grant access permission to all pipelines
- Create all 3: You need to create three Azure Resource Manager connections (ARM connections). Each ARM connection should be created with a service principal that has OWNER permissions on the subscription we want to work with in the AIFactory, as either the DEV, TEST, or PROD environment.
You may create all 3 ARM connections at once, either based on the same service principal from the seeding keyvault, called esml-common-bicep-sp, which in that case is OWNER on all three subscriptions, or with three separate service principals.
- ARM connection names: esml-aifactory-infra-dev, esml-aifactory-infra-test, esml-aifactory-infra-prod
- Service principal info
  - Role: OWNER (able to assign privileged roles to other identities)
  - Scope: Subscription (DEV subscription if the task stage is esml-common-dev, TEST subscription if esml-common-test)
  - If external vNet (BYO vNet):
    - CONTRIBUTOR on the resource group where the external vNet resides, for the Dev, Test, Prod subscriptions/spokes. Reason: To be able to create network security groups
    - Network Contributor on the vNet. Reason: To be able to create subnets, and to assign network security groups to the subnets. Read more about: Permissions for the service principal
TODO: Support federated ARM connections https://learn.microsoft.com/en-us/azure/devops/pipelines/release/configure-workload-identity?view=azure-devops
- Go back to the other tab, where you have the release pipeline open, at the Tasks view with the task stage esml-common-dev
- Click the Azure CLI task called 11-Common RG and RBAC to configure it, and select the ARM connection you created earlier, called esml-aifactory-infra-dev
- Note: You may need to click the refresh icon for the combobox to reload, so that the newly created ARM connections become selectable.
- Repeat steps 1 and 2 for the other tasks: 12-Common Networking, 13-Deploy resources
- Repeat 1-3 for all task stages, also for esml-common-test and esml-common-prod, where you select the respective ARM connections
- esml-common-test stage using the ARM connection: esml-aifactory-infra-test
- esml-common-prod stage using the ARM connection: esml-aifactory-infra-prod
- SAVE the release pipeline.
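The stage-to-connection mapping described above can be written down as a small checklist generator, which helps when ticking off the nine task configurations. This is only an illustrative sketch: the names come from this guide, while the helper `checklist` is hypothetical and does not talk to Azure DevOps.

```python
# Stage names and ARM connection names as used in this guide.
ARM_CONNECTION_BY_STAGE = {
    "esml-common-dev": "esml-aifactory-infra-dev",
    "esml-common-test": "esml-aifactory-infra-test",
    "esml-common-prod": "esml-aifactory-infra-prod",
}
# Azure CLI tasks that need an ARM connection in every stage.
TASKS = ("11-Common RG and RBAC", "12-Common Networking", "13-Deploy resources")


def checklist() -> list[tuple[str, str, str]]:
    """Return (stage, task, ARM connection) for every task that must be configured."""
    return [
        (stage, task, connection)
        for stage, connection in ARM_CONNECTION_BY_STAGE.items()
        for task in TASKS
    ]


for stage, task, connection in checklist():
    print(f"{stage}: {task} -> {connection}")
```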
More information about variables can be seen here
To get "my ip":
- Option A) Go to any storage account in Azure, and click Networking. Your public IP is shown at the green marking in the image
- Option B) Open a terminal and run:
  nslookup myip.opendns.com resolver1.opendns.com
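If you want to pick the IP out of the nslookup output in a script, a parser along these lines may help. It is a minimal sketch under assumptions: the helper `public_ip_from_nslookup` is hypothetical, and it assumes the common output layout in which the first `Address` line belongs to the resolver and the last one is the answer. The exact layout differs between operating systems, so check it against your own output.

```python
import ipaddress
import re


def public_ip_from_nslookup(output: str) -> str:
    """Return the IP from the last 'Address' line of nslookup output."""
    candidates = re.findall(r"^Address(?:es)?:\s*(\S+)", output, flags=re.MULTILINE)
    if len(candidates) < 2:
        # Only the resolver address (or nothing) was printed: no answer section.
        raise ValueError("no answer address found in nslookup output")
    # Some platforms append '#53' (the port) to the resolver address; strip it.
    answer = candidates[-1].split("#")[0]
    # ip_address() raises ValueError if the text is not a valid IP.
    return str(ipaddress.ip_address(answer))
```

The returned address is what you would enter wherever the pipeline variables ask for your IP.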
More information about variables can be seen here
The AzureDatabricks application in your Microsoft Entra ID is a global application, with the same ObjectID (OID) for all Azure Databricks instances. It does not exist in your tenant until someone has created an Azure Databricks service.
This is about the parameter databricksOID in the file 10-esml-globals-5-13_23.json
- Problem: If you have a new tenant, where no subscription has created an Azure Databricks service yet, you will not have any Object ID for the AzureDatabricks enterprise application
- Solution: Create a dummy Azure Databricks service, for example alongside the seeding keyvault. Then the ObjectID will be created.
Before, when there is no AzureDatabricks application:
After, when the Azure Databricks dummy is created and the application exists:
If you cannot allow the AIFactory orchestration to create its own vNets, you can configure your pre-created vNet in the parameter file 10-esml-globals-override.json
Example of what you need to override:
If you want to bring your own vNets (BYO vNet) for Dev, Stage, Prod, you need to pre-create them, and match some more parameters, such as:
- Your vNet: vnet-spoke-aifactory-sdc-dev-001
- Your address space: 10.11.0.0/18
- Parameter file that needs to match the CIDR: 12-esml-cmn-parameters.json
- Parameter value that needs to match: 12-esml-cmn-parameters.json "10.XX.0.0/18"
- Variable (Azure DevOps, GitHub) that needs to match: cidr_range "11"
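The matching rule in the list above can be checked with Python's standard `ipaddress` module. This is a minimal sketch, assuming that the `XX` placeholder in the parameter value is replaced by the `cidr_range` variable; the helper names are hypothetical.

```python
import ipaddress


def expected_address_space(template: str, cidr_range: str) -> ipaddress.IPv4Network:
    """Fill the XX placeholder of the parameter value with cidr_range."""
    # Assumption: "10.XX.0.0/18" with cidr_range "11" means 10.11.0.0/18.
    return ipaddress.ip_network(template.replace("XX", cidr_range))


def vnet_matches(vnet_address_space: str, template: str, cidr_range: str) -> bool:
    """True when the pre-created vNet address space equals the expected one."""
    return ipaddress.ip_network(vnet_address_space) == expected_address_space(
        template, cidr_range
    )


# Values from the example above.
print(vnet_matches("10.11.0.0/18", "10.XX.0.0/18", "11"))
```

With the example values the check passes; a vNet created as 10.12.0.0/18 would fail it until cidr_range is changed to "12".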
NB! seeding keyvault = inputKeyvault when speaking of variables and parameters in the AIFactory.
- This is for legacy reasons (ESML AIFactory was established in 2019), but the name will be synced to seeding keyvault in the future
7) Check in your code, and add an artifact that points at your source code in the Azure DevOps Release pipeline
- Check in your code
- Click EDIT button
- Remove the artifact with source alias name: _esml-aifactory
- Click on the artifact box, a dialog opens
- Copy the source alias name at the bottom. You will need to add a new artifact with the same source alias name
- Click the DELETE button
- Add artifact with name _esml-aifactory
Click Add artifact
Configure as below, and keep everything else as default
- Source Type
- Azure repository (If classic ADO)
- BUILD (if .yaml ADO)
- Project: Select your Azure Devops project (e.g. where you have the parameter files and azure-enterprise-scale-ml submodule)
- Repository: Select your repo (e.g. where you have the parameter files and azure-enterprise-scale-ml submodule)
- Branch: main (e.g. where you have the parameter files and azure-enterprise-scale-ml submodule)
- Default version: latest
- Checkbox: "Checkout submodules" needs to be checked.
- Source alias name: _esml-aifactory
Click the SAVE button.
This is a checkpoint to verify that all prerequisite setup has been done before you run the pipeline.
Q: Have you created all Private DNS zones in the hub, manually?
This applies if you want your Private DNS zones in your hub, as recommended, i.e. you have the flag centralDnsZoneByPolicyInHub=true in the file 10-esml-globals-4-13_21_22.json and have specified the parameters privDnsSubscription_param and privDnsResourceGroup_param.
TODO:
- Ensure you have all Private DNS zones pre-created in the hub, manually (a utility script is on the TODO list)
- Ensure you have created a vNet link to the hub vNet for all Private DNS zones
- Ensure you have the Azure Policy and Azure Initiative assigned. How-To: Networking: peering-of-spookes-to-hub
- Ensure you have peered the spoke vNets to the hub. How-To: Networking: peering-of-spookes-to-hub
- Ensure you have all settings set in the parameter file 10-esml-globals-4-13_21_22.json
- The parameters: privDnsSubscription_param, privDnsResourceGroup_param, centralDnsZoneByPolicyInHub
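The last checklist item can be expressed as a small validation. This is a minimal sketch under assumptions: `central_dns_problems` is a hypothetical helper, and `params` is assumed to be a flat dict of values you have already read from 10-esml-globals-4-13_21_22.json; the real file layout may be nested differently.

```python
def central_dns_problems(params: dict) -> list[str]:
    """Return human-readable problems with the central Private DNS settings."""
    if not params.get("centralDnsZoneByPolicyInHub", False):
        # Local DNS zones in each spoke: nothing needs to be set for the hub.
        return []
    required = ("privDnsSubscription_param", "privDnsResourceGroup_param")
    problems = []
    for name in required:
        value = params.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name} must be set when centralDnsZoneByPolicyInHub=true")
    return problems


# Hypothetical values, for illustration only.
print(central_dns_problems({
    "centralDnsZoneByPolicyInHub": True,
    "privDnsSubscription_param": "00000000-0000-0000-0000-000000000000",
    "privDnsResourceGroup_param": "",
}))
```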
Private DNS zones, when created:
Azure Policies, when created:
This applies if you want your Private DNS zones locally in each AIFactory spoke, in the common resource group. This is only recommended if you do not want to peer the AIFactory to your hub, e.g. DEMO mode. You have the flag centralDnsZoneByPolicyInHub=false in the file 10-esml-globals-4-13_21_22.json
TODO: You do not need to do anything.
- Note: But you cannot peer it in an efficient way either. Usually this is only done when testing the AIFactory in isolation, via Bastion-only access mode.
If you don't know, please go back to the step 12-resourceproviders.md, where you have an automation script to do this.
Do the parameters you edited look the same in Azure DevOps as you configured them locally?
- The BICEP will have to create artifacts under one or many subscriptions.
- Note: If you have an external vNet (BYO vNet) in a different subscription than its AIFactory environment subscription, the service principal needs Contributor on the resource group to create NSGs, and Network Contributor on the vNet to be able to assign the NSGs. Read more about: Permissions for the service principal
5) Verify the Azure DevOps inline script arguments, especially for the service principal esml-common-bicep-sp
Check specifically the secret name for the service principal in the seeding keyvault. If you have the default name, it should work. If not, you need to edit the inline Script Arguments in the Azure DevOps task setup. See image
Even if the service principal esml-common-bicep-sp has OWNER on the Azure subscription, it also needs the Access Policy on secrets: GET, LIST, SET
Otherwise you will encounter an error message similar to the one below:
If so, you need to visit the seeding keyvault, Access policies, give the service principal Get, List, Set, and rerun the pipeline release.
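The permission check above amounts to a set comparison. This is a minimal sketch, not an Azure API call: `missing_secret_permissions` is a hypothetical helper, and the list of granted permissions is assumed to be copied by hand from the Access policies page of the seeding keyvault.

```python
# Secret permissions the service principal needs on the seeding keyvault.
REQUIRED_SECRET_PERMISSIONS = {"get", "list", "set"}


def missing_secret_permissions(granted: list[str]) -> list[str]:
    """Return the required secret permissions that are not in `granted`, sorted."""
    have = {p.lower() for p in granted}
    return sorted(REQUIRED_SECRET_PERMISSIONS - have)


# Example: an access policy that only grants Get and List still lacks Set.
print(missing_secret_permissions(["Get", "List"]))
```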
Now you can go ahead and run the pipeline in Azure Devops.
The process for this is described here in a process flow diagram - Add AIFactory project
For more troubleshooting, visit the FAQ