Commit
* Added Hyperdrive Example via CLI
* Updated readme
* Added hyperdrive notebooks
Showing 10 changed files with 433 additions and 0 deletions.
@@ -0,0 +1,27 @@
# Exercise Instructions

Open [`hyperdrive_pipeline.ipynb`](hyperdrive_pipeline.ipynb) and follow the instructions in the notebook.

# Running this via CLI

You can also run the Hyperdrive hyperparameter tuning via the CLI:

```console
az ml folder attach -w <YOUR WORKSPACE NAME> -g <YOUR RESOURCE GROUP>
az ml run submit-hyperdrive --hyperdrive-configuration-name hyperdrive_config.yml -c hyperdrive -e hyperdrive-test
```

In this case:
* [`hyperdrive_config.yml`](hyperdrive_config.yml) holds the configuration for the hyperparameter tuning. Full details on the parameters can be found [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#define-the-search-space).
* [`hyperdrive.runconfig`](hyperdrive.runconfig) holds the general script definition (which dataset, cluster, etc.)
* [`train.py`](train.py) takes all the hyperparameters as argument inputs (a sketch of its expected shape follows below)
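
`train.py` itself is not rendered in this view. As a rough orientation only, a script that works with the configuration above needs to accept `--data-path` and `--c`, log the primary metric `Test accuracy`, and write its model to `outputs/model.pkl` so the registration step can find it. The sketch below assumes a scikit-learn logistic regression (so `--c` maps to the regularization parameter `C`); the file and column names are placeholders:

```python
# Hypothetical sketch of train.py -- not the actual script from this commit.
import argparse
import os

import joblib
import pandas as pd
from azureml.core import Run
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser()
parser.add_argument('--data-path', type=str, dest='data_path')  # folder the dataset is downloaded to
parser.add_argument('--c', type=float, default=1.0)             # hyperparameter sampled by Hyperdrive
args = parser.parse_args()

run = Run.get_context()

# File and column names below are placeholders -- adjust to the actual dataset.
df = pd.read_csv(os.path.join(args.data_path, 'german_credit_data.csv'))
X = pd.get_dummies(df.drop('Risk', axis=1))
y = df['Risk']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(C=args.c, solver='liblinear').fit(X_train, y_train)

# Log the metric referenced as primary_metric_name in hyperdrive_config.yml / the notebook
run.log('Test accuracy', model.score(X_test, y_test))

# register.py later registers 'outputs/model.pkl' from the best child run
os.makedirs('outputs', exist_ok=True)
joblib.dump(model, 'outputs/model.pkl')
```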

You can check the results in the Studio UI (navigate to the run, then select `Child Runs`):

![Child runs in the Studio UI](../../media/hyperdrive_childruns.png)

Each hyperparameter permutation is its own child run.

# Knowledge Check

To be written
@@ -0,0 +1,12 @@
name: workshop-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6.2
  - pip:
    - azureml-defaults
    - azureml-sdk
    - scikit-learn==0.20.3
    - pandas==0.25.3
    - joblib==0.13.2
@@ -0,0 +1,29 @@
script: train.py
arguments: [--data-path, /data]
target: cpu-cluster
framework: Python
communicator: None
nodeCount: 1
environment:
  environmentVariables:
    EXAMPLE_ENV_VAR: EXAMPLE_VALUE
  python:
    userManagedDependencies: false
    interpreterPath: python
    condaDependenciesFile: conda.yml
  docker:
    enabled: true
    baseImage: mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04
    arguments: []
mpi:
  processCountPerNode: 1
data:
  training_dataset:
    environmentVariableName: training_dataset
    dataLocation:
      dataset:
        name: german-credit-train-tutorial
        version: 1
    mechanism: download
    pathOnCompute: /data
    overwrite: true
@@ -0,0 +1,17 @@
# For more details, visit:
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#define-the-search-space
sampling:
  type: random # Supported options: Random, Grid, Bayesian
  parameter_space: # specify a name|expression|values tuple for each parameter.
    - name: --c # The name of a script parameter to generate values for.
      expression: choice # supported options: choice, randint, uniform, quniform, loguniform, qloguniform, normal, qnormal, lognormal, qlognormal
      values: [0.5, 1, 1.5] # The list of values; how many are needed depends on the expression specified.
policy:
  type: BanditPolicy # Supported options: BanditPolicy, MedianStoppingPolicy, TruncationSelectionPolicy, NoTerminationPolicy
  evaluation_interval: 1 # Policy properties are policy specific. See the above link for policy-specific parameter details.
  slack_factor: 0.2
primary_metric_name: Test accuracy # The metric used when evaluating the policy
primary_metric_goal: Maximize # Maximize or Minimize
max_total_runs: 8 # The maximum number of runs to generate
max_concurrent_runs: 1 # The number of runs that can run concurrently
max_duration_minutes: 60 # The maximum length of time to run the experiment before cancelling
@@ -0,0 +1,197 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hyperparameter Tuning pipeline examples\n",
"\n",
"In this example, we'll build a pipeline for hyperparameter tuning. This pipeline will test multiple hyperparameter permutations and then register the best model.\n",
"\n",
"**Note:** This example requires that you've run the notebook from the first tutorial, so that the dataset and compute cluster are set up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"import azureml.core\n",
"from azureml.core import Workspace, Experiment, Dataset, RunConfiguration\n",
"from azureml.pipeline.core import Pipeline, PipelineData\n",
"from azureml.pipeline.steps import PythonScriptStep, HyperDriveStep, HyperDriveStepRun\n",
"from azureml.data.dataset_consumption_config import DatasetConsumptionConfig\n",
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
"from azureml.train.hyperdrive import choice, loguniform, uniform\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"print(\"Azure ML SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we will connect to the workspace. The command `Workspace.from_config()` will either:\n", | ||
"* Read the local `config.json` with the workspace reference (given it is there) or\n", | ||
"* Use the `az` CLI to connect to the workspace and use the workspace attached to via `az ml folder attach -g <resource group> -w <workspace name>`" | ||
] | ||
}, | ||
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(f'WS name: {ws.name}\\nRegion: {ws.location}\\nSubscription id: {ws.subscription_id}\\nResource group: {ws.resource_group}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Preparation\n",
"\n",
"Let's reference the dataset from the first tutorial:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_dataset = Dataset.get_by_name(ws, \"german-credit-train-tutorial\")\n",
"training_dataset_consumption = DatasetConsumptionConfig(\"training_dataset\", training_dataset).as_download()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we define the parameter sampling (defines the search space for our hyperparameters we want to try), early termination policy (allows to kill poorly performing runs early), then we put this togehter as a `HyperDriveConfig` and execute it in an `HyperDriveStep`. Lastly, we have a short step to register the best model." | ||
] | ||
}, | ||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"runconfig = RunConfiguration.load(\"runconfig.yml\")\n",
"script_run_config = ScriptRunConfig(source_directory=\"./\",\n",
"                                    run_config=runconfig)\n",
"script_run_config.data_references = None\n",
"\n",
"ps = RandomParameterSampling(\n",
"    {\n",
"        '--c': uniform(0.1, 1.9)\n",
"    }\n",
")\n",
"early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)\n",
"\n",
"hd_config = HyperDriveConfig(run_config=script_run_config,\n",
"                             hyperparameter_sampling=ps,\n",
"                             policy=early_termination_policy,\n",
"                             primary_metric_name='Test accuracy',\n",
"                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
"                             max_total_runs=4,\n",
"                             max_concurrent_runs=1)\n",
"\n",
"hd_step = HyperDriveStep(name='hyperparameter-tuning',\n",
"                         hyperdrive_config=hd_config,\n",
"                         estimator_entry_script_arguments=['--data-path', training_dataset_consumption],\n",
"                         inputs=[training_dataset_consumption],\n",
"                         outputs=None)\n",
"\n",
"register_step = PythonScriptStep(script_name='register.py',\n",
"                                 runconfig=runconfig,\n",
"                                 name=\"register-model\",\n",
"                                 compute_target=\"cpu-cluster\",\n",
"                                 arguments=['--model_name', 'best_model'],\n",
"                                 allow_reuse=False)\n",
"\n",
"# Explicitly state that registration runs after training, as there is not direct dependency through inputs/outputs\n", | ||
"register_step.run_after(hd_step)\n", | ||
"\n", | ||
"steps = [hd_step, register_step]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Finally, we can create our pipeline object and validate it. This will check the input and outputs are properly linked and that the pipeline graph is a non-cyclic graph:" | ||
] | ||
}, | ||
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"pipeline = Pipeline(workspace=ws, steps=steps)\n",
"pipeline.validate()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, we can submit the pipeline against an experiment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"outputPrepend"
]
},
"outputs": [],
"source": [
"pipeline_run = Experiment(ws, 'hyperparameter-pipeline').submit(pipeline)\n",
"pipeline_run.wait_for_completion()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9-final"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
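
register.py (the next file in this commit) performs the best-run lookup on the compute target as part of the pipeline. Purely as an illustration, the same lookup can also be done interactively once `pipeline_run` above has completed, reusing the step name `hyperparameter-tuning`; a minimal sketch:

```python
# A sketch: inspect the winning child run after the pipeline above has completed.
from azureml.pipeline.steps import HyperDriveStepRun

# 'pipeline_run' is the PipelineRun submitted in the notebook above
hd_step_run = HyperDriveStepRun(step_run=pipeline_run.find_step_run('hyperparameter-tuning')[0])

best_run = hd_step_run.get_best_run_by_primary_metric()
print("Best run:", best_run.id)
print("Metrics:", best_run.get_metrics())
print("Arguments:", hd_step_run.get_hyperparameters()[best_run.id])
```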
@@ -0,0 +1,41 @@
import json
import os
import ast
import argparse
import azureml.core
from azureml.core import Run
from azureml.pipeline.steps.hyper_drive_step import HyperDriveStepRun

def getRuntimeArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', type=str)
    args = parser.parse_args()
    return args

def main():
    args = getRuntimeArgs()
    model_name = args.model_name

    # current run is the registration step
    current_run = Run.get_context()

    # parent run is the overall pipeline
    parent_run = current_run.parent

    # Get the HyperDriveStep of the pipeline by name (make sure only 1 exists)
    hd_step_run = HyperDriveStepRun(step_run=parent_run.find_step_run('hyperparameter-tuning')[0])

    # Get RunID for best run
    best_run = hd_step_run.get_best_run_by_primary_metric()
    best_run_id = best_run.id

    # Get the best run's metrics and hyperparameters
    # get_hyperparameters() returns each child run's arguments as a string (roughly "{'--c': 0.5}"),
    # so strip the '--' prefixes and parse it into a real dict
    hyperparameters = ast.literal_eval(hd_step_run.get_hyperparameters()[best_run_id].replace('--', ''))
    metrics = hd_step_run.get_metrics()[best_run_id]

    best_run.register_model(model_path='outputs/model.pkl',
                            model_name=model_name,
                            properties={**metrics, **hyperparameters})

if __name__ == "__main__":
    main()
@@ -0,0 +1,19 @@
script: train.py
arguments: [] # This is set in our pipeline definition script
target: cpu-cluster
framework: Python
communicator: None
nodeCount: 1
environment:
  environmentVariables:
    EXAMPLE_ENV_VAR: EXAMPLE_VALUE
  python:
    userManagedDependencies: false
    interpreterPath: python
    condaDependenciesFile: conda.yml
  docker:
    enabled: true
    baseImage: mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04
    arguments: []
mpi:
  processCountPerNode: 1