Improve Readability #1757

Open · wants to merge 1 commit into base: master
6 changes: 3 additions & 3 deletions NBSETUP.md
@@ -20,7 +20,7 @@ We recommend you create a Python virtual environment ([Miniconda](https://conda.
# install just the base SDK
pip install azureml-sdk

-# clone the sample repoistory
+# clone the sample repository
git clone https://github.com/Azure/MachineLearningNotebooks.git

# below steps are optional
@@ -57,10 +57,10 @@ Please make sure you start with the [Configuration](configuration.ipynb) noteboo

You need to have Docker engine installed locally and running. Open a command line window and type the following command.

-__Note:__ We use version `1.0.10` below as an exmaple, but you can replace that with any available version number you like.
+__Note:__ We use version `1.0.10` below as an example, but you can replace that with any available version number you like.

```sh
-# clone the sample repoistory
+# clone the sample repository
git clone https://github.com/Azure/MachineLearningNotebooks.git

# change current directory to the folder
4 changes: 2 additions & 2 deletions contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
@@ -334,7 +334,7 @@
"source": [
"RunConfiguration is used to submit jobs to Azure Machine Learning service. When creating RunConfiguration for a job, users can either \n",
"1. specify a Docker image with prebuilt conda environment and use it without any modifications to run the job, or \n",
"2. specify a Docker image as the base image and conda or pip packages as dependnecies to let AML build a new Docker image with a conda environment containing specified dependencies to use in the job\n",
"2. specify a Docker image as the base image and conda or pip packages as dependencies to let AML build a new Docker image with a conda environment containing specified dependencies to use in the job\n",
"\n",
"The second option is the recommended option in AML. \n",
"The following steps have code for both options. You can pick the one that is more appropriate for your requirements. "
@@ -351,7 +351,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code shows how to install RAPIDS using conda. The `rapids.yml` file contains the list of packages necessary to run this tutorial. **NOTE:** Initial build of the image might take up to 20 minutes as the service needs to build and cache the new image; once the image is built the subequent runs use the cached image and the overhead is minimal."
"The following code shows how to install RAPIDS using conda. The `rapids.yml` file contains the list of packages necessary to run this tutorial. **NOTE:** Initial build of the image might take up to 20 minutes as the service needs to build and cache the new image; once the image is built the subsequent runs use the cached image and the overhead is minimal."
]
},
{
2 changes: 1 addition & 1 deletion contrib/fairness/fairlearn-azureml-mitigation.ipynb
@@ -266,7 +266,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the disparity in accuracy when we select 'Sex' as the sensitive feature, we see that males have an error rate about three times greater than the females. More interesting is the disparity in opportunitiy - males are offered loans at three times the rate of females.\n",
"Looking at the disparity in accuracy when we select 'Sex' as the sensitive feature, we see that males have an error rate about three times greater than the females. More interesting is the disparity in opportunity - males are offered loans at three times the rate of females.\n",
"\n",
"Despite the fact that we removed the feature from the training data, our predictor still discriminates based on sex. This demonstrates that simply ignoring a protected attribute when fitting a predictor rarely eliminates unfairness. There will generally be enough other features correlated with the removed attribute to lead to disparate impact."
]
@@ -313,7 +313,7 @@
"|**allowed_models** | *List* of *strings* indicating machine learning algorithms for AutoML to use in this run. Same values listed above for **blocked_models** allowed for **allowed_models**.|\n",
"|**experiment_exit_score**| Value indicating the target for *primary_metric*. <br>Once the target is surpassed the run terminates.|\n",
"|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n",
"|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n",
"|**enable_early_stopping**| Flag to enable early termination if the score is not improving in the short term.|\n",
"|**featurization**| 'auto' / 'off' Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|Input dataset, containing both features and label column.|\n",
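Editor's note: for context on how the settings in this table are used, a hedged sketch of an `AutoMLConfig` call follows; the dataset, label column, and compute names are placeholders, not from the notebook:

```python
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="AUC_weighted",
    experiment_timeout_hours=0.25,
    enable_early_stopping=True,
    featurization="auto",
    n_cross_validations=5,
    training_data=train_ds,         # placeholder: TabularDataset with features and label
    label_column_name="y",          # placeholder label column name
    compute_target=compute_target,  # placeholder AmlCompute cluster
)
```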
@@ -197,7 +197,7 @@
" \"primary_metric\": \"average_precision_score_weighted\",\n",
" \"enable_early_stopping\": True,\n",
" \"max_concurrent_iterations\": 2, # This is a limit for testing purpose, please increase it as per cluster size\n",
" \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n",
" \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ability to find the best model possible\n",
" \"verbosity\": logging.INFO,\n",
"}\n",
"\n",
@@ -22,7 +22,7 @@
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use AutoML and Pipelines to enable contious retraining of a model based on updates to the training dataset. We will create two pipelines, the first one to demonstrate a training dataset that gets updated over time. We leverage time-series capabilities of `TabularDataset` to achieve this. The second pipeline utilizes pipeline `Schedule` to trigger continuous retraining. \n",
"In this example we use AutoML and Pipelines to enable continuous retraining of a model based on updates to the training dataset. We will create two pipelines, the first one to demonstrate a training dataset that gets updated over time. We leverage time-series capabilities of `TabularDataset` to achieve this. The second pipeline utilizes pipeline `Schedule` to trigger continuous retraining. \n",
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
"In this notebook you will learn how to:\n",
"* Create an Experiment in an existing Workspace.\n",
@@ -90,7 +90,7 @@
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"dstor = ws.get_default_datastore()\n",
"dstore = ws.get_default_datastore()\n",
"\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = \"retrain-noaaweather\"\n",
@@ -367,13 +367,13 @@
"\n",
"metrics_data = PipelineData(\n",
" name=\"metrics_data\",\n",
" datastore=dstor,\n",
" datastore=dstore,\n",
" pipeline_output_name=metrics_output_name,\n",
" training_output=TrainingOutput(type=\"Metrics\"),\n",
")\n",
"model_data = PipelineData(\n",
" name=\"model_data\",\n",
" datastore=dstor,\n",
" datastore=dstore,\n",
" pipeline_output_name=best_model_output_name,\n",
" training_output=TrainingOutput(type=\"Model\"),\n",
")"
@@ -503,7 +503,7 @@
" pipeline_parameters={\"ds_name\": dataset, \"model_name\": \"noaaweatherds\"},\n",
" pipeline_id=published_pipeline.id,\n",
" experiment_name=experiment_name,\n",
" datastore=dstor,\n",
" datastore=dstore,\n",
" wait_for_provisioning=True,\n",
" polling_interval=1440,\n",
")"
@@ -550,7 +550,7 @@
" pipeline_parameters={\"ds_name\": dataset},\n",
" pipeline_id=published_pipeline.id,\n",
" experiment_name=experiment_name,\n",
" datastore=dstor,\n",
" datastore=dstore,\n",
" wait_for_provisioning=True,\n",
" polling_interval=1440,\n",
")"
@@ -103,7 +103,7 @@ def get_noaa_data(start_time, end_time):

print("Argument 1(ds_name): %s" % args.ds_name)

-dstor = ws.get_default_datastore()
+dstore = ws.get_default_datastore()
register_dataset = False
end_time = datetime.utcnow()

@@ -143,15 +143,15 @@ def get_noaa_data(start_time, end_time):
os.makedirs(folder_name, exist_ok=True)
train_df.to_csv(file_path, index=False)

-dstor.upload_files(
+dstore.upload_files(
files=[file_path], target_path=folder_name, overwrite=True, show_progress=True
)
else:
print("No new data since {0}.".format(end_time_last_slice))

if register_dataset:
ds = Dataset.Tabular.from_delimited_files(
-dstor.path("{}/**/*.csv".format(args.ds_name)),
+dstore.path("{}/**/*.csv".format(args.ds_name)),
partition_format="/{partition_date:yyyy/MM/dd/HH/mm/ss}/data.csv",
)
ds.register(ws, name=args.ds_name)
@@ -185,7 +185,7 @@
"metadata": {},
"source": [
"The split data will be used in the remote compute by ModelProxy and locally to compare results.\n",
"So, we need to persist the split data to avoid descrepencies from different package versions in the local and remote."
"So, we need to persist the split data to avoid discrepancies from different package versions in the local and remote."
]
},
{
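Editor's note: one way to persist the split, sketched under the assumption that the split produced pandas DataFrames; this is not the notebook's exact code:

```python
import pandas as pd

# Write the split once so the local run and the ModelProxy remote run
# read identical data instead of re-splitting under different package versions.
X_test.to_csv("X_test.csv", index=False)

# Both environments then reload the same file.
X_test = pd.read_csv("X_test.csv")
```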
@@ -360,7 +360,7 @@
" \"track_child_runs\": False,\n",
"}\n",
"\n",
"mm_paramters = ManyModelsTrainParameters(\n",
"mm_parameters = ManyModelsTrainParameters(\n",
" automl_settings=automl_settings, partition_column_names=partition_column_names\n",
")"
]
@@ -405,7 +405,7 @@
" node_count=2,\n",
" process_count_per_node=2,\n",
" run_invocation_timeout=920,\n",
" train_pipeline_parameters=mm_paramters,\n",
" train_pipeline_parameters=mm_parameters,\n",
")"
]
},
@@ -506,7 +506,7 @@
"| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with the number of cores per node (varies by compute sku). |\n",
"| **process_count_per_node** | The number of processes per node.\n",
"| **train_run_id** | \\[Optional\\] The run id of the hierarchy training, by default it is the latest successful training many model run in the experiment. |\n",
"| **train_experiment_name** | \\[Optional\\] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiement as the inference pipeline. |\n",
"| **train_experiment_name** | \\[Optional\\] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiment as the inference pipeline. |\n",
"| **process_count_per_node** | \\[Optional\\] The number of processes per node, by default it's 4. |"
]
},
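Editor's note: these parameters are typically passed to the many-models batch inference helper; the sketch below assumes the `AutoMLPipelineBuilder` API from `azureml.contrib.automl.pipeline.steps`, so treat the exact signature as an assumption and check it against your installed version:

```python
from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder

# Sketch only: argument names mirror the table above; dataset and compute
# variables are placeholders.
inference_steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(
    experiment=experiment,
    inference_data=inference_ds,  # placeholder partitioned inference dataset
    compute_target=compute,
    node_count=2,                 # start near the cores-per-node of the compute SKU
    process_count_per_node=2,
    run_invocation_timeout=300,
)
```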
@@ -22,7 +22,7 @@
parsed_args, _ = parser.parse_known_args()
step_number = int(parsed_args.step_number)
step_size = int(parsed_args.step_size)
-# Create the working dirrectory to store the temporary csv files.
+# Create the working directory to store the temporary csv files.
working_dir = parsed_args.out_dir
os.makedirs(working_dir, exist_ok=True)
# Set input and output
@@ -515,7 +515,7 @@
"source": [
"# <font color='blue'>Backtest the best model</font> <a id=\"backtest_model\"></a>\n",
"\n",
"For model backtesting we will use the same parameters we used to backtest AutoML. All the models, we have obtained in the previous run were registered in our workspace. To identify the model, each was assigned a tag with the last trainig date."
"For model backtesting we will use the same parameters we used to backtest AutoML. All the models, we have obtained in the previous run were registered in our workspace. To identify the model, each was assigned a tag with the last training date."
]
},
{
@@ -185,7 +185,7 @@
"\n",
"We will use energy consumption [data from New York City](http://mis.nyiso.com/public/P-58Blist.htm) for model training. The data is stored in a tabular format and includes energy demand and basic weather data at an hourly frequency. \n",
"\n",
"With Azure Machine Learning datasets you can keep a single copy of data in your storage, easily access data during model training, share data and collaborate with other users. Below, we will upload the datatset and create a [tabular dataset](https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/service/how-to-create-register-datasets#dataset-types) to be used training and prediction."
"With Azure Machine Learning datasets you can keep a single copy of data in your storage, easily access data during model training, share data and collaborate with other users. Below, we will upload the dataset and create a [tabular dataset](https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/service/how-to-create-register-datasets#dataset-types) to be used training and prediction."
]
},
{
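Editor's note: the upload-and-register step described above uses the same APIs seen elsewhere in this diff (`upload_files`, `Dataset.Tabular.from_delimited_files`); a minimal sketch with a placeholder file name:

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload the local CSV to the workspace's default datastore.
datastore.upload_files(
    files=["nyc_energy.csv"],  # placeholder file name
    target_path="dataset/",
    overwrite=True,
    show_progress=True,
)

# Create a tabular dataset pointing at the uploaded file.
dataset = Dataset.Tabular.from_delimited_files(path=datastore.path("dataset/nyc_energy.csv"))
```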
@@ -329,7 +329,7 @@
"|**label_column_name**|The name of the label column.|\n",
"|**compute_target**|The remote compute for training.|\n",
"|**n_cross_validations**|Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way.|\n",
"|**enable_early_stopping**|Flag to enble early termination if the score is not improving in the short term.|\n",
"|**enable_early_stopping**|Flag to enable early termination if the score is not improving in the short term.|\n",
"|**forecasting_parameters**|A class holds all the forecasting related parameters.|\n"
]
},
@@ -504,7 +504,7 @@
"metadata": {},
"source": [
"### Retrieving forecasts from the model\n",
"We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute."
"We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and executed on the remote computer."
]
},
{
@@ -11,7 +11,7 @@ def run_remote_inference(
target_column_name,
inference_folder="./forecast",
):
-# Create local directory to copy the model.pkl and forecsting_script.py files into.
+# Create local directory to copy the model.pkl and forecasting_script.py files into.
# These files will be uploaded to and executed on the compute instance.
os.makedirs(inference_folder, exist_ok=True)
shutil.copy("forecasting_script.py", inference_folder)
@@ -437,7 +437,7 @@
"source": [
"# The data set contains hourly data, the training set ends at 01/02/2000 at 05:00\n",
"\n",
"# These are predictions we are asking the model to make (does not contain thet target column y),\n",
"# These are predictions we are asking the model to make (does not contain the target column y),\n",
"# for 6 periods beginning with 2000-01-02 06:00, which immediately follows the training data\n",
"X_test"
]
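Editor's note: for readers following along, retrieving the forecast for `X_test` is a single call on the fitted model; a minimal sketch, assuming `fitted_model` is the trained forecaster from earlier cells:

```python
# forecast() returns the predictions and the transformed feature frame.
y_pred, X_trans = fitted_model.forecast(X_test)
```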
@@ -765,7 +765,7 @@
"\n",
"![Recursive_forecast_overview](recursive_forecast_overview_small.png)\n",
"\n",
"Internally, we apply the forecaster in an iterative manner and finish the forecast task in two interations. In the first iteration, we apply the forecaster and get the prediction for the first forecast-horizon periods (y_pred1). In the second iteraction, y_pred1 is used as the context to produce the prediction for the next forecast-horizon periods (y_pred2). The combination of (y_pred1 and y_pred2) gives the results for the total forecast periods. \n",
"Internally, we apply the forecaster in an iterative manner and finish the forecast task in two iterations. In the first iteration, we apply the forecaster and get the prediction for the first forecast-horizon periods (y_pred1). In the second iteration, y_pred1 is used as the context to produce the prediction for the next forecast-horizon periods (y_pred2). The combination of (y_pred1 and y_pred2) gives the results for the total forecast periods. \n",
"\n",
"A caveat: forecast accuracy will likely be worse the farther we predict into the future since errors are compounded with recursive application of the forecaster.\n",
"\n",
@@ -840,7 +840,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly with the simple senarios illustrated above, forecasting farther than the forecast horizon in other senarios like 'multiple time-series', 'Destination-date forecast', and 'forecast away from the training data' are also automatically handled by the `forecast()` function. "
"Similarly with the simple scenarios illustrated above, forecasting farther than the forecast horizon in other scenarios like 'multiple time-series', 'Destination-date forecast', and 'forecast away from the training data' are also automatically handled by the `forecast()` function. "
]
}
],
@@ -299,7 +299,7 @@
"\n",
"train, valid = split_full_for_forecasting(df, time_column_name)\n",
"\n",
"# Reset index to create a Tabualr Dataset.\n",
"# Reset index to create a Tabular Dataset.\n",
"train.reset_index(inplace=True)\n",
"valid.reset_index(inplace=True)\n",
"test_df.reset_index(inplace=True)\n",