diff --git a/cm-mlops/script/download-and-extract/README.md b/cm-mlops/script/download-and-extract/README.md index d04b6274f3..47cb4f4804 100644 --- a/cm-mlops/script/download-and-extract/README.md +++ b/cm-mlops/script/download-and-extract/README.md @@ -130,6 +130,8 @@ ___ - Workflow: * `_curl` - Workflow: + * `_gdown` + - Workflow: * `_torrent` - Environment variables: - *CM_DAE_DOWNLOAD_USING_TORRENT*: `yes` diff --git a/cm-mlops/script/download-file/README.md b/cm-mlops/script/download-file/README.md index af95125a0d..6e448bf5e3 100644 --- a/cm-mlops/script/download-file/README.md +++ b/cm-mlops/script/download-file/README.md @@ -126,6 +126,13 @@ ___ - Environment variables: - *CM_DOWNLOAD_TOOL*: `curl` - Workflow: + * `_gdown` + - Environment variables: + - *CM_DOWNLOAD_TOOL*: `gdown` + - Workflow: + 1. ***Read "deps" on other CM scripts*** + * get,generic-python-lib,_package.gdown + - CM script: [get-generic-python-lib](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/get-generic-python-lib) * `_wget` - Environment variables: - *CM_DOWNLOAD_TOOL*: `wget` @@ -142,12 +149,13 @@ ___
Click here to expand this section. +* `--download_path=value` → `CM_DOWNLOAD_PATH=value` * `--url=value` → `CM_DOWNLOAD_URL=value` **Above CLI flags can be used in the Python CM API as follows:** ```python -r=cm.access({... , "url":...} +r=cm.access({... , "download_path":...}) ```
diff --git a/cm-mlops/script/prepare-training-data-bert/README.md b/cm-mlops/script/prepare-training-data-bert/README.md new file mode 100644 index 0000000000..ec97eac807 --- /dev/null +++ b/cm-mlops/script/prepare-training-data-bert/README.md @@ -0,0 +1,188 @@ +
+Click here to see the table of contents. + +* [Description](#description) +* [Information](#information) +* [Usage](#usage) + * [ CM installation](#cm-installation) + * [ CM script automation help](#cm-script-automation-help) + * [ CM CLI](#cm-cli) + * [ CM Python API](#cm-python-api) + * [ CM GUI](#cm-gui) + * [ CM modular Docker container](#cm-modular-docker-container) +* [Customization](#customization) + * [ Variations](#variations) + * [ Script flags mapped to environment](#script-flags-mapped-to-environment) + * [ Default environment](#default-environment) +* [Script workflow, dependencies and native scripts](#script-workflow-dependencies-and-native-scripts) +* [Script output](#script-output) +* [New environment keys (filter)](#new-environment-keys-filter) +* [New environment keys auto-detected from customize](#new-environment-keys-auto-detected-from-customize) +* [Maintainers](#maintainers) + +
+ +*Note that this README is automatically generated - don't edit! Use `README-extra.md` to add more info.* + +### Description + +#### Information + +* CM GitHub repository: *[mlcommons@ck](https://github.com/mlcommons/ck/tree/master/cm-mlops)* +* GitHub directory for this script: *[GitHub](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert)* +* CM meta description for this script: *[_cm.json](_cm.json)* +* CM "database" tags to find this script: *prepare,training,data,input,bert* +* Output cached?: *True* +___ +### Usage + +#### CM installation + +[Guide](https://github.com/mlcommons/ck/blob/master/docs/installation.md) + +##### CM pull repository + +```cm pull repo mlcommons@ck``` + +##### CM script automation help + +```cm run script --help``` + +#### CM CLI + +1. `cm run script --tags=prepare,training,data,input,bert[,variations] [--input_flags]` + +2. `cm run script "prepare training data input bert[,variations]" [--input_flags]` + +3. `cm run script 1e06a7abe23545eb [--input_flags]` + +* `variations` can be seen [here](#variations) + +* `input_flags` can be seen [here](#script-flags-mapped-to-environment) + +#### CM Python API + +
+Click here to expand this section.
+
+```python
+
+import cmind
+
+r = cmind.access({'action':'run',
+                  'automation':'script',
+                  'tags':'prepare,training,data,input,bert',
+                  'out':'con',
+                  # ... (other input keys for this script) ...
+                 })
+
+if r['return']>0:
+    print (r['error'])
+
+```
+
+
+ + +#### CM GUI + +```cm run script --tags=gui --script="prepare,training,data,input,bert"``` + +Use this [online GUI](https://cKnowledge.org/cm-gui/?tags=prepare,training,data,input,bert) to generate CM CMD. + +#### CM modular Docker container + +*TBD* + +___ +### Customization + + +#### Variations + + * Group "**implementation**" +
+ Click here to expand this section. + + * **`_nvidia`** (default) + - Environment variables: + - *CM_TMP_VARIATION*: `nvidia` + - Workflow: + 1. ***Read "deps" on other CM scripts*** + * get,git,repo,_repo.https://github.com/mlcommons/training_results_v2.1 + - CM script: [get-git-repo](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/get-git-repo) + 1. ***Read "prehook_deps" on other CM scripts*** + * download,file,_gdown,_url.https://drive.google.com/uc?id=1fbGClQMi2CoMv7fwrwTC5YYPooQBdcFW + - CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file) + * download,file,_gdown,_url.https://drive.google.com/uc?id=1USK108J6hMM_d27xCHi738qBL8_BT1u1 + - CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file) + * download,file,_gdown,_url.https://drive.google.com/uc?id=1tmMgLwoBvbEJEHXh77sqrXYw5RpqT8R_ + - CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file) + * download-and-extract,file,_gdown,_extract,_url.https://drive.google.com/uc?id=14xV2OUGSQDG_yDBrmbSdcDC-QGeqpfs_ + - CM script: [download-and-extract](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-and-extract) + * download,file,_gdown,_url.https://drive.google.com/uc?id=1chiTBljF0Eh1U5pKs6ureVHgSbtU8OG_ + - CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file) + * download,file,_gdown,_url.https://drive.google.com/uc?id=1Q47V3K3jFRkbJ2zGCrKkKk-n0fvMZsa0 + - CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file) + * download,file,_gdown,_url.https://drive.google.com/uc?id=1vAcVmXSLsLeQ1q7gvHnQUSth5W_f_pwv + - CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file) + +
+ + +#### Default variations + +`_nvidia` + +#### Script flags mapped to environment +
+Click here to expand this section.
+
+* `--data_dir=value`  →  `CM_DATA_DIR=value`
+
+**Above CLI flags can be used in the Python CM API as follows:**
+
+```python
+r=cm.access({... , "data_dir":...})
+```
+
+
+ +#### Default environment + +
+Click here to expand this section. + +These keys can be updated via `--env.KEY=VALUE` or `env` dictionary in `@input.json` or using script flags. + + +
+ +___ +### Script workflow, dependencies and native scripts + +
+Click here to expand this section. + + 1. Read "deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json) + 1. ***Run "preprocess" function from [customize.py](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/customize.py)*** + 1. Read "prehook_deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json) + 1. ***Run native script if exists*** + * [run.sh](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/run.sh) + 1. Read "posthook_deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json) + 1. ***Run "postprocess" function from [customize.py](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/customize.py)*** + 1. Read "post_deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json) +
+ +___ +### Script output +#### New environment keys (filter) + +#### New environment keys auto-detected from customize + +___ +### Maintainers + +* [Open MLCommons taskforce on automation and reproducibility](https://github.com/mlcommons/ck/blob/master/docs/taskforce.md) \ No newline at end of file
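
Reviewer note: the READMEs in this diff list three script flags and the environment keys they map to (`--download_path`, `--url`, `--data_dir`). The sketch below only illustrates that mapping and the shape of the input dictionary shown in the "CM Python API" sections. The helper names `flags_to_env` and `build_input` are hypothetical and are not part of CM; the real translation happens inside the CM script automation and may behave differently.

```python
# Mapping copied from the "Script flags mapped to environment" sections of
# the READMEs in this diff. Everything else in this block is an assumption
# made for illustration; it is not CM code.
FLAG_TO_ENV = {
    "download_path": "CM_DOWNLOAD_PATH",  # download-file
    "url": "CM_DOWNLOAD_URL",             # download-file
    "data_dir": "CM_DATA_DIR",            # prepare-training-data-bert
}


def flags_to_env(flags):
    """Translate script flags into the environment keys they map to."""
    env = {}
    for name, value in flags.items():
        if name not in FLAG_TO_ENV:
            raise KeyError(f"no environment mapping known for flag: {name}")
        env[FLAG_TO_ENV[name]] = value
    return env


def build_input(tags, flags):
    """Build a dictionary shaped like the one passed to cmind.access()."""
    request = {
        "action": "run",
        "automation": "script",
        "tags": tags,
        "out": "con",
    }
    request.update(flags)  # flags are passed as plain top-level keys
    return request


if __name__ == "__main__":
    print(flags_to_env({"data_dir": "/tmp/bert"}))
    print(build_input("prepare,training,data,input,bert",
                      {"data_dir": "/tmp/bert"}))
```

With CM installed, the dictionary returned by `build_input` would be passed to `cmind.access(...)`, and a non-zero `r['return']` would be checked as in the README example.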