Merge branch 'master' of https://github.com/ctuning/mlcommons-ck
gfursin committed Jun 8, 2023
2 parents 3c8903e + 45012f4 commit 626f7f8
Showing 3 changed files with 199 additions and 1 deletion.
2 changes: 2 additions & 0 deletions cm-mlops/script/download-and-extract/README.md
@@ -130,6 +130,8 @@ ___
- Workflow:
* `_curl`
- Workflow:
* `_gdown`
- Workflow:
* `_torrent`
- Environment variables:
- *CM_DAE_DOWNLOAD_USING_TORRENT*: `yes`
10 changes: 9 additions & 1 deletion cm-mlops/script/download-file/README.md
@@ -126,6 +126,13 @@ ___
- Environment variables:
- *CM_DOWNLOAD_TOOL*: `curl`
- Workflow:
* `_gdown`
- Environment variables:
- *CM_DOWNLOAD_TOOL*: `gdown`
- Workflow:
1. ***Read "deps" on other CM scripts***
* get,generic-python-lib,_package.gdown
- CM script: [get-generic-python-lib](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/get-generic-python-lib)
* `_wget`
- Environment variables:
- *CM_DOWNLOAD_TOOL*: `wget`
@@ -142,12 +149,13 @@ ___
<details>
<summary>Click here to expand this section.</summary>

* `--download_path=value` &rarr; `CM_DOWNLOAD_PATH=value`
* `--url=value` &rarr; `CM_DOWNLOAD_URL=value`

**The above CLI flags can be used in the Python CM API as follows:**

```python
import cmind as cm
# Combine with the usual "action", "automation" and "tags" keys
r = cm.access({"url": ...})
r = cm.access({"download_path": ...})
```

</details>
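The `_wget`, `_curl` and `_gdown` variations only set `CM_DOWNLOAD_TOOL`, while `--url` and `--download_path` set `CM_DOWNLOAD_URL` and `CM_DOWNLOAD_PATH`. The sketch below illustrates how these three variables could be turned into a concrete download command. It is not the actual `download-file` implementation: the helper name `build_download_command`, the explicit `filename` argument, and the assumption that `CM_DOWNLOAD_PATH` is a target directory are all hypothetical.

```python
import os
import shlex


def build_download_command(env, filename):
    """Hypothetical sketch: turn CM_DOWNLOAD_* variables into an argv list."""
    tool = env["CM_DOWNLOAD_TOOL"]  # set by the _wget, _curl or _gdown variation
    url = env["CM_DOWNLOAD_URL"]    # set by --url
    # Assumption: CM_DOWNLOAD_PATH (set by --download_path) is a target directory
    out = os.path.join(env.get("CM_DOWNLOAD_PATH", "."), filename)
    if tool == "wget":
        return ["wget", "-O", out, url]
    if tool == "curl":
        # -L follows redirects; -o names the output file
        return ["curl", "-L", "-o", out, url]
    if tool == "gdown":
        # gdown itself comes from the get-generic-python-lib dependency
        return ["gdown", url, "-O", out]
    raise ValueError(f"unsupported CM_DOWNLOAD_TOOL: {tool}")


# Hypothetical target directory and file name
env = {
    "CM_DOWNLOAD_TOOL": "gdown",
    "CM_DOWNLOAD_URL": "https://drive.google.com/uc?id=1fbGClQMi2CoMv7fwrwTC5YYPooQBdcFW",
    "CM_DOWNLOAD_PATH": "/tmp/downloads",
}
print(shlex.join(build_download_command(env, "data.bin")))
```

Returning an argv list rather than a shell string keeps URLs containing `?` and `=` safe from shell interpretation; `shlex.join` is used only for display.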
188 changes: 188 additions & 0 deletions cm-mlops/script/prepare-training-data-bert/README.md
@@ -0,0 +1,188 @@
<details>
<summary>Click here to see the table of contents.</summary>

* [Description](#description)
  * [Information](#information)
* [Usage](#usage)
  * [CM installation](#cm-installation)
  * [CM pull repository](#cm-pull-repository)
  * [CM script automation help](#cm-script-automation-help)
  * [CM CLI](#cm-cli)
  * [CM Python API](#cm-python-api)
  * [CM GUI](#cm-gui)
  * [CM modular Docker container](#cm-modular-docker-container)
* [Customization](#customization)
  * [Variations](#variations)
  * [Default variations](#default-variations)
  * [Script flags mapped to environment](#script-flags-mapped-to-environment)
  * [Default environment](#default-environment)
* [Script workflow, dependencies and native scripts](#script-workflow-dependencies-and-native-scripts)
* [Script output](#script-output)
  * [New environment keys (filter)](#new-environment-keys-filter)
  * [New environment keys auto-detected from customize](#new-environment-keys-auto-detected-from-customize)
* [Maintainers](#maintainers)

</details>

*Note that this README is automatically generated - don't edit! Use `README-extra.md` to add more info.*

### Description

#### Information

* CM GitHub repository: *[mlcommons@ck](https://github.com/mlcommons/ck/tree/master/cm-mlops)*
* GitHub directory for this script: *[GitHub](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert)*
* CM meta description for this script: *[_cm.json](_cm.json)*
* CM "database" tags to find this script: *prepare,training,data,input,bert*
* Output cached?: *True*
___
### Usage

#### CM installation

[Guide](https://github.com/mlcommons/ck/blob/master/docs/installation.md)

##### CM pull repository

```cm pull repo mlcommons@ck```

##### CM script automation help

```cm run script --help```

#### CM CLI

1. `cm run script --tags=prepare,training,data,input,bert[,variations] [--input_flags]`

2. `cm run script "prepare training data input bert[,variations]" [--input_flags]`

3. `cm run script 1e06a7abe23545eb [--input_flags]`

* `variations` can be seen [here](#variations)

* `input_flags` can be seen [here](#script-flags-mapped-to-environment)
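The three CLI forms above select the same script. The sketch below builds each form as an argument list so the role of variations and input flags is explicit. The helper `cli_forms` is hypothetical, and `/data/bert` is a placeholder path. Variations are appended as extra tags with a leading underscore (for example `_nvidia`); the UID form is shown without variations because the README documents only input flags for it.

```python
import shlex

TAGS = "prepare,training,data,input,bert"
SCRIPT_UID = "1e06a7abe23545eb"


def cli_forms(variations=(), flags=None):
    """Return the three equivalent `cm run script` invocations as argv lists."""
    # Variations are appended as extra tags prefixed with an underscore
    var_tags = ["_" + v.lstrip("_") for v in variations]
    # Input flags are passed as --key=value
    flag_args = [f"--{key}={value}" for key, value in (flags or {}).items()]
    base = ["cm", "run", "script"]
    by_tags = base + ["--tags=" + ",".join([TAGS] + var_tags)]
    by_words = base + [" ".join(TAGS.split(",")) + "".join("," + t for t in var_tags)]
    # The README shows the UID form without variations, so none are added here
    by_uid = base + [SCRIPT_UID]
    return [cmd + flag_args for cmd in (by_tags, by_words, by_uid)]


for argv in cli_forms(variations=["nvidia"], flags={"data_dir": "/data/bert"}):
    print(shlex.join(argv))
```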

#### CM Python API

<details>
<summary>Click here to expand this section.</summary>

```python
import cmind

r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'prepare,training,data,input,bert',
                  'out': 'con',
                  # ...
                  # (other input keys for this script)
                  # ...
                  })

if r['return'] > 0:
    print(r['error'])
```

</details>


#### CM GUI

```cm run script --tags=gui --script="prepare,training,data,input,bert"```

Use this [online GUI](https://cKnowledge.org/cm-gui/?tags=prepare,training,data,input,bert) to generate the CM command.

#### CM modular Docker container

*TBD*

___
### Customization


#### Variations

* Group "**implementation**"
<details>
<summary>Click here to expand this section.</summary>

* **`_nvidia`** (default)
- Environment variables:
- *CM_TMP_VARIATION*: `nvidia`
- Workflow:
1. ***Read "deps" on other CM scripts***
* get,git,repo,_repo.https://github.com/mlcommons/training_results_v2.1
- CM script: [get-git-repo](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/get-git-repo)
1. ***Read "prehook_deps" on other CM scripts***
* download,file,_gdown,_url.https://drive.google.com/uc?id=1fbGClQMi2CoMv7fwrwTC5YYPooQBdcFW
- CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file)
* download,file,_gdown,_url.https://drive.google.com/uc?id=1USK108J6hMM_d27xCHi738qBL8_BT1u1
- CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file)
* download,file,_gdown,_url.https://drive.google.com/uc?id=1tmMgLwoBvbEJEHXh77sqrXYw5RpqT8R_
- CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file)
* download-and-extract,file,_gdown,_extract,_url.https://drive.google.com/uc?id=14xV2OUGSQDG_yDBrmbSdcDC-QGeqpfs_
- CM script: [download-and-extract](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-and-extract)
* download,file,_gdown,_url.https://drive.google.com/uc?id=1chiTBljF0Eh1U5pKs6ureVHgSbtU8OG_
- CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file)
* download,file,_gdown,_url.https://drive.google.com/uc?id=1Q47V3K3jFRkbJ2zGCrKkKk-n0fvMZsa0
- CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file)
* download,file,_gdown,_url.https://drive.google.com/uc?id=1vAcVmXSLsLeQ1q7gvHnQUSth5W_f_pwv
- CM script: [download-file](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/download-file)

</details>
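Each dependency above is written as a comma-separated tag string: plain words select the CM script, entries starting with `_` select its variations, and an entry such as `_url.<value>` carries a value after its first dot. The parser below is an illustrative sketch of that convention, not CM's own parsing code; `DepQuery` and `parse_dep_tags` are hypothetical names, and it assumes values contain no commas (true for every URL listed here).

```python
from dataclasses import dataclass, field


@dataclass
class DepQuery:
    tags: list = field(default_factory=list)        # plain tags that select the script
    variations: dict = field(default_factory=dict)  # variation name -> value (None if no value)


def parse_dep_tags(query):
    """Split a CM-style dependency tag string (illustrative sketch)."""
    result = DepQuery()
    for item in query.split(","):
        if item.startswith("_"):
            # "_url.https://..." -> ("url", "https://..."); split on the first dot only
            name, sep, value = item[1:].partition(".")
            result.variations[name] = value if sep else None
        else:
            result.tags.append(item)
    return result


dep = parse_dep_tags(
    "download-and-extract,file,_gdown,_extract,"
    "_url.https://drive.google.com/uc?id=14xV2OUGSQDG_yDBrmbSdcDC-QGeqpfs_"
)
print(dep.tags)  # ['download-and-extract', 'file']
print(dep.variations)
```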


#### Default variations

`_nvidia`
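`_nvidia` is the only variation in the "implementation" group and is marked as the default, so it applies even when no variation is requested. The sketch below illustrates that default-per-group rule and how a variation's environment variables end up in the script's `env`. The dictionary layout is loosely modeled on this README and is an assumption, not the exact `_cm.json` schema; `resolve_variations` is a hypothetical helper.

```python
# Assumed layout, modeled on the Variations section above
VARIATIONS = {
    "nvidia": {
        "group": "implementation",
        "default": True,
        "env": {"CM_TMP_VARIATION": "nvidia"},
    },
}


def resolve_variations(requested, variations=VARIATIONS):
    """Add group defaults for groups the request does not cover, then merge env."""
    selected = [v.lstrip("_") for v in requested]
    for name in selected:
        if name not in variations:
            raise KeyError(f"unknown variation: _{name}")
    covered = {variations[name].get("group") for name in selected}
    for name, meta in variations.items():
        # A default variation is added only if nothing in its group was chosen
        if meta.get("default") and meta.get("group") not in covered:
            selected.append(name)
            covered.add(meta.get("group"))
    env = {}
    for name in selected:
        env.update(variations[name].get("env", {}))
    return selected, env


print(resolve_variations([]))  # (['nvidia'], {'CM_TMP_VARIATION': 'nvidia'})
```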

#### Script flags mapped to environment
<details>
<summary>Click here to expand this section.</summary>

* `--data_dir=value` &rarr; `CM_DATA_DIR=value`

**The above CLI flags can be used in the Python CM API as follows:**

```python
import cmind as cm
r = cm.access({"data_dir": ...})  # combine with "action", "automation" and "tags"
```

</details>
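The flag table above is an explicit mapping, not a naming rule (in `download-file`, for example, `--url` maps to `CM_DOWNLOAD_URL`). The sketch below applies such a mapping to CLI arguments and shows the matching Python API input key side by side. `INPUT_MAPPING` and `flags_to_env` are hypothetical names; only the `data_dir` to `CM_DATA_DIR` entry comes from this README, and `/data/bert` is a placeholder path.

```python
INPUT_MAPPING = {"data_dir": "CM_DATA_DIR"}  # taken from the table above


def flags_to_env(argv, mapping=INPUT_MAPPING):
    """Translate --key=value CLI flags into (python_api_input, env)."""
    api_input, env = {}, {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # this sketch ignores anything that is not --key=value
        key, value = arg[2:].split("=", 1)
        if key in mapping:
            api_input[key] = value     # e.g. {"data_dir": ...} for cm.access
            env[mapping[key]] = value  # e.g. CM_DATA_DIR=...
    return api_input, env


print(flags_to_env(["--data_dir=/data/bert"]))
# ({'data_dir': '/data/bert'}, {'CM_DATA_DIR': '/data/bert'})
```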

#### Default environment

<details>
<summary>Click here to expand this section.</summary>

These keys can be updated via `--env.KEY=VALUE`, via the `env` dictionary in `@input.json`, or using script flags.


</details>
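The two generic ways of setting environment keys described above carry the same information. The sketch below collects `--env.KEY=VALUE` arguments and produces an equivalent `env` dictionary for an input file. The helper names are hypothetical, `CM_DATA_DIR=/data/bert` is only an example value, and the sketch does not model which source wins if the same key is set in several places, since this README does not say.

```python
import json


def parse_env_flags(argv):
    """Collect --env.KEY=VALUE arguments into a dict (illustrative sketch)."""
    env = {}
    for arg in argv:
        if arg.startswith("--env.") and "=" in arg:
            key, value = arg[len("--env."):].split("=", 1)
            env[key] = value
    return env


def to_input_json(env):
    """Serialize the same keys as the `env` dictionary of an input file."""
    return json.dumps({"env": env}, indent=2)


env = parse_env_flags(["--env.CM_DATA_DIR=/data/bert"])
print(env)  # {'CM_DATA_DIR': '/data/bert'}
print(to_input_json(env))
```

Saved as `input.json`, the output could then be passed as `cm run script --tags=prepare,training,data,input,bert @input.json`.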

___
### Script workflow, dependencies and native scripts

<details>
<summary>Click here to expand this section.</summary>

1. Read "deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json)
1. ***Run "preprocess" function from [customize.py](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/customize.py)***
1. Read "prehook_deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json)
1. ***Run native script if exists***
* [run.sh](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/run.sh)
1. Read "posthook_deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json)
1. ***Run "postprocess" function from [customize.py](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/customize.py)***
1. Read "post_deps" on other CM scripts from [meta](https://github.com/mlcommons/ck/tree/master/cm-mlops/script/prepare-training-data-bert/_cm.json)
</details>
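The workflow above is a fixed sequence of stages. The sketch below models only that ordering: each stage is an optional handler that may read or update a shared `env` dict, and absent stages are skipped, mirroring "Run native script if exists". It is a hypothetical model, not CM's runner; `STAGES`, `run_workflow` and the `/data/bert` default set in `preprocess` are illustrative assumptions.

```python
STAGES = [
    "deps",           # "deps" from _cm.json
    "preprocess",     # preprocess() in customize.py
    "prehook_deps",   # e.g. the gdown downloads of the _nvidia variation
    "native_script",  # run.sh, if it exists
    "posthook_deps",
    "postprocess",    # postprocess() in customize.py
    "post_deps",
]


def run_workflow(handlers, env=None):
    """Call the handler for each stage in the documented order, skipping absent ones."""
    env = dict(env or {})
    executed = []
    for stage in STAGES:
        handler = handlers.get(stage)
        if handler is None:
            continue
        handler(env)  # handlers may read or update the shared env dict
        executed.append(stage)
    return executed, env


handlers = {
    # Hypothetical: preprocess fills in a default data directory
    "preprocess": lambda env: env.setdefault("CM_DATA_DIR", "/data/bert"),
    "native_script": lambda env: print("would run run.sh with", env),
}
stages, env = run_workflow(handlers)
print(stages)  # ['preprocess', 'native_script']
```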

___
### Script output
#### New environment keys (filter)

#### New environment keys auto-detected from customize

___
### Maintainers

* [Open MLCommons taskforce on automation and reproducibility](https://github.com/mlcommons/ck/blob/master/docs/taskforce.md)
