
preparing sync between branches to avoid divergence #330

Merged on Oct 4, 2024 (45 commits):

ec5d0df Merge pull request #1 from mlcommons/main (gfursin, Sep 20, 2024)
c1eb857 Merge pull request #2 from mlcommons/main (gfursin, Sep 24, 2024)
db60dad Merge pull request #298 from mlcommons/mlperf-inference (arjunsuresh, Sep 26, 2024)
28b817e Merge pull request #299 from mlcommons/dev (gfursin, Sep 26, 2024)
c97f1fe Merge pull request #3 from mlcommons/main (gfursin, Sep 26, 2024)
df4013c remove sys deps from script "python hello world" (gfursin, Sep 27, 2024)
9a7057e * added "dummy" script to test Docker containers (gfursin, Sep 27, 2024)
6b02a9d * added better support to select Docker configurations via UID (gfursin, Sep 27, 2024)
5c74925 fixing docker cfg selection (gfursin, Sep 27, 2024)
8dbc038 clean up (gfursin, Sep 27, 2024)
653270c Merge branch 'main' of github.com:flexaihq/cm4mlops (gfursin, Sep 27, 2024)
2087704 clean up (gfursin, Sep 27, 2024)
773ef26 removed ^M from setup.py (gfursin, Sep 27, 2024)
de55cc1 improving docker docs (gfursin, Sep 27, 2024)
a8feebd Merge branch 'main' of github.com:flexaihq/cm4mlops (gfursin, Sep 27, 2024)
c5aeb3d minor fix in get-cuda-devices (gfursin, Sep 27, 2024)
9b79c55 added --docker.key = value to cm docker script (gfursin, Sep 29, 2024)
07ef382 docker fixes (gfursin, Sep 29, 2024)
2924ce6 added ubuntu 24.04 config (gfursin, Sep 29, 2024)
e3b43b8 removed sys deps from image-classification examples (gfursin, Sep 30, 2024)
62c38ef removed sys deps from image-classification examples and added YAML (gfursin, Sep 30, 2024)
f8a4e7a fixing debug examples for customize.py and wrapped Python code (exter… (gfursin, Sep 30, 2024)
afaee88 Update customize.py | Fix download-file on windows when downloaded fi… (arjunsuresh, Sep 30, 2024)
07da361 Update customize.py (arjunsuresh, Sep 30, 2024)
47b2fd7 Merge pull request #312 from mlcommons/arjunsuresh-patch-2 (gfursin, Oct 1, 2024)
79871c0 adding latest wget.exe to get-sys-utils-cm (gfursin, Oct 1, 2024)
1051735 Merge pull request #4 from mlcommons/main (gfursin, Oct 1, 2024)
2593657 Merge pull request #315 from flexaihq/main (gfursin, Oct 1, 2024)
5419871 turn on tests on Windows (gfursin, Oct 1, 2024)
44251aa Merge pull request #317 from flexaihq/main (gfursin, Oct 1, 2024)
0942459 fixed md5sum bug in windows (anandhu-eng, Oct 1, 2024)
d459ef6 better handling of env variable (anandhu-eng, Oct 1, 2024)
7dcaa61 improved env handling + remove escaping (anandhu-eng, Oct 1, 2024)
a92f8d5 * removed windows test for MLPerf (requires interaction) (gfursin, Oct 1, 2024)
09a09a1 Merge pull request #320 from flexaihq/main (gfursin, Oct 1, 2024)
e71477a commit for windows download-file errorfix (anandhu-eng, Oct 1, 2024)
b522d7e code clean (anandhu-eng, Oct 1, 2024)
b0a217e Merge pull request #318 from anandhu-eng/downloadfilefix (gfursin, Oct 2, 2024)
c4f69f6 Merge pull request #316 from mlcommons/dev (gfursin, Oct 2, 2024)
1927b5d Merge pull request #324 from mlcommons/main (gfursin, Oct 2, 2024)
149a474 Merge pull request #5 from mlcommons/main (gfursin, Oct 2, 2024)
b9cfe35 fixed a few outdated URLs for Windows (gfursin, Oct 2, 2024)
535d0d5 fix stable version (gfursin, Oct 2, 2024)
c39c0f9 Merge pull request #325 from flexaihq/main (gfursin, Oct 2, 2024)
9be5704 Merge pull request #326 from mlcommons/dev (arjunsuresh, Oct 2, 2024)
.github/workflows/test-image-classification-onnx.yml (3 additions, 3 deletions)
@@ -13,12 +13,12 @@ on:

jobs:
build:

runs-on: ubuntu-latest
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
python-version: [ "3.12"]
os: [ubuntu-latest, windows-latest, macos-latest]
python-version: [ "3.10", "3.12"]

steps:
- uses: actions/checkout@v3
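The hunk above replaces the fixed `runs-on: ubuntu-latest` with `runs-on: ${{ matrix.os }}` and widens the matrix to three operating systems and two Python versions. GitHub Actions starts one job per combination of matrix values, so the workflow goes from one job to six. A minimal Python model of that expansion (the helper `expand_matrix` is illustrative, not part of GitHub Actions):

```python
from itertools import product

def expand_matrix(matrix):
    """Return one dict per combination of matrix values (Cartesian product of the axes)."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]

# Axes copied from the updated workflow above
matrix = {
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
    "python-version": ["3.10", "3.12"],
}

jobs = expand_matrix(matrix)
for job in jobs:
    print(f"runs-on: {job['os']}, python {job['python-version']}")
```

Because the workflow also sets `fail-fast: false`, a failing combination (for example Windows with Python 3.10) does not cancel the other five jobs.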
.github/workflows/test-mlperf-inference-resnet50.yml (2 additions, 0 deletions)
@@ -28,6 +28,8 @@ jobs:
- os: macos-latest
backend: tf
- os: windows-latest
# MLPerf requires interaction when installing LLVM on Windows - that's why we excluded it here


steps:
- uses: actions/checkout@v4
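The comment in this hunk explains why Windows is dropped from the MLPerf ResNet-50 matrix. GitHub Actions removes every combination that matches all keys of an `exclude` entry, so an entry listing only `os: windows-latest` drops every Windows job. The sketch below models that filtering in Python; the `backend` values and the full shape of the matrix are assumptions (the hunk shows only part of it), and it assumes the `windows-latest` entry carries no further keys.

```python
from itertools import product

def expand_with_excludes(matrix, excludes):
    """Expand a job matrix, then drop combinations matched by any exclude entry."""
    keys = list(matrix)
    combos = [dict(zip(keys, vals)) for vals in product(*(matrix[k] for k in keys))]

    def is_excluded(combo):
        # An entry matches when every key it lists has the same value in the combo,
        # so {"os": "windows-latest"} removes all Windows jobs regardless of backend.
        return any(all(combo.get(k) == v for k, v in entry.items()) for entry in excludes)

    return [c for c in combos if not is_excluded(c)]

# Hypothetical axes: the hunk does not show the full matrix of this workflow
matrix = {
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
    "backend": ["onnxruntime", "tf"],
}
excludes = [
    {"os": "macos-latest", "backend": "tf"},
    {"os": "windows-latest"},  # MLPerf needs interaction when installing LLVM on Windows
]

jobs = expand_with_excludes(matrix, excludes)
```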
CHANGES.md (5 additions, 0 deletions)
@@ -1,3 +1,8 @@
### 20240927
* added "test dummy" script to test Docker containers
* added more standard Nvidia Docker configuration for PyTorch
* added better support to select Docker configurations via UID

### 20240916
* fixed "cm add script"

COPYRIGHT.txt (1 addition, 1 deletion)
@@ -1,5 +1,5 @@
Copyright (c) 2021-2024 MLCommons

The cTuning foundation and OctoML donated this project to MLCommons to benefit everyone.
Grigori Fursin, the cTuning foundation and OctoML donated this project to MLCommons to benefit everyone.
Contributor:

We should agree on the history and keep it stable - it doesn't look good if it keeps on changing. Technically cm4mlops and all the code inside it started within MLCommons.

Contributor (PR author):

Parts of the code come from older versions of CK and CM and from https://github.com/mlcommons/ck, where it originally resided. We have to reflect that to avoid legal problems in the future.

Contributor (PR author):

I originally started developing CM4MLOps as an integral part of CM while reusing some parts of CK and CK4MLOps that I developed at cTuning, so it has to have a common history and license.

Contributor:

The OctoAI blog has nothing to do with CM, as it is talking about results obtained using CK. Anyway, I do not want to argue over these things. Since none of the projects are new, I request that you finalize the historical attributions and contributions and keep them constant. This can help others decide whether to use the project. If anyone is concerned that the history of a project can change in the future, no one will be happy to contribute to it. Otherwise, let git history take care of them automatically, as is done in most other MLCommons repositories.

Contributor (PR author):

Sure! Agree! Thank you!

But we still have a conflict in .github/workflows/test-mlperf-inference-resnet50.yml.

May I ask you to help resolve it and merge the PR, please?

Thanks a lot for your help - very appreciated!


Copyright (c) 2014-2021 cTuning foundation
README.md (5 additions, 0 deletions)
@@ -139,6 +139,11 @@ cm run script \

[Apache 2.0](LICENSE.md)

## CM concepts

* https://doi.org/10.5281/zenodo.8105339
* https://arxiv.org/abs/2406.16791
Contributor:

We should remove this arXiv paper: it has many factually wrong statements. You had earlier agreed to fix them.

Contributor (PR author), gfursin, Oct 4, 2024:

?
It has a few minor typos that I still need to clean up, but it was not urgent. If you feel that something is wrong, I will remove it for now so we can discuss it later. Before updating the paper, I would also like to clarify many other things about the very recent CM/CM4MLOps developments that happened while I was on sick leave.

Contributor (PR author):

Here is the PR: #336

Contributor:

Thank you, Grigori. Yes, it's not just the typos: "Arjun Suresh - a senior software engineer from cKnowledge.org" is inaccurate, since I was at OctoML and only later at cKnowledge, and even there my title was different. Also, I think the last section, "Future Plans", uses "We" by mistake, whereas everywhere else in the PDF it is "I".


## Authors

[Grigori Fursin](https://cKnowledge.org/gfursin) and [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)
automation/script/module.py (48 additions, 4 deletions)
@@ -4028,14 +4028,58 @@ def docker(self, i):

(out) (str): if 'con', output to console

parsed_artifact (list): prepared in CM CLI or CM access function
[ (artifact alias, artifact UID) ] or
[ (artifact alias, artifact UID), (artifact repo alias, artifact repo UID) ]

(repos) (str): list of repositories to search for automations

(output_dir) (str): output directory (./ by default)

(docker) (dict): convert keys into docker_{key} strings for CM >= 2.3.8.1


(docker_skip_build) (bool): do not generate Dockerfiles and do not recreate Docker image (must exist)
(docker_noregenerate) (bool): do not generate Dockerfiles
(docker_norecreate) (bool): do not recreate Docker image

(docker_cfg) (str): if True, show all available basic docker configurations, otherwise pre-select one
(docker_cfg_uid) (str): if True, select docker configuration with this UID

(docker_path) (str): where to create or find Dockerfile
(docker_gh_token) (str): GitHub token for private repositories
(docker_save_script) (str): if !='' name of script to save docker command
(docker_interactive) (bool): if True, run in interactive mode
(docker_it) (bool): the same as `docker_interactive`
(docker_detached) (bool): detach Docker
(docker_dt) (bool) the same as `docker_detached`

(docker_base_image) (str): force base image
(docker_os) (str): force docker OS (default: ubuntu)
(docker_os_version) (str): force docker OS version (default: 22.04)
(docker_image_tag_extra) (str): add extra tag (default:-latest)

(docker_cm_repo) (str): force CM automation repository when building Docker (default: cm4mlops)
(docker_cm_repos)
(docker_cm_repo_flags)

(dockerfile_env)

(docker_skip_cm_sys_upgrade) (bool): if True, do not install CM sys deps

(docker_extra_sys_deps)

(fake_run_deps)
(docker_run_final_cmds)

(all_gpus)
(num_gpus)

(docker_device)

(docker_port_maps)

(docker_shm_size)

(docker_extra_run_args)


Returns:
(CM return dict):

automation/script/module_misc.py (30 additions, 27 deletions)
@@ -1335,15 +1335,9 @@ def dockerfile(i):
Args:
(CM input dict):

(out) (str): if 'con', output to console

parsed_artifact (list): prepared in CM CLI or CM access function
[ (artifact alias, artifact UID) ] or
[ (artifact alias, artifact UID), (artifact repo alias, artifact repo UID) ]

(repos) (str): list of repositories to search for automations

(output_dir) (str): output directory (./ by default)
(out) (str): if 'con', output to console
(repos) (str): list of repositories to search for automations
(output_dir) (str): output directory (./ by default)

Returns:
(CM return dict):
@@ -1632,15 +1626,6 @@ def docker(i):

(out) (str): if 'con', output to console

(docker_skip_build) (bool): do not generate Dockerfiles and do not recreate Docker image (must exist)
(docker_noregenerate) (bool): do not generate Dockerfiles
(docker_norecreate) (bool): do not recreate Docker image

(docker_path) (str): where to create or find Dockerfile
(docker_gh_token) (str): GitHub token for private repositories
(docker_save_script) (str): if !='' name of script to save docker command
(docker_interactive) (bool): if True, run in interactive mode
(docker_cfg) (str): if True, show all available basic docker configurations, otherwise pre-select one

Returns:
(CM return dict):
@@ -1653,6 +1638,20 @@ def docker(i):
import copy
import re

from cmind import __version__ as current_cm_version

self_module = i['self_module']

if type(i.get('docker', None)) == dict:
# Grigori started cleaning and refactoring this code on 20240929
#
# 1. use --docker dictionary instead of --docker_{keys}

if utils.compare_versions(current_cm_version, '2.3.8.1') >= 0:
docker_params = utils.convert_dictionary(i['docker'], 'docker')
i.update(docker_params)
del(i['docker'])

quiet = i.get('quiet', False)

detached = i.get('docker_detached', '')
@@ -1670,13 +1669,12 @@ def docker(i):

# Check simplified CMD: cm docker script "python app image-classification onnx"
# If artifact has spaces, treat them as tags!
self_module = i['self_module']
self_module.cmind.access({'action':'detect_tags_in_artifact', 'automation':'utils', 'input':i})

# CAREFUL -> artifacts and parsed_artifacts are not supported in input (and should not be?)
if 'artifacts' in i: del(i['artifacts'])
if 'parsed_artifacts' in i: del(i['parsed_artifacts'])

# Prepare "clean" input to replicate command
r = self_module.cmind.access({'action':'prune_input', 'automation':'utils', 'input':i, 'extra_keys_starts_with':['docker_']})
i_run_cmd_arc = r['new_input']
@@ -1693,13 +1691,19 @@ def docker(i):

# Check available configurations
docker_cfg = i.get('docker_cfg', '')
if docker_cfg != '':
docker_cfg_uid = i.get('docker_cfg_uid', '')

if docker_cfg != '' or docker_cfg_uid != '':
# Check if docker_cfg is turned on but not selected
if type(docker_cfg) == bool or str(docker_cfg).lower() in ['true','yes']:
docker_cfg= ''

r = self_module.cmind.access({'action':'select_cfg', 'automation':'utils,dc2743f8450541e3',
'tags':'basic,docker,configurations', 'title':'docker', 'alias':docker_cfg})

r = self_module.cmind.access({'action':'select_cfg',
'automation':'utils,dc2743f8450541e3',
'tags':'basic,docker,configurations',
'title':'docker',
'alias':docker_cfg,
'uid':docker_cfg_uid})
if r['return'] > 0:
if r['return'] == 16:
return {'return':1, 'error':'Docker configuration {} was not found'.format(docker_cfg)}
@@ -1708,10 +1712,9 @@ def docker(i):
selection = r['selection']

docker_input_update = selection['meta']['input']

i.update(docker_input_update)


########################################################################################
# Run dockerfile
if not noregenerate_docker_file:
@@ -1722,7 +1725,7 @@ def docker(i):
cur_dir = os.getcwd()

console = i.get('out') == 'con'

# Search for script(s)
r = aux_search({'self_module': self_module, 'input': i})
if r['return']>0: return r
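The updated `docker(i)` accepts either `docker_cfg` (an alias, or a bare switch meaning "show me the list") or the new `docker_cfg_uid`, and then merges the chosen configuration's `input` section into `i`. The following sketch isolates those two steps; `normalize_cfg_request` and `apply_selection` are hypothetical names, and the lookup itself (the `select_cfg` call) is left out.

```python
def normalize_cfg_request(i):
    """Return (alias, uid) to look up, or None when no Docker configuration was requested."""
    docker_cfg = i.get("docker_cfg", "")
    docker_cfg_uid = i.get("docker_cfg_uid", "")
    if docker_cfg == "" and docker_cfg_uid == "":
        return None
    # A bare switch (True, "true", "yes") means "let me choose", not a configuration alias
    if isinstance(docker_cfg, bool) or str(docker_cfg).lower() in ("true", "yes"):
        docker_cfg = ""
    return docker_cfg, docker_cfg_uid

def apply_selection(i, selection):
    """Overlay the selected configuration's 'input' section on the user input."""
    merged = dict(i)
    merged.update(selection["meta"]["input"])  # configuration values take precedence
    return merged

# Selection shaped like cfg/docker-basic-configurations/basic-ubuntu-24.04.yaml
SELECTION = {
    "meta": {
        "uid": "12e86eb386314866",
        "name": "Basic Ubuntu 24.04",
        "input": {
            "docker_base_image": "ubuntu:24.04",
            "docker_os": "ubuntu",
            "docker_os_version": "24.04",
        },
    }
}
```

Because the merge is a plain `dict.update`, values from the configuration file override matching keys given on the command line, such as `docker_os_version`.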
automation/utils/module_cfg.py (36 additions, 26 deletions)
@@ -230,16 +230,18 @@ def select_cfg(i):
self_module = i['self_module']
tags = i['tags']
alias = i.get('alias', '')
uid = i.get('uid', '')
title = i.get('title', '')

# Check if alias is not provided
r = self_module.cmind.access({'action':'find', 'automation':'cfg', 'tags':'basic,docker,configurations'})
if r['return'] > 0: return r

lst = r['list']

selector = []

# Do coarse-grain search for CM artifacts
for l in lst:
p = l.path

@@ -257,45 +259,53 @@ def select_cfg(i):
if not f.startswith('_cm') and (f.endswith('.json') or f.endswith('.yaml')):
selector.append({'path':os.path.join(p, f), 'alias':f[:-5]})

if len(selector) == 0:
return {'return':16, 'error':'configuration was not found'}

select = 0
if len(selector) > 1:
xtitle = ' ' + title if title!='' else ''
print ('')
print ('Available{} configurations:'.format(xtitle))

print ('')
# Load meta for name and UID
selector_with_meta = []
for s in range(0, len(selector)):
ss = selector[s]

for s in range(0, len(selector)):
ss = selector[s]
path = ss['path']

path = ss['path']
full_path_without_ext = path[:-5]

full_path_without_ext = path[:-5]
r = cmind.utils.load_yaml_and_json(full_path_without_ext)
if r['return']>0:
print ('Warning: problem loading configuration file {}'.format(path))

r = cmind.utils.load_yaml_and_json(full_path_without_ext)
if r['return']>0:
print ('Warning: problem loading configuration file {}'.format(path))
meta = r['meta']

meta = r['meta']
if uid == '' or meta.get('uid', '') == uid:
ss['meta'] = meta
selector_with_meta.append(ss)

# Quit if no configurations found
if len(selector_with_meta) == 0:
return {'return':16, 'error':'configuration was not found'}

selector = sorted(selector, key = lambda x: x['meta'].get('name',''))
select = 0
if len(selector_with_meta) > 1:
xtitle = ' ' + title if title!='' else ''
print ('')
print ('Available{} configurations:'.format(xtitle))

print ('')

selector_with_meta = sorted(selector_with_meta, key = lambda x: x['meta'].get('name',''))
s = 0
for ss in selector:
for ss in selector_with_meta:
alias = ss['alias']
name = ss['meta'].get('name','')
uid = ss['meta'].get('uid', '')
name = ss['meta'].get('name', '')

x = name
if x!='': x+=' '
x += '('+alias+')'
print ('{}) {}'.format(s, x))
x += '(' + uid + ')'

print (f'{s}) {x}'.format(s, x))

s+=1

print ('')
select = input ('Enter configuration number of press Enter for 0: ')

@@ -306,6 +316,6 @@ def select_cfg(i):
if select<0 or select>=len(selector):
return {'return':1, 'error':'selection is out of range'}

ss = selector[select]
ss = selector_with_meta[select]

return {'return':0, 'selection':ss}
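In the updated `select_cfg`, configurations matching the requested UID are collected into `selector_with_meta` and the chosen entry is read from that list, but, as far as the flattened diff shows, the unchanged range check still compares against `len(selector)`. When the UID filter leaves fewer entries than `selector`, an index could pass that check and still be out of range. The sketch below keeps filtering, sorting and validation on one list. It takes already-loaded metadata instead of reading YAML/JSON files (so it sidesteps files that fail to load), replaces the interactive `input()` prompt with a `choice` argument, and returns the metadata rather than the path/alias record, so it is an illustration rather than a drop-in replacement.

```python
def select_cfg(configs, uid="", choice=0):
    """Filter loaded configurations by UID, sort them by name and return the chosen one.

    `configs` is a list of already-loaded meta dicts; `choice` stands in for the
    interactive input() prompt so the sketch can run unattended.
    """
    candidates = [c for c in configs if uid == "" or c.get("uid", "") == uid]
    if not candidates:
        return {"return": 16, "error": "configuration was not found"}

    candidates.sort(key=lambda c: c.get("name", ""))

    if len(candidates) > 1:
        for n, c in enumerate(candidates):
            name = c.get("name", "")
            label = f"{name} ({c.get('uid', '')})" if name else f"({c.get('uid', '')})"
            print(f"{n}) {label}")
    else:
        choice = 0

    # Validate against the filtered list that is actually indexed below
    if choice < 0 or choice >= len(candidates):
        return {"return": 1, "error": "selection is out of range"}

    return {"return": 0, "selection": candidates[choice]}

# UIDs and names taken from the two configuration files in this PR
CONFIGS = [
    {"uid": "854e65fb31584d63",
     "name": "Nvidia Ubuntu 20.04 CUDA 11.8 cuDNN 8.6.0 PyTorch 1.13.0 (pytorch:22.10)"},
    {"uid": "12e86eb386314866", "name": "Basic Ubuntu 24.04"},
]
```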
cfg/benchmark-run-mlperf-inference-v4.1/_cm.yaml (39 additions, 0 deletions)
@@ -0,0 +1,39 @@
alias: benchmark-run-mlperf-inference-v4.1
uid: b7e89771987d4168

automation_alias: cfg
automation_uid: 88dce9c160324c5d

tags:
- benchmark
- run
- mlperf
- inference
- v4.1

name: "MLPerf inference - v4.1"

supported_compute:
- ee8c568e0ac44f2b
- fe379ecd1e054a00
- d8f06040f7294319

bench_uid: 39877bb63fb54725

view_dimensions:
- - input.device
- "MLPerf device"
- - input.implementation
- "MLPerf implementation"
- - input.backend
- "MLPerf backend"
- - input.model
- "MLPerf model"
- - input.scenario
- "MLPerf scenario"
- - input.host_os
- "Host OS"
- - output.state.cm-mlperf-inference-results-last.performance
- "Got performance"
- - output.state.cm-mlperf-inference-results-last.accuracy
- "Got accuracy"
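Each `view_dimensions` entry above appears to pair a dotted key path into a run's input/output with a column label, presumably for tabulating benchmark results. How CM actually consumes this list is not shown in this PR, so the sketch below is a hypothetical reading; `resolve` and `build_row` are illustrative names, and the run record is made up with no benchmark output filled in.

```python
def resolve(record, dotted_path):
    """Follow a dotted key path through nested dicts; return None if any key is missing."""
    node = record
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# A subset of view_dimensions from the _cm.yaml above: [key path, column label]
VIEW_DIMENSIONS = [
    ["input.device", "MLPerf device"],
    ["input.model", "MLPerf model"],
    ["input.scenario", "MLPerf scenario"],
    ["output.state.cm-mlperf-inference-results-last.performance", "Got performance"],
]

def build_row(record, dimensions):
    """Map each column label to the value found at its key path (None if absent)."""
    return {label: resolve(record, path) for path, label in dimensions}

# Hypothetical run record; no results are filled in
record = {"input": {"device": "cpu", "model": "resnet50", "scenario": "Offline"},
          "output": {"state": {}}}
row = build_row(record, VIEW_DIMENSIONS)
```

Splitting on "." means a path cannot address a key that itself contains a dot (such as a tag like `v4.1`); the paths in this file avoid that.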
cfg/docker-basic-configurations/basic-ubuntu-24.04.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
uid: 12e86eb386314866

name: "Basic Ubuntu 24.04"

input:
docker_base_image: 'ubuntu:24.04'
docker_os: ubuntu
docker_os_version: '24.04'

@@ -1,6 +1,8 @@
uid: 854e65fb31584d63

name: "Nvidia Ubuntu 20.04 CUDA 11.8 cuDNN 8.6.0 PyTorch 1.13.0"
name: "Nvidia Ubuntu 20.04 CUDA 11.8 cuDNN 8.6.0 PyTorch 1.13.0 (pytorch:22.10)"

ref_url: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-22-10.html

input:
docker_base_image: 'nvcr.io/nvidia/pytorch:22.10-py3'