v3.0.0 (2020-12-07)
Implemented enhancements:
- Support multiple clusters in CLI #91
- Add notebook/code to visualize results #72
- Support AWS in CLI #33
- Fix rnn language model #303 (ehoelzl)
- Transformer language translation #99 (ehoelzl)
Fixed bugs:
- Training code keeps running for PyTorch after training is done #26
Closed issues:
- Remove loss argument for metric computation #295
- Update PyTorch to 1.7 #286
- Refactor optimizer and choose more appropriate names #284
- fails to create kind cluster #277
- Refactor CLI #253
- Dependabot couldn't authenticate with https://pypi.python.org/simple/ #252
- Unify requirements/setup.py versions #244
- isort failing on all PRs #227
- torch.div is not supported in PyTorch 1.6 #223
- Refactor common functionality for tiller and helm #108
- Add GPU support for AWS in CLI #104
- Change CPU limit to #CPUs - 1 #101
- Add --version flag #97
- Cluster creation/deletion errors with non-default zone #94
- Add command to list runs #86
- RefreshError from gcloud #83
- Run new benchmarks and document costs #82
- Make nvidia k80 default GPU #80
- Fix random seeds #79
- benchmark against torch.nn.parallel.DistributedDataParallel MPSG #75
- upgrade to pytorch 1.5 #74
- Provide comparison to competitors #66
- Add some integration tests #64
- Remove stale branches #62
- Add PowerSGD optimizer #59
- Add RNN Language Model #54
- Use torch.nn.DataParallel for intra-node computation #46
- Add CLI support for DIND #42
- Port over functionality from Language Model benchmark to the core library #34
- make results reproducible from command-line #24
- Contribution and docs section on README.md #17
- test new torch.distributed #15
Merged pull requests:
- Bugfix KIND cli #307 (ehoelzl)
- Update README.md to show new badge #306 (ehoelzl)
- Create manual.yml #305 (ehoelzl)
- Switch to github actions #304 (ehoelzl)
- Bump sphinx from 3.3.0 to 3.3.1 #301 (dependabot[bot])
- Remove loss from metric argument #297 (ehoelzl)
- Fix translators #294 (ehoelzl)
- Update pytorch #292 (ehoelzl)
- Bump sphinx from 3.2.1 to 3.3.0 in /docs #288 (dependabot[bot])
- Refactor optimizers #285 (ehoelzl)
- Bump isort from 5.5.4 to 5.6.4 #283 (dependabot[bot])
- Bump sphinx-autoapi from 1.5.0 to 1.5.1 #280 (dependabot[bot])
- Add gpu functionality on AWS #278 (mmilenkoski)
- Catch exceptions when creating/deleting clusters #276 (ehoelzl)
- Fix doc #275 (ehoelzl)
- Fix AWS deployment #274 (mmilenkoski)
- Create dependabot.yml #260 (ehoelzl)
- Merge requirements & Update doc #259 (ehoelzl)
- Bump google-api-python-client from 1.9.3 to 1.12.1 #246 (dependabot-preview[bot])
- Bump numpy from 1.19.0 to 1.19.2 #245 (dependabot-preview[bot])
- Bump boto3 from 1.14.6 to 1.14.50 #234 (dependabot-preview[bot])
- Fix isort errors #233 (mmilenkoski)
- Bump pytest-mock from 3.1.1 to 3.3.1 #231 (dependabot-preview[bot])
- Bump isort from 4.3.21 to 5.4.2 #221 (dependabot-preview[bot])
- Bump sphinx from 3.0.4 to 3.2.1 #220 (dependabot-preview[bot])
- Bump grpcio from 1.29.0 to 1.31.0 #207 (dependabot-preview[bot])
- Bump spacy from 2.3.0 to 2.3.2 #182 (dependabot-preview[bot])
- Downgrade Sphinx #162 (ehoelzl)
- Add developer docs #161 (Panaetius)
- Fp optimizer changes #160 (ehoelzl)
- Bump wcwidth from 0.1.9 to 0.2.5 #156 (dependabot-preview[bot])
- Bump all versions and add doc test #152 (Panaetius)
- Bump torchvision from 0.6.0 to 0.6.1 #151 (dependabot-preview[bot])
- Bump numpy from 1.18.5 to 1.19.0 #150 (dependabot-preview[bot])
- Bump torch from 1.5.0 to 1.5.1 #148 (dependabot-preview[bot])
- Bump google-auth from 1.17.2 to 1.18.0 #147 (dependabot-preview[bot])
- Bump sphinx-rtd-theme from 0.4.3 to 0.5.0 #144 (dependabot-preview[bot])
- Bump spacy from 2.2.4 to 2.3.0 #142 (dependabot-preview[bot])
- Bump sphinx from 3.1.0 to 3.1.1 #140 (dependabot-preview[bot])
- Bump dill from 0.3.1.1 to 0.3.2 #138 (dependabot-preview[bot])
- Update dependencies #137 (Panaetius)
- Bump spacy from 2.2.3 to 2.2.4 #135 (dependabot-preview[bot])
- Bump numpy from 1.16.6 to 1.18.5 #133 (dependabot-preview[bot])
- Bump freezegun from 0.3.12 to 0.3.15 #129 (dependabot-preview[bot])
- Bump tabulate from 0.8.6 to 0.8.7 #128 (dependabot-preview[bot])
- Bump deprecation from 2.0.6 to 2.1.0 #125 (dependabot-preview[bot])
- Bump pytest-black from 0.3.8 to 0.3.9 #124 (dependabot-preview[bot])
- Bump sphinx-rtd-theme from 0.4.2 to 0.4.3 #123 (dependabot-preview[bot])
- Bump sphinx from 1.8.1 to 3.1.0 #121 (dependabot-preview[bot])
- Bump pytest-mock from 1.10.0 to 3.1.1 #120 (dependabot-preview[bot])
- Bump torchtext from 0.5.0 to 0.6.0 #118 (dependabot-preview[bot])
- Bump torchvision from 0.5.0 to 0.6.0 #117 (dependabot-preview[bot])
- Adds support for multiple clusters #115 (Panaetius)
- Bump click from 7.0 to 7.1.2 #114 (dependabot-preview[bot])
- Bump google-cloud-container from 0.3.0 to 0.5.0 #113 (dependabot-preview[bot])
- Bump appdirs from 1.4.3 to 1.4.4 #112 (dependabot-preview[bot])
- Bump sphinxcontrib-bibtex from 0.4.0 to 1.0.0 #111 (dependabot-preview[bot])
- Bump sphinx-autoapi from 1.3.0 to 1.4.0 #110 (dependabot-preview[bot])
- Remove unused arguments in create_aws #109 (mmilenkoski)
- Fix Random seeds, Add new tracker stats #107 (ehoelzl)
- Add return_code check in test_cli #106 (mmilenkoski)
- Add AWS support in CLI #103 (mmilenkoski)
- Update test_cli.py #100 (giorgiosav)
- Adds a chart command to cli #95 (Panaetius)
- Add support for kind cluster creation in the CLI #93 (mmilenkoski)
v2.4.0 (2020-04-20)
Implemented enhancements:
- Switch to black for code formatting #35
Closed issues:
- Travis tests run only for Python 3.6 #65
- Downloading results fails if --output option is not provided #57
- Remember user input in mlbench run #56
- Aggregate the gradients by model, instead of by layers. #45
- Update docker images to CUDA10, mlbench-core module to newest #43
- Upgrade PyTorch to 1.4 #40
Merged pull requests:
- Pytorch v1.4.0 #68 (ehoelzl)
- Fix ci #67 (ehoelzl)
- Add aggregation by model #61 (ehoelzl)
- Remember user input in mlbench run #60 (mmilenkoski)
- Add default name of output file in CLI #58 (mmilenkoski)
- Cli adaptation #55 (ehoelzl)
- Update tags and patch version to 2.3.2 #52 (ehoelzl)
- Add get_optimizer to create optimizer object #48 (mmilenkoski)
v2.3.2 (2020-04-07)
Implemented enhancements:
- Add NCCL & GLOO Backend support #49
- Add NCCL & GLOO Backend support #47 (giorgiosav)
Fixed bugs:
- math ValueError with 1-node cluster #38
Merged pull requests:
- num_workers fix #51 (giorgiosav)
- Adds centralized Adam implementation #41 (mmilenkoski)
v2.3.1 (2020-03-09)
Implemented enhancements:
- Customize Communication Scheme For Sparsified/Quantized/Decentralized scenarios #12
v2.3.0 (2019-12-23)
v2.2.1 (2019-12-16)
v2.2.0 (2019-11-11)
Implemented enhancements:
- initialize_backends can now be called as a context manager
- Improved CLI to run multiple runs in parallel
v2.1.1 (2019-11-11)
v2.1.0 (2019-11-04)
Implemented enhancements:
- Added CLI for MLBench runs
v2.0.0 (2019-06-13)
v1.4.4 (2019-05-28)
v1.4.3 (2019-05-23)
v1.4.2 (2019-05-21)
v1.4.1 (2019-05-16)
v1.4.0 (2019-05-02)
Implemented enhancements:
- Split Train and Validation in Tensorflow #22
v1.3.4 (2019-03-20)
Implemented enhancements:
- in controlflow, don't mix train and validation #20
Fixed bugs:
- Add metrics logging for Tensorflow #19
v1.3.3 (2019-02-26)
v1.3.2 (2019-02-13)
v1.3.1 (2019-02-13)
v1.3.0 (2019-02-12)
v1.2.1 (2019-01-31)
v1.2.0 (2019-01-30)
v1.1.1 (2019-01-09)
v1.1.0 (2018-12-06)
Fixed bugs:
- Bug when saving checkpoints #13
Implemented enhancements:
- Adds Tensorflow Controlflow, Dataset and Model code
- Adds Pytorch linear models
- Adds sparsified and decentralized optimizers
v1.0.0 (2018-11-15)
Implemented enhancements:
- Add API Client to mlbench-core #6
- Move to google-style docs #4
- Add Imagenet Dataset for pytorch #3
- Move worker code to mlbench-core repo #1
v3.0.0 (2020-12-07)
Closed issues:
- Add integration tests for newer versions of Kubernetes #23
- Add deployment on KIND rather than Minikube #21
- Use of GCloud script #19
- Can not configure NVIDIA on AWS #17
- Migrate to Kubernetes API v1 #15
- Deployment on minikube requires kubernetes 1.15 #13
- Remove obsolete info in values.yaml #12
- mlbench worker pods not created #11
Merged pull requests:
- Add workflow #25 (ehoelzl)
- Update to v1 #24 (ehoelzl)
- Update doc requirements #22 (ehoelzl)
- Remove AWS and GCloud scripts #20 (ehoelzl)
- Removes unused entries from values.yaml #18 (Panaetius)
- Switch to eksctl for aws deployment #16 (mmilenkoski)
- Add setup script for kind with local registry #14 (mmilenkoski)
Implemented enhancements:
- Added GKE and AWS Setup Scripts
v3.0.0 (2020-12-07)
Implemented enhancements:
- Allow running of custom code #9
- Define Job resource for mpirun execution #2
- Create Kubernetes Job to execute mpirun #1
Closed issues:
- Add integration tests #86
- Dependabot couldn't authenticate with https://pypi.python.org/simple/ #74
- Fix dashboard scheduling #49
- Add ability to stop run before end #48
- Make sure all results are well zipped #44
- Prevent user from inserting invalid run names #28
- Travis tests run only for Python 3.6 #24
- Remove stale branches #23
Merged pull requests:
- Switch to actions #121 (ehoelzl)
- Bump sphinx from 3.3.0 to 3.3.1 in /docs #120 (dependabot[bot])
- Fix stream disconnection #115 (ehoelzl)
- Update images #114 (ehoelzl)
- Fix integration tests #113 (ehoelzl)
- Bump rq-scheduler from 0.8.3 to 0.10.0 #109 (dependabot[bot])
- Bump sphinx from 3.2.1 to 3.3.0 in /docs #108 (dependabot[bot])
- Bump fakeredis from 1.4.3 to 1.4.4 #102 (dependabot-preview[bot])
- Bump pytest from 6.0.2 to 6.1.2 #101 (dependabot-preview[bot])
- Bump pytest-django from 3.10.0 to 4.1.0 #100 (dependabot-preview[bot])
- Bump tox from 3.20.0 to 3.20.1 #96 (dependabot-preview[bot])
- Change 'Benchmarks' to 'Benchmark Implementations' #93 (ehoelzl)
- Add integration tests #91 (ehoelzl)
- Bump pytest-kind from 20.5.3 to 20.10.0 #89 (dependabot-preview[bot])
- Add tests #75 (ehoelzl)
- Bugfix #60 (ehoelzl)
- Bump watchdog from 0.8.3 to 0.10.3 #58 (dependabot-preview[bot])
- Bump uwsgi from 2.0.17 to 2.0.19.1 #57 (dependabot-preview[bot])
- Bump sphinx from 1.7.1 to 3.1.1 #52 (dependabot-preview[bot])
- Bump tox from 2.9.1 to 3.15.2 #46 (dependabot-preview[bot])
- Bump sphinx-rtd-theme from 0.4.0 to 0.4.3 #45 (dependabot-preview[bot])
- Bump django-constance from 2.2.0 to 2.6.0 #43 (dependabot-preview[bot])
- Bump pytest-black from 0.3.8 to 0.3.9 #42 (dependabot-preview[bot])
- Bump flake8 from 3.5.0 to 3.8.3 #40 (dependabot-preview[bot])
- Bump redis from 2.10.6 to 3.5.3 #38 (dependabot-preview[bot])
- Bump pip from 10.0.1 to 20.1.1 #37 (dependabot-preview[bot])
- Bump bumpversion from 0.5.3 to 0.6.0 #34 (dependabot-preview[bot])
- Bump django from 2.2.12 to 2.2.13 #33 (dependabot[bot])
- Bump django from 2.2.12 to 2.2.13 in /Docker #32 (dependabot[bot])
- Add backend benchmark #31 (ehoelzl)
- Add transformer image #30 (ehoelzl)
Implemented enhancements:
- Added Download of Task Goals
- Fixed some performance issues
Implemented enhancements:
- Added new Tensorflow Benchmark Image
- Remove Bandwidth limiting
- Added ability to run custom images in dashboard
v3.0.0 (2020-12-07)
Implemented enhancements:
- Update PyTorch base to 1.7 #64
- Add NLP/machine translation Transformer benchmark task #33
- Repair Logistic regression Model #30
- Add NLP/machine translation RNN benchmark task #27
- Add NLP benchmark images & task #24
- Add Gloo support to PyTorch images #23
- Add NCCL support to PyTorch images #22
- documentation: clearly link ref code to benchmark tasks #14
- Add time-to-accuracy speedup plot #7
- Update GKE documentation to use kubernetes version 1.10.9 #4
- Add tensorflow cifar10 benchmark #3
- Transformer language translation #51 (ehoelzl)
Fixed bugs:
- Change Tensorflow Benchmark to use OpenMPI #8
Closed issues:
- Clean-up tasks #63
- Support for local run #59
- task implementations: delete choco, name tasks nlp/language-model and nlp/translation #55
- remove open/closed division distinction #47
- [Not an Issue] Comparing 3 backends on multi-node single-gpu env #44
- Create light version of the base image for development #43
- No unit tests #40
- Remove stale branches #39
- Remove Communication backend from image name #36
- pytorch 1.4 #34
- create light version (in addition to full) for resource heavy benchmark tasks #19
- add script to compute official results from raw results (time to acc for example) #18
Merged pull requests:
- Add workflow #68 (ehoelzl)
- Fix rnn language model #67 (ehoelzl)
- Update pytorch #65 (ehoelzl)
- Adapt optimizer imports #62 (ehoelzl)
- Translation changes #61 (ehoelzl)
- Change 'Benchmarks' to 'Benchmark Implementations' #60 (ehoelzl)
- Add generic worker #58 (ehoelzl)
- Rename tasks #57 (ehoelzl)
- Add link to task description #56 (ehoelzl)
- Fix tasks #54 (ehoelzl)
- Add backend benchmark code and image #53 (ehoelzl)
- Update nccl #52 (ehoelzl)
- Remove open/closed division from benchmarks #49 (mmilenkoski)
- Pytorch 1.5.0 #48 (giorgiosav)
- Refactor controlflow #46 (ehoelzl)
- Add Image Recognition Benchmark with DistributedDataParallel #42 (mmilenkoski)
- Pytorch v1.4.0 #41 (ehoelzl)
- Add aggregation by model #38 (ehoelzl)
- Add NCCL & GLOO support to images #35 (giorgiosav)
- Rnn language translation #32 (ehoelzl)
- Linear model #28 (ehoelzl)
- Fix ci #26 (ehoelzl)
- [WIP]Add LSTM language model #25 (Panaetius)
Implemented enhancements:
- Added Goals to PyTorch Benchmark
- Updated PyTorch Tutorial code
- Changed all images to newest mlbench-core version.
Implemented enhancements:
- Added Tensorflow Benchmark