Change Log

MLBench Core

v3.0.0

v3.0.0 (2020-12-07)

Full Changelog

Implemented enhancements:

  • Support multiple clusters in CLI #91
  • Add notebook/code to visualize results #72
  • Support AWS in CLI #33
  • Fix rnn language model #303 (ehoelzl)
  • Transformer language translation #99 (ehoelzl)

Fixed bugs:

  • Training code keeps running for PyTorch after training is done #26

Closed issues:

  • Remove loss argument for metric computation #295
  • Update PyTorch to 1.7 #286
  • Refactor optimizer and choose more appropriate names #284
  • fails to create kind cluster #277
  • Refactor CLI #253
  • Dependabot couldn't authenticate with https://pypi.python.org/simple/ #252
  • Unify requirements/setup.py versions #244
  • isort failing on all PRs #227
  • torch.div is not supported in PyTorch 1.6 #223
  • Refactor common functionality for tiller and helm #108
  • Add GPU support for AWS in CLI #104
  • Change CPU limit to #CPUs - 1 #101
  • Add --version flag #97
  • Cluster creation/deletion errors with non-default zone #94
  • Add command to list runs #86
  • RefreshError from gcloud #83
  • Run new benchmarks and document costs #82
  • Make nvidia k80 default GPU #80
  • Fix random seeds #79
  • benchmark against torch.nn.parallel.DistributedDataParallel MPSG #75
  • upgrade to pytorch 1.5 #74
  • Provide comparison to competitors #66
  • Add some integration tests #64
  • Remove stale branches #62
  • Add PowerSGD optimizer #59
  • Add RNN Language Model #54
  • Use torch.nn.DataParallel for intra-node computation #46
  • Add CLI support for DIND #42
  • Port over functionality from Language Model benchmark to the core library #34
  • make results reproducible from command-line #24
  • Contribution and docs section on README.md #17
  • test new torch.distributed #15

Merged pull requests:

v2.4.0

v2.4.0 (2020-04-20)

Full Changelog

Implemented enhancements:

  • Switch to black for code formatting #35

Closed issues:

  • Travis tests run only for Python 3.6 #65
  • Downloading results fails if --output option is not provided #57
  • Remember user input in mlbench run #56
  • Aggregate the gradients by model, instead of by layers. #45
  • Update docker images to CUDA10, mlbench-core module to newest #43
  • Upgrade PyTorch to 1.4 #40

Merged pull requests:

v2.3.2

v2.3.2 (2020-04-07)

Full Changelog

Implemented enhancements:

  • Add NCCL & GLOO Backend support #49
  • Add NCCL & GLOO Backend support #47 (giorgiosav)

Fixed bugs:

  • math ValueError with 1-node cluster #38

Merged pull requests:

v2.3.1

v2.3.1 (2020-03-09)

Full Changelog

Implemented enhancements:

  • Customize Communication Scheme For Sparsified/Quantized/Decentralized scenarios #12

v2.3.0

v2.3.0 (2019-12-23)

Full Changelog

v2.2.1

v2.2.1 (2019-12-16)

Full Changelog

v2.2.0

v2.2.0 (2019-11-11)

Full Changelog

Implemented enhancements:

  • initialize_backends can now be called as a context manager
  • Improved CLI to run multiple runs in parallel

v2.1.1

v2.1.1 (2019-11-11)

Full Changelog

v2.1.0

v2.1.0 (2019-11-04)

Full Changelog

Implemented enhancements:

  • Added CLI for MLBench runs

v2.0.0

v2.0.0 (2019-06-13)

Full Changelog

v1.4.4

v1.4.4 (2019-05-28)

Full Changelog

v1.4.3

v1.4.3 (2019-05-23)

Full Changelog

v1.4.2

v1.4.2 (2019-05-21)

Full Changelog

v1.4.1

v1.4.1 (2019-05-16)

Full Changelog

v1.4.0

v1.4.0 (2019-05-02)

Full Changelog

Implemented enhancements:

  • Split Train and Validation in Tensorflow #22

v1.3.4

v1.3.4 (2019-03-20)

Full Changelog

Implemented enhancements:

  • in controlflow, don't mix train and validation #20

Fixed bugs:

  • Add metrics logging for Tensorflow #19

v1.3.3

v1.3.3 (2019-02-26)

Full Changelog

v1.3.2

v1.3.2 (2019-02-13)

Full Changelog

v1.3.1

v1.3.1 (2019-02-13)

Full Changelog

v1.3.0

v1.3.0 (2019-02-12)

Full Changelog

v1.2.1

v1.2.1 (2019-01-31)

Full Changelog

v1.2.0

v1.2.0 (2019-01-30)

Full Changelog

v1.1.1

v1.1.1 (2019-01-09)

Full Changelog

v1.1.0

v1.1.0 (2018-12-06)

Full Changelog

Fixed bugs:

  • Bug when saving checkpoints #13

Implemented enhancements:

  • Adds TensorFlow Controlflow, Dataset and Model code
  • Adds PyTorch linear models
  • Adds sparsified and decentralized optimizers

v1.0.0

v1.0.0 (2018-11-15)

Implemented enhancements:

  • Add API Client to mlbench-core #6
  • Move to google-style docs #4
  • Add Imagenet Dataset for pytorch #3
  • Move worker code to mlbench-core repo #1

MLBench Helm

v3.0.0

v3.0.0 (2020-12-07)

Full Changelog

Implemented enhancements:

  • Add DIND Setup Script #4
  • Add Amazon Cloud setup script #3

Closed issues:

  • Add integration tests for newer versions of Kubernetes #23
  • Add deployment on KIND rather than Minikube #21
  • Use of GCloud script #19
  • Can not configure NVIDIA on AWS #17
  • Migrate to Kubernetes API v1 #15
  • Deployment on minikube requires kubernetes 1.15 #13
  • Remove obsolete info in values.yaml #12
  • mlbench worker pods not created #11

Merged pull requests:

v2.0.0

Implemented enhancements:

  • Added GKE and AWS Setup Scripts

MLBench Dashboard

v3.0.0

v3.0.0 (2020-12-07)

Full Changelog

Implemented enhancements:

  • Allow running of custom code #9
  • Define Job resource for mpirun execution #2
  • Create Kubernetes Job to execute mpirun #1

Closed issues:

  • Add integration tests #86
  • Dependabot couldn't authenticate with https://pypi.python.org/simple/ #74
  • Fix dashboard scheduling #49
  • Add ability to stop run before end #48
  • Make sure all results are well zipped #44
  • Prevent user from inserting invalid run names #28
  • Travis tests run only for Python 3.6 #24
  • Remove stale branches #23

Merged pull requests:

v2.0.0

Implemented enhancements:

  • Added Download of Task Goals
  • Fixed some performance issues

v1.1.0

Implemented enhancements:

  • Added new Tensorflow Benchmark Image
  • Removed bandwidth limiting
  • Added ability to run custom images in dashboard

MLBench Benchmarks

v3.0.0

v3.0.0 (2020-12-07)

Full Changelog

Implemented enhancements:

  • Update PyTorch base to 1.7 #64
  • Add NLP/machine translation Transformer benchmark task #33
  • Repair Logistic regression Model #30
  • Add NLP/machine translation RNN benchmark task #27
  • Add NLP benchmark images & task #24
  • Add Gloo support to PyTorch images #23
  • Add NCCL support to PyTorch images #22
  • documentation: clearly link ref code to benchmark tasks #14
  • Add time-to-accuracy speedup plot #7
  • Update GKE documentation to use kubernetes version 1.10.9 #4
  • Add tensorflow cifar10 benchmark #3
  • Transformer language translation #51 (ehoelzl)

Fixed bugs:

  • Change Tensorflow Benchmark to use OpenMPI #8

Closed issues:

  • Clean-up tasks #63
  • Support for local run #59
  • task implementations: delete choco, name tasks nlp/language-model and nlp/translation #55
  • remove open/closed division distinction #47
  • [Not an Issue] Comparing 3 backends on multi-node single-gpu env #44
  • Create light version of the base image for development #43
  • No unit tests #40
  • Remove stale branches #39
  • Remove Communication backend from image name #36
  • pytorch 1.4 #34
  • create light version (in addition to full) for resource heavy benchmark tasks #19
  • add script to compute official results from raw results (time to acc for example) #18

Merged pull requests:

v2.0.0

Implemented enhancements:

  • Added Goals to PyTorch Benchmark
  • Updated PyTorch Tutorial code
  • Changed all images to the newest mlbench-core version

v1.1.0

Implemented enhancements:

  • Added Tensorflow Benchmark