Tutel v0.1.4
What's New in v0.1.4:
- Enhance communication features: all-to-all (a2a) overlap with computation, support for different granularities of group creation, etc.
- Add a single-thread CPU implementation for correctness checking and reference.
- Refine the JIT compiler interface for more flexible use: jit::inject_source and jit::jit_execute.
- Enhance examples: fp64 support, CUDA AMP, checkpointing, etc.
- Support execution inside torch.distributed.pipeline.
How to Set Up:
python3 -m pip install --user https://github.com/microsoft/tutel/archive/refs/tags/v0.1.4.tar.gz
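After installing, a quick check that the package is visible to the interpreter can save some debugging time. The sketch below uses only the Python standard library; it assumes the archive installs a top-level module named `tutel`, and the helper name `is_importable` is purely illustrative.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the installed top-level module is named "tutel".
    print("tutel importable:", is_importable("tutel"))
```

If this prints `False`, the install most likely went to a different Python environment than the one running the check.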
Contributors: @yzygitzh, @ghostplant, @EricWangCN