LLM Training

General

  • The Llama 3 Herd of Models [paper]
  • TorchScale - A Library for Transformers at (Any) Scale [GitHub]
  • DLRover: An Automatic Distributed Deep Learning System [GitHub]

2024

  • FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision [paper]
  • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs [paper]
  • ByteCheckpoint: A Unified Checkpointing System for LLM Development [paper]