
Releases: NVIDIA/Megatron-LM

NVIDIA Megatron Core 0.11.0rc0

20 Feb 10:43
7c00175

Prerelease: NVIDIA Megatron Core 0.11.0rc0 (2025-02-20)

NVIDIA Megatron Core 0.10.0

17 Feb 17:31
7ee599a
  • Add Multi-Latent Attention (MLA) to MCore
  • Enable FP8 for GroupedMLP
  • MoE Parallel Folding
  • Enhance MoE Architecture: Support MoE Layer Frequency Patterns and Configurable MoE FFN Hidden Size (see the configuration sketch after this list)
  • Multimodal: NVLM training and evaluation support in MCore
  • Mamba Hybrid
    • Increase performance and reduce memory footprint of Triton language/compiler distributed caching
    • Add more unit tests and fix bugs
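
The layer-frequency and FFN-size knobs are configuration options; below is a minimal sketch of how they might be set, assuming the moe_layer_freq and moe_ffn_hidden_size fields of Megatron Core's TransformerConfig (field names should be verified against the installed release).

```python
# Minimal sketch: a config where every second layer is an MoE layer and the
# per-expert FFN hidden size differs from the dense FFN hidden size.
# Field names are assumptions based on Megatron Core's TransformerConfig;
# verify them against the megatron-core version you install.
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    num_layers=24,
    hidden_size=2048,
    num_attention_heads=16,
    ffn_hidden_size=8192,        # FFN hidden size of the dense layers
    num_moe_experts=8,
    moe_ffn_hidden_size=1024,    # configurable per-expert FFN hidden size
    moe_layer_freq=2,            # layer frequency pattern: every 2nd layer is MoE
)
```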

NVIDIA Megatron Core 0.9.0

24 Oct 10:30
  • Uneven pipeline parallelism
    • Enable pipeline parallelism where the first and last ranks have fewer transformer layers than the intermediate ranks (see the sketch after this list)
  • Per layer CUDAGraph support for GPT training with Transformer Engine modules
  • Enable different TP sizes for the vision encoder
  • Enable pipeline parallelism for T5 & Llava models
  • Support multi-tile multi-image input in Llava models
  • MoE
    • FP8 support
    • Runtime upcycling support
    • Dispatcher implementation optimizations
    • Shared expert support with overlapping optimizations
    • Qwen Model support
  • Mamba Hybrid
    • Main branch is no longer compatible with released checkpoints (use ssm branch)
    • Add distributed checkpointing
    • Fix bugs related to inference
    • Add unit tests
  • Known Issues
    • When using sequence parallelism, dropout in the transformer block forward pass does not use the appropriate RNG context.
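
Uneven pipeline parallelism lets the first and last pipeline ranks, which also host the embedding and output layers, hold fewer transformer layers than the middle ranks. The sketch below only illustrates the resulting layer split; it is not a Megatron Core API.

```python
# Illustration only (not a Megatron Core API): splitting 30 transformer layers
# over 4 pipeline ranks, giving the first and last ranks fewer layers because
# they also hold the embedding and output/loss computation.
def uneven_pipeline_split(total_layers, pp_size, first_stage_layers, last_stage_layers):
    middle_layers = total_layers - first_stage_layers - last_stage_layers
    middle_ranks = pp_size - 2
    assert middle_layers % middle_ranks == 0, "middle layers must divide evenly"
    per_middle_rank = middle_layers // middle_ranks
    return [first_stage_layers] + [per_middle_rank] * middle_ranks + [last_stage_layers]

print(uneven_pipeline_split(total_layers=30, pp_size=4, first_stage_layers=6, last_stage_layers=6))
# -> [6, 9, 9, 6]
```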

NVIDIA Megatron Core 0.8.0

13 Aug 12:12
  • Multimodal
    • Added initial support for training vision language models using the LLaVA architecture
    • Added initial support for inference with multimodal inputs
    • An end-to-end multimodal example, from data collection through training to evaluation, is provided in examples/multimodal
  • MoE
    • Context Parallel support
    • Distributed checkpoint support for grouped GEMM
  • Mamba
    • Added initial support for training and inference of Mamba-2 models
    • Support for hybrid models consisting of Mamba-2, attention, and MLP layers
    • Examples provided in examples/mamba

NVIDIA Megatron Core 0.7.0

05 Jun 23:12
  • MoE
    • Token drop support
    • Several efficiency optimizations
    • Improved model parallelism
    • Memory optimizations
  • Distributed checkpointing (see the sketch after this list)
    • Enabled for Retro
    • Asynchronous checkpoint saving
  • Several minor bug fixes, speed improvements, and memory optimizations
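
Distributed checkpointing is driven by sharded state dicts. Below is a minimal sketch of the save/load flow, assuming a Megatron Core model that exposes sharded_state_dict(); only the synchronous path is shown, since the asynchronous-save entry point may differ across releases.

```python
# Minimal sketch of distributed (sharded) checkpointing with
# megatron.core.dist_checkpointing. Assumes `model` is a Megatron Core module
# exposing sharded_state_dict(); run on every rank of the training job.
from megatron.core import dist_checkpointing

def save_checkpoint(model, ckpt_dir):
    sharded_state_dict = model.sharded_state_dict()  # each rank describes its shards
    dist_checkpointing.save(sharded_state_dict, ckpt_dir)

def load_checkpoint(model, ckpt_dir):
    sharded_state_dict = model.sharded_state_dict()  # template of expected shards
    state_dict = dist_checkpointing.load(sharded_state_dict, ckpt_dir)
    model.load_state_dict(state_dict)
```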

NVIDIA Megatron Core 0.6.0

19 Apr 23:46
  • MoE (Mixture of Experts)
    • Performance optimization
      • Communication optimizations for multi-GPU and single-GPU
      • 23% improvement (323 TFLOPS/GPU) over MCore 0.5.0 on Mixtral with Hopper BF16
      • GroupedMLP enhancement for Hopper
      • DP overlapping: support for overlapping computation with gradient reduction and parameter gathering (see the sketch after this list)
    • All-to-All based Token Dispatcher
    • Layer-wise logging for load balancing loss
    • Improved expert parallel support including distributed optimizer
  • Distributed optimizer
  • RETRO
    • Data processing
  • BERT
    • Distributed checkpointing
  • Distributed checkpointing
    • PyTorch native distributed backend
    • Improved saving/loading speed
  • TensorRT-LLM Export
    • Integration with TensorRT Model Optimizer Post-training quantization (PTQ)
    • Text generation driver to perform PTQ in Megatron-LM
    • Llama2 and Nemotron3-8b examples using the TensorRT-LLM unified build API to build engines after training
  • Several minor enhancements, bug fixes, and documentation updates
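
DP overlapping hides data-parallel communication behind computation: gradient reduce-scatter overlaps with the backward pass and parameter all-gather overlaps with the forward pass when the distributed optimizer is used. Below is a minimal configuration sketch, assuming the DistributedDataParallelConfig class and field names found in later Megatron Core releases.

```python
# Minimal sketch: overlapping gradient reduction and parameter gathering with
# computation under the distributed optimizer. Class and field names are
# assumptions based on later Megatron Core releases; verify against the
# version you install.
from megatron.core.distributed import DistributedDataParallelConfig

ddp_config = DistributedDataParallelConfig(
    use_distributed_optimizer=True,  # shard optimizer state across data-parallel ranks
    overlap_grad_reduce=True,        # overlap grad reduce-scatter with backward compute
    overlap_param_gather=True,       # overlap param all-gather with forward compute
)
```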

NVIDIA Megatron Core 0.5.0

22 Mar 16:44

Key Features and Enhancements

Megatron Core documentation is now live!

Model Features

  • MoE (Mixture of Experts) (a configuration sketch follows the Model Features list)
    • Support for Z-loss, load balancing, and Sinkhorn routing
    • Layer and communications refactor
    • Richer parallelism mappings: EP can be combined with other model-parallel techniques for larger MoE variants, e.g. EP + TP + DP + SP + PP
    • Token dropless architecture with Top-K routing
    • Performance optimization with GroupedGEMM when the number of local experts is > 1
    • Distributed checkpointing
  • Interleaved rotary embedding
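
Below is a minimal sketch of a token-dropless Top-K MoE configuration with GroupedGEMM and the auxiliary losses listed above, assuming field names from Megatron Core's TransformerConfig (they may differ between releases).

```python
# Minimal sketch of a Top-K MoE configuration. Field names are assumptions
# based on Megatron Core's TransformerConfig and may differ across releases;
# verify against the installed version.
from megatron.core.transformer.transformer_config import TransformerConfig

moe_config = TransformerConfig(
    num_layers=12,
    hidden_size=1024,
    num_attention_heads=16,
    num_moe_experts=8,         # experts per MoE layer
    moe_router_topk=2,         # Top-K routing
    moe_aux_loss_coeff=1e-2,   # load-balancing auxiliary loss
    moe_z_loss_coeff=1e-3,     # router z-loss
    moe_grouped_gemm=True,     # GroupedGEMM when local experts > 1
)
```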

Datasets

  • Masked WordPiece datasets for BERT and T5
  • Raw and mock datasets

Parallelism

Performance

  • Activation offloading to CPU
  • RoPE and SwiGLU fusion
  • Sliding window attention (via Transformer Engine)

General Improvements

  • Timers

NVIDIA Megatron Core 0.4.0

14 Dec 23:18

Key Features and Enhancements

Models

  • BERT
  • RETRO
  • T5

Parallelism

  • Mixture of Experts support for GPT
  • Model-parallel-efficient Distributed Data Parallel (DDP)
  • Context Parallel (2D Tensor Parallel) support

Datasets

  • GPT Dataset
  • Blended Dataset

23.04

11 May 22:28
Merge branch 'pip_package' into 'main'

Add pip package for megatron.core (see the sketch below)

See merge request ADLR/megatron-lm!598
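
Assuming the package is published on PyPI as megatron-core, installing and importing it would look like the sketch below.

```python
# After installing the package (assumed PyPI name: megatron-core), e.g.
#   pip install megatron-core
# the core library can be imported without cloning the repository:
import megatron.core

print(megatron.core.__version__)  # assumes the package exposes __version__
```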