
Releases: NVIDIA/Megatron-LM

NVIDIA Megatron Core 0.11.0rc0

20 Feb 10:43
7c00175

Prerelease: NVIDIA Megatron Core 0.11.0rc0 (2025-02-20)

NVIDIA Megatron Core 0.10.0

17 Feb 17:31
7ee599a
  • Add Multi-Latent Attention (MLA) to MCore
  • Enable FP8 for GroupedMLP
  • MoE Parallel Folding
  • Enhance MoE Architecture: Support MoE Layer Frequency Patterns and Configurable MoE FFN Hidden Size (see the configuration sketch after this list)
  • Multimodal: NVLM training and evaluation support in MCore
  • Mamba Hybrid
    • Increase performance and reduce memory footprint of Triton language/compiler distributed caching
    • Add more unit tests and fix bugs
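
The layer-frequency and FFN-size knobs are configuration options; below is a minimal sketch of how they might be set, assuming the moe_layer_freq and moe_ffn_hidden_size fields of Megatron Core's TransformerConfig (field names should be verified against the installed release).

```python
# Minimal sketch: a config where every second layer is an MoE layer and the
# per-expert FFN hidden size differs from the dense FFN hidden size.
# Field names are assumptions based on Megatron Core's TransformerConfig;
# verify them against the megatron-core version you install.
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    num_layers=24,
    hidden_size=2048,
    num_attention_heads=16,
    ffn_hidden_size=8192,        # FFN hidden size of the dense layers
    num_moe_experts=8,
    moe_ffn_hidden_size=1024,    # configurable per-expert FFN hidden size
    moe_layer_freq=2,            # layer frequency pattern: every 2nd layer is MoE
)
```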

NVIDIA Megatron Core 0.9.0

24 Oct 10:30
  • Uneven pipeline parallelism
    • Enable pipeline parallelism where the first and last ranks have fewer transformer layers than the intermediate ranks (see the sketch after this list)
  • Per layer CUDAGraph support for GPT training with Transformer Engine modules
  • Enable different TP sizes for the vision encoder
  • Enable pipeline parallelism for T5 & Llava models
  • Support multi-tile multi-image input in Llava models
  • MoE
    • FP8 support
    • Runtime upcycling support
    • Dispatcher implementation optimizations
    • Shared expert support with overlapping optimizations
    • Qwen Model support
  • Mamba Hybrid
    • Main branch is no longer compatible with released checkpoints (use ssm branch)
    • Add distributed checkpointing
    • Fix bugs related to inference
    • Add unit tests
  • Known Issues
    • When using sequence parallelism, dropout in the transformer block forward pass does not use the appropriate RNG context.
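
Uneven pipeline parallelism lets the first and last pipeline ranks, which also host the embedding and output layers, hold fewer transformer layers than the middle ranks. The sketch below only illustrates the resulting layer split; it is not a Megatron Core API.

```python
# Illustration only (not a Megatron Core API): splitting 30 transformer layers
# over 4 pipeline ranks, giving the first and last ranks fewer layers because
# they also hold the embedding and output/loss computation.
def uneven_pipeline_split(total_layers, pp_size, first_stage_layers, last_stage_layers):
    middle_layers = total_layers - first_stage_layers - last_stage_layers
    middle_ranks = pp_size - 2
    assert middle_layers % middle_ranks == 0, "middle layers must divide evenly"
    per_middle_rank = middle_layers // middle_ranks
    return [first_stage_layers] + [per_middle_rank] * middle_ranks + [last_stage_layers]

print(uneven_pipeline_split(total_layers=30, pp_size=4, first_stage_layers=6, last_stage_layers=6))
# -> [6, 9, 9, 6]
```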

NVIDIA Megatron Core 0.8.0

13 Aug 12:12
  • Multimodal
    • Added initial support for training vision language models using the LLaVA architecture
    • Added initial support for inference with multimodal inputs
    • An end-to-end multimodal example, from data collection through training to evaluation, is provided in examples/multimodal
  • MoE
    • Context Parallel support
    • Distributed checkpoint support for grouped GEMM
  • Mamba
    • Added initial support for training and inference of Mamba-2 models
    • Support for hybrid models consisting of Mamba-2, attention, and MLP layers
    • Examples provided in examples/mamba

NVIDIA Megatron Core 0.7.0

05 Jun 23:12
  • MoE
    • Token drop support
    • Several efficiency optimizations
    • Improved model parallelism
    • Memory optimizations
  • Distributed checkpointing (see the sketch after this list)
    • Enabled for Retro
    • Asynchronous checkpoint saving
  • Several minor bug fixes, speed improvements, and memory optimizations
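
Distributed checkpointing is driven by sharded state dicts. Below is a minimal sketch of the save/load flow, assuming a Megatron Core model that exposes sharded_state_dict(); only the synchronous path is shown, since the asynchronous-save entry point may differ across releases.

```python
# Minimal sketch of distributed (sharded) checkpointing with
# megatron.core.dist_checkpointing. Assumes `model` is a Megatron Core module
# exposing sharded_state_dict(); run on every rank of the training job.
from megatron.core import dist_checkpointing

def save_checkpoint(model, ckpt_dir):
    sharded_state_dict = model.sharded_state_dict()  # each rank describes its shards
    dist_checkpointing.save(sharded_state_dict, ckpt_dir)

def load_checkpoint(model, ckpt_dir):
    sharded_state_dict = model.sharded_state_dict()  # template of expected shards
    state_dict = dist_checkpointing.load(sharded_state_dict, ckpt_dir)
    model.load_state_dict(state_dict)
```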

NVIDIA Megatron Core 0.6.0

19 Apr 23:46
  • MoE (Mixture of Experts)
    • Performance optimization
      • Communication optimizations for multi-GPU and single-GPU
      • 23% improvement (323 TFLOPS/GPU) over MCore 0.5.0 on Mixtral with Hopper BF16
      • GroupedMLP enhancement for Hopper
      • DP overlapping: support for overlapping computation with gradient reduction and parameter gathering (see the sketch after this list)
    • All-to-All based Token Dispatcher
    • Layer-wise logging for load balancing loss
    • Improved expert parallel support including distributed optimizer
  • Distributed optimizer
  • RETRO
    • Data processing
  • BERT
    • Distributed checkpointing
  • Distributed checkpointing
    • PyTorch native distributed backend
    • Improved saving/loading speed
  • TensorRT-LLM Export
    • Integration with TensorRT Model Optimizer Post-training quantization (PTQ)
    • Text generation driver to perform PTQ in Megatron-LM
    • Llama2 and Nemotron3-8b examples using the TensorRT-LLM unified build API to build engines after training
  • Several minor enhancements, bug fixes, and documentation updates
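
DP overlapping hides data-parallel communication behind computation: gradient reduce-scatter overlaps with the backward pass and parameter all-gather overlaps with the forward pass when the distributed optimizer is used. Below is a minimal configuration sketch, assuming the DistributedDataParallelConfig class and field names found in later Megatron Core releases.

```python
# Minimal sketch: overlapping gradient reduction and parameter gathering with
# computation under the distributed optimizer. Class and field names are
# assumptions based on later Megatron Core releases; verify against the
# version you install.
from megatron.core.distributed import DistributedDataParallelConfig

ddp_config = DistributedDataParallelConfig(
    use_distributed_optimizer=True,  # shard optimizer state across data-parallel ranks
    overlap_grad_reduce=True,        # overlap grad reduce-scatter with backward compute
    overlap_param_gather=True,       # overlap param all-gather with forward compute
)
```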

NVIDIA Megatron Core 0.5.0

22 Mar 16:44

Key Features and Enhancements

Megatron Core documentation is now live!

Model Features

  • MoE (Mixture of Experts) (a configuration sketch follows the Model Features list)
    • Support for Z-loss, load balancing, and Sinkhorn routing
    • Layer and communications refactor
    • Richer parallelism mappings: EP can be combined with other model-parallel techniques for larger MoE variants, e.g. EP + TP + DP + SP + PP
    • Token dropless architecture with Top-K routing
    • Performance optimization with GroupedGEMM when the number of local experts is > 1
    • Distributed checkpointing
  • Interleaved rotary embedding
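
Below is a minimal sketch of a token-dropless Top-K MoE configuration with GroupedGEMM and the auxiliary losses listed above, assuming field names from Megatron Core's TransformerConfig (they may differ between releases).

```python
# Minimal sketch of a Top-K MoE configuration. Field names are assumptions
# based on Megatron Core's TransformerConfig and may differ across releases;
# verify against the installed version.
from megatron.core.transformer.transformer_config import TransformerConfig

moe_config = TransformerConfig(
    num_layers=12,
    hidden_size=1024,
    num_attention_heads=16,
    num_moe_experts=8,         # experts per MoE layer
    moe_router_topk=2,         # Top-K routing
    moe_aux_loss_coeff=1e-2,   # load-balancing auxiliary loss
    moe_z_loss_coeff=1e-3,     # router z-loss
    moe_grouped_gemm=True,     # GroupedGEMM when local experts > 1
)
```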

Datasets

  • Masked WordPiece datasets for BERT and T5
  • Raw and mock datasets

Parallelism

Performance

  • Activation offloading to CPU
  • RoPE and SwiGLU fusion
  • Sliding window attention (via Transformer Engine)

General Improvements

  • Timers

NVIDIA Megatron Core 0.4.0

14 Dec 23:18

Key Features and Enhancements

Models

  • BERT
  • RETRO
  • T5

Parallelism

  • Mixture of Experts support for GPT
  • Model-parallel-efficient Distributed Data Parallel (DDP)
  • Context Parallel (2D Tensor Parallel) support

Datasets

  • GPT Dataset
  • Blended Dataset

23.04

11 May 22:28
Merge branch 'pip_package' into 'main'

Add pip package for megatron.core (see the sketch below)

See merge request ADLR/megatron-lm!598
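
Assuming the package is published on PyPI as megatron-core, installing and importing it would look like the sketch below.

```python
# After installing the package (assumed PyPI name: megatron-core), e.g.
#   pip install megatron-core
# the core library can be imported without cloning the repository:
import megatron.core

print(megatron.core.__version__)  # assumes the package exposes __version__
```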