etp docs (#12111)
* etp docs

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
akoumpa authored Feb 9, 2025
1 parent 4cb31c9 commit bc48d3b
Showing 1 changed file with 28 additions and 3 deletions.
docs/source/features/parallelisms.rst: 31 changes (28 additions & 3 deletions)
@@ -262,7 +262,9 @@ Unlike other model-parallel techniques, EP is applied to only the expert layers
Enable Expert Parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~

To enable EP, set ``model.expert_model_parallel_size`` to the expert parallel size you want. For example, if the model has eight experts (``num_moe_experts=8``), then setting ``expert_model_parallel_size=4`` results in each GPU processing two experts. The number of experts should be divisible by the expert parallel size.
To enable EP, set ``expert_model_parallel_size`` in the ``MegatronStrategy`` arguments to the desired expert parallel size.
For example, if the model has eight experts (``num_moe_experts=8``), setting ``expert_model_parallel_size=4`` results in each GPU processing two experts.
The number of experts must be divisible by the expert parallel size.

.. code-block:: python
@@ -284,12 +286,35 @@ Set expert parallelism directly from CLI:
For further information on configuration, refer to the following documentation: `NeMo Megatron GPT Config <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L68>`__.
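
The same value can also be set on a pretraining recipe, mirroring the expert tensor parallel example below. A minimal sketch, assuming the ``mixtral_8x7b`` recipe used later on this page (the parallel sizes are illustrative):

.. code-block:: python

    from nemo.collections import llm
    from functools import partial

    # Load train recipe (same recipe factory as in the ETP example below)
    recipe = partial(llm.mixtral_8x7b.pretrain_recipe)()

    # Split the experts across 4 GPUs; num_moe_experts must be divisible by this value,
    # e.g. 8 experts with expert_model_parallel_size=4 gives 2 experts per GPU
    recipe.trainer.strategy.expert_model_parallel_size = 4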


Implement Expert Parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enable Expert Tensor Parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To enable ETP, set ``expert_tensor_parallel_size`` in the ``MegatronStrategy`` arguments to the desired size. For example:

.. code-block:: python

    from nemo.collections import llm
    from functools import partial

    # Load train recipe
    recipe = partial(llm.mixtral_8x7b.pretrain_recipe)()

    # Set expert tensor parallel size
    recipe.trainer.strategy.expert_tensor_parallel_size = 4

Set expert tensor parallelism directly from CLI:

.. code-block:: bash

    nemo llm pretrain --factory mixtral_8x7b trainer.strategy.expert_tensor_parallel_size=4

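Both expert parallel sizes can also be passed when constructing the strategy directly. A minimal sketch, assuming ``MegatronStrategy`` is imported from ``nemo.lightning`` and that the sizes shown are illustrative:

.. code-block:: python

    from nemo import lightning as nl

    # Configure EP and ETP on the strategy itself; the argument names follow the
    # MegatronStrategy options described in this section.
    strategy = nl.MegatronStrategy(
        tensor_model_parallel_size=2,
        expert_model_parallel_size=4,
        expert_tensor_parallel_size=2,
    )
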
Expert Parallelism Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The NeMo Framework implementation of EP uses functionality from Megatron Core. Please consult the `Megatron Core MoE layer <https://github.com/NVIDIA/Megatron-LM/blob/e2ec14ab5690fead7e33760b0f8fb20c83b4fd1f/megatron/core/transformer/moe/moe_layer.py#L29>`_ for more MoE implementation details.



Activation Partitioning
-----------------------

