etp docs (#12111)
* etp docs

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* fix

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
akoumpa authored Feb 9, 2025
1 parent 4cb31c9 commit bc48d3b
Showing 1 changed file with 28 additions and 3 deletions.
docs/source/features/parallelisms.rst: 31 changes (28 additions & 3 deletions)
@@ -262,7 +262,9 @@ Unlike other model-parallel techniques, EP is applied to only the expert layers
Enable Expert Parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~

To enable EP, set ``model.expert_model_parallel_size`` to the expert parallel size you want. For example, if the model has eight experts (``num_moe_experts=8``), then setting ``expert_model_parallel_size=4`` results in each GPU processing two experts. The number of experts should be divisible by the expert parallel size.
To enable EP, set ``expert_model_parallel_size`` in the ``MegatronStrategy`` arguments to the desired expert parallel size.
For example, if the model has eight experts (``num_moe_experts=8``), setting ``expert_model_parallel_size=4`` results in each GPU processing two experts.
The number of experts must be divisible by the expert parallel size.

.. code-block:: python
@@ -284,12 +286,35 @@ Set expert parallelism directly from CLI:
For further information on configuration, refer to the following documentation: `NeMo Megatron GPT Config <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml#L68>`__.
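
The same value can also be set on a pretraining recipe, mirroring the expert tensor parallel example below. A minimal sketch, assuming the ``mixtral_8x7b`` recipe used later on this page (the parallel sizes are illustrative):

.. code-block:: python

    from nemo.collections import llm
    from functools import partial

    # Load train recipe (same recipe factory as in the ETP example below)
    recipe = partial(llm.mixtral_8x7b.pretrain_recipe)()

    # Split the experts across 4 GPUs; num_moe_experts must be divisible by this value,
    # e.g. 8 experts with expert_model_parallel_size=4 gives 2 experts per GPU
    recipe.trainer.strategy.expert_model_parallel_size = 4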


Implement Expert Parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enable Expert Tensor Parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To enable ETP, set ``expert_tensor_parallel_size`` in the ``MegatronStrategy`` arguments to the desired size. For example:

.. code-block:: python

    from nemo.collections import llm
    from functools import partial

    # Load train recipe
    recipe = partial(llm.mixtral_8x7b.pretrain_recipe)()

    # Set expert tensor parallel size
    recipe.trainer.strategy.expert_tensor_parallel_size = 4

Set expert tensor parallelism directly from CLI:

.. code-block:: bash

    nemo llm pretrain --factory mixtral_8x7b trainer.strategy.expert_tensor_parallel_size=4

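Both expert parallel sizes can also be passed when constructing the strategy directly. A minimal sketch, assuming ``MegatronStrategy`` is imported from ``nemo.lightning`` and that the sizes shown are illustrative:

.. code-block:: python

    from nemo import lightning as nl

    # Configure EP and ETP on the strategy itself; the argument names follow the
    # MegatronStrategy options described in this section.
    strategy = nl.MegatronStrategy(
        tensor_model_parallel_size=2,
        expert_model_parallel_size=4,
        expert_tensor_parallel_size=2,
    )
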
Expert Parallelism Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The NeMo Framework implementation of EP uses functionality from Megatron Core. Please consult the `Megatron Core MoE layer <https://github.com/NVIDIA/Megatron-LM/blob/e2ec14ab5690fead7e33760b0f8fb20c83b4fd1f/megatron/core/transformer/moe/moe_layer.py#L29>`_ for more MoE implementation details.



Activation Partitioning
-----------------------

