Update sharded_moe.py to support top2 gate with Tutel #6948

xenshinu · 2025-01-14T20:11:16Z

Tutel is forced to be unused on k > 1 since #2053
Given the fact that multiple experts per token is very common, and the gather and scatter operation without Tutel is so inefficient, I added support of tutel to top2 gate and tested on pipeline engine. This can be done for any k actually, I'll push that later when I have time to test,

xenshinu · 2025-01-14T20:13:05Z

@xenshinu please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
@microsoft-github-policy-service agree [company="{your company}"]
Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"
Contributor License Agreement

@microsoft-github-policy-service agree company="University of Michigan"

xenshinu · 2025-01-14T20:25:57Z

Not sure if this check is needed

    if use_tutel:
        # Tutel doesn't support index values masked with zero
        # so we need to replace masked indices with -1
        indices1_mask = mask1.sum(dim=1) * num_experts - 1
        indices1_s = torch.min(indices1_s, indices1_mask)
        indices2_mask = mask2.sum(dim=1) * num_experts - 1
        indices2_s = torch.min(indices2_s, indices2_mask)

I see that in top1gate,
https://github.com/microsoft/DeepSpeed/blob/66d3d3e94dbdfbbf6535cab66256c238983fc7c3/deepspeed/moe/sharded_moe.py#L252
but when I refer to examples from Tutel
https://github.com/microsoft/Tutel/blob/ab7937bb929bc78111d74261b490da25657a7e5c/tutel/impls/fast_dispatch.py#L143

I didn't see any specify check for non-zero mask.

deepspeed/moe/sharded_moe.py

loadams · 2025-02-10T19:45:26Z

Hi @xenshinu - FYI you'll need to update from the CLA to DCO requirements in this PR if you're able to?

xenshinu · 2025-02-10T19:52:53Z

Hi @xenshinu - FYI you'll need to update from the CLA to DCO requirements in this PR if you're able to?

Just reset the code to the previous commit and add my signed-and-off.

@loadams Could you please take care of this PR?

Signed-off-by: Xueshen Liu <[email protected]>

loadams · 2025-02-14T19:37:22Z

Hi @xenshinu - FYI you'll need to update from the CLA to DCO requirements in this PR if you're able to?

Just reset the code to the previous commit and add my signed-and-off.

@loadams Could you please take care of this PR?

@xenshinu - looks like the formatting check is failing, could you run the pre-commit formatter? pre-commit run --all-files and push those changes?

xenshinu requested a review from tohtana as a code owner January 14, 2025 20:11

loadams requested a review from hwchen2017 January 14, 2025 23:39

loadams reviewed Jan 15, 2025

View reviewed changes

deepspeed/moe/sharded_moe.py Outdated Show resolved Hide resolved

add tutel support for top2

fc15332

Signed-off-by: Xueshen Liu <[email protected]>

xenshinu force-pushed the patch-1 branch from e856af5 to fc15332 Compare February 10, 2025 20:13

xenshinu requested a review from tjruwase as a code owner February 10, 2025 20:13

xenshinu and others added 3 commits February 10, 2025 15:13

Merge branch 'master' into patch-1

898415c

Merge branch 'master' into patch-1

945c6bb

Merge branch 'master' into patch-1

6782418

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update sharded_moe.py to support top2 gate with Tutel #6948

Update sharded_moe.py to support top2 gate with Tutel #6948

xenshinu commented Jan 14, 2025 •

edited

Loading

xenshinu commented Jan 14, 2025

xenshinu commented Jan 14, 2025 •

edited

Loading

loadams commented Feb 10, 2025

xenshinu commented Feb 10, 2025 •

edited

Loading

loadams commented Feb 14, 2025

Update sharded_moe.py to support top2 gate with Tutel #6948

Are you sure you want to change the base?

Update sharded_moe.py to support top2 gate with Tutel #6948

Conversation

xenshinu commented Jan 14, 2025 • edited Loading

xenshinu commented Jan 14, 2025

xenshinu commented Jan 14, 2025 • edited Loading

loadams commented Feb 10, 2025

xenshinu commented Feb 10, 2025 • edited Loading

loadams commented Feb 14, 2025

xenshinu commented Jan 14, 2025 •

edited

Loading

xenshinu commented Jan 14, 2025 •

edited

Loading

xenshinu commented Feb 10, 2025 •

edited

Loading