reduce all-to-all communication volume when both expert and non-expert are tensor-parallel #10174
