Commit

moe post
LeonEricsson committed Dec 12, 2023
1 parent aff78c5 commit e64c473
Showing 2 changed files with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions posts/2023-12-12-moe.md
@@ -22,3 +22,11 @@ Google was one of the first to blend large-scale Transformers with MoEs in a framework

- **Random routing**. The top expert is always picked but the second expert is sampled according to the gating weight probabilities.
- **Expert capacity**. A threshold for how many tokens can be processed by one expert. If both experts are at capacity, the token is considered overflowed and is sent to the next layer via a skip connection (see the sketch after this list).
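
To make these two mechanisms concrete, here is a minimal sketch of top-2 routing where the second expert is sampled from the gate probabilities and each expert has a fixed capacity. This is not the actual GShard/Switch Transformer implementation; the function name, tensor shapes, and the sequential overflow handling are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def route_top2_with_capacity(router_logits: torch.Tensor, capacity: int):
    """router_logits: [num_tokens, num_experts] raw gate scores for one layer."""
    probs = F.softmax(router_logits, dim=-1)

    # The top expert is always picked.
    top1 = probs.argmax(dim=-1)

    # Random routing: the second expert is sampled proportionally to the
    # remaining gate probabilities (the top expert is masked out first).
    masked = probs.clone()
    masked[torch.arange(probs.size(0)), top1] = 0.0
    top2 = torch.multinomial(masked, num_samples=1).squeeze(-1)

    load = torch.zeros(probs.size(-1), dtype=torch.long)
    assignments = []  # per-token list of experts that actually accepted the token
    for e1, e2 in zip(top1.tolist(), top2.tolist()):
        accepted = []
        for e in (e1, e2):
            if load[e] < capacity:  # expert still has room
                load[e] += 1
                accepted.append(e)
        # If both experts are full, the token "overflows": it receives no expert
        # computation and flows to the next layer via the skip connection.
        assignments.append(accepted)
    return assignments, load
```

In practice the capacity is usually derived from a capacity factor, roughly `capacity = capacity_factor * num_tokens / num_experts`, and the Python loop is replaced by batched scatter operations.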

# Mixtral

Mixtral uses concepts inspired by the Switch Transformer. Its architecture is similar to Mistral 7B's, with the difference that the FFN in each Transformer block is replaced by a Switch Transformer-style MoE block. Below is an illustration from the Switch Transformer [paper](https://arxiv.org/abs/2006.16668):

![](/public/images/switchtransformer.png)

For every token, at each layer, a router network (gate) selects two experts to process the current state and combines their outputs. Mixtral uses 8 experts with top-2 gating. Even though each token only sees two experts, the selected experts can differ at each timestep. In practice, this means that Mixtral decodes at the speed of a 12B model while having access to 45B parameters. The requirements to run this model are still quite hefty: you are looking at upwards of 90GB of memory. Fortunately, quantized versions of Mixtral have already been released and are available through popular frameworks such as llama.cpp, vLLM, and HF Transformers. MoE as an architecture is also interesting because of how the experts have to be handled in terms of batching, data parallelism, and model parallelism.
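
As a concrete picture of top-2 gating, below is a minimal single-device sketch of a Mixtral-style MoE layer: a linear router scores 8 experts per token, the two highest-scoring experts process the token, and their outputs are combined with the renormalized gate weights. The class name, dimensions, and the dense per-expert loop are illustrative assumptions, not Mixtral's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, d_model] (batch and sequence dimensions flattened)
        gate_logits = self.router(x)                    # [tokens, experts]
        weights, experts = gate_logits.topk(2, dim=-1)  # top-2 experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the two

        out = torch.zeros_like(x)
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = experts[:, k] == e               # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out
```

For example, `Top2MoE(d_model=4096, d_ff=14336)(torch.randn(16, 4096))` runs 16 tokens through the layer. The key point is that each token pays the compute of only two FFNs while the layer holds the parameters of all eight, which is why decoding speed tracks the active rather than the total parameter count.
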
Binary file added public/images/switchtransformer.png
