Expansion factor choices #14

zhaoyanlyu · 2023-09-11T23:55:29Z

Thank you for the clear and well-executed implementation.

Following up on this issue: #11

May I kindly ask why you chose to expand the token-mixing MLP while bottlenecking the channel-mixing MLP? Is there a particular reason behind this design, or is it simply because this setup provides the best performance?
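To make the question concrete, here is a minimal sketch of what "expand" versus "bottleneck" means for the hidden width of each MLP. The factors 4 and 0.5, the patch count 196, the channel width 512, and the helper name `hidden_width` are all illustrative assumptions, not values or names taken from the repository.

```python
def hidden_width(dim: int, expansion_factor: float) -> int:
    """Hidden size of a two-layer MLP whose input and output size is `dim`."""
    return int(dim * expansion_factor)


# Hypothetical sizes, chosen only for illustration.
num_patches = 196   # length of the token (patch) axis
channels = 512      # length of the channel axis

# Token-mixing MLP acts along the patch axis; a factor > 1 expands it.
token_hidden = hidden_width(num_patches, 4)

# Channel-mixing MLP acts along the channel axis; a factor < 1 bottlenecks it.
channel_hidden = hidden_width(channels, 0.5)

print(f"token-mixing:   {num_patches} -> {token_hidden} -> {num_patches}")
print(f"channel-mixing: {channels} -> {channel_hidden} -> {channels}")
```

Under these assumed numbers, the token-mixing MLP widens its hidden layer while the channel-mixing MLP narrows it, which is the asymmetry the question is about.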
