📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset] | 💻 [Code] | 📊 [Log]
The experiments were conducted using the following frameworks:
- Megatron-LM
- moe-recipes
Evaluations were run with the evaluation framework from swallow-llm/swallow-evaluation (commit: 04948a0).
For detailed instructions on setting up the evaluation environment and running the evaluation scripts, please refer to the evaluation framework documentation.
@inproceedings{nakamura2025dropupcycling,
  title={Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization},
  author={Taishi Nakamura and Takuya Akiba and Kazuki Fujii and Yusuke Oda and Rio Yokota and Jun Suzuki},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=gx1wHnf5Vp}
}