Drop-Upcycling
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset] | 💻 [Code] | 📊 [Log]
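Method sketch

As an informal illustration of the idea in the title, the snippet below builds a single MoE expert from a pretrained dense FFN by copying its weights and re-initializing a randomly chosen fraction of the intermediate dimension. This is a minimal sketch, not code from this repository: it assumes a SwiGLU-style FFN (gate/up/down projections), and the names (`drop_upcycle_expert`, `drop_ratio`) and the statistics-matched re-initialization are simplifying assumptions; please refer to the paper for the exact procedure.

```python
import torch


def drop_upcycle_expert(w_gate, w_up, w_down, drop_ratio=0.5, generator=None):
    """Build one MoE expert from a dense FFN by partial re-initialization.

    w_gate, w_up: (d_ff, d_model) projections into the FFN intermediate space.
    w_down:       (d_model, d_ff) projection back to the model dimension.

    A random `drop_ratio` fraction of the intermediate dimension is dropped
    (the same indices in all three matrices) and re-initialized; the remaining
    weights are copied unchanged from the dense model.
    """
    d_ff = w_up.shape[0]
    n_drop = int(drop_ratio * d_ff)
    drop_idx = torch.randperm(d_ff, generator=generator)[:n_drop]

    expert = {"gate": w_gate.clone(), "up": w_up.clone(), "down": w_down.clone()}

    # Re-initialize the dropped rows of the gate/up projections. Sampling from
    # a normal distribution matched to each matrix's overall mean/std is a
    # simplifying assumption made for this sketch.
    for name in ("gate", "up"):
        w = expert[name]
        noise = torch.randn(n_drop, w.shape[1], generator=generator)
        w[drop_idx, :] = w.mean() + w.std() * noise

    # Re-initialize the matching columns of the down projection.
    w = expert["down"]
    noise = torch.randn(w.shape[0], n_drop, generator=generator)
    w[:, drop_idx] = w.mean() + w.std() * noise

    return expert
```

In this sketch, each expert would draw its own dropped index set, so the experts diverge from initialization while still inheriting most of the dense model's weights.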

Pretraining

The experiments were conducted using the following frameworks:

- Dense Model Training
- MoE Model Training

Evaluation

We conducted comprehensive evaluations using the evaluation framework from swallow-llm/swallow-evaluation (commit: 04948a0).

Setup and Usage

For detailed instructions on setting up the evaluation environment and running the evaluation scripts, please refer to the evaluation framework documentation.

Citation

@inproceedings{
    nakamura2025dropupcycling,
    title={Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization},
    author={Taishi Nakamura and Takuya Akiba and Kazuki Fujii and Yusuke Oda and Rio Yokota and Jun Suzuki},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=gx1wHnf5Vp}
}
