
Reference Fine-Tuning Code #498

Open
Palmik opened this issue Jan 31, 2025 · 1 comment

Comments

@Palmik

Palmik commented Jan 31, 2025

Is your feature request related to a problem? Please describe.
I am interested in fine-tuning DeepSeek V3/R1.

Describe the solution you'd like
It would be great if you could provide the fine-tuning code. Even a simplistic version would be an invaluable reference for others to build upon.
MoE models have historically been tricky to fine-tune correctly; in the case of some older MoE models, it took the community months to work out all the bugs in the HF implementation.
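While no official fine-tuning code has been released, here is a minimal sketch of what such a reference script might look like, assuming the checkpoint loads through Hugging Face Transformers with `trust_remote_code` and using PEFT/LoRA for the trainable parameters. The model ID, `target_modules`, and the `"text"` dataset field below are illustrative assumptions, not verified values.

```python
# Hedged sketch, not official DeepSeek code: supervised LoRA fine-tuning of a
# DeepSeek-style checkpoint with Hugging Face Transformers + PEFT. The model
# ID, target_modules, and the "text" dataset field are assumptions to verify
# against the released weights and modeling code.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "deepseek-ai/DeepSeek-V3"  # assumption: actual hub ID / local path

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the repo ships custom modeling code
)

# Adapt only attention projections; leaving the MoE router untouched avoids
# disturbing the token-to-expert routing learned in pretraining.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "o_proj"],  # assumption: confirm module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=2048)

dataset = load_dataset("json", data_files="train.jsonl")["train"]
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="dsv3-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    # mlm=False makes the collator produce causal-LM labels from input_ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A full-parameter fine-tune would additionally have to handle the MoE auxiliary load-balancing loss correctly, which is exactly the kind of detail official reference code would pin down.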

@dinithaw

Is your feature request related to a problem? Please describe.
Yes. I am also interested in fine-tuning DeepSeek V3/R1, but Mixture-of-Experts (MoE) models can be tricky to fine-tune, especially given the challenges the community has faced with previous MoE releases. Guidance or reference code for fine-tuning would help avoid known pitfalls.

Even a simplistic fine-tuning example for DeepSeek V3/R1 would go a long way in giving the community a working starting point to build upon. MoE models often require specific adjustments, and documented lessons learned about issues specific to the MoE structure could save a lot of time and effort; one such adjustment is sketched below.
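A hedged illustration of one such adjustment, often recommended for MoE fine-tuning: freezing the router (gate) parameters so the token-to-expert assignments learned in pretraining stay stable. The model ID and the `"gate"` parameter-name substring below are assumptions to verify against the released modeling code.

```python
# Hedged illustration, not an official recipe: freeze MoE router weights
# before fine-tuning. The "gate" substring is an assumed naming convention;
# inspect model.named_parameters() for the real names in the modeling code.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3",  # assumption: actual hub ID / local path
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

frozen = 0
for name, param in model.named_parameters():
    if "gate" in name:  # assumed router parameter naming
        param.requires_grad = False
        frozen += 1
print(f"Froze {frozen} router parameter tensors")
```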
