Is your feature request related to a problem? Please describe.
I am interested in fine-tuning DeepSeek V3/R1.
Describe the solution you'd like
It would be great if you could provide the fine-tuning code. Even a simplistic version would be an invaluable reference for others to build upon.
MoEs have historically been tricky to fine-tune correctly (and in the case of some older MoE models, it took the community months to figure out all the bugs in the HF implementation).
Is your feature request related to a problem? Please describe.
Yes, I'm interested in fine-tuning DeepSeek V3/R1, but I'm aware that Mixture of Experts (MoE) models can be tricky to fine-tune, especially given the problems the community has run into with previous models. Guidance or reference code for fine-tuning would help avoid those pitfalls.
Fine-tuning code or a basic example for DeepSeek V3/R1 would be invaluable. Even a simplistic version would go a long way in helping others in the community build upon it. MoE models often require specific adjustments, and a working starting point or reference could save a lot of time and effort. The community would also benefit from any lessons learned about issues specific to the MoE structure.