
Reference Fine-Tuning Code #498

Open
Palmik opened this issue Jan 31, 2025 · 1 comment

Comments

@Palmik

Palmik commented Jan 31, 2025

Is your feature request related to a problem? Please describe.
I am interested in fine-tuning DeepSeek V3/R1.

Describe the solution you'd like
It would be great if you could provide the fine-tuning code. Even a simplistic version would be an invaluable reference for others to build upon.
MoE models have historically been tricky to fine-tune correctly; in the case of some older MoE models, it took the community months to work out all the bugs in the HF implementation.
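While no official fine-tuning code has been released, here is a minimal sketch of what such a reference script might look like, assuming the checkpoint loads through Hugging Face Transformers with `trust_remote_code` and using PEFT/LoRA for the trainable parameters. The model ID, `target_modules`, and the `"text"` dataset field below are illustrative assumptions, not verified values.

```python
# Hedged sketch, not official DeepSeek code: supervised LoRA fine-tuning of a
# DeepSeek-style checkpoint with Hugging Face Transformers + PEFT. The model
# ID, target_modules, and the "text" dataset field are assumptions to verify
# against the released weights and modeling code.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "deepseek-ai/DeepSeek-V3"  # assumption: actual hub ID / local path

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the repo ships custom modeling code
)

# Adapt only attention projections; leaving the MoE router untouched avoids
# disturbing the token-to-expert routing learned in pretraining.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "o_proj"],  # assumption: confirm module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=2048)

dataset = load_dataset("json", data_files="train.jsonl")["train"]
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="dsv3-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    # mlm=False makes the collator produce causal-LM labels from input_ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A full-parameter fine-tune would additionally have to handle the MoE auxiliary load-balancing loss correctly, which is exactly the kind of detail official reference code would pin down.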

@dinithaw

Is your feature request related to a problem? Please describe.
Yes. I am also interested in fine-tuning DeepSeek V3/R1, but Mixture-of-Experts (MoE) models can be tricky to fine-tune, especially given the challenges the community has faced with previous MoE releases. Guidance or reference code for fine-tuning would help avoid known pitfalls.

Even a simplistic fine-tuning example for DeepSeek V3/R1 would go a long way in giving the community a working starting point to build upon. MoE models often require specific adjustments, and documented lessons learned about issues specific to the MoE structure could save a lot of time and effort; one such adjustment is sketched below.
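A hedged illustration of one such adjustment, often recommended for MoE fine-tuning: freezing the router (gate) parameters so the token-to-expert assignments learned in pretraining stay stable. The model ID and the `"gate"` parameter-name substring below are assumptions to verify against the released modeling code.

```python
# Hedged illustration, not an official recipe: freeze MoE router weights
# before fine-tuning. The "gate" substring is an assumed naming convention;
# inspect model.named_parameters() for the real names in the modeling code.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3",  # assumption: actual hub ID / local path
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

frozen = 0
for name, param in model.named_parameters():
    if "gate" in name:  # assumed router parameter naming
        param.requires_grad = False
        frozen += 1
print(f"Froze {frozen} router parameter tensors")
```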
