[BUG] How to improve the training performance in MLX compared to PyTorch and Keras? #1542
Comments
Could this be related to the #1153 (comment)?
If you can try either of those and report back, that would be useful to know.
It's possible. It would be good to know if it fixes your issue first, though.
The 0 training loss in MLX seems incorrect, particularly given that the training MSE seems reasonable. I would double-check that you are averaging the loss in MLX correctly. Otherwise it mostly looks reasonable. Fine-tuning learning rates, warmups, initializations, etc. could all help.
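One way to check the averaging is to compute the epoch loss as a sample-weighted mean of the per-batch losses. The sketch below is framework-independent and assumes each per-batch loss has already been converted to a plain Python float (for example with `loss.item()`); the function name `epoch_mean_loss` and the numbers are illustrative only and do not come from the notebooks.

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Sample-weighted mean of per-batch mean losses over one epoch.

    batch_losses: mean loss of each batch, as plain floats.
    batch_sizes:  number of samples in each batch.
    """
    if len(batch_losses) != len(batch_sizes):
        raise ValueError("batch_losses and batch_sizes must have the same length")
    total = sum(batch_sizes)
    if total == 0:
        raise ValueError("no samples seen in this epoch")
    # Weight each batch mean by its size so a short last batch is not over-counted.
    return sum(loss * n for loss, n in zip(batch_losses, batch_sizes)) / total


# Illustrative values: the last batch is smaller than the others.
losses = [0.30, 0.20, 0.50]
sizes = [32, 32, 8]

weighted = epoch_mean_loss(losses, sizes)  # weights each batch by its sample count
naive = sum(losses) / len(losses)          # treats every batch as equally large
```

If the reported training loss differs a lot from a value computed this way, the accumulation or the division in the training loop is a likely place to look.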
I will synchronize the LR scheduler tomorrow to be sure this is not the part that affects the model deviation. Yes, I will look at the loss, which is strange.
I now use this LR scheduler in MLX. One potential issue is that in PyTorch/TensorFlow the LR scheduler is per epoch, while in MLX it is per step. Is it possible to have an epoch equivalent?
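A possible way to get an epoch equivalent is to wrap an epoch-indexed schedule so that it can be called with a step index, holding the learning rate constant within each epoch. This is a minimal sketch in plain Python: `per_epoch`, `exponential_epoch_decay`, and the values `base_lr`, `gamma`, and `steps_per_epoch=100` are hypothetical names and numbers chosen for illustration, not part of MLX or of the notebooks.

```python
def per_epoch(schedule, steps_per_epoch):
    """Wrap an epoch-indexed schedule so it can be called with a step index."""
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be positive")

    def step_schedule(step):
        # Integer division maps every step of an epoch to the same epoch index.
        return schedule(step // steps_per_epoch)

    return step_schedule


def exponential_epoch_decay(epoch, base_lr=1e-3, gamma=0.9):
    """Hypothetical per-epoch schedule: multiply the LR by gamma each epoch."""
    return base_lr * gamma ** epoch


# Assumed 100 optimizer steps per epoch (number of batches in the training set).
lr_at_step = per_epoch(exponential_epoch_decay, steps_per_epoch=100)
```

MLX optimizers accept a callable as `learning_rate`, so a wrapper like `lr_at_step` could in principle be passed there. Whether the arithmetic above works unchanged with the step value the optimizer supplies is an assumption that has not been checked here.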
Is there a way to fix a seed in MLX, similar to torch.manual_seed(int)?
Yes, that was expected. The good accuracy on torch is due to a specific seed, so we can close this issue.
On 12 Nov 2024 at 15:21, Awni Hannun wrote:
mx.random.seed
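Since the difference turned out to depend on the seed, one way to compare frameworks is to run each over the same list of seeds and look at the mean and spread rather than the best run. The sketch below assumes a hypothetical `train_fn(seed)` that seeds the framework (for example with `mx.random.seed(seed)` or `torch.manual_seed(seed)`), trains once, and returns the test RMSE. `fake_train` is only a synthetic placeholder so that the example runs on its own; its values are not real results.

```python
import random
import statistics


def summarize_runs(train_fn, seeds):
    """Run train_fn once per seed and summarize the returned scores (lower is better)."""
    scores = [train_fn(seed) for seed in seeds]
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "best": min(scores),
        "worst": max(scores),
    }


def fake_train(seed):
    """Synthetic placeholder for a real training run; returns a made-up score."""
    rng = random.Random(seed)
    return 0.5 + 0.1 * rng.random()


summary = summarize_runs(fake_train, seeds=range(5))
```

Reporting the same summary for each of the three notebooks would show whether one framework is consistently worse or whether a single favorable seed explains the gap.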
Describe the bug
I have a major issue that I have also seen in many other trials. MLX training rarely gives good performance, while torch and Keras are more stable and better. This is a real bottleneck for using MLX: you need to train your model 10 to 20 times to get a good result, while torch and Keras are systematically in a good range (RMSE: 0.50-0.55).
Important: the models (tf/keras, torch and mlx) have the same number of trainable parameters, and we use the same train, val and test split for all three methods.
To Reproduce
Run the following notebooks several times; the best MLX result falls outside the range of the PyTorch and tf/keras results.
https://github.com/thegodone/apple_ai_model/blob/main/AttFP_mlx_faster.ipynb
https://github.com/thegodone/apple_ai_model/blob/main/AttFP_torch.ipynb
https://github.com/thegodone/apple_ai_model/blob/main/AttFP_tf.ipynb
Expected behavior
I don't know whether it is the weight initialization or the optimizer that causes this huge difference between the three packages.
Desktop (please complete the following information):
see #1531