Loss is NaN when using half precision #377
Comments
Okay, I figured out that the NaNs were due to the Adam optimisation. The default epsilon of 1e-8 is too small and gets rounded to zero in fp16, as pointed out here. Setting it to 1e-4 fixes the NaN problem, but then the optimisation no longer decreases the loss. Is there a way to solve this while keeping the same learning rate?
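The rounding problem is easy to reproduce. Here is a minimal PyTorch sketch (the original issue is Torch7, so this is an illustration of the arithmetic, not the poster's code): 1e-8 is below fp16's smallest positive subnormal (~5.96e-8), so it rounds to zero, and Adam's divide-by-`sqrt(v) + eps` blows up early in training when `v` is still zero.

```python
import torch

# Adam's default epsilon is not representable in fp16 and rounds to zero
# (the smallest positive fp16 subnormal is ~5.96e-8):
eps = torch.tensor(1e-8, dtype=torch.float16)
print(eps.item())  # 0.0

# Adam divides the gradient by (sqrt(v_hat) + eps). Early in training
# v_hat can still be zero, so with eps rounded to zero the update is g / 0:
g = torch.tensor(1e-3, dtype=torch.float16)
v_hat = torch.tensor(0.0, dtype=torch.float16)
print((g / (v_hat.sqrt() + eps)).item())  # inf -> weights become inf, loss becomes NaN

# An epsilon that fp16 can represent keeps the update finite:
safe_eps = torch.tensor(1e-4, dtype=torch.float16)
print((g / (v_hat.sqrt() + safe_eps)).item())  # ~10.0
```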
You can keep FP32 for the optimizer, as explained here: https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/
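A minimal sketch of that recipe, under my own naming (`master_params`, `train_step`, and the `loss_scale` value are illustrative, not from the linked post): keep an FP32 master copy of the parameters, run forward/backward in FP16, copy the gradients to FP32, step the optimizer there with the default epsilon, then copy the updated weights back.

```python
import torch

model = torch.nn.Linear(16, 4).cuda().half()           # fp16 model for fwd/bwd
master_params = [p.detach().clone().float().requires_grad_()
                 for p in model.parameters()]           # fp32 master copy
optimizer = torch.optim.Adam(master_params, lr=1e-3)    # fp32 state; default eps works

def train_step(x, y, loss_scale=128.0):
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss * loss_scale).backward()                      # scale so fp16 grads don't underflow
    for p, mp in zip(model.parameters(), master_params):
        mp.grad = p.grad.float() / loss_scale           # unscale in fp32
        p.grad = None
    optimizer.step()                                    # update happens entirely in fp32
    optimizer.zero_grad()
    with torch.no_grad():                               # copy fp32 weights back into fp16 model
        for p, mp in zip(model.parameters(), master_params):
            p.copy_(mp.half())
    return loss

x = torch.randn(8, 16, device="cuda", dtype=torch.float16)
y = torch.randn(8, 4, device="cuda", dtype=torch.float16)
train_step(x, y)
```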
I solved this issue by using autocast instead of .half(), as suggested by the PyTorch team.
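For reference, the autocast route in current PyTorch looks roughly like this (a sketch; the model, data, and loss are placeholders). Parameters stay in fp32, autocast runs each op in fp16 only where that is safe, and GradScaler handles loss scaling automatically:

```python
import torch

model = torch.nn.Linear(16, 4).cuda()                  # parameters stay fp32
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                   # dynamic loss scaling

x = torch.randn(8, 16, device="cuda")
y = torch.randn(8, 4, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():                        # fp16 where safe, fp32 otherwise
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()                          # scale loss to avoid grad underflow
scaler.step(optimizer)                                 # unscales grads, skips step on inf/NaN
scaler.update()
```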
When I run my model in half precision (fp16), the loss function returns NaN. Everything works fine in normal single precision (fp32), so I don't think it is a problem with the learning parameters. The loss is also NaN right from the beginning of training.
I am using SpatialCrossEntropyCriterion, and I explicitly do not convert the MaxPooling and BatchNormalization layers to cudnn, since they don't work otherwise.
Relevant code:
I am wondering if the reason is the (missing?) CudaHalf implementation for the BatchNormalization module?
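Whether or not a CudaHalf kernel exists for BatchNormalization, batch statistics are a classic fp16 failure point: fp16 overflows at 65504, so accumulating moments naively in half precision can hit inf, and inf - inf then yields NaN. A toy illustration of that arithmetic (not the Torch7 code path):

```python
import torch

# fp16 overflows at 65504, so sums of squares over a large activation
# map can become inf when accumulated in half precision:
mean_sq = torch.tensor(70000.0).half()     # E[x^2] overflowed past fp16 max
print(mean_sq)                             # tensor(inf, dtype=torch.float16)
sq_mean = torch.tensor(70000.0).half()     # (E[x])^2 also overflowed
var = mean_sq - sq_mean                    # variance via inf - inf
print(var)                                 # tensor(nan, dtype=torch.float16)
```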