
NaNs during training #1

Open
eps696 opened this issue Feb 9, 2020 · 7 comments
eps696 commented Feb 9, 2020

Thank you for this very interesting repo.
I got some issues with training though: the parameters (codebook values for motion, total_loss for appearance) drop to NaN within a few epochs (5-20) when trained with the default hyperparameters.
Increasing the learning rate to 0.5~1e-4 eliminates this behaviour, but that doesn't look like a real solution.
Did you encounter such issues in your practice, and do you have any advice on how to sort this out?

@olegkhomenko

+1
Same problem during both motion and appearance model fine-tuning

@olegkhomenko

@eps696 there is a bug if you are using pytorch >= 1.3.0: replace line 142 in train.py (and in test.py as well)

y = F.grid_sample(frame1, flow.permute(0,2,3,1), padding_mode="border")

with

y = F.grid_sample(frame1, flow.permute(0,2,3,1), padding_mode="border", align_corners=True)
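For context: PyTorch 1.3.0 changed the default of align_corners in F.grid_sample from the old implicit True to False, which silently shifts how normalized grid coordinates in [-1, 1] map to pixel positions. A minimal sketch of the two mapping conventions (pure Python, no torch needed; the unnormalize helper is hypothetical, written here only to illustrate the two formulas):

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized grid coordinate in [-1, 1] to a pixel index,
    mirroring the two conventions used by F.grid_sample."""
    if align_corners:
        # -1 -> 0, +1 -> size - 1: extreme coords hit the corner pixels exactly
        return (coord + 1) / 2 * (size - 1)
    # -1 -> -0.5, +1 -> size - 0.5: extreme coords land on the image edges,
    # half a pixel outside the outermost pixel centers
    return ((coord + 1) * size - 1) / 2

# For a width-8 image the two conventions disagree at the borders:
print(unnormalize(-1.0, 8, True))   # 0.0
print(unnormalize(-1.0, 8, False))  # -0.5
print(unnormalize(1.0, 8, True))    # 7.0
print(unnormalize(1.0, 8, False))   # 7.5
```

With align_corners=False, the old flow values near the border fall half a pixel outside the image, so padding_mode="border" clamping kicks in differently than the network was trained for; passing align_corners=True restores the pre-1.3.0 behaviour.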

@eps696

eps696 commented May 29, 2020

@olegkhomenko thanks; i use pytorch <= 1.2.0, there's no such option

@HellwayXue

Have you solved this problem? could you please tell me the solution?

@eps696

eps696 commented Jun 26, 2020

@HellwayXue at that time i just added some dumb logging after defining the codebook, manually resuming the training whenever that happened:

            if numpy.isnan(numpy.min(codebook)):
                raise Exception(' training failed at epoch %d' % epoch)

you may want to use numpy.nan_to_num instead (as it's used in sort_motion_codebook for instance)
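To make the two options above concrete, here is a small self-contained sketch (the guard_codebook / sanitize_codebook names are hypothetical; the real code just inlines the check after defining the codebook, and sort_motion_codebook in the repo uses nan_to_num):

```python
import numpy as np

def guard_codebook(codebook, epoch):
    """Fail fast if any codebook entry became NaN, as in the logging above."""
    if np.isnan(np.min(codebook)):
        raise Exception(' training failed at epoch %d' % epoch)
    return codebook

def sanitize_codebook(codebook):
    """Alternative: silently replace NaNs with zeros instead of aborting."""
    return np.nan_to_num(codebook)

dirty = np.array([0.1, np.nan, 0.3])
print(sanitize_codebook(dirty))  # [0.1 0.  0.3]
```

Note that np.min propagates NaN, so the isnan(min(...)) idiom checks the whole array in one pass; nan_to_num hides the failure rather than fixing its cause, which is why restarting from the last good checkpoint may still be preferable.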

@HellwayXue

@eps696 Thank you for your reply! But where do these NaNs come from? I've tried gradient clipping and optimizer weight decay, and none of them work. Your suggestion applies after defining the codebook, but I get NaNs even if I don't save it.

@eps696

eps696 commented Jun 28, 2020

@HellwayXue no idea, alas
tbh i didn't really get into the codebook mechanics; as long as it works in general, i'm ok with some flaws
