
load data sequence is confusing #20358

Open
workhours opened this issue Oct 22, 2024 · 2 comments
Labels
bug Something isn't working needs triage Waiting to be triaged by maintainers ver: 2.4.x

Comments

@workhours

Bug description

I understand that Lightning consumes data in this sequence:
1. Sanity check: calls `val_dataloader`
2. Training: calls `train_dataloader`
3. Validation: calls `val_dataloader`
From this sequence, I understood that an epoch cycle starts at `val_dataloader` and ends at `train_dataloader`, and that the validation in step 3 reuses the data from the first `val_dataloader` call.
However, `trainer.current_epoch` tells a different story: assume `current_epoch` is 1 during the sanity-check `val_dataloader` call; it has then increased to 2 by the `train_dataloader` call. In that case, an epoch cycle seems to start at `train_dataloader` and end at `val_dataloader`.
This makes it confusing to write `val_dataloader` code that loads data dynamically. With infinite epochs there is no problem, but at the last epoch (which I cannot detect from inside the hook), should I accept that `val_data` is `None`, or should I try to load it as if for the next cycle?

I think the sanity-check logic and the validation logic should share a single data setup that is used twice for different purposes. Calling `val_dataloader` twice but `train_dataloader` only once also makes data loading hard to manage.
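The question above is essentially how to write a `val_dataloader` whose behavior does not depend on how many times, or in which order, the framework calls it. One option, sketched below in plain Python under stated assumptions, is to key each load on `(stage, epoch)` so that repeated calls for the same epoch are idempotent, and to return `None` explicitly for epochs past the last one. `EpochKeyedLoader`, `load_fn` and `fake_load` are hypothetical names, not Lightning APIs; inside a real hook, `epoch` would presumably come from `self.trainer.current_epoch` and `max_epochs` from the Trainer's `max_epochs` setting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class EpochKeyedLoader:
    """Hypothetical helper (not a Lightning API): one dataset per (stage, epoch).

    Repeated requests for the same stage and epoch, such as the sanity-check
    call and the later validation call, return the cached object instead of
    triggering a new load, so the hook never has to count how often it ran.
    """

    load_fn: Callable[[str, int], List[int]]  # hypothetical user-supplied loader
    max_epochs: Optional[int] = None  # None means "train indefinitely"
    load_count: int = 0
    _cache: Dict[Tuple[str, int], List[int]] = field(default_factory=dict)

    def get(self, stage: str, epoch: int) -> Optional[List[int]]:
        # Past the final epoch there is nothing to load: return None explicitly
        # instead of speculatively fetching data for a cycle that never runs.
        if self.max_epochs is not None and epoch >= self.max_epochs:
            return None
        key = (stage, epoch)
        if key not in self._cache:
            self._cache[key] = self.load_fn(stage, epoch)
            self.load_count += 1
        return self._cache[key]


def fake_load(stage: str, epoch: int) -> List[int]:
    # Stand-in for real dynamic loading (files, a queue, a remote service, ...).
    offset = 0 if stage == "train" else 100
    return [offset + epoch * 10 + i for i in range(3)]


if __name__ == "__main__":
    loader = EpochKeyedLoader(fake_load, max_epochs=2)
    sanity_val = loader.get("val", 0)  # sanity-check call
    loader.get("train", 0)  # training call
    val = loader.get("val", 0)  # validation call for the same epoch
    print(sanity_val is val, loader.load_count)  # True 2
    print(loader.get("val", 2))  # None: epoch 2 never runs
```

This does not resolve the epoch-numbering inconsistency reported above; it only makes the hook depend on the epoch the trainer reports rather than on call counts.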

What version are you seeing the problem on?

v2.4

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.4.0):
#- PyTorch Version (e.g., 2.4):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):

More info

No response

@workhours workhours added bug Something isn't working needs triage Waiting to be triaged by maintainers labels Oct 22, 2024
@workhours
Author

Sorry for submitting this so many times; the damned firewall got in the way.
By the way, why does `fit_loop` call `setup_data` twice, in both `on_run_start` and `on_advance_start`? Is `on_advance_start` the real start of training?

@workhours
Author

The simple scenario: if the user wants to feed data for the next epoch, give them a single callable interface, one for all data types (val, train, test, predict, ...), with a clear contract: if it is called again, it must be requesting data for the next epoch.
As it stands, the framework is so flexible that it is difficult to write data-moving logic in `val_dataloader`, `train_dataloader`, etc.
Alternatively, the framework should provide a clear notification that the current epoch has ended and no more data will be requested.
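The single-callable contract proposed above could look roughly like the following plain-Python sketch. `EpochDataSource`, `next_epoch`, `on_epoch_end`, `ListSource` and `run` are all hypothetical; Lightning does not expose this interface, and `run` is a toy driver, not its fit loop. The driver requests data exactly once per epoch and sends an explicit end-of-epoch notification, which are the two guarantees asked for here.

```python
from typing import Any, Dict, List, Optional, Protocol, Tuple


class EpochDataSource(Protocol):
    """Hypothetical contract: one data request per epoch, plus an end signal."""

    def next_epoch(self, epoch: int) -> Optional[Dict[str, Any]]:
        """Return {"train": ..., "val": ...} for `epoch`, or None when exhausted."""
        ...

    def on_epoch_end(self, epoch: int) -> None:
        """Called once no more data will be requested for `epoch`."""
        ...


class ListSource:
    """Example source backed by a precomputed list of per-epoch payloads."""

    def __init__(self, epochs: List[Dict[str, Any]]) -> None:
        self._epochs = epochs
        self.ended: List[int] = []

    def next_epoch(self, epoch: int) -> Optional[Dict[str, Any]]:
        if epoch >= len(self._epochs):
            return None  # no next cycle: the driver stops cleanly
        return self._epochs[epoch]

    def on_epoch_end(self, epoch: int) -> None:
        self.ended.append(epoch)  # safe point to free or move this epoch's data


def run(source: EpochDataSource) -> List[Tuple[str, int, Any]]:
    """Toy driver: each epoch makes exactly one data request, then signals its end."""
    log: List[Tuple[str, int, Any]] = []
    epoch = 0
    while True:
        data = source.next_epoch(epoch)  # the only request for this epoch
        if data is None:
            break
        log.append(("train", epoch, data["train"]))
        log.append(("val", epoch, data["val"]))
        source.on_epoch_end(epoch)
        epoch += 1
    return log


if __name__ == "__main__":
    src = ListSource([{"train": [1, 2], "val": [3]}, {"train": [4, 5], "val": [6]}])
    print(run(src))
    print(src.ended)  # [0, 1]
```

Exhaustion is signaled by the return value (`None`) and completion by a separate callback, so user code never has to infer either from how often a hook was called.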
