pytorch dataset: pass cache/prefetch to DataChain instances #653
Conversation
This should enable prefetching on the `to_pytorch` API. We were not passing any `settings` to the `DataChain` instances created inside `to_pytorch`, so they were not using `cache` or `prefetch`. I have refrained from setting `workers`, etc., because I believe that is better left to `PytorchDataLoader` for now.

I have found this PR to increase performance for the `torch-loader.py` example by 20-25% when prefetch and cache are enabled. However, this was done on my machine and was not a scientific measurement.

Closes #631.
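For context, here is a minimal sketch of how these settings come into play from the user's side. The bucket URI is a placeholder, and the exact `settings`/`to_pytorch` call shapes shown are illustrative assumptions, not code from this PR:

```python
# Minimal sketch (not from this PR): setting cache/prefetch on a chain that is
# later consumed through to_pytorch(). The storage URI is a placeholder.
from torch.utils.data import DataLoader

from datachain import DataChain

chain = (
    DataChain.from_storage("s3://example-bucket/images/")  # placeholder URI
    .settings(cache=True, prefetch=10)  # the settings this PR forwards
)

# Before this PR, the DataChain instances created inside to_pytorch() did not
# receive these settings; with it, cache/prefetch are passed through.
loader = DataLoader(chain.to_pytorch(), batch_size=16)
for batch in loader:
    pass  # training loop goes here
```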
Codecov Report: All modified and coverable lines are covered by tests ✅
```
@@ Coverage Diff @@
##             main     #653   +/-   ##
=======================================
  Coverage   87.67%   87.67%
=======================================
  Files         111      111
  Lines       10601    10603    +2
  Branches     1436     1436
=======================================
+ Hits         9294     9296    +2
  Misses        945      945
  Partials      362      362
```
Looks good to me! 👍
I was considering whether we need prefetching at all. I'm currently experimenting with a dataset containing a larger number of files to test the hypothesis that prefetching improves performance in practice, not just in theory.