
presimple_toyzero in memory dataset #16

Open · wants to merge 10 commits into base: master
Conversation

@YHRen commented Oct 21, 2021

Motivation:

Preload the entire dataset into memory so that it can be shared among worker processes.

Currently, loading each npz file involves decompression and a memory allocation of ~5 MiB. We should do this once and store only the resulting 128x128 crop (64 KiB) in memory. This will be beneficial for training and testing on small datasets (~64 GB for 1M samples).
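For intuition, the in-memory path amounts to paying the decompression cost once up front, roughly like the sketch below. (This is illustrative only: the `preload_crops` helper and the `'image'` key are hypothetical names, not the actual presimple_toyzero internals.)

```python
import os
import numpy as np

def preload_crops(folder, filenames):
    """Decompress every npz once and keep only the small crops in RAM.

    Hypothetical sketch: helper name and the 'image' key are
    illustrative, not the actual presimple_toyzero implementation.
    """
    crops = []
    for name in filenames:
        with np.load(os.path.join(folder, name)) as npz:
            # Accessing the array triggers the one-time decompression
            # (~5 MiB); only the small 128x128 crop (~64 KiB) is kept.
            crops.append(np.asarray(npz['image'], dtype=np.float32))
    return crops
```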

Here is a brief test and profiling session.

Test and Profiling

```python
from toytools.datasets import PreSimpleToyzeroDataset

folder = '/home/yren/GC3D/adc_samples/toygan/small_sample_512'
fname  = 'n100-U-128x128_sample_512.csv'

ds     = PreSimpleToyzeroDataset(folder, fname, is_train=True, seed=7)
ds_mem = PreSimpleToyzeroDataset(folder, fname, is_train=True, seed=7, is_in_mem=True)

print("sample size = ", len(ds))
# sample size = 410

%timeit ds[4]
# 23 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit ds_mem[4]
# 477 ns ± 1.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

(ds[4][0] == ds_mem[4][0]).all()
# True

from torch.utils.data import DataLoader

dl     = DataLoader(ds, batch_size=32, num_workers=4)
dl_mem = DataLoader(ds_mem, batch_size=32, num_workers=4)

%%timeit
for _ in dl:
    pass
# 2.94 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for _ in dl_mem:
    pass
# 82.9 ms ± 4.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

n100-U-128x128_sample_512.csv contains the top 512 examples of the 50k dataset.

p.s. I also removed the __pycache__ folder.

@YHRen requested review from pphuangyi and usert5432 on October 21, 2021 06:33
@usert5432 (Collaborator)
Hi @YHRen. I was wondering if presimple_toyzero is the right place to implement prefetching? Similar to how you suggested separating the NamedDict functionality from the checkpoint functionality, maybe it would be better to separate prefetching from image handling? For example, one could implement a PrefetchingDecorator following the OOP paradigm, like:

```python
class PrefetchingDecorator(GenericDataset):

    def __init__(self, dset):
        self._dset = dset
        self._shared_data = self._preload_data()

    ...

    def __getitem__(self, index):
        return [self._shared_data[index][0], self._shared_data[index][1]]
```

And use PrefetchingDecorator(dataset) when needed? The benefit of this approach is that the prefetching decorator could be used transparently with all dataset types.
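For concreteness, a minimal runnable version of this sketch could look as follows, assuming the wrapped dataset is an ordinary map-style dataset returning (input, target) pairs; GenericDataset is stood in for here by torch.utils.data.Dataset:

```python
from torch.utils.data import Dataset

class PrefetchingDecorator(Dataset):
    """Wrap any map-style dataset and serve all of its items from memory."""

    def __init__(self, dset):
        self._dset = dset
        self._shared_data = self._preload_data()

    def _preload_data(self):
        # Touch every index once so the expensive load/decompression
        # happens up front, before DataLoader workers are forked.
        return [self._dset[i] for i in range(len(self._dset))]

    def __len__(self):
        return len(self._dset)

    def __getitem__(self, index):
        return [self._shared_data[index][0], self._shared_data[index][1]]
```

Usage would then be a one-liner, e.g. `ds_mem = PrefetchingDecorator(PreSimpleToyzeroDataset(folder, fname, is_train=True, seed=7))`.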

@usert5432 (Collaborator)
Also, FYI, we already have a dataset that is designed for superfast handling of crops -- precropped_toyzero.

@YHRen (Author) commented Oct 21, 2021

@usert5432

> For example, one could implement a PrefetchingDecorator following the OOP paradigm, like ...

Absolutely! Great idea. Feel free to change it accordingly.
My goal was to quickly implement something that Yi and you can use with a one-line change in the config.
What you have suggested is exactly the proper way to organize the code.

> Also, FYI, we already have a dataset that is designed for superfast handling of crops -- precropped_toyzero.

I haven't looked into it. Yi pointed me to the presimple dataset file she is using. After a brief look, the loading is similar, but it loads a 128x128 uncompressed file from the filesystem?
(I don't think that would be faster than loading from memory directly :D, especially on GPFS. I think each worker process will cache the data individually and end up duplicating the dataset num_of_workers times; for a larger dataset this may lead to memory contention.) But feel free to test and benchmark it.
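For reference, the sharing behavior mentioned in the motivation relies on the default fork start method on Linux: data allocated in the parent before the DataLoader spawns its workers is inherited copy-on-write rather than re-loaded per worker. A tiny illustration, with made-up shapes:

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader

class InMemDataset(Dataset):
    """Toy dataset whose buffer is allocated once in the parent process.

    With num_workers > 0 and the fork start method (the Linux default),
    workers inherit `self.data` via copy-on-write instead of each
    re-reading or re-allocating it.
    """

    def __init__(self):
        self.data = np.zeros((512, 128, 128), dtype=np.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

# All four workers read the same physical pages allocated above.
loader = DataLoader(InMemDataset(), batch_size=32, num_workers=4)
```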

In any case, let me know if this PR is erroneous. If you think it is OK, we should let Yi switch to this one to save some training time.

@usert5432 (Collaborator)

> In any case, let me know if this PR is erroneous. If you think it is OK, we should let Yi switch to this one to save some training time.

I do not see any obvious errors with this PR. Sure, Yi is welcome to use the new dataset.

> I haven't looked into it. Yi pointed me to the presimple dataset file she is using. After a brief look, the loading is similar, but it loads a 128x128 uncompressed file from the filesystem?
> (I don't think that would be faster than loading from memory directly :D, especially on GPFS. I think each worker process will cache the data individually and end up duplicating the dataset num_of_workers times; for a larger dataset this may lead to memory contention.) But feel free to test and benchmark it.

Linux maintains a single page cache per filesystem, so there won't be any cache duplication between processes. I am happily using it with a 1:1 CPU:GPU ratio, and the CPU is never a bottleneck.
