
presimple_toyzero in memory dataset #16

Open · wants to merge 10 commits into base: master
Conversation

@YHRen commented Oct 21, 2021

Motivation:

Preload the entire dataset into memory so that it can be shared among worker processes.

Currently, loading each npz file involves decompression and a memory allocation of ~5 MiB. We should do this once and store only the resulting 128x128 crop (64 KiB) in memory. This will be beneficial for training and testing on small datasets (~64 GB for 1M samples).
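For intuition, the in-memory path amounts to paying the decompression cost once up front, roughly like the sketch below. (This is illustrative only: the `preload_crops` helper and the `'image'` key are hypothetical names, not the actual presimple_toyzero internals.)

```python
import os
import numpy as np

def preload_crops(folder, filenames):
    """Decompress every npz once and keep only the small crops in RAM.

    Hypothetical sketch: helper name and the 'image' key are
    illustrative, not the actual presimple_toyzero implementation.
    """
    crops = []
    for name in filenames:
        with np.load(os.path.join(folder, name)) as npz:
            # Accessing the array triggers the one-time decompression
            # (~5 MiB); only the small 128x128 crop (~64 KiB) is kept.
            crops.append(np.asarray(npz['image'], dtype=np.float32))
    return crops
```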

Here is a brief test and profiling session.

Test and Profiling

```python
from toytools.datasets import PreSimpleToyzeroDataset

folder = '/home/yren/GC3D/adc_samples/toygan/small_sample_512'
fname  = 'n100-U-128x128_sample_512.csv'

ds     = PreSimpleToyzeroDataset(folder, fname, is_train=True, seed=7)
ds_mem = PreSimpleToyzeroDataset(folder, fname, is_train=True, seed=7, is_in_mem=True)

print("sample size = ", len(ds))
# sample size = 410

%timeit ds[4]
# 23 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit ds_mem[4]
# 477 ns ± 1.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

(ds[4][0] == ds_mem[4][0]).all()
# True

from torch.utils.data import DataLoader

dl     = DataLoader(ds, batch_size=32, num_workers=4)
dl_mem = DataLoader(ds_mem, batch_size=32, num_workers=4)

%%timeit
for _ in dl:
    pass
# 2.94 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for _ in dl_mem:
    pass
# 82.9 ms ± 4.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

n100-U-128x128_sample_512.csv contains the top 512 examples of the 50k dataset.

p.s. I also removed the __pycache__ folder.

@YHRen requested review from pphuangyi and usert5432 on October 21, 2021 06:33
@usert5432 (Collaborator)
Hi @YHRen. I was wondering if presimple_toyzero is the right place to implement prefetching? Similar to how you suggested separating the NamedDict functionality from the checkpoint functionality, maybe it would be better to separate prefetching from image handling? For example, one could implement a PrefetchingDecorator following the OOP paradigm, like:

```python
class PrefetchingDecorator(GenericDataset):

    def __init__(self, dset):
        self._dset = dset
        self._shared_data = self._preload_data()

    ...

    def __getitem__(self, index):
        return [self._shared_data[index][0], self._shared_data[index][1]]
```

And use PrefetchingDecorator(dataset) when needed? The benefit of this approach is that the prefetching decorator could be used transparently with all dataset types.
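For concreteness, a minimal runnable version of this sketch could look as follows, assuming the wrapped dataset is an ordinary map-style dataset returning (input, target) pairs; GenericDataset is stood in for here by torch.utils.data.Dataset:

```python
from torch.utils.data import Dataset

class PrefetchingDecorator(Dataset):
    """Wrap any map-style dataset and serve all of its items from memory."""

    def __init__(self, dset):
        self._dset = dset
        self._shared_data = self._preload_data()

    def _preload_data(self):
        # Touch every index once so the expensive load/decompression
        # happens up front, before DataLoader workers are forked.
        return [self._dset[i] for i in range(len(self._dset))]

    def __len__(self):
        return len(self._dset)

    def __getitem__(self, index):
        return [self._shared_data[index][0], self._shared_data[index][1]]
```

Usage would then be a one-liner, e.g. `ds_mem = PrefetchingDecorator(PreSimpleToyzeroDataset(folder, fname, is_train=True, seed=7))`.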

@usert5432 (Collaborator)
Also, FYI, we already have a dataset that is designed for superfast handling of crops -- precropped_toyzero.

@YHRen (Author) commented Oct 21, 2021

@usert5432

> For example, one could implement a PrefetchingDecorator following the OOP paradigm, like ...

Absolutely! Great idea. Feel free to change it accordingly.
My goal was to quickly implement something that Yi and you can use with a one-line change in the config.
What you have suggested is exactly the proper way to organize the code.

> Also, FYI, we already have a dataset that is designed for superfast handling of crops -- precropped_toyzero.

I haven't looked into it. Yi pointed me to the presimple dataset file she is using. After a brief look, the loading is similar, but it loads a 128x128 uncompressed file from the filesystem?
(I don't think that would be faster than loading from memory directly :D, especially on GPFS. I think each worker process will cache the data individually and end up duplicating the dataset num_of_workers times; for a larger dataset this may lead to memory contention.) But feel free to test and benchmark it.
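For reference, the sharing behavior mentioned in the motivation relies on the default fork start method on Linux: data allocated in the parent before the DataLoader spawns its workers is inherited copy-on-write rather than re-loaded per worker. A tiny illustration, with made-up shapes:

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader

class InMemDataset(Dataset):
    """Toy dataset whose buffer is allocated once in the parent process.

    With num_workers > 0 and the fork start method (the Linux default),
    workers inherit `self.data` via copy-on-write instead of each
    re-reading or re-allocating it.
    """

    def __init__(self):
        self.data = np.zeros((512, 128, 128), dtype=np.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

# All four workers read the same physical pages allocated above.
loader = DataLoader(InMemDataset(), batch_size=32, num_workers=4)
```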

In any case, let me know if this PR is erroneous. If you think it is OK, we should let Yi switch to this one to save some training time.

@usert5432 (Collaborator)

> In any case, let me know if this PR is erroneous. If you think it is OK, we should let Yi switch to this one to save some training time.

I do not see any obvious errors with this PR. Sure, Yi is welcome to use the new dataset.

> I haven't looked into it. Yi pointed me to the presimple dataset file she is using. After a brief look, the loading is similar, but it loads a 128x128 uncompressed file from the filesystem?
> (I don't think that would be faster than loading from memory directly :D, especially on GPFS. I think each worker process will cache the data individually and end up duplicating the dataset num_of_workers times; for a larger dataset this may lead to memory contention.) But feel free to test and benchmark it.

Linux maintains a single page cache per filesystem, so there won't be any cache duplication between processes. I am happily using it with a 1:1 CPU:GPU ratio, and the CPU is never a bottleneck.
