Use h5py for output data writing and consolidation to reduce memory footprint #10

thomas-a-neil · 2019-08-08T23:49:17Z

This builds on CannyLab/rinokeras#12. The data consolidation step currently reads the entire output dataset into memory, which will crash even for relatively small datasets if we include all encoder outputs, especially for the LSTM.

HDF5 lets us write iteratively and avoid the memory overhead of pickle.
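For illustration, here is a minimal sketch of the iterative-write pattern with h5py. It is not the code in this PR: the dataset name `encoder_output`, the helpers `write_batches` and `fake_batches`, and the fixed feature width are all assumptions made for the example. Only one batch is held in memory at a time, and the dataset grows along its first axis as batches arrive.

```python
import os
import tempfile

import h5py
import numpy as np


def write_batches(path, batches, feature_dim):
    """Append batches of shape (n, feature_dim) to one resizable dataset."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "encoder_output",              # hypothetical dataset name
            shape=(0, feature_dim),        # start empty
            maxshape=(None, feature_dim),  # unlimited along the first axis
            dtype="float32",
            chunks=True,                   # resizable datasets must be chunked
        )
        for batch in batches:
            start = dset.shape[0]
            # Grow the dataset on disk, then write only the new rows.
            dset.resize(start + batch.shape[0], axis=0)
            dset[start:] = batch


def fake_batches(num_batches, batch_size, feature_dim):
    """Stand-in for model outputs; yields one batch at a time."""
    for i in range(num_batches):
        yield np.full((batch_size, feature_dim), i, dtype="float32")


path = os.path.join(tempfile.mkdtemp(), "outputs.h5")
write_batches(path, fake_batches(3, 4, 8), feature_dim=8)

# Reading back touches only the rows that are requested.
with h5py.File(path, "r") as f:
    total_rows = f["encoder_output"].shape[0]
    last_row = f["encoder_output"][total_rows - 1]
```

Variable-length sequences would need a different layout (for example, one dataset per sequence), which this sketch does not cover.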

Upon reflection, the documentation should probably be updated as well, because I think we reference pickle a few times.


thomas-a-neil · 2019-08-08T23:49:44Z

This should also help with songlab-cal/tape#8

rmrao · 2019-08-25T23:06:59Z

Should we merge this? I don't think the rinokeras changes have been merged to master?

thomas-a-neil · 2019-08-26T19:03:45Z

It depends on rinokeras changes, so I don't think we can merge it yet.

rmrao · 2020-01-07T22:40:22Z

Closing since both this and rinokeras are in basic maintenance mode now, so no major changes will be made.

Use h5py for output data writing and consolidation to reduce memory footprint

5d6db4a

thomas-a-neil requested a review from rmrao August 8, 2019 23:49

Properly pick up each sequence in a batch

c61ad82

rmrao closed this Jan 7, 2020
