Reduce RAM usage for reading ctapipe-stage1 files #104
Comments
I don't think you should read single events from the file for that. Load tables with a largish number of events that fit into RAM (configurable chunk size, e.g. 10k to 50k events) and yield them one by one (without going back to the file) when the network needs a new batch. Read a new chunk when the current one is exhausted. We are planning on adding such chunked reading to the table reader API, and since the last release …
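For reference, a minimal sketch of the chunked pattern described above, assuming a plain pytables table; the node path and default chunk size are illustrative, not the actual ctapipe-stage1 layout:

```python
import tables

def iter_events_chunked(path, node_path, chunksize=20_000):
    """Yield single events, but touch the file only once per chunk.

    Sketch of the approach suggested above; ``node_path`` and the default
    ``chunksize`` are placeholders, not the actual DL1 layout.
    """
    with tables.open_file(path, "r") as f:
        table = f.get_node(node_path)
        for start in range(0, table.nrows, chunksize):
            stop = min(start + chunksize, table.nrows)
            chunk = table.read(start=start, stop=stop)  # one file read per chunk
            for row in chunk:  # subsequent rows are served from RAM
                yield row
```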
Interesting point!
When I implemented the DL1DataReaderSTAGE1, this feature was not available. I don't think it makes much sense to change the current implementation and replace the `_f_get_child()` indexing with it.
DL1DH is designed to load single events for the use case of training on simulated data files. The reason is randomization. First, as @TjarkMiener mentioned, events of different particle types are contained in separate files, so every batch of events needs to contain events from different files (in general). Second, we need to select events from across the dataset (as opposed to contiguous events from the same file), because contiguous simulated events can be correlated: multiple events are generated from the same simulated shower placed at different locations.

Reading chunks into RAM as @maxnoe described would be fine for prediction, and would presumably be much more efficient for that use case. Right now, since DL is in the R&D phase, the computing time is dominated by training (although a non-negligible fraction of that time is spent predicting on the validation set), and it's not clear to me whether it would be worth complicating the code right now by having separate data reading methods for training and prediction. If we want DL to eventually become a useful tool for analyzing real data, though, efficient prediction will be required at some point.

Chunk-based reading could be used for training if the data were pre-randomized at write time. Some issues/caveats I can think of with this (other than the obvious one, that the files we have aren't in fact randomized) are:
This is my understanding as well. Do we know the chunk size of the files? If it's very large, could that be the issue?
@aribrill thanks, that is much clearer: random single-event reading was not a use case I had considered, but in fact it should work ok with the current format. One thing you might want to look into is adding pytables indices to the files and seeing if that improves your performance; it should speed up random event lookup by event_id, but if you are doing lookups by event index only, maybe there is no strong effect. You can enable the …
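As an illustration of the index suggestion, a minimal sketch using pytables column indexes; the file name and node path below are placeholders:

```python
import tables

with tables.open_file("gamma_dl1.h5", mode="a") as f:  # placeholder file name
    table = f.get_node("/dl1/event/telescope/images/tel_001")  # placeholder node path
    # Create a completely sorted index (CSI) on the event_id column;
    # a plain create_index() would also work if a CSI is not needed.
    table.cols.event_id.create_csindex()
    # Lookups by event_id can now use the index, e.g.:
    rows = table.read_where("event_id == 123")
```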
Thanks for the suggestions @kosack! I will try writing the index tables afterwards (for the merged files I always had that boolean set to True). @maxnoe As of CTLearn v0.6.1 we support pseudo-chunkwise processing in the prediction phase. CTLearn v0.6.1 also converts the trained Keras model into ONNX format, which makes interfacing with ctapipe much easier.
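The exact export code used by CTLearn is not shown here; one common way to convert a trained Keras model to ONNX is via tf2onnx, roughly along these lines (model path and opset are illustrative):

```python
import tensorflow as tf
import tf2onnx

# Load a trained Keras model and export it to ONNX (paths are placeholders).
model = tf.keras.models.load_model("ctlearn_model")
spec = (tf.TensorSpec(model.inputs[0].shape, tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="ctlearn_model.onnx")
```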
Following up on the discussion in ctapipe PR 1730.
I noticed a significant difference in the RAM usage during training time between the ctapipe-stage1 DL1 files that are split by tel_ids and the ones split by tel_types.
The problem can be broken down to lines L792-L800 and L827-L837 of the DL1DataReaderSTAGE1 in the stage1 branch. The core difference is that for the DL1 files split by tel_types, we only call `TABLE._f_get_child()` once in L830, outside the for loop over the telescope ids, so only one `child` is in memory. My (probably poor) pytables understanding here is that getting a `child` from a `node` and then using pytables indexing is different than opening the whole table, storing the whole table temporarily in memory, and retrieving the image by an index. @aribrill, can you please comment?

As suggested by @maxnoe, erasing the for loop when retrieving event images would be more efficient! I would need to check if multi-indexing on a `child` is supported by pytables. This feature is only applicable to the DL1 files split by tel_types, because for the DL1 files split by tel_ids a for loop is required, since we have multiple `child`s/`table`s.

The purpose of this function call (`__getitem__(self, idx)`) is to retrieve information (charges, Hillas parameters, simulation properties, etc.) for a single event for a short time (the forward pass of the network). This function is called continually during the training process on the whole DL1 production to build batches of event images. The batches are shuffled in the sense that gamma and proton events from different runs (in particular, different DL1 files) are contained within those batches.
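To make the difference concrete, here is a hedged sketch contrasting per-event indexing on a child table with a single multi-row read via `read_coordinates`; the file name and child name are assumptions, not the actual DL1DH layout:

```python
import tables

with tables.open_file("dl1_by_tel_type.h5", "r") as f:  # assumed file name
    group = f.get_node("/dl1/event/telescope/images")    # assumed group path
    images = group._f_get_child("LST_LSTCam")            # assumed child/table name

    rows = [10, 503, 981]  # row indices making up one batch

    # Current pattern: one indexing call per event inside the batch loop.
    batch = [images[idx]["image"] for idx in rows]

    # Possible alternative: a single call that reads all requested rows at once.
    batch = images.read_coordinates(rows, field="image")
```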