Add multi-threading like in remfile #78
Comments
Are you talking about loading multiple zarr/hdf5 chunks in parallel, or loading a single large zarr/hdf5 chunk more efficiently using multiple threads? If it's the former: right now the slicing is passed on to zarr, which I believe does not do multi-threaded reads. Are you suggesting that we handle slicing specially in LindiH5pyDataset? The relevant code is in lindi/LindiH5pyFile/LindiH5pyDataset.py, lines 167 to 219 (at commit 85e7897).
Would we look at the arguments being passed to the slicing to see whether the read could be parallelized, and then spawn multiple threads in this function? I think this would get complicated with all the various slicing possibilities. If it's the latter: …
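For concreteness, here is a minimal sketch of what parallel chunk fetching could look like. This is not lindi's actual code: the `fetch_chunk` callable is hypothetical and would, in practice, wrap the underlying zarr store's read for a single chunk key.

```python
from concurrent.futures import ThreadPoolExecutor


def read_chunks_parallel(fetch_chunk, chunk_keys, max_workers=8):
    """Fetch many chunks concurrently with a thread pool.

    fetch_chunk: hypothetical callable mapping one chunk key to its data
    (in practice, a wrapper around the zarr store's single-chunk read).
    Returns the chunk data in the same order as chunk_keys.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with chunk_keys
        return list(pool.map(fetch_chunk, chunk_keys))
```

Since chunk reads are I/O-bound (HTTP requests), threads give real concurrency here despite the GIL; the hard part, as noted above, is mapping an arbitrary h5py-style selection onto a set of chunk keys.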
Yes.
I see. Yeah, that would be challenging. In exploring this, I found a different possible solution: we could defer calls for …
We could also roll our own by overriding …
@rly I think we should try to "roll our own" because lindi takes care of authenticating URLs of embargoed dandisets. Shall I take a crack at that?
Ah, I see. Yeah, that would be great.
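A "roll our own" approach could look roughly like the following: issue HTTP byte-range requests for multiple chunks in parallel, passing along whatever auth headers lindi resolved for the URL. This is a remfile-style sketch under stated assumptions, not lindi's implementation; the `headers` parameter (e.g. a token for embargoed dandisets) and the function itself are hypothetical.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_ranges(url, ranges, headers=None, max_workers=8):
    """Issue parallel HTTP byte-range requests (remfile-style sketch).

    ranges: list of (start, end) byte offsets, inclusive, per HTTP Range
    semantics. headers: extra request headers, e.g. an Authorization
    token for embargoed dandisets (assumption: token-based auth).
    Returns the response bodies in the same order as ranges.
    """
    def fetch_one(rng):
        start, end = rng
        req = urllib.request.Request(url, headers={
            **(headers or {}),
            "Range": f"bytes={start}-{end}",
        })
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, ranges))
```

Rolling our own (rather than relying on a third-party fetcher) keeps the authentication path in lindi's control, which is the point raised above about embargoed dandisets.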
Currently Remfile reads a large number of chunks faster than lindi. I think it is because Remfile uses multi-threading for requests. It would be nice to add that here for reading large numbers of chunks efficiently.
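To illustrate why multi-threading helps for many small I/O-bound reads, here is a toy timing comparison. The latency figure is a simulation, not a measurement of remfile or lindi.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_read(_chunk_index):
    # simulate per-chunk network latency (assumption: ~10 ms per request)
    time.sleep(0.01)


def timed(fn):
    """Return the wall-clock time taken by fn()."""
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0


n_chunks = 20
# sequential: latencies add up (~n_chunks * 10 ms)
t_sequential = timed(lambda: [fake_read(i) for i in range(n_chunks)])
# threaded: latencies overlap, bounded roughly by the slowest batch
with ThreadPoolExecutor(max_workers=10) as pool:
    t_threaded = timed(lambda: list(pool.map(fake_read, range(n_chunks))))
```

With overlapping requests the total time is dominated by a couple of round trips rather than the sum of all of them, which is consistent with remfile outperforming sequential reads on many chunks.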