Fix compute() on plate grid #299
Conversation
Codecov Report: All modified lines are covered by tests ✅
Hi @will-moore, I installed from this branch with:

```
pip install git+https://github.com/will-moore/ome-zarr-py.git@plate_dask_compute
```

and I can see the updated lines in the site-packages of my environment. I can confirm that 286a4f2 fixes the plate level. However, the well level still has the same issue:

```python
import dask.array as da

from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

# well level in an HCS store
location = parse_url("hcs.ome.zarr/A/1")
reader = Reader(location)
data = next(reader()).data[0]

# this passes
assert isinstance(data, da.Array), type(data)

result = data.compute()
# this fails
assert not isinstance(result, da.Array), type(result)
```
ome_zarr/reader.py (Outdated)

```diff
@@ -541,7 +541,8 @@ def get_tile(tile_name: str) -> np.ndarray:
         LOGGER.debug("LOADING tile... %s with shape: %s", path, tile_shape)

         try:
-            data = self.zarr.load(path)
+            # compute() to get the data from dask - we want to return array
+            data = self.zarr.load(path).compute()
```
Should the computation not happen as late as possible, e.g., during the concatenation below, to make use of dask's scheduler?
Aren't we already making use of dask's scheduler? Or do you mean make better use of it?

Would that actually give better performance? Not being very familiar with it, I'm reading https://docs.dask.org/en/stable/scheduler-overview.html which says there are gains to be made if you have a large graph and "share many intermediates", but I don't see that we can do that here?

I've simplified the code a fair bit in ee9ef7d to remove the extra layer of `delayed()` wrapping. This looks cleaner and still works as expected. It just means that we call `self.zarr.load(path)` for every tile right at the start, even if we never need to load/compute that region. I don't know if this makes any difference to performance, but probably not?
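For context, here is a minimal sketch of the two strategies under discussion, using a hypothetical `load_tile` stand-in rather than the actual reader code:

```python
# Hypothetical sketch (not the ome-zarr-py code): contrast computing each
# tile eagerly with keeping tiles lazy and computing once at the end.
import dask
import dask.array as da
import numpy as np

def load_tile(path: str) -> np.ndarray:
    # stand-in for self.zarr.load(path); returns one tile as a numpy array
    return np.full((2, 2), fill_value=hash(path) % 7, dtype=float)

paths = ["A/1/0", "A/2/0"]

# Eager: every tile is loaded during graph construction, one after another.
eager = np.concatenate([load_tile(p) for p in paths], axis=1)

# Lazy: each tile is a delayed task; the single compute() at the end hands
# the whole graph to dask's scheduler, which can load tiles in parallel.
tiles = [
    da.from_delayed(dask.delayed(load_tile)(p), shape=(2, 2), dtype=float)
    for p in paths
]
lazy = da.concatenate(tiles, axis=1).compute()

assert np.array_equal(eager, lazy)
```

The lazy variant defers all tile loads to a single final `compute()`, so the scheduler can run them in parallel; per-tile `compute()` calls force them to run one by one at graph-construction time.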
Better use, yes, so that the right things happen in parallel.
Force-pushed from f20ad62 to 9a6b947
@ziw-liu Thanks for that - fixed for Wells too now. And I added some tests.
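A minimal sketch of what such a test might assert, assuming a hypothetical `hcs_path` fixture pointing at an HCS store (the fixture and path are assumptions, not the actual test code):

```python
import dask.array as da
import numpy as np

from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

def test_well_compute(hcs_path):
    # hcs_path is a hypothetical fixture providing a path to an HCS store
    reader = Reader(parse_url(f"{hcs_path}/A/1"))
    data = next(reader()).data[0]
    assert isinstance(data, da.Array)
    # after the fix, compute() materialises a numpy array
    assert isinstance(data.compute(), np.ndarray)
```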
No, I don't. The error is:

Possibly due to the deprecation back in 2021? https://www.sphinx-doc.org/en/master/changes.html#release-4-1-0-released-jul-12-2021
@will-moore @joshmoore This is a problem with readthedocs. I faced the same problem on ome/omero-scripts#187 |
That last commit fixes failing tests with a bug that comes from #300. That bug wasn't caught in the tests for that PR since that code is only run when you call

Also tested that fix in napari. But doing the same with current
Fixes #297

This fixes the issue above: `get_tile()` already returns a dask array, so it doesn't need to be wrapped in `dask.delayed()`.

Test using the code sample on the ticket above.
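For illustration, a minimal sketch of the failure mode (a toy `get_tile`, not the reader's actual method): wrapping a function that already returns a dask array in `dask.delayed()` nests the laziness, so one `compute()` only unwraps the outer layer.

```python
import dask
import dask.array as da

def get_tile():
    # already lazy: returns a dask array, like the reader's get_tile()
    return da.ones((2, 2))

# The redundant delayed() wrapper makes the dask array itself the "result"
# of the task, so computing the outer array yields the inner dask array.
wrapped = da.from_delayed(dask.delayed(get_tile)(), shape=(2, 2), dtype=float)
print(type(wrapped.compute()))  # a dask array, not numpy.ndarray
```

Dropping the wrapper (or computing inside the tile loader, as in the diff above) means the outer array's chunks are plain numpy blocks, so `compute()` materialises as expected.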