Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compute() on plate grid #299

Merged
merged 9 commits into from
Sep 29, 2023
3 changes: 2 additions & 1 deletion ome_zarr/reader.py
Original file line number Diff line number Diff line change
Expand Up @@ -541,7 +541,8 @@ def get_tile(tile_name: str) -> np.ndarray:
LOGGER.debug("LOADING tile... %s with shape: %s", path, tile_shape)

try:
data = self.zarr.load(path)
# compute() to get the data from dask - we want to return array
data = self.zarr.load(path).compute()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the computation not happen as late as possible, e.g., during the concatenation below, to make use of dask's scheduler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we already making use of dask's scheduler? Or do you mean make better use of it?

Would that actually give better performance? Not being very familiar with it, I'm reading https://docs.dask.org/en/stable/scheduler-overview.html which says there are gains to be made if you have a large graph and "share many intermediates", but I don't see that we can do that here?

I've simplified the code a fair bit in ee9ef7d to remove the extra layer of delayed() wrapping. This looks cleaner and still works as expected. It just means that we call self.zarr.load(path) for every tile right at the start, even if we never need to load/compute that region. I don't know if this makes any difference to performance but probably not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use, yes, so that the right things happen parallel.

except ValueError:
LOGGER.exception("Failed to load %s", path)
data = np.zeros(tile_shape, dtype=self.numpy_type)
Expand Down
Loading