Possible to handle arbitrary fsmaps? #402

Open
elyall opened this issue Nov 6, 2024 · 7 comments

Comments

@elyall
Contributor

elyall commented Nov 6, 2024

I've used kerchunk and fsspec to create a reference file system for a full plate of raw TIFF images, and I was hoping to visualize them in napari using napari-ome-zarr. Is this currently possible, on the roadmap, or not planned?

The main benefit of this approach is that I only have to generate the reference JSON and don't have to read everything and rewrite it as a zarr store, which would also duplicate the data since I don't like deleting raw data. I could tile the arrays in memory and add them to napari, but I would lose any other benefits that the plugin provides now or in the future, such as surfacing metadata.
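For readers unfamiliar with the idea, here is a minimal pure-Python sketch of how a flat, kerchunk-style reference dict can point at byte ranges inside existing files. It is not fsspec's implementation: `read_reference` and the demo file are hypothetical names made up for illustration, and the real format has further cases (for example `base64:`-prefixed inline strings and templates) that are ignored here.

```python
import json
import tempfile
from pathlib import Path


def read_reference(refs, key):
    """Resolve one key of a flat kerchunk-style reference dict to bytes."""
    value = refs[key]
    if isinstance(value, dict):
        # Metadata given as a dict: serialize it to JSON bytes.
        return json.dumps(value).encode()
    if isinstance(value, str):
        # Inline content stored directly in the reference.
        return value.encode()
    # Otherwise: [path] for a whole file, or [path, offset, length] for a byte range.
    path, *byte_range = value
    data = Path(path).read_bytes()
    if byte_range:
        offset, length = byte_range
        return data[offset:offset + length]
    return data


# Hypothetical demo: a "raw" file with a 4-byte header followed by an 8-byte chunk.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "A1.bin"
    raw.write_bytes(b"HEAD" + bytes(range(8)))
    refs = {
        ".zgroup": {"zarr_format": 2},
        "A/1/0/0.0": [str(raw), 4, 8],  # chunk key -> byte range in the raw file
    }
    group_meta = json.loads(read_reference(refs, ".zgroup"))
    chunk = read_reference(refs, "A/1/0/0.0")
```

The point of the sketch is that the chunk bytes are never copied into a new store: the reference only records where they already live.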

Example code:

from itertools import product
from pathlib import Path

import napari
import numpy as np
import zarr
from fsspec import get_mapper
from kerchunk.tiff import tiff_to_zarr
from matplotlib import pyplot as plt
from PIL import Image


def write_data(rows, columns, folder=Path.cwd() / "test"):
    '''write some random images'''
    wells = product(rows, columns)
    folder.mkdir(exist_ok=True)
    paths = [folder / f"{r}{c}.tiff" for r, c in wells]
    for path in paths:
        array = np.random.randint(0, 255, (128, 128), dtype=np.uint8)
        img = Image.fromarray(array)
        img.save(path)
    return paths


def create_metadata(rows, columns):
    '''generate ome-ngff metadata'''
    plate_metadata = {
        "acquisitions": [
            {"id": 1, "maximumfieldcount": 1, "name": "acq_1", "starttime": 0}
        ],
        "columns": [{"name": str(c)} for c in columns],
        "field_count": 1,
        "name": "plate_name",
        "rows": [{"name": r} for r in rows],
        "version": "0.5-dev",
        "wells": [
            {
                "path": f"{r}/{c}",
                "rowIndex": rows.index(r),
                "columnIndex": columns.index(c),
            }
            for r, c in product(rows, columns)
        ],
    }
    well_metadata = {"images": [{"acquisition": 1, "path": "0"}], "version": "0.5-dev"}
    return plate_metadata, well_metadata


def create_reference(paths, plate_metadata, well_metadata):
    '''generate kerchunk reference'''
    references = {path.stem: tiff_to_zarr(str(path)) for path in paths}
    reference_json = {
        ".zgroup": {"zarr_format": 2},
        ".zattrs": {"plate": plate_metadata},
    }
    for well, ref in references.items():
        row, col = well[0], well[1:]
        if (key := f"{row}/.zgroup") not in reference_json:
            reference_json[key] = {"zarr_format": 2}
        for k, v in ref.items():
            if (key := f"{row}/{col}/.zgroup") not in reference_json:
                reference_json[key] = {"zarr_format": 2}
                reference_json[f"{row}/{col}/.zattrs"] = {"well": well_metadata}
            reference_json[f"{row}/{col}/0/{k}"] = v
    return reference_json


# write data as tiffs
rows = ["A", "B"]
columns = ["1", "2", "3"]
paths = write_data(rows, columns)

# generate reference dict/json that can be mapped by fsspec
plate_metadata, well_metadata = create_metadata(rows, columns)
reference = create_reference(paths, plate_metadata, well_metadata)
print(reference)

# validate reference dict/json
fsmap = get_mapper("reference://", fo=reference)
root = zarr.open(fsmap, mode="r")
plt.imshow(root["A/1/0"])  # displays properly showing zarr handles the tiff as a zarr array chunk

# goal: display data using a standard plugin
viewer = napari.Viewer()
viewer.open(fsmap, plugin="napari-ome-zarr")  # fails with the error below

ValueError: Given reader 'napari-ome-zarr' is not a compatible reader for ['.zattrs']. No compatible readers are available for ['.zattrs'].

As a first step I should probably check that my NGFF metadata is in fact valid. Related to this, I think it would be a good idea to refactor ome_zarr.writer so that metadata can be generated without actually writing it (i.e. surface and rename functions like _validate_plate_rows_columns so that they produce metadata dicts or dataclasses).
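As a rough illustration of what such a standalone check might look like, here is a small sketch that inspects a plate metadata dict without writing anything. `check_plate_metadata` is a hypothetical helper, not part of ome-zarr-py, and it covers only a few rules as I understand them from the NGFF plate specification (required `rows`, `columns`, `wells`, and well paths matching `row/column`); it is not a substitute for schema validation.

```python
def check_plate_metadata(plate):
    """Return a list of problems found in a plate metadata dict (empty if none)."""
    problems = []
    for key in ("columns", "rows", "wells"):
        if not isinstance(plate.get(key), list):
            problems.append(f"missing or non-list '{key}'")
    if problems:
        return problems

    row_names = [r.get("name") for r in plate["rows"]]
    col_names = [c.get("name") for c in plate["columns"]]
    for names, label in ((row_names, "row"), (col_names, "column")):
        if None in names:
            problems.append(f"{label} without a 'name'")
        if len(set(names)) != len(names):
            problems.append(f"duplicate {label} names")

    for well in plate["wells"]:
        missing = {"path", "rowIndex", "columnIndex"} - well.keys()
        if missing:
            problems.append(f"well {well.get('path')!r} missing {sorted(missing)}")
            continue
        ri, ci = well["rowIndex"], well["columnIndex"]
        in_range = (
            isinstance(ri, int)
            and isinstance(ci, int)
            and 0 <= ri < len(row_names)
            and 0 <= ci < len(col_names)
        )
        if not in_range:
            problems.append(f"well {well['path']!r} has out-of-range indices")
            continue
        # The well path is expected to be "<row name>/<column name>".
        expected = f"{row_names[ri]}/{col_names[ci]}"
        if well["path"] != expected:
            problems.append(f"well path {well['path']!r} should be {expected!r}")
    return problems


rows, columns = ["A", "B"], ["1", "2", "3"]
plate = {
    "columns": [{"name": c} for c in columns],
    "rows": [{"name": r} for r in rows],
    "wells": [
        {"path": f"{r}/{c}", "rowIndex": rows.index(r), "columnIndex": columns.index(c)}
        for r in rows
        for c in columns
    ],
}
ok_problems = check_plate_metadata(plate)

# A deliberately inconsistent well: path says A/2 but the indices point at A/1.
broken = {**plate, "wells": [{"path": "A/2", "rowIndex": 0, "columnIndex": 0}]}
broken_problems = check_plate_metadata(broken)
```

Returning a list of problems rather than raising keeps the check usable both for quick debugging and as a building block for a stricter validator.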

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/combining-referential-files-as-chunks-and-written-chunks-in-a-single-zarr/104609/6

@will-moore
Member

I'm not aware that we've discussed or planned this, but thanks for the suggestion. It looks worth considering.

Looking at your code, I can't see where get_mapper() is defined?

It would also help to understand the proposed usage a bit better if you were to separate your code into one script for writing the data and one for reading, to show what info needs to be saved in order to read the data.

@elyall
Contributor Author

elyall commented Nov 8, 2024

> Looking at your code, I can't see where get_mapper() is defined?

Sorry, fixed.

> It would also help to understand the proposed usage a bit better if you were to separate your code into one script for writing the data and one for reading, to show what info needs to be saved in order to read the data.

ome_zarr.io.ZarrLocation accepts a zarr.storage.FSStore as input, and zarr's documentation says this is a wrapper around fsspec.FSMap, which is what I was trying to pass in. But I can't seem to create an FSStore from a reference dict, JSON, or an FSMap directly...

However it looks like with zarr v3 the zarr.storage module is drastically changing, including removing FSStore. I'll need to dig into it more. Hopefully these changes are good for ome-zarr-py and good for my proposed use case.

@joshmoore
Member

> I'm not aware that we've discussed or planned this, but thanks for the suggestion. It looks worth considering.

With the move to zarr-python 3.x, my hope is that we can accept any base Store as opposed to needing the explicit FSStore as with v2.

@will-moore
Member

Yes, I was looking at using make_store_path(), which zarr uses under the hood (from https://github.com/zarr-developers/zarr-python/blob/89e1d4f23c33bb11a0641e0589bcfdf3555c34ae/src/zarr/storage/common.py#L235), in ome-zarr-py instead of reproducing our own logic for store creation.
However, that method is async and I don't see an equivalent non-async method I can access. The from zarr.core.sync import sync method for converting async -> sync isn't publicly available.
Maybe I should create an issue for this on zarr-python?
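For context on the async-to-sync gap described here, one generic pattern that needs only the standard library is to run coroutines on a private event loop in a background thread. This is a sketch, not zarr's implementation: `SyncRunner` and `make_store_path_standin` are hypothetical names, and the stand-in coroutine merely imitates an async store factory.

```python
import asyncio
import threading


class SyncRunner:
    """Run coroutines to completion from synchronous code on a private loop."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        # Schedule on the background loop and block the caller until it finishes.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        # Stop the loop from its own thread, then wait for the thread to exit.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def make_store_path_standin(location):
    """Hypothetical stand-in for an async store factory; NOT zarr's function."""
    await asyncio.sleep(0)
    return {"location": location, "opened": True}


runner = SyncRunner()
store_path = runner.run(make_store_path_standin("reference://"))
runner.close()
```

A background loop is used instead of `asyncio.run()` because `asyncio.run()` raises an error when called from a thread that already has a running event loop, which is common in notebooks and GUI applications.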

@will-moore
Member

Alternatively, maybe ome-zarr-py should simply use "Implicit Store Creation" https://github.com/zarr-developers/zarr-python/blob/main/docs/guide/storage.rst#implicit-store-creation and always create a group when we do parse_url(). Do we need the ZarrLocation class with zarr-python v3?

@joshmoore
Member

> always create a group when we do parse_url()

Quite possibly.

> Do we need the ZarrLocation class with zarr-python v3?

Not necessarily.

> The from zarr.core.sync import sync method for converting async -> sync isn't publicly available.

Hmmm.... I was definitely using that in the challenge Python code.
