OME Tiff Information: Do we want to keep it? Also H5 Writer defaults balloon file size, see last comment #9
A few things discovered yesterday/yesterday evening:
EDIT: Link to discussion on Image.sc describing speeds of using H5/zarr, for posterity.
Tiff stacks will not be generated; instead we will be writing to either H5 or zarr. It would be best to retain the OME information if possible, but how best to do that is not yet clear.
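One possible way to carry the OME information along, sketched here as an assumption rather than a settled plan (the names, shapes, and attribute key are hypothetical), is to stash the XML string as an attribute next to the pixel data:

```python
import zarr

ome_xml = "<OME>...</OME>"  # placeholder: the XML pulled from the master OME.tif

root = zarr.open("session.zarr", mode="w")
ch1 = root.create_dataset(
    "Ch1", shape=(600, 512, 512), chunks=(600, 512, 512), dtype="uint16"
)
ch1.attrs["ome_xml"] = ome_xml  # keep the metadata right next to the pixels
```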
For posterity, including the Image.sc post describing this fast stuff, and also noting that MATLAB is ballooning file sizes 4x, likely due to its storage of datasets as double (float64) rather than the native uint16. You can see this if you observe an h5 file that was generated by MATLAB's h5 writer with default settings (I could never seem to figure out how to change the datatype it writes to...) in Python using something like this:
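A minimal sketch, assuming an h5py-based check (the file and dataset names are hypothetical):

```python
import h5py

# Hypothetical file/dataset names; the dataset path depends on how
# h5create/h5write were called on the MATLAB side.
with h5py.File("recording.h5", "r") as f:
    dset = f["/data"]
    # MATLAB writes 'double' (float64) unless told otherwise, which would
    # explain a 4x size increase over uint16 source images.
    print(dset.dtype)
    print(dset.shape, dset.chunks)
```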
@jmdelahanty did you also work with multichannel ome tiff datasets?
I received a dual color example recently, and I was expecting to see an indication of the presence of two channels in the XML file, but I couldn't find one as I iterated over the elements.
I spent a little time today looking for the source of this line:
But I haven't been able to locate it. In a couple of minutes I'll take a super small recording with 2 channels and can share what I find here. Here's an example .xml file from a fake recording I took today (at the bottom). Parsing it properly is surprisingly difficult for me! The TL;DR is that it appears the very first tif still has both channels encoded in it. Here's what I've done so far:
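Roughly along these lines (a sketch; the file name is hypothetical):

```python
import tifffile

# The first tiff the ripper writes out carries the OME-XML in its tags.
with tifffile.TiffFile("TSeries_Cycle00001_Ch1_000001.ome.tif") as tif:
    ome_xml = tif.ome_metadata  # the embedded XML as one string
    with open("master_ome.xml", "w") as f:
        f.write(ome_xml)
```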
Is this helpful? To be clear, this is the first tiff of the recording, which has tags inside it formatted as an XML string. I couldn't figure out how to access that directly, so I dumped it to an xml file that I've shared here.
Thank you @jmdelahanty for looking into this! This is how I was trying to parse it, but in our dual color example "SizeC" is 1, and I expected it to be 2.
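Something like this sketch (the OME namespace version and the file name are assumptions that should be matched against the actual file):

```python
import xml.etree.ElementTree as ET

# Hypothetical file: the XML dumped from the first .ome.tif above.
root = ET.parse("master_ome.xml").getroot()
# OME elements live in a versioned namespace; check the file header for yours.
ns = "{http://www.openmicroscopy.org/Schemas/OME/2016-06}"
for image in root.iter(ns + "Image"):
    pixels = image.find(ns + "Pixels")
    print(image.get("ID"), "SizeC =", pixels.get("SizeC"))
```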
Now I wonder how the XML file comes into the picture; so far the Bruker TIFF format that we've seen with @CodyCBakerPhD is the numerous single-page .ome.tif files.
Thanks for the snippet looking into this! I've been struggling to properly access that information, and it looks like that is indeed how to do it better. It did! All Bruker recordings have an associated master .xml file, as well as a .env file. Here's that xml file as a .txt: And here's the env file as a .txt:
Thank you @jmdelahanty, this is incredibly useful! I think we will keep the format as close to the "raw" output as possible, but it's definitely good to know.
@jmdelahanty I was also wondering, based on your experience with Bruker: is it possible to define disjoint z planes (defining a gap between the z planes), or do they always correspond to a volume?
In our lab's use of one of our Bruker scopes thus far, we image one plane at a time and do a set of trials at each plane. Our new Bruker system, the 2P+, has an electrically tunable lens. Deryn (@doleduke), one of our lab's graduate students, is going to be using it in the future! I'll also tag one of Bruker's helpful service engineers Kevin Mann (@mannk2) who would know. I don't know how often they check their GitHub notifications, but I can ask Deryn if she knows yet and also email Kevin from Bruker about it.
Hey, thanks so much, could you add my work email too?
Done!
@jmdelahanty Could you confirm for us that with version >= 5.8 we can expect multipage tiffs only, or is it still possible for the user to write individual files? I'm just thinking about whether we can trust the version number to infer whether we have multipage tiffs, or whether we should just read the first .ome.tif and check the number of pages within.
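The page-count check could look something like this (a sketch; the file name is hypothetical):

```python
import tifffile

# Count pages in the first .ome.tif to distinguish single- vs multi-page output.
with tifffile.TiffFile("TSeries_Cycle00001_Ch1_000001.ome.tif") as tif:
    n_pages = len(tif.pages)
print("multipage" if n_pages > 1 else "single-page", f"({n_pages} pages)")
```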
Hello! I literally just tested this a couple hours ago through this repo/docker! The default "ripper" still produces individual ome.tif files. There's a second executable that can take those tiffs and place them into multi-page tiffs. I would imagine most users probably don't even know that multi-page tif writing is possible with the newer updates, since I don't know how closely people read the changelogs from Bruker for their systems. Carsen Stringer, the author of Suite2p (a super common processing pipeline), suggests you not use single page tifs unless you have to, as in the Bruker case (see here). At some point in the future, Bruker is planning on doing online, multi-page tif writing by default unless a legacy machine can't keep up with that. So eventually you'll probably have to handle those outputs. I could link my docs describing that message from their lead dev Michael Fox.
Bruker outputs OME.tiff files after the ripper has completed, but only the first image that's written is the `OME.tif` master file. This contains any additional channels that are recorded at the time. So if two channels were recorded at once, the OME.tifs that are generated will ONLY have the master file generated for `Ch1`, which contains all the data for any subsequent channels recorded during a session.

I'm asking questions in a gitter chatroom for zarr developers, who have been helpful so far, but I haven't heard back about a couple other new things yet.
I'm currently able to parse the OME XML properly, but am still learning about the different image tag numbers that are used. There doesn't appear to be a clear explanation for what these different tags are or what other tag information is available in the OME-tiff master file. Here's what I've found out so far about the structure: as of 12/13/21, a method for parsing an OME-tiff master file structured like this one that will preserve the metadata of the tiffs for chunked HDF5/zarr files has not been found.
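As a starting point for seeing which tag numbers are present at all, something like this sketch works (file name hypothetical):

```python
import tifffile

# List every TIFF tag on the first page of the master file.
with tifffile.TiffFile("TSeries_Cycle00001_Ch1_000001.ome.tif") as tif:
    for tag in tif.pages[0].tags.values():
        print(tag.code, tag.name)
```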
Good news is that it appears Deryn's MATLAB code using the `h5write` function does know how to create chunks of data from the many tiffs! It will take some additional testing to make sure it's doing what we expect, but MATLAB may have solved this issue for us, thankfully. Another piece of good news is that we can likely run multiple copies of MATLAB at once, because it looks like each conversion to H5 only uses 1 CPU (probably because it's I/O dependent).

Unfortunately, it looks like MATLAB's `h5write` doesn't seem to know how to keep the OME metadata. It doesn't seem like not having the metadata gets in the way of any processing or sharing, and I'm currently unaware of whether NWB uses OME data or can store it somehow. I've opened a discussion here to find out more.

However the tiffs are processed, each channel should have its own chunked dataset with chunks of around 300MB, which consist of about 600 images each. This will make it so processing can be done on smaller chunks on machines without using so much memory for single planes/recordings and, especially important, make it easy to display data on people's desktops in the office. If one day multiple channels are recorded consistently (I've seen up to 6 at once in some forum posts!) and we do decide to keep the OME information for a given session, we'll probably want to get that data out of the xml and build chunked datasets automatically.
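In h5py terms, the proposed layout would look roughly like this (shapes are hypothetical; ~600 uint16 frames at 512x512 comes out near 300MB per chunk):

```python
import h5py
import numpy as np

n_frames, height, width = 60000, 512, 512  # hypothetical recording size
with h5py.File("session.h5", "w") as f:
    f.create_dataset(
        "Ch1",
        shape=(n_frames, height, width),
        dtype=np.uint16,              # keep the native dtype (no float64 ballooning)
        chunks=(600, height, width),  # ~600 images ≈ 300MB per chunk
    )
```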
I've also commented on a Suite2p issue here to see if the developers/users believe that having chunked datasets makes sense/would be helpful. Other people have told me that it makes sense to do so, but the user in that issue is somehow recording over 1 million images in a recording, which is just bonkers. I don't know how in the world they're processing that... Regardless, Carsen Stringer, the lead developer from Janelia, basically said that making stacks is annoying (which it certainly is turning out to be...) and so maybe doing this is unnecessary. Will update this upon hearing answers...
For now, here's what I propose be done in the near future:
- dask (see the sketch below)
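A sketch of how dask could lazily stack the numerous single-page tiffs into the chunked layout above (the glob pattern is hypothetical):

```python
import glob

import dask
import dask.array as da
import tifffile

# One lazy read per single-page tiff; nothing touches disk until compute time.
files = sorted(glob.glob("TSeries-*_Ch1_*.ome.tif"))  # hypothetical pattern
sample = tifffile.imread(files[0])

lazy_frames = [
    da.from_delayed(
        dask.delayed(tifffile.imread)(f), shape=sample.shape, dtype=sample.dtype
    )
    for f in files
]
stack = da.stack(lazy_frames).rechunk((600, -1, -1))  # ~300MB chunks, as above
```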