Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some improvements to OME-TIFF write performance #4242

Merged
merged 3 commits into from
Oct 21, 2024

Conversation

melissalinkert
Copy link
Member

88b846d is primarily meant to address #4204, while 0b42e1b is expected to provide some help for #4204, #3983, and #3480.

If the use of ByteArrayOutputStream in 0b42e1b is too concerning, I can rework that change to just reduce the seeks in that were removed at line 1181 - that alone would give some measurable benefit.

Using the same test as #4204 (comment), I now see:

$ for i in `cat timepoint-counts.txt` ; do java SmallPyramidWriter $i.ome.tiff $i; done
Populating metadata...
Writing image to '1.ome.tiff'...
init: 32 ms
image writing: 347 ms
close: 214 ms
Done.
Populating metadata...
Writing image to '8.ome.tiff'...
init: 36 ms
image writing: 699 ms
close: 250 ms
Done.
Populating metadata...
Writing image to '64.ome.tiff'...
init: 31 ms
image writing: 3409 ms
close: 955 ms
Done.
Populating metadata...
Writing image to '128.ome.tiff'...
init: 35 ms
image writing: 7984 ms
close: 1009 ms
Done.
Populating metadata...
Writing image to '512.ome.tiff'...
init: 41 ms
image writing: 68755 ms
close: 2649 ms
Done.
Populating metadata...
Writing image to '1024.ome.tiff'...
init: 26 ms
image writing: 257061 ms
close: 5343 ms
Done.
Populating metadata...
Writing image to '2048.ome.tiff'...
Switching to BigTIFF (by file size)
init: 41 ms
image writing: 1002326 ms
close: 10170 ms
Done.

Comparing bfconvert performance on gh-3983&sizeX=512&sizeY=1024&pixelType=uint16&series=27&sizeT=55.fake (which matches the dimensions in #3983), I see total conversion times:

Local disk Network disk
develop 58s 351s
this PR 22s 91s

To check that this did not impact working with slides, I also tried bfconvert -noflat CMU-1.svs CMU-1.ome.tiff with and without this change. As expected, I did not see any change in conversion performance; both output OME-TIFF files behaved identically when opened in QuPath.

All together, that looks like it might be enough of an improvement to justify closing #4204 / #3983 / #3480, but is only one set of tests - confirmation with other test data and on other systems would definitely be appreciated. In particular, if any of the original issue reporters (@NicoKiaru / @anntzer / @ebremer / @tischi) have time and interest to try this, comments would be welcome.

@melissalinkert melissalinkert added this to the 8.0.0 milestone Sep 22, 2024
@NicoKiaru
Copy link
Contributor

Thanks @melissalinkert ! I'll give it a shot tomorrow.

@anntzer
Copy link

anntzer commented Sep 23, 2024

From a very quick test (not exactly the same dataset as #3983, but similar: 5 series x 141 planes, written to the same network drive or locally), comparing with 6.13.0, this PR is actually 1) worse when writing to a local fast storage (tmpfs): 27s vs. 16s and 2) still as bad when writing to the network drive: ~300s.

@NicoKiaru
Copy link
Contributor

This fixes #4204!

@ebremer
Copy link

ebremer commented Sep 30, 2024

@melissalinkert looking forward to testing this! Now, can you coax some more speed out of the readers? :-)

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performed some initial testing on the pyramidal OME-TIFF writing. Using a real-world example from IDR and the AstroP65.ome.tiff file which has 1402 Z-sections and 3 channels i.e. over 1.2K planes.

Using Bio-Formats 7.3.1, the default bfconvert -noflat command completed in 544m11.512s. With this PR included and three consecutive executions, the same command completed in ~48min. So a massive performance improvement due when writing the IFDs which matches the initial investigation and the testing in the original issue.

From a code perspective, the change consumes the new API introduced in #3410 allowing to use directly the offset of the IFD rather than recomputing it from its index. One comment about dropping the unused currentFullResolution variable.

I'll look into the second change next week.

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the buffered writing to a network storage, I reproduced the measurements using a synthetic file test&sizeX=512&sizeY=1024&pixelType=uint16&sizeT=55&series=27.fake and converting it using the default options on the local filesystem (/tmp) or on a NFS mount.

Conversions were repeated 3 consecutive times for each scenario with the following results

Bio-Formats Local storage (/tmp) NFS mount
7.3.1 27.1s +/- 0.3s 1880s +/ 69s
8.0.0-SNAPSHOT 12.94s +/- 0.03s 510s +/- 2.5s

@sbesson
Copy link
Member

sbesson commented Oct 7, 2024

As discussed earlier at the weekly @ome/formats meeting, the testing above indicates this PR improves the writing behavior for the two original issues. In preparation for inclusion in the 8.0.0 release, my plan is to run a few additional checks in the upcoming days and cover other typical TIFF writing scenarios including:

  • writing default TIFF files
  • tiled TIFF writing
  • tile sizes & compression
  • writing single OME-TIFF vs multiple OME-TIFF vs OME-TIFF with companion

and confirm these changes create no regression.

@sbesson
Copy link
Member

sbesson commented Oct 8, 2024

Tested an HC conversion using a synthetic fake file representative of a typical plate plate&sizeC=4&plateRows=12&plateCols=8&fields=9.fake. The data was converted using TIFF strips and zlib/Deflate compression into three outputs:

  • a single TIFF file
  • an OME-TIFF dataset with one file per series / field of view
  • an OME-TIFF dataset with one file per series / field of view and a companion OME file

For each scenario, the conversion was reproduced three times

Bio-Formats single TIFF OME-TIFF (1 file/series) OME-TIFF (1 file/series, companion)
7.3.1 240.3s +/- 4.2s 3579s +/ 342s 3670s +/ 74s
8.0.0-SNAPSHOT 230.9s +/- 7.6s 3482s +/- 54s 3695s +/- 124s

@sbesson
Copy link
Member

sbesson commented Oct 8, 2024

Tested the WSI scenario using a synthetic fake file "slide&sizeX=40000&sizeY=40000&sizeC=3&rgb=3&resolutions=5.fake". The data was converted using tiled TIFF with JPEG compression using:

  • plain TIFF
  • OME-TIFF with 1024x1024 tiles
  • OME-TIFF with 512x512 tiles
Bio-Formats TIFF OME-TIFF (1024x104) OME-TIFF (512x512)
7.3.1 181.0 +/- 1.5s 184.5 +/ 4.0s 301.2s +/ 62.3s
8.0.0-SNAPSHOT 187.1 +/- 8.9s 177.0s +/- 7.7s 297.1s +/- 47.0s

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the different rounds of testing in a different conditions (TIFF vs OME-TIFF, tiles vs strips, multi-resolution, multi-series, multi-file), I have identified no scenario where the changes proposed in this PR lead to an obvious performance degradation.

Additionally, I was able to confirm the changes improve the writing performance in the two scenarios described in the associated issues i.e.:

  • multi-resolution OME-TIFF with a large number of planes
  • TIFF writing (using strips) on a shared network

The resolution of the first issue has also been confirmed independently by the thorough testing of @NicoKiaru in #4204 (comment).

I'll give 24h for a last round of comments to this PR and then proposing to merge for inclusion in the upcoming Bio-Formats 8 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants