-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some improvements to OME-TIFF write performance #4242
Some improvements to OME-TIFF write performance #4242
Conversation
Thanks @melissalinkert ! I'll give it a shot tomorrow. |
From a very quick test (not exactly the same dataset as #3983, but similar: 5 series x 141 planes, written to the same network drive or locally), comparing with 6.13.0, this PR is actually 1) worse when writing to a local fast storage (tmpfs): 27s vs. 16s and 2) still as bad when writing to the network drive: ~300s. |
This fixes #4204! |
@melissalinkert looking forward to testing this! Now, can you coax some more speed out of the readers? :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performed some initial testing on the pyramidal OME-TIFF writing. Using a real-world example from IDR and the AstroP65.ome.tiff file which has 1402 Z-sections and 3 channels i.e. over 1.2K planes.
Using Bio-Formats 7.3.1, the default bfconvert -noflat
command completed in 544m11.512s. With this PR included and three consecutive executions, the same command completed in ~48min. So a massive performance improvement due when writing the IFDs which matches the initial investigation and the testing in the original issue.
From a code perspective, the change consumes the new API introduced in #3410 allowing to use directly the offset of the IFD rather than recomputing it from its index. One comment about dropping the unused currentFullResolution
variable.
I'll look into the second change next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the buffered writing to a network storage, I reproduced the measurements using a synthetic file test&sizeX=512&sizeY=1024&pixelType=uint16&sizeT=55&series=27.fake
and converting it using the default options on the local filesystem (/tmp
) or on a NFS mount.
Conversions were repeated 3 consecutive times for each scenario with the following results
Bio-Formats | Local storage (/tmp) | NFS mount |
---|---|---|
7.3.1 | 27.1s +/- 0.3s | 1880s +/ 69s |
8.0.0-SNAPSHOT | 12.94s +/- 0.03s | 510s +/- 2.5s |
As discussed earlier at the weekly @ome/formats meeting, the testing above indicates this PR improves the writing behavior for the two original issues. In preparation for inclusion in the 8.0.0 release, my plan is to run a few additional checks in the upcoming days and cover other typical TIFF writing scenarios including:
and confirm these changes create no regression. |
Tested an HC conversion using a synthetic fake file representative of a typical plate
For each scenario, the conversion was reproduced three times
|
Tested the WSI scenario using a synthetic fake file "slide&sizeX=40000&sizeY=40000&sizeC=3&rgb=3&resolutions=5.fake". The data was converted using tiled TIFF with JPEG compression using:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the different rounds of testing in a different conditions (TIFF vs OME-TIFF, tiles vs strips, multi-resolution, multi-series, multi-file), I have identified no scenario where the changes proposed in this PR lead to an obvious performance degradation.
Additionally, I was able to confirm the changes improve the writing performance in the two scenarios described in the associated issues i.e.:
- multi-resolution OME-TIFF with a large number of planes
- TIFF writing (using strips) on a shared network
The resolution of the first issue has also been confirmed independently by the thorough testing of @NicoKiaru in #4204 (comment).
I'll give 24h for a last round of comments to this PR and then proposing to merge for inclusion in the upcoming Bio-Formats 8 release.
88b846d is primarily meant to address #4204, while 0b42e1b is expected to provide some help for #4204, #3983, and #3480.
If the use of
ByteArrayOutputStream
in 0b42e1b is too concerning, I can rework that change to just reduce the seeks in that were removed at line 1181 - that alone would give some measurable benefit.Using the same test as #4204 (comment), I now see:
Comparing
bfconvert
performance ongh-3983&sizeX=512&sizeY=1024&pixelType=uint16&series=27&sizeT=55.fake
(which matches the dimensions in #3983), I see total conversion times:To check that this did not impact working with slides, I also tried
bfconvert -noflat CMU-1.svs CMU-1.ome.tiff
with and without this change. As expected, I did not see any change in conversion performance; both output OME-TIFF files behaved identically when opened in QuPath.All together, that looks like it might be enough of an improvement to justify closing #4204 / #3983 / #3480, but is only one set of tests - confirmation with other test data and on other systems would definitely be appreciated. In particular, if any of the original issue reporters (@NicoKiaru / @anntzer / @ebremer / @tischi) have time and interest to try this, comments would be welcome.