Improve speed and RAM consumption of buffered slice writer #937
Conversation
Did you mix up old <> new?

Oops, yes, I did. Corrected it now.
Looks good. I think a next iteration could read the input images in a chunked manner instead of having to read entire images at once.
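The chunked-reading idea suggested here could look roughly like the sketch below. `iter_row_chunks` and its parameters are hypothetical and only illustrate the access pattern, not any actual webknossos-libs API:

```python
import numpy as np

# Hypothetical sketch: instead of loading an entire image into memory,
# memory-map the raw file and yield fixed-size row blocks that can be
# handed to the writer one at a time.
def iter_row_chunks(path, shape, dtype, rows_per_chunk):
    img = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    for start in range(0, shape[0], rows_per_chunk):
        # Each yielded view covers at most rows_per_chunk rows;
        # no full-image copy is ever materialized.
        yield img[start:start + rows_per_chunk]
```

Because `np.memmap` returns views, peak memory stays bounded by the chunk size rather than the image size.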
```python
assert np.all(data == written_data)
```

```python
def test_buffered_slice_writer_should_warn_about_unaligned_usage(
```
So, in test_buffered_slice_writer_along_different_axis you aligned the offset, and here you are testing that unaligned usage warns?
Yes, exactly.
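For illustration, the alignment condition under discussion can be expressed generically. This helper is hypothetical, not part of the library; the actual warning text and trigger live in the BufferedSliceWriter itself:

```python
import warnings

def warn_if_unaligned(offset, chunk_shape):
    # An offset is aligned if every component is a multiple of the
    # corresponding chunk extent. Unaligned writes can force costly
    # read-modify-write cycles on whole chunks, hence the warning.
    unaligned = any(o % c != 0 for o, c in zip(offset, chunk_shape))
    if unaligned:
        warnings.warn(
            "Unaligned offset may trigger read-modify-write cycles",
            UserWarning,
        )
    return not unaligned
```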
I'm not sure if I understand correctly. The BufferedSliceWriter isn't responsible for reading images. Do you mean its interface should support sending chunks to it so that users of the writer can read images in chunks?
I know this isn't the responsibility of the BufferedSliceWriter. But when looking holistically at …
Ah ok, sorry for the misunderstanding. Sounds good 👍
Description:
np.pad is not necessary anymore in the BufferedSliceWriter, because a sufficiently sized buffer is created first, into which the sections are written.
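The buffer-first idea can be sketched as follows. This is illustrative only, not the actual BufferedSliceWriter implementation; the 2D sections and z-stacking order are assumptions:

```python
import numpy as np

def stack_sections(sections, buffer_shape, dtype=np.uint8):
    # Allocate one sufficiently sized, zero-initialized buffer up front ...
    buf = np.zeros(buffer_shape, dtype=dtype)
    for z, sec in enumerate(sections):
        h, w = sec.shape
        # ... and copy each (possibly smaller) section into place.
        # The zero-initialized buffer makes explicit np.pad calls
        # and per-section temporary allocations unnecessary.
        buf[:h, :w, z] = sec
    return buf
```

Padding happens implicitly: any region a section does not cover simply keeps its zero fill.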
As a performance benchmark I created a WKW dataset with chunk_shape=(32, 32, 32) and chunks_per_shard=(32, 32, 32). Then, I wrote random data with shape (4096, 4096, 32) to it (multiple times).
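When measuring RAM for a benchmark like this, sampling inside the process (rather than with an external tool that also counts the surrounding test environment) gives a tighter number. A minimal sketch using only the standard library (Unix-only, since the resource module is unavailable on Windows):

```python
import resource
import sys

def peak_rss_mib():
    # Peak resident set size of this process so far.
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024
    return peak / 1024.0
```

Calling `peak_rss_mib()` right after the write loop isolates the writer's allocations from pytest's own footprint.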
So, the new approach only uses ~70% of the RAM for this benchmark. It should be close to 50%, but I measured the RAM with `time`, which also measures the overall RAM consumption of the entire Python script and pytest environment in which I executed the benchmark.

Issues:
Todos:
Make sure to delete unnecessary points or to check all before merging: