Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed and RAM consumption of buffered slice writer #937

Merged
merged 13 commits into from
Aug 15, 2023

Conversation

philippotto
Copy link
Member

@philippotto philippotto commented Aug 10, 2023

Description:

  • Improves RAM consumption by avoiding to concate all data sections at once. Instead, the given sections are traversed in chunks (using the shard size) and each chunk is filled so that it can be written to disk.
  • As a byproduct, the speed also increased a bit. However, I'm not too sure whether this is a general win. Either way, np.pad is not necessary anymore, because a sufficiently sized buffer is created first into which the sections are written.
  • Adds more tests for BufferedSliceWriter
  • Adds warnings when the BufferedSliceWriter is used without aligning it to the dataset's chunk size.

As a performance benchmark I created a WKW dataset with chunk_shape=(32, 32, 32) and chunks_per_shard=(32, 32, 32). Then, I wrote random data with shape (4096, 4096, 32) to it (multiple times).

Runs Avg. Time [s] Avg. RAM [MB]
New 4.76 1188
Old 5.94 1713

So, the new approach only uses ~70% of the RAM for this benchmark. It should be close to 50%, but I measured the RAM with time which also measures the overall ram consumption of the entire python script and pytest environment in which I executed the benchmark.

Issues:

Todos:

Make sure to delete unnecessary points or to check all before merging:

  • Updated Changelog
  • Updated Documentation
  • Added / Updated Tests
  • Considered adding this to the Examples

@philippotto philippotto self-assigned this Aug 10, 2023
@philippotto philippotto changed the title [WIP] Improve speed and RAM consumption of buffered slice writer Improve speed and RAM consumption of buffered slice writer Aug 11, 2023
@normanrz
Copy link
Member

Runs Avg. Time [s] Avg. RAM [MB]
Old 4.76 1188
New 5.94 1713

Did you mix up old <> new?

@philippotto
Copy link
Member Author

Runs Avg. Time [s] Avg. RAM [MB]
Old 4.76 1188
New 5.94 1713

Did you mix up old <> new?

Oops, yes, I did. Corrected it now.

Copy link
Member

@normanrz normanrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think the next iteration could include to read the input images in a chunked manner instead of having to read entire images at once.

assert np.all(data == written_data)


def test_buffered_slice_writer_should_warn_about_unaligned_usage(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in test_buffered_slice_writer_along_different_axis you aligned the offset and here are testing that unaligned screams?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

@philippotto
Copy link
Member Author

Looks good. I think the next iteration could include to read the input images in a chunked manner instead of having to read entire images at once.

I'm not sure if I understand correctly. The BufferedSliceWriter isn't responsible for reading images. Do you mean its interface should support sending chunks to it so that users of the Writer can read images in chunks?

@normanrz
Copy link
Member

Looks good. I think the next iteration could include to read the input images in a chunked manner instead of having to read entire images at once.

I'm not sure if I understand correctly. The BufferedSliceWriter isn't responsible for reading images. Do you mean its interface should support sending chunks to it so that users of the Writer can read images in chunks?

I know this isn't the responsibility of the BufferedSliceWriter. But when looking holistically at Dataset.from_images (and similar APIs), that would be my next angle of attack for optimization.

@philippotto
Copy link
Member Author

Ah ok, sorry for the misunderstanding. Sounds good 👍

@philippotto philippotto enabled auto-merge (squash) August 15, 2023 07:35
@philippotto philippotto merged commit 4be5dba into master Aug 15, 2023
18 checks passed
@philippotto philippotto deleted the improve-buffered-slice-writer branch August 15, 2023 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants