Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parallel buffered slice writer bug #922

Merged
merged 20 commits into from
Aug 14, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
9204b09
Extend traceback to get exception of future.
markbader Jun 27, 2023
2bb3883
Implement pad flag for webknossos cli convert subcommand.
markbader Jun 29, 2023
54d3477
Implement hotfix solution for fitting bbox without pad flag.
markbader Jul 10, 2023
3fe785e
Merge branch 'master' into Fix_parallel_BufferedSliceWriter_bug
markbader Jul 10, 2023
7c6f46d
Update changelog.
markbader Jul 10, 2023
44dd88c
Merge branch 'master' into Fix_parallel_BufferedSliceWriter_bug
markbader Jul 13, 2023
52cc6b6
Add update_bbox argument and propagate it from from_images to actual …
markbader Jul 17, 2023
5fe1387
Merge branch 'master' into Fix_parallel_BufferedSliceWriter_bug
markbader Jul 17, 2023
1863194
Implement requested changes.
markbader Jul 17, 2023
3bbcd06
Minor changes to default values.
markbader Jul 17, 2023
a6366c1
Run formatter.
markbader Jul 17, 2023
f230aed
Merge branch 'master' into Fix_parallel_BufferedSliceWriter_bug
markbader Jul 20, 2023
e3336f7
Add filelock dependency and start to change to SoftFileLock implement…
markbader Jul 20, 2023
1f4f773
Implement filelock (upaths are not supported yet).
markbader Jul 20, 2023
9122256
Adapted implementation of SoftFileLock. Still does not support filelo…
markbader Jul 24, 2023
cf643d1
Reverted changes with filelock and make some notes for further implem…
markbader Jul 26, 2023
b21b997
Add json_update_allowed bool.
markbader Aug 1, 2023
54acf99
Merge branch 'master' into Fix_parallel_BufferedSliceWriter_bug
markbader Aug 1, 2023
ea7dc34
Add paragraph in docs and implement requested changes.
markbader Aug 14, 2023
3256aa0
Merge branch 'master' into Fix_parallel_BufferedSliceWriter_bug
markbader Aug 14, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions docs/src/webknossos-py/examples/dataset_usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,13 @@ which themselves can comprise multiple [magnifications represented via `MagView`
webknossos/examples/dataset_usage.py
--8<--
```

## Parallel Access of WEBKNOSSOS Datasets

Please consider these restrictions when accessing a WEBKNOSSOS dataset in a multiprocessing-context:

- When writing shards in parallel, `json_update_allowed` should be set to `False` to disable the automatic update of the bounding box metadata. Otherwise, race conditions may happen. The user is responsible for updating the bounding box manually.
- When writing to chunks in shards, one chunk may only be written to by one actor at any time.
- When writing to compressed shards, one shard may only be written to by one actor at any time.
- For Zarr datasets, parallel write access to shards is not allowed at all.
- Reading in parallel without concurrent writes is fine.
1 change: 1 addition & 0 deletions webknossos/Changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ For upgrade instructions, please check the respective _Breaking Changes_ section
### Changed

### Fixed
- Fixed a bug where parallel access to the properties json leads to an JsonDecodeError in the webknossos CLI [#919](https://github.com/scalableminds/webknossos-libs/issues/919)


## [0.13.4](https://github.com/scalableminds/webknossos-libs/releases/tag/v0.13.4) - 2023-08-14
Expand Down
2 changes: 1 addition & 1 deletion webknossos/webknossos/cli/convert.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ def main(
source,
target,
voxel_size,
name=name,
name,
data_format=data_format,
executor=executor,
compress=compress,
Expand Down
2 changes: 1 addition & 1 deletion webknossos/webknossos/cli/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
upsample,
)

app = typer.Typer(no_args_is_help=True)
app = typer.Typer(no_args_is_help=True, pretty_exceptions_short=False)

app.command("check-equality")(check_equality.main)
app.command("compress")(compress.main)
Expand Down
5 changes: 5 additions & 0 deletions webknossos/webknossos/dataset/_utils/buffered_slice_writer.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,9 @@ def __init__(
self,
view: "View",
offset: Optional[Vec3IntLike] = None,
# json_update_allowed enables the update of the bounding box and rewriting of the properties json.
# It should be False when parallel access is intended.
json_update_allowed: bool = True,
# buffer_size specifies, how many slices should be aggregated until they are flushed.
buffer_size: int = 32,
dimension: int = 2, # z
Expand All @@ -48,6 +51,7 @@ def __init__(
self.buffer_size = buffer_size
self.dtype = self.view.get_dtype()
self.use_logging = use_logging
self.json_update_allowed = json_update_allowed
if offset is None and relative_offset is None and absolute_offset is None:
relative_offset = Vec3Int.zeros()
if offset is not None:
Expand Down Expand Up @@ -129,6 +133,7 @@ def _write_buffer(self) -> None:
offset=buffer_start.add_or_none(self.offset),
relative_offset=buffer_start_mag1.add_or_none(self.relative_offset),
absolute_offset=buffer_start_mag1.add_or_none(self.absolute_offset),
json_update_allowed=self.json_update_allowed,
)

except Exception as exc:
Expand Down
5 changes: 5 additions & 0 deletions webknossos/webknossos/dataset/_utils/pims_images.py
Original file line number Diff line number Diff line change
Expand Up @@ -481,6 +481,8 @@ def copy_to_view(
) -> Tuple[Tuple[int, int], Optional[int]]:
"""Copies the images according to the passed arguments to the given mag_view.
args is expected to be the start and end of the z-range, meant for usage with an executor.
copy_to_view returns an iterable of image shapes and largest segment ids. When using this
method a manual update of the bounding box and the largest segment id might be necessary.
"""
z_start, z_end = args
shapes = []
Expand All @@ -496,6 +498,9 @@ def copy_to_view(
with mag_view.get_buffered_slice_writer(
relative_offset=(0, 0, z_start * mag_view.mag.z),
buffer_size=mag_view.info.chunk_shape.z,
# copy_to_view is typically used in a multiprocessing-context. Therefore the
# buffered slice writer should not update the json file to avoid race conditions.
json_update_allowed=False,
markbader marked this conversation as resolved.
Show resolved Hide resolved
) as writer:
for image_slice in images[z_start:z_end]:
image_slice = np.array(image_slice)
Expand Down
2 changes: 1 addition & 1 deletion webknossos/webknossos/dataset/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -1249,7 +1249,7 @@ def add_layer_from_images(
if pims_images.expected_shape != actual_size:
warnings.warn(
"[WARNING] Some images are larger than expected, smaller slices are padded with zeros now. "
+ f"New size is {actual_size}, expected {pims_images.expected_shape}.",
+ f"New size is {actual_size}, expected {pims_images.expected_shape}."
)
if first_layer is None:
first_layer = layer
Expand Down
9 changes: 7 additions & 2 deletions webknossos/webknossos/dataset/mag_view.py
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@ def write(
self,
data: np.ndarray,
offset: Optional[Vec3IntLike] = None, # deprecated, relative, in current mag
json_update_allowed: bool = True,
*,
relative_offset: Optional[Vec3IntLike] = None, # in mag1
absolute_offset: Optional[Vec3IntLike] = None, # in mag1
Expand Down Expand Up @@ -177,10 +178,14 @@ def write(

# Only update the layer's bbox if we are actually larger
# than the mag-aligned, rounded up bbox (self.bounding_box):
if not self.bounding_box.contains_bbox(mag1_bbox):
if json_update_allowed and not self.bounding_box.contains_bbox(mag1_bbox):
self.layer.bounding_box = self.layer.bounding_box.extended_by(mag1_bbox)

super().write(data, absolute_offset=mag1_bbox.topleft)
super().write(
data,
absolute_offset=mag1_bbox.topleft,
json_update_allowed=json_update_allowed,
)

def read(
self,
Expand Down
12 changes: 9 additions & 3 deletions webknossos/webknossos/dataset/view.py
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,7 @@ def write(
self,
data: np.ndarray,
offset: Optional[Vec3IntLike] = None, # deprecated, relative, in current mag
json_update_allowed: bool = True,
*,
relative_offset: Optional[Vec3IntLike] = None, # in mag1
absolute_offset: Optional[Vec3IntLike] = None, # in mag1
Expand Down Expand Up @@ -264,9 +265,10 @@ def write(
abs_mag1_offset=absolute_offset,
current_mag_size=Vec3Int(data.shape[-3:]),
)
assert self.bounding_box.contains_bbox(
mag1_bbox
), f"The bounding box to write {mag1_bbox} is larger than the view's bounding box {self.bounding_box}"
if json_update_allowed:
assert self.bounding_box.contains_bbox(
mag1_bbox
), f"The bounding box to write {mag1_bbox} is larger than the view's bounding box {self.bounding_box}"

if len(data.shape) == 4 and data.shape[0] == 1:
data = data[0] # remove channel dimension for single-channel data
Expand Down Expand Up @@ -654,6 +656,9 @@ def get_buffered_slice_writer(
offset: Optional[Vec3IntLike] = None,
buffer_size: int = 32,
dimension: int = 2, # z
# json_update_allowed enables the update of the bounding box and rewriting of the properties json.
# It should be False when parallel access is intended.
json_update_allowed: bool = True,
*,
relative_offset: Optional[Vec3IntLike] = None, # in mag1
absolute_offset: Optional[Vec3IntLike] = None, # in mag1
Expand Down Expand Up @@ -695,6 +700,7 @@ def get_buffered_slice_writer(
return BufferedSliceWriter(
view=self,
offset=offset,
json_update_allowed=json_update_allowed,
buffer_size=buffer_size,
dimension=dimension,
relative_offset=relative_offset,
Expand Down
10 changes: 5 additions & 5 deletions webknossos/webknossos/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -309,19 +309,19 @@ def __len__(self) -> int:

class NDArrayLike(Protocol):
def __getitem__(self, selection: Tuple[slice, ...]) -> np.ndarray:
pass
...

def __setitem__(self, selection: Tuple[slice, ...], value: np.ndarray) -> None:
pass
...

@property
def shape(self) -> Tuple[int, ...]:
pass
...

@property
def ndim(self) -> int:
pass
...

@property
def dtype(self) -> np.dtype:
pass
...