make largest_segment_id optional #925
Conversation
Most of this issue was already solved in #786. This PR removes an assertion that prevented the creation of a segmentation layer without a largest segment id, and adapts the test to check the expected behavior. I assumed that a segmentation layer that contains no data has a `largest_segment_id` of `None`.
@normanrz do you have an opinion concerning None for newly created empty segmentation layers? I guess zero could also be a valid value in that case, but if the data write functions don’t update it, it may make more sense to leave it at None so that it is more explicitly unset?
Other than this question, the changes look good to me.
I'd leave it at `None`.
Please also add a method to determine and save the max_id.
You can probably take that (or inspiration) from https://github.com/scalableminds/voxelytics/blob/master/voxelytics/worker/jobs/find_largest_segment_id.py
I added a function:

```python
def _find_largest_segment_id(self):
    return self.get_finest_mag().read().flatten().max()
```

Do you have thoughts on the current implementation? @normanrz @fm3
I agree that parallelization is good to have for that :) Maybe you could make use of … I think it would be good to avoid …
Definitely needs parallelization. These segmentations can be 100s of TVx. Perhaps we should add a …
I refactored the … Is it reasonable to assume that …?
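For illustration, a chunk-wise parallel maximum along the lines discussed above might look like the sketch below. The function name and the chunk iterable are hypothetical, not part of webknossos-libs; each chunk stands for one separately read piece of the finest mag, so the whole layer never has to fit in memory at once.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def largest_segment_id_of_chunks(chunks):
    """Compute the maximum segment id over an iterable of chunk arrays.

    `chunks` is any iterable of numpy integer arrays, e.g. produced by
    reading one bounding-box chunk of the finest mag at a time.
    """
    with ThreadPoolExecutor() as pool:
        # Reduce each chunk to its local maximum in parallel ...
        per_chunk_max = list(pool.map(np.max, chunks))
    # ... then reduce the per-chunk maxima to the global maximum.
    return int(max(per_chunk_max)) if per_chunk_max else 0
```

A real implementation would stream chunks from disk instead of holding them in a list, but the two-stage reduction (per-chunk max, then max of maxima) stays the same.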
Cool stuff, looks good to me!
I’m not certain about auto-updating the largest_segment_id. It would be a nice feature, but it may be expensive, since we have to call max on all written data. Also, when opening an existing dataset, we can never be certain that the largest_segment_id in its metadata is actually correct, so we might not want to give the impression that we can guarantee it being correct and up to date.
Do you have opinions about this @philippotto @normanrz ?
I'm leaning towards auto-updating the LSI, since it simplifies data handling for the user. However, it shouldn't add an additional read-pass across all written data chunks (instead, during each write, it should compute the maximum of the input data which is already in-memory).
If a dataset already has an LSI, that value should be trusted in my opinion (if existing metadata can't be trusted, the user is responsible for fixing it). If the LSI is None (and the bbox is larger than 0), an automatic update should not happen, since partial writes wouldn't be able to determine the new LSI properly. Does this make sense?
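The per-write update described above could look roughly like this. The tracker class and method names are invented for the example and are not the actual webknossos-libs code; the point is that the maximum is taken over the in-memory write buffer, with no extra read pass, and that a `None` LSI is left untouched.

```python
import numpy as np

class SegmentationWriteTracker:
    """Sketch: keep largest_segment_id current from in-memory write buffers."""

    def __init__(self, largest_segment_id):
        self.largest_segment_id = largest_segment_id  # may be None

    def on_write(self, data: np.ndarray):
        # If the LSI is None, skip the auto-update: a partial write
        # cannot recover the true largest id of the whole layer.
        if self.largest_segment_id is None:
            return
        # `data` is already in memory for the write, so max() is cheap
        # compared to an extra read pass over the stored chunks.
        written_max = int(data.max())
        if written_max > self.largest_segment_id:
            self.largest_segment_id = written_max
```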
Sounds fair, let's wait and see if Norman has an opinion as well :) Also note that this may need a file lock for the JSON during parallel writes.
I don't think an auto-update is necessary. It would bring the same concurrency issues as the bbox changes. In the future, Webknossos should have an "optimize" worker job for segmentations that, among other things, recomputes the largest_segment_id. That job could run directly on import or be triggered by the user.
Do you think an entry in the documentation or an example would be useful? Otherwise, since there is no longer a need for an auto-update mechanism, I think the PR is in a state where it can be merged.
Yes, I think some documentation on the largest_segment_id would be nice in the `write` method.
```
@@ -203,6 +204,25 @@ def write(
    This parameter used to be relative for `View` and absolute for `MagView`,
    and specified in the mag of the respective view.

    Writing data to a segmentation layer manually does not automatically update
    the largest_segment_id. To set the largest segment id properly run the
    `refresh_largest_segment_id` method on your layer.
```
Suggested change:

```diff
-the largest segment id properly run the `refresh_largest_segment_id` method on your layer.
+the largest segment id properly run the `refresh_largest_segment_id` method on your layer or set the `largest_segment_id` property manually.
```