Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to add remote mags and remote layer to an existing dataset #1187

Merged
merged 31 commits into from
Sep 26, 2024

Conversation

MichaelBuessemeyer
Copy link
Contributor

Description:

This PR adds option to add remote mags and remote layer to an existing dataset via dataset.add_remote_layer(layer / URL, layer_name and `layer.add_remote_mag(mag / URL).

When using add_remote_mag make sure that the layer to which the mag is added has matching dtype, category and so on.

Testing:

  • I wrote tests for this, but the tests working with the plain URLs somehow do not work. Maybe I need to configure / code the tests differently 🙈
    I used this script for local testing. Run a local wk to make this work. Else change the local host URLs to webknossos.org urls.
small_dataset = Dataset.open(<local dataset path>)
new_mag = MagView._ensure_mag_view(
    "http://localhost:9000/data/zarr/sample_organization/l4_sample/segmentation/1"
)
new_layer = small_dataset.add_layer(
    "test_remote_layer",
    "segmentation",
    data_format=new_mag.info.data_format,
    dtype_per_channel=new_mag.get_dtype(),
)
new_layer.add_remote_mag(new_mag)
new_mag = MagView._ensure_mag_view(
    "http://localhost:9000/data/zarr/sample_organization/l4_sample/segmentation/2-2-1/"
)
new_layer.add_remote_mag(new_mag)
new_layer.add_remote_mag(
    "http://localhost:9000/data/zarr/sample_organization/l4_sample/segmentation/4-4-1/"
)
small_dataset.add_remote_layer(
    "http://localhost:9000/data/zarr/sample_organization/l4_sample/segmentation",
    "another_segmentation",
)

Issues:

Todos:

Make sure to delete unnecessary points or to check all before merging:

  • Updated Changelog
  • Updated Documentation
  • Added / Updated Tests
  • Considered adding this to the Examples

@MichaelBuessemeyer
Copy link
Contributor Author

It seems to me like the libs already support adding new mags / layers to remote datasets. Therefore, maybe calling the new methods add_foreign_mag/layer might be better suited.

Comment on lines 593 to 597
full_mag_path = _find_mag_path(
self.dataset.path, self.name, mag.to_layer_name(), self.mags[mag].path
)
assert is_fs_path(full_mag_path), "Cannot delete remote mags."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This throws an error in the tests. It is really true, that wklibs support deleting remote mags? e.g. on s3?

@MichaelBuessemeyer
Copy link
Contributor Author

Therefore, maybe calling the new methods add_foreign_mag/layer might be better suited.

@daniel-wer I am now renaming the methods to *_foreign_* as this makes more sense as wklibs already supports handling remote datasets, but the new feature itself is having a layer / mag from another foreign dataset

Copy link
Contributor Author

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that should be it. I commented everything that I think might be dangerous changes / changes about which I am unsure whether they might be breaking some wklibs functionallity.

webknossos/webknossos/dataset/layer.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/layer.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/mag_view.py Show resolved Hide resolved
webknossos/webknossos/dataset/layer.py Show resolved Hide resolved
@daniel-wer
Copy link
Member

Should @markbader review this, maybe?

@MichaelBuessemeyer
Copy link
Contributor Author

Should @markbader review this, maybe?

Yeah maybe. to me it looks like a state for a first review :)

@MichaelBuessemeyer MichaelBuessemeyer marked this pull request as ready for review September 11, 2024 11:31
@daniel-wer
Copy link
Member

It seems to me like the libs already support adding new mags / layers to remote datasets. Therefore, maybe calling the new methods add_foreign_mag/layer might be better suited.

Can you clarify why you renamed this? As far as I understand, the libs support opening remote datasets and it's possible to add layers and mags to them. Do you think add_remote_mag could be misinterpreted to mean that this is the method to add a mag to a remote dataset? I would not expect this to be an issue since the method is not called add_mag_to_remote_.... Instead, I think it might be confusing users what "foreign" mags or layers are. Using the term "remote" would be more clear as that terminology is used in other parts of the API as well (i.e. remote datasets). What do you and @markbader think? :)

Copy link
Contributor Author

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @markbader for the feedback. It should now be applied

webknossos/webknossos/utils.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/dataset.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/layer.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/layer.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/layer.py Outdated Show resolved Hide resolved
@daniel-wer
Copy link
Member

@MichaelBuessemeyer What do you think about my last comment?

If possible, I'd like to get this merged and pulled into voxelytics in the next week, before I'm out of office :) Let me know if there are any open discussions or other blockers for you!

@MichaelBuessemeyer
Copy link
Contributor Author

To do so, I still find the term add_remote_layer/mag well aligned with the current mental model.

In this specific scenario, I agree with you. But I think the add remote /foreign layer / mag is more mighty than this. At least it has the potential add layers located on any server or accessible file system.

You want to enable another use case, namely adding a local layer to a remote dataset. I guess that's probably a little more difficult than the first use case, because the remote dataset does not have a representation on the local file system, but rather the layer is added in the webknossos database, right? Does that require to label the layer as remote or foreign? Sorry I'm not familiar with that code.

Yeah that's definitely more complicated. What happens is that wklibs updates the datasource-properties.json which include the path to each mag. Thus, the wk server & wklibs should be able to access layers from other "remote" datasets.

In any case, maybe we don't overcomplicate things and focus on the first use case or is there an issue for the second one as well?

sure 👍
Done

Copy link
Contributor

@markbader markbader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now. I just commented on the Datasets is_remote_dataset property, as I can't see the value of it yet.

webknossos/webknossos/dataset/dataset.py Outdated Show resolved Hide resolved
@daniel-wer
Copy link
Member

@MichaelBuessemeyer @markbader Thank you for your work on this!
I just saw that webknossos.utils includes an is_fs_path method. Could you clarify whether that's the exact opposite of the is_remote_path method (meaning the one returns true iff the other returns false and vice versa)? If so, one could implement one using the other to make that more clear, and it would also simplify/unify some imports in voxelytics if one of the two methods could be used exclusively.

@MichaelBuessemeyer
Copy link
Contributor Author

MichaelBuessemeyer commented Sep 25, 2024

@MichaelBuessemeyer @markbader Thank you for your work on this!
I just saw that webknossos.utils includes an is_fs_path method. Could you clarify whether that's the exact opposite of the is_remote_path method (meaning the one returns true iff the other returns false and vice versa)? If so, one could implement one using the other to make that more clear, and it would also simplify/unify some imports in voxelytics if one of the two methods could be used exclusively.

I'd say they are not strictly the opposite of each other as there are more implementations of upath which are not checked in either of the methods. See the list of official upath implementations from the gh repo

image

But I think we only use the few implementations that are checked in is_fs_path and is_remote_path. So I'll give it a try to define is_remote_path as the bool'sche negation of is_fs_path and lets see what the ci says. (Although the ci currently fails due to unknown reasons to me :/

@markbader Maybe you have an Idea why this is failing 🙈?

@MichaelBuessemeyer
Copy link
Contributor Author

But I think we only use the few implementations that are checked in is_fs_path and is_remote_path. So I'll give it a try to define is_remote_path as the bool'sche negation of is_fs_path and lets see what the ci says. (Although the ci currently fails due to unknown reasons to me :/

Ok this should work (at least the ci still fails at the same point as before)
Here is the CI error:

FAILED tests/dataset/test_dataset.py::test_ome_ngff_metadata[output_path1] - FileNotFoundError: https://ngff.openmicroscopy.org/0.4/schemas/image.schema

Strangely tests/dataset/test_dataset.py::test_ome_ngff_metadata[output_path0] does not fail 🥴

@markbader
Copy link
Contributor

I get the same CI error in #1196 since yesterday, Both output_path0 and output_path1 fail but they are splitted in the CI (so output_path0 fails in webknossos_linux(3.12,1) and output_path1 in webknossos_linux(3.12, 3)). I don't know where this error comes from, but I think it is unrelated to the failing tests that you wrote. I had this json decode error before and thought they might be caused by vcr-py. But I didn't found a good way for debugging this.

@normanrz
Copy link
Member

I also think, this might be related to vcr-py. All the more reason to get rid of it. Can we skip these tests for now and move on with merging?

@daniel-wer
Copy link
Member

@MichaelBuessemeyer @markbader Instead of commenting out the tests you can use this pytest decorator and also include the reason for the skip :)

@pytest.mark.skip(
    reason="The test is flaky on the CI <for some reason>. Disable it for now."
)

@MichaelBuessemeyer
Copy link
Contributor Author

I also think, this might be related to vcr-py. All the more reason to get rid of it. Can we skip these tests for now and move on with merging?

ok on it :)

@MichaelBuessemeyer
Copy link
Contributor Author

Now
FAILED tests/dataset/test_add_layer_from_images.py::test_test_images[8909407-embedded_NCI_mono_matrigelcollagen_docetaxel_day10_sample10.czi-kwargs4-uint16-1-size4] fails. I tested locally and it works 🤷

I'll give it a retry and if it doesn't work I'll have to disable this as well I guess :/

Copy link
Contributor

@markbader markbader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀 I noticed two minor spelling issues and a merge conflict needs to be resolved, as I disabled the ngff_metadata test in another PR already. Then it should be ready to merge.

@bulldozer-boy bulldozer-boy bot merged commit d79488c into master Sep 26, 2024
19 checks passed
@bulldozer-boy bulldozer-boy bot deleted the add-remote-layer-and-mag branch September 26, 2024 10:28
@daniel-wer
Copy link
Member

@MichaelBuessemeyer Sorry, I would have wanted to add a changelog entry before merging, but bulldozer was too eager. We've already spoken to Norman to reconfigure bulldozer to only merge when the automerge label is active. This is somehow misconfigured in this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants