Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review whether zarr_urls have trailing slash or not #677

Open
jluethi opened this issue Apr 4, 2024 · 8 comments
Open

Review whether zarr_urls have trailing slash or not #677

jluethi opened this issue Apr 4, 2024 · 8 comments

Comments

@jluethi
Copy link
Collaborator

jluethi commented Apr 4, 2024

@tcompa Should zarr_urls have trailing slashes? Are we robust against it? What should the correct behavior be?

@jluethi jluethi changed the title Review whether zarr_urls have trailing slash or not Review whether zarr_url`s have trailing slash or not Apr 4, 2024
@jluethi jluethi changed the title Review whether zarr_url`s have trailing slash or not Review whether zarr_urls have trailing slash or not Apr 4, 2024
@tcompa
Copy link
Collaborator

tcompa commented Apr 4, 2024

My take:

  1. Server-side, we must choose one option (any of the two will do), since we rely on zarr_url to uniquely identify elements in the image list. The way it can work is that any zarr_url is normalized before using it in fractal-server (e.g. before adding it into an image list, before using it to check if it already exists, ...). I would start with os.path.normpath (which produces paths without trailing slash), although we'll need to update this to support s3 URLs in the future.

  2. Task-side, we should rather be robust. For filesystem paths, this can be achieved e.g. by using the os.path.normpath function or by using pathlib.Path objects. For s3 URLs, we will later find the best way to normalize them.

@tcompa
Copy link
Collaborator

tcompa commented Apr 4, 2024

Side comment (I mentioned it in the past, but without a link): something close to pathlib but for s3 URLs is https://github.com/aws-samples/s3pathlib-project.

@jluethi
Copy link
Collaborator Author

jluethi commented Apr 4, 2024

The strategy sounds good to me. Is there some library we can already use now that would work for the normalization both for paths and s3 objects? e.g. something in FSSpec or s3fs?

@tcompa
Copy link
Collaborator

tcompa commented Apr 4, 2024

Is there some library we can already use now that would work for the normalization both for paths and s3 objects? e.g. something in FSSpec or s3fs?

I've not found one during a very quick look. I'll need to review it later.

@tcompa
Copy link
Collaborator

tcompa commented Apr 9, 2024

Ref #652

@jluethi
Copy link
Collaborator Author

jluethi commented Apr 10, 2024

Server will only provide zarr_url without trailing slashes. Still some care when handling this in different tasks

@tcompa
Copy link
Collaborator

tcompa commented Apr 12, 2024

One example where a task should be made more robust: InitArgsMIP.origin_url will or won't be normalized depending on how it is prepared within the copy task, and then it's used in a non-safe way:

    data_czyx = da.from_zarr(init_args.origin_url + "/0")

within the MIP task.

@jluethi
Copy link
Collaborator Author

jluethi commented Apr 18, 2024

We should handle this more carefully in our tasks. Let's use this issue to tag tasks that need care taken for this.

  • MIP task for init args

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants