Accessing restricted groups over s3 produces zero filled array #1504

Open
samueljackson92 opened this issue Aug 18, 2023 · 6 comments
Labels
bug Potential issues with the zarr-python library

@samueljackson92

Zarr version

2.15.0

Numcodecs version

0.11.0

Python Version

3.11.4

Operating System

Linux/Ubuntu

Installation

pip install zarr

Description

Hi,

I am attempting to read a consolidated zarr file with lots of groups remotely from a MinIO S3 bucket. Everything is working nicely with anonymous access. However, when I try to restrict some groups using the MinIO ACL tools, Zarr returns a zero-filled array with no error. I would expect Zarr/fsspec to throw a "permission denied" or similar error.

For example, I have the following structure, and want to deny anonymous access to group 28412.

amc_test.zarr
├── .zgroup
├── .zmetadata
├── 28412               ---> Access is denied in s3 ACL to this group and below.
│   ├── .zgroup
│   ├── data
│   │   ├── .zarray
│   │   └── 0
│   ├── error
│   │   ├── .zarray
│   │   └── 0
│   └── time
│       ├── .zarray
│       └── 0
├── 28415              ---> Still want public access to this group
│   ├── .zgroup
│   ├── data
│   │   ├── .zarray
│   │   └── 0
│   ├── error
│   │   ├── .zarray
│   │   └── 0
│   └── time
│       ├── .zarray
│       └── 0

I can set the corresponding ACL to deny anonymous access on that particular group in MinIO. But when I try to read the data, instead of getting an error, I get an array filled with zeros. I guess Zarr sees that the file is not readable and assumes that it doesn't exist, rather than throwing an error.
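The guess above can be illustrated with a toy model. This is only a sketch of the suspected mechanism, not zarr's actual implementation: `ToyStore`, `read_chunk`, and the `treat_as_missing` parameter are hypothetical names invented for illustration, and the chunk contents are made up.

```python
# Simplified model only; none of these names come from zarr or fsspec.

class ToyStore:
    """Hypothetical key-value store where some keys are access-restricted."""

    def __init__(self, data, denied=()):
        self._data = dict(data)
        self._denied = set(denied)

    def __getitem__(self, key):
        if key in self._denied:
            raise PermissionError(key)
        return self._data[key]


def read_chunk(store, key, chunk_len, fill_value=0,
               treat_as_missing=(KeyError, PermissionError)):
    """Return the chunk as a list, or fill values if the key counts as missing."""
    try:
        return list(store[key])
    except treat_as_missing:
        # A denied chunk is indistinguishable from an uninitialised one here.
        return [fill_value] * chunk_len


store = ToyStore(
    {"28415/data/0": b"\x01\x02\x03", "28412/data/0": b"\x04\x05\x06"},
    denied={"28412/data/0"},
)

public = read_chunk(store, "28415/data/0", chunk_len=3)
restricted = read_chunk(store, "28412/data/0", chunk_len=3)
```

In this model the restricted read silently yields fill values, while passing `treat_as_missing=(KeyError,)` lets the `PermissionError` propagate to the caller.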

Steps to reproduce

Here is a simple example of how I am accessing the data from S3. The MinIO bucket is at localhost:10101. If I turn off the ACL in MinIO then this script runs without errors. With the ACL enabled, I would expect an error to be thrown on the penultimate line. Instead I get a zero-filled array.

import s3fs
import zarr
import numpy as np

file_name = 'amc_test.zarr'

s3 = s3fs.S3FileSystem(
    anon=True,
    use_ssl=False,
    client_kwargs={
        "endpoint_url": "http://localhost:10101"
    },
)

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3)
handle = zarr.open_consolidated(store)
arr = handle['28415']['data'][:]        # <-- This works as expected. There is no access control on this group.
assert not np.all(arr == 0)

arr = handle['28412']['data'][:]        # <-- I expected an error to be thrown here.
assert not np.all(arr == 0)             # <-- This assert fails.
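One way to narrow this down is to look up the chunk keys directly on the underlying mapping and record what each lookup raises, bypassing the array read path. The helper below is a hypothetical diagnostic sketch; `probe_keys` and `DeniedMapping` are invented names, and `DeniedMapping` merely stands in for the remote bucket so the example runs offline.

```python
def probe_keys(mapping, keys):
    """Look up each key directly and record the outcome instead of masking it."""
    report = {}
    for key in keys:
        try:
            mapping[key]
        except Exception as exc:
            report[key] = type(exc).__name__
        else:
            report[key] = "ok"
    return report


class DeniedMapping(dict):
    """Hypothetical stand-in for a remote mapping with restricted keys."""

    def __init__(self, data, denied):
        super().__init__(data)
        self._denied = set(denied)

    def __getitem__(self, key):
        if key in self._denied:
            raise PermissionError(key)
        return super().__getitem__(key)


fake_bucket = DeniedMapping(
    {"28415/data/0": b"public", "28412/data/0": b"restricted"},
    denied={"28412/data/0"},
)
report = probe_keys(fake_bucket, ["28415/data/0", "28412/data/0", "missing/0"])
```

Against the real bucket, the mapping could be `s3.get_mapper(f'mast/{file_name}')`. Whether s3fs reports the denied object as a `PermissionError` rather than as a missing key is an assumption that has not been verified in this thread.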

Additional output

No response

@samueljackson92 samueljackson92 added the bug Potential issues with the zarr-python library label Aug 18, 2023
@joshmoore
Member

Thanks for the clear issue, @samueljackson92. My assumption is you are running into the inverse problem that I had -- fsspec/filesystem_spec#342 -- i.e. some implementations were raising a 403 on chunks leading to application errors. Could you try configuring the exceptions at:

https://github.com/zarr-developers/zarr-python/pull/546/files#diff-565e487a2f60258b6baa2e4db8ef175cc16b8a949651834bd43d0a9f21e07358R974

exceptions=(KeyError, PermissionError, IOError)

such that you see the behavior you are looking for? It might be that I was looking for things to be too tolerant, but you raise a good point.

@samueljackson92
Author

Hi @joshmoore, thanks for your help. I hadn't noticed the options for tweaking exceptions in FSStore. Unfortunately, changing the exception list doesn't seem to make a difference. I tried:

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=[BaseException])

to try and just catch anything, but I still get the same assertion failure as above.

Do you have any further advice for where this could be being handled?

@joshmoore
Member

Hi @samueljackson92. I think you want the reverse, i.e. you want the PermissionErrors to be thrown:

exceptions=(KeyError, IOError)
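One caveat with this tuple: in Python 3, `IOError` is an alias of `OSError`, and `PermissionError` is a subclass of `OSError`, so a handler for `(KeyError, IOError)` still catches `PermissionError`. The check below demonstrates only that language-level fact; `is_swallowed` is a hypothetical helper, and how FSStore applies the tuple internally is not shown here.

```python
# IOError has been an alias of OSError since Python 3.3.
ioerror_is_oserror = IOError is OSError
permission_is_ioerror = issubclass(PermissionError, IOError)


def is_swallowed(exc, exceptions):
    """Hypothetical helper: would an `except exceptions:` clause catch exc?"""
    return isinstance(exc, exceptions)


denied = PermissionError("access denied")
with_ioerror = is_swallowed(denied, (KeyError, IOError))
keyerror_only = is_swallowed(denied, (KeyError,))
empty_tuple = is_swallowed(denied, ())
```

Under that reading, `exceptions=(KeyError,)` or `exceptions=()` would be the tuples that let a `PermissionError` through a plain `except` clause.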

@samueljackson92
Author

@joshmoore ah sorry, my mistake!

Unfortunately, I still see the same issue. I tried both:

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=(KeyError, IOError))

and

store = zarr.storage.FSStore(f'mast/{file_name}', fs=s3, exceptions=())

Both of these also just return a zero-filled array with no error.
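One possible explanation, not confirmed in this thread, is that array reads go through a bulk lookup that drops failed keys regardless of the `exceptions` setting, so only single-key lookups honour the tuple. The sketch below models that assumption; `ToyFSStore` and `backend` are hypothetical and are not zarr's real code.

```python
class ToyFSStore:
    """Hypothetical stand-in for a store with a configurable `exceptions` tuple."""

    def __init__(self, backend, exceptions=(KeyError, PermissionError, IOError)):
        self.backend = backend          # callable: key -> bytes, may raise
        self.exceptions = exceptions

    def __getitem__(self, key):
        # Single-key path: listed exceptions are converted to KeyError.
        try:
            return self.backend(key)
        except self.exceptions as exc:
            raise KeyError(key) from exc

    def getitems(self, keys):
        # Bulk path (assumed): every failed key is omitted, whatever
        # `exceptions` says, so the caller sees it as simply absent.
        results = {}
        for key in keys:
            try:
                results[key] = self.backend(key)
            except Exception:
                continue
        return results


def backend(key):
    """Hypothetical backend that denies access to everything under 28412/."""
    if key.startswith("28412/"):
        raise PermissionError(key)
    return b"chunk-bytes"


store = ToyFSStore(backend, exceptions=())
bulk = store.getitems(["28415/data/0", "28412/data/0"])
```

In this model `store["28412/data/0"]` raises `PermissionError` when `exceptions=()`, yet the bulk call still returns a result with the denied key missing, which a reader would then fill with zeros.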

@joshmoore
Member

joshmoore commented Aug 25, 2023

Doh! Ok. Either it requires something additional or it was a different issue to begin with. In looking around, I see #1237 -- could you see if that helps? If not, #489 is another candidate.

@samueljackson92
Author

@joshmoore thanks for the signposting! I will investigate the two suggested PRs when I have a little more time and see if they help.
