Can't open a read-only Google Cloud Storage store #14

Open
dstansby opened this issue Feb 27, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@dstansby

Zarr version

2.17.0

Numcodecs version

0.12.1

Python Version

3.11.7

Operating System

macOS

Installation

using conda

Description

I am trying to access a read-only Google Cloud Storage bucket, but my code is failing. It looks like it's because zarr is trying to write something to the bucket, but I haven't properly worked out what's going wrong.

If I'm doing something wrong, it would be nice to add an example to the documentation to make this easy to do in the future.

Steps to reproduce

import gcsfs
import zarr

bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
fs.ls(bucket)  # Works
store = fs.get_mapper(root=bucket)
group = zarr.group(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05")
_request non-retriable exception: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401
Traceback (most recent call last):
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401
Traceback (most recent call last):
  File "/Users/dstansby/software/hipct/hipct-reg/scripts/test_real_data.py", line 8, in <module>
    group = zarr.group(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/hierarchy.py", line 1427, in group
    init_group(store, overwrite=overwrite, chunk_store=chunk_store, path=path)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 668, in init_group
    _require_parent_group(path, store=store, chunk_store=chunk_store, overwrite=overwrite)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 313, in _require_parent_group
    _init_group_metadata(store, path=p, chunk_store=chunk_store)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 736, in _init_group_metadata
    store[key] = store._metadata_class.encode_group_metadata(meta)
    ~~~~~^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 1449, in __setitem__
    self.map[key] = value
    ~~~~~~~~^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/mapping.py", line 171, in __setitem__
    self.fs.pipe_file(key, maybe_convert(value))
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 1268, in _pipe_file
    location = await simple_upload(
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 1954, in simple_upload
    j = await fs._call(
        ^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 437, in _call
    status, headers, info, contents = await self._request(
                                      ^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 158, in retry_request
    raise e
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401

Additional output

No response

dstansby added the bug label Feb 27, 2024
@dstansby
Author

To illustrate that this should work in theory, with tensorstore I don't have an issue and the following code works:

import tensorstore as ts
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"

dataset = ts.open({
    "driver": "n5",
    "kvstore": f"gs://{bucket}/LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0",
    'context': {
         'cache_pool': {
             'total_bytes_limit': 100_000_000
         }
     }
}).result()

x = dataset[0, 0, 0].read().result()
print(x)

@d-v-b
Collaborator

d-v-b commented Feb 27, 2024

cc @martindurant

@martindurant
Member

There is no .zgroup at the given path. I'm not sure of the semantics of zarr.group (as opposed to open_group()), but it appears to be "create if it doesn't exist". So gcsfs is doing the right thing.
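The failure mode described here can be sketched without any cloud access. The snippet below is an illustration only: `ReadOnlyStore`, `create_if_missing` and `open_existing` are hypothetical stand-ins, not zarr or gcsfs APIs. They mimic a "create if it doesn't exist" call versus a strict read-only open against a store that rejects writes.

```python
from collections.abc import MutableMapping


class ReadOnlyStore(MutableMapping):
    """Hypothetical key-value store that allows reads but rejects writes."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        raise PermissionError(f"write to {key!r} denied: store is read-only")

    def __delitem__(self, key):
        raise PermissionError(f"delete of {key!r} denied: store is read-only")

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def create_if_missing(store, path):
    """Mimic 'create if it doesn't exist': write group metadata when absent."""
    key = f"{path}/.zgroup"
    if key not in store:
        # This write is the step a read-only store rejects.
        store[key] = b'{"zarr_format": 2}'
    return key


def open_existing(store, path):
    """Mimic a strict read-only open: never write, fail clearly if absent."""
    key = f"{path}/.zgroup"
    if key not in store:
        raise FileNotFoundError(f"no group metadata at {key!r}")
    return key


# The store holds N5-style metadata only, so no .zgroup key exists.
store = ReadOnlyStore({"data/attributes.json": b"{}"})

try:
    create_if_missing(store, "data")
    outcome_create = "ok"
except PermissionError as exc:
    outcome_create = f"PermissionError: {exc}"

try:
    open_existing(store, "data")
    outcome_open = "ok"
except FileNotFoundError as exc:
    outcome_open = f"FileNotFoundError: {exc}"

print(outcome_create)
print(outcome_open)
```

In this sketch the create-style call surfaces a permission error, which hides the real problem, while the strict open reports directly that the expected metadata is missing.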

What does exist in the location is attributes.json - is this a V3 thing? I can see it contains some zarr-like information, but not the usual .z* files; e.g., here is the one inside s0:

{"axes":["x","y","z"],
"blockSize":[128,128,128],
"compression":{"blocksize":0,"clevel":9,"cname":"zstd","shuffle":2,"type":"blosc"},
"dataType":"uint16",
"dimensions":[3020,3412,2829],
"neuroglancer-pipeline-version":"1"}
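As a rough illustration of the zarr-like information in that file, the sketch below parses the JSON shown above with the standard library and derives how many blocks the array is split into. It assumes that `dimensions` is the array shape and `blockSize` the chunk shape, as in N5; the variable names are chosen for this example only.

```python
import json
import math

# N5 attributes.json for s0, copied from the comment above.
attributes_json = """
{"axes":["x","y","z"],
"blockSize":[128,128,128],
"compression":{"blocksize":0,"clevel":9,"cname":"zstd","shuffle":2,"type":"blosc"},
"dataType":"uint16",
"dimensions":[3020,3412,2829],
"neuroglancer-pipeline-version":"1"}
"""

attrs = json.loads(attributes_json)

shape = attrs["dimensions"]   # array shape, in the axis order given by "axes"
chunks = attrs["blockSize"]   # block (chunk) shape along the same axes

# Edge blocks may be partial, so round up along each axis.
blocks_per_axis = [math.ceil(n / c) for n, c in zip(shape, chunks)]
total_blocks = math.prod(blocks_per_axis)

print("dtype:", attrs["dataType"])
print("compressor:", attrs["compression"]["type"], attrs["compression"]["cname"])
print("blocks per axis:", blocks_per_axis)
print("total blocks:", total_blocks)
```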

@d-v-b
Collaborator

d-v-b commented Feb 27, 2024

> What does exist in the location is attributes.json - is this a V3 thing?

no, it's n5

@martindurant
Member

Is zarr.group expected to be able to read that?

Also, turning on the logger "gcsfs" would tell you what call is actually causing the exception: what is zarr trying to create?

import fsspec
fsspec.utils.setup_logging(logger_name="gcsfs")
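A roughly equivalent setup using only the standard library is sketched below, assuming gcsfs emits its messages on the logger named "gcsfs" as the comment above implies. The format string is an arbitrary choice for this example.

```python
import logging

# Send DEBUG-level messages from the "gcsfs" logger to stderr.
logger = logging.getLogger("gcsfs")
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)

# Emit one message so the configuration can be checked before running real calls.
logger.debug("debug logging enabled for gcsfs")
```

With this in place before the failing call, the logged requests should show which object the write was attempted on.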

@martindurant
Member

Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

@d-v-b
Collaborator

d-v-b commented Feb 27, 2024

> Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

That's a question for @jbms :) I don't know much about tensorstore myself.

@dstansby you should probably look at N5FSStore, which is FSStore modified to support N5.

@d-v-b
Collaborator

d-v-b commented Feb 27, 2024

My apologies for presumptuously pinging you, Martin. I didn't notice immediately that this was actually an N5 thing.

@dstansby
Author

Apologies, the path should have "/s0" on the end. I also understand that I should use open_array instead as I have an N5 array and not a group. Updated code (that gives me the same error):

import gcsfs
import zarr

bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
fs.ls(bucket)  # Works
store = fs.get_mapper(root=bucket)
group = zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")

So I'm not sure how to combine an N5 store with the GCSFileSystem.

@jbms

jbms commented Feb 27, 2024

> > Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.
>
> That's a question for @jbms :) I don't know much about tensorstore myself.
>
> @dstansby you should probably look at N5FSStore, which is FSStore modified to support N5.

That is definitely something we could support pretty easily, and has been on the TODO list.

@d-v-b
Collaborator

d-v-b commented Feb 27, 2024

@dstansby does this work for you?

import zarr
import gcsfs
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
store = zarr.N5FSStore(url=bucket, fs=fs)
zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")

@dstansby
Author

Thanks, it did! I've opened a PR to the docs to make sure the working example doesn't get lost to a closed issue 😄

@d-v-b
Collaborator

d-v-b commented Oct 18, 2024

I transferred this issue from zarr-python to n5py.

d-v-b transferred this issue from zarr-developers/zarr-python on Oct 18, 2024