Can't open a read-only Google Cloud Storage store #14

dstansby · 2024-02-27T15:47:49Z

Zarr version

2.17.0

Numcodecs version

0.12.1

Python Version

3.11.7

Operating System

macOS

Installation

using conda

Description

I am trying to access a read-only Google Cloud Storage bucket, but my code is failing. It looks like it's because zarr is trying to write something to the bucket, but I haven't properly worked out what's going wrong.

If I'm doing something wrong, it would be nice to add an example to the documentation to make this easy to do in the future.

Steps to reproduce

import gcsfs
import zarr

bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
fs.ls(bucket)  # Works
store = fs.get_mapper(root=bucket)
group = zarr.group(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05")

_request non-retriable exception: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401
Traceback (most recent call last):
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401
Traceback (most recent call last):
  File "/Users/dstansby/software/hipct/hipct-reg/scripts/test_real_data.py", line 8, in <module>
    group = zarr.group(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/hierarchy.py", line 1427, in group
    init_group(store, overwrite=overwrite, chunk_store=chunk_store, path=path)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 668, in init_group
    _require_parent_group(path, store=store, chunk_store=chunk_store, overwrite=overwrite)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 313, in _require_parent_group
    _init_group_metadata(store, path=p, chunk_store=chunk_store)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 736, in _init_group_metadata
    store[key] = store._metadata_class.encode_group_metadata(meta)
    ~~~~~^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 1449, in __setitem__
    self.map[key] = value
    ~~~~~~~~^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/mapping.py", line 171, in __setitem__
    self.fs.pipe_file(key, maybe_convert(value))
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 1268, in _pipe_file
    location = await simple_upload(
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 1954, in simple_upload
    j = await fs._call(
        ^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 437, in _call
    status, headers, info, contents = await self._request(
                                      ^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 158, in retry_request
    raise e
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401

Additional output

No response

dstansby · 2024-02-27T16:02:04Z

To illustrate that this should work in theory, with tensorstore I don't have an issue and the following code works:

import tensorstore as ts
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"

dataset = ts.open({
    "driver": "n5",
    "kvstore": f"gs://{bucket}/LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0",
    'context': {
         'cache_pool': {
             'total_bytes_limit': 100_000_000
         }
     }
}).result()

x = dataset[0, 0, 0].read().result()
print(x)

d-v-b · 2024-02-27T16:03:32Z

cc @martindurant

martindurant · 2024-02-27T16:12:02Z

There is no .zgroup at the given path. I'm not sure of the semantics of zarr.group (as opposed to open_group()), but it appears to be "create if doesn't exist". So gcsfs is doing the right thing.
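
To make the "create if doesn't exist" point concrete, here is a toy model using only the standard library. It is not zarr's implementation: `create_if_missing` and `open_existing` are hypothetical names, and a `MappingProxyType` stands in for the read-only bucket. The sketch only shows why a create-style call must attempt a write when the group metadata is absent, while a read-only open can fail without writing anything.

```python
from types import MappingProxyType

GROUP_KEY = ".zgroup"  # metadata key that marks a Zarr v2 group


def create_if_missing(store, path):
    """Toy 'create if it doesn't exist' semantics (hypothetical helper, not zarr code)."""
    key = f"{path}/{GROUP_KEY}"
    if key not in store:
        # This write is the step a read-only store rejects.
        store[key] = b'{"zarr_format": 2}'
    return key


def open_existing(store, path):
    """Toy read-only open (hypothetical helper): never writes, fails if metadata is absent."""
    key = f"{path}/{GROUP_KEY}"
    if key not in store:
        raise KeyError(f"no group metadata at {key!r}")
    return key


# A read-only mapping holding only N5-style metadata, standing in for the bucket.
read_only_store = MappingProxyType({"data/attributes.json": b"{}"})

try:
    create_if_missing(read_only_store, "data")
except TypeError as exc:
    create_error = exc  # the mapping refuses item assignment, like the denied upload
    print("create-style call tried to write:", exc)

try:
    open_existing(read_only_store, "data")
except KeyError as exc:
    open_error = exc  # no write attempted, just a "not found" style failure
    print("read-only open failed cleanly:", exc)
```

In the traceback above, the analogous write is the `store[key] = ...` line in `_init_group_metadata`, which gcsfs turns into an upload that the anonymous caller is not allowed to make.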

What does exist in the location is attributes.json - is this a V3 thing? I can see it has some zarr-like information in, but not the usual .z stuff; e.g., here is the one inside s0:

{"axes":["x","y","z"],
"blockSize":[128,128,128],
"compression":{"blocksize":0,"clevel":9,"cname":"zstd","shuffle":2,"type":"blosc"},
"dataType":"uint16",
"dimensions":[3020,3412,2829],
"neuroglancer-pipeline-version":"1"}

d-v-b · 2024-02-27T16:13:08Z

What does exist in the location is attributes.json - is this a V3 thing?

no, it's n5
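
For readers unfamiliar with N5, the attributes.json quoted above is enough to work out the block layout of the array. The following is a minimal sketch using only the standard library; it assumes that "dimensions" and "blockSize" are listed in the same axis order (as the "axes" entry suggests) and that edge blocks may be partial, so the count along each axis is a ceiling division.

```python
import json
import math

# The attributes.json contents quoted in the comment above.
attributes = json.loads("""
{"axes":["x","y","z"],
"blockSize":[128,128,128],
"compression":{"blocksize":0,"clevel":9,"cname":"zstd","shuffle":2,"type":"blosc"},
"dataType":"uint16",
"dimensions":[3020,3412,2829],
"neuroglancer-pipeline-version":"1"}
""")

# Blocks along each axis: ceil(dimension / block size), since edge blocks may be partial.
blocks_per_axis = [
    math.ceil(dim / block)
    for dim, block in zip(attributes["dimensions"], attributes["blockSize"])
]
total_blocks = math.prod(blocks_per_axis)

for axis, count in zip(attributes["axes"], blocks_per_axis):
    print(f"{axis}: {count} blocks")
print("total blocks:", total_blocks)
```

Each of those blocks is a separate object in the bucket, which is why a reader only needs read access to attributes.json and the block files themselves.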

martindurant · 2024-02-27T16:16:21Z

Is zarr.group expected to be able to read that?

Also, turning on the logger "gcsfs" would tell you what call is actually causing the exception: what is zarr trying to create?

import fsspec
fsspec.utils.setup_logging(logger_name="gcsfs")
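
If you would rather not depend on the fsspec helper, a roughly equivalent setup with the standard library is sketched below. It assumes, as the call above implies, that gcsfs emits its messages on a logger named "gcsfs"; the format string is an arbitrary choice.

```python
import logging

# Attach a stream handler to the "gcsfs" logger and lower its threshold to DEBUG,
# so each request made on zarr's behalf is printed to stderr.
logger = logging.getLogger("gcsfs")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
```

With this in place before the failing call, the last request logged ahead of the exception should show which object zarr attempted to create.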

martindurant · 2024-02-27T16:17:16Z

Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

d-v-b · 2024-02-27T16:19:08Z

Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

That's a question for @jbms :) I don't know much about tensorstore myself.

@dstansby you should probably look at N5FSStore, which is FSStore modified to support N5.

d-v-b · 2024-02-27T16:57:51Z

My apologies for presumptuously pinging you, Martin. I didn't notice immediately that this was actually an N5 thing.

dstansby · 2024-02-27T17:26:28Z

Apologies, the path should have "/s0" on the end. I also understand that I should use open_array instead as I have an N5 array and not a group. Updated code (that gives me the same error):

import gcsfs
import zarr

bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
fs.ls(bucket)  # Works
store = fs.get_mapper(root=bucket)
group = zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")

So I'm not sure how to combine an N5 store with the GCSFileSystem.

jbms · 2024-02-27T17:59:27Z

Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

That's a question for @jbms :) I don't know much about tensorstore myself.

@dstansby you should probably look at N5FSStore, which is FSStore modified to support N5.

That is definitely something we could support pretty easily, and has been on the TODO list.

d-v-b · 2024-02-27T18:48:32Z

@dstansby does this work for you?

import zarr
import gcsfs
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
store = zarr.N5FSStore(url=bucket, fs=fs)
zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")

dstansby · 2024-03-18T20:39:10Z

Thanks, it did! I've opened a PR to the docs to make sure the working example doesn't get lost to a closed issue 😄

d-v-b · 2024-10-18T12:59:16Z

I transferred this issue from zarr-python to n5py.

dstansby added the bug label · Feb 27, 2024

dstansby mentioned this issue · Mar 18, 2024: Add Google Cloud Storage example, zarr-developers/zarr-python#1713 (closed)

d-v-b transferred this issue from zarr-developers/zarr-python · Oct 18, 2024
