Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/store paths #2272

Open
wants to merge 32 commits into
base: main
Choose a base branch
from
Open

Feat/store paths #2272

wants to merge 32 commits into from

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Sep 30, 2024

Adds a path attribute to all the store classes. Key operations like get, set, list, list_prefix etc are now scoped to the subtree rooted by the path attribute of the store, which is how RemoteStore works in the v3 branch. This change is motivated by the desire for a consistent, scalable store API.

Key changes:

  • renamed LocalStore.root to LocalStore.path, and changed the type of the attribute from pathlib.Path to str
  • added a path attribute to MemoryStore
  • changed the semantics of the path attribute of ZipStore, and introduced a new attribute file_path that does what ZipStore.path used to do (point to the location of the Zipfile in the parent file system).

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 30, 2024

Just a note, this fixes a failure in some xarray tests that write an array at the top level of the store, open a group at the root and then get the array

Edit: whoops, never mind. I missed that this only affects zarr_format=2 when testing. I'll make a new issue.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @d-v-b. I'm not 100% convinced the with_mode approach is going to be substantially cleaner than our StorePath right now but I'm curious to see how this plays out.

@@ -320,6 +336,13 @@ async def _get_many(
for req in requests:
yield (req[0], await self.get(*req))

def with_path(self, path: str) -> Self:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def with_path(self, path: str) -> Self:
@abstractmethod
def with_path(self, path: str) -> Self:

Or do you think think we can do this in a generic way that applies to all stores?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can make it generic unfortunately, because each store class might have its own extra attributes. maybe we could use __getstate__ to get around this?

src/zarr/store/local.py Outdated Show resolved Hide resolved
@@ -28,12 +29,15 @@ def __init__(
self,
store_dict: MutableMapping[str, Buffer] | None = None,
*,
path: str = "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this store highlights the fact that not all stores have a meaningful path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree -- if store_dict really is an arbitrary mutable mapping, there's no guarantee that we want all IO operations to be scoped to the top-level key space, and that's exactly what the path attribute is for.

Comment on lines 55 to 57
file_path: Path | str,
*,
path: str = "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels quite confusing. IIUC, you are allowing the caller to specify a path within the zipfile as opposed to putting the root of the store at the root of the zip, is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly! this is required to make ZipStore congruent to RemoteStore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though this may have some impact (simplifying) zarr-developers/zarr-specs#311

@jhamman jhamman added the V3 Affects the v3 branch label Oct 11, 2024
@jhamman jhamman changed the base branch from v3 to main October 14, 2024 20:52
@d-v-b d-v-b marked this pull request as ready for review October 20, 2024 19:56
@d-v-b
Copy link
Contributor Author

d-v-b commented Oct 20, 2024

worth mentioning some changes to the reprs:

  • In main, MemoryStore and GPUMemoryStore reprs are <store class name>://<id of ._store_dict>. This PR changes that to <store class name>://<id of ._store_dict>/<store.path>
  • in main, LocalStore repr is file://<store.root>, which is allegedly wrong for a file url. In this PR, it's file:///<store.path>
  • ZipStore is a bit weird, because it's a nested store: we need a wrapping file system to get to the zip file, and then inside the zip file we have another file system. In this PR i'm using the syntax introduced by @jbms for this purpose, which is basically a sequence of URLs separated by the | character, e.g. file:///<store.file_path>|zip://<store.path> (Jeremy please correct me if I'm wrong here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V3 Affects the v3 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants