docs: ZIP-related tweaks (#1641)
* docs: use 'ZIP archive' instead of 'zip file'; clarify utility of caching in s3 + ZIP example; style

* docs: update release notes, correct spelling of greg lee's name in past release notes, and fix markup in past release notes

d-v-b authored Feb 15, 2024
1 parent 7449853 commit 3db4176
Showing 2 changed files with 24 additions and 23 deletions.
20 changes: 10 additions & 10 deletions docs/release.rst
@@ -62,8 +62,8 @@ Docs
* Add Norman Rzepka to core-dev team.
By :user:`Joe Hamman <jhamman>` :issue:`1630`.

* Added section about accessing zip files that are on s3.
By :user:`Jeff Peck <jeffpeck10x>` :issue:`1613`.
* Added section about accessing ZIP archives on s3.
By :user:`Jeff Peck <jeffpeck10x>` :issue:`1613`, :issue:`1615`, and :user:`Davis Bennett <d-v-b>` :issue:`1641`.

* Add V3 roadmap and design document.
By :user:`Joe Hamman <jhamman>` :issue:`1583`.
@@ -157,10 +157,10 @@ Maintenance
By :user:`Davis Bennett <d-v-b>` :issue:`1462`.

* Style the codebase with ``ruff`` and ``black``.
By :user:`Davis Bennett` <d-v-b> :issue:`1459`
By :user:`Davis Bennett <d-v-b>` :issue:`1459`

* Ensure that chunks is tuple of ints upon array creation.
By :user:`Philipp Hanslovsky` <hanslovsky> :issue:`1461`
By :user:`Philipp Hanslovsky <hanslovsky>` :issue:`1461`

.. _release_2.15.0:

@@ -548,7 +548,7 @@ Maintenance
By :user:`Saransh Chopra <Saransh-cpp>` :issue:`1079`.

* Remove option to return None from _ensure_store.
By :user:`Greggory Lee <grlee77>` :issue:`1068`.
By :user:`Gregory Lee <grlee77>` :issue:`1068`.

* Fix a typo of "integers".
By :user:`Richard Scott <RichardScottOZ>` :issue:`1056`.
@@ -566,7 +566,7 @@ Enhancements
Since the format is not yet finalized, the classes and functions are not
automatically imported into the regular `zarr` name space. Setting the
`ZARR_V3_EXPERIMENTAL_API` environment variable will activate them.
By :user:`Greggory Lee <grlee77>`; :issue:`898`, :issue:`1006`, and :issue:`1007`
By :user:`Gregory Lee <grlee77>`; :issue:`898`, :issue:`1006`, and :issue:`1007`
as well as by :user:`Josh Moore <joshmoore>` :issue:`1032`.

* **Create FSStore from an existing fsspec filesystem**. If you have created
@@ -688,7 +688,7 @@ Enhancements
higher-level array creation and convenience functions still accept plain
Python dicts or other mutable mappings for the ``store`` argument, but will
internally convert these to a ``KVStore``.
By :user:`Greggory Lee <grlee77>`; :issue:`839`, :issue:`789`, and :issue:`950`.
By :user:`Gregory Lee <grlee77>`; :issue:`839`, :issue:`789`, and :issue:`950`.

* Allow to assign array ``fill_values`` and update metadata accordingly.
By :user:`Ryan Abernathey <rabernat>`, :issue:`662`.
@@ -835,7 +835,7 @@ Bug fixes
~~~~~~~~~

* Fix FSStore.listdir behavior for nested directories.
By :user:`Greggory Lee <grlee77>`; :issue:`802`.
By :user:`Gregory Lee <grlee77>`; :issue:`802`.

.. _release_2.9.4:

@@ -919,7 +919,7 @@ Bug fixes
By :user:`Josh Moore <joshmoore>`; :issue:`781`.

* avoid NumPy 1.21.0 due to https://github.com/numpy/numpy/issues/19325
By :user:`Greggory Lee <grlee77>`; :issue:`791`.
By :user:`Gregory Lee <grlee77>`; :issue:`791`.

Maintenance
~~~~~~~~~~~
@@ -931,7 +931,7 @@ Maintenance
By :user:`Elliott Sales de Andrade <QuLogic>`; :issue:`799`.

* TST: add missing assert in test_hexdigest.
By :user:`Greggory Lee <grlee77>`; :issue:`801`.
By :user:`Gregory Lee <grlee77>`; :issue:`801`.

.. _release_2.8.3:

27 changes: 14 additions & 13 deletions docs/tutorial.rst
@@ -774,7 +774,7 @@ the following code::

Any other compatible storage class could be used in place of
:class:`zarr.storage.DirectoryStore` in the code examples above. For example,
here is an array stored directly into a Zip file, via the
here is an array stored directly into a ZIP archive, via the
:class:`zarr.storage.ZipStore` class::

>>> store = zarr.ZipStore('data/example.zip', mode='w')
@@ -798,12 +798,12 @@ Re-open and check that data have been written::
[42, 42, 42, ..., 42, 42, 42]], dtype=int32)
>>> store.close()

Note that there are some limitations on how Zip files can be used, because items
within a Zip file cannot be updated in place. This means that data in the array
Note that there are some limitations on how ZIP archives can be used, because items
within a ZIP archive cannot be updated in place. This means that data in the array
should only be written once and write operations should be aligned with chunk
boundaries. Note also that the ``close()`` method must be called after writing
any data to the store, otherwise essential records will not be written to the
underlying zip file.
underlying ZIP archive.
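
The ``close()`` requirement follows from the ZIP format itself: the central directory and end-of-archive record are written last. The following is a minimal sketch using only Python's standard ``zipfile`` module rather than ``zarr.storage.ZipStore``; the member name ``0.0`` and its contents are made up for illustration.

```python
import io
import zipfile

# Write one member into an in-memory archive without closing it yet.
buf = io.BytesIO()
zf = zipfile.ZipFile(buf, mode="w")
zf.writestr("0.0", b"chunk bytes")  # hypothetical chunk key and payload

# Inspect a snapshot of the bytes written so far: the member data is present,
# but the end-of-archive record is not, so this is not yet a valid archive.
valid_before_close = zipfile.is_zipfile(io.BytesIO(buf.getvalue()))

# Closing writes the central directory and the end-of-archive record.
zf.close()
valid_after_close = zipfile.is_zipfile(io.BytesIO(buf.getvalue()))

print(valid_before_close, valid_after_close)
```

A ``ZipStore`` that is never closed is assumed to leave the file on disk in the same incomplete state as the first snapshot.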

Another storage alternative is the :class:`zarr.storage.DBMStore` class, added
in Zarr version 2.2. This class allows any DBM-style database to be used for
@@ -846,7 +846,7 @@ respectively require the `redis-py <https://redis-py.readthedocs.io>`_ and
`pymongo <https://api.mongodb.com/python/current/>`_ packages to be installed.

For compatibility with the `N5 <https://github.com/saalfeldlab/n5>`_ data format, Zarr also provides
an N5 backend (this is currently an experimental feature). Similar to the zip storage class, an
an N5 backend (this is currently an experimental feature). Similar to the ZIP storage class, an
:class:`zarr.n5.N5Store` can be instantiated directly::

>>> store = zarr.N5Store('data/example.n5')
@@ -1000,12 +1000,13 @@ separately from Zarr.

.. _tutorial_copy:

Accessing Zip Files on S3
~~~~~~~~~~~~~~~~~~~~~~~~~
Accessing ZIP archives on S3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The built-in `ZipStore` will only work with paths on the local file-system, however
it is also possible to access ``.zarr.zip`` data on the cloud. Here is an example of
accessing a zipped Zarr file on s3:
The built-in :class:`zarr.storage.ZipStore` will only work with paths on the local file-system; however
it is possible to access ZIP-archived Zarr data on the cloud via the `ZipFileSystem <https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/implementations/zip.html>`_
class from ``fsspec``. The following example demonstrates how to access
a ZIP-archived Zarr group on s3 using `s3fs <https://s3fs.readthedocs.io/en/latest/>`_ and ``ZipFileSystem``:

>>> s3_path = "s3://path/to/my.zarr.zip"
>>>
@@ -1014,15 +1015,15 @@ accessing a zipped Zarr file on s3:
>>> fs = ZipFileSystem(f, mode="r")
>>> store = FSMap("", fs, check=False)
>>>
>>> # cache is optional, but may be a good idea depending on the situation
>>> # caching may improve performance when repeatedly reading the same data
>>> cache = zarr.storage.LRUStoreCache(store, max_size=2**28)
>>> z = zarr.group(store=cache)

This store can also be generated with ``fsspec``'s handler chaining, like so:

>>> store = zarr.storage.FSStore(url=f"zip::{s3_path}", mode="r")

This can be especially useful if you have a very large ``.zarr.zip`` file on s3
This can be especially useful if you have a very large ZIP-archived Zarr array or group on s3
and only need to access a small portion of it.
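
Reading a small portion of a large archive is cheap because a ZIP reader needs only the central directory plus the bytes of the requested member. The sketch below illustrates this with the standard ``zipfile`` module and an in-memory buffer; it is an analogy, not a description of how ``s3fs`` or ``ZipFileSystem`` issue requests. ``CountingBytesIO`` and the member names are hypothetical helpers introduced for illustration.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that counts the bytes returned by read() (illustrative helper)."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


# Build an uncompressed archive with one large and one small member.
raw = io.BytesIO()
with zipfile.ZipFile(raw, mode="w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("big/0.0", b"\x00" * 1_000_000)
    zf.writestr("small/0.0", b"forty-two")
archive = raw.getvalue()

# Reopen through the counting wrapper and read only the small member.
counted = CountingBytesIO(archive)
with zipfile.ZipFile(counted, mode="r") as zf:
    small = zf.read("small/0.0")

# Compare the archive size with the number of bytes actually read.
print(len(archive), counted.bytes_read)
```

On a remote filesystem the same access pattern would, under this assumption, translate into a handful of small ranged reads instead of a full download.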

Consolidating metadata
@@ -1161,7 +1162,7 @@ re-compression, and so should be faster. E.g.::
└── spam (100,) int64
>>> new_root['foo/bar/baz'][:]
array([ 0, 1, 2, ..., 97, 98, 99])
>>> store2.close() # zip stores need to be closed
>>> store2.close() # ZIP stores need to be closed

.. _tutorial_strings:
