[Backend Configuration Va] Basic user documentation #802

Merged — 131 commits (primarily by CodyCBakerPhD), May 6, 2024.
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -34,7 +34,7 @@ Below is an overview of the key sections to help you navigate our documentation

* **User Guide**

-The `User Guide <user_guide/user_guide.rst>`_ offers a comprehensive overview of NeuroConv's data model and functionalities.
+The `User Guide <user_guide/index.rst>`_ offers a comprehensive overview of NeuroConv's data model and functionalities.
It is recommended for users who wish to understand the underlying concepts and extend their scripts beyond basic conversions.

* **Catalogue of Projects**
@@ -60,7 +60,7 @@ We are happy to help and appreciate your feedback.
:maxdepth: 2
:hidden:

-user_guide/user_guide
+user_guide/index
conversion_examples_gallery/conversion_example_gallery
catalogue/catalogue
developer_guide
351 changes: 351 additions & 0 deletions docs/user_guide/backend_configuration.rst
@@ -0,0 +1,351 @@
Backend Configuration
=====================

NeuroConv offers convenient control over the type of file backend and the way each dataset is configured.

Find out more about possible backend formats in the `main NWB documentation <https://nwb-overview.readthedocs.io/en/latest/faq_details/why_hdf5.html#why-use-hdf5-as-the-primary-backend-for-nwb>`_.

Find out more about chunking and compression in the `advanced NWB tutorials for dataset I/O settings <https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/h5dataio.html#sphx-glr-tutorials-advanced-io-h5dataio-py>`_.

Find out more about memory buffering of large source files in the `advanced NWB tutorials for iterative data write <https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/plot_iterative_write.html#sphx-glr-tutorials-advanced-io-plot-iterative-write-py>`_.



Default configuration
---------------------

To retrieve a default configuration for an in-memory ``pynwb.NWBFile`` object, use the :py:meth:`~neuroconv.tools.nwb_helpers.get_default_backend_configuration` function:

.. code-block:: python

    from datetime import datetime
    from uuid import uuid4

    from neuroconv.tools.nwb_helpers import get_default_backend_configuration
    from pynwb import NWBFile, TimeSeries

    session_start_time = datetime(2020, 1, 1, 12, 30, 0)
    nwbfile = NWBFile(
        identifier=str(uuid4()),
        session_start_time=session_start_time,
        session_description="A session of my experiment.",
    )

    time_series = TimeSeries(
        name="MyTimeSeries",
        description="A time series from my experiment.",
        unit="cm/s",
        data=[1., 2., 3.],
        timestamps=[0.0, 0.2, 0.4],
    )
    nwbfile.add_acquisition(time_series)

    default_backend_configuration = get_default_backend_configuration(
        nwbfile=nwbfile, backend="hdf5"
    )

Printing the contents of this configuration:

.. code-block:: python

    print(default_backend_configuration)

returns:

.. code-block:: bash

    HDF5 dataset configurations
    ---------------------------

    acquisition/MyTimeSeries/data
    -----------------------------
    dtype : float64
    full shape of source array : (3,)
    full size of source array : 24 B

    buffer shape : (3,)
    expected RAM usage : 24 B

    chunk shape : (3,)
    disk space usage per chunk : 24 B

    compression method : gzip

    acquisition/MyTimeSeries/timestamps
    -----------------------------------
    dtype : float64
    full shape of source array : (3,)
    full size of source array : 24 B

    buffer shape : (3,)
    expected RAM usage : 24 B

    chunk shape : (3,)
    disk space usage per chunk : 24 B

    compression method : gzip



Customization
-------------

To modify the chunking or buffering patterns, or the compression method and its options, change those values on the corresponding entry of the ``.dataset_configurations`` dictionary, using the location of each dataset within the file as the key.

Let's demonstrate this by modifying everything we can for the ``data`` field of the ``TimeSeries`` object generated above:

.. code-block:: python

    dataset_configurations = default_backend_configuration.dataset_configurations
    dataset_configuration = dataset_configurations["acquisition/MyTimeSeries/data"]

    dataset_configuration.chunk_shape = (1,)
    dataset_configuration.buffer_shape = (2,)
    dataset_configuration.compression_method = "Zstd"
    dataset_configuration.compression_options = dict(clevel=3)

We can confirm these values are saved by re-printing that particular dataset configuration:

.. code-block:: python

    print(dataset_configuration)

.. code-block:: bash

    acquisition/MyTimeSeries/data
    -----------------------------
    dtype : float64
    full shape of source array : (3,)
    full size of source array : 24 B

    buffer shape : (2,)
    expected RAM usage : 16 B

    chunk shape : (1,)
    disk space usage per chunk : 8 B

    compression method : Zstd
    compression options : {'clevel': 3}
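These byte counts follow directly from the element size: ``float64`` occupies 8 bytes per element, so the ``(2,)`` buffer occupies 16 B of RAM and each ``(1,)`` chunk occupies 8 B on disk. A quick sketch of that arithmetic, using only ``numpy`` (independent of NeuroConv):

```python
import numpy as np

# Reproduce the RAM and disk estimates shown in the printout above.
itemsize = np.dtype("float64").itemsize  # 8 bytes per element

buffer_shape = (2,)
chunk_shape = (1,)

expected_ram_bytes = itemsize * int(np.prod(buffer_shape))
disk_bytes_per_chunk = itemsize * int(np.prod(chunk_shape))

print(f"expected RAM usage : {expected_ram_bytes} B")         # 16 B
print(f"disk space usage per chunk : {disk_bytes_per_chunk} B")  # 8 B
```

The same product-of-axes-times-itemsize rule applies to arrays of any dimensionality.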

Then we can use this configuration to write the NWB file:

.. code-block:: python

    from neuroconv.tools.nwb_helpers import BACKEND_NWB_IO, configure_backend

    configure_backend(nwbfile=nwbfile, backend_configuration=default_backend_configuration)

    IO = BACKEND_NWB_IO[default_backend_configuration.backend]
    with IO("my_nwbfile.nwb", mode="w") as io:
        io.write(nwbfile)


Interfaces and Converters
-------------------------

The normal workflow for writing an NWB file with a ``DataInterface`` or ``NWBConverter`` is simple to configure.

The following example uses the :ref:`example data <example_data>` available from the testing repo:

.. code-block:: python

    from neuroconv import ConverterPipe
    from neuroconv.datainterfaces import PhySortingInterface, SpikeGLXRecordingInterface

    # Instantiate interfaces and converter
    ap_interface = SpikeGLXRecordingInterface(
        file_path=".../spikeglx/Noise4Sam_g0/Noise4Sam_g0_imec0/Noise4Sam_g0_t0.imec0.ap.bin"
    )
    phy_interface = PhySortingInterface(folder_path=".../phy/phy_example_0")

    data_interfaces = [ap_interface, phy_interface]
    converter = ConverterPipe(data_interfaces=data_interfaces)

    # Fetch available metadata
    metadata = converter.get_metadata()

    # Create the in-memory NWBFile object and retrieve a default configuration
    backend = "hdf5"

    nwbfile = converter.create_nwbfile(metadata=metadata)
    backend_configuration = converter.get_default_backend_configuration(
        nwbfile=nwbfile,
        backend=backend,
    )

    # Make any modifications to the configuration in this step, for example...
    dataset_configurations = backend_configuration.dataset_configurations
    dataset_configuration = dataset_configurations["acquisition/ElectricalSeriesAP/data"]
    dataset_configuration.compression_method = "Blosc"

    # Configure and write the NWB file
    nwbfile_path = "./my_nwbfile_name.nwb"
    converter.run_conversion(
        nwbfile_path=nwbfile_path,
        nwbfile=nwbfile,
        backend_configuration=backend_configuration,
    )

If you do not intend to make any alterations to the default configuration for the given backend type, then you can follow the classic workflow:

.. code-block:: python

    converter = ConverterPipe(data_interfaces=data_interfaces)

    # Fetch available metadata
    metadata = converter.get_metadata()

    # Write the NWB file, applying the default configuration for the HDF5 backend
    backend = "hdf5"
    nwbfile_path = "./my_nwbfile_name.nwb"
    converter.run_conversion(
        nwbfile_path=nwbfile_path,
        metadata=metadata,
        backend=backend,
    )

and all datasets in the NWB file will automatically use the default configurations!


Generic tools
CodyCBakerPhD marked this conversation as resolved.
Show resolved Hide resolved
-------------

If you are not using data interfaces or converters, you can still use the general tools to configure the backend of any in-memory ``pynwb.NWBFile``:

.. code-block:: python

    from datetime import datetime
    from uuid import uuid4

    from dateutil import tz
    from neuroconv.tools.nwb_helpers import (
        configure_backend,
        get_default_backend_configuration,
        make_or_load_nwbfile,
    )
    from pynwb import NWBFile, TimeSeries

    nwbfile_path = "./my_nwbfile.nwb"
    backend = "hdf5"

    session_start_time = datetime(2020, 1, 1, 12, 30, 0, tzinfo=tz.gettz("US/Pacific"))
    nwbfile = NWBFile(
        session_start_time=session_start_time,
        session_description="My description...",
        identifier=str(uuid4()),
    )

    # Add neurodata objects to the NWBFile, for example...
    time_series = TimeSeries(
        name="MyTimeSeries",
        description="A time series from my experiment.",
        unit="cm/s",
        data=[1., 2., 3.],
        timestamps=[0.0, 0.2, 0.4],
    )
    nwbfile.add_acquisition(time_series)

    with make_or_load_nwbfile(
        nwbfile_path=nwbfile_path,
        nwbfile=nwbfile,
        overwrite=True,
        backend=backend,
        verbose=True,
    ):
        backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=backend)

        # Make any modifications to the configuration in this step, for example...
        dataset_configurations = backend_configuration.dataset_configurations
        dataset_configurations["acquisition/MyTimeSeries/data"].compression_options = dict(level=7)

        configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)



FAQ
---

**How do I see what compression methods are available on my system?**

You can see what compression methods are available on your installation by printing out the following variable:

.. code-block:: python

    from neuroconv.tools.nwb_helpers import AVAILABLE_HDF5_COMPRESSION_METHODS

    print(AVAILABLE_HDF5_COMPRESSION_METHODS)

.. code-block:: bash

    {'gzip': 'gzip',
     ...
     'Zstd': hdf5plugin._filters.Zstd}

And likewise for ``AVAILABLE_ZARR_COMPRESSION_METHODS``.


**Can I modify the maximum shape or data type through the NeuroConv backend configuration?**

Core fields such as the maximum shape and data type of the source data cannot be altered using the NeuroConv backend configuration.

Instead, they would have to be changed at the level of the read operation; these are sometimes exposed to the initialization inputs or source data options.
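As a hypothetical illustration (the variable names below are not part of the NeuroConv API), a dtype change happens on the source array before it is wrapped into a neurodata object:

```python
import numpy as np

# Hypothetical sketch: cast the source array *before* building the neurodata
# object, since the backend configuration cannot change the dtype afterwards.
source_data = np.arange(6, dtype=np.float64)
downcast_data = source_data.astype(np.float32)  # halves the on-disk footprint

print(source_data.dtype, "->", downcast_data.dtype)
```

Whether such a cast is lossless depends on the value range of your source data.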


**Can I specify a buffer shape that incompletely spans the chunks?**

The ``buffer_shape`` must be a multiple of the ``chunk_shape`` along each axis.

This was found to give significant performance increases compared to previous data iterators that caused repeated I/O operations through partial chunk writes.
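The multiplicity rule can be sketched in a few lines of plain Python (the helper below is illustrative, not part of NeuroConv):

```python
def is_valid_buffer_shape(buffer_shape, chunk_shape):
    """Return True when the buffer shape is an exact multiple of the chunk shape on every axis."""
    return all(
        buffer_axis % chunk_axis == 0
        for buffer_axis, chunk_axis in zip(buffer_shape, chunk_shape)
    )

print(is_valid_buffer_shape(buffer_shape=(78_125, 64), chunk_shape=(15_625, 32)))  # True
print(is_valid_buffer_shape(buffer_shape=(20_000, 64), chunk_shape=(15_625, 32)))  # False
```

A buffer that fails this check would straddle chunk boundaries, forcing partial chunk writes.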


**How do I disable chunking and compression completely?**

To completely disable chunking for HDF5 backends (i.e., 'contiguous' layout), set both ``chunk_shape=None`` and ``compression_method=None``. Zarr requires all datasets to be chunked.

You could also delete the entry from the NeuroConv backend configuration, which causes the neurodata object to fall back to whatever default method wrapped the dataset field when it was added to the in-memory ``pynwb.NWBFile``.
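The resulting contiguous layout can be inspected directly with ``h5py``: a dataset written without chunking reports ``chunks=None``. A small demonstration independent of NeuroConv (the file name is illustrative):

```python
import h5py
import numpy as np

# Contiguous (unchunked, uncompressed) versus chunked-and-compressed layouts.
with h5py.File("layout_demo.h5", "w") as file:
    file.create_dataset("contiguous", data=np.arange(10.0))  # no chunks requested
    file.create_dataset("chunked", data=np.arange(10.0), chunks=(5,), compression="gzip")

with h5py.File("layout_demo.h5", "r") as file:
    contiguous_chunks = file["contiguous"].chunks  # None for contiguous layout
    chunked_chunks = file["chunked"].chunks

print(f"{contiguous_chunks=}, {chunked_chunks=}")
```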


**How do I confirm that the backend configuration has been applied?**

The easiest way to check this information is to open the resulting file in ``h5py`` or ``zarr`` and print out the dataset properties.

For example, we can confirm that the dataset was written to disk according to our instructions by using the ``h5py`` library to read the file we created in the previous section:

.. code-block:: python

    import h5py

    with h5py.File("my_nwbfile.nwb", "r") as file:
        chunks = file["acquisition/MyTimeSeries/data"].chunks
        compression = file["acquisition/MyTimeSeries/data"].compression
        compression_options = file["acquisition/MyTimeSeries/data"].compression_opts

    print(f"{chunks=}")
    print(f"{compression=}")
    print(f"{compression_options=}")

Which prints out:

.. code-block:: bash

    chunks=(1,)
    compression='zstd'
    compression_options=7

.. note::

    You may have noticed that the name of the key for that compression option got lost in translation; this is
    because HDF5 implicitly forces the order of each option in the tuple (or, in this case, a scalar).
@@ -21,5 +21,6 @@ and synchronize data across multiple sources.
temporal_alignment
csvs
expand_path
backend_configuration
yaml
docker_demo