Write block tutorial #696

Open: wants to merge 8 commits into base: main
51 changes: 51 additions & 0 deletions docs/source/tutorials/writing.md
@@ -84,6 +84,57 @@ Read the data.
2 3 6
```

In some scenarios, you may want to write your data one chunk at a time rather than sending it all at once, for example when the full data is not yet available or is too large to fit in memory. There are two ways to do this:

The first is to stack the arrays and then write the result with the `write_array` method shown above. This works well when the data is small.
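The stacking step can be sketched with plain NumPy. This is a minimal illustration, not part of the original tutorial: the five images are synthetic placeholders, and the `write_array` call is left as a comment because it needs a running Tiled server and a connected `client`.

```python
import numpy

# Five hypothetical 32x32 images, standing in for data produced one at a time.
images = [numpy.full((32, 32), i, dtype=numpy.int8) for i in range(5)]

# Stack along a new leading axis to get a single (5, 32, 32) array.
stacked = numpy.stack(images, axis=0)

# With a connected Tiled client (assumed to be named `client`),
# the whole array would then be sent in one call:
# client.write_array(stacked)
```

Note that `numpy.stack` holds all five images plus the stacked copy in memory at once, which is why this approach only suits small data.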

When the stacked data is too large for memory, or when you want to save each array as soon as it is generated, use the `write_block` method to write into a pre-allocated array in Tiled.

```python
# This approach requires you to know the final array dimensions beforehand.

# Assume you have five 2D arrays (e.g. images), each of shape 32 by 32.
>>> stacked_array_shape = (5, 32, 32)

# Define a tiled ArrayStructure based on shape
>>> import numpy
>>> from tiled.structures.array import ArrayStructure

# It is good practice to use the same dtype as your final results to avoid a mismatch.
>>> structure = ArrayStructure.from_array(numpy.zeros(stacked_array_shape, dtype=numpy.int8))
>>> structure
ArrayStructure(data_type=BuiltinDtype(endianness='not_applicable', kind=<Kind.integer: 'i'>, itemsize=1), chunks=((5,), (32,), (32,)), shape=(5, 32, 32), dims=None, resizable=False)

# Redefine the chunks so that each 2D array can be written as its own block.
# In our example, this becomes ((1, 1, 1, 1, 1), (32,), (32,))
>>> structure.chunks = ((1,) * stacked_array_shape[0], (stacked_array_shape[1],), (stacked_array_shape[2],))

# Check that the first axis is now divided into five chunks.
>>> structure
ArrayStructure(data_type=BuiltinDtype(endianness='not_applicable', kind=<Kind.integer: 'i'>, itemsize=1), chunks=((1, 1, 1, 1, 1), (32,), (32,)), shape=(5, 32, 32), dims=None, resizable=False)

# Allocate a new array client in Tiled
# Note: the following line works for Tiled versions <= v0.1.0a114
>>> array_client = client.new(structure_family="array", structure=structure, key="stacked_result", metadata={"color": "yellow", "barcode": 13})

# For Tiled versions >= v0.1.0a115, use the following instead
>>> from tiled.structures.data_source import DataSource
>>> data_source = DataSource(structure=structure, structure_family="array")
>>> array_client = client.new(structure_family="array", data_sources=[data_source], key="stacked_result", metadata={"color": "yellow", "barcode": 13})

>>> array_client
<ArrayClient shape=(5, 32, 32) chunks=((1, 1, 1, 1, 1), (32,), (32,)) dtype=int8>

# Save a single 2D array into a specific block.
# Random integers stand in for real image data.
# Write the first array (block index 0 along the first axis)
>>> first_array = numpy.random.randint(0, 100, size=(32, 32), dtype=numpy.int8)
>>> array_client.write_block(first_array, block=(0, 0, 0))

# Write the third array (block index 2 along the first axis)
>>> third_array = numpy.random.randint(0, 100, size=(32, 32), dtype=numpy.int8)
>>> array_client.write_block(third_array, block=(2, 0, 0))
```
```
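To make the block indexing concrete, here is a NumPy-only sketch of how a block index maps onto a region of the full array for the chunking used above. The helper `block_to_slices` is hypothetical and is not part of Tiled. The in-memory `full` array merely stands in for the server-side array, so the loop shows which region each `write_block` call would target rather than how Tiled implements it.

```python
import itertools

import numpy


def block_to_slices(chunks, block):
    """Return the slices of the full array covered by one block index."""
    slices = []
    for sizes, index in zip(chunks, block):
        start = sum(sizes[:index])  # total size of the chunks before this one
        slices.append(slice(start, start + sizes[index]))
    return tuple(slices)


chunks = ((1, 1, 1, 1, 1), (32,), (32,))
full = numpy.zeros((5, 32, 32), dtype=numpy.int8)

# Enumerate every block index: (0, 0, 0), (1, 0, 0), ..., (4, 0, 0).
blocks = list(itertools.product(*(range(len(sizes)) for sizes in chunks)))

for block in blocks:
    region = block_to_slices(chunks, block)
    # Stand-in for array_client.write_block(data, block=block):
    # fill the region this block covers with a recognizable value.
    full[region] = block[0] + 1
```

Because only the first axis is split, the second and third entries of every block index are 0, which matches the `block=(0, 0, 0)` and `block=(2, 0, 0)` calls in the example above.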


## Launch catalog with persistent data

First, we initialize a file which Tiled will use as a database.