I have been working on an iterator for conversions involving a large number of images to prevent excessive RAM usage when loading the images as arrays. Together with @oruebel, I developed a working example that performs well for cases involving only an Images container.
Click For Minimal Example with Image Iterator
```python
from pathlib import Path

import numpy as np
from PIL import Image
from scipy.datasets import face

from hdmf.data_utils import AbstractDataChunkIterator, DataChunk
from pynwb.base import ImageReferences
from pynwb.image import GrayscaleImage, Images, IndexSeries, RGBImage
from pynwb.testing.mock.file import mock_NWBFile


class SingleImageIterator(AbstractDataChunkIterator):
    """Simple iterator that returns a single image. This avoids loading the entire
    image into memory at initialization and instead loads it at writing time."""

    def __init__(self, filename):
        self._filename = Path(filename)

        # Get image information without loading the full image
        with Image.open(self._filename) as img:
            self.image_mode = img.mode
            self._image_shape = img.size[::-1]  # PIL uses (width, height) instead of (height, width)
            self._max_shape = (None, None)
            self.number_of_bands = len(img.getbands())

            if self.number_of_bands > 1:
                self._image_shape += (self.number_of_bands,)
                self._max_shape += (self.number_of_bands,)

        # Calculate file size in bytes
        self._size_bytes = self._filename.stat().st_size
        # Calculate approximate memory size when loaded as a numpy array
        self._memory_size = np.prod(self._image_shape) * np.dtype(float).itemsize
        self._images_returned = 0  # Number of images returned in __next__

    def __iter__(self):
        """Return the iterator object"""
        return self

    def __next__(self):
        """Return the DataChunk with the single full image"""
        if self._images_returned == 0:
            data = np.asarray(Image.open(self._filename))
            selection = (slice(None),) * data.ndim
            self._images_returned += 1
            return DataChunk(data=data, selection=selection)
        else:
            raise StopIteration

    def recommended_chunk_shape(self):
        """Recommend the chunk shape for the data array."""
        return self._image_shape

    def recommended_data_shape(self):
        """Recommend the initial shape for the data array."""
        return self._image_shape

    @property
    def dtype(self):
        """Define the data type of the array"""
        return np.dtype(float)

    @property
    def maxshape(self):
        """Maximum shape of the data array that is being iterated over"""
        return self._max_shape

    def __len__(self):
        return self._image_shape[0]

    @property
    def size_info(self):
        """Return a dictionary with size information"""
        return {
            "file_size_bytes": self._size_bytes,
            "memory_size_bytes": self._memory_size,
            "shape": self._image_shape,
            "mode": self.image_mode,
            "bands": self.number_of_bands,
        }


gs_image_array = face(gray=True)
rgb_image_array = face()

use_iterator = True
if not use_iterator:
    gs_face_object = GrayscaleImage(
        name="gs_face",
        data=gs_image_array,
        description="Grayscale version of a raccoon.",
        resolution=35.433071,
    )
    rgb_face_object = RGBImage(
        name="rgb_face",
        data=rgb_image_array,
        resolution=70.0,
        description="RGB version of a raccoon.",
    )
else:
    # Save the images to disk
    image_folder = Path("./images")
    image_folder.mkdir(parents=True, exist_ok=True)
    gs_file_path = image_folder / "gs_face.png"
    rgb_file_path = image_folder / "rgb_face.png"
    Image.fromarray(gs_image_array).save(gs_file_path)
    Image.fromarray(rgb_image_array).save(rgb_file_path)

    gs_face_object = GrayscaleImage(
        name="gs_face",
        data=SingleImageIterator(filename=gs_file_path),
        description="Grayscale version of a raccoon.",
        resolution=35.433071,
    )
    rgb_face_object = RGBImage(
        name="rgb_face",
        data=SingleImageIterator(filename=rgb_file_path),
        resolution=70.0,
        description="RGB version of a raccoon.",
    )

images = [gs_face_object, rgb_face_object]
order_of_images = ImageReferences("order_of_images", images)

image_container = Images(
    name="raccoons",
    images=images,
    description="A collection of raccoons.",
    order_of_images=ImageReferences("order_of_images", images),
)

idx_series = IndexSeries(
    name="stimuli",
    data=np.asarray([0, 1, 0, 1], dtype=np.uint64),
    indexed_images=image_container,
    unit="N/A",
    timestamps=[0.1, 0.2, 0.3, 0.4],
)

nwbfile = mock_NWBFile()
nwbfile.add_stimulus_template(image_container)
nwbfile.add_stimulus(idx_series)

from pynwb import NWBHDF5IO

nwbfile_path = "test_images.nwb"
with NWBHDF5IO(nwbfile_path, "w") as io:
    io.write(nwbfile)

# nwbfile_read = NWBHDF5IO(nwbfile_path, "r").read()
# nwbfile_read
```
However, a problem arises when I try to use the images as templates to be referenced by an IndexSeries, which requires an ImageReferences object. The ImageReferences object holds a list of images whose data attribute is an iterator. Writing the NWB file in this case fails because hdmf.utils.get_data_shape attempts to infer the shape of the data by indexing the iterator, which is not subscriptable. This results in an error:

Click For Error Trace
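The failure mode can be reproduced in isolation. The sketch below is illustrative only: `naive_get_data_shape`, `safe_get_data_shape`, and `FakeImageIterator` are hypothetical stand-ins, not hdmf's actual implementation. It shows why len()-plus-indexing inference raises on an iterator-backed data object, and how consulting the iterator's advertised maxshape first would avoid the error.

```python
def naive_get_data_shape(data):
    """Infer shape by len() plus indexing into the first element (the failing pattern)."""
    shape = []
    while hasattr(data, "__len__") and not isinstance(data, str):
        shape.append(len(data))
        data = data[0]  # raises TypeError for iterators, which are not subscriptable
    return tuple(shape)


class FakeImageIterator:
    """Stand-in for SingleImageIterator: iterable with __len__ and maxshape, no __getitem__."""
    maxshape = (None, None)

    def __len__(self):
        return 768

    def __iter__(self):
        yield [[0.0] * 4] * 4  # placeholder image chunk


def safe_get_data_shape(data):
    """Prefer the iterator's advertised maxshape over indexing-based inference."""
    if hasattr(data, "maxshape"):
        return tuple(data.maxshape)
    return naive_get_data_shape(data)


it = FakeImageIterator()
try:
    naive_get_data_shape(it)
except TypeError as err:
    print(f"indexing-based inference fails: {err}")
print(safe_get_data_shape(it))        # (None, None), taken from maxshape
print(safe_get_data_shape([[1, 2]]))  # (1, 2), plain nested lists still work
```

The iterator defines `__len__`, so the naive inference enters its loop and then crashes on `data[0]`; checking for `maxshape` first sidesteps indexing entirely.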
Additionally, even without an iterator, get_data_shape computes an incorrect shape for an ImageReferences. For example, it reports a shape of [2, height, width] for a list with two references, which conflicts with the spec for ImageReferences, where the expected shape is [None]:
```python
from hdmf.utils import get_data_shape

order_of_images_shape = get_data_shape(order_of_images.data)
print(f"order of images shape: {order_of_images_shape}")
```

```
order of images shape: (2, 768, 1024)
```
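To see where the [2, height, width] shape comes from, here is an illustrative sketch of len()-and-index recursion; `infer_shape` is a hypothetical stand-in, not hdmf's code. The recursion descends into each referenced image's pixel data, so a list of two references looks like a 3-D array instead of the spec's 1-D shape [None].

```python
import numpy as np


def infer_shape(data):
    # Hypothetical recursive inference: treat any object with __len__ as a
    # dimension and recurse into element 0.
    if hasattr(data, "__len__") and not isinstance(data, str):
        return (len(data),) + infer_shape(data[0])
    return ()


# Two array-backed images standing in for the references held by ImageReferences.
gs_data = np.zeros((768, 1024))
other_data = np.zeros((768, 1024))
references = [gs_data, other_data]

# The list of two references is reported as a 3-D array: (2, 768, 1024).
print(infer_shape(references))
```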
This mismatch triggers the following warning:
hdmf/src/hdmf/build/objectmapper.py, lines 872 to 876 in ff4a0aa
The warning indicates that the calculation is incorrect because the spec shape of ImageReferences expects [None].
@rly and I were discussing a fix for this issue, which I will add as a PR soon.