Fix bug when exporting HDF5 datasets with unlimited dimension (#155)
* Fix #154: determine data shape for unlimited HDF5 datasets on write

* Update changelog

* Add unit test using maxshape with None values in HDF5

* Exclude fall-back code from codecov coverage

* Simplify io options logic
oruebel authored Jan 16, 2024
1 parent 71bfa6d commit abf7a02
Showing 3 changed files with 20 additions and 5 deletions.
CHANGELOG.md (3 additions, 0 deletions)

```diff
@@ -5,6 +5,9 @@
 ### Enhancements
 * Enhanced `ZarrIO` and `ZarrDataIO` to infer io settings (e.g., chunking and compression) from HDF5 datasets to preserve storage settings on export if possible @oruebel [#153](https://github.com/hdmf-dev/hdmf-zarr/pull/153)
 
+### Bug Fixes
+* Fixed bug when converting HDF5 datasets with unlimited dimensions @oruebel [#155](https://github.com/hdmf-dev/hdmf-zarr/pull/155)
+
 ## 0.5.0 (December 8, 2023)
 
 ### Enhancements
```
src/hdmf_zarr/backend.py (11 additions, 5 deletions)

```diff
@@ -1174,9 +1174,8 @@ def __list_fill__(self, parent, name, data, options=None):  # noqa: C901
         io_settings = dict()
         if options is not None:
             dtype = options.get('dtype')
-            io_settings = options.get('io_settings')
-            if io_settings is None:
-                io_settings = dict()
+            if options.get('io_settings') is not None:
+                io_settings = options.get('io_settings')
         # Determine the dtype
         if not isinstance(dtype, type):
             try:
@@ -1191,9 +1190,16 @@
         # Determine the shape and update the dtype if necessary when dtype==object
         if 'shape' in io_settings:  # Use the shape set by the user
             data_shape = io_settings.pop('shape')
-        # If we have a numeric numpy array then use its shape
+        # If we have a numeric numpy-like array (e.g., numpy.array or h5py.Dataset) then use its shape
         elif isinstance(dtype, np.dtype) and np.issubdtype(dtype, np.number) or dtype == np.bool_:
-            data_shape = get_data_shape(data)
+            # HDMF's get_data_shape may return the maxshape of an HDF5 dataset, which can include None values
+            # that Zarr does not allow for dataset shape. Check for the shape attribute first before falling
+            # back on get_data_shape
+            if hasattr(data, 'shape') and data.shape is not None:
+                data_shape = data.shape
+            # This is a fall-back just in case. However, this should not happen for standard numpy and h5py arrays
+            else:  # pragma: no cover
+                data_shape = get_data_shape(data)  # pragma: no cover
         # Deal with object dtype
         elif isinstance(dtype, np.dtype):
             data = data[:]  # load the data in case we come from HDF5 or another on-disk data source we don't know
```
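The core of the fix is preferring a dataset's concrete `.shape` (always integers) over maxshape-style inference (which may contain `None` for unlimited HDF5 dimensions, a value Zarr rejects in a dataset shape). A minimal standalone sketch of that logic; `FakeH5Dataset` and `resolve_shape` are hypothetical illustrations, not part of hdmf-zarr:

```python
import numpy as np

class FakeH5Dataset:
    """Hypothetical stand-in for an h5py.Dataset created with maxshape=(None,)."""
    def __init__(self, data, maxshape):
        self._data = np.asarray(data)
        self.maxshape = maxshape        # may contain None for unlimited dims
        self.shape = self._data.shape   # concrete shape is always integers

def resolve_shape(data):
    """Prefer the concrete .shape; only fall back to length-based inference."""
    if getattr(data, 'shape', None) is not None:
        return tuple(data.shape)
    # fall-back path, analogous to HDMF's get_data_shape (rarely needed)
    return (len(data),)

ds = FakeH5Dataset(list(range(5)), maxshape=(None,))
print(resolve_shape(ds))  # (5,) -- valid for Zarr, unlike the maxshape (None,)
```

Had the resolver consulted `ds.maxshape` instead, it would have produced `(None,)`, which is exactly the shape Zarr refuses on export.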
tests/unit/test_io_convert.py (6 additions, 0 deletions)

```diff
@@ -868,6 +868,12 @@ def __get_data_array(self, foo_container):
         """For a container created by __roundtrip_data return the data array"""
         return foo_container.buckets['bucket1'].foos['foo1'].my_data
 
+    def test_maxshape(self):
+        """test when maxshape is set for the dataset"""
+        data = H5DataIO(data=list(range(5)), maxshape=(None,))
+        self.__roundtrip_data(data=data)
+        self.assertContainerEqual(self.out_container, self.read_container, ignore_hdmf_attrs=True)
+
     def test_nofilters(self):
         """basic test that export without any options specified is working as expected"""
         data = list(range(5))
```
