Commit

Merge branch 'dev' into postinit

mavaylon1 authored Apr 10, 2024
2 parents 4799bb9 + d85d0cb commit 8f991a7
Showing 6 changed files with 185 additions and 31 deletions.
27 changes: 14 additions & 13 deletions CHANGELOG.md
@@ -4,6 +4,7 @@

### Enhancements
- Added `TermSetConfigurator` to automatically wrap fields with `TermSetWrapper` according to a configuration file. @mavaylon1 [#1016](https://github.com/hdmf-dev/hdmf/pull/1016)
- Updated `TermSetWrapper` to support validating a single field within a compound array. @mavaylon1 [#1061](https://github.com/hdmf-dev/hdmf/pull/1061)

## HDMF 3.13.0 (March 20, 2024)

@@ -138,8 +139,8 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be
## HDMF 3.6.0 (May 12, 2023)

### New features and minor improvements
- Updated `ExternalResources` to have `FileTable` and new methods to query data. the `ResourceTable` has been removed along with methods relating to `Resource`. @mavaylon [#850](https://github.com/hdmf-dev/hdmf/pull/850)
- Updated hdmf-common-schema version to 1.6.0. @mavaylon [#850](https://github.com/hdmf-dev/hdmf/pull/850)
- Updated `ExternalResources` to have `FileTable` and new methods to query data. The `ResourceTable` has been removed along with methods relating to `Resource`. @mavaylon1 [#850](https://github.com/hdmf-dev/hdmf/pull/850)
- Updated hdmf-common-schema version to 1.6.0. @mavaylon1 [#850](https://github.com/hdmf-dev/hdmf/pull/850)
- Added testing of HDMF-Zarr on PR and nightly. @rly [#859](https://github.com/hdmf-dev/hdmf/pull/859)
- Replaced `setup.py` with `pyproject.toml`. @rly [#844](https://github.com/hdmf-dev/hdmf/pull/844)
- Use `ruff` instead of `flake8`. @rly [#844](https://github.com/hdmf-dev/hdmf/pull/844)
@@ -153,7 +154,7 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be
[#853](https://github.com/hdmf-dev/hdmf/pull/853)

### Documentation and tutorial enhancements:
- Updated `ExternalResources` how to tutorial to include the new features. @mavaylon [#850](https://github.com/hdmf-dev/hdmf/pull/850)
- Updated `ExternalResources` how to tutorial to include the new features. @mavaylon1 [#850](https://github.com/hdmf-dev/hdmf/pull/850)

## HDMF 3.5.6 (April 28, 2023)

@@ -193,13 +194,13 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be

### Bug fixes
- Fixed issue with conda CI. @rly [#823](https://github.com/hdmf-dev/hdmf/pull/823)
- Fixed issue with deprecated `pkg_resources`. @mavaylon [#822](https://github.com/hdmf-dev/hdmf/pull/822)
- Fixed `hdmf.common` deprecation warning. @mavaylon [#826]((https://github.com/hdmf-dev/hdmf/pull/826)
- Fixed issue with deprecated `pkg_resources`. @mavaylon1 [#822](https://github.com/hdmf-dev/hdmf/pull/822)
- Fixed `hdmf.common` deprecation warning. @mavaylon1 [#826](https://github.com/hdmf-dev/hdmf/pull/826)

### Internal improvements
- Fixed a number of typos and added a GitHub Action running codespell to ensure that no typo sneaks in. [#825](https://github.com/hdmf-dev/hdmf/pull/825)
- Added additional documentation for `__fields__` in `AbstactContainer`. @mavaylon [#827](https://github.com/hdmf-dev/hdmf/pull/827)
- Updated warning message for broken links. @mavaylon [#829](https://github.com/hdmf-dev/hdmf/pull/829)
- Added additional documentation for `__fields__` in `AbstractContainer`. @mavaylon1 [#827](https://github.com/hdmf-dev/hdmf/pull/827)
- Updated warning message for broken links. @mavaylon1 [#829](https://github.com/hdmf-dev/hdmf/pull/829)

## HDMF 3.5.1 (January 26, 2023)

@@ -218,9 +219,9 @@ will increase the minor version number to 3.10.0. See the 3.9.1 release notes be
- Added ``HDMFIO.__del__`` to ensure that I/O objects are being closed on delete. @oruebel [#811](https://github.com/hdmf-dev/hdmf/pull/811)

### Minor improvements
- Added support for reading and writing `ExternalResources` to and from denormalized TSV files. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
- Changed the name of `ExternalResources.export_to_sqlite` to `ExternalResources.to_sqlite`. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
- Updated the tutorial for `ExternalResources`. @mavaylon [#799](https://github.com/hdmf-dev/hdmf/pull/799)
- Added support for reading and writing `ExternalResources` to and from denormalized TSV files. @mavaylon1 [#799](https://github.com/hdmf-dev/hdmf/pull/799)
- Changed the name of `ExternalResources.export_to_sqlite` to `ExternalResources.to_sqlite`. @mavaylon1 [#799](https://github.com/hdmf-dev/hdmf/pull/799)
- Updated the tutorial for `ExternalResources`. @mavaylon1 [#799](https://github.com/hdmf-dev/hdmf/pull/799)
- Added `message` argument for assert methods defined by `hdmf.testing.TestCase` to allow developers to include custom error messages with asserts. @oruebel [#812](https://github.com/hdmf-dev/hdmf/pull/812)
- Clarify the expected chunk shape behavior for `DataChunkIterator`. @oruebel [#813](https://github.com/hdmf-dev/hdmf/pull/813)

@@ -361,7 +362,7 @@ the fields (i.e., when the constructor sets some fields to fixed values). @rly
- Plotted results in external resources tutorial. @oruebel (#667)
- Added support for Python 3.10. @rly (#679)
- Updated requirements. @rly @TheChymera (#681)
- Improved testing for `ExternalResources`. @mavaylon (#673)
- Improved testing for `ExternalResources`. @mavaylon1 (#673)
- Improved docs for export. @rly (#674)
- Enhanced data chunk iteration speeds through new ``GenericDataChunkIterator`` class. @CodyCBakerPhD (#672)
- Enhanced issue template forms on GitHub. @CodyCBakerPhD (#700)
@@ -437,7 +438,7 @@ the fields (i.e., when the constructor sets some fields to fixed values). @rly
- Allow passing ``index=True`` to ``DynamicTable.to_dataframe()`` to support returning `DynamicTableRegion` columns
as indices or Pandas DataFrame. @rly (#579)
- Improve ``DynamicTable`` documentation. @rly (#639)
- Updated external resources tutorial. @mavaylon (#611)
- Updated external resources tutorial. @mavaylon1 (#611)

### Breaking changes and deprecations
- Previously, when using ``DynamicTable.__getitem__`` or ``DynamicTable.get`` to access a selection of a
@@ -522,7 +523,7 @@ the fields (i.e., when the constructor sets some fields to fixed values). @rly
- Add experimental namespace to HDMF common schema. New data types should go in the experimental namespace
(hdmf-experimental) prior to being added to the core (hdmf-common) namespace. The purpose of this is to provide
a place to test new data types that may break backward compatibility as they are refined. @ajtritt (#545)
- `ExternalResources` was changed to support storing both names and URIs for resources. @mavaylon (#517, #548)
- `ExternalResources` was changed to support storing both names and URIs for resources. @mavaylon1 (#517, #548)
- The `VocabData` data type was replaced by `EnumData` to provide more flexible support for data from a set of
fixed values.
- Added `AlignedDynamicTable`, which defines a `DynamicTable` that supports storing a collection of sub-tables.
14 changes: 14 additions & 0 deletions docs/gallery/plot_term_set.py
@@ -67,6 +67,7 @@
"""
from hdmf.common import DynamicTable, VectorData
import os
import numpy as np

try:
import linkml_runtime # noqa: F401
@@ -129,6 +130,19 @@
data=TermSetWrapper(value=['Homo sapiens'], termset=terms)
)

######################################################
# Validate Compound Data with TermSetWrapper
# ----------------------------------------------------
# :py:class:`~hdmf.term_set.TermSetWrapper` can also wrap compound data.
# Set the ``field`` argument to the name of the field within the compound
# dtype that should be validated against the termset.
c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
data = VectorData(
name='species',
description='...',
data=TermSetWrapper(value=c_data, termset=terms, field='species')
)

######################################################
# Validate Attributes with TermSetWrapper
# ----------------------------------------------------
5 changes: 4 additions & 1 deletion src/hdmf/data_utils.py
@@ -20,7 +20,10 @@ def append_data(data, arg):
data.append(arg)
return data
elif isinstance(data, np.ndarray):
return np.append(data, np.expand_dims(arg, axis=0), axis=0)
if len(data.dtype) > 0:  # data is a structured array
return np.append(data, arg)
else: # arg is a scalar or row vector
return np.append(data, np.expand_dims(arg, axis=0), axis=0)
elif isinstance(data, h5py.Dataset):
shape = list(data.shape)
shape[0] += 1
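The new branch in `append_data` distinguishes structured (compound) arrays from plain ones because `np.append` treats the two differently. A minimal standalone sketch of that distinction, in plain NumPy and independent of HDMF:

```python
import numpy as np

# Plain 1-D array: len(dtype) == 0, so appending one scalar row needs
# expand_dims to give the argument a leading axis before np.append.
plain = np.array(['Homo sapiens'])
plain = np.append(plain, np.expand_dims('Mus musculus', axis=0), axis=0)
print(plain.shape)  # (2,)

# Structured array: len(dtype) > 0 (one entry per field), and np.append
# concatenates the records directly, so no expand_dims is needed.
dtype = [('species', 'U50'), ('age', 'i4')]
c_data = np.array([('Homo sapiens', 24)], dtype=dtype)
c_row = np.array([('Mus musculus', 36)], dtype=dtype)
print(len(c_data.dtype))  # 2
c_data = np.append(c_data, c_row)
print([str(s) for s in c_data['species']])  # ['Homo sapiens', 'Mus musculus']
```

`len(data.dtype)` is the number of named fields, so it is 0 for any plain dtype and positive only for structured arrays, which makes it a cheap way to pick the right append path.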
64 changes: 52 additions & 12 deletions src/hdmf/term_set.py
@@ -216,19 +216,26 @@ class TermSetWrapper:
{'name': 'value',
'type': (list, np.ndarray, dict, str, tuple),
'doc': 'The target item that is wrapped, either data or attribute.'},
{'name': 'field', 'type': str, 'default': None,
'doc': 'The field within a compound array.'}
)
def __init__(self, **kwargs):
self.__value = kwargs['value']
self.__termset = kwargs['termset']
self.__field = kwargs['field']
self.__validate()

def __validate(self):
# check if list, tuple, array
if isinstance(self.__value, (list, np.ndarray, tuple)): # TODO: Future ticket on DataIO support
values = self.__value
# create list if none of those -> mostly for attributes
if self.__field is not None:
values = self.__value[self.__field]
else:
values = [self.__value]
# check if list, tuple, array
if isinstance(self.__value, (list, np.ndarray, tuple)):
values = self.__value
# create list if none of those -> mostly for scalar attributes
else:
values = [self.__value]

# iteratively validate
bad_values = []
for term in values:
@@ -243,6 +250,10 @@ def __validate(self):
def value(self):
return self.__value

@property
def field(self):
return self.__field

@property
def termset(self):
return self.__termset
@@ -273,26 +284,55 @@ def __iter__(self):
"""
return self.__value.__iter__()

def __multi_validation(self, data):
"""
append_data includes numpy arrays. This is not the same as list append.
Numpy array append is essentially list extend. Now if a user appends an array (for compound data), we need to
support validating arrays with multiple items. This method is an internal bulk validation
check for numpy arrays and extend.
"""
bad_values = []
for item in data:
if not self.termset.validate(term=item):
bad_values.append(item)
return bad_values

def append(self, arg):
"""
Validate ``arg`` against the termset, then resolve the wrapper and
append ``arg`` to the wrapped value.
"""
if self.termset.validate(term=arg):
self.__value = append_data(self.__value, arg)
if isinstance(arg, np.ndarray):
if self.__field is not None: # compound array
values = arg[self.__field]
else:
msg = "Array needs to be a structured array with compound dtype. If this does not apply, use extend."
raise ValueError(msg)
else:
msg = ('"%s" is not in the term set.' % arg)
values = [arg]

bad_values = self.__multi_validation(values)

if len(bad_values) != 0:
msg = ('"%s" is not in the term set.' % ', '.join([str(value) for value in bad_values]))
raise ValueError(msg)

self.__value = append_data(self.__value, arg)

def extend(self, arg):
"""
Validate each item of ``arg`` against the termset, then resolve the
wrapper and extend the wrapped value with ``arg``.
"""
bad_data = []
for item in arg:
if not self.termset.validate(term=item):
bad_data.append(item)
if isinstance(arg, np.ndarray):
if self.__field is not None: # compound array
values = arg[self.__field]
else:
values = arg
else:
values = arg

bad_data = self.__multi_validation(values)

if len(bad_data) == 0:
self.__value = extend_data(self.__value, arg)
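The field-aware validation path added to `__validate` can be summarized outside HDMF. In this sketch a plain Python set stands in for a `TermSet`; `VALID_TERMS` and `validate_values` are illustration only, not the HDMF API:

```python
import numpy as np

# Hypothetical stand-in for a TermSet: here, validation is set membership.
VALID_TERMS = {'Homo sapiens', 'Mus musculus'}

def validate_values(value, field=None):
    """Mirror the logic of TermSetWrapper.__validate: select the named
    field from a compound array if one was given, otherwise validate the
    value itself (wrapping scalars in a list), and collect failures."""
    if field is not None:
        values = value[field]                    # one column of the records
    elif isinstance(value, (list, np.ndarray, tuple)):
        values = value
    else:
        values = [value]                         # scalar attribute
    return [str(v) for v in values if v not in VALID_TERMS]

c_data = np.array([('Homo sapiens', 24), ('Rattus norvegicus', 8)],
                  dtype=[('species', 'U50'), ('age', 'i4')])
print(validate_values(c_data, field='species'))  # ['Rattus norvegicus']
print(validate_values('Homo sapiens'))           # []
```

The key design point is that only the configured field of a compound array is checked, so other fields (here, `age`) can hold arbitrary non-term data.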
95 changes: 95 additions & 0 deletions tests/unit/common/test_table.py
@@ -220,6 +220,101 @@ def test_add_row_validate_bad_data_all_col(self):
with self.assertRaises(ValueError):
species.add_row(Species_1='bad data', Species_2='bad data')

def test_compound_data_append(self):
c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
compound_vector_data = VectorData(
name='Species_1',
description='...',
data=c_data
)
compound_vector_data.append(c_data2)

np.testing.assert_array_equal(compound_vector_data.data, np.append(c_data, c_data2))

@unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
def test_array_append_error(self):
c_data = np.array(['Homo sapiens'])
c_data2 = np.array(['Mus musculus'])

terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
vectordata_termset = VectorData(
name='Species_1',
description='...',
data=TermSetWrapper(value=c_data, termset=terms)
)

with self.assertRaises(ValueError):
vectordata_termset.append(c_data2)

def test_compound_data_extend(self):
c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
compound_vector_data = VectorData(
name='Species_1',
description='...',
data=c_data
)
compound_vector_data.extend(c_data2)

np.testing.assert_array_equal(compound_vector_data.data, np.vstack((c_data, c_data2)))

@unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
def test_add_ref_wrapped_array_append(self):
data = np.array(['Homo sapiens'])
data2 = 'Mus musculus'
terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
vector_data = VectorData(
name='Species_1',
description='...',
data=TermSetWrapper(value=data, termset=terms)
)
vector_data.append(data2)

np.testing.assert_array_equal(vector_data.data.data, np.append(data, data2))

@unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
def test_add_ref_wrapped_array_extend(self):
data = np.array(['Homo sapiens'])
data2 = np.array(['Mus musculus'])
terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
vector_data = VectorData(
name='Species_1',
description='...',
data=TermSetWrapper(value=data, termset=terms)
)
vector_data.extend(data2)

np.testing.assert_array_equal(vector_data.data.data, np.vstack((data, data2)))

@unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
def test_add_ref_wrapped_compound_data_append(self):
c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
compound_vector_data = VectorData(
name='Species_1',
description='...',
data=TermSetWrapper(value=c_data, field='species', termset=terms)
)
compound_vector_data.append(c_data2)

np.testing.assert_array_equal(compound_vector_data.data.data, np.append(c_data, c_data2))

@unittest.skipIf(not REQUIREMENTS_INSTALLED, "optional LinkML module is not installed")
def test_add_ref_wrapped_compound_data_extend(self):
c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
c_data2 = np.array([('Mus musculus', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
terms = TermSet(term_schema_path='tests/unit/example_test_term_set.yaml')
compound_vector_data = VectorData(
name='Species_1',
description='...',
data=TermSetWrapper(value=c_data, field='species', termset=terms)
)
compound_vector_data.extend(c_data2)

np.testing.assert_array_equal(compound_vector_data.data.data, np.vstack((c_data, c_data2)))

def test_constructor_bad_columns(self):
columns = ['bad_column']
msg = "'columns' must be a list of dict, VectorData, DynamicTableRegion, or VectorIndex"
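The append path these tests exercise can be sketched without HDMF. `checked_append` below is a hypothetical mimic of the wrapped append for compound data, not the actual `TermSetWrapper` API:

```python
import numpy as np

TERMS = {'Homo sapiens', 'Mus musculus'}
DTYPE = [('species', 'U50'), ('age', 'i4')]

def checked_append(data, arg, field='species'):
    """Validate only the configured field of the incoming records, then
    delegate to np.append, which adds structured records row-wise."""
    bad = [str(v) for v in arg[field] if v not in TERMS]
    if bad:
        raise ValueError('"%s" is not in the term set.' % ', '.join(bad))
    return np.append(data, arg)

data = np.array([('Homo sapiens', 24)], dtype=DTYPE)
data = checked_append(data, np.array([('Mus musculus', 36)], dtype=DTYPE))
print([str(s) for s in data['species']])  # ['Homo sapiens', 'Mus musculus']

# A record whose species is not in the term set is rejected before any
# mutation, matching the ValueError the tests above assert on.
try:
    checked_append(data, np.array([('Rattus norvegicus', 8)], dtype=DTYPE))
except ValueError as err:
    print(err)
```

Because validation happens before `np.append`, a failed append leaves the wrapped data unchanged, which is what makes the `assertRaises(ValueError)` tests safe to run against shared fixtures.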
11 changes: 6 additions & 5 deletions tests/unit/test_term_set.py
@@ -155,21 +155,22 @@ def setUp(self):
self.wrapped_array = TermSetWrapper(value=np.array(['Homo sapiens']), termset=self.termset)
self.wrapped_list = TermSetWrapper(value=['Homo sapiens'], termset=self.termset)

c_data = np.array([('Homo sapiens', 24)], dtype=[('species', 'U50'), ('age', 'i4')])
self.wrapped_comp_array = TermSetWrapper(value=c_data,
termset=self.termset,
field='species')

self.np_data = VectorData(
name='Species_1',
description='...',
data=self.wrapped_array
)
self.list_data = VectorData(
name='Species_1',
description='...',
data=self.wrapped_list
)

def test_properties(self):
self.assertEqual(self.wrapped_array.value, ['Homo sapiens'])
self.assertEqual(self.wrapped_array.termset.view_set, self.termset.view_set)
self.assertEqual(self.wrapped_array.dtype, 'U12') # this covers __getattr__
self.assertEqual(self.wrapped_comp_array.field, 'species')

def test_get_item(self):
self.assertEqual(self.np_data.data[0], 'Homo sapiens')
