Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transforms module with scale function #384

Merged

Conversation

stellaprins
Copy link
Contributor

@stellaprins stellaprins commented Jan 23, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
Enable users to convert data arrays expressed in pixels to SI units (like meters). Users can scale data to a known reference size (e.g. the size of a cage) and add appropriate units. This allows distances to be represented in standard units instead of just the number of pixels.

What does this PR do?
Adds a transforms module with a scale function. The scale function scales data (xarray.DataArray) by a given factor with an optional unit (str| None). Units can by any strings (e.g. "elephants") and will be added to xarray.DataArray.attrs["unit"]. Passing None as a unit (explicitely or by default) "dequantifies" the data (i.e. drops .attrs["unit"]).

I've looked at pint-array but as it stands units are simply strings. Another option using pint-array to what @niksirbi mentioned in #141 (i.e. something like data.pint.quantify({'distance': 'metres', 'time': 'seconds'}), could be to use the .attrs['units'] entry for each data variable (see below).

Unit-aware arithmetic in Xarray, via pint

Alternatively, we can quantify from the object's .attrs, automatically reading the metadata which xarray objects carry around. If nothing is passed to .quantify(), it will attempt to parse the .attrs['units'] entry for each data variable.

References

part of #366.

How has this PR been tested?

The scale function has been tested with various unit tests to ensure it works correctly. These tests include:

  • Correct scaling when passing different factors and units.
  • Last unit is used and product of scaling factors when same data is scaled twice.
  • Applying scaling to the correct dimension when data is transposed.
  • Applying scaling to the first matching dimension if multiple dimensions match the scaling factor's length.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

Docstrings have been added to the module and all functions. No further documentation needed.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

Copy link

codecov bot commented Jan 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.80%. Comparing base (df4b4b1) to head (95f6d7e).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #384   +/-   ##
=======================================
  Coverage   99.80%   99.80%           
=======================================
  Files          14       15    +1     
  Lines        1025     1048   +23     
=======================================
+ Hits         1023     1046   +23     
  Misses          2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stellaprins stellaprins linked an issue Jan 24, 2025 that may be closed by this pull request
@stellaprins stellaprins requested a review from niksirbi January 24, 2025 09:09
Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your first movement contribution, @stellaprins!

This is really well done and thoroughly tested. I do have an alternative suggestion for the implementation, though:

  • I think the scale function (and any future linear transforms of this kind) should only work on data arrays with a space dimension (with Cartesian coordinates), and broadcasting should happen only along that dimension. Please see my specific comments for details.
  • I believe the new attribute should be called space_unit, mirroring the existing time_unit attribute we populate when loading a dataset. In the future, I’m inclined to merge these two into an attribute named units that accepts a dictionary mapping dimension names to units (as you mentioned in your PR description). However, that should be handled in a separate issue/PR, possibly in conjunction with the pint-xarray issue. For now, renaming unit to space_unit is perfectly fine.

On the same topic, since we are introducing a new attribute, I wonder if it would be worth populating it directly when a dataset is loaded from a file, as we do for time_unit. For most of our supported formats, space_unit would be "pixels", with the possible exception of "Anipose" (I need to double-check). I have opened an issue to keep track of this idea.

movement/transforms.py Outdated Show resolved Hide resolved
movement/transforms.py Show resolved Hide resolved
movement/transforms.py Outdated Show resolved Hide resolved
movement/transforms.py Outdated Show resolved Hide resolved
tests/test_unit/test_transforms.py Show resolved Hide resolved
tests/test_unit/test_transforms.py Outdated Show resolved Hide resolved
tests/test_unit/test_transforms.py Show resolved Hide resolved
tests/test_unit/test_transforms.py Outdated Show resolved Hide resolved
tests/test_unit/test_transforms.py Show resolved Hide resolved
@stellaprins stellaprins requested a review from niksirbi January 28, 2025 11:12
Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only have two remaining minor comments.

I think we can get rid of a paragraph in the docstring, and I think we should add a simple test to confirm that scaling data arrays with 3d space (x,y,z) works (it indeed does).

I'm pre-approving this, so feel free to merge once you've dealt with these comments.

movement/transforms.py Outdated Show resolved Hide resolved
movement/transforms.py Outdated Show resolved Hide resolved
@stellaprins stellaprins added this pull request to the merge queue Jan 30, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jan 30, 2025
@willGraham01 willGraham01 added this pull request to the merge queue Jan 30, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jan 30, 2025
@willGraham01
Copy link
Contributor

@stellaprins It looks like there's a GH-wide issue with pull requests, which is why this has been kicked back out of the merge queue (even though it passes the tests!).

Once the status clears (likely not till tomorrow) try hitting the button again!

@willGraham01 willGraham01 added this pull request to the merge queue Jan 30, 2025
Merged via the queue into neuroinformatics-unit:main with commit 41942e1 Jan 30, 2025
18 checks passed
@willGraham01 willGraham01 deleted the sp/366-transforms-scale branch January 30, 2025 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scale pixels to SI units
3 participants