Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I/O support for the ndx-pose NWB extension #166

Closed
wants to merge 48 commits into from

Conversation

edeno
Copy link

@edeno edeno commented Apr 19, 2024

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
NWB is a standardized format for neuroscience data. As one of the goals of this package is to standardize output from pose estimation algorithms, it would be useful to output to NWB. In addition, importing from the ndx-pose extension would be useful as future algorithms could simply output to this format and then import into the common movement format.

What does this PR do?
Exports skeletons and PoseEstimation objects from the ndx-pose extension.

References

Fix #23

How has this PR been tested?

Added export example in examples.

Is this a breaking change?

No.

Does this PR require an update to the documentation?

Have not updated documentation yet.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

To-Do

  • Import from nwb file to movement format
  • Add kinematics

Notes:

The ndx-pose package on pypi is not updated yet (@rly). The current code utilizes a pip install directly from https://github.com/rly/ndx-pose/ however the pre-commit hooks would not allow me to add a pip dependency for installing directly from the github repo.

The current export code does not include anything about the training of the models.

Copy link

codecov bot commented Apr 19, 2024

Codecov Report

Attention: Patch coverage is 0% with 57 lines in your changes missing coverage. Please review.

Project coverage is 91.08%. Comparing base (cf0a6b1) to head (739c4d8).

Current head 739c4d8 differs from pull request most recent head 90c3287

Please upload reports for the commit 90c3287 to get more accurate results.

Files Patch % Lines
movement/io/nwb.py 0.00% 57 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
- Coverage   99.68%   91.08%   -8.60%     
==========================================
  Files          11       11              
  Lines         639      662      +23     
==========================================
- Hits          637      603      -34     
- Misses          2       59      +57     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@niksirbi
Copy link
Member

Thanks a lot for this @edeno 🤗 .
We've been wanting to integrate with .nwb for a while, this is definitely on our roadmap. We'll take a look at the PR next week and get back to you with comments.

@niksirbi
Copy link
Member

Just a heads up that we've just merged a PR introducing some new linting rules for docstrings. So it's probably a good idea to update from the main branch (via rebase or a merge commit), to avoid any merge conflicts.

Copy link

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this @edeno, I've had a look at the code and it looks nicely done.

I've left many comments at particularly places in the code, but most of them are small pedantic things (aimed at ensuring consistency in formatting and language throughout the project).

On a higher level, I have two substantial comments:

Tests needed

The main thing to be done is testing. The code appears to do what it claims to do, and some of that functionality is verifiable through the example you added. That said, we want to be absolutely sure that the conversions from NWB to movement and back are properly done. We could take a similar approach to how we test other file types.

Specifically, we could host a valid ndx-pose .nwb file as part of our sample data and use it for the tests. At minimum, what I'd like to see is the following: load the nwb file, convert it to a movement dataset, validate the movement dataset (we have some existing functionality for that), then convert it back to ndx-pose and check that we end up with the same thing we started with. You can check the existing IO integration tests for inspiration. We also have more fine-grained unit tests for loading and saving files.

If implementing the tests looks too daunting or you can't spare the time, I'm happy to help with writing this. It would be certainly helpful if you could point me to a valid ndx-pose file that we can publicly share with our sample data. Also, since I'm not very familiar with the NWB ecosystem, do you happen to know of an nwb/ndx-pose validator tool we can use you programmatically check that the movement-exported nwb files are valid?

User-facing interface for loading/saving NWB files

For context, what follows is my vision for what an ideal movement-nwb integration could look like. This doesn't need to be implemented in the current PR. I'm happy to merge this PR after the tests are added (and ndx-pose 0.2 is released on PyPI), and we can refactor the interface in a future PR.

One of our major aspirations is to present a unified interface for loading/saving files containing pose data, as agnostic to the file format as possible. Of course this is easier said than done. Currently, the interface for loading/saving files works like this:

from movement.io import load_poses, save_poses

ds = load_poses.from_dlc_file(file_path)  # DLC
ds = load_poses.from_sleap_file(file_path)  # SLEAP
ds = load_poses.from_lp_file(file_path) # LigthningPose

# the consistency allows as to wrap the above in a more generic function
ds = load_poses.from_file(file_path, source_software)

# the interface for saving is similar, e.g.
save_poses.to_dlc_file(ds, file_path)

I'm keen to maintain the consistency as much as possible/practical. Ideally this would mean exposing the following functions:

from movement.io import load_poses, save_poses

ds = load_poses.from_nwb_file(file_path)
save_poses.to_nwb_file(ds, file_path)

The actual under-the hood machinery can stay in the nwb.py module you've introduced, but what the user "sees" should be similar to the existing loading/saving functions.

Reading your implementation, I realised that this may not be that easy for ndx-pose .nwb files, because (as far as I can tell), there is strictly one subject per file. In that case, we have to do one of the following:

  • accept a list of more than one file paths in the load_poses.from_nwb_file function, this is sort of the route you've chosen.
  • still accept only 1 file, same as the other loading functions, and provide a separate function for merging individuals into a common dataset (after loading). This could be as easy as xr.merge, which we can showcase with an example.

The save_poses.to_nwb_file() is more straightforward to implement I think, because we can always choose to split a multi-subject dataset into multiple .nwb file and add the subject's ID to the file suffix. We do something similar for the save_poses.to_dlc_file() function when split_individuals=True.

Let me know your take on these ideas about the interface, I'm keen to have feedback on the API.

movement/io/nwb.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
movement/io/nwb.py Outdated Show resolved Hide resolved
movement/io/nwb.py Outdated Show resolved Hide resolved
movement/io/nwb.py Outdated Show resolved Hide resolved
movement/io/nwb.py Outdated Show resolved Hide resolved
)


def convert_nwb_to_movement(nwb_filepaths: list[str]) -> xr.Dataset:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly for this function, it's public, so a full docstring is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be an assymetry between the two converter functions:
convert_nwb_to_movement reads nwb files from disk, while convert_movement_to_nwb writes to the NWB file handlers, without actually writing to disk.

To be consistent with our other loading/saving functions, I'd prefer to have functions to read from nwb file(s) on disk to movement ds, and functions for doing the exact opposite, i.e. write from movement ds to nwb file(s) on disk.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in our other IO functions we accept both str and pathlib.Path as file paths.
In this particular fuction it would make sense to accept either a single path (Union[Path, str]) (which would work for single-subject datasets), or a list of such paths.

You could alro re-use our movement.io.validators.ValidFile validator for checking the paths, either directly, or via the _validate_file_path utility that can be found in movement.io.save_poses

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my reasoning for not writing from movement dataset to NWB file on disk is because there are often times where you'd like to add other things to the NWB file before writing to disk.

examples/nwb_conversion.py Outdated Show resolved Hide resolved
examples/nwb_conversion.py Outdated Show resolved Hide resolved
movement/io/nwb.py Outdated Show resolved Hide resolved
@niksirbi
Copy link
Member

niksirbi commented May 7, 2024

Hey @edeno, a recently merged PR introduced some breaking changes that affect the sample data and tests. You'd have to update (again) from the main branch (via rebase or a merge commit), for the CI to work and to avoid any merge conflicts. Sorry about that.

Also, do you need any help with making progress on this PR? Don't hesitate to poke me if you do.

@edeno
Copy link
Author

edeno commented May 7, 2024

Hi @niksirbi, that's fine. Happy to update.

Also thank you for your review. I haven't had time to work on this recently but I was intending to pick it up again when I have some more free time.

@niksirbi
Copy link
Member

niksirbi commented May 8, 2024

Hi @niksirbi, that's fine. Happy to update.

Also thank you for your review. I haven't had time to work on this recently but I was intending to pick it up again when I have some more free time.

No worries, this is not urgent. It's more important to get it right than to do it fast. Let me know when you get back to it (or if at any point you realise you can't find the time).

@niksirbi
Copy link
Member

I've written a "developer guide" for adding new Input/Output functionalities to movement, partly motivated by the ongoing work in this PR. I hope that it will help clarify the relevant code structure and make it easier to find things for contributors such as yourself @edeno. Have a read when you get back to working on this PR and let me know if anything is unclear.

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

@niksirbi
Copy link
Member

Hi @niksirbi, I am unfortunately somewhat limited in my time in the next month or so. Happy to have you take it over and I can fill in details where necessary.

That's completely fine, you've already done most of the work. I'll just have to go over it again and make sure it's consistent with the rest of our codebase. I'll ping you if I have concrete questions.

@niksirbi
Copy link
Member

niksirbi commented Nov 8, 2024

See also https://github.com/talmolab/sleap-io/pull/143/files (linking for my own reference).

@luiztauffer
Copy link

hey guys, this looks great and very useful! Any chance this could be merged soon? I'm interested in using movement for a couple of projects, and NWB integration is important.

I'd like to give my two cents on two points:

  1. Many times the name attribute of an nwb object has a value different from the default, that is, not every ndx_pose.PoseEstimation will have the name PoseEstimation, and in those cases the line 276 will give an error. An easy way around that would be an optional argument:
def convert_nwb_to_movement(
    nwb_filepaths: str | list[str] | list[Path],
    key_name: str = "PoseEstimation",
) -> xr.Dataset:
   ...
            pose_estimation = nwbfile.processing["behavior"][key_name]
  1. I agree with @edeno point that many times you want to add other things to an nwb file object before writing it to disk, so it is a common practice to separate functions that add to the object from functions that write to disk. You could have both, I suppose

@niksirbi
Copy link
Member

Thank a lot for your input @luiztauffer.

This PR is on my priority list, and I expect to complete it within the next ~2 weeks.

Regarding the two points you made, I agree on both counts, and I will make sure to also include functions to read/write NWB file objects (not just from/to disk).

I'd value your feedback on this feature, when its ready. Don't hesitate to mention any other requirements that come to mind and to file bug reports if/when necessary.

@luiztauffer
Copy link

great, @niksirbi! Let me know if I could help in any way

@niksirbi
Copy link
Member

niksirbi commented Dec 9, 2024

Apologies for the delay @luiztauffer, I got caught up in other things, but I've resumed work on this now.
I've started a new PR to avoid pushing onto @edeno's branch. Since that PR branched off this one, all of Eric's contributions are preserved.

I will soon close this PR as superseded by #360.

@luiztauffer would you be OK with me pinging you when the new PR is ready to be reviewed? I'd like to make sure that the implementation covers your use cases.

@luiztauffer
Copy link

of course @niksirbi! thanks for your work on this

@niksirbi niksirbi closed this Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

I/O support for the ndx-pose NWB extension
3 participants