-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I/O support for the ndx-pose NWB extension #166
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #166 +/- ##
==========================================
- Coverage 99.68% 91.08% -8.60%
==========================================
Files 11 11
Lines 639 662 +23
==========================================
- Hits 637 603 -34
- Misses 2 59 +57 ☔ View full report in Codecov by Sentry. |
Thanks a lot for this @edeno 🤗 . |
Just a heads up that we've just merged a PR introducing some new linting rules for docstrings. So it's probably a good idea to update from the |
for more information, see https://pre-commit.ci
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this @edeno, I've had a look at the code and it looks nicely done.
I've left many comments at particularly places in the code, but most of them are small pedantic things (aimed at ensuring consistency in formatting and language throughout the project).
On a higher level, I have two substantial comments:
Tests needed
The main thing to be done is testing. The code appears to do what it claims to do, and some of that functionality is verifiable through the example you added. That said, we want to be absolutely sure that the conversions from NWB to movement and back are properly done. We could take a similar approach to how we test other file types.
Specifically, we could host a valid ndx-pose .nwb file as part of our sample data and use it for the tests. At minimum, what I'd like to see is the following: load the nwb file, convert it to a movement dataset, validate the movement dataset (we have some existing functionality for that), then convert it back to ndx-pose and check that we end up with the same thing we started with. You can check the existing IO integration tests for inspiration. We also have more fine-grained unit tests for loading and saving files.
If implementing the tests looks too daunting or you can't spare the time, I'm happy to help with writing this. It would be certainly helpful if you could point me to a valid ndx-pose file that we can publicly share with our sample data. Also, since I'm not very familiar with the NWB ecosystem, do you happen to know of an nwb/ndx-pose validator tool we can use you programmatically check that the movement-exported nwb files are valid?
User-facing interface for loading/saving NWB files
For context, what follows is my vision for what an ideal movement-nwb integration could look like. This doesn't need to be implemented in the current PR. I'm happy to merge this PR after the tests are added (and ndx-pose 0.2 is released on PyPI), and we can refactor the interface in a future PR.
One of our major aspirations is to present a unified interface for loading/saving files containing pose data, as agnostic to the file format as possible. Of course this is easier said than done. Currently, the interface for loading/saving files works like this:
from movement.io import load_poses, save_poses
ds = load_poses.from_dlc_file(file_path) # DLC
ds = load_poses.from_sleap_file(file_path) # SLEAP
ds = load_poses.from_lp_file(file_path) # LigthningPose
# the consistency allows as to wrap the above in a more generic function
ds = load_poses.from_file(file_path, source_software)
# the interface for saving is similar, e.g.
save_poses.to_dlc_file(ds, file_path)
I'm keen to maintain the consistency as much as possible/practical. Ideally this would mean exposing the following functions:
from movement.io import load_poses, save_poses
ds = load_poses.from_nwb_file(file_path)
save_poses.to_nwb_file(ds, file_path)
The actual under-the hood machinery can stay in the nwb.py
module you've introduced, but what the user "sees" should be similar to the existing loading/saving functions.
Reading your implementation, I realised that this may not be that easy for ndx-pose .nwb files, because (as far as I can tell), there is strictly one subject per file. In that case, we have to do one of the following:
- accept a list of more than one file paths in the
load_poses.from_nwb_file
function, this is sort of the route you've chosen. - still accept only 1 file, same as the other loading functions, and provide a separate function for merging individuals into a common dataset (after loading). This could be as easy as
xr.merge
, which we can showcase with an example.
The save_poses.to_nwb_file()
is more straightforward to implement I think, because we can always choose to split a multi-subject dataset into multiple .nwb file and add the subject's ID to the file suffix. We do something similar for the save_poses.to_dlc_file()
function when split_individuals=True
.
Let me know your take on these ideas about the interface, I'm keen to have feedback on the API.
movement/io/nwb.py
Outdated
) | ||
|
||
|
||
def convert_nwb_to_movement(nwb_filepaths: list[str]) -> xr.Dataset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly for this function, it's public, so a full docstring is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be an assymetry between the two converter functions:
convert_nwb_to_movement
reads nwb files from disk, while convert_movement_to_nwb
writes to the NWB file handlers, without actually writing to disk.
To be consistent with our other loading/saving functions, I'd prefer to have functions to read from nwb file(s) on disk to movement ds, and functions for doing the exact opposite, i.e. write from movement ds to nwb file(s) on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, in our other IO functions we accept both str
and pathlib.Path
as file paths.
In this particular fuction it would make sense to accept either a single path (Union[Path, str]
) (which would work for single-subject datasets), or a list of such paths.
You could alro re-use our movement.io.validators.ValidFile
validator for checking the paths, either directly, or via the _validate_file_path
utility that can be found in movement.io.save_poses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my reasoning for not writing from movement dataset to NWB file on disk is because there are often times where you'd like to add other things to the NWB file before writing to disk.
Hey @edeno, a recently merged PR introduced some breaking changes that affect the sample data and tests. You'd have to update (again) from the main branch (via rebase or a merge commit), for the CI to work and to avoid any merge conflicts. Sorry about that. Also, do you need any help with making progress on this PR? Don't hesitate to poke me if you do. |
Hi @niksirbi, that's fine. Happy to update. Also thank you for your review. I haven't had time to work on this recently but I was intending to pick it up again when I have some more free time. |
No worries, this is not urgent. It's more important to get it right than to do it fast. Let me know when you get back to it (or if at any point you realise you can't find the time). |
I've written a "developer guide" for adding new Input/Output functionalities to |
for more information, see https://pre-commit.ci
Co-authored-by: Niko Sirmpilatze <[email protected]>
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Co-authored-by: Niko Sirmpilatze <[email protected]>
Quality Gate failedFailed conditions See analysis details on SonarCloud Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
That's completely fine, you've already done most of the work. I'll just have to go over it again and make sure it's consistent with the rest of our codebase. I'll ping you if I have concrete questions. |
See also https://github.com/talmolab/sleap-io/pull/143/files (linking for my own reference). |
hey guys, this looks great and very useful! Any chance this could be merged soon? I'm interested in using I'd like to give my two cents on two points:
def convert_nwb_to_movement(
nwb_filepaths: str | list[str] | list[Path],
key_name: str = "PoseEstimation",
) -> xr.Dataset:
...
pose_estimation = nwbfile.processing["behavior"][key_name]
|
Thank a lot for your input @luiztauffer. This PR is on my priority list, and I expect to complete it within the next ~2 weeks. Regarding the two points you made, I agree on both counts, and I will make sure to also include functions to read/write NWB file objects (not just from/to disk). I'd value your feedback on this feature, when its ready. Don't hesitate to mention any other requirements that come to mind and to file bug reports if/when necessary. |
great, @niksirbi! Let me know if I could help in any way |
Apologies for the delay @luiztauffer, I got caught up in other things, but I've resumed work on this now. I will soon close this PR as superseded by #360. @luiztauffer would you be OK with me pinging you when the new PR is ready to be reviewed? I'd like to make sure that the implementation covers your use cases. |
of course @niksirbi! thanks for your work on this |
Description
What is this PR
Why is this PR needed?
NWB is a standardized format for neuroscience data. As one of the goals of this package is to standardize output from pose estimation algorithms, it would be useful to output to NWB. In addition, importing from the ndx-pose extension would be useful as future algorithms could simply output to this format and then import into the common movement format.
What does this PR do?
Exports
skeletons
andPoseEstimation
objects from thendx-pose
extension.References
Fix #23
How has this PR been tested?
Added export example in examples.
Is this a breaking change?
No.
Does this PR require an update to the documentation?
Have not updated documentation yet.
Checklist:
To-Do
Notes:
The
ndx-pose
package on pypi is not updated yet (@rly). The current code utilizes a pip install directly from https://github.com/rly/ndx-pose/ however the pre-commit hooks would not allow me to add a pip dependency for installing directly from the github repo.The current export code does not include anything about the training of the models.