Pose tracks io #33
Codecov Report
Patch coverage:

```
@@            Coverage Diff             @@
##              main      #33      +/-  ##
==========================================
+ Coverage    96.51%   97.74%   +1.23%
==========================================
  Files            5        8       +3
  Lines           86      310     +224
==========================================
+ Hits            83      303     +220
- Misses           3        7       +4
==========================================
```

View full report in Codecov by Sentry.
…g with xarray.DataArray names
Great work! I caught a few typos and left comments, questions, and suggestions here and there, plus some points to note and discuss for the future.
movement/io/validators.py
Outdated
```python
@individual_names.validator
def _validate_individual_names(self, attribute, value):
    if (value is not None) and (len(value) != self.tracks_array.shape[1]):
        log_and_raise_error(
            ValueError,
            f"Expected {self.tracks_array.shape[1]} `{attribute}`, "
            f"but got {len(value)}.",
        )

@keypoint_names.validator
def _validate_keypoint_names(self, attribute, value):
    if (value is not None) and (len(value) != self.tracks_array.shape[2]):
        log_and_raise_error(
            ValueError,
            f"Expected {self.tracks_array.shape[2]} `{attribute}`, "
            f"but got {len(value)}.",
        )
```
These are essentially doing the same thing; we could consider writing this as a callable factory, e.g.:

```python
def _make_list_length_validator(expected_idx: int):
    def _validate_list_length(self, attribute, value):
        """Raise ValueError if the length of the list is not as expected."""
        if (value is not None) and (
            len(value) != self.tracks_array.shape[expected_idx]
        ):
            log_and_raise_error(
                ValueError,
                f"Expected {self.tracks_array.shape[expected_idx]} `{attribute}`, "
                f"but got {len(value)}.",
            )

    return _validate_list_length
```

```python
individual_names: Optional[List[str]] = field(
    default=None,
    converter=converters.optional(_list_of_str),
    validator=_make_list_length_validator(1),
)
```
You are right, we have some duplication there. But I think your proposed solution won't work. The `_make_list_length_validator()` function needs to access `self` (because of `self.tracks_array.shape`), so it has to be a class method. Class methods are not available upon attribute initialisation, so we cannot do:

```python
individual_names: Optional[List[str]] = field(
    default=None,
    converter=converters.optional(_list_of_str),
    validator=self._make_list_length_validator(1),
)
```

It seems to be an inherent limitation in how attrs internals work. I'll try to see whether I can find an alternative way of avoiding the duplication.
An alternative approach would be to write a generic list length validator:

```python
def _validate_list_length(
    attribute: str, value: Optional[List], expected_length: int
):
    """Raise a ValueError if the list does not have the expected length."""
    if (value is not None) and (len(value) != expected_length):
        raise log_error(
            ValueError,
            f"Expected `{attribute}` to have length {expected_length}, "
            f"but got {len(value)}.",
        )
```
Then the in-class validators become:

```python
@individual_names.validator
def _validate_individual_names(self, attribute, value):
    _validate_list_length(attribute, value, self.tracks_array.shape[1])

@keypoint_names.validator
def _validate_keypoint_names(self, attribute, value):
    _validate_list_length(attribute, value, self.tracks_array.shape[2])
```
I kind of like this compromise, since it is compatible with attrs and it gives us a reusable generic list length validator, which we will probably need in other places as well.
The `_make_list_length_validator()` function does not need to be a class method, nor does it need to access `self`, because in `validator=_make_list_length_validator(1)` it just returns the callable validator `_validate_list_length`. This returned `_validate_list_length` will receive the same `(instance, attribute, value)` arguments as with the decorator approach. But I do agree that the alternative you suggested is easier to read.
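The trick can be illustrated without depending on attrs at all: the factory is a plain module-level function, and the closure it returns only sees the instance later, at validation time, when it is called with `(instance, attribute, value)`. The `FakeTracks` class and its `tracks_shape` attribute below are made up for this sketch:

```python
def make_list_length_validator(expected_idx):
    """Return a validator that checks a list against instance.tracks_shape."""
    def validate_list_length(instance, attribute, value):
        # `instance` only becomes available here, at validation time,
        # so the factory itself never needs access to `self`.
        expected = instance.tracks_shape[expected_idx]
        if value is not None and len(value) != expected:
            raise ValueError(
                f"Expected {expected} `{attribute}`, but got {len(value)}."
            )
    return validate_list_length

class FakeTracks:
    # Stand-in for the attrs class; shape = (frames, individuals, keypoints, space)
    tracks_shape = (100, 2, 3, 2)

validate_individuals = make_list_length_validator(1)
validate_individuals(FakeTracks(), "individual_names", ["mouse1", "mouse2"])  # passes
try:
    validate_individuals(FakeTracks(), "individual_names", ["mouse1"])
except ValueError as e:
    print(e)  # Expected 2 `individual_names`, but got 1.
```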
This is a neat trick! I will keep it in mind for future use, to get out of similarly tricky situations.
For the case here, I will leave it as is for now, in the interest of readability.
Good to go 🚀
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
This PR thoroughly overhauls input/output (I/O) functionalities for pose estimation data. First I will give a summary of the added features, and then I will explain some of them in more detail.
Main changes

- Load pose tracks from DeepLabCut and SLEAP output files (closes Import data from pose estimation outputs #2)
- Represent pose tracks as an xarray.Dataset (closes Define data model for representing points and trajectories #12)
- Convert the xarray.Dataset to DeepLabCut-style dataframes and save them as '.h5' (partially addresses Ability to export poses in various formats #13)
- Use attrs classes instead of Pydantic for validation (closes Consider replacing Pydantic with attrs #24)

Where to start
To get a better understanding of the new functionalities, it's best to start by reading the docs, which include:
Details
Loading pose estimation data
There are two main user-facing functions for this, in the movement.io.load_poses module:

- from_sleap_file(): this currently loads SLEAP analysis .h5 files, and potentially also .slp files containing predicted instances (though this is experimental). In any case, the SLEAP docs themselves encourage using the .h5 files for downstream analysis.
- from_dlc_file(): DeepLabCut stores pose estimation predictions in a pandas DataFrame, saved either as .h5 or as .csv. This function can load both. First, a pandas DataFrame is loaded from .h5 or .csv, and then the from_dlc_df() function is called. The user can also call from_dlc_df() directly (for example if they have already imported the DeepLabCut DataFrame by other means).

All loading functions return an xarray.Dataset object (see next point).

Saving pose estimation data
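The .h5/.csv dispatch in from_dlc_file() can be pictured with a small stdlib-only sketch. Note that `load_dlc_file` and the two loader stubs below are hypothetical stand-ins for illustration, not movement's actual code:

```python
from pathlib import Path

def _load_from_hdf5(path):
    # Hypothetical stub; the real code would read a pandas DataFrame from .h5.
    return f"DataFrame from {path}"

def _load_from_csv(path):
    # Hypothetical stub; the real code would read a pandas DataFrame from .csv.
    return f"DataFrame from {path}"

def load_dlc_file(file_path):
    """Dispatch on the file suffix, as from_dlc_file() does for .h5 vs .csv."""
    path = Path(file_path)
    loaders = {".h5": _load_from_hdf5, ".csv": _load_from_csv}
    try:
        loader = loaders[path.suffix]
    except KeyError:
        raise ValueError(f"Unsupported file suffix: {path.suffix}") from None
    return loader(path)

print(load_dlc_file("video_predictions.csv"))  # DataFrame from video_predictions.csv
```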
Currently, I have only implemented movement.io.save_poses.to_dlc_file(), which saves the data to a DeepLabCut-style .h5 or .csv file. Internally it calls the movement.io.save_poses.to_dlc_df() function, which converts the data from xarray.Dataset to pandas.DataFrame before saving it.

Common representation for pose tracking data
Regardless of where data is loaded from, pose tracks are represented in movement as an xarray.Dataset, containing two data variables as xarray.DataArrays: pose_tracks and confidence.

Pose tracks are essentially an array with shape (frames, individuals, keypoints, space), which is meant to capture a variety of tracking data. The dimensions are labelled with meaningful coordinates (see the relevant docs section for details).

The confidence array has shape (frames, individuals, keypoints), and holds point-wise prediction scores, i.e. the point-wise confidence scores from SLEAP or the "likelihood" scores from DeepLabCut. This will probably be useful later for preprocessing.
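As a quick stdlib-only illustration of how the two arrays relate (toy sizes, nested lists standing in for xarray.DataArrays): confidence shares the first three dimensions of pose_tracks and simply drops the trailing space axis.

```python
# Toy dimension sizes; the real arrays are loaded from SLEAP/DLC files.
n_frames, n_individuals, n_keypoints, n_space = 4, 2, 3, 2

# pose_tracks: (frames, individuals, keypoints, space)
pose_tracks = [
    [[[0.0] * n_space for _ in range(n_keypoints)] for _ in range(n_individuals)]
    for _ in range(n_frames)
]
# confidence: (frames, individuals, keypoints), one score per predicted point
confidence = [
    [[1.0] * n_keypoints for _ in range(n_individuals)] for _ in range(n_frames)
]

def shape(nested):
    """Infer the shape of a regularly nested list."""
    s = []
    while isinstance(nested, list):
        s.append(len(nested))
        nested = nested[0]
    return tuple(s)

print(shape(pose_tracks))  # (4, 2, 3, 2)
print(shape(confidence))   # (4, 2, 3)
assert shape(confidence) == shape(pose_tracks)[:-1]
```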
In the future, if we need to extend movement's xarray.Dataset object with specialised functionalities, the recommended way of doing that is through accessors instead of inheritance (subclassing). I've already started implementing a PosesAccessor object (in movement.io.poses_accessor.py). Currently, it only implements a validate() method, but more methods can be added as needed.

Validation
I wrote some custom validator classes with attrs, mainly for validating the files from which the data is loaded or written to. They are organised into four classes for now:
See movement.io.validators.py and the API reference for more info.

TODO before merging
Currently, the documentation workflow is configured to be deployed (also) from this branch, to facilitate the review process. After the review is done, I will revert it to its original configuration - i.e. deploy when a new release is made.