Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow preprocessing on dicts of recordings #3711

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

chrishalcrow
Copy link
Collaborator

@chrishalcrow chrishalcrow commented Feb 19, 2025

This PR would allow user to pass dicts of recordings to all preprocessing classes (for now: just common reference!)

This would allow for the following syntax:

import spikeinterface.full as si
rec, _ = si.generate_ground_truth_recording(num_channels=16)
rec.set_property(key="tetrode_number", values=[0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3])
pp_rec = si.common_reference(rec.split_by("tetrode_number"))
pp_rec
>>> {0: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB,
 1: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB,
 2: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB,
 3: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB}

which can be chained without problem. Also allows for users to take in two sessions from e.g. different days and ensure the same preprocessing steps are applied to them. Or helpful if you want to split your recording in half and sort seperately (UnitMatch almost do this).

This would certainly remove a few annoying for loops from our labs' pipeline.

We agreed to not touch the BaseRecording class. I made the define_function_or_dict_from_class , which looks through the args and kwargs for a recording or dict of recordings, then applies either the normal function (e.g. CommonReferenceRecording) or produces a dict, where each recording has had the normal function applied to it, keeping the original keys. I've copied the signature, doc and set a new name in the function, rather than calling another decorator (ala

@copy_signature(source_class)
) . I think this is simpler??

Also, note that this only works for the functions. So common_reference(dict_of_recs) is good but CommonReferenceRecording(dict_of_recs) does not work. This seems ok from the user point of view, but maybe there's some deeper issue with loading that I'm not thinking of??

@zm711
Copy link
Collaborator

zm711 commented Feb 19, 2025

Also, note that this only works for the functions. So common_reference(dict_of_recs) is good but CommonReferenceRecording(dict_of_recs) does not work. This seems ok from the user point of view, but maybe there's some deeper issue with loading that I'm not thinking of??

Also I think this was Sam who liked keeping this around right @alejoe91 ? I only ever use the function version of all the classes. And I remember a discussion somewhere on the repo about removing classes from init and only exposing the functions moving forward.

@alejoe91 alejoe91 added core Changes to core module preprocessing Related to preprocessing module labels Feb 20, 2025
@chrishalcrow
Copy link
Collaborator Author

Had another go and it feels more beautiful! Added the silence_period example because it has another arg (rather than kwarg). @alejoe91 @samuelgarcia, would appreciate feedback!

Copy link
Collaborator

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple naming things to think about while you wait for Alessio and Sam.

return preprocessed_recordings_dict


def define_rec_or_dict_function(source_class, name):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for symmetry maybe define_rec_or_dict_function_from_class so that we see that this function is the same as below function but to have a dict_function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about: define_function_or_dict_from_class, which is the same as below, but I think is more accurate?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. I guess the real full name would be define_function_or_dict_of_function_from_class but at some point the name is just too long :P.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep tho original name qnd explain what it does on the docstting for simplicity :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Zach is right: it's nice to have the same grammar as define_function_from_class. Also added docstrings to both new functions

@samuelgarcia
Copy link
Member

This looks good.
We should check the behavior when the input is a list of recordings for instance when concatenating.
What do we want here ?

@chrishalcrow
Copy link
Collaborator Author

This looks good. We should check the behavior when the input is a list of recordings for instance when concatenating. What do we want here ?

I think we should limit it to recordings or dicts of recordings. It encourages the user to label groups of recordings.

Also, if we use lists, we'd probably add a key when we save to keep track of things. But when a user loads it, they'd expect a list. To me, it complexifies the saving/loading.

Anyone have a good use case for lists?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module preprocessing Related to preprocessing module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants